What's new in this release
An overview of the new features and functionality of this Language Weaver Edge product release.
Version 8.6.5
Language Weaver Edge 8.6.5.0 is a minor feature release, introducing the support of automatic speech recognition, new adaptation modes, Edge-Cloud scalability enhancements and multiple fixes.
What's new
- Integration of an automatic speech recognition (ASR) module
- By integrating an open source ASR module, it is now possible to process audio content with Language Weaver Edge and automate the transcription and the translation of offline audio files
- The ASR module is an optional module that requires a separate installation. The ASR module should be installed on all worker hosts where a transcription engine will be enabled.
- Once the ASR model has been deployed on a host, it is possible to create an audio transcription engine.
- The Audio Transcription Engine provides access to all languages supported by the ASR model.
- Language support
- The supported languages are the ones supported by both Language Weaver Edge and the open source ASR model (OpenAI Whisper).
- The current list of supported languages includes: Arabic, Armenian, Azerbaijani, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese.
- Transcription quality varies by language, and word error rates (WER) differ accordingly. Please consult the OpenAI Whisper GitHub page to learn about the evaluation of models and the estimated WER per language.
- Each audio transcription engine will require a supported GPU for maximum performance.
- 5 GB of GPU RAM is required for each ASR engine.
- 2 CPU cores minimum are required on the host.
- Average transcription Real-Time Factor is 0.16 (it takes on average 1 second to process 6 seconds of audio content).
- Audio transcription engines and training engines cannot share the same GPU:
- A training engine requires exclusive access to the GPU. Therefore, it is not possible to run both a training engine and an audio transcription engine on the same host using the same GPU.
- Multiple audio transcription engines can, however, share the same GPU.
- Audio transcription engines can also run on CPU only, but transcription is slower than on a GPU.
- Minimum hardware requirements per audio transcription engine when running on CPU only are:
- 16 CPU cores
- 16 GB of RAM
- On the above CPU configuration, average transcription Real-Time Factor is 3.2 (it takes 3.2 seconds to process 1 second of audio content).
- Audio transcription does not require a separate license and can be enabled with any existing 8.6 license, assuming that processing units (PUs) are available from the license.
- Each transcription engine requires 1 PU to be allocated from the Language Weaver Edge license
- It is possible to have multiple transcription engines running in parallel on separate hosts. Each engine requires one PU to run.
- Adaptation improvements
- It is now possible to adjust the generic data mix that is performed during adaptation to allow customers to choose between limiting generic domain degradation, maximizing in-domain improvements from the training data or having a good balance between in-domain improvements and generic domain degradation.
- Adjustment is performed in the UI using a slider which is now available during the adaptation process (either manual or auto adaptive).
- The three different adaptation modes are:
- Generic: With this mode, the adapted model will retain more of the generic nature of the baseline LP. This is the recommended mode for use with multiple domains, especially the one covered by the training data. Note that this mode was the default mode introduced in Language Weaver Edge 8.6.4, which requires longer adaptation time (GPU is then highly recommended).
- Balanced: This mode offers a good balance between generic and domain-specific content (new default in Language Weaver Edge 8.6.5).
- Domain Specific: Using this mode, the adapted model retains more of the domain-specific training set. It is recommended for use with a single domain that matches the provided training set.
- New adaptation modes are available for both manual and auto adaptation.
- Edge-Cloud improvements
- The Edge-Cloud service has been enhanced and optimized for scalability.
- Configuration is simplified: it is now possible to access all the language pairs combinations available in Cloud without the need to create Edge-Cloud engines for each LP in Language Weaver Edge.
- Only a single Edge-Cloud engine is needed, optimizing hardware resource consumption on the Language Weaver Edge hosts, therefore reducing TCO.
- Multiple Edge-Cloud engines can be enabled on different hosts to offer load-balancing and high-availability.
- Edge-Cloud engine hardware requirements are:
- 1 CPU core
- 1 GB of RAM
- Edge-Cloud engines do not require any PUs, but a Language Weaver subscription is required.
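The audio transcription sizing figures listed earlier in this section (Real-Time Factor of 0.16 on GPU and 3.2 on CPU, 5 GB of GPU RAM per ASR engine) can be turned into rough planning estimates. The following is a minimal sketch only: the function names are illustrative and not part of Language Weaver Edge, and the GPU estimate considers memory alone, ignoring anything else that may be using the GPU.

```python
# Rough sizing helpers based on the averages quoted in these notes.
# Function names are hypothetical; they are not a Language Weaver Edge API.

GPU_RTF = 0.16   # average Real-Time Factor on a supported GPU
CPU_RTF = 3.2    # average Real-Time Factor on the 16-core / 16 GB CPU setup
ASR_GPU_RAM_GB = 5  # GPU RAM required per ASR engine


def processing_seconds(audio_seconds, real_time_factor):
    """Estimate processing time: RTF = processing time / audio duration."""
    if audio_seconds < 0 or real_time_factor <= 0:
        raise ValueError("audio_seconds must be >= 0 and real_time_factor > 0")
    return audio_seconds * real_time_factor


def engines_per_gpu(gpu_memory_gb, per_engine_gb=ASR_GPU_RAM_GB):
    """Upper bound on ASR engines sharing one GPU, by memory only."""
    if gpu_memory_gb < 0 or per_engine_gb <= 0:
        raise ValueError("gpu_memory_gb must be >= 0 and per_engine_gb > 0")
    return int(gpu_memory_gb // per_engine_gb)


if __name__ == "__main__":
    one_hour = 3600
    print("GPU estimate (s):", processing_seconds(one_hour, GPU_RTF))
    print("CPU estimate (s):", processing_seconds(one_hour, CPU_RTF))
    print("Engines on a 24 GB GPU:", engines_per_gpu(24))
```

With these averages, one hour of audio takes roughly 10 minutes on GPU and a little over 3 hours on CPU. Actual throughput depends on the hardware and the audio content.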
Enhancements
- Dictionary enhancements
- For language pairs that support fluent terminology, it is now possible to disable it at the term level in dictionaries. This can be useful if you want to prevent the model from inflecting a term in the output. When fluent terminology is disabled, the term is handled in the same way as with a non-fluent terminology LP.
- It is also possible to enforce case matching at the term level. This only works for LPs that do not support fluent terminology or for terms where fluent terminology is disabled. When enabled, the dictionary term will only be matched when the exact casing is found in the source.
- Monitoring enhancements
- Added a new language pair WPM chart that aggregates the WPM for all TEngines of the same LP or source and target language.
- Added a WPM - Total row on the Activity table to show the real-time aggregated WPM of the entire cluster.
- Added all available engine types to the Deployments table.
- Changed the default charts shown to Translation Engine WPM and Host Memory Usage.
- When hovering on a specific chart label, automatically hide all other plots on the chart to easily and quickly focus on a plot of interest.
- Added new Prometheus metrics for document count, word count and character count, for successful translations. These are available in the REST API /api/v2/metrics endpoint.
- Added support for the BMP format in image translation.
- Added ability to show/hide masked passwords in UI edit fields.
- Added tooltip explanations for PDF smart selection options.
- Added a new List Dictionaries role that allows users to list and use dictionaries even if they don't have the View Dictionaries permission.
- Added an Edge installer option to make installation of the ABBYY PDF Converter optional.
- Added digital signatures for new Windows LP installers.
- Improved handling of bold/italics words for MS Word files in Asian languages.
- Accessibility improvements for keyboard navigation and other miscellaneous updates.
- Kubernetes configuration of translation engines with 0 minimum PUs assigned is now supported.
- Kubernetes Helm charts to support the OpenShift platform added.
- Kubernetes enhancement to allow changing the UID and GID for the mtedge user of the Docker images.
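The new Prometheus metrics listed above are exposed through the REST API /api/v2/metrics endpoint. As a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format, the response body could be parsed as shown below. The sample text is illustrative only: `mte_translation_engine_filled_pus` is a metric named elsewhere in these notes, while `example_word_count_total`, the `host` label and all values are hypothetical placeholders, since the exact names of the new metrics are not listed here.

```python
# Minimal parser for Prometheus text-format samples (no timestamps handling
# beyond ignoring them, no escape processing inside label values).

SAMPLE = """\
# HELP mte_translation_engine_filled_pus Illustrative help text
# TYPE mte_translation_engine_filled_pus gauge
mte_translation_engine_filled_pus{host="worker-1"} 2
example_word_count_total 1200
"""


def parse_prometheus_text(text):
    """Return a list of (metric_name, raw_labels, value) tuples."""
    samples = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, value_part = rest.rsplit("}", 1)
        else:
            name, value_part = line.split(None, 1)
            labels = ""
        # The first token after the name/labels is the value; an optional
        # timestamp may follow and is ignored here.
        value = float(value_part.split()[0])
        samples.append((name.strip(), labels, value))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_prometheus_text(SAMPLE):
        print(name, labels, value)
```

In practice a Prometheus server would scrape the endpoint directly; a parser like this is only useful for ad hoc checks of the returned text.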
Fixes
- Fixed Arabic RTL text alignment on Feedback editing page and in Microsoft Excel spreadsheets.
- Fixed language detection of Chinese Traditional text.
- Fixed XLIFF translation failure when the file contains a starting UTF-8 Byte Order Mark.
- Fixed SMTP sender reset on upgrades.
- Fixed UI Translation History sort order to be stable even when changes are made to the Translation Settings panel.
- Fixed UI partial highlighting of feedback entry when the target word contains a dash.
- Fixed unexpected new browser tab that would get opened when a user clicked to download a translation.
- Fixed Kubernetes host management to be less sensitive to slow operations that would affect the reported engine statuses.
- Fixed a rare crash at API Gateway process startup, caused by a race condition that could corrupt the configuration while it was being updated.
- Fixed occasional PDF translation failure on Windows when processing multiple PDFs in parallel.
- Fixed improper escaping of special password characters entered during the Language Weaver Edge installation.
- Disallow users from managing higher-privileged roles.
- Correctly detect newer OS in the generated myhost.json license profile.
- Security fix to prevent Open Redirection.
- Security fix to prevent user enumeration with failed credentials.
- Security fix to update 3rd party libraries to newer versions: RabbitMQ, OpenSSL, Tesseract, LibreOffice, etc.
Deprecated features
- Language Weaver Edge is no longer supporting operating systems that have reached end of life:
- CentOS 7
- CentOS 8
- Ubuntu 14.04
- Ubuntu 16.04
Version 8.6.4
Language Weaver Edge 8.6.4 is a patch release, introducing multiple fixes and the support of Alternative Translations.
What's new
- New Alternative Translations are now available in the Translate tab
- Alternative Translations are displayed as a pop-up on the target segments in the Translate tab
- Administrators can globally enable or disable this option from Manage > Settings
Enhancements
- Add document watermark support for Excel, PowerPoint, HTML, XML, TMX, and XLIFF
- Add character-count indicator on the Translate tab and an Admin option to control the maximum allowed (default = 5,000)
- Prioritize feedback over other dictionary entries
- Add an option to automatically start an engine after its creation
- Add an Edit modal option to edit the running mode of an existing engine
- Set the default translation engine mode to CPU Optimized for Quality mode
- Add Edge-Cloud modal option to allow setting the connection region: Europe or North America
- Add TLS certificate expiration notification option
- Add installer option to skip database encryption (disabled by default)
- Add installer option to disable automatic starting of the Edge service after the installation
- Add a profile icon on the Translate tab to indicate when profiles are in use
- Add new columns to the Users page to show Last Modified On, Last Modified By, Last Login On, and Last Translation On
- Add option to allow PowerPoint layout direction (RTL or LTR) in the global JSON config
- Improve segmentation for English, German, and other languages
- Support Spanish "abreviaturas dobles" for acronyms to improve segmentation
- Language detection accuracy improvements
- Add REST API Adaptation option to choose the data cleaning pipeline
- Add support for the Summarizer engine on Windows (requires the Summarizer Docker container image) — Beta Preview only
- Increase the maximum number of allowed labels to 50,000 (from 5,000)
- Apply Theme primary color to buttons and cogwheel
- Accessibility improvements:
- Improve the element reading order
- Make buttons in all states (enabled/disabled), and tooltips, accessible
- Make the Title, Language Pair, and Status columns tabbable in tables
- Add field-sets to group-labels
- Add more aria labels to UI elements
- Align engines on the Deployments page to prevent shifting when entering Edit mode
Fixes
- Prevent JobEngine crash that may occur when using the Editor to modify a segment
- Fix Excel file corruption caused by translated sheet names that may lead to broken linked references
- Allow use of Adaptive LPs via Edge Cloud
- Properly apply the Select first dictionary option when enabled
- Prevent hiding of the Suggest button for highlighted feedback entries on the Translate tab
- Apply profiles to auto-detected jobs
- Reload SAML metadata when it changes at the IdP
- Allow SAML uppercase usernames
- Prevent random SAML login errors caused by improper JWT encoding
- Fix HTML WCAG-relevant syntax errors
- Add a button to allow navigating back from the Edit Feedback page
- Improve the email transmission error messages to mention the failed sending address
- Use a consistent naming convention for downloadable files throughout the UI
- Prevent displacement of text on PowerPoint slides
- Fix Edge licensing startup failure when host has an existing OpenSSL v3 installation
- Fix blank translation result when using a local LP chained with an Edge Cloud LP
Version 8.6.3
Language Weaver Edge 8.6.3 is a patch release, introducing several new capabilities, enhancements and fixes, including the Language Pair Profiles Configuration UI, Notifications for Admin and Watermark support for PDF and Microsoft Word documents.
What's new
- New LP Profiles configuration UI page
- Allow administrators to define default models, linguistic options, labels and default dictionaries for Language Pairs
- This option was available in Language Weaver Edge 8.6.2 via API only - the new UI allows LP profiles to be configured from the Manage menu in the Language Weaver Edge UI directly.
- New notifications for Admins - banner and email alerts.
- Notification banner and email alerts can now be configured in the following cases:
- Language Weaver Edge license nearing expiration - configurable number of days before expiration
- Low Disk Space on Controller and Worker Hosts - configurable threshold for minimum disk space
- High Memory on Controller and Worker Hosts: Show notifications for high memory usage - configurable threshold for memory limit
- Watermark support for PDF and MS Word documents
- Admins can now define a customized watermark (line of text) that will be automatically added to all MS Word and PDF documents translated with Language Weaver Edge
- The watermark will be added in the footer of the document, centered and formatted as Italic.
- New Language Pair Training Queue page section
- The new page allows viewing of queued and in-progress auto adaptations and manual adaptations.
- The new page is available under the Adaptation section.
- Support for GPUs when using Kubernetes Translation and Training engines
Enhancements
- Language Detection Enhancements
- Improved detection of short segment input
- Improved detection of mixed language input
- Accessibility UI Enhancements
- Improved component placement at 200% zoom level
- Improved color contrast between foreground and background components
- Improved keyboard navigability across components
- Adaptation Enhancements
- Added a button to allow manually triggering an auto adaptation for a desired LP
- Security Enhancements
- Validate Language Weaver Edge License format on upload
- RabbitMQ data that is persisted to disk is now encrypted by default
- All Language Weaver Edge databases are now encrypted by default
- Other Enhancements
- Improved adaptation data cleaning rules
- Updated PDF backends for the Standard and ABBYY converters
- Updated licensing detection to support Windows 11
- Improved segmentation for Chinese and Japanese numbered lists
- Add support for 'esl' language code
- Performance optimization when exporting large Reports
- Allow editing SAML/LDAP users on the Manage->Users page
Fixes
- Prevent Source Area From Locking on Text Drop
- Allow users to create duplicate feedback as long as both are not approved
- Support Fluent Terminology for Edge-Cloud
- Fix Language Weaver Edge startup failure on Amazon Linux 2023 due to missing libcrypt dependency
Version 8.6.2
Language Weaver Edge 8.6.2 is a major update release which introduces the support for the Feedback Editor and automatic data cleaning for adaptations.
The Feedback Editor allows users to easily provide real-time, direct feedback on the machine translation (MT) output by suggesting better translations at document level. The Feedback Editor is natively integrated in the Language Weaver Edge UI, and can be used to view the segments from any translated document, suggest better translations and download the formatted output document. All edits are automatically saved as feedback, and administrators can now define automatic feedback approval rules, based on user roles.
What's new
- New Feedback Editor
- Users can now provide feedback at document level, using the Feedback Editor. When enabled, users have access to a new option to access the Feedback Editor from the Translate Page and from the Translation Queue page.
- Edited segments are automatically added to the Feedback database and will influence future translations and language pair adaptations.
- Support for Role Permissions Customization
- A new management UI page is now available under Manage / Users to customize permissions for each role.
- Access to Feedback Editor and Automatic Approval of Feedback can now be enabled for specific roles using this new page.
- Default permissions can be restored by Super Admin if needed.
- New data cleaning workflow for Adaptive LPs
- Extra cleaning is applied to the supplied training data to remove any Translation Units (TUs) that may contain data that would impede the quality of the adaptation
- A summary of the discarded data is provided in the adaptation summary
- New Document Translation History available on the Translate Page
- Users can now access their latest translated documents from the Translate page.
- When enabled, the latest 3 documents processed will always be displayed on the Translate page and will provide easy access to specific actions such as Download, Delete or Open in the Feedback Editor.
- The option "Display Translation History" will be enabled by default and can be disabled in the Translation Settings.
- A new "View All" option also allows users to quickly access their "Translation Queue".
- New adaptation quality evaluation metrics
- In addition to BLEU scores, Language Weaver Edge now computes and reports the ChrF++ and Edit Distance (Levenshtein) evaluation scores
- Support for Fluent Terminology
- Dictionary terms are now automatically matched to their inflected terms, and their context is taken into account when applying the translation
- Support for Encrypted databases
- Added an installer command line option to turn on database encryption when desired
- Support for Language Pair Profiles
- Allow administrators to define default models, linguistic options, and default dictionaries for Language Pairs
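For readers unfamiliar with the Edit Distance (Levenshtein) metric mentioned under the new adaptation quality evaluation metrics, the following is a minimal sketch of the textbook character-level algorithm. It is given for illustration only; these notes do not describe how Language Weaver Edge tokenizes, normalizes or aggregates the score, so the product's reported values may be computed differently.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

A lower distance between the MT output and the reference translation means fewer edits are needed to reach the reference.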
Enhancements
- Accessibility UI improvements for visually impaired users
- Auto-extract test data from the training set if the manually supplied test does not contain adequate TUs
- The last used dictionaries are now remembered and sorted on the Translate page for each source/target pair
- Improved segmentation of abbreviations to prevent inadvertent sentence termination
- Added word and character count columns to the adaptation UI tables
- Added /api/v2/languages REST API endpoint to allow retrieving a mapping of Language Weaver Edge language pair codes to their standard BCP-47 language tag equivalents
- Windows Language Weaver Edge setup installer is now cryptographically signed
- Support flipping of a PowerPoint layout for RTL languages; enabled via manual configuration only
- HTML image 'alt' attribute is now translated
Fixes
- Manually supplied test data was previously not being used during Adaptation, and an auto-generated test set was used instead; this has been fixed
- Improved language detection for CJK languages
- Improved font handling for MS Office files translated to CJK languages
- Password reset via SMTP for local accounts fixed - regression introduced in 8.6.1
- Obfuscate the value of lwe_user_s cookie and remove any identifiable information
- New lines in XLIFF output are now preserved
- Improve thread distribution on Windows: hyperthreading when using more than 8 logical cores would lead to suboptimal translation performance (WPM)
- Clean up and remove older upgraded Language Weaver Edge versions from the "Apps & features" Windows listing
- Auto Adaptation edit page now shows the test data TU count in the Totals summary
- Temporary data during adaptation is now stored in the Language Weaver Edge installation folder's /data folder for better isolation
- Added a warning when removing a host from an auto adaptive LP deployment list, if the host has the auto adaptive LP deployed to it; and also when deleting an auto adaptive LP if it is actively deployed on a host
- Translation Queue data filtering was not properly respecting the user's time zone and might include results outside the desired range; this has been fixed
- Charts in PowerPoint may sometimes not appear in the translated output; this has been fixed
- When creating a new Adaptation, skip re-analysis of the adaptation data when it is not needed, e.g. when only minor parameters such as the model name are changed in the UI
- Avoid mid-sentence capitalization for long segments
Version 8.6.1
Language Weaver Edge 8.6.1 is a patch release, introducing multiple fixes, including security enhancements.
Enhancements
- Translate UI Enhancements
- Add text labels to the Translate page icons for copy, download, and feedback.
- Allow selection of multiple dictionaries in the Translate page.
- Show the list of supported translation file extensions on the tooltip for the File Upload button, in the Translate page.
- When the autodetected source language matches the target language, return the original untranslated document and do not report an error.
- Auto Adaptive Language Pairs Enhancements
- Add option to specify the Test Data used to compute the quality score for Auto Adaptation trainings.
- Add a warning prompt if a user attempts to navigate away while a file upload is in progress on the Auto Adaptation page.
- Allow selection of multiple Approval States in the Feedback search filter.
- Dictionaries and Brands Enhancements
- Add option to import new terms into an existing dictionary. The new "Import terms" option is available from within a dictionary or when creating a dictionary, and replaces the previous "Import Dictionary" option.
- Add an option to overwrite existing brands in the Import Brands dialog.
- Other Enhancements
- Merge the Progress column on the Translate page into the Status column for a more compact view.
- Update generated Reports to format Excel cells as numbers and dates based on the column value type.
- Improve handling of abbreviations in translation source text.
Fixes
- Fix SAML claim validation vulnerability CVE-2022-41912.
- When using TLS, use a secure cookie to track the username.
- Fix possible failure when downloading very large translated files (>90MB).
- Fix crash when translating image files with the language set to auto-detection.
- Allow non-admin user to see the Public Base URL in their My Account page when available.
- Remove the job's UUID from the extracted files in zip translations to make the filenames more readable.
- Show the autodetected language on the Translation Queue page even when the LP is not available.
- Reduce GPU memory usage during adaptations, which could previously lead to GPU out-of-memory training failures.
- Reduce the number of active threads created by the Job Engine, which could previously lead to thread resource exhaustion and failure when running with many Translation Engines.
Version 8.6
Language Weaver Edge 8.6 is a major update release which introduces support for Auto Adaptive Language Pairs. Auto Adaptive Language Pairs are language pairs that learn continuously, automatically, and transparently from feedback submitted by users and from the available translation memories and dictionaries. Once enabled, the Auto Adaptive LPs will be available in the Portal and will also be available to use from the API.
What's new
- Support for Auto Adaptive LPs
- Auto Adaptive LPs can be managed from the "Adaptation" tab, under "Auto Adaptation". The "Adaptation" tab was previously "Adapted LPs". We still support manual adaptation of Language Pairs where you can define a specific name, provide a training data set and test set, and this option is now available under "Manual Adaptation".
- Support for Ampere GPUs
- We now support the Nvidia Ampere GPUs (A series). They can be used for Translation and model adaptation.
- Support for Language Pair Adaptation on CPU (Windows & Linux)
- Language Pairs model adaptation is now available on CPU only. If you don't have a GPU, you can still adapt the Language Weaver models by enabling a training engine on a worker with CPU only
- An adaptation trained on CPU produces the same output as one trained on GPU, but training is slower. While a training on GPU can take between 1 and 5 hours, a training on CPU will take between 1 and 5 days, depending on the size of the training data available.
- The current hardware requirements are 4 CPU cores and 32 GB of RAM available per training engine. Multiple training engines can run in parallel on the same machine, as long as there is enough capacity available.
- Support for Brands with Regular Expression
- It is now possible to define Brands in Language Weaver Edge.
- Brands are collections of proper nouns, product names, or specific terms which appear in the source content but should remain untouched. The brands are then preserved in a consistent manner throughout the translations that users perform.
- Brands are not language-pair specific, which means that brands are applied across all language pairs in Language Weaver Edge.
- Brands support regular expressions so you can define a search pattern to identify specific patterns in the source and make sure it is not modified during the translation process.
- Brands can be managed by Admins, under "Manage/Settings/Brands".
- Brands can be exported and imported and it is possible to disable or enable specific brands.
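To illustrate the idea of regular-expression Brands described above, the following sketch masks matching text with placeholders before a translation step and restores it afterwards. This is a hypothetical illustration of the general technique, not the mechanism Language Weaver Edge uses internally; the function names, placeholder format and example patterns are all assumptions.

```python
import re


def protect_brands(text, patterns):
    """Replace every match of any pattern with a numbered placeholder.

    Returns the masked text and the list of original matched strings.
    """
    if not patterns:
        return text, []
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"__BRAND_{len(saved) - 1}__"

    # Combine all patterns into one alternation so matches are found
    # in a single left-to-right pass.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return combined.sub(stash, text), saved


def restore_brands(text, saved):
    """Put the original strings back in place of their placeholders."""
    for index, original in enumerate(saved):
        text = text.replace(f"__BRAND_{index}__", original)
    return text


if __name__ == "__main__":
    # Hypothetical brand patterns: a product name and a version string.
    brand_patterns = [r"Language Weaver Edge", r"v\d+(?:\.\d+)+"]
    source = "Install Language Weaver Edge v8.6.5 today"
    masked, saved = protect_brands(source, brand_patterns)
    print(masked)
    print(restore_brands(masked, saved))
```

The example shows why a pattern such as a version-number expression can be more practical than listing every possible value as a separate brand.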
Enhancements
- Support for New Columns in the Translation Queue
- New columns are available to customize the Translation Queue and display the information which is relevant to you.
- New information available includes: Translation ID, Start Date, End Date, Duration (seconds), Input Size, Source Char Count, Source Word Count, Target Char Count, Target Word Count and Words per minute.
- New highlight capabilities in the Translate UI
- It is now possible to highlight Dictionary, Brand and Feedback matches in the translation output. Color-coded highlights make it easy to identify whether part or all of the translation output comes from a specific adaptation capability (dictionary, brand or feedback).
- Reports:
- Add option to export Translation Records.
- Display oldest and latest timestamp of results and filter by language pair type.
- Support for Linguistic options for select LPs: Spelling and Quality Estimation
- Support for MHTML input format translation
- Add Label management options in Manage→Settings→Translation→Labels to allow admins to require translations to be labeled, or to hide the label option in the UI.
Fixes
- Improved space preservation between the source and translated text.
- For Microsoft Office translated documents, override the fonts to better match the target language; e.g. a document translated from English→Hindi will use the Nirmala font instead of Calibri.
- Language detection accuracy improved for short segments.
- Feedback import would fail when adding 1000 or more entries; this restriction has been removed.
- On the Dictionary page, selecting multiple dictionaries no longer causes the download and deletion actions to incorrectly use wildcard matching when selecting the items.
- Chained LPs can now be used for translation as soon as they are enabled, without restarting the Language Weaver Edge service.
- Properly report the Prometheus metric "mte_translation_engine_filled_pus".
- Remove style formatting from text copied from the Target panel of the Translate Page.
- Allow submission of translation jobs when Job or Translation Engines are temporarily offline within a 5-minute window (e.g. when they're restarting).
- Consolidate the Deployed/Deploying/Undeploying adapted LP states with "Ready to Deploy".
- Invalidate the password reset link after it has been used to reset the password once.
- Move Language Weaver Edge version information from page footer to new Manage→Settings→About page for easier access.
- Fix calendar selector widget display for Arabic localization.
- Order the list of returned LPs in the language-pairs API call by priority: Generic, Adapted, Custom, Edge-Cloud, Chained
- Commas in the filename no longer cause the download filename to be truncated.
- Disable coredump generation in the systemd service to avoid potentially filling the disk drive with core files on error.
- Allow Translation Engines to be considered validly running even if they have 0 PUs assigned.
- Enable JavaScript strict-mode for better security.
- Add a privileged command-line parameter, --admin-api-key, to the ets-api-gateway binary to allow resetting user's API keys.
- In the My Accounts page, display the Public URL (if defined in Manage→Settings→General), in the API Base URL.
- Remove REST API restriction preventing adding a dictionary unless an associated Language Pair has been installed.
- When uploading files on the Translate page, properly indicate the maximum file size limit in the error message.