Migrating from one TMS to another seems to scare a lot of us. The most common worry in such migrations is losing leverage. Will I be able to move my translation memories? Will my 100% matches still be 100%? Will it be a lot of work to get it done properly?
We recently helped a client migrate from SDL TMS to GlobalSight in an effort to automate their web translation workflow using CMSwithTMS, and the migration went very smoothly. I will share some data from the leverage tests that we ran after the migration.
Note: This post will be focusing on TMs and leverage during migration. Obviously migrating translation management systems involves a lot more than just TMs. If you are interested in other areas of migration, drop a comment and I will cover those in another post.
Moving translation memories
Exporting translation memories from SDL and importing them into GlobalSight was a mostly hassle-free operation. The translation memory we worked with contained about 80K segments, which would suggest roughly 1-1.3 million source words (assuming about 12 to 16 words per segment). Exporting that much data results in a TMX file of 200MB. If you are working with a large file like this, it is always a good idea to extract a few translation units (~1K) and do an import test first. This will give you a good idea of the import problems you may encounter.
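One way to cut such a test sample is sketched below in Python. It is only an illustration, not the procedure we actually used: the function name `extract_sample` and the file names are made up, and it assumes a standard TMX layout (a `header` followed by a `body` of `tu` elements). It reads the file as a stream, so the full 200MB never has to fit in memory. Note that the output does not carry over a DOCTYPE line if the original has one.

```python
import xml.etree.ElementTree as ET


def extract_sample(src_path, dst_path, limit=1000):
    """Write the TMX header and the first `limit` <tu> elements to dst_path.

    Returns the number of translation units written.
    """
    root_tag, root_attrib, header, units = None, {}, None, []
    with open(src_path, "rb") as src:
        # iterparse streams the file; we stop as soon as we have enough units.
        for event, elem in ET.iterparse(src, events=("start", "end")):
            if event == "start":
                if root_tag is None:
                    # The first start event is the root <tmx> element.
                    root_tag, root_attrib = elem.tag, dict(elem.attrib)
            elif elem.tag == "header":
                header = elem
            elif elem.tag == "tu":
                units.append(elem)
                if len(units) == limit:
                    break

    # Rebuild a small, well-formed document from the collected pieces.
    sample = ET.Element(root_tag, root_attrib)
    if header is not None:
        sample.append(header)
    ET.SubElement(sample, "body").extend(units)
    ET.ElementTree(sample).write(dst_path, encoding="utf-8", xml_declaration=True)
    return len(units)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(extract_sample("export.tmx", "sample.tmx", limit=1000))
```

Importing the resulting small file first should surface locale or encoding problems in seconds rather than after a full upload.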
The only issue we encountered was the different locale codes that SDL TMS and GlobalSight use for Latin American Spanish. We had to replace the SDL version “ES-XL” with GlobalSight’s “ES-LX”. This is a simple search/replace operation, but since the file was too big to process with a visual editor, we ran the search/replace with a Perl script.
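For readers who want to do the same, here is a minimal sketch of such a streaming search/replace. It is written in Python rather than Perl and is not the script we ran. The function name `replace_locale` and the file names are hypothetical, and the `encoding` default is an assumption: check the XML declaration of your own export, since it may be UTF-16. The replacement is limited to `lang="..."` style attributes so that segment text that happens to contain the code is left alone.

```python
import re

# Matches ES-XL only as the value of an attribute ending in "lang"
# (xml:lang, lang, srclang), with either quote style.
LANG_ATTR = re.compile(r"""(lang=["'])ES-XL(["'])""")


def replace_locale(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, rewriting ES-XL to ES-LX.

    Returns the number of attributes that were rewritten.
    """
    replaced = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            line, count = LANG_ATTR.subn(r"\g<1>ES-LX\g<2>", line)
            replaced += count
            dst.write(line)
    return replaced


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(replace_locale("export.tmx", "export_fixed.tmx"))
```

Because the file is processed one line at a time, memory use stays flat regardless of file size, and the returned count gives a quick sanity check against the number of Spanish translation units you expect.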
Uploading the 200MB file through the browser did not cause any problems. The import went well, and the only warning messages we received were about empty translation units, which we simply ignored. Here is the TM figure in GlobalSight after the import:
Configuration and tuning
There are two main items that need to be configured:
Segmentation rules: Matching segmentation rules between the two tools is crucial. Export the SRX segmentation rules from SDL and import them into GlobalSight. SDL exports the SRX rules in the old format (1.0), while GlobalSight only accepts the 2.0 format. Some minor tweaking is necessary to get the file to work, but that is fairly easy. See this page for the details of the SRX specification.
Translation memory profile: This requires some experimenting and adjusting. The type-, case-, whitespace- and code-sensitive leveraging penalties are especially important. We gave a 1% penalty to case and code and left the others without penalties. Selecting “Leverage Default Matches” under the general leverage options also helped improve the numbers.
Testing the leverage
We imported a large file (~35K words) into both tools to test the leverage. The first image is from GlobalSight and the second is from SDL:
Note that SDL’s Fuzzy is broken into different brackets in GlobalSight. The total word count is also different for the same file, but this should not be a surprise to anyone from the localization industry. We all know that no two tools count words the same way.
This looks pretty good. There is not much of a difference in the leverage numbers between the two tools. The minor difference is mostly due to unclean translation memory entries that SDL TMS had created. These could be cleaned up, but we decided to ignore them since the loss is minor.
Conclusion? Migrating from one TMS to another may not be as scary as it sounds!