Translation memory cleaning and maintenance

Translation memories (TMs) reduce translation costs and turnaround times for frequently updated content. TMs streamline the process by helping identify all new content and auto-filling unchanged translations from prior versions.

Who benefits most from translation memories?

  • IT and manufacturing clients who need periodic updates of technical sales information, manuals, and software.
  • Insurance clients generating and translating plan documents in order to meet regulatory requirements.
  • Any global business translating marketing and sales content across print and digital channels (print materials, websites, mobile apps, video).

Our service to ongoing clients includes translation memory maintenance.

However, onboarding a new client may include a focused translation memory cleaning to ensure cost-efficient services. Keep reading to learn how TMs are created, cleaned, and maintained.

How is a TM created?

At the start of a translation project, project managers use a computer-assisted translation (CAT) tool to break the source text down into segments. A segment is a discrete chunk of text. It might be a short sentence, a clause, or a page heading.
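To make the idea of a segment concrete, here is a minimal sketch in Python. It assumes a naive rule (split at line breaks and after sentence-final punctuation); the function name `split_into_segments` is hypothetical, and real CAT tools apply far more careful, language-specific segmentation rules.

```python
import re

def split_into_segments(text: str) -> list[str]:
    """Naively split text into segments at line breaks and sentence ends."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split after '.', '!' or '?' when followed by whitespace.
        # Assumption: no abbreviation handling ("e.g." would be split).
        for part in re.split(r"(?<=[.!?])\s+", line):
            if part:
                segments.append(part)
    return segments

sample = "Installation guide\nUnplug the device. Wait ten seconds, then restart."
print(split_into_segments(sample))
# prints ['Installation guide', 'Unplug the device.', 'Wait ten seconds, then restart.']
```

The heading and the two sentences each become their own segment, which is roughly what a linguist would see as three rows in the two-column interface.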

The CAT tool displays the segments in a two-column interface so that a linguist can view the text in the source column and enter the translation in the target column. When the job is completed, the paired source-target segments are saved in the TM.

A TM can also be created after the fact, by uploading source and target documents and running an “alignment” to create source/target pairs. This works best with structured documents like contracts, policies, manuals, and the like. If a client comes to us for updates to previously translated content, we can leverage prior translations by running an alignment to create a TM and identify new content for translation.
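The simplest possible form of alignment can be sketched as follows. This is an illustration under a strong assumption: the source and target documents contain the same number of segments in the same order. The function name `align_by_position` is hypothetical, and real alignment tools also handle sentences that were merged, split, or reordered in translation.

```python
def align_by_position(source_segments, target_segments):
    """Pair segments one-to-one; refuse when the counts differ."""
    if len(source_segments) != len(target_segments):
        raise ValueError(
            f"{len(source_segments)} source vs {len(target_segments)} "
            "target segments: needs manual alignment"
        )
    return list(zip(source_segments, target_segments))

source = ["Terms of service", "The policy renews annually."]
target = ["Condiciones del servicio", "La póliza se renueva anualmente."]
tm_pairs = align_by_position(source, target)
print(tm_pairs)
```

Structured documents tend to satisfy the one-to-one assumption more often, which is one reason alignment works best on contracts, policies, and manuals.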

How does a translator use a TM?

When a previously translated document needs an update, a translator uploads the new source document into the CAT tool along with the appropriate TM. The CAT tool scans the new document for matches and auto-completes whatever hasn’t been changed. Although every match still needs to be checked by the translator, the TM speeds the process and reduces the cost.
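In simplified form, the auto-complete step is a lookup. The sketch below assumes the TM is a plain Python dictionary mapping source segments to translations and that only identical segments count as matches; `pretranslate` is a hypothetical name, not a function of any real CAT tool.

```python
def pretranslate(segments, tm):
    """Pair each segment with its stored translation, or None when it is new."""
    return [(segment, tm.get(segment)) for segment in segments]

tm = {"Unplug the device.": "Desenchufe el dispositivo."}
new_document = ["Unplug the device.", "Wait ten seconds."]

for source, target in pretranslate(new_document, tm):
    # Auto-filled segments still go to the translator for a check.
    status = "auto-filled, needs check" if target else "new, needs translation"
    print(f"{source!r}: {status}")
```

Only the second segment would be sent for new translation, which is where the time and cost savings come from.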

Problem: inconsistencies in the source texts

In order for a TM to be useful, a client’s source text needs to be consistent. In the best-case scenario, technical writers created the source using controlled language and technical writing tools. They will have followed the company style guide and used consistent technical and brand terminology.

In the real world, different departments within a global business don’t always coordinate with each other. For example, if one department uses the term “employee” and another uses “team member,” the translator’s tool won’t recognize a full match. Instead, it will serve up a partial match (if the rest of the sentence is the same) and save more than one version of the same segment after the translation is complete. Over time, the TM will end up saving multiple translations of what is essentially the same phrase.
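The “employee” versus “team member” case can be sketched with Python’s standard `difflib` module. This is an approximation only: commercial CAT tools use their own scoring methods, and both the 0.7 threshold and the `classify_match` helper are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def classify_match(segment, tm_source, fuzzy_threshold=0.7):
    """Label a new segment as a full, partial, or non-match against a TM entry."""
    if segment == tm_source:
        return "full", 1.0
    # ratio() returns a similarity score between 0.0 and 1.0.
    score = SequenceMatcher(None, segment, tm_source).ratio()
    if score >= fuzzy_threshold:
        return "partial", score
    return "none", score

stored = "Every employee must sign the form."
incoming = "Every team member must sign the form."

kind, score = classify_match(incoming, stored)
print(kind, round(score, 2))
```

Because the incoming sentence is only a partial match, the translator has to review and edit it, and the TM ends up holding both versions.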

Problem: inconsistent target texts

It’s not uncommon for businesses to localize in a piecemeal fashion, with each department working with a different language partner. As a result, structure and vocabulary in translated target texts will vary. A simple piece of source content like a company history might have been translated differently with each project. When the business decides to centralize and streamline the localization function, there will be multiple TMs (or multiple versions of the same segment in one TM), and it will be unclear which version is the “right” one.

In these situations, a TM needs to be cleaned up in order to maximize its value.

Cleaning a TM

Efficient TM maintenance automates some initial steps, then engages human linguists as editors.

Step one: reduce volume

TMs store more than simple source/target segment pairs. They also store metadata about each segment: when it was added to the TM, who has edited it, when it was last used, and for which jobs. Before the substantive part of TM maintenance begins, the TM needs to be pruned. Setting aside the segments that haven’t been used in years is a logical first step, and we usually automate it.
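A pruning pass of this kind might look like the sketch below. The `TMEntry` structure, the `prune_stale` function, and the three-year cutoff are all assumptions for illustration; the metadata actually available depends on the CAT tool and its export format.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class TMEntry:
    source: str
    target: str
    last_used: date  # hypothetical metadata field

def prune_stale(entries, today, max_age_days=3 * 365):
    """Split entries into those used recently and those past the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    kept = [e for e in entries if e.last_used >= cutoff]
    stale = [e for e in entries if e.last_used < cutoff]
    return kept, stale

entries = [
    TMEntry("Press the red button.", "Pulse el botón rojo.", date(2015, 3, 1)),
    TMEntry("Unplug the device.", "Desenchufe el dispositivo.", date(2024, 1, 15)),
]
kept, stale = prune_stale(entries, today=date(2024, 6, 1))
print(len(kept), "kept;", len(stale), "set aside")
```

Returning the stale entries instead of deleting them is a deliberate choice in this sketch, so that they can be archived and restored if an old product line comes back.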

Step two: remove inconsistencies in source segments

After stripping out segments that haven’t been used in a while, the next step is to normalize the segments that remain. Automated processes can identify and consolidate similar segments with slight differences (“did not” vs. “didn’t,” for example). A style guide will help establish standard usage. Metadata can also help determine which version of a segment to keep. For example, you can prioritize by recency or by translator.
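As a rough sketch of consolidation, the example below normalizes each source segment and keeps only the most recently used version. The small contraction table, the tuple layout `(source, target, last_used)`, and the function names are assumptions; in practice a style guide would drive the normalization rules.

```python
import re
from datetime import date

# Illustrative only; a real list would come from the style guide.
CONTRACTIONS = {"didn't": "did not", "don't": "do not", "can't": "cannot"}

def normalize(source: str) -> str:
    """Build a comparison key: straight quotes, lowercase, expanded contractions."""
    text = source.replace("’", "'").lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()

def consolidate(entries):
    """Keep the most recently used entry for each normalized source segment."""
    newest = {}
    for source, target, last_used in entries:
        key = normalize(source)
        if key not in newest or last_used > newest[key][2]:
            newest[key] = (source, target, last_used)
    return list(newest.values())

entries = [
    ("The file did not load.", "El archivo no se cargó.", date(2021, 5, 1)),
    ("The file didn’t load.", "No se cargó el archivo.", date(2023, 2, 1)),
]
print(consolidate(entries))
```

Here recency decides which version stays. Prioritizing by translator instead would only change the comparison inside `consolidate`.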

When we create a new TM by aligning source and target texts for a document the client had already translated, we also search for and resolve internal inconsistencies. Even if the source text used the same phrases consistently, they may have been translated differently throughout the target.
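Finding internal inconsistencies of this kind can be sketched as grouping by source segment. The function name `find_inconsistent_sources` and the list-of-pairs input are assumptions for illustration; deciding which translation is correct remains a linguist’s job.

```python
from collections import defaultdict

def find_inconsistent_sources(pairs):
    """Return source segments that have more than one distinct translation."""
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        targets_by_source[source].add(target)
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }

aligned_pairs = [
    ("Contact your administrator.", "Póngase en contacto con su administrador."),
    ("Contact your administrator.", "Contacte a su administrador."),
    ("Save your changes.", "Guarde los cambios."),
]
for source, targets in find_inconsistent_sources(aligned_pairs).items():
    print(source, "->", targets)
```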

Step three: correct inconsistent terminology

After cleaning the translation memory using automated processes, we review the terminology. Any number of different QA tools (we use Xbench) can generate a terminology consistency analysis. Using an approved term base or bilingual glossary as a reference, translators edit target texts. With particularly large volumes, a team of translators can work together in a cloud-based tool to ensure consistency.
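The idea behind a terminology consistency check can be sketched in a few lines. This is not how Xbench or any other QA tool works internally. It is a simplified illustration that assumes a glossary stored as a dictionary and plain substring matching, which ignores inflected forms and plurals. The name `check_terminology` is hypothetical.

```python
def check_terminology(pairs, glossary):
    """Flag pairs whose source uses a glossary term but whose target
    does not contain the approved translation."""
    issues = []
    for source, target in pairs:
        for term, approved in glossary.items():
            if term.lower() in source.lower() and approved.lower() not in target.lower():
                issues.append((source, target, term, approved))
    return issues

glossary = {"employee": "empleado"}
pairs = [
    ("Every employee must sign the form.", "Cada empleado debe firmar el formulario."),
    ("Ask an employee for help.", "Pida ayuda a un trabajador."),
]
for source, target, term, approved in check_terminology(pairs, glossary):
    print(f"{source!r}: expected {approved!r} for {term!r}")
```

The output is a review list for translators, not an automatic correction, since a flagged segment may still be a legitimate contextual choice.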

Step four: keep it clean!

A clean TM means better rates and faster turnaround times. Your language services partner should be following best practices for TM maintenance, but they’ll need your help:

  1. Maintain consistency in source documents by using controlled language and glossaries and following style guides.
  2. Don’t make preferential changes to approved technical texts. Preferential changes are edits that reflect a reviewer’s subjective preferences but do not change the meaning of the text. These can cause more harm than good because they create inconsistencies which will require additional review.
  3. If you do make changes to published target content, share them with the LSP. Always submit your reviewers’ changes for reconciliation. Also, the TM should include metadata to identify certain segments as client preferences, especially if linguists don’t agree with the changes.

Investing in a TM cleanup

Even with inconsistencies floating around in your library of localized content, you can’t always make the case for investing in a TM cleanup project. Sometimes this is because management sees the localization function as a cost center rather than a revenue generator.

However, if your business translates high volumes with some regularity, and localization is not centralized, translation costs become harder to estimate and contain. TM cleaning and consolidation allow you to make the best use of your existing multilingual content and help streamline future projects.

One way to proceed is to start with one language, then assess the ROI of the effort in terms of higher discounts, faster turnaround times, and overall brand consistency. You can then decide whether to move forward with TM cleanup across all the languages you use in your global business.