Converting translation memories into spreadsheets and vice versa

ProZ.com Translation Article Knowledgebase

By FarkasAndras | Published 02/17/2011 | CAT Tools
Quicklink: http://som.proz.com/doc/3194
Converting a TM into a spreadsheet or a spreadsheet into a TM is a common need; this little guide aims to offer some easy solutions using free software (some of it open source, some of it closed source, all free of charge). The guide is by no means complete, and it's not the ultimate word on the subject; if you have comments or corrections to make, send them to the email address indicated at the end.

TMs come in many shapes and sizes, but the most widespread exchange format is TMX, so I will focus on TMX, which any CAT tool should be able to import and export.
I won't discuss reading or writing xls files, for the simple reason that they are interchangeable with tab delimited txt files for our purposes. If you have an xls, you can open it, copy its contents and paste them into a text editor (such as Notepad) to get a tab delimited file, and if you have a tab delimited file, you can copy-paste it into Excel or OpenOffice Calc to get an xls. Always save your txt files in UTF-8 encoding with File/Save as, not in whatever encoding your text editor happens to use by default (which is usually not UTF-8).
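
If you script any of these conversions instead of copy-pasting, the same rule applies: read and write the tab delimited file explicitly as UTF-8. Below is a minimal Python sketch; the function names read_tabbed and write_tabbed are hypothetical, and it assumes one segment pair per line with no tabs or line breaks inside cells. Reading with the "utf-8-sig" codec also accepts the byte order mark that some editors (older versions of Windows Notepad, for example) put at the start of UTF-8 files.

```python
def read_tabbed(path):
    """Read a UTF-8 tab delimited file into a list of rows (lists of cells).

    'utf-8-sig' strips a leading UTF-8 byte order mark if there is one.
    Empty lines are skipped.
    """
    rows = []
    with open(path, encoding="utf-8-sig", newline="") as f:
        for line in f:
            line = line.rstrip("\r\n")  # handle both Windows and Unix line ends
            if line:
                rows.append(line.split("\t"))
    return rows


def write_tabbed(path, rows):
    """Write rows as UTF-8 tab delimited text, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")
```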


1. Converting a TMX file to a tab delimited txt:

1 a) With ApSIC Xbench
Xbench is a powerful and versatile Windows-only tool that can handle many TM and termbase file formats, including TMX, Trados txt memories, TBX glossaries, MultiTerm XML files, TagEditor and SDLX files, Wordfast memories and glossaries etc. It can export all of them into tab delimited txt.
To open a file in Xbench, install and launch Xbench, available from http://www.apsic.com/en/downloads.aspx
Then click Project/Properties/Add.../pick your file type/Next/pick your file/Open/Next/OK/OK.
To export, click Tools/Export items/All items in glossary, choose Format: Tabbed text file and pick a file name.
In my experience, very large files leave Xbench stumped; it will work with files containing 300,000 segments, but not with files containing 1,000,000 segments.
It also tends to ignore everything apart from the text in two languages; if you need metadata (creation date, creation ID, notes) or data in more than two languages to be conserved, Xbench won't be much help. Of course, when it comes to TMX->tabbed txt, I can't think of any other tool that does much more than Xbench, either.
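
Since Xbench tends to keep only the text in two languages, a short script is one way to hold on to the rest. The following Python sketch (standard library only, function names hypothetical) writes every requested language plus the creationdate, creationid and first note of each <tu> to a tab delimited file. It assumes a TMX without an XML namespace that is small enough to load into memory, and it deliberately drops inline formatting codes. It is an illustration, not a replacement for a tested tool.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def flat(text):
    # tabs and line breaks inside a cell would break the tab delimited layout
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def seg_text(seg):
    """Text of a <seg> plus the text following each inline element.
    Content inside inline elements (<bpt>, <ept>, <ph>, <it>, and also <hi>
    in this simplified version) is dropped."""
    parts = [seg.text or ""] + [child.tail or "" for child in seg]
    return flat("".join(parts))


def tu_rows(tmx_path, langs):
    """Yield one row per <tu>: the text in each language of `langs` (in that
    order, matched case-insensitively), then creationdate, creationid and
    the first <note>. Missing values become empty strings."""
    root = ET.parse(tmx_path).getroot()
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions use a plain "lang"
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang] = seg_text(seg)
        note = tu.find("note")
        note_text = flat(note.text or "") if note is not None else ""
        yield ([texts.get(code.lower(), "") for code in langs]
               + [tu.get("creationdate", ""), tu.get("creationid", ""), note_text])


def tmx_to_tabbed(tmx_path, txt_path, langs):
    """Write the rows from tu_rows() to a UTF-8 tab delimited file."""
    with open(txt_path, "w", encoding="utf-8", newline="\n") as out:
        for row in tu_rows(tmx_path, langs):
            out.write("\t".join(row) + "\n")
```

For example, tmx_to_tabbed("memory.tmx", "memory.txt", ["EN-GB", "HU"]) (hypothetical file names) would write five columns: English, Hungarian, creation date, creator ID and note.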


1 b) With a command line tool ("TMX to tabbed")
I wrote a tool that does TMX to tabbed txt conversions on TMX files of any size in any OS. I'd recommend it if you don't have convenient access to a Windows box, or for files that are too large for tools like Xbench. (I primarily recommend Xbench because this tool hasn't been tested very thoroughly, so Xbench is probably the somewhat safer option.) The converter was updated to version 1.3 on 16/02/2011 with many improvements, and hopefully all bugs are fixed now. Still, no guarantees.
To launch the tool in Windows, just double-click the .exe. In other OSes, if double-clicking the .pl file doesn't work, open a console window, type the word perl followed by a space, and drag and drop the .pl file into the console window. You should get something like this:
perl "/your/folder/aligner/TMX_to_tabbed_1.0.pl"
Then click in the console window and press enter. From there on, just use the instructions in the console window.
The converter is available at http://sourceforge.net/projects/aligner/files/ in the "grab bag".
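
The converter above is a Perl script and its internals aren't described here. For illustration only, one common way to handle TMX files of any size in Python is ElementTree's iterparse, which lets you process one <tu> at a time and then discard it, so memory use stays roughly flat however large the file is. A minimal sketch, assuming a TMX without an XML namespace (the function name is hypothetical; src and tgt are the language codes to extract, matched case-insensitively):

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def stream_tmx_to_tabbed(tmx_path, txt_path, src, tgt):
    """Write source<TAB>target lines for every <tu> that has both languages
    and return the number of lines written. Finished <tu> elements are
    discarded as soon as they have been processed."""
    src, tgt = src.lower(), tgt.lower()
    written = 0
    body = None
    with open(txt_path, "w", encoding="utf-8", newline="\n") as out:
        for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
            if event == "start":
                if elem.tag == "body":
                    body = elem  # parent of all <tu> elements
                continue
            if elem.tag != "tu":
                continue
            texts = {}
            for tuv in elem.findall("tuv"):
                lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
                seg = tuv.find("seg")
                if seg is not None:
                    # itertext() keeps native codes stored in <bpt>/<ept> as text;
                    # split()/join() collapses tabs, line breaks and repeated spaces
                    texts[lang] = " ".join("".join(seg.itertext()).split())
            if src in texts and tgt in texts:
                out.write(texts[src] + "\t" + texts[tgt] + "\n")
                written += 1
            if body is not None:
                body.clear()  # drop finished <tu> elements from the tree
    return written
```

Calling stream_tmx_to_tabbed("huge.tmx", "huge.txt", "EN-GB", "HU") (hypothetical names) skips translation units that lack either language.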

1 c) With Olifant
I'm not sure if Olifant can export to tabbed txt, but you can certainly copy-paste the contents of your TMX from it.
Download: http://okapi.sourceforge.net/downloads.html



2. Converting a tab delimited txt file to TMX:

2 a) With TMX Maker
This tool of mine runs on all major operating systems and can create TMX files from tab separated txt files of any size. The input file needs to be in UTF-8 encoding, so save it in UTF-8 with File/Save as...
You have quite a bit of control: you can specify a creator ID, the creation time, the language codes and a note that gets added to each segment in the TMX (such as the name of the source document). You can also add the content of the third column of the txt as a note (allowing you to assign different notes to different parts of your file), and you can even generate multilingual TMXes with as many languages in them as you want. This tool has been tested pretty thoroughly. I can recommend it without reservation.
The TMX maker is available as part of the LF Aligner package at https://sourceforge.net/projects/aligner
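
TMX Maker is part of LF Aligner, and what follows is not its code. It is a hypothetical Python sketch of the same idea, showing where the options mentioned above (creator ID, creation time, language codes, a note on every segment, an extra note column, more than two languages) end up in a TMX 1.4 file. The header values, such as creationtool="tabbed_to_tmx_sketch", are placeholders; check what your CAT tool expects before relying on the output.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tabbed_to_tmx(txt_path, tmx_path, langs, creationid="", note="",
                  note_column=False, creationdate=None):
    """Build a TMX 1.4 file from a UTF-8 tab delimited file.

    langs        one language code per text column, e.g. ["EN-GB", "HU"];
                 more codes give a multilingual TMX
    creationid   written on every <tu> if given
    note         a note added to every <tu> (e.g. the source document name)
    note_column  if True, the column after the last language column becomes
                 an extra, per-segment note
    """
    if creationdate is None:
        # TMX dates use the form YYYYMMDDThhmmssZ (UTC)
        creationdate = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tabbed_to_tmx_sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "tabbed", "adminlang": "EN-US",
        "srclang": langs[0], "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    n = len(langs)
    with open(txt_path, encoding="utf-8-sig") as f:
        for line in f:
            cells = line.rstrip("\r\n").split("\t")
            if len(cells) < n or not any(cells[:n]):
                continue  # skip blank or incomplete lines
            tu = ET.SubElement(body, "tu", creationdate=creationdate)
            if creationid:
                tu.set("creationid", creationid)
            # in TMX 1.4, <note> elements must come before the <tuv> elements
            notes = [note] if note else []
            if note_column and len(cells) > n and cells[n]:
                notes.append(cells[n])
            for text in notes:
                ET.SubElement(tu, "note").text = text
            for lang, text in zip(langs, cells):
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = text
    # ElementTree escapes &, < and > and serializes XML_LANG as xml:lang
    ET.ElementTree(tmx).write(tmx_path, encoding="UTF-8", xml_declaration=True)
```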

2 b) With ApSIC Xbench
Xbench can export any file it can read to TMX. The drawback is that it's a bit dumb: you don't have much control over the creation time, creator ID and other fields written to the TMX. You do have full control over the language codes, though, which is just as well: many CAT tools won't accept a TMX if the language codes are not "right", right being whatever convention that particular tool happens to use. So export a TMX from your CAT tool of choice in the language pair in question, check the language codes (such as <tuv xml:lang="EN-GB">), and make sure you use the same ones in the TMXes you generate.
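
Checking the language codes can be scripted too. The Python sketch below (function names hypothetical, TMX 1.4 xml:lang attributes assumed) counts the codes used in a TMX, for example one exported from your CAT tool, and rewrites the codes in a generated TMX to match. Which codes a particular tool expects is still something you have to read off its own export.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_codes(tmx_path):
    """Count the xml:lang codes used on the <tuv> elements of a TMX file."""
    root = ET.parse(tmx_path).getroot()
    return Counter(tuv.get(XML_LANG) for tuv in root.iter("tuv"))


def remap_language_codes(tmx_path, out_path, mapping):
    """Rewrite xml:lang codes and the header's srclang using `mapping`,
    e.g. {"en-gb": "EN-GB"}. Keys are matched case-insensitively;
    codes not in the mapping are left alone."""
    lookup = {k.lower(): v for k, v in mapping.items()}
    tree = ET.parse(tmx_path)
    root = tree.getroot()
    header = root.find("header")
    if header is not None:
        srclang = header.get("srclang", "")
        if srclang.lower() in lookup:
            header.set("srclang", lookup[srclang.lower()])
    for tuv in root.iter("tuv"):
        code = tuv.get(XML_LANG)
        if code and code.lower() in lookup:
            tuv.set(XML_LANG, lookup[code.lower()])
    # note: ElementTree drops XML comments and any DOCTYPE line on rewrite
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)
```

If, say, your CAT tool's export uses EN-GB and HU-HU while your generated file uses en-gb and hu, remap_language_codes("generated.tmx", "fixed.tmx", {"en-gb": "EN-GB", "hu": "HU-HU"}) writes a corrected copy (file names and codes are only examples).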

2 c) With SDLX
SDLX (the full, paid version) can also import tab delimited files (and a few other formats) and export them to TMX. Save your TM as a tab delimited file as usual (copy-paste it from Excel into a txt document and save it in UTF-8 encoding). As with TMX Maker, the input file can contain additional columns (third, fourth, fifth) with metadata.
Start SDLX and choose TM/Import a translation memory in the menu on the opening screen. Click Create New Translation Memory, pick a location and name, then click Next/Delimited format files/Add selection, pick your file and click Next. Choose the languages and encodings (don't forget the encoding; UTF-8 is listed as Unicode (UTF-8)), set the delimiter to tabs, add Source, Target and any metadata fields you may have to the "Selected" area, then click Next and you're done. If your TM content looks OK, export it to TMX and delete the SDLX TM.
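
Whichever tool does the import (SDLX here, or TMX Maker), a stray tab or a line break inside a cell will shift columns and misalign segments, and a file that is not really UTF-8 will come out garbled. A quick check before importing can catch this; the following is a minimal Python sketch with a hypothetical function name that reports problem lines rather than fixing them. It assumes source and target are the first two columns.

```python
def check_tabbed(path, expected_columns):
    """Return a list of (line_number, problem) for a tab delimited file.

    Flags lines that are not valid UTF-8, have the wrong number of columns,
    or have an empty source or target (first two columns). Empty lines
    are ignored.
    """
    problems = []
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                problems.append((number, "not valid UTF-8"))
                continue
            if number == 1:
                line = line.lstrip("\ufeff")  # ignore a UTF-8 byte order mark
            line = line.rstrip("\r\n")
            if not line:
                continue
            cells = line.split("\t")
            if len(cells) != expected_columns:
                problems.append(
                    (number, f"expected {expected_columns} columns, found {len(cells)}"))
            elif not all(cells[:2]):
                problems.append((number, "empty source or target"))
    return problems
```

For a file with source, target and one metadata column, check_tabbed("memory.txt", 3) (hypothetical file name) would list every line that does not have exactly three columns.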

2 d) Creating a bilingual Word file with Excel
Not a TMX, but it's a bilingual format of sorts. Place your two texts in columns A and B of an Excel spreadsheet. Write this in C1: ="{0>"&a1&"<}100{>"&b1&"<0}"
...and copy it down as far as you have data in A and B. Just grab the bottom right corner of the cell and pull it down. Then copy all of column C to the clipboard and paste it into a Notepad window. Then select all (Ctrl-A), copy to clipboard again, paste into an empty Word document and save.
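
If you'd rather skip Excel, the same text can be produced by a script. This minimal Python sketch (function names hypothetical) wraps each source/target pair in exactly the markers used by the formula above and writes a UTF-8 text file whose contents you can then paste into Word as described. Like the Excel method, it doesn't hide the source text or style the markers.

```python
def bilingual_word_lines(pairs):
    """Wrap (source, target) pairs in the same {0>...<}100{>...<0} markers
    that the Excel formula ="{0>"&A1&"<}100{>"&B1&"<0}" produces."""
    return ["{0>" + source + "<}100{>" + target + "<0}" for source, target in pairs]


def tabbed_to_bilingual_txt(txt_path, out_path):
    """Read a UTF-8 tab delimited file and write one marked-up line per
    segment pair. Lines with fewer than two columns are skipped."""
    pairs = []
    with open(txt_path, encoding="utf-8-sig") as f:
        for line in f:
            cells = line.rstrip("\r\n").split("\t")
            if len(cells) >= 2:
                pairs.append((cells[0], cells[1]))
    with open(out_path, "w", encoding="utf-8") as out:
        for marked in bilingual_word_lines(pairs):
            out.write(marked + "\n")
```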
This solution works if your client insists on a bilingual Word file and you only have a tab delimited file, but generally, TMX is a much better solution. Trados Studio can't generate bilingual Word files, only TMX files. So, in principle, if your client wants a bilingual doc and you work in Studio, you could export your TM to TMX, convert it to tab delimited and then use this method to get a bilingual Word document, but I'd recommend educating your client about the wonders of TMX files instead.
Of course, this won't hide the source text and it won't give the tags the TW4Win style. Both can be remedied with some smart find and replace in Word, but the document still won't have the original's formatting. Doc files like this should work if you or your client only want to clean up the Word doc to get the segments into a Trados TM, but not for much else. Again, just use TMX instead.


***************************

Well, that's it for this article. Hope it helps. Send comments, corrections, better solutions and bug reports to quca at freemail.hu.


Copyright © ProZ.com and the author, 1999-2024. All rights reserved.