How to translate tricky formats
Over the past few years, the range of text types and file formats that require translation has grown immensely: websites built on all sorts of content management systems (CMS), brochures, apps and video subtitles. The list of formats and their uses is constantly evolving.
The good news is, where there is language, there is a solution for its extraction.
Nowadays, the computer-assisted translation (CAT) tools that most translation companies and freelance translators use can handle a wide variety of formats. This means you are not limited to Microsoft Word, Excel, and PowerPoint but can instead send files in other formats to your translation partner.
There are, however, a few handy tips and tricks you might be keen to know when it comes to sending different formats for translation. Some formats are a little more challenging to work with than others and require more pre-processing to make them suitable for translation. For example, did you know that the first step of video translation is always transcription? This is because every translation first needs a written script in a particular source language to then proceed with translation into other languages.
In this post, we will talk about file formats beyond the most popular ones for:
- Desktop Publishing (DTP)
- Metadata
- Subtitles
Most of these formats do not require any additional pre-processing to make them suitable for translation.
Desktop Publishing (DTP)
- .vsdx
- .indd, .idml
- .ai
Desktop Publishing (DTP) software is used for creating the design layout of a publication, the typesetting and layout of text, image editing, and the pre-press preparation of the original layout. DTP is used to create the layout for a wide range of different publications, including magazines, brochures, flyers, etc. The market-leading DTP programs include Adobe Illustrator, Adobe InDesign and Microsoft Publisher.
Modern CAT tools accept various DTP export formats, which eliminates the additional work of extracting text from DTP documents into an editable text format like MS Word. Working directly with an export from DTP software allows users to keep the original formatting of the file.
BONUS: This also helps avoid the extra work of copy-pasting the translated text back into the original file.
Metadata and Markup Languages
- .xml
- .html
- .json
- LaTeX
- .xliff
A markup language is a text-encoding system consisting of a set of symbols inserted in a text document. It is used to control the document's structure, formatting, or the relationship between its parts, and to convey information about its display.
Put simply, markup languages are used to encode “data about data”. This makes them suitable for storing certain types of data, for example databases and ontologies. The most popular formats here are xml, html and json (strictly speaking, json is a data-interchange format rather than a markup language, but it is handled in much the same way for translation).
Since markup languages govern what information may be included in a document and how it is combined with the content of the document, they allow easy filtering of the data. This may be useful in cases where not all the data contained in an xml or html file needs to be translated. Instead of manually excluding such text, a simple list of rules can be used to filter out the unnecessary information.
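To illustrate what such a list of rules might look like, here is a minimal Python sketch. The sample xml, its element names and the rule set are all made up for illustration; real CAT tools use their own filter configuration, so treat this as a demonstration of the idea rather than any tool's actual method.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: only <title> and <description> need translating.
SAMPLE_XML = """<product id="42">
  <sku>AB-1001</sku>
  <title>Garden chair</title>
  <description>Foldable chair for outdoor use.</description>
  <price currency="EUR">49.90</price>
</product>"""

# The "simple list of rules": element names whose text is translatable.
# These names are assumptions that match the sample above.
TRANSLATABLE_TAGS = {"title", "description"}


def extract_translatable(xml_text, translatable_tags):
    """Return (tag, text) pairs for elements that match the rules."""
    root = ET.fromstring(xml_text)
    segments = []
    for element in root.iter():
        text = (element.text or "").strip()
        # Keep only non-empty text from elements named in the rule list.
        if element.tag in translatable_tags and text:
            segments.append((element.tag, text))
    return segments


segments = extract_translatable(SAMPLE_XML, TRANSLATABLE_TAGS)
for tag, text in segments:
    print(f"{tag}: {text}")
```

The product number and the price never reach the translator, so they cannot be changed by accident and are not counted as translatable words.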
Another example of a markup language is LaTeX. Unlike xml, html and json, LaTeX is used primarily to produce technical and scientific documentation. The writer uses markup tagging conventions to define the general structure of a document, to stylise text throughout a document (such as bold and italics), and to add citations and cross-references. Not only does LaTeX make it possible to create well-designed documents, but it also lets users typeset complex elements such as mathematical expressions and formulas, tables, graphs, and bibliographies very quickly, with consistent markup across all sections. Strictly speaking, LaTeX is a typesetting system rather than a file format, but because its source is plain text with clearly marked commands, the translatable text can be extracted without breaking the other elements.
Finally, xlf and xliff (XML Localization Interchange File Format) are formats that the CAT tools used by translators work with natively. They are based on xml and are used to store data for localization and translation.
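To give a rough idea of how an xliff file stores this data, the Python sketch below reads a small hand-written sample in the XLIFF 1.2 layout, where each trans-unit holds a source segment and, once translated, a target segment. The file content and the helper name read_units are made up for illustration; exports from real systems usually carry more attributes and inline tags.

```python
import xml.etree.ElementTree as ET

# Namespace used by XLIFF 1.2 documents.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample: one translated segment and one still untranslated.
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="home.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Welcome</source>
        <target>Willkommen</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Contact us</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return one dict per trans-unit with its id, source and target text."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.findtext("x:source", default="", namespaces=XLIFF_NS)
        # target is None when the segment has not been translated yet.
        target = unit.findtext("x:target", default=None, namespaces=XLIFF_NS)
        units.append({"id": unit.get("id"), "source": source, "target": target})
    return units


units = read_units(SAMPLE_XLIFF)
untranslated = [u["id"] for u in units if u["target"] is None]
print("Segments:", len(units))
print("Still to translate:", untranslated)
```

Because source and target sit side by side in one file, the same document can travel from client to translator and back without any copy-pasting.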
BONUS: Some website hosts provide the option of exporting website content directly in the xliff format. This speeds up the exchange between the client and the translator and eliminates extra steps in the translation process.
Subtitles
- .srt
- .ttml
A lot of material in the corporate world and beyond is now presented in video format: internal training videos, marketing videos or even a personal YouTube channel. Video makes it possible to reach a large audience, and subtitles can expand that audience further, no matter what the source language of the video is. Subtitles also help in situations where users may be watching videos without sound, such as when they’re scrolling on LinkedIn or other platforms. Of course, they also make audiovisual material more inclusive for those who are hearing impaired.
With the help of cutting-edge technologies, one can transcribe hours of audio and video within minutes. This significantly speeds up audio/video processing and allows us to deliver the finished translation to you in the shortest possible time. For videos that are not yet subtitled, there are both human and automatic machine-assisted transcription solutions, and existing transcripts and subtitles can be translated directly.
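For readers curious about what a subtitle file looks like on the inside, the Python sketch below parses a tiny made-up .srt sample into cues (a number, a timing line and the text), replaces only the text, and writes the file back out. The function names are hypothetical, and str.upper merely stands in for the actual translation step; the point is that numbering and timings stay untouched.

```python
import re

# Hypothetical two-cue sample in the usual .srt layout.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Welcome to the training.

2
00:00:04,000 --> 00:00:06,000
Let's get started.
"""

# A timing line looks like "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(srt_text):
    """Split an .srt document into cues: index, timing line and text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or not TIMING.match(lines[1]):
            continue  # skip blocks that do not look like a cue
        cues.append({
            "index": int(lines[0]),
            "timing": lines[1],
            "text": "\n".join(lines[2:]),
        })
    return cues


def translate_cues(cues, translate):
    """Apply a translate function to the text only; timings are untouched."""
    return [{**cue, "text": translate(cue["text"])} for cue in cues]


def to_srt(cues):
    """Serialize cues back into .srt text."""
    blocks = [f"{c['index']}\n{c['timing']}\n{c['text']}" for c in cues]
    return "\n\n".join(blocks) + "\n"


cues = parse_srt(SAMPLE_SRT)
# str.upper is only a placeholder for a real translation step.
translated = translate_cues(cues, str.upper)
print(to_srt(translated))
```

Keeping the timing lines exactly as they are is what lets the translated subtitles stay in sync with the video.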
BONUS: When it comes to transcribing audiovisual material, there are a few handy tricks that can significantly improve the end result. You can learn more about them in our blog posts about transcription.
We hope this little guide has helped answer some of the main questions you may have had about translating tricky formats.
If you’d like to run a specific file format by us and check how to best translate it, our translation pros will be happy to help!