Language Data Management in Localisation: What it is and why we need it

The localisation industry handles immense amounts of data every day. This creates a great opportunity for localisation leaders to explore a new avenue into AI leadership. Language industry researcher Konstantin Dranch is at the forefront of turning this opportunity into a more concrete concept: a new job title called Language Data Manager. In the vast sea of translation memories, termbases, recordings of interpreted meetings, subtitled videos and documents in multiple languages, there is a niche for a role and tools to orchestrate the data created by localisation. But what exactly would this look like in an actual work setting? Konstantin explains.

Creating the role of a Language Data Manager (LDM) is the first step to actively utilising the data resources generated by localisation teams. In what way, shape or form does the role of the LDM currently exist?

Today, the LDM function is carried out by tools specialists inside language teams. We are talking about people who used to run WorldServer, Trados, XTM, Memsource, memoQ and so on. Basically, anyone who deals with file engineering ends up taking on these responsibilities. Sometimes, there are machine translation experts dedicated to the work, and they naturally also take data under their wing. If they are movers and shakers, they seek out silos of language data across the enterprise to keep driving innovation.

Bilingual texts exist not only as translation memories but very often in native unaligned formats: PDF, Word, XML. Once you go after the pockets of data that are not “yours”, i.e. not created by the translation team, you venture into LDM territory.

In contrast with language teams, IT departments inside the same organisations have an easier time taking what they want – they just hire dedicated data engineers, parse, scrape and reprocess anything they can find.

What kind of “people” will take on the role of LDM in the future? What skills will they need to perform well as an LDM? And what kind of problems will they be solving in 5 years’ time?

As with most jobs, there are two sides: technical competency and emotional intelligence.

The technical aspects of the work revolve around Natural Language Processing (NLP). How do you align at scale? What is the metadata structure? Which scripts can you run to ensure the quality of the data in the pipeline? These are just some of the crucial questions.
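To make the question about quality scripts more tangible, here is a minimal sketch of the kind of check that could run over aligned segment pairs before they enter a pipeline. It is an illustration only: the `SegmentPair` type, the `find_issues` and `clean` functions, and the length-ratio thresholds are assumptions made for this example, not part of any particular translation tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPair:
    """One aligned source/target segment (hypothetical structure)."""
    source: str
    target: str


def find_issues(pair, min_ratio=0.4, max_ratio=2.5):
    """Return a list of quality flags for a pair; an empty list means no flags.

    The ratio thresholds are illustrative defaults, not established values.
    """
    src, tgt = pair.source.strip(), pair.target.strip()
    if not src or not tgt:
        return ["empty"]
    issues = []
    if src == tgt:
        issues.append("untranslated")
    # A target that is far shorter or longer than the source often
    # signals a misalignment.
    ratio = len(tgt) / len(src)
    if not (min_ratio <= ratio <= max_ratio):
        issues.append("length_ratio")
    # Numbers should normally survive translation unchanged.
    if sorted(re.findall(r"\d+", src)) != sorted(re.findall(r"\d+", tgt)):
        issues.append("number_mismatch")
    return issues


def clean(pairs):
    """Drop exact duplicates and any pair that raises a flag."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair in seen or find_issues(pair):
            continue
        seen.add(pair)
        kept.append(pair)
    return kept
```

In practice, flagged pairs would more likely be routed to a reviewer than silently dropped; the `clean` function simply shows the filtering step in its shortest form.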

Human aspects, on the other hand, are all about advocacy: convincing other departments to share data or change workflows and data practices. It is also about identifying needs inside the organisation and addressing them appropriately. Data management is a key driver for diversity and inclusion within an organisation, and presenting the results of LDM work will help achieve this.

Language Data Management done right can add a lot of value to the services an LSP provides to its clients. So how is it done right, in your opinion? What are the key factors to consider when creating the role of a Language Data Manager, or even when setting up an entire Language Data Management department?

At the moment, this function is not very well developed, so it is still flexible enough to serve various ends. Two possible approaches would be:

  1. For a manager of the localisation team to find a niche job for someone on their team who wants career progression. Ideally a digitally minded person who may have considered leaving language for the IT departments. This way, they can engage with and incorporate data management whilst staying in the language department.
  2. To add an LDM function to make a play for more IT projects and power inside the organisation. This makes particular sense inside organisations that want to nurture and promote new digitalisation leaders.

Regardless of the approach, a key factor should always be to set impactful goals. If an LDM brings about a 5% improvement to the localisation effort, the impact is not very strong. However, if an LDM opens pathways to digitalisation projects that work with user-generated content, ones that generate conversations, ones with compliance and growth, that’s a powerful story.

An important aspect of LDM is ensuring that data is used legally. Which steps need to be taken to guarantee that data is always handled both safely and legally?

The data sources must be tracked and the metadata with legal information and distribution rights maintained as the data travels. Imagine you have confidential data translated in a cloud-based tool, and the information falls under a signed NDA. In a real-world scenario, it is highly likely that nobody inside the organisation will know that there is a translation memory retaining confidential information. So even if the original document is wiped from the drive, the translation will survive. It will sit there quietly until one day someone needs data and just downloads the file from the repository, creating risk and legal liabilities for the whole organisation. Until there is some kind of blockchain technology tracking the journey of the data throughout its lifecycle, there is no easy way to navigate this.
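As a rough illustration of metadata that travels with the data, the sketch below attaches origin and rights information to an asset and copies it onto anything derived from that asset, such as a translation memory built from a confidential document. All names here (`LanguageAsset`, `derive`, `may_share_externally`, and the rights labels) are hypothetical and chosen for this example; a real setup would need fields agreed with the legal team.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LanguageAsset:
    """A piece of language data plus its legal metadata (hypothetical fields)."""
    asset_id: str
    origin: str                          # where the data came from
    rights: str                          # e.g. "internal-only" or "redistributable"
    nda_until: Optional[date] = None     # last day an NDA covers the content
    derived_from: Optional[str] = None   # id of the parent asset, if any


def derive(parent, new_id):
    """Create a derived asset that inherits the parent's legal metadata."""
    return replace(parent, asset_id=new_id, derived_from=parent.asset_id)


def may_share_externally(asset, today):
    """Allow sharing only if no NDA is active and the rights permit it."""
    if asset.nda_until is not None and today <= asset.nda_until:
        return False
    return asset.rights == "redistributable"


# A confidential document and the translation memory built from it.
document = LanguageAsset(
    asset_id="doc-1",
    origin="legal department",
    rights="redistributable",
    nda_until=date(2030, 1, 1),
)
memory = derive(document, "tm-1")
```

Because the translation memory inherits `nda_until` from its parent, deleting the original document does not erase the record that the content is confidential, which is exactly the gap described above.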

Since there is a lot happening around the LDM topic at the moment, what are your predictions on LDM? Where is it headed and what will its future look like?

  1. LDM will become an established function similar to how diversity and inclusion have become established corporate functions.
  2. There will be a range of tools aimed at LDMs. For example: more and better parsers to steal data from online sources. The internet is a great source of training datasets and a good place to host data management platforms.
  3. Communities and repositories like Hugging Face, ELG, ELDA and ELRC will only become larger and more professional.
  4. There will be bigger data marketplaces. If LDMs can buy, people will make shops for them.
  5. LDM will devote more attention to compliance laws and tools.