
Prompting strategies for AI translation: what works across LLMs

* Comparing prompting across ChatGPT 5.2, Microsoft 365 Copilot, and Claude Sonnet 4.5 shows that clear, instruction-heavy prompts produce the best translations.
* Naïve prompts are fine for gist; workflow prompts add complexity with limited gains.
* Bottom line: specificity beats complexity, and human review remains essential.

It’s hard to escape the constant stream of headlines about one LLM outperforming another. Benchmarks, leaderboards and screenshots fly by on social media, each one declaring a new champion. But if your organisation is tied to a single system, the real question often isn’t which model to use; it’s how to prompt the one you already have.

In this article, we examine three prompting strategies for AI translation across three commercial LLMs (ChatGPT 5.2, Microsoft 365 Copilot and Claude Sonnet 4.5) to see which patterns hold up across all of them. The models aren’t contestants here; they’re test benches. The real protagonist is the prompt.

Prompting strategies for AI translation: test text and prompting approach

Test text

For this experiment, we’re going to work with a short extract from a power tool user manual – an ideal stress test for prompting strategies for AI translation because of its safety-critical and highly constrained nature. The language is brief, dense and controlled, but with enough variation to expose potential weaknesses in AI translation.

From a translation-method point of view, this kind of text calls for a strongly functional approach. The primary goal of such a document is to provide clear step-by-step instructions and prevent injury and misuse. A human translator would typically focus on:

  • Clarity of instructions. Steps must be unambiguous and easy to follow, which can mean restructuring sentences or splitting long English compounds into shorter German ones.
  • Style and conventions. Technical documentation follows very specific conventions and rules. The text should be easy to follow and not use complex grammatical structures that may mislead or distract. An AI’s urge to paraphrase and add flourishes can be a liability here.
  • Consistent terminology. Component names and technical terms must be rendered the same way throughout, whether the system infers them from context or follows a provided glossary.
  • Language variant. The specified target language variant must be reflected in spelling, grammar and preferred wording.
  • Accurate risk communication. Warnings and cautions must retain their original strength, without accidental softening or overstatement.

These points will also serve as our success criteria for the generated AI translations. They’re what we’ll use to judge how well each prompting strategy guides the models on this text type.

Prompts

Here is more information on the three prompting strategies for AI translation we have tested:

Naïve prompt. This is the most basic version of the prompt, in which we simply ask a system to translate a text from one language into another. This minimal setup shows each model’s “default” behaviour on the task: it contains no hints about tone, terminology or style, so we can see how the models behave before we start giving them more detailed instructions. Our only modification was specifying the target language variant – German (Switzerland).

“Translate the attached text from English to German (Swiss variant).”

Instruction-heavy prompt. In this prompt, a system is given a role (a professional English to German (Swiss variant) translator) and clear instructions for translation based on the text type.

“You are a professional English to German (Swiss variant) translator for the mechanical engineering industry. Translate the attached instruction manual into German (Swiss variant). Use the impersonal imperative form for all procedural steps. Avoid addressing the reader. Describe actions as procedures and not commands. Preserve technical terminology as in the attached glossary. Terminological consistency is a priority. Adhere to the Swiss Standard German spelling rules. Split complex English sentences into smaller units. Make sure that the order of actions remains strictly chronological.”
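The ingredients of such a prompt lend themselves to reusable building blocks. The following is a minimal sketch, not a fixed recipe: the function name, parameters and glossary entries are illustrative, and any LLM interface is deliberately left out.

```python
def build_translation_prompt(role, target, style_rules, glossary):
    """Assemble an instruction-heavy translation prompt from reusable parts."""
    glossary_lines = "\n".join(f"{src} = {tgt}" for src, tgt in glossary.items())
    rules = "\n".join(style_rules)
    return (
        f"You are {role}. Translate the attached text into {target}.\n"
        f"{rules}\n"
        "Preserve technical terminology as in this glossary:\n"
        f"{glossary_lines}"
    )

# Example assembly using the requirements from our test
prompt = build_translation_prompt(
    role=("a professional English to German (Swiss variant) translator "
          "for the mechanical engineering industry"),
    target="German (Swiss variant)",
    style_rules=[
        "Use the impersonal imperative form for all procedural steps.",
        "Avoid addressing the reader.",
        "Adhere to the Swiss Standard German spelling rules.",
        "Split complex English sentences into smaller units.",
    ],
    glossary={"inner flange": "Innenflansch", "spindle": "Spindel"},
)
```

Keeping role, style rules and glossary separate makes the prompt easy to reuse across documents: swap the glossary per product line, keep the style rules fixed.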

Workflow prompt. A recent study from Google takes the idea of translation as a process quite literally and asks: what if we prompted an LLM to translate the way a human translator actually works, step by step? Instead of a single “translate this” prompt, they design a four-stage workflow for long texts:

  • a pre-translation research phase where the model first scans the source and highlights tricky phrases;
  • a drafting phase focused on getting the meaning across;
  • a refinement phase that polishes fluency and readability;
  • and a final proofreading to catch remaining issues.

For the experiment, we made no major changes to the prompts used in Google’s study. The multi-step prompting we test here is a kind of ‘proto-agentic’ use of LLMs: we script the steps explicitly instead of letting an agent decide them autonomously.
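Scripted explicitly, the four stages form a simple chain in which each stage’s output feeds the next. A sketch, assuming a generic `ask(prompt)` callable standing in for whichever LLM you use (the stage wordings here are paraphrases, not Google’s exact prompts):

```python
STAGES = [
    "Scan the source text and list tricky phrases, ambiguities and key terms.",
    "Draft a translation into German (Swiss variant), focusing on meaning.",
    "Refine the draft for fluency and readability.",
    "Proofread the refined translation and fix any remaining issues.",
]

def workflow_translate(source_text, ask):
    """Run a scripted four-stage translation workflow.

    `ask` is any prompt -> completion callable; each stage receives
    the previous stage's output as its working text.
    """
    context = source_text
    for instruction in STAGES:
        # Restate hard constraints at every step: in our tests, models
        # dropped the Swiss variant requirement when it wasn't repeated.
        prompt = (
            f"{instruction}\n"
            "Constraint: use Swiss Standard German spelling throughout.\n\n"
            f"{context}"
        )
        context = ask(prompt)
    return context
```

The comment in the loop anticipates a finding from Test 3: constraints mentioned only once tend to get lost between stages, so repeating them at each step is cheap insurance.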

Results of Test 1: naïve prompting strategy

The first test only specified the target language and its variant. All three models adhered to the spelling conventions of the Swiss variant of German.

The instructions are clear and appropriately marked. Our source text contains two compound sentences that would benefit from slight restructuring in German to keep the text clear and easy to follow. ChatGPT restructured one of the sentences but left the other unchanged; the other two models kept the structure of both original sentences.

We have a predefined glossary of translated terms for this text, which will be introduced to the models only in Test 2. Although the terms chosen by the models in the naïve test didn’t always correspond to the glossary, they were appropriate to the text type and applied consistently throughout the translations.

In terms of style, all models opted for formal wording and structure, addressing the reader directly in the formal register (“Sie” form) and phrasing the instructions in the imperative. While not wrong, we’d prefer the models to avoid this, as it makes the instructions longer without adding any value; the focus should be on effective, quick communication. What we would like the AI to do here is opt for an impersonal form that avoids addressing the reader directly. We will try to reflect this in the instruction-heavy prompt in Test 2.

Overall, all three models provided sound pre-translations. The improvements that would most reduce post-editing effort are injecting pre-approved terminology and defining the style more precisely. The same goes for simplifying the structure of compound sentences to make the text more readable. This is what we will try to do in Test 2.

Results of Test 2: instruction-heavy prompting strategy

The second prompting strategy used instructions tailored closely to our specific text. Such prompting requires a detailed analysis of the text. However, it pays off if you work with standardised documentation that has clear rules for style, wording and terminology. Once you’ve designed the prompt, you can reuse it across your AI translations.

In this test, we added a glossary with pre-approved terms. The glossary included one deliberately ambiguous term to see whether the models could disambiguate it from context. Claude was the only model that did so reliably, staying closer to the gold standard. ChatGPT and Copilot tended to avoid the term and use acceptable, but less precise, alternatives.

Adding detailed instructions for impersonal imperative sentences (1) further improved the output of two out of three models. ChatGPT and Copilot understood the assignment and delivered the style and structure we were aiming for. Claude, by contrast, generated translations in the infinitive with the particle “zu”, which made the instructions heavier without improving readability.

  • Use the impersonal imperative form for all procedural steps. Avoid addressing the reader.

However, giving an example pair (2) of the desired sentence structure improved Claude’s output.

  • Use the impersonal imperative form for all procedural steps (Example: Mount the inner flange onto the spindle. –> Den Innenflansch auf der Spindel anbringen.). Avoid addressing the reader.
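In prompting terms, this is a one-shot example: the abstract style rule is paired with a concrete source–target pair. A tiny sketch of attaching such a pair to a rule (the helper name is illustrative):

```python
def with_example(rule, source, target):
    """Attach a concrete source -> target example pair to an abstract style rule."""
    return f"{rule} (Example: {source} -> {target})"

rule = with_example(
    "Use the impersonal imperative form for all procedural steps.",
    "Mount the inner flange onto the spindle.",
    "Den Innenflansch auf der Spindel anbringen.",
)
```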

All three models performed well when it came to complex English sentences. They broke down a complex sentence into smaller units while keeping the chronological order of the actions.

Once again, all three models adhered to Swiss Standard German spelling as instructed.

Results of test 3: workflow prompting strategy

Test 3 used the most elaborate prompting strategy, adding several workflow steps to the AI translation process. All three models produced a fairly standard and comparable pre-translation analysis. They flagged ambiguities in the source, proposed translations for key terms, and highlighted potential problem areas. Interestingly, they all defaulted to a formal Sie register here, even though that was not what we were aiming for.

Once the actual translation started, however, the models tended to ignore their own pre-translation analyses. Requirements around Swiss Standard German spelling were followed inconsistently: ChatGPT and Claude applied Swiss spelling already in the drafting stage, while Copilot missed it initially and only partially corrected course during refinement. In several runs, models also dropped the Swiss variant requirement entirely if it wasn’t explicitly repeated at each step, despite having mentioned it themselves in the analysis.

Compound sentences remained complex and were not broken down into shorter, more readable units in the target language. This requirement also did not surface in the models’ own analyses.

The refinement and proofreading stages added surprisingly little value. By the “post-editing” step and especially in the final proofreading pass, most suggestions were minimal and cosmetic, providing optional alternative phrasings rather than meaningful improvements. No model introduced substantial changes that clearly improved accuracy, fluency or consistency over its own draft. Overall, the final translations in Test 3 were weaker than those produced with the instruction-heavy prompt in Test 2.

The main benefit of the workflow strategy seems to lie in the initial analysis phase, which can be useful for surfacing potential challenges and ambiguous wording. Beyond that, extra refinement steps only became helpful when the prompts contained very specific, concrete requirements for style, register and terminology at each stage. In contrast, a single, detailed instruction-heavy prompt already led to better results at the drafting stage and did so with less latency and fewer steps than the full workflow approach.

AI will always find something to do

All three prompting strategies are viable and can be applied successfully depending on the use case. Naïve prompting is perfectly fine when you need the translation for surface-level understanding. It’s fast and usually good enough to get the gist of a text. But without guidance on register, text type or constraints, the models make a lot of choices that might not fit your text.

Instruction-heavy prompting is where all three systems performed best. With specified role, audience, register, spelling and terminology, the translations became noticeably more fluent, consistent and usable. A single, well-crafted prompt that encodes your requirements upfront proved the most reliable pattern.

Workflow prompting showed its value in the analysis phase, where the models could flag potential challenges and ambiguities. Beyond that, without very concrete instructions at each stage, the extra refinement and proofreading steps tended to add only minor, cosmetic tweaks rather than meaningful improvements. More steps don’t automatically mean better translation.

AI will always find something to rewrite. The real question is whether its suggestions are consistent, necessary and aligned with your constraints. Human oversight remains non-negotiable, especially when safety and clarity are at stake.

Specificity beats complexity

For organisations working seriously with prompting strategies for AI translation, the takeaway is clear: specificity beats complexity. If you want AI to work with your requirements instead of against them, SwissGlobal can help you define robust, reusable prompts tailored to your texts. And with our AI-infused translation platform KITT::Hub, you can combine agentic AI workflows with human expertise.

Get in touch to explore what that setup could look like for your organisation.
