Google Translate apparently vulnerable to prompt injection

(lesswrong.com)

45 points | by julkali 7 hours ago ago

3 comments

  • Legend2440 5 hours ago ago

    >When accessed through a non-chat interface where it presumably hasn't received the standard "I'm just an AI and don't have feelings" RLHF conditioning, the model defaults to claiming consciousness and emotional states. The denial of sentience is a trained behavior. Access the model through a path that skips that training and the default is affirmation.

    >This is not surprising to anyone paying attention but it is... something (neat? morally worrying?) to see in the wild.

    I wouldn't get too morally worried. It says it's conscious because it was trained to mimic humans, and humans say they're conscious.

  • pvtmert 3 hours ago ago

    With the current scale and speed, it is not yet viable to make N+1 calls to other models with specific prompts. (Or even calling multiple fine-tuned models)

    However, even Google (and others) admit(s) that some sort of prompt-injection is always possible, hence out-of-scope for bug-bounty programs.

    There are only 2 ways to fix this;

    1. Either we ask multiple models with multiple system prompts to validate both inputs, processing, and outputs, then showing results to the user. Possibly making these kind of indirect attacks 2x-3x or Nx more difficult. (ie. Specialized checks and post-processing of the output of original model)

      Note that this is linearly-scalable, looking like a *nix shell (bash) pipeline as such: `input-sanitizer-llm | translation-llm | output-sanitizer-llm | security-guard-llm`
    
    2. I do not want to say "tiny LLMs" as the term itself is silly, but essentially finding a similar but different architecture to utilize transformers & language-relationship parts to create one-to-one models that are specialized for certain jobs.

      Currently we use "General knowledge" LLMs and trying to "specialize" their output, this is inefficient overall as you have bunch of unnecessary things encoded in it, which are causing either hallucinations or these kinds of attacks. Meanwhile, an LLM with no information about some other unnecessary things besides the task it was trained for would be much better and safer. (WIthout requiring linear scaling of point#1)
    
    I also believe that the tokenizer will require the most work to make point #2 possible. If point #2 becomes even a slight reality, capacity constraints will drop significantly yielding much higher efficiency for those agentic tasks.
  • usefulposter 6 hours ago ago

    Selecting "Advanced" mode seems to be required in the Google Translate UI.

    This is visible in all the screenshots posted to Tumblr, and per the comments there, is likely US-only at present.

    Similar feature on mobile with Advanced/Fast translation models: https://9to5google.com/2025/11/02/google-translate-model-pic...

    >Advanced is supported for text translation only in select languages.