Yeah the real problem is that hosted providers basically see everything about you over time. Each prompt alone might seem harmless, but the whole history adds up to a pretty detailed profile.
The way I'd think about it: run a local model for anything it can handle. Keep that stuff on your machine entirely. Then for the stuff that actually needs a more powerful model, don't just dump your whole context in — strip it down to just the specific thing you need answered, like "given X, what's Y" with no surrounding story. The hosted model just sees a decontextualized task, not what's actually going on.
You basically become your own privacy layer. More friction, but the provider never gets the full picture.
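The routing described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not any particular tool's API: `local_model`, `hosted_model`, and `strip_context` are stand-in names, and the model calls are stubbed (in practice the local call would hit an on-device runtime and the hosted call a remote API). The key idea is that the hosted side only ever receives an explicitly whitelisted "given X, what's Y" prompt.

```python
def local_model(prompt: str) -> str:
    # Stub: a small on-device model handling routine tasks.
    # Full context never leaves the machine on this path.
    return f"[local] {prompt}"

def hosted_model(prompt: str) -> str:
    # Stub: a powerful remote model; only ever sees stripped prompts.
    return f"[hosted] {prompt}"

def strip_context(task: str, needed_facts: dict) -> str:
    # Reduce a rich, identifying context to a bare "given X, what's Y"
    # question. Only explicitly whitelisted facts are included; the
    # surrounding story stays local.
    given = "; ".join(f"{k} = {v}" for k, v in needed_facts.items())
    return f"Given {given}, answer: {task}"

def route(task: str, needed_facts: dict, needs_power: bool) -> str:
    if not needs_power:
        return local_model(task)  # everything stays on-device
    return hosted_model(strip_context(task, needed_facts))
```

The friction lives in `needed_facts`: you decide, per request, which facts the provider is allowed to see, rather than forwarding your whole history by default.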
The intermediary approach shifts trust rather than eliminates it — you're trading provider visibility for intermediary visibility. The more meaningful architectural question is what the intermediary retains after the request completes.

Zero-retention at the inference layer — processing the prompt purely in memory, logging only metadata (latency, verdict, rule hits), and discarding the payload immediately — reduces the exposure surface considerably. It doesn't solve identity linkage at the payment layer, but it means there's nothing to subpoena, breach, or misuse at the content layer.

For regulated industries this distinction matters a great deal. "We never stored it" is a much stronger compliance position than "we stored it but encrypted it."
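A zero-retention handler of this shape is easy to sketch. This is an illustrative toy, not a real product's implementation: the function name and the rule list are assumptions, and `infer` is any in-memory inference callable. The point is structural: the audit record carries latency, verdict, and rule hits, but deliberately never the payload, so the prompt goes out of scope once the response is returned.

```python
import time

def zero_retention_handle(payload: str, infer, audit_log: list) -> str:
    # Process the prompt purely in memory: run inference, append only
    # metadata to the audit log, and let the payload be garbage-collected
    # without ever touching disk.
    start = time.monotonic()
    # Illustrative rule check; a real gateway would use proper detectors.
    rule_hits = [r for r in ("ssn", "password") if r in payload.lower()]
    result = infer(payload)
    audit_log.append({
        "latency_ms": round((time.monotonic() - start) * 1000, 2),
        "verdict": "flagged" if rule_hits else "clean",
        "rule_hits": rule_hits,
        # deliberately no "payload" key: nothing to subpoena or breach
    })
    return result
```

Because the log schema simply has no content field, "we never stored it" is enforced by construction rather than by retention policy.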
If you use something like Vertex AI or Google Workspace for Business, those accounts are separate from your personal (internet-browsing) identity, and the terms state clearly that your inputs and interactions are not used for training.
Legal department approved.
This is enough for me personally. Besides, large-scale deanonymization is now powered by LLMs and agents. Even with a VPN and the like, you leave so many "fingerprints" that it's actually pretty easy to narrow things down to exactly who you are.