So I read the post. The dataset was inert until someone trained on it; he left it up specifically to see how long it would take anyone to notice, and in practice no one did.
Don't people assume that all datasets are possibly dangerous?
You left it up for 6 months!??? Potentially poisoning thousands. Are you looking for respect from this community?
Yes, 6 months. I reported it to Hugging Face the day I confirmed the backdoor propagated into model weights, not before, because the vulnerability was the lack of detection, not the dataset itself. The dataset was inert without someone training on it. I wanted to measure whether anyone would notice. No one did.
This is not something to gloat about.
Fair. 'I poisoned' was the wrong verb; it sounds like I enjoyed it, and I didn't. I found a hole in infrastructure that lets anyone do this, and I wanted proof that nobody was watching. The proof is depressing. I'll edit the opening if it stays up.