SOLO Bench is a benchmark that asks LLMs to write 250 unique sentences, each exactly four words long and in a specific grammatical format, using only words from a provided list of ~4,000 words. Each word from the list can be used only once across all sentences, and the task must be completed without external tools or code. The test aims to evaluate long-context performance (input and output), memory, instruction following, reasoning, and hallucination in a single benchmark. It proves to be a very difficult task for all LLMs.
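To make the rules concrete, here is a minimal sketch of how a model's output could be checked against the constraints described above. It is not the benchmark's own scoring code. The function name `check_solo_output`, the whitespace tokenization, the case folding, and the punctuation stripping are all assumptions made for illustration, and the grammatical-format rule is left out because its details are not given here.

```python
from collections import Counter


def check_solo_output(sentences, word_list, expected_count=250, words_per_sentence=4):
    """Return a list of rule violations (empty if none were found).

    Hypothetical helper: covers sentence count, sentence length,
    word-list membership, and the use-each-word-once rule only.
    """
    allowed = {w.lower() for w in word_list}
    problems = []

    if len(sentences) != expected_count:
        problems.append(f"expected {expected_count} sentences, got {len(sentences)}")

    usage = Counter()
    for i, sentence in enumerate(sentences, start=1):
        # Assumption: words are whitespace-separated, matching is
        # case-insensitive, and surrounding punctuation is ignored.
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        if len(words) != words_per_sentence:
            problems.append(
                f"sentence {i}: {len(words)} words instead of {words_per_sentence}"
            )
        for w in words:
            if w not in allowed:
                problems.append(f"sentence {i}: '{w}' is not in the word list")
            usage[w] += 1

    # Each word may appear at most once across all sentences.
    for w, n in usage.items():
        if n > 1:
            problems.append(f"'{w}' used {n} times")

    return problems


# Toy example with an 8-word list and 2 sentences instead of ~4,000 and 250.
words = ["cats", "chase", "small", "mice", "dogs", "eat", "big", "bones"]
output = ["Cats chase small mice.", "Dogs eat big cats."]
print(check_solo_output(output, words, expected_count=2))
# → ["'cats' used 2 times"]
```

The check itself is trivial for a program, which is the point of the no-tools rule: the model has to track which of the ~4,000 words it has already used purely within its own context.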
https://dull-stop-29a.notion.site/SOLO-Bench-1e70c13d9e4580e...