This is the most underrated release we've tested at https://gertlabs.com.
I'm surprised they open-sourced it. It's very comparable to Kimi K2.6 performance-wise, slightly better with tools, and cheaper.
Wow awesome!
BTW, I see that DeepSeek V4 Pro is trounced by its little Flash brother? Any ideas as to why?
In our testing, the Pro version underperformed because it struggled in agentic tasks where our harness uses custom tools it wasn't trained on; the failures were mostly formatting issues. All of the other major April releases, including the Flash version, had no problem adapting to custom tools. We do plan to keep adding Pro samples to see whether infrastructure degradation was also a factor.
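To give a sense of what "formatting issues" means here, below is a minimal sketch of the kind of tool-call check a harness might run. The tool name, schema, and sample outputs are hypothetical, not our actual harness; the point is that a model unfamiliar with a custom tool often gets the call envelope wrong, not the intent.

```python
import json

# A custom tool the model has never seen in training (hypothetical).
TOOL_SCHEMA = {"name": "search_tickets", "required": ["query", "max_results"]}

def validate_tool_call(raw: str) -> bool:
    """Return True only if the model emitted well-formed JSON with the
    exact tool name and all required arguments present."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # e.g. the model wrapped the JSON in prose or markdown
    if call.get("name") != TOOL_SCHEMA["name"]:
        return False
    args = call.get("arguments", {})
    return all(k in args for k in TOOL_SCHEMA["required"])

# A well-adapted model emits the expected shape:
ok = '{"name": "search_tickets", "arguments": {"query": "gpu", "max_results": 5}}'
# A model that never saw the tool often breaks the format instead:
bad = 'Sure! I will call search_tickets(query="gpu") now.'

assert validate_tool_call(ok) is True
assert validate_tool_call(bad) is False
```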
They claim they're surpassing the latest DeepSeek model on many tests. I'm wondering when we'll see a GGUF release of this model; being an MoE, it should run fine on local machines.
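Once a quant lands, running it locally is straightforward; here's a minimal sketch using llama-cpp-python. The GGUF filename is hypothetical, since no official conversion exists yet:

```python
from llama_cpp import Llama

# Load the model (filename is hypothetical; substitute whatever quant gets released).
llm = Llama(
    model_path="mimo-v2-5-pro-Q4_K_M.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize MoE inference in one line."}],
)
print(out["choices"][0]["message"]["content"])
```

The MoE point holds because only the active experts run per token, so per-token compute is low, though all expert weights still have to fit in RAM or be offloaded.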
BTW, here's the blog post: https://mimo.xiaomi.com/mimo-v2-5-pro. They state that "DeepSeek V4 Pro numbers are with its max effort setting", so I'm wondering what they used for this one.
Wow. China has so many open-source LLMs.