Show HN: DAAF – Reproducible AI-assisted data analysis for researchers

(github.com)

2 points | by brhkim 5 hours ago

1 comment

brhkim 5 hours ago

Hi HN! My name is Brian Kim -- I'm an education researcher and data scientist by trade, but I recently launched a major open-source effort to help bring real scientific rigor to AI-assisted data analysis. Some friends mentioned it'd be a great idea to share here, and though I'm not a real software dev, I'm hoping you'll find it interesting and useful!
DAAF, the Data Analyst Augmentation Framework, is an open-source, forever-free, extensible workflow for Claude Code that allows skilled researchers to rapidly scale their expertise like a kind of research exoskeleton, without sacrificing the transparency, rigor, or reproducibility demanded by our core scientific principles. DAAF explicitly embraces the fact that LLM-based research assistants will never be perfect and can never be trusted as a matter of course. But by providing strict guardrails, enforcing best practices, and making the work as auditable and reproducible as possible, DAAF ensures that LLM research assistants can still be immensely valuable to any critically minded researcher capable of verifying and reviewing their work. I built it so that anyone can install it and begin using it in as little as 10 minutes on a fresh computer, provided they have a high-usage Anthropic account (a crucial caveat: unfortunately, that is very expensive for now!).
With DAAF, you can go from a research question to a shockingly nuanced research report with sections for key findings, data/methodology, and limitations, as well as bespoke data visualizations, with only 5-10 minutes of active engagement time, plus the time needed to fully review and audit the results. To facilitate that crucial expert human validation, every project comes complete with a fully reproducible, documented analytic code pipeline and notebooks for exploration. From there, you can request revisions, rethink measures, conduct new sub-analyses, run robustness checks, and even add deliverables like interactive dashboards, policymaker-focused briefs, and more -- all with just a quick ask to Claude. And all of this can be done across multiple projects in parallel.
By open-sourcing DAAF under the GNU LGPLv3 license as a forever-free, open, and extensible framework, I hope to provide a foundational resource that the entire community of researchers and data scientists can use, benefit from, learn from, and extend through critical conversation and collaboration. By pairing DAAF with an intensive array of educational materials, tutorials, blog deep-dives, and videos via the project documentation and an accompanying educational (also forever-free) Substack, I also hope to rapidly accelerate the readiness of the scientific community to genuinely and critically engage with AI disruption and transformation writ large.
If you want a quick walk-through, you can see my 10-minute video demo of the main functions here: https://youtu.be/ZAM9OA0AlUs
With all that in mind, I would love to hear what you think, what your questions are, and absolutely every single critical thought you’re willing to share, so we can learn on this frontier together. Thanks for reading and engaging earnestly!