Show HN: Research Hacker News, ArXiv & Google with Hierarchical Bayesian Models

(sturdystatistics.com)

1 point | by kianN 10 hours ago

1 comment

  • kianN 10 hours ago

    Some statistical notes for those interested:

    Under the hood, this model resembles LDA, but replaces its Dirichlet priors with Pitman–Yor Processes (PYPs), which better capture the power-law behavior of word distributions. It also supports arbitrary hierarchical priors, allowing metadata-aware modeling.
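
    To give a feel for why the PYP matters, here is a minimal toy sketch (plain Python, not our production code; the function name and parameters are just for illustration). A Pitman–Yor Chinese restaurant process with discount > 0 produces a long, power-law tail of cluster sizes, while discount = 0 recovers the Dirichlet-process case with far fewer clusters:

      import random

      def sample_pyp_partition(n, discount=0.5, strength=1.0, seed=0):
          """Seat n customers via the Pitman-Yor Chinese restaurant process."""
          rng = random.Random(seed)
          tables = []  # tables[k] = number of customers at table k
          for i in range(n):
              r = rng.random() * (i + strength)
              for k, size in enumerate(tables):
                  r -= size - discount          # join table k w.p. (n_k - d) / (i + theta)
                  if r < 0:
                      tables[k] += 1
                      break
              else:
                  tables.append(1)              # new table w.p. (theta + K*d) / (i + theta)
          return tables

      for d in (0.0, 0.7):
          tables = sample_pyp_partition(20000, discount=d)
          print(f"discount={d}: {len(tables)} tables, largest: {sorted(tables, reverse=True)[:5]}")

    With discount = 0 you end up with only a handful of tables; with discount = 0.7 you get on the order of a thousand, most of them tiny, which is the heavy-tailed behavior real word frequencies show.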

    For example, in an earnings-transcript corpus, a typical LDA might have a flat structure: Prior → Document

    Our model instead uses a hierarchical graph: Uniform Prior → Global Topics → Ticker → Quarter → Paragraph
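
    A toy sketch of how a hierarchy like that can be wired together (Chinese-restaurant-franchise-style back-off, not our actual inference code; the class and names below are made up for illustration): each level keeps its own PYP restaurant and, whenever it opens a new table, draws the topic from its parent, so sparse contexts like a single paragraph get smoothed toward the quarter, ticker, and global levels above them.

      import random

      class PYPNode:
          """One level of the metadata hierarchy (global, ticker, quarter, paragraph)."""
          def __init__(self, parent=None, discount=0.5, strength=1.0):
              self.parent = parent
              self.discount = discount
              self.strength = strength
              self.tables = []  # each entry is [topic, customer_count]

          def draw(self, rng, n_topics=50):
              total = sum(count for _, count in self.tables) + self.strength
              r = rng.random() * total
              for table in self.tables:
                  r -= table[1] - self.discount
                  if r < 0:
                      table[1] += 1            # reuse this table's topic (rich-get-richer)
                      return table[0]
              # new table: back off to the parent level, or to a uniform root prior
              topic = self.parent.draw(rng, n_topics) if self.parent else rng.randrange(n_topics)
              self.tables.append([topic, 1])
              return topic

      rng = random.Random(0)
      root = PYPNode()                      # Uniform Prior -> Global Topics
      ticker = PYPNode(parent=root)         # one node per ticker
      quarter = PYPNode(parent=ticker)      # one node per (ticker, quarter)
      paragraph = PYPNode(parent=quarter)   # one node per paragraph
      print([paragraph.draw(rng) for _ in range(20)])

    The point of the chaining is statistical strength sharing: topics that are common at the ticker or global level stay easy to reach in every paragraph, while each paragraph is still free to concentrate on its own few topics.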

    This hierarchical structure, combined with the PYP statistics, consistently yields more coherent and fine-grained topics than standard LDA. There’s also a “fast mode” that collapses some hierarchy levels for quicker runs; it’s a handy option if you’re curious to see the impact the hierarchy has on the results (or if you’re just in a rush).