Why I Built Reader: Open-source web scraping for LLMs

(reader.dev)

3 points | by nihalwashere 10 hours ago

7 comments

  • verdverm 10 hours ago

    Killer domain btw, how did you nab that one?

    Any docs on how to run this on multiple machines? (ideally k8s)

    • nihalwashere 9 hours ago

      Thanks! Honestly that was just pure luck with the domain :)

      There's a Docker deployment guide here: https://docs.reader.dev/documentation/guides/deployment

      For k8s, you can run multiple Reader instances behind a load balancer; each instance manages its own browser pool. Main things to watch:

      - Memory limits (~500MB-1GB per concurrent browser)
      - Headless Chrome needs --no-sandbox or a seccomp profile
      - Sticky sessions for crawl jobs (or run the full crawl on a single pod)

      A dedicated k8s guide is on the roadmap...

      • verdverm 9 hours ago

        The main challenge is distributed rate-limiting, something I'd hope the framework handles for me. I'd also like k8s settings that have worked well for scaling in your experience.

        • nihalwashere 9 hours ago

          Distributed rate-limiting is intentionally not in the core library. Reader focuses on the scraping primitives and stays unopinionated about orchestration.

          For multi-node rate limiting, you'd layer that on top: Redis + a simple limiter that gates calls to reader.scrape().
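A minimal sketch of what such a layer might look like, assuming a fixed-window counter per target host. Everything named here (`CounterStore`, `MemoryStore`, `windowKey`, `tryAcquire`, `limitedScrape`) is hypothetical and not part of Reader. The in-memory store only stands in for Redis so the example runs locally, and the scrape call is passed in as a callback because the exact signature of reader.scrape() is not shown in this thread.

```typescript
// Hypothetical store interface. With Redis, incr() would map to INCR and
// expire() to EXPIRE, so every node shares the same counters.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

// In-memory stand-in for local testing only; it is NOT distributed.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(_key: string, _seconds: number): Promise<void> {
    // No-op here. In Redis the TTL cleans up counters from old windows.
  }
}

// One counter key per host per time window.
function windowKey(host: string, nowMs: number, windowSec: number): string {
  const bucket = Math.floor(nowMs / (windowSec * 1000));
  return `ratelimit:${host}:${bucket}`;
}

// Returns true if this call fits within `limit` requests for the current window.
// Note: INCR followed by EXPIRE is not atomic in Redis; a production version
// would likely use a Lua script or a transaction.
async function tryAcquire(
  store: CounterStore,
  host: string,
  limit: number,
  windowSec: number,
  nowMs: number = Date.now()
): Promise<boolean> {
  const key = windowKey(host, nowMs, windowSec);
  const count = await store.incr(key);
  if (count === 1) {
    await store.expire(key, windowSec);
  }
  return count <= limit;
}

// Waits until a slot is free, then runs the supplied scrape callback.
async function limitedScrape<T>(
  store: CounterStore,
  host: string,
  scrape: () => Promise<T>,
  limit: number = 5,
  windowSec: number = 1
): Promise<T> {
  while (!(await tryAcquire(store, host, limit, windowSec))) {
    await new Promise<void>((resolve) => setTimeout(resolve, 100));
  }
  return scrape();
}
```

Each node would call something like `limitedScrape(store, "example.com", () => reader.scrape(...))` with a shared Redis-backed store. A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more logic.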

          For k8s resource settings, the Docker guide is a good starting point: https://docs.reader.dev/documentation/guides/deployment

          But I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader...

          Thanks for sharing this :)

          • verdverm 8 hours ago

            There's plenty of what you've built here to go around. It's trivial now to reproduce the basics.

            Distributed rate-limiting is a hard problem, one people may pay for.

            • nihalwashere 8 hours ago

              I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader soon :)

              • verdverm 7 hours ago

                You're missing the point: I don't want to build it myself. The framework I actually use will do it for me. If yours does not, it will not be in consideration.