7 comments

  • atmanactive 5 hours ago

    Nice, thanks! After reading the whole GitHub README, it's not clear to me how the clipboard is handled: if I have an image in my clipboard and I run textsnap with no arguments, where is the OCR text stored? Back in the clipboard (that would be ideal)? Unrelated, I wish textsnap would look for its model files not only in the operating system's well-known dir, but also next to itself (portable mode), as that would let me copy or move the textsnap directory together with the model files to any computer and just use it from there, with no setup steps. The --model-dir flag is useful, but it is cumbersome for day-to-day use. In other words, it would be great if --model-dir defaulted to wherever the textsnap executable is. Thanks.

    • mrkn1 an hour ago

      thank you for being thorough

      clipboard: right now, clipboard input is treated like any other source, so the text gets written to ./textsnaps/clipboard_ocr.txt and stdout just prints that path. Nothing goes back to the clipboard in this version (stay tuned).

      portability: agreed, and it's a small change. textsnap already looks for the checksum manifest next to the script before falling back to the cache, so extending that lookup to the model files should be easy. I'll make a note for the next version.
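
      For illustration only, the lookup order discussed here (explicit --model-dir first, then the directory next to the script, then the cache) could be sketched roughly as below. This is not textsnap's actual code: resolve_model_dir, its parameters, and the marker file name "manifest.json" are all hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_model_dir(
    explicit: Optional[str],
    exe_dir: Path,
    cache_dir: Path,
    marker: str = "manifest.json",  # hypothetical marker file name
) -> Path:
    """Pick a model directory: explicit flag, then portable dir, then cache."""
    # 1. An explicitly passed directory always wins.
    if explicit:
        return Path(explicit)
    # 2. Portable mode: use the directory next to the script if the
    #    marker file is present there.
    if (exe_dir / marker).is_file():
        return exe_dir
    # 3. Otherwise fall back to the per-user cache directory.
    return cache_dir
```

      In a real script, exe_dir would typically be Path(__file__).resolve().parent, which makes the portable case work without any flag when the model files travel with the tool.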

  • PeterStuer 4 hours ago

    I've been using docling-serve on one of my machines with a modest GPU. How does this compare?

    • mrkn1 2 hours ago

      Great question. I'm not familiar with docling-serve, but from what I gathered they're pretty different beasts. Docling is a heavier pipeline (actually uses GPU). textsnap is the opposite: single-file CLI, small VLM running on plain CPU cores, one command, no server. The tradeoff is that CPU decode is sequential, so it's slower on dense pages, and it OCRs one image rather than doing full layout.

      If docling-serve is already meeting your needs, textsnap is probably not an upgrade. But it installs in one command, so I'd love to hear how it stacks up on your images if you end up trying it.

  • freakynit 8 hours ago

    Cool tool... but did you just vibe-code this along similar lines to yapsnap? I sense an eerie similarity between the two. Yapsnap was also on the front page today.

    https://news.ycombinator.com/item?id=48214399

    Nevertheless, very useful.

    Thanks..

    • mrkn1 8 hours ago

      thanks! yapsnap is audio to text, and textsnap is image to text. Both have been daily use cases for me for a while. And yes, the feedback on yapsnap encouraged me to also release textsnap on GitHub.

      • freakynit 5 hours ago

        Oh.. I didn't even notice it earlier.. you are also the author of yapsnap.. hence the similarity...

        I loved the simplicity in both. They both work, without the bloat.