Nice, thanks! After reading the whole GitHub README, it's not clear to me how the clipboard is handled: if I have an image in my clipboard and I run textsnap with no arguments, where is the OCR text stored? Back in the clipboard (that would be ideal)? Unrelated: I wish textsnap would look for its model files not only in the operating system's well-known directory but also next to itself (portable mode). That would let me copy or move the textsnap directory together with the model files to any computer and use it from there without any setup steps. --model-dir is useful, but it is cumbersome for day-to-day use. In other words, it would be great if --model-dir defaulted to wherever the textsnap executable is. Thanks.
thank you for being thorough
clipboard: right now, clipboard input is treated like any other source, so the text gets written to ./textsnaps/clipboard_ocr.txt and stdout just prints that path. Nothing goes back to the clipboard in this version (stay tuned).
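Until clipboard output exists, a small wrapper can put the text back on the clipboard. This is a minimal sketch, not part of textsnap: it assumes `textsnap` is on PATH, that the printed path is the last non-empty line of stdout, that the output file is UTF-8, and that one of pbcopy (macOS), wl-copy (Wayland) or xclip (X11) is installed. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def read_ocr_output(stdout: str) -> str:
    """Read the OCR text from the file whose path textsnap printed.

    Assumption: the path is the last non-empty line of stdout.
    """
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("textsnap printed no output path on stdout")
    return Path(lines[-1]).read_text(encoding="utf-8")


def copy_to_clipboard(text: str) -> None:
    """Pipe text into the first clipboard tool found on PATH."""
    candidates = [
        ["pbcopy"],                            # macOS
        ["wl-copy"],                           # Linux, Wayland
        ["xclip", "-selection", "clipboard"],  # Linux, X11
    ]
    for cmd in candidates:
        if shutil.which(cmd[0]):
            subprocess.run(cmd, input=text.encode("utf-8"), check=True)
            return
    raise RuntimeError("no clipboard tool found (tried pbcopy, wl-copy, xclip)")


def ocr_clipboard_to_clipboard() -> None:
    # With no arguments, textsnap OCRs the image currently on the clipboard
    # and writes ./textsnaps/clipboard_ocr.txt, printing that path.
    result = subprocess.run(["textsnap"], capture_output=True, text=True, check=True)
    copy_to_clipboard(read_ocr_output(result.stdout))
```

Calling `ocr_clipboard_to_clipboard()` runs one OCR pass on the clipboard image and replaces the clipboard contents with the recognized text. The relative path resolves correctly because the subprocess inherits the caller's working directory. Windows is not covered here.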
portability: agreed, and it's a small change. textsnap already looks for the checksum manifest next to the script before falling back to the cache, so extending that to the model files should be easy. I'll make a note for the next version.
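The lookup order being discussed (explicit --model-dir first, then the directory holding textsnap itself, then the per-OS cache) could look roughly like the following. This is a sketch under assumptions, not textsnap's code: the marker file name `model.gguf`, the helper names and the cache paths are all hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import Optional

# Hypothetical file name used to detect "models live here"; textsnap's real
# model/manifest file names are not given in this thread.
MODEL_MARKER = "model.gguf"


def default_cache_dir(app: str = "textsnap") -> Path:
    """Conventional per-OS cache location (assumed; may differ from textsnap's)."""
    if sys.platform == "win32":
        base = os.environ.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
    elif sys.platform == "darwin":
        base = str(Path.home() / "Library" / "Caches")
    else:
        base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / app


def resolve_model_dir(cli_model_dir: Optional[str], script_path: str,
                      marker: str = MODEL_MARKER) -> Path:
    """Pick the model directory: --model-dir, then next to textsnap, then the cache."""
    # 1. An explicit --model-dir always wins.
    if cli_model_dir:
        return Path(cli_model_dir)
    # 2. Portable mode: model files sitting beside the script or executable.
    beside = Path(script_path).resolve().parent
    if (beside / marker).is_file():
        return beside
    # 3. Fall back to the per-OS cache directory.
    return default_cache_dir()
```

In a single-file script the second argument would be `__file__` (or `sys.executable` for a frozen binary build), so a copied folder that already contains the model files works without passing --model-dir.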
I've been using docling-serve on one of my machines with a modest GPU. How does this compare?
Great question. I'm not familiar with docling-serve, but from what I gathered they're pretty different beasts. Docling is a heavier pipeline (it actually uses the GPU). textsnap is the opposite: a single-file CLI, a small VLM running on plain CPU cores, one command, no server. The tradeoff is that CPU decoding is sequential, so it's slower on dense pages, and it OCRs one image rather than doing full layout analysis.
If docling-serve is already meeting your needs, it's probably not an upgrade. But it installs with one command, so I'd love to hear how it stacks up on your images if you end up trying it.
Cool tool... but did you just vibe-code this along the same lines as yapsnap? I sense an eerie similarity between the two. Yapsnap was also on the front page today.
https://news.ycombinator.com/item?id=48214399
Nevertheless, very useful.
Thanks..
thanks! yapsnap is audio to text, and textsnap is image to text. Both have been daily use cases for me for a while. And yes, the feedback on yapsnap encouraged me to release textsnap on GitHub too.
Oh.. I didn't even notice it earlier.. you're also the author of yapsnap.. hence the similarity...
I loved the simplicity in both. They both work, without the bloat.