6 comments

  • sim04ful 8 days ago

    Just tried it, works great. I built something similar but for detecting fonts (https://fontofweb.com)

    Instead of spending money on a headless browser, you might want to look into crawling via an iframe and sending the data back via postMessage.
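
    One way to structure the iframe approach is to have a script inside the frame post a plain object back to the parent, and have the parent validate the object's shape before using it. This is a minimal sketch under that assumption; the `crawl-result` message tag and the `fonts`/`globals` fields are hypothetical and not part of any existing tool.

```typescript
// Hypothetical payload posted from the crawled page (inside the iframe) to the parent.
interface CrawlResult {
  type: "crawl-result";
  url: string;
  fonts: string[];
  globals: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Type guard the parent runs on event.data before trusting anything in it.
function isCrawlResult(data: unknown): data is CrawlResult {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return (
    d.type === "crawl-result" &&
    typeof d.url === "string" &&
    isStringArray(d.fonts) &&
    isStringArray(d.globals)
  );
}

// Browser wiring (not executed here):
//   iframe: window.parent.postMessage({ type: "crawl-result", url: location.href, fonts, globals }, parentOrigin);
//   parent: window.addEventListener("message", (e) => { if (isCrawlResult(e.data)) handle(e.data); });
```

    Validating the shape matters because any frame can post a message to the parent; checking the sender's origin is a separate, additional step.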

    • yavorsky 8 days ago

      Wow, that's a great idea!

      My idea was to create a browser-less environment, so I can use it via https://www.npmjs.com/package/@unbuilt/cli or just through the API, without loading resources on the user's machine.

      I'm not really spending extra funds, since I built the infrastructure on DigitalOcean droplets, so in the worst case people will wait a bit longer if there's no cached result and the queue is long.
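
      A minimal sketch of the cache-then-queue behavior described above, assuming an in-memory `Map` as the cache and a single FIFO queue. The `AnalysisService` class and its methods are hypothetical; a setup spread across several droplets would more likely keep the cache and queue in a shared store.

```typescript
// Hypothetical job shape; the real service's queue and cache are not described.
type Job = { url: string };

type RequestResult<R> =
  | { status: "cached"; result: R }
  | { status: "queued"; position: number };

class AnalysisService<R> {
  private cache = new Map<string, R>();
  private queue: Job[] = [];

  // Return a cached result immediately; otherwise enqueue (once) and report the queue position.
  request(url: string): RequestResult<R> {
    const hit = this.cache.get(url);
    if (hit !== undefined) return { status: "cached", result: hit };
    let index = this.queue.findIndex((job) => job.url === url);
    if (index === -1) {
      this.queue.push({ url });
      index = this.queue.length - 1;
    }
    return { status: "queued", position: index + 1 };
  }

  // A worker takes the oldest job, runs the analysis, and caches the result.
  async processNext(analyze: (url: string) => Promise<R>): Promise<boolean> {
    const job = this.queue.shift();
    if (!job) return false;
    this.cache.set(job.url, await analyze(job.url));
    return true;
  }
}
```

      With a fixed number of workers the cost stays flat; a longer queue only shows up as a larger `position` for uncached URLs.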

      My question: how can I access the iframe runtime of a third-party website, including its window object, or intercept its requests for better analysis? For example, I check headers, inspect global variables, fetch source maps, etc. Having full control gives me more flexibility to produce the most confident analysis.
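
      To make the header and global-variable checks concrete, here is a rough sketch of rule-based detection. The rule tables are hypothetical examples, not this project's actual logic; the inputs are assumed to be response headers as a plain object and a list of names collected from the page's `window`.

```typescript
// Hypothetical header rules: header name -> value pattern -> technology label.
const HEADER_RULES: Array<{ header: string; pattern: RegExp; tech: string }> = [
  { header: "x-powered-by", pattern: /next\.js/i, tech: "Next.js" },
  { header: "x-powered-by", pattern: /express/i, tech: "Express" },
  { header: "server", pattern: /cloudflare/i, tech: "Cloudflare" },
  { header: "server", pattern: /nginx/i, tech: "nginx" },
];

// Hypothetical global-variable rules: name found on window -> technology label.
const GLOBAL_RULES: Record<string, string> = {
  __NEXT_DATA__: "Next.js",
  __NUXT__: "Nuxt",
  jQuery: "jQuery",
};

function detectFromHeaders(headers: Record<string, string>): string[] {
  // Header names are case-insensitive, so normalize them before matching.
  const normalized = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) normalized.set(name.toLowerCase(), value);
  const found = new Set<string>();
  for (const rule of HEADER_RULES) {
    const value = normalized.get(rule.header);
    if (value !== undefined && rule.pattern.test(value)) found.add(rule.tech);
  }
  return Array.from(found);
}

function detectFromGlobals(globalNames: string[]): string[] {
  const found = new Set<string>();
  for (const name of globalNames) {
    if (Object.prototype.hasOwnProperty.call(GLOBAL_RULES, name)) found.add(GLOBAL_RULES[name]);
  }
  return Array.from(found);
}
```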

      • sim04ful 7 days ago

        You're welcome!

        What I did was set up a proxy server on Cloudflare Workers.

        proxy.yourdomain.com?url=websitetocrawl
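
        A minimal sketch of how the worker might read the `url` query parameter from a request shaped like the one above and refuse anything that isn't a plain web page. The function name is hypothetical; in a Cloudflare Worker this check would run at the start of the `fetch` handler, before fetching the target.

```typescript
// Extract and validate the crawl target from a request such as
// https://proxy.yourdomain.com/?url=https%3A%2F%2Fexample.com (hypothetical host).
function targetFromProxyRequest(requestUrl: string): URL | null {
  const raw = new URL(requestUrl).searchParams.get("url"); // already percent-decoded
  if (!raw) return null;
  let target: URL;
  try {
    target = new URL(raw); // throws if raw is not an absolute URL
  } catch {
    return null;
  }
  // Only proxy http(s) pages; refuse file:, data:, javascript:, and so on.
  return target.protocol === "http:" || target.protocol === "https:" ? target : null;
}
```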

        Then, in the worker, I used regex to rewrite all the external resources in the HTML (CSS URLs, scripts) so they also go through the proxy.
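
        The regex rewrite could look roughly like this sketch, which routes `src`/`href` attributes and CSS `url(...)` references through the proxy after resolving relative paths against the page URL. The function names and proxy origin are hypothetical, and regexes are only a heuristic here: `srcset`, URLs built inside scripts, and `url(...)` inside fetched stylesheets are not handled.

```typescript
// Route one resource URL through the proxy; data:, blob:, javascript:, and #fragment refs are left alone.
function proxify(resource: string, pageUrl: string, proxyOrigin: string): string {
  const trimmed = resource.trim();
  if (/^(data:|blob:|javascript:|#)/i.test(trimmed)) return resource;
  let absolute: string;
  try {
    absolute = new URL(trimmed, pageUrl).href; // resolve relative paths against the page
  } catch {
    return resource;
  }
  return `${proxyOrigin}/?url=${encodeURIComponent(absolute)}`;
}

// Rewrite src/href attributes, then CSS url(...) references (e.g. in style attributes or <style> blocks).
function rewriteHtml(html: string, pageUrl: string, proxyOrigin: string): string {
  const attr = /\b(src|href)\s*=\s*(["'])(.*?)\2/gi;
  const cssUrl = /url\(\s*(["']?)(.*?)\1\s*\)/gi;
  return html
    .replace(attr, (_m: string, name: string, quote: string, value: string) =>
      `${name}=${quote}${proxify(value, pageUrl, proxyOrigin)}${quote}`)
    .replace(cssUrl, (_m: string, quote: string, value: string) =>
      `url(${quote}${proxify(value, pageUrl, proxyOrigin)}${quote})`);
}
```

        Stylesheets fetched through the proxy would need the same `url(...)` pass, using the stylesheet's own URL as the base.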

        Since the proxy serves everything from the same origin, you're able to do whatever you want with the iframe's content. And even if it were a different domain, window.postMessage has a targetOrigin argument that lets you communicate explicitly with the parent window.
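
        For the cross-origin case, a sketch of the receiving side: derive the expected origin from the iframe's URL and ignore messages from anywhere else. `IncomingMessage` is a stand-in for the browser's `MessageEvent` so the snippet runs outside a browser, and `app.yourdomain.com` in the comments is a hypothetical parent origin.

```typescript
// Minimal stand-in for the fields of a browser MessageEvent used here.
interface IncomingMessage {
  origin: string;
  data: unknown;
}

// Parent side: only accept data from the origin the iframe was loaded from.
function makeMessageFilter(iframeSrc: string): (event: IncomingMessage) => unknown {
  const expectedOrigin = new URL(iframeSrc).origin;
  return (event) => (event.origin === expectedOrigin ? event.data : undefined);
}

// Browser wiring (not executed here):
//   child:  window.parent.postMessage(payload, "https://app.yourdomain.com"); // targetOrigin pins the receiver
//   parent: const accept = makeMessageFilter(iframe.src);
//           window.addEventListener("message", (e) => { const data = accept(e); if (data !== undefined) handle(data); });
```

        The two checks complement each other: `targetOrigin` stops the child from leaking data to an unexpected parent, and the origin check stops the parent from trusting messages from other frames.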

        • yavorsky 6 days ago

          Wow, that's a great idea!

          Thanks for sharing! It looks like it could improve performance a lot.

    • yavorsky 8 days ago

      Btw, https://fontofweb.com/ looks amazing. Do you have it available on GitHub?

      • sim04ful 7 days ago

        Not yet, maybe sometime in the future.