Just tried it, and it works great. I built something similar, but for detecting fonts (https://fontofweb.com).
Instead of spending money on a headless browser, you might want to look into crawling via an iframe and sending the data back with postMessage.
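A minimal sketch of the receiving (parent) side of that idea. It assumes a script injected into the crawled page posts back a hypothetical `{ type: "crawl-report", url, globals }` object. The parent accepts a message only if `event.origin` matches the iframe's origin and the payload has the expected shape. `CrawlReport`, `createReportListener`, and the `example.com` origins are illustrative names, not part of any existing tool.

```typescript
// Hypothetical payload the injected crawler script posts back to the parent page.
interface CrawlReport {
  type: "crawl-report";
  url: string;
  globals: string[];
}

// Structural subset of MessageEvent, so the handler can also run outside a browser.
interface MessageLike {
  origin: string;
  data: unknown;
}

// Runtime check: postMessage data is untyped, so validate its shape before using it.
function isCrawlReport(data: unknown): data is CrawlReport {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return (
    d.type === "crawl-report" &&
    typeof d.url === "string" &&
    Array.isArray(d.globals) &&
    d.globals.every((g) => typeof g === "string")
  );
}

// Returns a "message" event handler that forwards only well-formed reports
// coming from the expected iframe origin.
function createReportListener(
  expectedOrigin: string,
  onReport: (report: CrawlReport) => void,
): (event: MessageLike) => void {
  return (event) => {
    if (event.origin !== expectedOrigin) return; // ignore other frames and sites
    if (!isCrawlReport(event.data)) return; // ignore unrelated or malformed messages
    onReport(event.data);
  };
}

// Browser wiring (not executed here):
//   parent: window.addEventListener("message", createReportListener("https://proxy.example.com", console.log));
//   iframe: window.parent.postMessage({ type: "crawl-report", url: location.href, globals: [] }, "https://app.example.com");
```

On the sending side, the second argument to `postMessage` (the target origin) limits which parent origin may receive the data. Passing `"*"` would hand the report to whatever page happens to embed the iframe.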
Wow, that's a great idea!
My idea was to create a browser-less environment, so I can use it via https://www.npmjs.com/package/@unbuilt/cli or just through the API, without using the user's resources.
I'm not really spending money on it, since I built the infra on DigitalOcean (DO) droplets, so the worst-case scenario is that people wait a bit longer when there's no cached result and the queue is too long.
My question: how can I access the iframe runtime of a third-party website, including its window object, or intercept its requests for better analysis? For example, I check headers, global variables, source maps, etc. Having full control gives me more flexibility and lets me make the analysis as confident as possible.
Welcome!
So what I did was set up a proxy server on Cloudflare Workers:
proxy.yourdomain.com?url=websitetocrawl
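A rough sketch of what such a Worker handler could look like, assuming the target page arrives in the `url` query parameter as above. Stripping `X-Frame-Options` and `Content-Security-Policy` is an assumption about what many sites need before they will render inside an iframe. `handleProxyRequest` and its injectable `fetchImpl` parameter are hypothetical names; the injection only exists so the logic can be exercised without network access.

```typescript
// Signature of the fetch function the handler uses; injectable so tests can avoid the network.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Hypothetical handler for requests like https://proxy.example.com/?url=https://site.test/
async function handleProxyRequest(
  request: Request,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const target = new URL(request.url).searchParams.get("url");
  if (!target) {
    return new Response("missing ?url= parameter", { status: 400 });
  }

  let targetUrl: URL;
  try {
    targetUrl = new URL(target);
  } catch {
    return new Response("invalid url", { status: 400 });
  }
  // Only proxy ordinary web pages; refuse file:, data:, and other schemes.
  if (targetUrl.protocol !== "http:" && targetUrl.protocol !== "https:") {
    return new Response("unsupported protocol", { status: 400 });
  }

  const upstream = await fetchImpl(targetUrl.toString());

  // Headers on a fetched response are immutable, so copy it into a new Response first.
  const response = new Response(upstream.body, upstream);
  // Assumption: these headers would otherwise stop the page from rendering in your iframe.
  response.headers.delete("X-Frame-Options");
  response.headers.delete("Content-Security-Policy");
  return response;
}

// In a Cloudflare Worker (module syntax) this could be wired up as:
//   export default { fetch: (request: Request) => handleProxyRequest(request) };
```

The `new Response(upstream.body, upstream)` copy keeps the upstream status, headers, and body while giving you a response whose headers you are allowed to modify.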
Then, in the Worker, I used a regex to rewrite all of the HTML's external resources (CSS URLs, scripts) so they also go through the proxy.
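A simplified sketch of that regex rewrite, assuming the proxy is reachable at a hypothetical `https://proxy.example.com` and takes the target in `?url=`. It resolves relative URLs against the crawled page, leaves `data:`, `blob:`, `javascript:` and `#fragment` references alone, and only covers `src`/`href` attributes plus CSS `url(...)`. `proxify` and `rewriteHtml` are illustrative names.

```typescript
// Turns one resource reference into a proxied absolute URL.
function proxify(resourceUrl: string, pageUrl: string, proxyOrigin: string): string {
  // Inline data, blobs, javascript: links and fragment-only links need no proxying.
  if (/^(data:|blob:|javascript:|#)/i.test(resourceUrl)) return resourceUrl;
  let absolute: string;
  try {
    // Resolve relative paths against the page being crawled.
    absolute = new URL(resourceUrl, pageUrl).toString();
  } catch {
    return resourceUrl; // leave anything unparseable untouched
  }
  return `${proxyOrigin}/?url=${encodeURIComponent(absolute)}`;
}

// Rewrites src/href attributes and CSS url(...) references in an HTML string.
function rewriteHtml(html: string, pageUrl: string, proxyOrigin: string): string {
  return html
    // src="..." or href='...' (matching quote captured in group 2)
    .replace(/\b(src|href)=(["'])(.*?)\2/gi, (_m, attr: string, quote: string, url: string) =>
      `${attr}=${quote}${proxify(url, pageUrl, proxyOrigin)}${quote}`)
    // url(...), url("...") or url('...') in inline styles and <style> blocks
    .replace(/url\(\s*(["']?)(.*?)\1\s*\)/gi, (_m, quote: string, url: string) =>
      `url(${quote}${proxify(url, pageUrl, proxyOrigin)}${quote})`);
}
```

Regexes like these miss `srcset`, CSS `@import "..."` written without `url()`, and URLs that scripts build at runtime. Cloudflare Workers also provide an `HTMLRewriter` API that parses HTML as a stream, which may be a sturdier way to rewrite attributes.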
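Since the proxied page is served from your own origin, the same-origin policy no longer gets in the way and you can do whatever you want with it (the parent page has to be on that exact origin too: same scheme, host, and port). And even if it were on a different domain, window.postMessage takes a targetOrigin argument that lets you communicate explicitly with the parent window.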
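With a same-origin iframe, the parent can read `iframe.contentWindow` directly, which covers the "check global vars" part of the question. One possible approach, sketched under the assumption that a blank same-origin iframe serves as a baseline, is to diff the crawled window's own keys against the blank one's. `findPageGlobals` is a hypothetical helper. Only `var` declarations and explicit `window.x = ...` assignments show up this way, because top-level `let`/`const`/`class` bindings do not become window properties.

```typescript
// Lists global names the crawled page defined on top of a clean window.
// frameWindow: e.g. iframe.contentWindow (only readable when same-origin).
// baselineKeys: e.g. Object.keys(blankIframe.contentWindow).
function findPageGlobals(frameWindow: object, baselineKeys: Iterable<string>): string[] {
  const baseline = new Set(baselineKeys);
  return Object.keys(frameWindow)
    .filter((key) => !baseline.has(key)) // keep only names the page itself added
    .sort();
}

// Browser usage (not executed here). Cross-origin, reading page globals such as
// contentWindow.React throws a SecurityError instead:
//   const iframe = document.querySelector("iframe")!;
//   const blank = document.createElement("iframe");
//   document.body.append(blank);
//   const globals = findPageGlobals(iframe.contentWindow!, Object.keys(blank.contentWindow!));
```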
Wow, that's a great idea!
Thanks for sharing! It looks like it could improve performance a lot.
Btw, https://fontofweb.com/ looks amazing. Do you have it available on GitHub?
Not yet, maybe sometime in the future.