This is a little project exploring the feasibility of using a service such as Globalping for geo location needs.
I had fun making it but please note that the current implementation is just a demo and far from a proper production tool.
If you really want to use it, then for the best possible results you need at least 500 probes per phase.
It could be optimized fairly easily, but not without going over the anonymous user limit, which I tried to avoid.
I wonder if you could optimize for reducing the total probe count (at the expense of possibly longer total time, though it may be faster in some cases) by using some sort of "gradient descent".
Start by doing the multi-continent probe, say 3x each. Drop the longest time probes, add probes near the shortest time, and probe once. Repeat this pattern of probe, assess, drop and add closer to the target.
You accumulate all data in your orchestrator, so in theory you don't need to deliberately issue multiple probes each round (except for the first) to get statistical power. I would expect this to "chase" the real location continuously instead of 5 discrete phases.
I just watched the Veritasium video on potentials and vector fields - the latency is a scalar potential field of sorts, and you could use it to derive a latency gradient.
Yes, most likely there are multiple algorithms that could be used to get better results with fewer probes, but I'm not smart enough to do the math and implement them.
The simplest is to drop the longest-latency probe and add a new one in the proximity of the fastest.
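A minimal sketch of that drop-the-slowest / add-near-the-fastest loop. It assumes a fixed pool of probe locations, because you can only pick probes that actually exist, and a caller-supplied `measure` function that stands in for a real measurement API. `refine`, `measure` and the RTT model are hypothetical, not part of Globalping or the author's tool:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def refine(pool, start, measure, rounds):
    """Greedy loop: drop the slowest probe, add the unused probe closest to
    the current fastest one, measure it once, repeat.

    pool:    all available probe locations as (lat, lon) tuples
    start:   initial probes, e.g. a few per continent
    measure: callable returning an RTT in ms for a probe location
    """
    results = {p: measure(p) for p in start}       # probe location -> RTT (ms)
    unused = [p for p in pool if p not in results]
    for _ in range(rounds):
        if not unused or len(results) < 2:
            break
        slowest = max(results, key=results.get)
        del results[slowest]                        # a fastest probe always survives
        fastest = min(results, key=results.get)
        nxt = min(unused, key=lambda p: haversine_km(p, fastest))
        unused.remove(nxt)
        results[nxt] = measure(nxt)                 # one new measurement per round
    best = min(results, key=results.get)
    return best, results[best]
```

For example, with a simulated `measure` of 1 ms plus 1 ms per 100 km, a 10° grid as the pool, and three starting probes on different continents, the loop walks toward the target one measurement per round. The best RTT it has seen never gets worse. On real routes the latency field is far less smooth than this simulation, so it can stall on a probe that happens to have unusually good routing.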
isn't 3 theoretically enough?
Time of flight from three points gets you two options for position with GPS, but GPS signals propagate directly through free space (at least mostly; reflections happen).
Internet signals generally travel by cable, and the selected route may or may not be the shortest distance.
It's quite possible for traffic between neighboring countries to transit through another continent, sometimes two. And asymmetric routing is also common.
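Because cable routes only ever make the path longer, an RTT doesn't give a circle of exact distance. It gives a disk of maximum distance, and the target must lie inside every disk. This is similar in spirit to constraint-based geolocation. A minimal sketch, assuming light in fiber covers roughly 200 km per ms; `feasible` and `max_distance_km` are hypothetical helpers:

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in fiber covers roughly 200 km per ms (about 2/3 of c).
FIBER_KM_PER_MS = 200.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def max_distance_km(rtt_ms):
    """Upper bound on distance: the signal goes out and back, so halve the RTT."""
    return rtt_ms / 2 * FIBER_KM_PER_MS

def feasible(candidate, measurements):
    """True if candidate (lat, lon) lies inside every probe's disk.

    measurements: list of ((lat, lon), rtt_ms) pairs, one per probe.
    """
    return all(haversine_km(candidate, probe) <= max_distance_km(rtt)
               for probe, rtt in measurements)
```

Path inflation makes each disk too big, never too small, so the intersection stays correct but can be large. A single probe with an unusually direct route is what shrinks it.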
Since this is using traceroute anyway, if you characterize the source nodes, you could probably use a lot fewer nodes and get similar results with something like:
a) probe from a few nodes on different continents (aiming to catch anycast nodes)
b) assuming the end of the trace is similar from all probes, choose probe nodes that are on similar networks, and some other nodes that are geolocated nearby those nodes.
c) declare the target is closest to the node with the lowest measured latency (after subtracting each node's characterized first-hop latency)
You'll usually get the lowest ping times if you can ping from a nearby customer of the same ISP as the target. Narrowing to that faster is possible if you know about your nodes.
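Steps b) and c) could look roughly like this. The `ProbeResult` fields, and the idea of characterizing each probe's first-hop latency ahead of time, are assumptions for illustration. Step a), the initial multi-continent sweep, is left out:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProbeResult:
    probe_id: str        # hypothetical identifier
    asn: int             # network the probe sits on
    rtt_ms: float        # measured RTT from the probe to the target
    first_hop_ms: float  # characterized RTT from the probe to its own first hop

def closest_probe(results: List[ProbeResult],
                  target_asn: Optional[int] = None) -> ProbeResult:
    """Prefer probes on the target's network (step b), then pick the one with
    the lowest RTT after subtracting its own access-link latency (step c)."""
    same_net = [r for r in results if target_asn is not None and r.asn == target_asn]
    candidates = same_net or results   # fall back to all probes if none match
    return min(candidates, key=lambda r: r.rtt_ms - r.first_hop_ms)
```

Subtracting the first hop matters most for probes behind DSL interleaving or similar, where the access link alone can cost more than the distance to the target.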
Congrats on doing it without AI! Just reading your crappy one-word commit messages makes me happy.
Some code may be AI generated, because the code uses "══════" to separate terminal output. In my experience, Claude really likes to use this character to separate terminal output.
>Claude really likes
Plenty of developers really like it too though, because that's where Claude learned to use it.
How feasible would it be for the host under measurement to introduce additional artificial latency to ping responses, varying based on source IP, in order to spoof its measured location?
Totally feasible.
You could do even cooler tricks, like https://github.com/blechschmidt/fakeroute
Pointless? Almost certainly.
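To see what spoofing would involve, here is a sketch that computes how much delay to add per probe source so the RTTs look as if the host sat somewhere else. It uses a deliberately simplified model in which RTT is proportional to great-circle distance (a hypothetical 100 km per ms of RTT). `spoof_delays` is illustrative only:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def spoof_delays(probes, true_loc, fake_loc, km_per_ms_rtt=100.0):
    """Extra delay (ms) per probe so RTTs look as if the host sat at fake_loc.

    Simplified model: RTT_ms = great-circle km / km_per_ms_rtt (no path inflation).
    Returns (delays, impossible). 'impossible' lists probes for which the fake
    location would need a shorter RTT than the real one, which no added delay
    can produce.
    """
    delays, impossible = {}, []
    for name, loc in probes.items():
        actual = haversine_km(loc, true_loc) / km_per_ms_rtt
        wanted = haversine_km(loc, fake_loc) / km_per_ms_rtt
        if wanted < actual:
            impossible.append(name)
        delays[name] = max(wanted - actual, 0.0)  # latency can only be added
    return delays, impossible
```

The `impossible` list is the catch. A probe near the fake location still sees the real, longer RTT, so a spoofed location can only move "away" from probes convincingly.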
Not impossible, but it would be a whole lot simpler to just not respond to pings in the first place.
But also, as mentioned in https://news.ycombinator.com/item?id=46836803 , someone can still probe the second-last hop and get pretty close.
Courtesy of Xfinity and Charter overprovisioning most neighborhood’s circuits, we already have that today for a significant subset of U.S. Internet users due to the resulting Bufferbloat (up to 2500ms on a 1000/30 connection!)
You probably meant to say oversubscribing, not overprovisioning.
Oversubscription is expected to a certain degree (this is fundamentally the same concept as "statistical multiplexing"). But even oversubscription in itself is not guaranteed to result in bufferbloat -- appropriate traffic shaping (especially to "encourage" congestion control algorithms to back off sooner) can mitigate a lot of those issues. And, it can be hard to differentiate between bufferbloat at the last mile vs within the ISP's backbone.
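For latency-based geolocation, queueing delay like this is noise on top of propagation delay. One common mitigation is to take several samples per probe and keep the minimum. This is a general measurement technique, not necessarily what this tool does. A sketch with a hypothetical `split_rtt` helper:

```python
import statistics

def split_rtt(samples_ms):
    """Split repeated RTT samples into a baseline and a rough queueing estimate.

    The minimum is the sample least affected by queueing (e.g. bufferbloat),
    so it is the best available estimate of propagation plus fixed delays.
    The gap between the median and the minimum gives a rough size of the
    typical queueing delay.
    """
    baseline = min(samples_ms)
    queueing = statistics.median(samples_ms) - baseline
    return baseline, queueing
```

The minimum only helps if at least one sample caught the queue empty. On a link that is bloated for the whole measurement window, every sample is inflated.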
Have you seen excessive bufferbloat on a DOCSIS 3.1 modem?
Totally feasible, but as with all these situations, it's not happening in practice.
Hacks
>varying based on source IP,
Aha, that's what you would think, but what if I fake the source IP of the geolocation ping instead!
Bit surprised this works. Latency variability is huge and sometimes quite disconnected from geo location. I recall talking to someone in NL and realised I've got better latency to NL content from the UK than he did. Presumably better peering etc.
Could just be local loop latency: with VDSL or DOCSIS you can get 5-15 ms of latency in just the first 1 km. London (e.g. Telehouse) to Amsterdam is only about 7 ms.
Wouldn't you just be closer to the nearest PoP and requesting mostly cached content? With how well connected Amsterdam is, they couldn't have been anywhere near there. Also, depending on when it was: until about 7-8 years ago, most places in NL had no fiber, even in major city centers. Now it's mostly covered.
If I understood the post correctly, the author just takes the location with the smallest ping as the winner. This seems like a very rudimentary approach. Why not do triangulation? If you take each ping time as a measurement of distance between two points, you should be able to ping from a random selection of IPs and from there calculate the location.
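Strictly speaking, working from distances rather than angles is trilateration (multilateration). Here is a brute-force sketch that scores every 1° grid point by how well its great-circle distances match the RTT-derived distances. It assumes each RTT maps linearly to distance at a hypothetical 100 km per ms of RTT, which is about 200 km/ms one way in fiber. Real routes are rarely straight, so treat this as a best case:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def trilaterate(measurements, km_per_ms_rtt=100.0, step=1):
    """Least-squares fit over a lat/lon grid.

    measurements: list of ((lat, lon), rtt_ms) pairs, one per probe.
    Returns the grid point whose distances to the probes best match
    rtt_ms * km_per_ms_rtt.
    """
    best, best_err = None, float("inf")
    for lat in range(-90, 91, step):
        for lon in range(-180, 180, step):
            err = sum((haversine_km((lat, lon), probe) - rtt * km_per_ms_rtt) ** 2
                      for probe, rtt in measurements)
            if err < best_err:
                best, best_err = (lat, lon), err
    return best
```

With ideal, straight-line RTTs this recovers the target exactly. With inflated paths, every distance is overestimated and the fit gets pulled away from the true location, so the minimum-latency heuristic can actually be more robust.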
I talk a little about it in the article, but the main goal was to build something simple that works as proof of concept.
This brute force approach works much better than I expected as long as you have enough probes and a bit of luck.
But of course there are much better and smarter approaches to this, no doubt!
How did you know how well these results work?
You mention the quality several times in the article but it's not clear how this is verified. Do you have a set of known-location-ip-addresses around the world (apart from your home)? Or are we just assuming that latency is a good indicator?
I run about 270 servers in verified locations as part of the Globalping network https://globalping.io/users/jimaek so I had plenty of targets to test
I tested against them, as well as other infrastructure I control that is not part of the network, and compared to the ipinfo results as well
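The thread doesn't say how results were scored against those known locations. One simple way to summarize such a comparison is the share of targets within a given error radius, plus the median error. This uses a hypothetical `accuracy_report` helper, not the author's method:

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def accuracy_report(cases, thresholds_km=(100, 500, 1000)):
    """cases: list of (predicted (lat, lon), known (lat, lon)) pairs.

    Returns ({threshold_km: fraction of cases within it}, median error in km).
    """
    errors = [haversine_km(pred, known) for pred, known in cases]
    within = {t: sum(e <= t for e in errors) / len(errors) for t in thresholds_km}
    return within, statistics.median(errors)
```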
Packets don't travel in straight lines.
This is/was also my take. I’m skeptical that a probe-based network can be granular enough to reliably pinpoint a city, especially when some paths are much better connected than others (fewer hops, uncongested fiber, no throttling).
However, ipinfo still appears to rely on active probing to triangulate geolocation data, which suggests they believe these routing asymmetries can be modeled or averaged out in practice.
https://ipinfo.io/blog/ipinfos-probe-network
It depends on the city, and how the ISPs in the city work.
The telco DSL and fiber in my metro area all runs through a single location where the PPPoE (hiss) concentrator is, and the first-hop latency from DSL interleaving swamps the latency from distance. You can tell someone is in the metro area, but not the county or city.
Cable company customers are a little more locatable; you can probably get the county.
Yeah, when I used to live in New England, and had more time to be interested in transit, my interest was always piqued by how Comcast would route. No matter how far south I seemed to get, I'd always need to travel to Boston's peering point first to make it to NYC, even from New Haven. If you then simply switched ISPs, even at the same address, Verizon would send you south immediately.
So there's funky overlap wherein on one ISP you appear closer to city A, and on ISP 2 closer to city B, but it's the same physical address.
Continental classification I'd think would be good as they appear to be coalesced endpoints, separated by vast oceans.
You can extend this by looking at the IP route for the reverse path. I've found it's usually accurate to the state, at least on the last hop before the destination, with the added benefit that there's usually an airport or city code in the FQDN of that hop.
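A sketch of pulling such a hint out of a hop's FQDN. The lookup table is a tiny illustrative subset of IATA codes, and the hostnames in the examples are made up under `example.net`. Real naming conventions differ per operator, so a usable version needs a full code table and per-network rules:

```python
import re

# Tiny illustrative lookup; a real tool would need a full airport/city-code
# table plus per-operator naming rules.
IATA_HINTS = {
    "ams": "Amsterdam, NL",
    "fra": "Frankfurt, DE",
    "lhr": "London, GB",
    "jfk": "New York, US",
    "ord": "Chicago, US",
}

def location_hint(fqdn):
    """Return the first known 3-letter code (optionally followed by digits,
    e.g. 'fra1') found among the hostname's dot/dash-separated labels."""
    for token in re.split(r"[.\-]", fqdn.lower()):
        m = re.match(r"([a-z]{3})\d*$", token)
        if m and m.group(1) in IATA_HINTS:
            return IATA_HINTS[m.group(1)]
    return None
```

For example, `location_hint("xe-0-1-0.core2.fra1.example.net")` gives `"Frankfurt, DE"`. Three-letter tokens are ambiguous, so matches should be cross-checked against the latency data rather than trusted outright.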
Amazing idea and execution, the sort of stuff I wish there was more of on HN.
It'd be clever to integrate this into the TCP stack so it tells you immediately what the upper bound is on the distance to the counterparty, based on the time between data sent and the corresponding acknowledgements. I can see some immediate applications for that.
You can get tcp measured round trip time from tcp_info with
tcp_info varies by OS and version, but I think tcpi_rtt is well supported.
> Globalping is an open-source, community-powered project that allows users to self-host container-based probes. These probes then become part of our public network, which allows anyone to use them to run network testing tools such as ping and traceroute.
How's this different from RIPE Atlas?
Atlas is great but it is focused more on academic research and professional use.
Globalping offers real-time result streaming and a simpler user experience with focus on integrations https://globalping.io/integrations
For example you can use the CLI as if you were running a traceroute locally, without even having to register.
And if you need more credits you can simply donate via GitHub Sponsors starting from $1
They are similar with an overlapping audience yet have different goals
> Group and sort the results; the country with the lowest latency should be the correct one
Sometimes residential ISPs (that host the probes) may have bad routing due to many factors. How does the algorithm take that into account?
You have a lot of probes so you also have one with good routing
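The grouping step quoted above can be sketched as taking each country's minimum RTT. That is why one well-routed probe per country is enough: a badly routed probe is simply ignored unless it's the only one in its country. `rank_countries` and its input shape are assumptions:

```python
from collections import defaultdict

def rank_countries(results):
    """results: iterable of (country_code, rtt_ms) pairs, one per probe.

    Keeps the lowest RTT seen per country and returns countries sorted
    from fastest to slowest; the first entry is the best guess.
    """
    best = defaultdict(lambda: float("inf"))
    for country, rtt in results:
        best[country] = min(best[country], rtt)
    return sorted(best.items(), key=lambda kv: kv[1])
```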
If you like this topic, read "The Cuckoo's Egg" (1989) by Clifford Stoll
Wi-Fi RTT is more accurate than trilateration with RSSI but requires hardware support.
IEEE 802.11mc > Wi-Fi Round Trip Time (RTT) https://en.wikipedia.org/wiki/IEEE_802.11mc#Wi-Fi_Round_Trip...
/? fine time measurement FTM: https://www.google.com/search?q=fine+time+measurement+FTM
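For reference, here is the 802.11mc Fine Timing Measurement exchange as I understand it. The responder timestamps its FTM frame's departure ($t_1$) and the ACK's arrival ($t_4$). The initiator timestamps the FTM frame's arrival ($t_2$) and the ACK's departure ($t_3$). Subtracting the initiator's turnaround time removes its processing delay:

```latex
\mathrm{RTT} = (t_4 - t_1) - (t_3 - t_2), \qquad d = \frac{c \cdot \mathrm{RTT}}{2}
```

At $c \approx 3 \times 10^8$ m/s, each nanosecond of timing error shifts $d$ by about 15 cm. This is why the timestamps have to come from the radio hardware rather than software.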
Wow, it works!
Tried with an IP allocated to a major wireless network operator. It was far off but also ran out of credits when trying with higher limits on subsequent attempts.
It seems the tool relies on ICMP results from various probes. So wouldn't this project become useless if the target device disables ICMP?
I wonder if you can "fake" results by having your gateway/device respond with fake ICMP replies.
I talk about it a bit in the article. The easiest solution is to use the last available hop. In most cases it's close enough to properly detect the country even if the target blocks ICMP.
Email me if you would like to get some additional credits to test it out, dakulovgr gmail.
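Picking the last available hop from a traceroute can be sketched like this. It assumes each hop is an `(ip, rtt_ms)` pair, with `None` for unanswered `*` hops. The helper name is hypothetical, and the IPs in the example are documentation addresses:

```python
def last_responding_hop(hops):
    """hops: list of (ip, rtt_ms) in traceroute order, with (None, None) for '*'.

    Returns the last hop that answered, as (ip, rtt_ms), or None if no hop
    answered at all. Its RTT can then stand in for the target's RTT when
    ranking probes.
    """
    for ip, rtt in reversed(hops):
        if ip is not None and rtt is not None:
            return ip, rtt
    return None
```

For example, `[("192.0.2.1", 1.2), ("198.51.100.7", 9.8), (None, None)]` yields `("198.51.100.7", 9.8)`. The last hop is usually the ISP's edge rather than the host itself, which is why it tends to be good enough for the country but not the city.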