[HN Gopher] How to build a IP geolocation database from scratch?
       ___________________________________________________________________
        
       How to build a IP geolocation database from scratch?
        
       Author : incolumitas
       Score  : 343 points
       Date   : 2023-09-14 11:00 UTC (11 hours ago)
        
 (HTM) web link (ipapi.is)
 (TXT) w3m dump (ipapi.is)
        
       | fasteo wrote:
       | >>> Consider Open Source Geolocation Projects
       | 
       | Not the definition of "from scratch" in my book
        
       | dboreham wrote:
       | Interesting but this isn't actually how geolocation is done,
       | right? The ARIN/RIPE data isn't sufficiently accurate to be
       | useful beyond country. Commercial geolocation involves
       | correlating client IP vs known physical location e.g. from WiFi
       | AP or mailing a package to the user. At least that's what I have
       | been told over the decades.
        
         | shortrounddev2 wrote:
         | I work in adtech and this is how we do geolocation. There's
         | also device geolocation but if the user doesn't consent to
          | sharing their GPS data with us, we just use the IP address for
          | targeting. A common provider for this is MaxMind; they ship a
          | database that you host locally and query.
        
           | tiffanyh wrote:
           | Does Cloudflare have the same data as Maxmind?
           | 
           | Because Cloudflare and Maxmind geolocate me to the exact same
           | longitude/latitude.
        
             | klaussilveira wrote:
             | CloudFlare uses Maxmind: https://developers.cloudflare.com/
             | support/network/configurin...
        
           | dawnerd wrote:
            | Even the free MaxMind DB is accurate enough for most
           | applications.
        
           | klaussilveira wrote:
           | Since you are in adtech: do you buy MaxMind, or roll your
           | own? Are there any providers for US-only data, and therefore,
           | cheaper?
        
             | shortrounddev2 wrote:
              | We licensed MaxMind's DB recently (it's like $300 a year or
              | something). I don't know if there are US-only databases. Our
              | customers are all in the US, and we use GeoIP to filter out
              | European users for compliance (GDPR and otherwise).
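A minimal sketch of the EU filtering step, assuming the database hands back ISO 3166-1 alpha-2 country codes. The country list is an assumption to verify against your own compliance requirements: it covers the 27 EU members plus the three EEA states outside the EU, leaves out the UK and Switzerland, and treats a missing answer as EU.

```javascript
// Hypothetical helper: decide whether a visitor should get EU/EEA handling,
// given the ISO 3166-1 alpha-2 code an IP-to-country database returned.
// Note: ISO uses "GR" for Greece even though EU documents often use "EL".
const EU_MEMBERS = new Set([
  "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
  "FR", "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU",
  "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
]);
// EEA members that are not in the EU.
const EEA_NON_EU = new Set(["IS", "LI", "NO"]);

function needsEuHandling(countryCode) {
  // No answer from the database: err on the side of caution.
  if (!countryCode) return true;
  const cc = countryCode.trim().toUpperCase();
  return EU_MEMBERS.has(cc) || EEA_NON_EU.has(cc);
}
```

For example, `needsEuHandling("de")` is true and `needsEuHandling("US")` is false. Treating unknowns as EU errs toward over-filtering, which seems the safer default for a compliance check.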
        
       | ChopSticksPlz wrote:
       | This is a very useful .csv, what is the license? Is it free for
       | personal and commercial use?
        
       | jl6 wrote:
       | Is anybody maintaining a historical archive of "IP address
       | metadata" (which would include geolocation)?
       | 
       | If I have logs from 10 years ago, can I look up information about
       | that IP as it was at the time?
        
       | sneak wrote:
       | I feel like a more useful and accurate way would be to buy client
        | IP and GPS location data in bulk from one of the mobile data
       | brokers who have their spyware embedded in zillions of popular
       | apps/games and then group it by /24 or something.
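The grouping step could be sketched as follows, assuming you already hold (IPv4 address, latitude, longitude) reports. The /24 granularity comes from the comment above; taking the per-coordinate median is an assumption, chosen because a few wildly wrong reports (VPN users, stale GPS fixes) barely move it. IPv6 and antimeridian wrap-around are ignored.

```javascript
// Map an IPv4 address to its /24 prefix, e.g. "203.0.113.5" -> "203.0.113.0/24".
function slash24(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  return `${parts[0]}.${parts[1]}.${parts[2]}.0/24`;
}

// Median of a non-empty numeric array (average of the two middle values
// when the length is even).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// reports: [{ ip, lat, lon }] -> Map of "/24" -> { lat, lon, samples }
function aggregateBySlash24(reports) {
  const groups = new Map();
  for (const { ip, lat, lon } of reports) {
    const key = slash24(ip);
    if (!groups.has(key)) groups.set(key, { lats: [], lons: [] });
    groups.get(key).lats.push(lat);
    groups.get(key).lons.push(lon);
  }
  const result = new Map();
  for (const [key, { lats, lons }] of groups) {
    result.set(key, { lat: median(lats), lon: median(lons), samples: lats.length });
  }
  return result;
}
```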
        
       | johnklos wrote:
       | I think it's interesting that the one IP range I decided to check
       | has correct information on the ipapi.is web site, but
       | unambiguously incorrect information in the downloadable
       | geolocationDatabaseIPv4.csv. Somehow Bedford, New Hampshire
       | (which came straight from WHOIS) became Bedford, Texas.
       | 
       | How'd that happen?
        
       | alberth wrote:
       | What are common use cases for needing IP geolocation?
        
       | kiririn wrote:
        | A modern version of the ping-based GeoIP approach mentioned in
        | the article:
       | 
       | https://github.com/Ne00n/yammdb
        
         | JoshGlazebrook wrote:
          | This just links to an mmdb file that is already compiled; there
         | isn't anything relevant to show this is a "modern"
         | implementation of anything if the implementation isn't
         | available.
        
       | mootothemax wrote:
       | Any suggestions for geolocating datacenter IPs, even very
       | roughly? I'm analysing traceroute data, and while I have known
       | start and end locations, it's the bit in the middle I'm
       | interested in.
       | 
       | I can infer certain details from airport codes in node hostnames,
       | for example.
       | 
       | It would also be possible - I guess - to infer locations based on
       | average RTT times, presuming a given node's not having a bad day.
       | 
       | Anyone have any other ideas?
       | 
       | Edit: A couple of troublesome example IPs are 193.142.125.129,
       | 129.250.6.113, and 129.250.3.250. They come up in a UK traceroute
       | - and I believe they're in London - but geolocate all over the
       | world.
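A sketch of pulling airport-style codes out of router hostnames. The code table, the label pattern, and the example hostnames are all hypothetical: each operator has its own naming scheme, so in practice you would maintain per-network rules and treat matches as hints, not answers.

```javascript
// Hypothetical table of three-letter location codes you have confirmed.
const KNOWN_CODES = new Map([
  ["lhr", "London, GB"],
  ["ams", "Amsterdam, NL"],
  ["fra", "Frankfurt, DE"],
  ["sea", "Seattle, US"],
]);

// Split a hostname into labels and return every known code found in a
// label of the form "abc" or "abc01" (three letters, optional digits).
function locationHints(hostname) {
  const labels = hostname.toLowerCase().split(/[.\-]/);
  const hints = [];
  for (const label of labels) {
    const m = /^([a-z]{3})\d*$/.exec(label);
    if (m && KNOWN_CODES.has(m[1])) hints.push(KNOWN_CODES.get(m[1]));
  }
  return hints;
}
```

For instance, the hypothetical hostname `xe-0-0-1.lhr01.example.net` yields `["London, GB"]`.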
        
         | toast0 wrote:
         | Those IPs are owned by Google and NTT, who both run large
         | international networks and can redeploy their IPs around the
          | world when they feel like it. So lookup-based geolocation is
         | going to be iffy, as you've seen.
         | 
         | Traceroute to those IPs certainly looks like the networking
         | goes to London.
         | 
         | The google IP doesn't respond to ping, but the NTT/Verio ones
          | do. I'd bet if you ping from London-based hosting, you'll get
          | single-digit ms ping responses, which sets an upper bound on
          | the distance from London. Ping from other hosting in the
          | country and across the Channel, and you can confirm the lowest
          | ping you can get is from London hosting, and there you go. It
          | could also be that its connectivity is through London but the
          | host itself is elsewhere; you can't really tell.
         | 
         | Check from other vantage points, just to make sure it's not
         | anycast; if you ping 8.8.8.8 from most networks around the
         | world, you'll get something nearby; but these IPs give
         | traceroutes to london from the Seattle area, so probably not
         | anycast (at least at the moment, things can change).
         | 
         | If you don't have hosting around the world, search for public
          | looking glasses on well-connected networks that you can use for
         | pings like this from time to time.
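The upper bound from a ping is simple arithmetic. A minimal sketch, assuming signals in fiber travel at roughly two thirds of the speed of light (about 200 km per millisecond) and that half the RTT is spent in each direction:

```javascript
// Approximate propagation speed of light in optical fiber (~2/3 of c).
const FIBER_KM_PER_MS = 200;

// Largest great-circle distance (km) consistent with a round-trip time.
// Routing detours, queuing and processing only add delay, so this is a
// loose upper bound, never an estimate of the actual distance.
function maxDistanceKm(rttMs) {
  if (!(rttMs >= 0)) throw new RangeError("RTT must be a non-negative number");
  const oneWayMs = rttMs / 2;
  return oneWayMs * FIBER_KM_PER_MS;
}
```

With several ping samples, use the smallest RTT, since it gives the tightest bound; for example, a best RTT of 9 ms puts the target within about 900 km of the vantage point.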
        
         | dontdoxxme wrote:
         | https://ensa.fi/papers/geolocation_imc17.pdf has some ideas.
         | 
         | Using RIPE atlas probes to get RTT to the IPs from known
         | locations is close to your idea and probably the best anyway.
        
         | tyingq wrote:
         | This looked promising:
         | 
         |  _" TULIP's purpose is to geolocate a specified target host
         | (identified by IP name or address) using ping RTT delay
         | measurements to the target from reference landmark hosts whose
         | positions are well known (see map or table)."_
         | 
         | https://tulip.slac.stanford.edu/
         | 
         | But the endpoint it posts to seems dead.
        
         | vinay_ys wrote:
         | > A couple of troublesome example IPs are 193.142.125.129,
         | 129.250.6.113, and 129.250.3.250. They come up in a UK
         | traceroute - and I believe they're in London - but geolocate
         | all over the world.
         | 
         | If I'm running a popular app/web service, I would have my own
         | AS number and I will have purchased a few blocks of IP
          | addresses under this AS and then I would advertise these
         | addresses from multiple owned/rented datacenters around the
         | world.
         | 
         | These BGP advertisements would be to my different upstream
         | Internet service providers (ISPs) in different locations.
         | 
         | For a given advertisement from a particular location, if you
         | see a regional ISP as upstream, you can make an educated guess
         | that this particular datacenter is in that region. If these are
         | Tier 1 ISPs who provide direct connectivity around the world,
         | then even that guess is not possible.
         | 
         | You can see the BGP relationships in a looking glass tool like
         | bgp.tools -
         | https://bgp.tools/prefix/193.142.125.0/24#connectivity
         | 
          | If you have the ability to run traceroutes from multiple probes
         | sprinkled across the globe with known locations, then you could
         | triangulate by looking at the fixed IPs of the intermediate
         | router interfaces.
         | 
          | Even this is defeated if I were to use a CDN like Cloudflare
         | to advertise my IP blocks to their 200+ PoPs and ride their
         | private networks across the globe to my datacenters.
        
       | bullen wrote:
       | Here is a solution for those that care about speed:
       | 
       | https://www.miyuru.lk/geoiplegacy
        
       | hddqsb wrote:
       | Somewhat relevant: Google Maps can learn the location of your IP
       | based on which locations you browse in the map. If you browse a
       | specific location enough times, it will use that as the default
       | location when you open Google Maps, even if you clear all
       | cookies. (I discovered this just from using Google Maps, and I'm
       | a little concerned by the privacy implications, considering that
       | multiple people may share an IP address.)
        
         | gniv wrote:
         | I suspect it's the other way around. Google just has a very
         | good IP geolocation db, so it uses that when you browse, absent
         | any other info.
        
           | hddqsb wrote:
           | Google certainly uses its geolocation DB, but it _also_
           | learns based on map browsing patterns.
           | 
            | To clarify, the scenario I described is as follows:
            |
            | 1. Initially, when I open Google Maps in a clean browser, it
            | defaults to my real location.
            | 2. I repeatedly browse some other location.
            | 3. When I open Google Maps in a clean browser, it defaults to
            | that other location.
            |
            | The _only_ reason for Google Maps to pick that other location
            | is my map browsing.
        
             | gniv wrote:
             | Thanks for clarifying. That is indeed surprising and you
             | are probably right.
        
           | netsharc wrote:
           | Well it has reporting beacons all over the world with GPS
           | receivers, in the form of Android phones, and perhaps Google
            | Maps users on iPhone too.
        
         | is_true wrote:
          | That would explain why it sometimes thinks I'm on a river I
          | paddle often, and other times at my summer house.
        
       | overcast wrote:
       | Step 1: Download Geolocation Database
        
         | Aachen wrote:
         | Scroll down, the article is confusingly below that
        
         | nonethewiser wrote:
         | Step 1: Download Geolocation Data
         | 
         | Unless you think CSV is a database?
        
           | debesyla wrote:
           | Maybe a dumb question (I have no knowledge), but why wouldn't
           | we think of .CSV files as databases? It can have columns and
           | rows filled with information and isn't that what makes a
           | thing a database?
        
             | nobleach wrote:
             | Best I can guess here, the reply is considering relational
             | databases as "real databases" and flat files.... not real.
        
           | nobleach wrote:
           | Are we really going to do the mincing of words here? Did you
           | need the word "dump" or "export" before you understood?
           | Although I wasn't wild about the original poster's "step 1"
           | terseness, it's silly to think a normal person wouldn't be
           | able to parse the sentence well enough to understand
           | "download the database contents - perhaps stored in CSV
           | format".
        
           | tmpX7dMeXU wrote:
           | If in your mind database implies a type of technology and not
           | something conceptual, you're really just outing yourself as
           | someone that needs someone between you and the boardroom.
           | Certainly not something to show off on Hacker News.
        
       | n2dasun wrote:
       | Step 1. Download Visual Basic
        
       | nanmu42 wrote:
       | Thanks for sharing.
       | 
        | I have heard there is a lot of effort going into using BGP data
        | to build GeoIP databases.
        
       | bjornsing wrote:
       | I expected traceroute to play a bigger part in this. If you know
       | the route to an IP address and the location of routers, perhaps
       | even from a few different servers, then you should be able to
       | locate it fairly well.
        
       | TZubiri wrote:
       | "how to scrape an ip geolocation database"
       | 
        | You know you can just run a WHOIS query per IP you want to
        | analyze; there's no point in scraping the whole IPvN space.
        
         | incolumitas wrote:
         | I have to scrape the whole IP address space since I offer
         | location information as part of my API.
         | 
          | Also, I only need to scrape as many WHOIS records as there are
          | distinct networks out there. For the IPv4 address space, for
          | example, there are far fewer networks than there are IPv4
          | addresses (2^32).
          |
          | Also, most RIRs provide their WHOIS databases for download.
          |
          | Therefore, "scraping" is not really the correct word; it's a
          | hybrid approach, but mostly based on publicly available data
          | from the five RIRs.
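Storing one row per network instead of one per address is what keeps this tractable. A minimal sketch of the lookup side, assuming a table of non-overlapping IPv4 ranges in a hypothetical `{ start, end, location }` row format: sort once, then binary-search for the last range starting at or below the address.

```javascript
// Convert dotted-quad IPv4 to an unsigned integer. Multiplication is used
// instead of bit shifts so values above 2^31 stay positive.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// rows: [{ start: "a.b.c.d", end: "a.b.c.d", location }] -> sorted index
function buildIndex(rows) {
  return rows
    .map((r) => ({ start: ipv4ToInt(r.start), end: ipv4ToInt(r.end), location: r.location }))
    .sort((a, b) => a.start - b.start);
}

// O(log n) lookup; returns null when the address falls in no range.
function lookup(index, ip) {
  const x = ipv4ToInt(ip);
  let lo = 0;
  let hi = index.length - 1;
  let candidate = -1;
  // Find the last range whose start is <= x.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].start <= x) {
      candidate = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (candidate === -1 || index[candidate].end < x) return null;
  return index[candidate].location;
}
```

WHOIS data often nests ranges (an allocation containing smaller assignments); this sketch assumes they have already been flattened so the most specific range wins.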
        
           | notlukesky wrote:
           | What was the easiest and the most frustrating part?
        
         | djbusby wrote:
         | The whois data for IP is not accurate.
        
         | gsich wrote:
         | whois has no sane format.
        
       | louison11 wrote:
       | If you don't want to do this yourself, you can actually just get
       | Cloudflare to do it for you for free using a simple Worker since
       | all Cloudflare requests contain approximate IP location
       | information.
       | 
       | You can also just send a request to my URL (Cloudflare Worker
       | operated - so it should have global low latency):
       | https://www.edenmaps.net/iplocation
       | 
       | Use it for small applications, I don't mind. Just don't start
       | sending me 10M requests per day ;-)
        
         | tiffanyh wrote:
         | This is excellent!
         | 
         | Would you mind open sourcing the code for that?
        
           | louison11 wrote:
            | This is the code running this endpoint:
            |
            |     export function onRequest(context) {
            |       // request.cf exposes Cloudflare's IP geolocation as
            |       // strings; reply with [longitude, latitude].
            |       return new Response(
            |         JSON.stringify([
            |           parseFloat(context.request.cf.longitude),
            |           parseFloat(context.request.cf.latitude),
            |         ]),
            |         { headers: { "Content-Type": "application/json;charset=UTF-8" } }
            |       );
            |     }
           | 
           | This is a function on Cloudflare Pages (which is just a
           | different name for Cloudflare Workers). Minor adjustment
           | needed for Workers (get rid of "context", I believe)
        
         | emadda wrote:
         | Does anyone know how accurate Cloudflare geolocation is (for
         | workers requests)?
        
           | reincoder wrote:
            | I work for IPinfo and we do ping-based geolocation. The best
            | thing you can do to verify geolocation accuracy is the
            | following:
            |
            | - Download a few free IP databases.
            | - Generate a random list of IP addresses.
            | - Do the IP address lookups across all those databases.
            | - Identify the IP addresses that can be pinged.
            | - Visit a site that can ping an IP address from multiple
            |   servers.
            | - Sort the results by lowest average ping time.
            |
            | Then check where the geolocation provider places the IP
            | address and which server is nearest to that location.
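The manual check described above could be partly automated like this. It is a sketch under stated assumptions: the probe and claim shapes are hypothetical, distances use the haversine formula, and a claim counts as impossible when it is farther from a probe than roughly 200 km per millisecond of one-way delay allows.

```javascript
// Great-circle distance in km via the haversine formula (mean Earth radius).
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371;
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Rough speed of light in fiber (~2/3 of c), in km per millisecond.
const FIBER_KM_PER_MS = 200;

// probes:  [{ name, lat, lon, avgRttMs }] from a multi-location ping tool
// claimed: { lat, lon } reported by the geolocation database under test
function checkClaim(probes, claimed) {
  if (probes.length === 0) throw new Error("need at least one probe");
  const byRtt = [...probes].sort((a, b) => a.avgRttMs - b.avgRttMs);
  const nearest = byRtt[0];
  // A claim is physically impossible if it lies farther from a probe
  // than that probe's RTT allows a signal to travel and come back.
  const violations = probes
    .filter((p) => haversineKm(p.lat, p.lon, claimed.lat, claimed.lon) > (p.avgRttMs / 2) * FIBER_KM_PER_MS)
    .map((p) => p.name);
  return {
    nearestProbe: nearest.name,
    kmFromNearestProbe: haversineKm(nearest.lat, nearest.lon, claimed.lat, claimed.lon),
    violations,
  };
}
```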
        
           | banana_giraffe wrote:
            | As accurate as MaxMind [2], since that's what they use [1]. In
            | my experience, it's reasonably accurate for the US, less so
            | for other countries. MaxMind publishes some accuracy data
            | which might be an interesting starting point [3].
            |
            | That said, for any analytics use cases of this data, be aware
            | that MaxMind will group a lot of what should be unknowns in
            | the middle of a country. Or, in the case of the US now, I
            | think they all end up in the middle of some lake, since some
            | farm owners in Butler County, Kansas got tired of cops showing
            | up and sued MaxMind. It can cause odd artifacts unless you
            | filter the addresses out somehow.
            |
            | 1 https://developers.cloudflare.com/support/network/configurin...
           | 
           | 2 https://www.maxmind.com/en/geoip-demo
           | 
           | 3 https://www.maxmind.com/en/geoip2-city-accuracy-comparison
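One rough way to find such placeholder points in your own data is to look for coordinates shared by an implausibly large share of records. This is a hypothetical heuristic: the 5% threshold and 4-decimal rounding are arbitrary assumptions, and it only makes sense on large samples (with a handful of records, every point clears the threshold). If your data source exposes a per-record accuracy radius, as MaxMind's GeoIP2 City data does, filtering on that is a more direct alternative.

```javascript
// Round coordinates so that "the same point" compares equal.
function coordKey(lat, lon, precision) {
  return `${lat.toFixed(precision)},${lon.toFixed(precision)}`;
}

// Flag coordinates that at least minShare of all records map to; these are
// likely defaults (e.g. a country's fallback point), not measurements.
function findDefaultLocations(records, minShare = 0.05, precision = 4) {
  const counts = new Map();
  for (const { lat, lon } of records) {
    const key = coordKey(lat, lon, precision);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const flagged = new Set();
  for (const [key, n] of counts) {
    if (n / records.length >= minShare) flagged.add(key);
  }
  return flagged;
}

// Remove records sitting on a flagged point before aggregating or mapping.
function dropDefaults(records, flagged, precision = 4) {
  return records.filter(({ lat, lon }) => !flagged.has(coordKey(lat, lon, precision)));
}
```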
        
             | matwood wrote:
             | Yeah, MaxMind is the best I have used with caveats. You
             | need to update it frequently, and you need to allow for
             | overrides.
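A minimal sketch of the override idea, assuming overrides are a small hand-maintained table of IPv4 CIDR prefixes checked before the regular database (`baseLookup` is a placeholder for whatever lookup you already use). A linear scan is fine for a handful of rules.

```javascript
// Convert dotted-quad IPv4 to an unsigned integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// "a.b.c.d/len" -> { start, end, len } as integers.
function parseCidr(cidr) {
  const [base, lenText] = cidr.split("/");
  const len = Number(lenText);
  if (!Number.isInteger(len) || len < 0 || len > 32) throw new Error(`bad prefix: ${cidr}`);
  const size = 2 ** (32 - len); // number of addresses in the prefix
  const start = Math.floor(ipv4ToInt(base) / size) * size;
  return { start, end: start + size - 1, len };
}

// Wrap a base lookup so the most specific matching override wins.
function withOverrides(overrides, baseLookup) {
  const rules = Object.entries(overrides).map(([cidr, location]) => ({ ...parseCidr(cidr), location }));
  return (ip) => {
    const x = ipv4ToInt(ip);
    let best = null;
    for (const rule of rules) {
      if (x >= rule.start && x <= rule.end && (!best || rule.len > best.len)) best = rule;
    }
    return best ? best.location : baseLookup(ip);
  };
}
```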
        
         | carstenhag wrote:
          | I'm in Munich. Cloudflare reports a position 730 km to the
          | north, in a random forest.
        
         | Aachen wrote:
          | Or you download an IP database rather than telling a third
          | party which IP addresses are likely connecting to your
          | service.
        
         | hotgeart wrote:
         | Located 100km from the Somali coast... I'm in Brussels,
         | Belgium, thx for protecting my privacy :D
        
           | louison11 wrote:
           | The result is [lon, lat]. You've most likely copied it onto
           | Google maps, which works with [lat, lon]. Believe it or not,
           | the industry still hasn't come up with a standard order.
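A small helper avoids that mix-up. It assumes the [lon, lat] response shown above; the Google Maps link uses the documented `api=1` search URL format. Brussels (about 50.85 N, 4.35 E) read in the wrong order becomes a point in the Indian Ocean off Somalia, which is consistent with the report above, and a range check cannot catch that kind of swap.

```javascript
// The endpoint returns [longitude, latitude], the same order GeoJSON
// (RFC 7946) uses. Google Maps and most "lat,lon" inputs expect latitude first.
function toLatLonString([lon, lat]) {
  // Catches some swaps (|lat| > 90), but not ones where both values are
  // plausible, such as Brussels read the wrong way round.
  if (Math.abs(lat) > 90 || Math.abs(lon) > 180) {
    throw new RangeError(`out of range: lat=${lat}, lon=${lon}; is the order swapped?`);
  }
  return `${lat},${lon}`;
}

// Google Maps URLs: https://www.google.com/maps/search/?api=1&query=<lat>,<lon>
function googleMapsUrl(lonLat) {
  return `https://www.google.com/maps/search/?api=1&query=${encodeURIComponent(toLatLonString(lonLat))}`;
}
```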
        
       | cstuder wrote:
       | Question: What's the motivation to put coordinates in one's own
       | WHOIS record? (geoloc/geofeed)
        
         | incolumitas wrote:
         | Many service providers actually want their clients to be able
         | to locate them.
        
         | dontdoxxme wrote:
          | Geofeeds are used by big CDNs; they can actually help the
          | provider save money, because the CDN can then serve its users
          | from a better network location.
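Geofeeds are standardized in RFC 8805 as CSV files whose rows are `ip_prefix,alpha2code,region,city,postal_code`, where region is an ISO 3166-2 code and lines starting with "#" are comments; RFC 9092 describes how to point to a geofeed from a WHOIS inetnum object. A minimal parser, with hypothetical sample data; it does not handle quoted CSV fields.

```javascript
// Minimal RFC 8805 geofeed parser. Keeps prefix, country, region and city,
// and ignores the fifth (postal code) field.
function parseGeofeed(text) {
  const entries = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const [prefix, country = "", region = "", city = ""] = line.split(",").map((f) => f.trim());
    // This sketch only accepts prefixes written with an explicit /length.
    if (!prefix.includes("/")) continue;
    entries.push({ prefix, country: country.toUpperCase(), region: region.toUpperCase(), city });
  }
  return entries;
}
```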
        
       | nonethewiser wrote:
       | Comments seem fairly dismissive but I actually found this really
       | interesting. It reminds me of a task I had in my first position
       | to add PostGIS to our database and a location based search. That
       | was based off addresses and zipcodes.
        
         | mannyv wrote:
         | That's relatively simple to do, even in mysql. One trick is to
         | use a square instead of a circle, which avoids a lot of math.
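A sketch of the square-then-circle approach: compute a latitude/longitude box that contains the search circle, use it as a cheap prefilter (in SQL, a range condition on indexed lat/lon columns), and run the exact haversine check only on what survives. The box formulas are the standard spherical ones; antimeridian wrap-around is ignored.

```javascript
const EARTH_RADIUS_KM = 6371;
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Box around (lat, lon) that fully contains a circle of radiusKm. The
// latitude extent is the angular radius; the widest longitude extent of
// the circle is asin(sin(r) / cos(lat)).
function boundingBox(lat, lon, radiusKm) {
  const r = radiusKm / EARTH_RADIUS_KM; // angular radius in radians
  const dLat = toDeg(r);
  const ratio = Math.sin(r) / Math.cos(toRad(lat));
  // If the circle reaches a pole, every longitude is inside the box.
  const dLon = ratio >= 1 ? 180 : toDeg(Math.asin(ratio));
  return {
    minLat: Math.max(-90, lat - dLat),
    maxLat: Math.min(90, lat + dLat),
    minLon: lon - dLon,
    maxLon: lon + dLon,
  };
}

// Great-circle distance in km (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

function withinRadius(points, lat, lon, radiusKm) {
  const box = boundingBox(lat, lon, radiusKm);
  return points
    // Cheap rectangular prefilter.
    .filter((p) => p.lat >= box.minLat && p.lat <= box.maxLat && p.lon >= box.minLon && p.lon <= box.maxLon)
    // Exact great-circle check on the survivors.
    .filter((p) => haversineKm(lat, lon, p.lat, p.lon) <= radiusKm);
}
```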
        
       | junto wrote:
       | As someone that lives in a country where the national language is
       | not my first language, I hate websites that use IP location to
       | make assumptions about my choice of language and it being forced
       | on me based on a lazy assumption, when my browser is sending
       | language headers quite clearly, and they are ignored.
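For sites that want to avoid this, a minimal sketch of honoring the Accept-Language header (language tags with optional q weights, as defined in RFC 9110) before falling back to any IP-based guess. The `supported` list and `fallback` value are hypothetical app settings, and the matching is deliberately simple: exact tag first, then the primary subtag.

```javascript
// Pick a UI language from an Accept-Language header value.
function pickLanguage(acceptLanguage, supported, fallback) {
  if (!acceptLanguage) return fallback;
  const prefs = acceptLanguage
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    // q=0 means "not acceptable".
    .filter((p) => p.tag && p.q > 0)
    // Higher q first; keep header order for ties.
    .sort((a, b) => b.q - a.q || a.index - b.index);

  const lowerSupported = supported.map((s) => s.toLowerCase());
  for (const { tag } of prefs) {
    if (tag === "*") return fallback;
    // Exact match first, then primary subtag ("en-gb" -> "en").
    let i = lowerSupported.indexOf(tag);
    if (i === -1) i = lowerSupported.indexOf(tag.split("-")[0]);
    if (i !== -1) return supported[i];
  }
  return fallback;
}
```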
        
       | jwie wrote:
       | The easiest way to get a geolocation is to ask the user. Maybe
       | they'll just tell you, and if that's good enough for your
       | application there's no need for such solutions.
        
       | jedberg wrote:
       | It all depends on what you want to use it for and how accurate it
       | needs to be.
       | 
       | The best way to build a geolocation service is to have a billion
       | devices that report their location to you at the same time they
       | report their IP to you. That's basically Apple and Google. They
       | have by far the best geolocation databases in the world, because
       | they get constant updates of IP and location.
       | 
       | The trick is basically to make an app where people willingly give
       | you their location, and then get a lot of people to use it.
       | That's the best way to build an accurate geo-location database,
       | and why every app in the world now asks for your location.
       | 
        | Foursquare had the right idea; they were just ahead of their time.
        
         | flounder3 wrote:
          | Even 10 years ago, Apple's internal privacy policies prevented
          | it from collecting precise lat/long. We had to use HTTP
         | session telemetry to determine which endpoints were best for a
         | given IP (or subnet, but not ASN), which informed our own
         | pseudo-geoIP database so we knew which endpoint to connect to
         | based on real world conditions.
         | 
         | Even still, it had to be as ephemeral as possible for the sake
         | of privacy. We weren't allowed to use or record results from
         | Apple Maps' reverse geo service outside of the context of a
         | live user request (finding nearby restaurants, etc).
        
           | jedberg wrote:
           | You don't need precise lat/lon to make a good database. Even
           | a 1km circle would be more than enough.
           | 
           | > but not ASN
           | 
           | Why wasn't ASN allowed? That's what Netflix used to make
           | endpoint routing decisions and worked really well.
        
             | flounder3 wrote:
             | You're not wrong, but privacy concerns were paramount.
             | 
             | ASNs were allowed but too vague. We needed more
             | granularity. Corporate proxies, subdelegations, many
             | providers aggregating announcements below /24, etc.
        
       | bagels wrote:
       | Surely someone is using online shopping shipping addresses for
       | this?
        
       | SirMaster wrote:
        | These IP geolocation lookups never seem to work for me.
        |
        | They are always multiple states off, and when I check multiple
        | different services, they pretty much never even agree.
        
       | reincoder wrote:
        | First, I have been a big fan of your articles since before I
        | joined IPinfo, where we provide an IP geolocation data service.
       | 
       | Our geolocation methodology expands on the methodology you
       | described. We utilize some of the publicly available datasets
       | that you are using. However, the core geolocation data comes from
       | our ping-based operation.
       | 
       | We ping an IP address from multiple servers across the world and
       | identify the location of the IP address through a process called
       | multilateration. Pinging an IP address from one server gives us
        | one dimension of location information, meaning that, based on
       | certain parameters the IP address could be in any place within a
       | certain radius on the globe. Then as we ping that IP from our
       | other servers, the location information becomes more precise.
        | After enough pings, we have very precise IP location
       | information that almost reaches zip code level precision with a
       | high degree of accuracy. Currently, we have more than 600 probe
       | servers across the world and it is expanding.
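A heavily simplified version of that idea, not IPinfo's actual method: treat each probe's RTT as a disk using a fiber-speed bound, keep every point on a coarse grid that falls inside all the disks, and report their centroid. The 200 km/ms speed, the 0.5 degree grid, and the centroid are assumptions; averaging longitudes also breaks near the antimeridian.

```javascript
// Rough speed of light in fiber (~2/3 of c), in km per millisecond.
const FIBER_KM_PER_MS = 200;

// Great-circle distance in km (haversine formula, mean Earth radius).
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371;
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.min(1, Math.sqrt(a)));
}

// probes: [{ lat, lon, rttMs }]. Returns { lat, lon, candidates } or null
// when no grid point satisfies every probe's distance bound.
function multilaterate(probes, stepDeg = 0.5) {
  const inside = [];
  for (let lat = -90; lat <= 90; lat += stepDeg) {
    for (let lon = -180; lon < 180; lon += stepDeg) {
      const ok = probes.every(
        (p) => haversineKm(p.lat, p.lon, lat, lon) <= (p.rttMs / 2) * FIBER_KM_PER_MS
      );
      if (ok) inside.push([lat, lon]);
    }
  }
  if (inside.length === 0) return null; // measurements are inconsistent
  const lat = inside.reduce((sum, [la]) => sum + la, 0) / inside.length;
  const lon = inside.reduce((sum, [, lo]) => sum + lo, 0) / inside.length;
  return { lat, lon, candidates: inside.length };
}
```

Each extra probe shrinks the feasible region, which is the effect described above: one ping gives only a radius, several pings narrow it to an area.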
       | 
       | The publicly available information that you are referring to is
       | sometimes not very reliable in providing IP location data as:
       | 
       | - They are often stale and not frequently updated.
       | 
       | - They are not precise enough to be generally useful.
       | 
        | - They provide location context at a large IP range level or
       | even at organization level scale.
       | 
       | And last but not least, there is no verification process with
       | these public datasets. With IPv4 trade and VPN services being
       | more and more popular we have seen evidence that in some
       | instances inaccurate information is being injected in these
       | datasets. We are happy and grateful to anyone who submits IP
       | location corrections to us but we do verify these correction
       | submissions for that reason.
       | 
       | From my experience with our probe network, I can definitely say
       | that it is far easier and cheaper to buy a server in New York
       | than in any country in the middle of Africa. Location of an IP
       | address greatly influences the value it can provide.
       | 
       | We have a free IP to Country ASN database that you can use in
       | your project if you like.
       | 
       | https://ipinfo.io/developers/ip-to-country-asn-database
        
         | Daviey wrote:
         | Would you consider no-signup inspection of the data you hold on
          | the requester's IP address? I would love to see what you have on
          | MY IP address, and if it's sufficiently accurate, that would be a
          | good incentive to sign up and use it commercially.
          |
          | It feels like it couldn't be abused by 'freeloaders', because
          | I'd guess their use case is viewing other people's.
        
           | reincoder wrote:
           | We have a very open approach to our data. In fact, our
           | website is extremely accessible. It is quite useful for
           | researching IP addresses and does not require signing up. The
           | data is largely available to view on the website. Although we
            | display all IP address metadata on the home page, if you
           | intend to use our website frequently, I recommend utilizing
           | the IP data pages.
           | 
           | You can enter IP addresses on the right side to look up
           | information here: https://ipinfo.io/what-is-my-ip
           | 
           | Additionally, we offer some enjoyable tools that you can use
           | here: https://ipinfo.io/tools
           | 
           | The CLI tool is particularly entertaining.
           | 
           | You can also use our API service without signing up, with a
           | limit of 1000 requests per day.
           | 
           | If you do choose to sign up for a free account, you will
           | receive 50,000 requests per month, free IP databases, a bulk
           | lookup feature, and more.
        
           | kam wrote:
           | This is literally the most prominent thing on the
           | https://ipinfo.io home page.
        
             | qingcharles wrote:
             | Huh, that's cool. It got my home IP about 15 miles from
             | where I am, but still not bad.
             | 
             | Wait - how does this work for cell IPs? A lot of cellphone
             | v4 IPs are now shared between hundreds or thousands of
             | devices, right?
        
               | reincoder wrote:
               | I work there, and I am supposed to know these things, but
               | I don't exactly :/
               | 
               | It probably has something to do with important routers.
               | What tags do we show when you visit the IP data page? The
               | IP data page can be accessed by visiting
               | ipinfo.io/<IP_address>.
               | 
               | We use the generic term "data experts," but it actually
               | consists of about 2 dozen engineers, including data
               | engineers, data scientists, infrastructure engineers,
               | backend engineers, and a great technical CEO working on
               | all that. All those folks have gone on a boating trip off
               | the coast of Spain for a retreat.....except for me.
               | 
               | I will ask them and try to circle back with some answers.
        
             | Daviey wrote:
             | That's embarrassing for me... I thought that was a static
             | image of an example. And I did look through the site
             | looking for a search. Oops.
        
         | theogravity wrote:
         | How does that work with edge servers that use anycast to assume
         | the same IP across different regions?
        
           | SnorkelTan wrote:
            | Aren't anycast addresses a specific subset of IPs and thus
            | knowable? IIRC, each autonomous system is allocated anycast
            | IP space?
        
         | TheClassic wrote:
         | Your comment is extremely interesting and what I was hoping to
         | learn from the article (without an existing source of
         | information, how do we determine the location of an IP
         | address). Thank you!
        
           | reincoder wrote:
            | I really appreciate it. Thank you. We are very transparent about
           | our process. If you have any questions, you can always reach
           | out to us.
           | 
           | We have a simplified explanation of our probe network here:
           | https://ipinfo.io/blog/probe-network-how-we-make-sure-our-
           | da...
           | 
            | The only update is that the number of servers is now 600+.
           | The probe network is growing extremely rapidly.
           | 
           | Our IP geolocation process is quite complicated, and we have
           | a team of data engineers, infrastructure engineers, and data
           | scientists working on various aspects of it. Therefore, our
           | approach is users can ask us questions, and we will try our
           | best to answer them.
        
             | freedomben wrote:
             | Just wanted to let you know, it's this transparency that
             | turned me into a customer!
             | 
             | I love your company and service, but I hate your pricing. I
                | work with a lot of small clients/apps for which paying for
                | usage would be a no-brainer, but the defined monthly price
                | buckets don't make any economic sense at their scale. If
                | you added a "pay as you go" tier that a small app could
                | reasonably start on with a few dollars' worth of API calls
                | per month and grow from there, I'd be spreading your seed all
             | over the place. I'm not saying this to rag on you, just
             | trying to provide some constructive feedback as a thank you
             | for your info sharing!
        
               | reincoder wrote:
               | Thank you very much; I really appreciate your feedback.
               | This is not the first time I have heard this. The
               | solution is to try to take as much advantage as you can
               | from the free tier.
               | 
               | # Check out the free IP databases
               | 
               | https://ipinfo.io/products/free-ip-database
               | 
               | The free databases come with commercial usage permission,
               | and because they are databases, you can make unlimited
               | lookups from them. The databases provide full accuracy
               | and are updated daily. They are just a subset of our IP
               | geolocation database that only provides IP to Country
               | information.
               | 
               | # Complement the database with the API service
               | 
               | If you only want city-level information, switch to the
               | API service. Use the database to look up IP-to-country
               | information as many times as you want. However, use the
               | API service only when necessary.
               | 
               | Additionally, if you include a credit link to us, we will
               | double your API limit to 100k/month. Visit
               | https://ipinfo.io/contact/creditlink.
               | 
               | # Cache data
               | 
               | All of our API libraries have native caching support. We
               | strongly recommend that users reduce their number of
               | requests by caching the response. I highly recommend you
               | check out our libraries: https://github.com/ipinfo
               | 
               | ---
               | 
               | The only challenge with the free IP databases is that you
               | need to host the database somewhere to look up the IP to
               | Country information. Having an API service with nearly
               | unlimited lookups for IP to Country information will be
               | fantastic.
               | 
               | If you know someone who has an IP to Country API service,
               | please let me know. We only require an attribution for
               | using our database. If you have a similar service that is
               | popular but don't want to maintain it, let us know as
               | well; we can take over the site and host it ourselves
               | with the IP to Country data.
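The three tips above (unlimited local country lookups, the API only for city-level data, and caching) can be combined in a small wrapper. This is a minimal Python sketch under stated assumptions, not IPinfo's library API: the country table is assumed to be rows of (start IP, end IP, country code), `fetch_city` is a hypothetical callable wrapping whatever API client you use, and the 24-hour TTL is arbitrary.

```python
import bisect
import ipaddress
import time
from typing import Callable, Iterable, Optional


class TieredGeoLookup:
    """Country from a local range table; city from a remote lookup, cached with a TTL."""

    def __init__(
        self,
        ranges: Iterable[tuple[str, str, str]],
        fetch_city: Callable[[str], str],
        ttl_seconds: float = 24 * 3600,
    ) -> None:
        # Each row is (start_ip, end_ip, country_code); rows must not overlap.
        # Keys are (IP version, integer value) so IPv4 and IPv6 never collide.
        rows = sorted(
            (self._key(start), self._key(end), country)
            for start, end, country in ranges
        )
        self._starts = [row[0] for row in rows]
        self._rows = rows
        self._fetch_city = fetch_city  # hypothetical: wraps your API client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(ip: str) -> tuple[int, int]:
        addr = ipaddress.ip_address(ip)
        return (addr.version, int(addr))

    def country(self, ip: str) -> Optional[str]:
        """Unlimited local lookups: binary search for the range containing ip."""
        key = self._key(ip)
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0 and self._rows[i][0] <= key <= self._rows[i][1]:
            return self._rows[i][2]
        return None

    def city(self, ip: str) -> str:
        """Metered lookups: only call the API on a cache miss or an expired entry."""
        now = time.monotonic()
        cached = self._cache.get(ip)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        city = self._fetch_city(ip)
        self._cache[ip] = (now, city)
        return city
```

Because the country table is local, `country()` never touches the API quota; only cache misses in `city()` do.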
        
               | freedomben wrote:
               | Thank you, that's super useful info. I didn't realize you
               | had an Erlang library! I'm definitely going to be putting
               | that to use :-)
        
               | sambazi wrote:
               | [flagged]
        
         | detourdog wrote:
         | I just noticed that my wife's iPhone kept the same mycingular IP
         | address for 5 hours while driving across 3 states and checking
         | mail.
        
           | inemesitaffia wrote:
           | There are several options/techniques for doing it. But just
           | imagine you have a permanent zero overhead VPN.
           | 
           | I don't know if that provider terminates long running calls,
           | but the calls would stay up too regardless of tower.
        
             | detourdog wrote:
             | Yes, I'm sure it is iOS anti-tracking and directly related
             | to why firewall apps inside SIP may not know what is going
             | on.
        
               | Vendan wrote:
               | More likely to be just standard Mobile IP
               | https://en.wikipedia.org/wiki/Mobile_IP. Fairly standard
               | stuff, can cause some false positives around traveling
               | (I've seen people get freaked out about stuff like "This
               | person just logged in from their home state and then less
               | than an hour later logged in from France!" when it was
               | just mobile IP treating their phone as still in the US
               | while they were in France on a trip, but their laptop
               | connected over normal internet was seen as coming from
               | France)
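The false positive described above typically comes from a naive "impossible travel" rule: geolocate both logins, compute the great-circle distance, and flag the pair if the implied speed beats an airliner. A minimal Python sketch, assuming the coordinates already come from an IP geolocation lookup; both thresholds are arbitrary assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Login:
    lat: float
    lon: float
    timestamp_s: float  # Unix time of the login


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against a tiny float overshoot above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(
    prev: Login,
    curr: Login,
    max_speed_kmh: float = 1000.0,   # assumption: roughly airliner speed
    min_distance_km: float = 500.0,  # assumption: ignore ordinary geolocation error
) -> bool:
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if distance < min_distance_km:
        return False
    hours = abs(curr.timestamp_s - prev.timestamp_s) / 3600.0
    if hours == 0:
        return True
    return distance / hours > max_speed_kmh
```

With Mobile IP in the picture, a phone can legitimately appear to stay in the US while the same user's laptop appears in France, so a flag from a rule like this is better treated as a prompt for extra verification than as proof of compromise.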
        
               | detourdog wrote:
               | This was a consistent IP address, nothing to do with
               | location, and nobody was freaked out.
        
         | matsur wrote:
         | ICMP response time is not useful for "locating" an anycast
         | address, some of which have a logical location associated with
         | them. See https://blog.cloudflare.com/icloud-private-relay/ for
         | an example.
        
           | cuu508 wrote:
           | Well, at least you can detect it is an anycast address, and
           | mark it as such.
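Detecting anycast this way can be made concrete with a speed-of-light check: a unicast host sits in one place, so for any two probes, the distance each RTT allows must together cover the distance between the probes. A minimal Python sketch, assuming signals travel about 200 km per millisecond in fiber (roughly 2/3 of c). Routing detours and queuing only make RTTs longer, so the bounds are conservative, and a violation points to anycast (or a broken measurement) rather than to a real location.

```python
import itertools
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber travels about 2/3 c


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def max_one_way_km(rtt_ms: float) -> float:
    """Upper bound on distance: half the round trip at fiber propagation speed."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def looks_anycast(measurements: list[tuple[float, float, float]]) -> bool:
    """measurements: (probe_lat, probe_lon, rtt_ms) for one target address.

    Returns True if some pair of probes reports RTTs that no single
    location could produce.
    """
    for (lat1, lon1, rtt1), (lat2, lon2, rtt2) in itertools.combinations(measurements, 2):
        gap = haversine_km(lat1, lon1, lat2, lon2)
        if gap > max_one_way_km(rtt1) + max_one_way_km(rtt2):
            return True
    return False
```

For example, a 2 ms reply seen from both Frankfurt and Tokyo cannot come from one machine, since 2 ms allows at most about 200 km from each probe.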
        
         | EwanToo wrote:
         | Have you considered making your database available for download
         | as Parquet format so people could just copy the file to S3,
         | Google Cloud, etc, and query it immediately with various tools?
         | 
         | I know it can be done with CSV but it's not as smooth.
        
         | chaps wrote:
         | Not gonna lie, this creeps the heck out of me.
        
           | fragmede wrote:
           | Your IP address is LEAKING!
        
           | reincoder wrote:
           | Thousands of people live in a zip code, while hundreds of
           | thousands of people live in a city. We are literally giving
           | away that data for free through our API and database. The
           | creepiness of IP geolocation is mostly a meme.
           | 
           | IP geolocation is mainly used in cybersecurity and marketing
           | analytics. There are many ways to geolocate someone. I once
           | came across a project that could estimate the country a user
           | is from based on their writing style and grammar mistakes.
           | For example, American people sometimes use "should of"
           | instead of "should have". Knowing the geolocation of an IP
           | address isn't super creepy. It's just how things work on the
           | internet.
        
             | chaps wrote:
             | And you're literally advertising this project as being
             | helpful for targeted ads. So it's pretty clear from the get
             | go that what you consider creepy isn't what I consider
             | creepy. And having done enough reidentification work to
             | scare myself, "thousands of people" might as well be a
             | couple dozen or less. I get why you're defensive and why
             | you think it's not creepy, but calling it a "meme" is
             | insultingly dismissive.
             | 
             | Just because it's "how things work on the internet" doesn't
             | make its mass collection right. Under the same logic, any
             | side channel attack is just "how it works", and its abuse
             | warrants no ethical question.
        
               | reincoder wrote:
               | I grok and understand your concern. I am not being
               | defensive; I am just trying to provide an explanation. I
               | really enjoy having conversations like this with
               | developers as honestly and empathetically as possible.
               | 
               | I apologize if I was rude in any way by saying the word
               | "meme". I saw a sister comment and thought you were being
               | sarcastic. There is a popular meme about "I have your IP
               | address", so I thought you were referencing that. I have
               | had conversations with many young people who were
               | concerned about their IP address being leaked through a
               | game server. Therefore, I try to use humor to alleviate
               | their stress. However, I now realize that this situation
               | was different, and I am sorry for not understanding that.
               | 
               | We provide a service that helps users keep their
               | internet-connected services secure by providing IP
               | metadata information. Are you being attacked by malicious
               | actors? Use our free IP database to identify the location
               | and ASN to block them. Do you want to restrict access to
               | your service to certain regions? Do that for free with
               | our services.
               | 
               | We have the most accurate data available, and yet we
               | offer the most generous free tier. We provide a full
               | accuracy IP database for free, without any range
               | aggregation, and with daily updates and a commercially
               | permissible license. We have built a community forum
               | solely dedicated to answering users' questions. We invest
               | in website tools and open-source tools, all with the goal
               | of helping users maintain the security and functionality
               | of their services.
               | 
               | We do have premium tier services, but if you use our free
               | data as a foundation, you can always replicate those
               | premium features to a reliable degree.
               | 
               | Our IP metadata information is being used in marketing
               | and sales intelligence. It is the same data that you use
               | to protect your internet connected devices, used by our
               | customers to sell you something.
               | 
               | IP metadata information that we provide is a cornerstone
               | of keeping the internet safe and accessible for everyone.
               | That is just how things are. The deep web is immune to IP
               | metadata information, and that is why it is such a messy
               | and chaotic place.
               | 
               | That is just the truth of the internet. We are essential and
               | we prefer to be open about our process and listen to our
               | stakeholders (users + customers + non-users).
        
               | chaps wrote:
               | Thank you for the well thought out response. I disagree
               | with just about everything you say, but I understand
               | where you're coming from and I appreciate the validation
               | that the use of a VPN is more important than it's ever
               | been. As a professional courtesy: calling yourself
               | "essential" is an enormous red flag and you might want to
               | consider different phrasing.
        
               | reincoder wrote:
               | I should have used a different phrasing. :) I was reading
               | an article about essential workers today, and that word
               | popped up in my head when I wrote the comment.
               | 
               | It's good that you are using a VPN. I advocate for the
               | usage of VPNs, and many VPN companies actually use our
               | data to verify their server locations. In the VPN
               | industry, VPN companies get their VPN servers from
               | specialized hosting services that cater to dozens of VPN
               | companies. You can check out the ASNs of the VPN IP
               | addresses to find them.
               | 
               | - https://ipinfo.io/AS136787
               | 
               | - https://ipinfo.io/AS16247
               | 
               | VPN companies use our IP geolocation data to confirm the
               | actual location of their servers. Let me tell you a fun
               | story. One VPN company claimed to have a server in the
               | Bahamas, but upon investigation, we discovered that the
               | server was actually located in New York. It was a
               | surprising find. Getting a server in the Bahamas is more
               | challenging than getting one in NY. Just imagine users
               | thinking their internet activity is immune to US
               | jurisdiction because they are using a VPN service based
               | in the Bahamas but is in fact located in NY. So,
               | we might not be essential, but we are certainly very
               | useful!
               | 
               | Thank you for the great conversation, dude. Appreciate
               | it.
        
               | wpietri wrote:
               | For sure. When people work in any industry long enough,
               | it's easy to stop thinking about the basics. E.g., a
               | retail butcher thinks of his work very differently than a
               | cow or a vegan does.
               | 
               | When people work in advertising, they mostly forget that
               | the core of their business is for-profit manipulation of
               | people with little or no regard for truth or the people
               | concerned. But I personally think that's kinda creepy,
               | and only getting more so as it goes from broad
               | manipulation of millions via mass media down to
               | thousands, hundreds, or single individuals.
        
           | goodpoint wrote:
           | Together with the tons of data leaked by browsers it makes it
           | very easy to track people across places and devices.
        
           | giantrobot wrote:
           | You might want to unplug your router then. A conceit of being
           | connected to a network is you're connected to the network. If
           | you can see other nodes they can see you.
        
         | welder wrote:
         | Great comment. I'm a big fan and customer of IPinfo. We use your
         | API in our login notification emails to say "You just logged in
         | from Berlin, Germany. If this wasn't you, click here." We also
         | use it to provide country data for customers in their audit logs
         | and for anti-spam and fraud detection.
        
         | chankstein38 wrote:
         | That's pretty neat! You're basically using ping triangulation!
        
           | sib wrote:
           | Trilateration (same technique as used for mobile network
           | location - in addition to the GPS on the phone)
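To illustrate the term: with exact distances to three known points on a flat plane, trilateration reduces to a small linear system. This idealized Python sketch ignores Earth's curvature and treats the distances as exact. RTT-derived distances are only noisy upper bounds, which is why latency-based geolocation intersects regions rather than solving for a single point.

```python
import math


def trilaterate(p1, p2, p3, d1, d2, d3):
    """Solve for (x, y) given three anchor points and exact distances to each.

    Subtracting circle 1's equation from circles 2 and 3 cancels x^2 and y^2,
    leaving two linear equations, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if math.isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

For anchors at (0, 0), (10, 0) and (0, 10) with distances 5, sqrt(65) and sqrt(45), the solution is the point (3, 4).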
        
         | incolumitas wrote:
         | Big fan of what articles? On https://incolumitas.com/ or on
         | https://ipapi.is/?
         | 
         | Great idea with latency triangulation. I have used latency
         | information for a lot of things, especially VPN and proxy
         | detection.
         | 
         | But I didn't expect that you could obtain such an accurate
         | location. I am honestly impressed. Then again, latency
         | triangulation with 600 servers gives a very good
         | approximation. Nice, man!
         | 
         | Some questions:
         | 
         | - ICMP traffic is penalised/degraded by some ISPs. How do you
         | deal with that?
         | 
         | - In order to geolocate every IPv4 address, you need to
         | constantly ping billions of IPv4 addresses. How do you do that?
         | Do you only ping an arbitrary IP of each allocated
         | inetnum/NetRange?
         | 
         | - Most IP addresses do not respond to ICMP packets. Only some
         | servers do. How do you deal with that? Do you find the router
         | in front of the target IP and geolocate the router closest
         | to the target IP (traceroute)?
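One plausible answer to the second question, sketched in Python: rather than pinging all ~4.3 billion IPv4 addresses, split each allocated range into /24 blocks and probe one representative per block. The /24 granularity and the choice of the .1 address are assumptions for illustration, not a description of how IPinfo actually selects targets.

```python
import ipaddress
from typing import Iterator


def probe_targets(start: str, end: str, granularity: int = 24) -> Iterator[ipaddress.IPv4Address]:
    """Yield one representative address per /granularity block inside start..end.

    Hypothetical strategy: probe the .1 of each block, on the guess that
    addresses in a small block are usually routed to the same place.
    """
    first, last = ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
    # An inetnum/NetRange is an arbitrary start-end range; split it into CIDR blocks first.
    for block in ipaddress.summarize_address_range(first, last):
        if block.prefixlen >= granularity:
            # Already small: take the address after the network address, if there is one.
            yield block.network_address + (1 if block.num_addresses > 1 else 0)
        else:
            for sub in block.subnets(new_prefix=granularity):
                yield sub.network_address + 1
```

Under these assumptions, a full /16 NetRange shrinks to 256 probe targets.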
        
           | carlhjerpe wrote:
           | You can guess pretty well how IPs are related from BGP
           | announcements, so as long as you probe a few addresses per
           | announced block (or per ASN, if it is small), you can use
           | that logic.
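Letting a few measurements stand for a whole BGP-announced prefix relies on longest-prefix matching to decide which announcement an address falls under. A minimal Python sketch: loading prefixes from a real BGP table dump is left out, and the labels attached to prefixes are whatever measurement summary you keep (hypothetical here).

```python
import ipaddress
from typing import Optional


class PrefixTable:
    """Longest-prefix match over announced IPv4 prefixes."""

    def __init__(self) -> None:
        # prefix length -> {network: label}
        self._by_len: dict[int, dict[ipaddress.IPv4Network, str]] = {}

    def add(self, prefix: str, label: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        self._by_len.setdefault(net.prefixlen, {})[net] = label

    def lookup(self, ip: str) -> Optional[str]:
        addr = ipaddress.IPv4Address(ip)
        # Try the most specific announcements first, as a router would.
        for length in sorted(self._by_len, reverse=True):
            # strict=False masks off the host bits to get the covering network.
            candidate = ipaddress.IPv4Network((int(addr), length), strict=False)
            label = self._by_len[length].get(candidate)
            if label is not None:
                return label
        return None
```

A more specific /24 announced inside a /16 then keeps its own measurement instead of inheriting the larger block's.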
        
           | withinboredom wrote:
           | I'm very curious why you'd do VPN/proxy detection...
           | 
           | But at a previous company I worked at that ran a very large
           | chunk of the internet, we did indexing of nearly the entire
           | internet (even large portions of the dark web) approximately
           | every two weeks. There were about 500 servers doing that non-
           | stop. So, I think it is relatively reasonable if you have 600
           | servers to do that.
        
             | meroje wrote:
             | In the business of media streaming, rightsholders will
             | require that you check for VPNs and proxies in addition to
             | countries when deciding whether a given viewer will be able
             | to stream a given piece of media.
        
               | withinboredom wrote:
               | Does that actually work? That could explain an issue with
               | a particular streaming service I use. There are currently
               | some ongoing routing issues in BGP land and my ISP. When
               | trying to stream, it says I'm using a proxy, so due to
               | the incredible route my packets are taking, that might be
               | it. What's funny is that the only way to watch this
               | service is to use a vpn right now.
        
               | vGPU wrote:
               | They probably just keep a list of known VPN server IPs.
        
               | sitzkrieg wrote:
               | Of course it doesn't work, but they gotta try clutching
               | pearls and applying whatever pressure they can think of
               | on these fronts.
        
               | wpietri wrote:
               | Why is this getting downvoted? It seems to me that a lot
               | of the media-focused anti-piracy tooling is essentially a
               | performance of toughness to make rightsholder execs
               | comfortable. Everybody accepts you can't stop piracy
               | entirely, and nobody's willing to say, "Fuck it, we'll
               | compete on convenience and strong consumer
               | relationships," so we all put up with this weird middle
               | ground of performative DRM and the like. With only the
               | rare occasional bit of honesty, as from Weird Al:
               | https://sfba.social/@williampietri/110906012997848549
        
               | at_a_remove wrote:
               | This is correct. Imagine in the days of yore, some two
               | decades and change ago, when I was charged with
               | implementing putting some music reserves "online" for
               | streaming ...
               | 
               | [Harp music, progressive diagonal wave distortions
               | through the viewport ...]
               | 
               | We had _two_ layers of passwords (one to get to the
               | webpage for the class, one when actually streaming via
               | the client, which was RealPlayer) as well as an IP range
               | restriction to campus (you live off campus? So sorry)
               | because our lawyers were worried about what the RIAA's
               | lawyers would find sufficient in the wake of a bunch of
               | Napster-baited lawsuits launched at universities. The
               | material itself was largely limited to snippets.
               | 
               | I wanted to say, "Calm down, have a martini or something.
               | College students are just not going to go wild to
               | download 128 kbps segments of old classical music," but
               | alas I was not in charge.
        
           | reincoder wrote:
           | https://incolumitas.com/
           | 
           | This is my all-time favorite article:
           | https://incolumitas.com/2021/11/03/so-you-want-to-scrape-
           | lik...
           | 
           | I used to do freelance web scraping, and that article felt
           | like some kind of forbidden knowledge. After reading the
           | article, I went down the rabbit hole and actually found a
           | Discord server that provided carrier-grade traffic relay from
           | a van which contained dozens of phones.
           | 
           | As for the questions, we'll have to wait a bit; someone
           | from our engineering team might come here and reply.
           | 
           | By the way, since I have you here, have you considered converting
           | the CSV files to MMDB format? I was planning to do that with
           | our mmdbctl tool later today.
           | 
           | https://github.com/ipinfo/mmdbctl
        
             | sambazi wrote:
             | > I used to do freelance web scraping
             | 
             | "don't sell warez"
        
         | voltagex_ wrote:
         | Can your probes be identified and blocked?
        
           | kube-system wrote:
           | iptables -A INPUT -p icmp -j DROP
        
             | chaps wrote:
             | This isn't helpful. The comment was specifically asking
             | about the probes, not ICMP traffic.
        
               | kube-system wrote:
               | Anybody can do this same thing, if you're worried about
               | this, you probably don't want inbound ICMP.
        
               | chaps wrote:
               | Cool. Thanks. But let's say I do.
        
               | kube-system wrote:
               | Then there's nothing you can do. If you respond to pings,
               | then others can take note of the responses you send.
        
               | chaps wrote:
               | You're missing the point that the question is effectively
               | asking for a list of hosts that they can block.
               | 
               | Edit: they provided a method:
               | https://news.ycombinator.com/item?id=37510063
        
               | kube-system wrote:
               | I understand that was the initial question. I am saying
               | that is a fool's errand. Anyone with a few VPSes, a
               | calculator, and a map can do this. It isn't just
               | ipinfo.io doing this. There are a lot of ip geolocation
               | services.
        
             | j16sdiz wrote:
             | This breaks PMTU discovery and is the source of many mystery
             | download stalls.
        
             | eptyc1 wrote:
             | Indeed. Openwrt for some reason defaults to reply to pings.
             | I see the value of ICMP for servers, but I don't see the
             | value for home ISP routers.
             | 
             | I disabled ICMP reply on my home router.
        
               | sambazi wrote:
               | > Openwrt for some reason defaults to reply to pings.
               | 
               | it's a bit like greeting-back ppl on the street.
               | 
               | not doing it will not make you invisible. it will break
               | somebody's assumption of decency, but most ppl don't care
               | either way.
        
             | voltagex_ wrote:
             | http://shouldiblockicmp.com/
             | 
             | (But the guy running the probes is making a good counter
             | argument)
        
           | reincoder wrote:
           | It is just ping data. We ping an IP address, get the RTT,
           | draw a circle on the globe, and say that the IP could be
           | anywhere inside that circle. Then we do another ping and draw
           | another circle, and the IP address could be anywhere in the
           | intersection of the two. If we do it enough times, we can
           | get an estimate of where the IP address is located.
           | 
           | The data is not derived from the IP address itself, but
           | rather from the process itself. And it's just a ping.
           | Moreover, the majority of the IP addresses are not pingable.
           | So, we rely on other in house statistical and scientific
           | models to estimate the location. The probe infrastructure is
           | extremely complicated and there are billions and billions of
           | IP addresses, which is why we do not have a robust range
           | filter mechanism.
           | 
           | You can implement a dynamic ping blocking mechanism or use
           | our data to find hosting ASNs and block ranges of those ASNs.
           | You can download the database for free:
           | https://ipinfo.io/developers/ip-to-country-asn-database
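The ping-and-radius process described above can be sketched as a brute-force search: convert each RTT into a maximum distance, then keep the points on a coarse lat/lon grid that fall inside every probe's circle. This Python sketch is not IPinfo's model; it assumes signals travel about 200 km per millisecond in fiber, halves the RTT for the one-way trip, uses a 1-degree grid, and reports a simple centroid.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: signal speed in fiber, about 2/3 c


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def radius_km(rtt_ms: float) -> float:
    """The target is at most this far from the probe: half the RTT at fiber speed."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def estimate_location(
    probes: list[tuple[float, float, float]], step_deg: float = 1.0
) -> Optional[tuple[float, float]]:
    """probes: (lat, lon, rtt_ms). Returns the centroid of grid points inside every circle."""
    inside = []
    for i in range(int(180 / step_deg) + 1):
        lat = -90.0 + i * step_deg
        for j in range(int(360 / step_deg)):
            lon = -180.0 + j * step_deg
            if all(haversine_km(plat, plon, lat, lon) <= radius_km(rtt)
                   for plat, plon, rtt in probes):
                inside.append((lat, lon))
    if not inside:
        return None  # circles do not overlap: bad measurement or anycast
    # Naive centroid; fine for small regions away from the antimeridian.
    return (
        sum(p[0] for p in inside) / len(inside),
        sum(p[1] for p in inside) / len(inside),
    )
```

Because queuing and indirect routing only inflate RTTs, each circle is an upper bound; a few probes with low RTTs shrink the intersection quickly, which is why a large, widely spread probe network matters.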
        
       | spacedcowboy wrote:
       | So, at the risk of outing myself, I wrote http://www.hostip.info
       | a long time ago* which used a community approach to get ip
       | address location ("is this guess wrong ? Fix it please").
       | 
       | The last time I checked (maybe a decade ago [grin]) it worked
       | pretty much perfectly for a country, imperfectly for a region,
       | and better-than-a-coin-toss for city resolution. All the data is
       | free.
       | 
       | I don't think they have it on the site any more, but I used to
       | have a rotating 3D-cube thing (x,y,z were the first 3 octets of
       | the address) for things like known-addresses, recent lookups,
       | etc. I used different colours for different groups (country,
       | continent,...) It was so old it was written as a Java applet.
       | Yeah. I guess if I were to do it again, it'd be WebGL.
       | 
       | --
       | 
       | *: I sold it a long time ago, with the proviso that the data must
       | always remain free. I actually didn't believe the offer at first
       | (it came as an email, and looked like a scam) but it went through
       | escrow.com just fine, and I think we both walked away happy. That
       | was almost 2 decades ago now though.
        
       ___________________________________________________________________
       (page generated 2023-09-14 23:00 UTC)