a day ago by Fernanda Donnini 9 min read

From Probability to Certainty: IPinfo’s Journey to Accurate IP Data

Every great innovation starts with a simple realization: there has to be a better way. For IPinfo, that realization came through customer feedback. What began in 2013 as a simple API using freely available geolocation data evolved into something much more significant when users started telling us the hard truth—the data wasn't good enough.

This feedback sparked a mission that continues to define IPinfo today: creating the most accurate IP data possible. As we set out to build better geolocation data, our customers began asking for more context around IP addresses—company details, VPN detection, and other attributes that have become core parts of our offering.

As our founder Ben Dowling notes, "There are two main things we're always doing at IPinfo. The first is making sure that our data is great quality. That will be a project that will probably last forever. The other piece is how we make it easy for people to access that data."

The Dynamic Nature of IP Data

The internet is constantly evolving. IP addresses, in particular, change frequently for several reasons. Internet Service Providers (ISPs) often use dynamic IP assignments, meaning customers' IP addresses change periodically as part of normal network management. Corporate networks frequently reorganize their IP infrastructure during cloud migrations or network expansions. The rapid growth of mobile devices and IoT has also increased IP address turnover as devices connect and disconnect from different networks. Additionally, the ongoing transition to IPv6 and the reallocation of IPv4 addresses create constant shifts in how IP addresses are assigned and used.

Our data shows that 25% to 56% of IP addresses experience changes in at least one key attribute—location, type, VPN status, carrier, or ASN—every month. This dynamic nature means that accuracy isn't a static achievement but a continuous process of validation and refinement. 
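As a rough illustration of how such a monthly change rate could be computed, the sketch below compares two snapshots of per-IP attributes. The snapshot layout, the attribute names, and the sample records are all assumptions made for this example; they do not describe IPinfo's actual schema or pipeline.

```python
# Hypothetical snapshot format: {ip: {attribute: value}}; not an IPinfo schema.
KEY_ATTRIBUTES = ("location", "type", "vpn", "carrier", "asn")


def changed_attributes(old: dict, new: dict) -> set:
    """Return the key attributes whose values differ between two records."""
    return {k for k in KEY_ATTRIBUTES if old.get(k) != new.get(k)}


def monthly_change_rate(prev: dict, curr: dict) -> float:
    """Percentage of IPs present in both snapshots with at least one change."""
    common = prev.keys() & curr.keys()
    if not common:
        return 0.0
    changed = sum(1 for ip in common if changed_attributes(prev[ip], curr[ip]))
    return 100.0 * changed / len(common)


# Invented sample data using documentation addresses.
base = {"location": "NL", "type": "hosting", "vpn": False, "carrier": None, "asn": 64500}
january = {f"192.0.2.{i}": dict(base) for i in range(1, 5)}
february = {ip: dict(rec) for ip, rec in january.items()}
february["192.0.2.2"]["vpn"] = True  # one of four IPs changes its VPN status

print(f"{monthly_change_rate(january, february):.1f}% of IPs changed")
```

Counting an IP once no matter how many attributes changed matches the "at least one key attribute" wording above.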

The Problem With Relying Solely on WHOIS Data

When IP data providers rely exclusively on WHOIS records and self-reported location data, they're building on a fundamentally flawed foundation. Let us share a real-world example that illustrates why.

Recently, we discovered a case where a server physically located in Amsterdam was being advertised as being in multiple different locations across Europe. While most IP data providers—who rely solely on WHOIS and geofeed data—reported this server in its advertised location, our active measurements revealed the truth: the server was actually in Amsterdam all along.

This is just one of many cases where IPinfo's data is more accurate than that of other leading providers. We created a web page listing IPs that our competitors place in the wrong country. It shows only a small sample, but the full list is long.

Why does this happen? WHOIS and geofeed data are voluntarily provided, unverified, and self-published. In today's world of CDNs and VPNs, server vendors often have business incentives to misrepresent their IP locations. For instance, hosting resellers can charge premium prices for "exotic" server locations while actually providing services from standard data centers in major cities.

This isn't just a theoretical problem. In one detailed case study, we analyzed an Autonomous System (AS) where the discrepancy between reported and actual locations was striking:

  • For this AS, 57% of its IP ranges had completely inaccurate location information
  • IP addresses physically located in the Netherlands were being reported across 27 different countries in WHOIS records
  • Overall, 58.93% of the analyzed IP addresses didn't match their self-reported locations

While this is just one example, it illustrates how WHOIS data can be manipulated and why relying solely on self-reported information is problematic.
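The kind of comparison behind figures like these can be sketched in a few lines. This is a minimal example under stated assumptions: the record layout (prefix, self-reported country, measured country) and the sample rows are invented for illustration, and `mismatch_stats` is a hypothetical helper, not part of any IPinfo tool.

```python
def mismatch_stats(records):
    """Compare self-reported and measured countries.

    records: iterable of (prefix, reported_country, measured_country) tuples.
    Returns the mismatch percentage and the sorted list of countries that
    appear only in the self-reported data of mismatched records.
    """
    records = list(records)
    mismatched = [r for r in records if r[1] != r[2]]
    rate = 100.0 * len(mismatched) / len(records) if records else 0.0
    reported_countries = sorted({r[1] for r in mismatched})
    return rate, reported_countries


# Invented sample: four ranges measured in the Netherlands,
# three of them self-reported elsewhere.
sample = [
    ("192.0.2.0/26", "NL", "NL"),
    ("192.0.2.64/26", "SC", "NL"),
    ("192.0.2.128/26", "PA", "NL"),
    ("192.0.2.192/26", "IS", "NL"),
]

rate, countries = mismatch_stats(sample)
print(f"{rate:.2f}% mismatched, reported across {len(countries)} other countries")
```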

When IP data providers claim near-perfect accuracy while relying solely on WHOIS data, they're not just making unverifiable claims—they're building their entire service on data that can be, and often is, intentionally misleading.

What About Geofeeds?

Recent research further underscores the limitations of relying on geofeeds and self-reported location data. Geofeeds are files published by network operators that map IP address ranges to geographic locations. These structured data feeds are meant to provide authoritative location information for IP addresses under an operator's control. While they represent an attempt to standardize location reporting, research shows they have significant limitations. 
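To make the format concrete, here is a minimal sketch of reading a geofeed. It assumes the CSV layout described in RFC 8805 (prefix, country code, region, city, postal code, with `#` marking comment lines). The sample entries use documentation prefixes and are invented for illustration, and `parse_geofeed` is a hypothetical helper name.

```python
import csv
import ipaddress
from typing import List, NamedTuple, Union


class GeofeedEntry(NamedTuple):
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    country: str
    region: str
    city: str


def parse_geofeed(text: str) -> List[GeofeedEntry]:
    """Parse geofeed CSV text, skipping comments, blank lines, and bad prefixes."""
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    entries = []
    for row in csv.reader(lines):
        # Pad short rows so missing optional fields default to "".
        fields = [f.strip() for f in row] + [""] * 4
        try:
            prefix = ipaddress.ip_network(fields[0], strict=True)
        except ValueError:
            continue  # self-published files can contain malformed prefixes
        entries.append(GeofeedEntry(prefix, fields[1].upper(), fields[2], fields[3]))
    return entries


SAMPLE = """\
# Hypothetical geofeed, for illustration only
192.0.2.0/24,NL,NL-NH,Amsterdam,
198.51.100.0/24,US,US-CA,San Francisco,
2001:db8::/32,DE,DE-BE,Berlin,
not-a-prefix,FR,,,
"""

for entry in parse_geofeed(SAMPLE):
    print(entry.prefix, entry.country, entry.city)
```

Note that the parser only checks that a line is well formed. Nothing in the file itself can confirm that a prefix is physically where the operator says it is.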

In our paper "Geofeeds: Revolutionizing IP Geolocation or Illusionary Promises?", our team partnered with independent researchers and analyzed the accuracy of geofeeds using constraint-based geolocation measurements. The initial 2023 study found that geofeeds covered 1.50% of IPv4 and 0.70% of IPv6 prefixes, a figure that has since grown to roughly 10% of IPv4 space. However, broader adoption does not guarantee accuracy: the study found that geofeeds contained erroneous information for 0.9% of client IPs, 4.0% of router IPs, and 8.5% of server IPs.

The inaccuracies stem from multiple sources: operators may forget to update their geofeeds, make mistakes when creating them, or in some cases intentionally misrepresent locations. This is particularly evident with VPN providers who use geofeeds to make IPs appear in locations where they don't actually have servers. A 2018 study analyzing proxy and VPN servers found that one-third of servers were definitely not in their advertised countries, and another third were questionable. Instead of being distributed globally as claimed, they were concentrated in countries with cheap and reliable hosting.

While geofeeds can provide hints for building IP geolocation datasets, they cannot be relied upon as ground truth due to these inaccuracies. That's why IPinfo takes a different approach. Instead of blindly trusting self-reported data like WHOIS and geofeeds, we:

  • Actively verify locations through our global probe network
  • Cross-reference multiple data sources
  • Provide evidence for our geolocation determinations
  • Maintain historical data to track changes over time

This means our customers can trust our data not because we claim it's accurate, but because we can prove why it's accurate through real-world measurements and validation.

Our Data Accuracy Process

1. Data Collection: Building the Foundation

We start by building a comprehensive foundation of IP data from over 20 different sources. We refresh 99% of our IP data every 24 hours so that we are always working with the most current information available. While other providers might stop at basic data collection, relying only on WHOIS records and geofeeds, we see this as just the beginning. We gather data from:

  • WHOIS records and RIR databases
  • BGP routes and ASN peering data
  • Reverse DNS lookups
  • Geofeeds and other public databases

2. Data Processing: From Raw Data to Intelligence

Our proprietary algorithms process over 20 terabytes of data daily, turning raw data into actionable intelligence and building sophisticated models that enable accurate predictions about IP addresses. This involves:

  • Analyzing and cross-referencing billions of data points
  • Creating dynamic probability models for IP behavior
  • Aggregating IP ranges using advanced CIDR techniques
  • Establishing domain-IP associations
  • Scoring privacy attributes and calculating confidence metrics
  • Identifying and filtering out incorrect data
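As one illustration of the range aggregation step listed above, the sketch below merges adjacent prefixes that share the same label, using Python's standard `ipaddress` module. This is a generic technique shown under simple assumptions, not IPinfo's proprietary algorithm. The input layout and the sample rows are made up, and `aggregate_by_label` is a hypothetical helper name.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_label(rows):
    """Merge prefixes per label into the smallest equivalent set of CIDR blocks.

    rows: iterable of (prefix, label) pairs, e.g. ("192.0.2.0/25", "NL").
    Returns {label: [collapsed prefix strings]}.
    """
    groups = defaultdict(list)
    for prefix, label in rows:
        groups[label].append(ipaddress.ip_network(prefix))

    result = {}
    for label, nets in groups.items():
        merged = []
        # collapse_addresses rejects mixed IP versions, so handle each separately.
        for version in (4, 6):
            same_version = [n for n in nets if n.version == version]
            merged.extend(ipaddress.collapse_addresses(same_version))
        result[label] = [str(n) for n in merged]
    return result


# Invented sample rows using documentation prefixes.
rows = [
    ("192.0.2.0/25", "NL"),
    ("192.0.2.128/25", "NL"),     # adjacent to the row above, same label
    ("198.51.100.0/25", "NL"),
    ("198.51.100.128/25", "DE"),  # adjacent, but a different label
]

print(aggregate_by_label(rows))
```

Grouping by label before collapsing keeps two neighboring ranges separate when their attributes differ, so aggregation shrinks the dataset without blurring the data.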

3. Active Measurement: Verifying Reality

We are the only IP data company that actively maps the internet in real time, calculating precise IP locations through geographic polygons with ground-truth accuracy.

Our proprietary probe network uses more than 910 servers worldwide to build precise geographic polygons by measuring round-trip times between multiple probe servers and individual IP addresses. Because a signal in fiber optic cable travels at roughly 200 km/ms and the round trip covers the path twice, each measured delay sets a maximum possible distance between a probe and the target, which defines a circular boundary around that probe server. By finding the intersection of these circles, our network narrows down the possible location area with unprecedented accuracy. This approach, a form of multilateration, is similar to how GPS determines location, but applied to IP addresses across the internet.
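The constraint idea can be sketched in a few lines of Python. This is a simplified illustration, not IPinfo's implementation: it assumes a spherical Earth, treats half of each round-trip time as the one-way delay, and uses invented probe coordinates and RTT values. The function names are hypothetical.

```python
import math

FIBER_KM_PER_MS = 200.0   # signal speed in fiber, as quoted above
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on probe-to-target distance; the RTT covers the path twice."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_feasible(candidate, measurements) -> bool:
    """True if the candidate (lat, lon) lies inside every probe's circle."""
    lat, lon = candidate
    return all(
        haversine_km(lat, lon, p_lat, p_lon) <= max_distance_km(rtt_ms)
        for p_lat, p_lon, rtt_ms in measurements
    )


# Invented measurements: (probe latitude, probe longitude, RTT in ms).
measurements = [
    (50.11, 8.68, 5.0),   # probe near Frankfurt
    (51.51, -0.13, 4.5),  # probe near London
    (48.86, 2.35, 6.0),   # probe near Paris
]

amsterdam = (52.37, 4.90)
madrid = (40.42, -3.70)
print("Amsterdam feasible:", is_feasible(amsterdam, measurements))
print("Madrid feasible:", is_feasible(madrid, measurements))
```

Each measurement can only rule locations out, never confirm one, so adding probes with low RTTs shrinks the feasible region. Real paths are rarely straight and include queuing and processing delay, which means the true distance is usually well below this bound.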

Beyond geolocation, our probe network actively scans the entire internet daily to validate and enrich IP intelligence. While competitors rely on third-party data, we independently verify our information through comprehensive active measurements, including:

  • Executing 400 billion IP measurements weekly to ensure data freshness
  • Conducting sophisticated ping and traceroute analysis for network path mapping
  • Performing detailed port scanning to detect active services
  • Validating carrier codes and connection types for accurate provider identification
  • Detecting and monitoring VPN and proxy services in real time

4. Certainty Validation: Turning Probability Into Fact

We employ comprehensive validation techniques to ensure data accuracy, including network telemetry analysis, WHOIS-based ownership confirmation, geofeed cross-checking, ground truth data comparisons, and statistical model validation. This active verification transforms theoretical data points into validated facts, ensuring our information reflects real-world conditions.

Our continuous validation process tags more than 1.4 billion IPs with over 22 meta tags, transforming probabilistic data into verified facts. Through rigorous validation and classification, we deliver unparalleled clarity in identifying key internet components across three categories:

  • Infrastructure and Hosting: We precisely identify network elements including anycast, CDN, internet exchanges, and routers, along with critical hosting services such as webservers, nameservers, resolvers, and GeoDNS.
  • Privacy and Security: Our system detects and classifies privacy-related services including proxies, relays, Tor nodes, and VPN endpoints, providing crucial intelligence about network security measures.
  • Network Services: We track communication services (mailservers, SSH), mobility solutions (mobile networks, hotspots, satellite connections), monitoring systems (crawlers, internet scanners), and specialized services like BitTorrent and cloud providers.

5. Research & Improvement: Innovating Together

Our commitment to accuracy is deeply rooted in collaboration. We believe the best solutions come from working closely with our customers and the broader internet research community. By leveraging customer feedback and global research, we push data accuracy forward, setting new industry benchmarks. This collaborative approach drives continuous improvement in several key ways:

Customer-Centric Innovation

We don't just build products and hope they work. We actively partner with our customers to understand and solve their real-world challenges. This partnership has led to numerous innovations in our data accuracy process, including enhanced VPN detection methods, the residential proxy dataset, 100% accurate country data, and more precise carrier identification. When our customers encounter edge cases or unique scenarios, we work together to develop solutions that benefit not just that customer, but our entire user base.

Academic Research Program

Through our Academic Research Program, we collaborate with leading academic institutions and researchers worldwide. This program:

  • Provides academic researchers with free access to comprehensive IP datasets
  • Supports studies in network measurement, security, and internet topology
  • Facilitates the development of new methodologies for IP data validation
  • Contributes to peer-reviewed research publications
  • Brings cutting-edge academic insights into our data accuracy process

Our research contributions include groundbreaking work that challenges long-held assumptions about IPv6 performance. While IPv6 was historically perceived as slower than IPv4, a recent study by our Head of Research, Oliver Gasser, published at the ACM Internet Measurement Conference 2024, shows this is no longer the case. Through extensive measurement of major content providers' networks, we found that IPv6 performance now closely matches IPv4, with latency differences typically under 5 milliseconds, as shown in the figure below. For companies like Google, Akamai, and Netflix, the data shows their IPv6 infrastructure has reached performance parity with IPv4. This kind of research helps us better understand evolving internet architecture and ensures our IP data accuracy process reflects real-world network behavior.

Continuous Evolution

The internet never stands still, so neither does our improvement process. We:

  • Constantly refine our datasets based on real-world feedback
  • Develop new methodologies to address emerging challenges
  • Correct inaccurate public information through validated data
  • Integrate insights from both customer experience and academic research

This multi-faceted approach to research and improvement ensures that our data accuracy isn't just maintained—it continuously advances. 

Setting New Standards

At IPinfo, we've redefined what accuracy means in IP data intelligence. While others make claims about accuracy without substantiation, we deliver evidence-based intelligence that transforms how organizations understand and use IP data. Every data point we provide is backed by real-world measurements and active verification.

Our comprehensive data accuracy process has established new benchmarks for the industry through:

  • Unmatched Data Coverage: We gather data from more than 20 different sources and refresh data for 99% of IP addresses daily, with each update validated through our global probe network, ensuring you always work with verified, current information.
  • Evidence-Based Processing: Our systems process over 140 terabytes of data weekly, transforming raw information into actionable intelligence through sophisticated validation techniques for every IP attribute.
  • Active Verification: We conduct 400 billion weekly measurements through our global probe network, providing concrete evidence for every IP location and characteristic we identify, rather than relying on unverified third-party claims.
  • Continuous Validation: Our process never stops—we constantly validate and refine our data through real-world measurements, academic research, and documented network patterns, maintaining a clear chain of evidence for all our findings.

For organizations making critical decisions based on IP intelligence, IPinfo delivers more than just data—we provide certainty backed by evidence in an increasingly complex digital landscape. Our infrastructure and accuracy processes don't just meet industry standards; they define them through verifiable, measurement-based results.

Experience the IPinfo data accuracy difference

Get instant access to IPinfo’s IP geolocation API to try out our peerless data for yourself.

About the author

Fernanda Donnini

As the product marketing manager, Fernanda helps customers better understand how IPinfo products can serve their needs.