IPinfo on Google Cloud Marketplace
This document explores how to use our IP database within the Google Cloud Platform (GCP)'s data warehouse platform, BigQuery, delivered through Google Cloud Marketplace.
IPinfo BigQuery Datasets available on Google Cloud Marketplace
Compared to our IP data being pushed to GCP’s Google Cloud Storage (GCS), data delivery via the Google Cloud Marketplace provides enhanced benefits, including:
- Data is automatically ingested into the BigQuery platform. The organization/user does not have to write custom ingestion scripts.
- IPinfo BigQuery datasets come with metadata information that helps users better monitor the data updates, freshness and health.
- IPinfo BigQuery datasets also come prepackaged with IP lookup queries and solid documentation, making using our datasets a breeze.
- BigQuery, as the data warehouse, acts as the central repository of all data. With the readily available IPinfo BigQuery dataset, IP enrichment can be done on IP data across different datasets, various storage buckets, cloud logging data, etc.
Features
IPinfo BigQuery Datasets
Currently, customers have the following options to access IPinfo data in their BigQuery environments:
- Visit the GCP Data Marketplace and leverage IPinfo’s free country + asn dataset via self serve consumption.
- Make a request in the GCP Data Marketplace if you are interested in exploring IPinfo’s broader portfolio of contextualized IP address datasets, delivered into BigQuery.
IP Lookup mechanism
IP address datasets are specialized and involve specific solutions for looking up IP metadata information via IP addresses. We highly recommend that users follow our instructions regarding IP lookup mechanisms instead of creating DIY solutions. We have researched to find the most optimal and performant solution for IP lookups.
Our UDF lookup mechanism supports both IPv4 and IPv6 addresses simultaneously.
Advanced User Defined Functions
Through requests, we can provide Advanced User Defined Functions that are extremely performant. Please note that these UDFs require preprocessing the dataset and writing some boilerplate code. However, they are extremely performant and can provide MMDB-like query speed. If you are interested in using the Advanced User Defined Functions with our free IP database, please let us know.
Bringing log data from Cloud Logging
As the IPinfo data resides in BigQuery, you can bring in log data from GCP Cloud Logging and perform IP metadata enrichment (geolocation, privacy, ASN, etc.) inside BigQuery. We recommend checking the documentation provided by Google, visit View logs routed to BigQuery.
We recommend ensuring that only distinct IP addresses are enriched instead of all the IP addresses in a log.
SELECT ip, ipinfo_free.country_asn(ip).*
FROM (SELECT DISTINCT ip FROM my_ips)
As Cloud logging supports many different sources, IPinfo's data inside BQ acts as a powerful data enrichment and analytics platform.
Our free IP database on Google Marketplace
IPinfo's free IP to Country ASN database is available for free through Google Marketplace. You can bring reliable IP data into your BQ environment within seconds. To learn more about the free IP to Country ASN database itself, see its documentation.
The IPinfo IP to Country ASN can be acquired in two ways:
- Via the Google Cloud Marketplace (Redirects to Analytics Hub)
- Via Google Cloud Platform Analytics Hub
Resources included
The IPinfo free IP to Country ASN BQ dataset contains the following resources:
ipinfo__free_ip_to_country_and_asn_database
: The dataset (We recommend changing the dataset name toipinfo_free
)- Routines: Contains UDFs and helper functions
country_asn
: UDF for the IP lookup query
country_asn
: Core IP to Country ASN databaselog_data
: 1000 sample IP addresses with 500 IPv4 and 500 IPv6 addresses.
- Routines: Contains UDFs and helper functions
Using the country_asn
UDF
We package standard IP lookup UDFs with our BigQuery datasets. The UDF is available in the dataset's routine
directory with the same name as the IP data table. Using the UDF requires no custom coding or writing your functions.
SELECT ip, ipinfo_free.country_asn(ip).*
FROM ipinfo_free.log_data
Code breakdown:
- The
log_data
table contains IP addresses from traffic logs. Theip
columns hold all the IP addresses. - The
country_asn
is a UDF packaged with the IP to Location IPinfo BigQuery dataset. Note that both the dataset and the UDF usually have the same name. However, the brackets note the UDF as a function()
. .*
can return all the columns from the IP data table or even specific columns. For example, for city names,country_asn(ip).city
, for ASN,country_asn(ip).asn
.
Query performance
-- 1M IPv4
SELECT ip, `ipinfo-public`.ipinfo_free.ip_country_asn(ip).*
FROM (SELECT * FROM log WHERE ip NOT LIKE '%:%' LIMIT 1000000);
-- Elapsed: 9s
-- Slot time: 4m4s
-- Shuffled: 3.7GB
-- 1M IPv6
SELECT ip, `ipinfo-public`.ipinfo_free.ip_country_asn(ip).*
FROM (SELECT * FROM log WHERE ip LIKE '%:%' LIMIT 1000000);
-- Elapsed: 20s
-- Slot time: 5m42s
-- Shuffled: 3.83MB
-- 10M IPv6
SELECT ip, `ipinfo-public`.ipinfo_free.ip_country_asn(ip).*
FROM (SELECT * FROM log WHERE ip LIKE '%:%' LIMIT 10000000);
-- Elapsed: 5m30s
-- Slot time: 2h25m
-- Shuffled: 12.2GB
-- 100M IPv6
SELECT ip, `ipinfo-public`.ipinfo_free.ip_country_asn(ip).*
FROM (SELECT * FROM log WHERE ip LIKE '%:%' LIMIT 100000000);
-- Elapsed: 45m
-- Slot time: 1d18h
-- Shuffled: 91GB
Please note that our Advanced User Defined Functions are far more performant than our standard UDFs. They can be delivered upon request.
We welcome your feedback and if you have any feature requests or need support using the BQ IPinfo dataset, please create a post in our IPinfo Community.