My boss told me not to use strings to store IP addresses, no!

My boss told me not to use strings to store IP addresses, no!

[[432371]]

How to store IP address data in the database? Perhaps you have encountered this kind of problem more than once. How to store it? Isn't varchar(255) enough? Frankly speaking, in the several projects I have experienced, I have almost always encountered data fields storing IP addresses (V4, V6), and they all use variable-length strings varchar(15) to store them. Well, it feels pretty good...

In fact, I read in "High Performance MySQL 3rd Edition" a long time ago that IP addresses are a special type of data and should be converted to integer storage.

High Performance MySQL 3rd Edition

4.1.7 Special Types of Data

Some types of data do not fit neatly into built-in types. Timestamps with sub-second precision are one example; the previous section of this document demonstrates some options for storing this type of data.

Another example is an IPv4 address. People often use VARCHAR(15) columns to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. The notation that uses a decimal point to divide the address into four segments is just to make it easier for people to read. So you should store IP addresses as unsigned integers. MySQL provides INET ATON() and INET NTOA() functions to convert between these two notations.

??But the project does not involve the high-frequency query business demand for IP addresses; so, you know, our programmers' three no rules: don't read the code that has nothing to do with you, don't learn the technology that your module doesn't use, and don't touch the legacy code as long as it can run!

Until the boss looked at our project data table and asked: "Do you use strings to store IP addresses? This is not good! You should use integers."

"Boss, I understand your optimization ideas. Look at our table, there are only dozens of data (dog head)..."

Until last week, a classmate asked me how to store IP addresses in the database. He was asked about this in an interview, and I suddenly realized that it was time to record this.

1. How to store IP addresses

In MySQL, when storing IPv4 addresses, you should use 32-bit unsigned integers (UNSIGNED INT) to store IP addresses instead of using strings. The IP address stored in the UNSIGNED INT type is a 4-byte integer.

If the IP address is stored as a string, in normal format, the minimum length is 7 characters (0.0.0.0) and the maximum length is 15 (255.255.255.255), so we usually use varchar(15) to store it. At the same time, in order to let the database accurately track how much data is in the column, the database will add an extra 1 byte to store the length of the string. This makes the actual data storage cost of the IP represented by a string require 16 bytes.

This means that storing each IP address as a string would cost approximately 10 bytes of extra space per row.

If you think that having enough disk is not a problem, then I have to tell you that this will not only consume more disk space for data files, but also increase the size of index files by the same proportion if the field is indexed. Cached data will require more memory to cache data or indexes, which may push other more valuable content out of the cache. When SQL is executed to perform CRUD on the field, more CPU resources will also be consumed.

When using Oracle10g earlier, there was no related function for IP integers and strings, but there are built-in functions in MySQL to convert IP and numerical values ​​to each other.

  • INET_ATON()

Convert IP to integer.

Algorithm: multiply the first digit by 256 cubed + the second digit by 256 quadratically + the third digit by 256 firstly + the fourth digit by 256 zerothly

  • INET_NTOA()

Reverse conversion of numbers to IP

  1. SELECT INET_ATON( '127.0.0.1' );
  2.  
  3. + ------------------------+  
  4. | INET_ATON( '127.0.0.1' ) |
  5. + ------------------------+  
  6. | 2130706433 |
  7. + ------------------------+  
  8. 1 row in   set (0.00 sec)
  9.  
  10.  
  11. SELECT INET_NTOA( '2130706433' );
  12.  
  13. + -------------------------+  
  14. | INET_NTOA( '2130706433' ) |
  15. + -------------------------+  
  16. | 127.0.0.1 |
  17. + -------------------------+  
  18. 1 row in   set (0.02 sec)

If it is an IPv6 address, you can use the functions INET6_ATON() and INET6_NTOA() to convert it:

  1. mysql> SELECT HEX(INET6_ATON( '1030::C9B4:FF12:48AA:1A2B' ));
  2.  
  3. + ------------------------------------------------+  
  4. | HEX(INET6_ATON( '1030::C9B4:FF12:48AA:1A2B' )) |
  5. + ------------------------------------------------+  
  6. | 1030000000000000C9B4FF1248AA1A2B |
  7. + ------------------------------------------------+  
  8. 1 row in   set  
  9.  
  10. mysql> SELECT INET6_NTOA(UNHEX( '1030000000000000C9B4FF1248AA1A2B' ));
  11. + ----------------------------------------------------------------+  
  12. | INET6_NTOA(UNHEX( '1030000000000000C9B4FF1248AA1A2B' )) |
  13. + ----------------------------------------------------------------+  
  14. | 1030::c9b4:ff12:48aa:1a2b |
  15. + ----------------------------------------------------------------+  
  16. 1 row in   set  

Then define the database as varbinary type and allocate 128 bits of space (because IPv6 uses 128 bits, 16 bytes); or define it as char type and allocate 32 bits of space.

2. Query performance experiment of integer storage IP address

Test data, using a stored procedure to generate 1 million random IP addresses;

1. Test range query:

Convert IP to Int, query: takes 0.60s

  1. select ip_int from T where ip_int > INET_ATON( '192.0.0.0' ) and ip_int <=INET_ATON( '192.255.255.255' );
  2.  
  3. 1726 rows in   set , 1 warning (0.60 sec)
  • IP is a string, query: takes 0.63s
  1. select ip_varchar from T where ip_varchar like   '192.%' ;
  2.  
  3. 1726 rows in   set , 1 warning (0.63 sec)

2. IP accurate query:

  1. select ip_int from T where ip_int = INET_ATON( '192.168.0.0' );
  2.  
  3. 1 row in   set , 1 warning (0.00 sec)
  1. select ip_varchar from T where ip_varchar= '192.168.0.0' ;
  2.  
  3. 1 row in   set , 1 warning (0.00 sec)

The results are all obtained in 0 seconds. This can be considered as a constant index query, with no significant difference in performance.

3. After sorting out the results, we found that:

  • Range query and exact query:

The difference is not obvious when the data volume is small. If the data volume is expanded to about 10 million or 100 million rows, the range query difference is expected to be reduced to 0.5s when the data volume is 100 million rows.

  • Storage space saving:

Based on 100 million rows, theoretically, varchar can store up to 15 bytes, and the value is 4 bytes, which saves about 10 bytes*100 million, or about 1G of space.

Including the space occupied by the index file, one index can save 1G, which can save about 2G of space.

Summarize

IP address data is stored as integers (UNSIGNED INT), which uses less storage and CPU resources than string storage. In range queries with large ambiguity, storing integers does not need to worry about the number of digits in the range, making queries more intuitive and convenient.

??However, integer storage needs to be processed using specific functions such as INET_ATON and INET_NTOA, which are easy to read. The functions will also consume additional CPU. After inspection, it was found that the CPU overhead is negligible.

Therefore, if a range query is required and the amount of data is large (e.g., more than 100 million), it is better to use numerical storage of IP addresses. If only unique IP addresses are required for precise query or the amount of data is not large, it is simpler to use string operations.

<<:  These four routers are the most worth buying during the shopping festival, covering all high, medium and low end

>>:  Investigation results of South Korea's nationwide Internet outage released

Recommend

Static routing or dynamic routing, an example to make it clear!

What is routing? Routing refers to the path that ...

Ethernet Adapter Market to See Record Revenue Growth in 2022

According to a recent report released by DellOro ...

Omdia: Next-generation PON equipment market to reach nearly $8.9 billion in 2025

LightReading reported that Omdia pointed out in i...

Network Slicing "Hot Pot Theory": Same Pot, Different Dreams

In the dog days of summer, when people are "...

What is the Internet? — Talking about the development of the Internet

A brief discussion on the Internet of Things (I):...

Modernizing Configuration Management to Address Network Complexity

The expansion of network infrastructure to multip...

Customize SD-WAN to meet your needs

Why do we always think we can adopt a revolutiona...