How to store IP address data in the database? Perhaps you have encountered this kind of problem more than once. How to store it? Isn't varchar(255) enough? Frankly speaking, in the several projects I have experienced, I have almost always encountered data fields storing IP addresses (V4, V6), and they all use variable-length strings varchar(15) to store them. Well, it feels pretty good... In fact, I read in "High Performance MySQL 3rd Edition" a long time ago that IP addresses are a special type of data and should be converted to integer storage. High Performance MySQL 3rd Edition 4.1.7 Special Types of Data Some types of data do not fit neatly into built-in types. Timestamps with sub-second precision are one example; the previous section of this document demonstrates some options for storing this type of data. Another example is an IPv4 address. People often use VARCHAR(15) columns to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. The notation that uses a decimal point to divide the address into four segments is just to make it easier for people to read. So you should store IP addresses as unsigned integers. MySQL provides INET ATON() and INET NTOA() functions to convert between these two notations. ??But the project does not involve the high-frequency query business demand for IP addresses; so, you know, our programmers' three no rules: don't read the code that has nothing to do with you, don't learn the technology that your module doesn't use, and don't touch the legacy code as long as it can run! Until the boss looked at our project data table and asked: "Do you use strings to store IP addresses? This is not good! You should use integers." "Boss, I understand your optimization ideas. Look at our table, there are only dozens of data (dog head)..." Until last week, a classmate asked me how to store IP addresses in the database. He was asked about this in an interview, and I suddenly realized that it was time to record this. 1. How to store IP addressesIn MySQL, when storing IPv4 addresses, you should use 32-bit unsigned integers (UNSIGNED INT) to store IP addresses instead of using strings. The IP address stored in the UNSIGNED INT type is a 4-byte integer. If the IP address is stored as a string, in normal format, the minimum length is 7 characters (0.0.0.0) and the maximum length is 15 (255.255.255.255), so we usually use varchar(15) to store it. At the same time, in order to let the database accurately track how much data is in the column, the database will add an extra 1 byte to store the length of the string. This makes the actual data storage cost of the IP represented by a string require 16 bytes. This means that storing each IP address as a string would cost approximately 10 bytes of extra space per row. If you think that having enough disk is not a problem, then I have to tell you that this will not only consume more disk space for data files, but also increase the size of index files by the same proportion if the field is indexed. Cached data will require more memory to cache data or indexes, which may push other more valuable content out of the cache. When SQL is executed to perform CRUD on the field, more CPU resources will also be consumed. When using Oracle10g earlier, there was no related function for IP integers and strings, but there are built-in functions in MySQL to convert IP and numerical values to each other.
Convert IP to integer. Algorithm: multiply the first digit by 256 cubed + the second digit by 256 quadratically + the third digit by 256 firstly + the fourth digit by 256 zerothly
Reverse conversion of numbers to IP
If it is an IPv6 address, you can use the functions INET6_ATON() and INET6_NTOA() to convert it:
Then define the database as varbinary type and allocate 128 bits of space (because IPv6 uses 128 bits, 16 bytes); or define it as char type and allocate 32 bits of space. 2. Query performance experiment of integer storage IP addressTest data, using a stored procedure to generate 1 million random IP addresses; 1. Test range query: Convert IP to Int, query: takes 0.60s
2. IP accurate query:
The results are all obtained in 0 seconds. This can be considered as a constant index query, with no significant difference in performance. 3. After sorting out the results, we found that:
The difference is not obvious when the data volume is small. If the data volume is expanded to about 10 million or 100 million rows, the range query difference is expected to be reduced to 0.5s when the data volume is 100 million rows.
Based on 100 million rows, theoretically, varchar can store up to 15 bytes, and the value is 4 bytes, which saves about 10 bytes*100 million, or about 1G of space. Including the space occupied by the index file, one index can save 1G, which can save about 2G of space. SummarizeIP address data is stored as integers (UNSIGNED INT), which uses less storage and CPU resources than string storage. In range queries with large ambiguity, storing integers does not need to worry about the number of digits in the range, making queries more intuitive and convenient. ??However, integer storage needs to be processed using specific functions such as INET_ATON and INET_NTOA, which are easy to read. The functions will also consume additional CPU. After inspection, it was found that the CPU overhead is negligible. Therefore, if a range query is required and the amount of data is large (e.g., more than 100 million), it is better to use numerical storage of IP addresses. If only unique IP addresses are required for precise query or the amount of data is not large, it is simpler to use string operations. |
>>: Investigation results of South Korea's nationwide Internet outage released
1. VXLAN Overview 1. What is VXLAN VXLAN (Virtual...
What is routing? Routing refers to the path that ...
According to a recent report released by DellOro ...
LightReading reported that Omdia pointed out in i...
In the dog days of summer, when people are "...
A brief discussion on the Internet of Things (I):...
The expansion of network infrastructure to multip...
Last time we shared the news of V.PS Hong Kong ne...
Recently, we have received a lot of product or pr...
Why do we always think we can adopt a revolutiona...
Recently, Cisco announced the launch of a new IT ...
NVIDIA today announced a 6G research platform tha...
SK Telecom, Korea Telecom and LG Uplus have teame...
DogYun is a Chinese hosting company established i...
As the year draws to a close, the smartphone indu...