How to solve the problem of cross-border DNS resolution failure?

Problem

The company runs its infrastructure on Alibaba Cloud and uses Akamai, an overseas provider, as the DNS resolution service for its domain names.

Some of our applications are called by third-party applications, and some call third-party applications themselves. Recently, many of these calls have failed.

Application call failed:

GitLab pull failed:

Troubleshooting

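1. Run a rotating packet capture on the ECS instance and modify the resolv.conf configuration

Packet capture command: tcpdump -i any -s 0 -C 100 -W 50 -w /tmp/dns.pcap port 53 and host [domain name]

Note that the host filter matches the source and destination addresses of packets, so for DNS traffic on port 53 it should name the DNS server being queried.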

Parameter Description:

-i: the network interface to capture on. Use -i any to capture on all interfaces.

-s: the number of bytes to capture from each packet. Older tcpdump versions capture only the first 96 bytes by default, while current versions default to 262144 bytes. Use -s number to set the length explicitly; a value of 0 selects the maximum, so the entire packet is captured.

-C: file-size. Before writing each packet, tcpdump checks whether the current file is larger than file-size. If it is, tcpdump closes the file and opens a new one to continue recording. The new files take the name given with the -w option followed by a number, starting at 1 and counting upward. The unit of file-size is millions of bytes (1,000,000 bytes, not 1,048,576 bytes). Here each file is limited to about 100 MB.

-W: used together with -C, limits the number of files and overwrites them from the beginning once the limit is reached, which produces a rotating buffer. Here the capture keeps at most 50 files.

-w: the path and name of the output file. If no path is given, the file is written to the current working directory.
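As a rough check on the rotation settings above, the following sketch assembles the same tcpdump arguments in Python and estimates how much disk space the ring of capture files can occupy. The function names and the placeholder server address are illustrative assumptions, not part of the original setup.

```python
import shlex


def build_capture_command(host, file_mb=100, file_count=50, out="/tmp/dns.pcap"):
    """Return the tcpdump argument list for a rotating DNS capture."""
    return [
        "tcpdump", "-i", "any", "-s", "0",
        "-C", str(file_mb),      # rotate after roughly this many million bytes
        "-W", str(file_count),   # keep at most this many files
        "-w", out,
        "port", "53", "and", "host", host,
    ]


def approx_max_capture_bytes(file_mb=100, file_count=50):
    """Approximate disk use of the ring: -C counts units of 1,000,000 bytes."""
    return file_mb * 1_000_000 * file_count


if __name__ == "__main__":
    # 192.0.2.53 is a documentation-range placeholder address (assumption).
    print(shlex.join(build_capture_command("192.0.2.53")))
    print(approx_max_capture_bytes() / 1e9, "GB (approximate upper bound)")
```

With the values used in this article (100 and 50), the ring can grow to roughly 5 GB, so /tmp needs at least that much free space.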

Default resolv.conf configuration:

options timeout:2 attempts:3 rotate single-request-reopen

#The rotate option spreads queries across all listed nameservers in round-robin order instead of always trying the first one.

nameserver 100.100.x.xxx

nameserver 100.100.x.xxx

2. Run the git pull command repeatedly and wait for the error to appear

3. Confirm the DNS egress IP address

Run the following command several times. Each answer shows the IP address of the resolver that contacted Akamai: dig whoami.ds.akahelp.net txt +short

"ns" "106.xx.xxx.8"

"ns" "106.xx.xxx.8"

"ns" "106.xx.xxx.7"

"ns" "106.xx.xxx.6"

"ns" "106.xx.xxx.6"

"ns" "106.xx.xxx.7"

"ns" "106.xx.xxx.1"

"ns" "106.xx.xxx.6"

"ns" "106.xx.xxx.8"

"ns" "106.xx.xxx.6"

"ns" "106.xx.xxx.6"

"ns" "106.xx.xxx.7"

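To see at a glance how many distinct egress addresses are in use, the output above can be tallied. The sketch below is a minimal example that counts the masked addresses exactly as they appear in this article; the function name count_egress_ips is an assumption for illustration.

```python
from collections import Counter


def count_egress_ips(dig_output):
    """Count how often each resolver IP appears in `dig ... txt +short` output."""
    counts = Counter()
    for line in dig_output.splitlines():
        parts = line.replace('"', " ").split()
        # Expected shape of each line: "ns" "<ip>"; anything else is skipped.
        if len(parts) == 2 and parts[0] == "ns":
            counts[parts[1]] += 1
    return counts


# The twelve answers recorded above, with the addresses masked as in the article.
SAMPLE = "\n".join(
    '"ns" "106.xx.xxx.%s"' % last
    for last in ["8", "8", "7", "6", "6", "7", "1", "6", "8", "6", "6", "7"]
)

if __name__ == "__main__":
    for ip, seen in count_egress_ips(SAMPLE).most_common():
        print(ip, seen)
```

In this sample, the twelve queries left through four different egress addresses.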
4. Check on the cloud vendor's side to find the cause of the problem

The cause is as follows:

a. First, Alibaba Cloud's DNS service does not keep a permanent copy of these records. It only caches an answer until its TTL expires.

b. When a user or application initiates domain name resolution, the request goes to the Alibaba Cloud DNS server.

c. If the Alibaba Cloud DNS server already holds the requested record and its TTL has not expired, it returns the result directly.

d. Otherwise, the Alibaba Cloud DNS server queries Akamai overseas for the record. Because the network path from China to overseas is subject to fluctuations, some of these requests fail.
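The resolution flow described above can be sketched as a toy caching resolver. This is a simplified model for illustration only, not Alibaba Cloud's actual implementation; the class name CachingResolver, the upstream callable, and the example address are all assumptions.

```python
import time


class CachingResolver:
    """Toy recursive resolver: answer from cache while the TTL is valid, else ask upstream."""

    def __init__(self, upstream, clock=time.monotonic):
        self._upstream = upstream  # callable: name -> (address, ttl_seconds); may raise
        self._clock = clock
        self._cache = {}           # name -> (address, expires_at)

    def resolve(self, name):
        entry = self._cache.get(name)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                  # record cached and TTL not expired
        address, ttl = self._upstream(name)  # cache miss: cross-border query, may fail
        self._cache[name] = (address, self._clock() + ttl)
        return address


if __name__ == "__main__":
    def overseas_lookup(name):
        # Hypothetical stand-in for the query to the overseas DNS provider.
        return ("203.0.113.10", 60)

    resolver = CachingResolver(overseas_lookup)
    print(resolver.resolve("app.example.com"))  # goes upstream
    print(resolver.resolve("app.example.com"))  # served from cache
```

The model shows why the failures are intermittent: only lookups that miss the cache depend on the cross-border path, so an error surfaces only when a record's TTL has expired and the upstream query happens to fail.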

Temporary solution

We used two temporary solutions until a long-term solution could be implemented.

1. For an A record, temporarily add a static entry to the hosts file.

2. For a CNAME or other record type, temporarily resolve it through Alibaba Cloud's PrivateZone intranet DNS resolution service.

Long-term solution

To solve this problem in the long term, we plan to move the domain name resolution service to Alibaba Cloud's cloud resolution service, so that resolution from within China is reliable. The applications also need two changes:

1. Add a retry mechanism for failed calls, for example three retries after a failure, with an interval of 3 seconds between them.

2. Add a compensation mechanism for calls that still fail after the retries. The business owner needs to define the specific compensation rules.
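The two application changes can be combined in one small helper. The following is a minimal sketch, assuming that any exception counts as a failed call and that compensation is a callback supplied by the business owner; the names call_with_retry and on_give_up are hypothetical.

```python
import time


def call_with_retry(func, retries=3, interval=3.0, on_give_up=None, sleep=time.sleep):
    """Call func(); on failure retry up to `retries` times, `interval` seconds apart.

    If every attempt fails, hand the last error to `on_give_up` (the
    compensation hook) when one is provided, otherwise re-raise it.
    """
    last_error = None
    for attempt in range(retries + 1):  # one initial call plus `retries` retries
        if attempt > 0:
            sleep(interval)             # wait before each retry
        try:
            return func()
        except Exception as exc:        # assumption: any exception is a failed call
            last_error = exc
    if on_give_up is not None:
        return on_give_up(last_error)   # compensation rules live in this callback
    raise last_error


if __name__ == "__main__":
    def unreliable_call():
        # Hypothetical stand-in for a call that fails on DNS resolution.
        raise ConnectionError("name resolution failed")

    result = call_with_retry(
        unreliable_call,
        interval=0.1,  # shortened for the demo; the article suggests 3 seconds
        on_give_up=lambda err: "queued for compensation: %s" % err,
    )
    print(result)
```

A fixed interval matches the example in the text. Real applications may prefer to catch only network-related exceptions, so that programming errors are not retried.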
