Preface
Hello, everyone. I am Dasai Ge. It has been a long time since we last met, and I have missed you all. Recently, because of a small request, I studied the encryption used by two login pages and successfully worked out the encrypted parameters, so I would like to share the process here.

Some time ago, the campus network connection on a classmate's laboratory server kept dropping, and he asked whether there was any way to reconnect. I was busy at the time and had not done this kind of work for a long time, so I did not look into it. Yesterday I had some free time and finally analyzed it. The process may be a piece of cake for people with the basics, but if you have never looked into it, give it a try; it may be useful later. Once you have learned one such case, the next one comes easily.

This topic belongs to advanced crawling: JS decryption. Of course, with today's varied packaging methods, anti-crawling techniques are becoming more and more sophisticated, and many websites, especially those with commercial data, are really difficult to deal with.

Most websites require permission authentication and management. Many pages and operations can only be accessed after authentication, and login is the most critical step in authentication. If we want unimpeded access to a site, most of the time we need to be logged in. Login is the first problem that many crawler programs have to solve, and it is often the most complicated and difficult part of the entire crawler. Only after logging in can the program do its work.

There are two login situations: the credentials are sent in plain text, or some of the parameters are encrypted. The first rarely occurs in practice, although the logins we wrote as students were implemented this way. Sending plain text is not very safe, so most logins and requests encrypt some parameters. If we want a program to simulate such a login, we need to understand how each parameter is formed, then reproduce the generation and send the request ourselves.
Of course, the most difficult problem in login is the verification code (CAPTCHA), which I have not studied due to my limited ability, so I will not explain it here. Most websites do not show a verification code as long as the error rate is not high, so this approach still works in most scenarios. The campus network requires a successful login over HTTP before you can access the Internet, and the password parameter of that login is encrypted. Below I will share a small analysis based on my own environment.

Analysis
With the introduction out of the way, let's get straight to the point. On a campus network, we connect to its Wi-Fi or plug in a network cable, which puts us on the campus local area network. Network traffic costs money, so if you try to reach the external network without authentication, you will not be able to access external services. Only after successfully logging in to the campus network platform can you access the Internet.

There are many ways to log in, so the first step is to observe how the login behaves. I roughly divide logins into two types: ordinary form login and Ajax dynamic login. How do you tell them apart? It is very simple: check whether the URL changes when you log in. On my school's campus network login page the URL stays the same after logging in, so this is an Ajax login.

Is there much difference between the two? Not really, but an Ajax login generally does not require a professional packet capture tool, while some form logins involve redirects and new pages that the browser may not capture well, so you may need tools such as Fiddler or Wireshark to capture the packets. First, press F12 in the browser, open the Network panel, and select the XHR filter.
If you select All, you get too much content, although occasionally a small part of the data may be hidden under JavaScript (normally not). Doc is generally the main page; for an ordinary form login, you need to look at the Doc request. After clicking login, you can see the content of each request. On this page there is a login request and, above it, a getchallenge request.

First, click the login request to view the parameters it carries. It has three: username, password, and an unknown challenge. Looking at the getchallenge request above it, I found that its response does indeed contain a challenge parameter. In other cases, a parameter may be embedded directly in the page or generated dynamically through encryption; you have to analyze that yourself. From this you can see that we only need to work out how the password is encrypted (people who are sensitive to data may already have guessed which algorithm it is).

Once you know which parameter needs to be resolved, there are generally two ways to start. The first is to use the browser's Elements panel to locate the login button, then use global search to find where it is used in the JS and debug from there. In practice it is hard to find anything useful this way, because you cannot tell at which point the values you typed in get encrypted, so I do not recommend it. The second is to search directly for the parameter names. There are three parameters, username, password, and challenge, and any of them can be searched, as can words such as login. Here I searched for password to see where it is used. Eventually I found the login logic: the password appears to be encrypted by the createChapPassword method. We set a breakpoint there and then click login.
The program stops at the breakpoint, and at this point our account and password are still in plain text, which means the data has not been encrypted yet. This is where we start sorting out the logic. Step into the function, and you will find that the core content is here.
I will explain the logic here. If anything is unclear, a search engine will help. First, an id is generated from a random number; when reproducing this in another language, you can simply pick a fixed value. Then Ajax sends a request to get the challenge parameter. A string str is built starting with the Unicode character whose code point is the id; the challenge is then split into pairs of hex digits, and each pair is appended as the Unicode character with that hexadecimal code point. str is hashed with MD5, and the final value is pieced together from the returned result. So the encryption logic of the parameter is all here, and we just need to reproduce it.

Logic Reproduction
In fact, reproducing the logic was not that simple. I carefully checked every earlier step against the values in the browser and found no problem. However, when I computed the MD5 in Python, the result did not match the MD5 produced on the front end. This problem took a long time to investigate and wasted a lot of time, so I would like to share the process.

What was going on? In most programming languages, a string must first be encoded into bytes before MD5 is applied. The most common encoding is UTF-8, and online MD5 tools gave the same result as the Python library. I then UTF-8 encoded the characters, pasted the encoded string into the browser console, and hashed it with the front-end function. I was shocked: the result matched the Python result (the 33c9 string). This means that the jQuery MD5 library does not encode the characters as UTF-8 but uses some other method, and we need to find that method and implement it in our own language.

After several attempts and searches, I finally found the encoding: ISO-8859-1. I first ran into this encoding long ago, when downloading files from a Java Web server produced garbled Chinese file names, and I have rarely had to re-encode anything since. After switching to this encoding, I finally printed out the result we wanted.
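The steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual code: the function name `create_chap_password` is borrowed from the front-end method, the password is assumed to sit between the id character and the challenge characters (as in standard CHAP), and the final value is assumed to be the id as two hex digits followed by the MD5 digest. Check both assumptions against the JavaScript you see in the debugger.

```python
import hashlib


def create_chap_password(password: str, challenge_hex: str, chap_id: int = 1) -> str:
    """Hypothetical re-creation of the front-end createChapPassword logic."""
    # Start with the character whose code point equals the id.
    text = chr(chap_id)
    # ASSUMPTION: the password is appended next, as in standard CHAP.
    text += password
    # Each pair of hex digits in the challenge becomes one character.
    for i in range(0, len(challenge_hex), 2):
        text += chr(int(challenge_hex[i:i + 2], 16))
    # Key point: one byte per character (ISO-8859-1), not UTF-8.
    digest = hashlib.md5(text.encode("iso-8859-1")).hexdigest()
    # ASSUMPTION: the result is the id as two hex digits plus the digest.
    return f"{chap_id:02x}{digest}"


def md5_hex(text: str, encoding: str) -> str:
    """MD5 of a string after encoding it with the given codec."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


# The same string hashed under both encodings gives different digests,
# because characters above 0x7F become two bytes in UTF-8.
sample = chr(1) + "secret" + chr(0xA1) + chr(0xB2)
latin1_digest = md5_hex(sample, "iso-8859-1")
utf8_digest = md5_hex(sample, "utf-8")
print("ISO-8859-1:", latin1_digest)
print("UTF-8:     ", utf8_digest)
```

Note that a password containing characters outside ISO-8859-1 (for example Chinese characters) would raise `UnicodeEncodeError` in this sketch; how the front end treats such input would need to be checked separately.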
Now it is all done. Wrap the code up and give it a try. Here I use Python, but the same can be done in Java. I use the Session of the requests module (which keeps cookies automatically). The code is not usable as-is outside my environment, so it is best compared against the front-end JavaScript logic described earlier.
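A minimal sketch of the whole flow might look like this. Everything specific here is an assumption for illustration: the endpoint paths `/getchallenge` and `/login` are taken from the request names seen in the browser, the challenge is assumed to come back as a JSON field named `challenge`, and the password layout (id, then password, then challenge bytes, prefixed with the id in hex) is the standard CHAP-style guess rather than a confirmed detail. The `session` argument is any object with `get` and `post` methods, such as `requests.Session()`.

```python
import hashlib


def chap_password(password: str, challenge_hex: str, chap_id: int = 1) -> str:
    # One byte per character, equivalent to encoding the string as ISO-8859-1.
    # ASSUMPTION: layout is id byte + password + challenge bytes.
    raw = bytes([chap_id]) + password.encode("iso-8859-1") + bytes.fromhex(challenge_hex)
    # ASSUMPTION: the id is prepended to the digest as two hex digits.
    return f"{chap_id:02x}" + hashlib.md5(raw).hexdigest()


def login(session, base_url: str, username: str, password: str):
    """Fetch a challenge, build the encrypted password, and post the login."""
    # ASSUMPTION: hypothetical path and JSON field name.
    challenge = session.get(f"{base_url}/getchallenge").json()["challenge"]
    payload = {
        "username": username,
        "password": chap_password(password, challenge),
        "challenge": challenge,
    }
    # The session keeps any cookies set by the first request.
    return session.post(f"{base_url}/login", data=payload)
```

With the requests library installed, this could be called as `login(requests.Session(), "http://<portal address>", "user", "pass")`, replacing the placeholder with the address of your own login page.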
Ready for launch: there was no Internet access at first, but after logging in, the network came back. It seems our result was successful.

Summary
This problem is not complicated for veterans, but it may be novel and interesting for many people. Of course, these days it is fine to play with crawlers on your own, but large-scale crawling of commercial or private data can be risky. I will share some pages with traditional login methods when I have the chance.

This small encryption analysis is easy to reproduce, but I was stuck for a long time on the encoding issue. In the final analysis, the cause was a weak foundation: I did not have a solid grasp of these encryption algorithms and the low-level basics, and I wasted a lot of time. Many experts can guess the encryption from a glance at a string, or tell which encoding a piece of data uses. Fortunately, this demo has filled in that blind spot for me.