Reconnect the campus network after it is disconnected. Use crawlers to fix it!


Preface

Hello, everyone. I am Dasai Ge (brother). It’s been a long time since we last met. I miss you so much.

Recently, because of a small request, I studied the encryption used by two logins and worked out how their encrypted parameters are generated. I would like to share the process with you here.

Some time ago, the campus network connection on a classmate's laboratory server kept dropping, and he asked whether there was any way to reconnect automatically.

I was busy at the time and had not done this kind of thing for a long while, so I did not look into it. Yesterday I had some free time and analyzed it. This process may be a piece of cake for people with the basics, but if you have never looked at it, give it a try; it may be useful later. Once you have learned one case, the next one comes more easily.

This content falls under advanced crawling: JS decryption, that is, working out what the front-end JavaScript does to a parameter. Of course, as front-end code is packaged in more and more varied ways, anti-crawling measures are also becoming more sophisticated. Many websites, especially those with commercial data, are really difficult to deal with.

Most websites have permission authentication and management. Many pages and operations can only be accessed after authentication, and login is the most critical step of authentication. If we want to move freely through a site, most of the time we need to be logged in as a user. Login is the first problem many crawler programs have to solve, and it is often the most complicated and difficult part of the whole crawler. Only after logging in can the program do its work.

There are two login situations: either the parameters are sent in plain text, or some of them are encrypted before being sent. The first rarely occurs in practice, although the logins we wrote as students were implemented this way. Plain text is not very safe, so most logins and requests encrypt some parameters. If we want a program to simulate such a login, we need to understand how each parameter is produced, then generate and send it ourselves. Of course, the hardest problem in login is the verification code (CAPTCHA), which I have not studied due to my limited ability, so I will not cover it here. Most websites do not show a verification code as long as the number of failed attempts stays low, so you can still try this in most scenarios.

On the campus network, you must log in successfully over HTTP before you can access the Internet, and the password parameter of the login request is encrypted. Below I will share a small analysis based on my own environment.

Analysis

Having introduced so much above, let’s get straight to the point and start analyzing this issue.

On the campus network, once we connect to its Wi-Fi or plug in a network cable, we are inside the campus local area network. Network traffic costs money, so when you try to reach the external network without authentication, you cannot access external services. Only after logging in to the campus network platform can you access the Internet.

There are many ways to log in nowadays, so the first step is to observe how the login behaves. I roughly divide logins into two types: ordinary form login and Ajax dynamic login.

How to distinguish between the two?

It's very simple: just check whether the URL changes when you log in.

In my case, the URL of the school's campus network login page stays the same after logging in, so this is an Ajax login.

Is there any difference between the two? Not much. Ajax logins generally do not require professional packet capture tools, while some form logins involve redirects and new pages, and the browser may not capture the relevant information well, so you may need tools such as Fiddler or Wireshark to capture the packets.

First, open the browser's developer tools (F12), switch to the Network tab, and select the XHR filter. If you select All, there is too much content. A small part of the data may be hidden under JS (normally not). Doc is generally the main page; for an ordinary form login, look at the Doc requests.

After clicking login, you can see each request that was sent. On this page there is a login request, and above it a getchallenge request. First, click the login request to view the parameters it carries.

This request has three parameters: username, password, and an unknown challenge. Since there is a getchallenge request above it, I looked at that response and found that it does return a challenge value. In other cases, such a parameter may already exist in the page or may be generated dynamically through encryption; you have to analyze that yourself. Here, the only thing left to work out is how the password is encrypted (people who are sensitive to data may already have guessed which algorithm it is).

Now that you know which parameter needs to be worked out, there are generally two ways to start. The first is to locate the login button in the browser's Elements panel, use global search to find where it is used in the JS, and debug the logic from there. In practice it is often hard to find anything useful this way, because you cannot tell at which point after you fill in the form the parameters have already been encrypted, so I do not recommend it.

The second is to search for the parameters directly. There are three: username, password, and challenge. Here I searched for password to see where it is used; you can also search for words such as login. Eventually I found the login logic, where the password appears to be encrypted by the createChapPassword method.

We set a breakpoint there and click login. The program stops at the breakpoint, and at this point the account and password are still in plain text, which means the data has not been encrypted yet. From here we start to sort out the logic.

Step into the function, and you will find that the core logic is here.

    var createChapPassword = function (password) {
        var id = '';
        var challenge = '';
        var str = '';

        id = Math.round(Math.random() * 10000) % 256;

        $.ajax({
            type: 'POST',
            url: globalVar.io_url + 'getchallenge',
            dataType: 'json',
            timeout: 5000,
            cache: false,
            async: false,
            success: function (resp) {
                if (resp && (resp.reply_code != null) && (resp.reply_code == 0)) challenge = resp.challenge;
            }
        });

        str += String.fromCharCode(id);
        str += password;

        for (i = 0; i < challenge.length; i += 2) {
            var hex = challenge.substring(i, i + 2);
            var dec = parseInt(hex, 16);
            str += String.fromCharCode(dec);
        }

        var hash = $.md5(str);

        chappassword = ((id < 16) ? "0" : "") + id.toString(16) + hash;

        return { password: chappassword, challenge: challenge };
    };

I will explain the logic here to you. If you don’t understand, just use a search engine to search for it.

First, an id is generated from a random number; it is an integer from 0 to 255. When reproducing the logic in another language, you can simply use a fixed value.

Then an Ajax request fetches the challenge parameter. The string str is built in three parts: the character whose code is the id, then the plain-text password, then the challenge, which is read two hexadecimal digits at a time, each pair being converted to the character with that code.

Finally, str is hashed with MD5, and the returned password is the id written as two hexadecimal digits (zero-padded when the id is below 16) followed by the hash. That is all of the logic behind the parameter, and we just need to reproduce it.
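The steps above can be sketched in Python as follows. This is only a minimal sketch of my reading of the JavaScript: the function name `create_chap_password` and its parameters are made up for illustration, the challenge is passed in rather than fetched, and the string is turned into bytes with ISO-8859-1 (one byte per character), which assumes the password contains only characters in that range.

```python
import hashlib


def create_chap_password(password: str, challenge: str, chap_id: int) -> str:
    """Mirror the front-end createChapPassword logic for a given id (0..255)."""
    # str = character for the id + plain password + decoded challenge
    s = chr(chap_id) + password
    # Read the challenge two hex digits at a time, one character per pair.
    for i in range(0, len(challenge), 2):
        s += chr(int(challenge[i:i + 2], 16))
    # Assumption: one byte per character, hence ISO-8859-1 rather than UTF-8.
    digest = hashlib.md5(s.encode("ISO-8859-1")).hexdigest()
    # Two hex digits for the id, zero-padded like the JavaScript, then the hash.
    return format(chap_id, "02x") + digest


print(create_chap_password("abc", "00ff10", 5))
```

The result is always 34 characters long: two for the id and 32 for the MD5 hash.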

Logic Reproduction

In fact, reproducing the logic was not that simple. I reproduced it step by step and compared each intermediate value with the browser, and everything before the hash matched. However, when I computed the MD5 in Python, the result differed from the MD5 produced by the front end.

This problem really took a long time to investigate and wasted a lot of time. I would like to share the process with you.

What was going on? In most programming languages, a string must first be encoded into bytes before MD5 is applied. The most common encoding is UTF-8, and the results from online MD5 tools matched the result of the Python library.

Then I printed the UTF-8 encoded form of the string in the console and hashed that encoded string in the browser console. I was shocked! The result matched the reference result (the hash containing 33c9).

This means that the jQuery MD5 library does not encode the characters as UTF-8 but handles them in some other way. We need to find that way and implement it in our own language. After several attempts and searches, I finally found an encoding that works:

ISO-8859-1

I first came across this encoding long ago, when I downloaded files from a Java Web server and the Chinese file names came out garbled; I have rarely needed to re-encode anything since. After switching to this encoding, the program finally printed the result we wanted.

    ª124412ðRkhìy'LŒÁZosõ
    b'\xc2\xaa124412\xc3\xb0Rkh\xc3\xacy\xc2\x92L*\x08\xc2\x8c\xc3\x81Zos\xc3\xb5'
    hash 297ad4844ee638891233c9ca65df4d9c
    chappasword aa297ad4844ee638891233c9ca65df4d9c
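To see why the choice of encoding changes the hash, here is a small illustration. ISO-8859-1 maps every character from U+0000 to U+00FF to the single byte with the same value, while UTF-8 uses two bytes for anything above 0x7F. The sample string below is made up for the demonstration (it is not the real login string), so the hashes it prints are not related to the output above.

```python
import hashlib

# Sample string with two characters above 0x7F (illustrative only).
s = chr(0xAA) + "124412" + chr(0xF0)

utf8_bytes = s.encode("utf-8")          # characters above 0x7F take two bytes each
latin1_bytes = s.encode("ISO-8859-1")   # every character becomes exactly one byte

print(len(s), len(utf8_bytes), len(latin1_bytes))  # 8 10 8
print(hashlib.md5(utf8_bytes).hexdigest())
print(hashlib.md5(latin1_bytes).hexdigest())
```

The two byte sequences differ, so the two MD5 results differ as well. Only the one-byte-per-character form corresponds to a string built with String.fromCharCode from values below 256.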

Now it's all done. Wrap up the code and give it a try. I implemented it in Python here, but Java would work the same way. I use a session from the requests module (it keeps cookies automatically). The code cannot be used directly in your environment, so treat it as something to compare against the front-end JavaScript logic above.

    import hashlib

    import requests

    BASE_URL = 'http://m.njust.edu.cn'

    # Request headers, copied from the login request captured in the browser
    # (accepted response types, browser information and so on).
    HEADERS = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
        'connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Host': 'm.njust.edu.cn',
    }


    def get_challenge(session):
        # Visit the portal page first so that the session picks up its cookies.
        session.get(BASE_URL + '/portal/index.html')
        resp = session.post(BASE_URL + '/portal_io/getchallenge')
        return resp.json()['challenge']


    def build_str(chap_id, password, challenge):
        # chr(id) + password + challenge decoded two hex digits at a time
        s = chr(chap_id) + password
        for i in range(0, len(challenge), 2):
            s += chr(int(challenge[i:i + 2], 16))
        return s


    def login(session, data):
        # Send username, encrypted password and challenge to the login interface.
        resp = session.post(BASE_URL + '/portal_io/login', data=data, headers=HEADERS)
        print(resp.text)


    if __name__ == '__main__':
        chap_id = 162  # the JavaScript picks a random value from 0 to 255; a fixed one also works
        session = requests.session()
        challenge = get_challenge(session)
        username = '12010xxxxxx49'  # replace with your own account
        password = '12xxxx2'        # replace with your own password
        str2 = build_str(chap_id, password, challenge)

        # The front-end MD5 treats each character as one byte, hence ISO-8859-1.
        md5_hash = hashlib.md5(str2.encode('ISO-8859-1')).hexdigest()
        print('hash', md5_hash)

        # Two hex digits for the id (zero-padded below 16, as in the JavaScript) plus the hash.
        chappassword = format(chap_id, '02x') + md5_hash
        print('chappasword', chappassword)

        login(session, {
            'username': username,
            'password': chappassword,
            'challenge': challenge,
        })
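One difference from the front end is worth noting: the JavaScript only accepts the challenge when reply_code is 0, while the script above reads the challenge without checking. A minimal sketch of the same check is shown below. The helper name `extract_challenge` is hypothetical, and it assumes reply_code comes back as a number, as the JavaScript comparison suggests.

```python
def extract_challenge(payload: dict) -> str:
    """Return the challenge only if the server reported success (reply_code == 0)."""
    if not payload or payload.get("reply_code") != 0:
        raise ValueError(f"getchallenge failed: {payload!r}")
    return payload["challenge"]


# Usage with a made-up response body:
print(extract_challenge({"reply_code": 0, "challenge": "ab12"}))
```

With requests, it could be called as `extract_challenge(resp.json())` on the getchallenge response, so that a failed request stops the script instead of producing a wrong password.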

Ready for launch: there was no Internet access at first, but after the script logged in, the network came back. The login was successful.
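To actually reconnect after the network drops, the login has to be repeated whenever the connection is lost. The loop below is only a sketch of one way to do that, not something I ran on the server. `keep_connected`, `is_online` and `relogin` are hypothetical names: `is_online` would be whatever connectivity check suits your environment, and `relogin` would run the login logic shown earlier.

```python
import time
from typing import Callable, Optional


def keep_connected(is_online: Callable[[], bool],
                   relogin: Callable[[], None],
                   interval: float = 60.0,
                   max_checks: Optional[int] = None,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll connectivity and call relogin() whenever the check fails.

    Runs forever when max_checks is None; returns how many times
    relogin() was called otherwise.
    """
    relogins = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_online():
            relogin()
            relogins += 1
        checks += 1
        sleep(interval)
    return relogins


# Dry run with fake checks: offline on the 2nd and 3rd poll.
states = iter([True, False, False, True])
count = keep_connected(lambda: next(states), lambda: None,
                       interval=0, max_checks=4, sleep=lambda s: None)
print(count)  # 2
```

Passing the check and the login in as functions keeps the loop independent of any particular school's portal.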

Summary

This problem is not complicated for veterans, but it may be novel and interesting for many people. Of course, in recent years it has been fine to play with crawlers on your own, but large-scale crawling of commercial or private data can get you into trouble. When I have the chance, I will share some pages with traditional login methods.

This small encryption analysis is easy to reproduce, yet I was stuck for a long time because of the encoding issue. In the final analysis, my foundation was weak: I did not have a solid grasp of these algorithms and the low-level basics, and I wasted a lot of time. Many experts can guess which algorithm was used just from looking at a string, and which encoding a piece of data is in. Fortunately, this demo has filled in that blind spot for me.
