The well-known research organization Aberdeen Group once conducted a survey, and the results were shocking: across the entire Internet, web crawlers generate 37.2% of traffic. Put loosely, out of every 100 visitors on the Internet, only about 63 are real humans, and the rest of the traffic is generated by bots. An even more alarming prediction holds that bots will eventually generate more than 50% of Internet traffic.
In the real world, humans are still worrying about the threat of artificial intelligence, but in the virtual world, the traffic generated by bots is already on par with that of humans and may even exceed it. At every moment, crawlers are imitating human online behavior: strolling around websites, clicking buttons, checking data, and reciting the information they see. They never get tired, and they repeat the cycle over and over again.
You have surely run into a CAPTCHA. It comes in many forms, but whatever it looks like, a CAPTCHA has only one purpose: to identify a real human user.
Open Baidu, search for some information, solve a problem or two, and without realizing it you have become one of the many users of crawlers. Crawlers have spread to every corner of the Internet and affect everyone. But do you know the past and present of crawlers?
The good side
In 1994, Xiao Ma, who was working on the "Information Media Digital Library" project at Carnegie Mellon University, developed a search engine called Lycos in 3 pages of code to solve some difficulties in the project. The name Lycos is short for Lycosidae, the family of wolf spiders, which are skilled at catching prey. This simple search engine showed Xiao Ma the huge business opportunity behind it, and soon afterwards Lycos was officially established as a company.
In just two years, Lycos successfully went public, becoming the fastest company to go public in history. According to Nielsen/NetRatings, Lycos had 37 million visitors in October 2002, making it the fifth most visited website in the world.
However, a prize as big as the search engine market was bound to attract a pack of wolves. In 1995, one year after the birth of Lycos, two computer science students at Stanford University, Xiaola and Xiaoxie, began working on a computer program called BackRub.
This program was a search engine that used backlink analysis to track and record data on the Internet. They were determined to build a powerful search engine that people around the world could use to obtain information from the Internet more conveniently. In 1998, Xiaola and Xiaoxie put in everything they owned, plus a little financial support from their alma mater and roommates, to establish a company called Google. Because they were short of funds, they had to buy second-hand computer parts and work in a garage.
The difficult early days made Xiaola and Xiaoxie want to sell Google at one point. They invited Yahoo, Excite, and several other Silicon Valley companies to buy it. Unfortunately, these companies were only willing to offer $1 million, far below what the founders expected, so the idea was abandoned. At almost the same time, on the other side of the earth, a young man also known as Xiao Ma (not the Xiao Ma of Lycos) developed a chat program called QQ and tried to sell it as well, without success.
History has a surprising way of repeating itself. No one expected that these two little-known companies would become Internet giants. Meanwhile, Xiao Li, who had been in the United States for 8 years, saw that the Internet environment back home had matured, so he returned to China to start a business and founded a company called Baidu.
At this point, a three-way split of the market among Google, Yahoo, and Baidu gradually took shape.
In those early days, the Internet was still a pure land where sages gathered. To respect the rights of websites, the major search engines discussed and established a gentleman's agreement over email: robots.txt. A site owner just puts a robots.txt file in the root directory of the website to tell search engines which content must not be crawled, and web crawlers abide by the agreement and leave that content alone.
The evil side
As the Internet developed, the amount of information grew rapidly. The online world filled up with valuable information, including product information, flight information, and private personal data. Some unscrupulous actors saw huge profits in it. Tempted by those profits, they began to violate the crawler protocol, writing crawler programs to maliciously scrape the content of target websites.
The first lawsuit involving crawlers occurred in 2000, when eBay sued a website that aggregated price information.
eBay argued that it had used the robots protocol to state clearly which information could and could not be crawled, yet the company violated the agreement and illegally crawled information such as product prices. The defendant countered that user data on eBay and the product information uploaded by users belonged to the users collectively, not to eBay, and that the robots protocol was therefore not binding. Ultimately, the court ruled in favor of eBay. The case set a precedent for treating crawler protocols as primary reference evidence.
Today, crawler technology is developing rapidly, and it already includes general web crawlers, focused web crawlers, incremental web crawlers, and deep web crawlers. There are also many strategies for selecting what to crawl, such as those based on target web page features, on target data patterns, or on domain concepts.
Crawler technology, whether well-intentioned or malicious, will always be part of the Internet, affecting it every minute of every day.
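To make the robots.txt gentleman's agreement described earlier more concrete, here is a minimal sketch of how a well-behaved crawler might check the rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, the `ExampleBot` user agent, and the helper names `build_parser` and `may_crawl` are all made up for illustration and do not come from any real site or from the eBay case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /prices/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/index.html",
        "https://example.com/prices/item42",
    ):
        verdict = "allowed" if may_crawl(parser, "ExampleBot", url) else "disallowed"
        print(f"{url}: {verdict}")
```

Nothing in this check is enforced by the website itself. The crawler chooses to ask and to obey, which is why the protocol only works as long as crawlers stay polite.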