We live in the Internet era, and Internet products are everywhere. When you open a browser you see all kinds of information: how does the browser communicate with the server? When you open WeChat to chat with friends, how do the messages reach them? Both rely on communication between processes over the network, and that communication is built on sockets. So what is a socket? Which modules in node deal with network communication? These questions are the focus of this article.

1. Socket

Sockets originated in Unix. A basic Unix philosophy is that "everything is a file", operated on in the "open ==> read/write ==> close" pattern, and sockets can be understood the same way. The main points about sockets are:

- They provide the low-level communication primitive. Almost all application-layer communication goes through sockets, hence the saying "everything is a socket".
- They wrap the TCP/IP protocol suite so that application-layer protocols can call it easily; they are an abstraction layer between the application layer and the transport layer.
- Each language or runtime has its own implementation, for example C, C++, and node.
- In the TCP/IP protocol suite, the transport layer has two common protocols, TCP and UDP. The two behave differently, so a socket is created with different parameters, and follows a different workflow, depending on which one it uses.
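The "open ==> read/write ==> close" pattern can be sketched with node's own net module. This is a minimal loopback echo, assuming port 0 (let the OS pick a free port) and the 127.0.0.1 address; neither value comes from the original article.

```javascript
const net = require('net');

// "open": create a listening socket; each accepted connection is a socket too.
const server = net.createServer((socket) => {
  // "read/write": echo every chunk back to the sender.
  socket.on('data', (chunk) => socket.write(chunk));
});

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  const client = net.connect({ port, host: '127.0.0.1' }, () => {
    client.write('hello');
  });
  client.on('data', (chunk) => {
    console.log('echoed:', chunk.toString());
    client.end();   // "close": finish the client side of the connection
    server.close(); // stop accepting new connections
  });
});
```

The same three phases appear on both sides: the server opens by listening, the client opens by connecting, and both read and write through the same socket object until one side ends it.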
2. Architecture implementation of network communication in node

In terms of implementation language, node's modules fall into two parts, JavaScript and C++, and the two are connected through process.binding. In detail:

- The standard node modules include net, dgram (UDP), dns, http, tls, https, etc.
- V8 is Chrome's JavaScript engine and is responsible for interpreting and running JavaScript. The C++ bindings that the JavaScript modules call into, such as tcp_wrap.h, udp_wrap.h, and tls_wrap.h, belong to node's own C++ layer.
- OpenSSL is a general-purpose cryptography library providing algorithms such as MD5 and SHA1 (hashing) and RSA (public-key encryption); it backs the crypto module in node's standard library.
- The c-ares library is used for DNS resolution.
- libuv provides cross-platform asynchronous I/O.
- http_parser is used for parsing HTTP messages.
3. Use of net

The net module is node's socket programming module for the TCP protocol, and the http module is built on top of it. Let's look at the basic usage:

- // Create a socket server: server.js
- const net = require( 'net' )
- const server = net.createServer();
- server. on ( 'connection' , (socket) => {
- socket.pipe(process.stdout);
- socket.write( 'data from server' );
- });
- server.listen(3000, () => {
- console.log(`server is on ${JSON.stringify(server.address())}`);
- });
- // Create a socket client client.js
- const net = require( 'net' );
- const client = net.connect ({port: 3000});
- client. on ( 'connect' , () => {
- client.write( 'data from client' );
- });
- client. on ( 'data' , (chunk) => {
- console.log(chunk.toString());
- client.end ();
- });
- // Open two terminals and execute `node server.js` and `node client.js` respectively. You can see that the client and the server are communicating with each other.
const server = net.createServer(); creates a server object. What are the characteristics of this object?

- // net.js
- exports.createServer = function (options, connectionListener) {
- return new Server(options, connectionListener);
- };
- function Server(options, connectionListener) {
- EventEmitter.call(this);
- ...
- if (typeof connectionListener === 'function' ) {
- this.on ( 'connection' , connectionListener);
- }
- ...
- this._handle = null ;
- }
- util.inherits(Server, EventEmitter);
The code above comes down to a few points:

- createServer is syntactic sugar that calls new Server(...) for you.
- The server object inherits from EventEmitter, so it has the event-related methods.
- _handle is the server's handle. Its value is ultimately created by the TCP and Pipe classes in the C++ layer.
- connectionListener is also syntactic sugar: it is registered as the callback for the 'connection' event.
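The two "syntactic sugar" points can be checked from user code. The sketch below builds one server through the factory and one through the constructor; the onConnection handler is a hypothetical placeholder, and neither server is ever started.

```javascript
const net = require('net');
const { EventEmitter } = require('events');

// Hypothetical handler used only for illustration.
const onConnection = (socket) => socket.end('bye\n');

// Form 1: factory, with the listener passed directly.
const a = net.createServer(onConnection);

// Form 2: explicit constructor plus an explicit event subscription.
const b = new net.Server();
b.on('connection', onConnection);

// Both are EventEmitters with exactly one 'connection' listener.
console.log(a instanceof EventEmitter, b instanceof EventEmitter);
console.log(a.listenerCount('connection'), b.listenerCount('connection'));
```

Both forms end up in the same state, which is why the article describes createServer and connectionListener as conveniences rather than separate mechanisms.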
Next, look at the callback for the 'connection' event. It receives a socket object, which is the connection socket and is identified by a five-tuple (server_ip, server_port, protocol, client_ip, client_port). The relevant implementation is as follows:

- function onconnection(err, clientHandle) {
- ...
- var socket = new Socket({
- ...
- });
- ...
- self.emit( 'connection' , socket);
- }
Because Socket inherits from stream.Duplex, a Socket is both a readable and a writable stream, and you can use the stream methods to process data. The next step is the key one, listening on a port, which is the main difference between a server and a client. The code:

- Server.prototype.listen = function () {
- ...
- listen(self, ip, port, addressType, backlog, fd, exclusive);
- ...
- }
- function listen(self, address, port, addressType, backlog, fd, exclusive) {
- ...
- if (!cluster) cluster = require( 'cluster' );
- if (cluster.isMaster || exclusive) {
- self._listen2(address, port, addressType, backlog, fd);
- return ;
- }
- cluster._getServer(self, {
- ...
- }, cb);
- function cb(err, handle) {
- ...
- self._handle = handle;
- self._listen2(address, port, addressType, backlog, fd);
- ...
- }
- }
- Server.prototype._listen2 = function (address, port, addressType, backlog, fd) {
- if (this._handle) {
- ...
- } else {
- ...
- rval = createServerHandle(address, port, addressType, fd);
- ...
- this._handle = rval;
- }
- this._handle.onconnection = onconnection;
- var err = _listen(this._handle, backlog);
- ...
- }
- function _listen(handle, backlog) {
- return handle.listen(backlog || 511);
- }
There are several points to note in the code above:

- The listen target can be a port, a path, an existing server handle, or a file descriptor.
- When worker processes are created through cluster, exclusive determines whether the listening handle is shared with the master process.
- Listening is ultimately implemented by the listen method of the TCP/Pipe handle.
- backlog specifies the maximum length of the queue of pending connections; it defaults to 511.
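These parameters are also reachable from the public API through the options form of server.listen. A minimal sketch, assuming port 0, the loopback address, and a backlog of 128 purely as illustrative values:

```javascript
const net = require('net');

const server = net.createServer();

// port 0 lets the OS choose a free port, backlog caps the queue of
// pending connections, and exclusive: true opts out of handle sharing
// when the process runs as a cluster worker.
const options = { port: 0, host: '127.0.0.1', backlog: 128, exclusive: true };

server.listen(options, () => {
  console.log('listening on', server.address());
  server.close();
});
```

Outside of cluster the exclusive flag has no visible effect; it only matters in the branch of listen that would otherwise ask the master process for a shared handle.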
Next, let's analyze the most important part of listen, _handle, which determines what the server can do:

- function createServerHandle(address, port, addressType, fd) {
- ...
- if (typeof fd === 'number' && fd >= 0) {
- ...
- handle = createHandle(fd);
- ...
- } else if(port === -1 && addressType === -1){
- handle = new Pipe();
- } else {
- handle = new TCP();
- }
- ...
- return handle;
- }
- function createHandle(fd) {
- var type = TTYWrap.guessHandleType(fd);
- if (type === 'PIPE' ) return new Pipe();
- if (type === 'TCP' ) return new TCP();
- throw new TypeError( 'Unsupported fd type: ' + type);
- }
_handle is implemented by the Pipe and TCP classes in C++, so to fully understand network communication in node you have to go down into node's C++ source code.

4. UDP/dgram usage

Compared with the net module, the dgram module, which is based on UDP, is much simpler: no connection has to be established through a three-way handshake, so the whole communication process is shorter. For business scenarios that do not require strict reliability, this module can be used for data communication.

- // Server-side implementation
- const dgram = require( 'dgram' );
- const server = dgram.createSocket( 'udp4' );
- server. on ( 'message' , (msg, addressInfo) => {
- console.log(addressInfo);
- console.log(msg.toString());
- const data = Buffer. from ( 'from server' );
- server.send(data, addressInfo.port, addressInfo.address);
- });
- server.bind(3000, () => {
- console.log( 'server is on ' , server.address());
- });
- //Client side implementation
- const dgram = require( 'dgram' );
- const client = dgram.createSocket( 'udp4' );
- const data = Buffer. from ( 'from client' );
- client.send(data, 3000);
- client. on ( 'message' , (msg, addressInfo) => {
- console.log(addressInfo);
- console.log(msg.toString());
- client.close () ;
- });
Let's analyze how the code above works at the source level:

- exports.createSocket = function (type, listener) {
- return new Socket(type, listener);
- };
- function Socket(type, listener) {
- ...
- var handle = newHandle(type);
- this._handle = handle;
- ...
- this. on ( 'message' , listener);
- ...
- }
- util.inherits(Socket, EventEmitter);
- const UDP = process.binding( 'udp_wrap' ).UDP;
- function newHandle(type) {
- if (type == 'udp4' ) {
- const handle = new UDP();
- handle.lookup = lookup4;
- return handle;
- }
- if (type == 'udp6' ) {
- const handle = new UDP();
- handle.lookup = lookup6;
- handle.bind = handle.bind6;
- handle.send = handle.send6;
- return handle;
- }
- ...
- }
- Socket.prototype.bind = function (port_ /*, address, callback*/) {
- ...
- startListening(self);
- ...
- }
- function startListening(socket) {
- socket._handle.onmessage = onMessage;
- socket._handle.recvStart();
- ...
- }
- function onMessage(nread, handle, buf, rinfo) {
- ...
- self.emit( 'message' , buf, rinfo);
- ...
- }
- Socket.prototype.send = function (buffer, offset, length, port, address, callback) {
- ...
- self._handle.lookup(address, function afterDns(ex, ip) {
- doSend(ex, self, ip, list, address, port, callback);
- });
- }
- const SendWrap = process.binding( 'udp_wrap' ).SendWrap;
- function doSend(ex, self, ip, list, address, port, callback) {
- ...
- var req = new SendWrap();
- ...
- var err = self._handle.send(req, list, list.length, port, ip, !!callback);
- ...
- }
There are several points to note in the code above:

- The dgram Socket does not inherit from stream; it only inherits from EventEmitter, so all subsequent operations are event-based.
- When creating a UDP socket, you have to choose between IPv4 (udp4) and IPv6 (udp6).
- The _handle of a UDP socket is created by the UDP class.
- When sending, a DNS lookup may be needed to resolve the address to an IP before the data is actually sent.
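The first point can be verified directly, and the round trip can be run on the loopback interface. In this sketch the port 0 binding and the 127.0.0.1 address are assumptions made for the example; the reply goes back to the address and port reported with the incoming datagram.

```javascript
const dgram = require('dgram');
const { EventEmitter } = require('events');
const { Stream } = require('stream');

const server = dgram.createSocket('udp4');

server.on('message', (msg, rinfo) => {
  // Reply to the exact address and port the datagram came from.
  server.send(Buffer.from('from server'), rinfo.port, rinfo.address);
});

server.bind(0, '127.0.0.1', () => {
  const client = dgram.createSocket('udp4');
  client.on('message', (msg) => {
    console.log(msg.toString());
    client.close();
    server.close();
  });
  client.send(Buffer.from('from client'), server.address().port, '127.0.0.1');
});

// A dgram socket is an EventEmitter but not a stream.
console.log(server instanceof EventEmitter, server instanceof Stream);
```

Because there is no stream interface, methods such as pipe are not available on a dgram socket; everything is driven by the 'message' event and explicit send calls.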
5. DNS Usage

DNS (Domain Name System) is used for domain name resolution, that is, finding the IP address that corresponds to a host name. DNS is an application-layer protocol; it should not be confused with ARP, which maps IP addresses to MAC addresses. node provides the dns module for this, and its functions fall into two categories.

The first category relies on the underlying operating system to resolve the name, so the usual local resolution rules, such as local caches and the /etc/hosts file, apply before a DNS server is consulted. Only dns.lookup belongs to this category.

The second category sends the query directly to a DNS server; dns.resolve and its variants belong here.

- const dns = require( 'dns' );
- const host = 'bj.meituan.com' ;
- dns.lookup(host, (err, address, family) => {
- if (err) {
- console.log(err);
- return ;
- }
- console.log( 'by net.lookup, address is: %s, family is: %s' , address, family);
- });
- dns.resolve(host, (err, address) => {
- if (err) {
- console.log(err);
- return ;
- }
- console.log( 'by net.resolve, address is: %s' , address);
- })
- // by net.resolve, address is : 103.37.152.41
- // by net.lookup, address is : 103.37.152.41, family is : 4
In this case the two results are the same. But what if we modify the local /etc/hosts file?

- // In the /etc/hosts file, add:
- 10.10.10.0 bj.meituan.com
- // Then execute the above file, the result is:
- by net.resolve, address is : 103.37.152.41
- by net.lookup, address is : 10.10.10.0, family is : 4
Next, let's analyze the internal implementation of dns:

- const cares = process.binding( 'cares_wrap' );
- const GetAddrInfoReqWrap = cares.GetAddrInfoReqWrap;
- exports.lookup = function lookup(hostname, options, callback) {
- ...
- callback = makeAsync(callback);
- ...
- var req = new GetAddrInfoReqWrap();
- req.callback = callback;
- var err = cares.getaddrinfo(req, hostname, family, hints);
- ...
- }
- function resolver(bindingName) {
- var binding = cares[bindingName];
- return function query( name , callback) {
- ...
- callback = makeAsync(callback);
- var req = new QueryReqWrap();
- req.callback = callback;
- var err = binding(req, name );
- ...
- return req;
- }
- }
- var resolveMap = Object. create ( null );
- exports.resolve4 = resolveMap.A = resolver( 'queryA' );
- exports.resolve6 = resolveMap.AAAA = resolver( 'queryAaaa' );
- ...
- exports.resolve = function (hostname, type_, callback_) {
- ...
- resolver = resolveMap[type_];
- return resolver(hostname, callback);
- ...
- }
There are several points to note in the source code above:

- lookup and resolve behave differently, so choose between them deliberately.
- Both go through the cares_wrap binding, but in different ways: lookup calls getaddrinfo, while the resolve functions issue DNS queries through the c-ares library.
- There are many resolution functions: resolve4, resolve6, resolveCname, resolveMx, resolveNs, resolveTxt, resolveSrv, resolvePtr, resolveNaptr, resolveSoa, reverse.
6. HTTP Usage

In web development, HTTP is the first and most important application-layer protocol. It is basic knowledge that every developer should be familiar with, and it is a topic I always ask about in interviews. It is also probably the first module most people use when they start with node. Let's look at a simple demo:

- const http = require( 'http' );
- const server = http.createServer();
- server. on ( 'request' , (req, res) => {
- res.setHeader( 'foo' , 'test' );
- res.writeHead(200, {
- 'Content-Type' : 'text/html' ,
- });
- res.write( '' );
- res.end (``);
- });
- server.listen(3000, () => {
- console.log( 'server is on ' , server.address());
- var req = http.request({ host: '127.0.0.1' , port: 3000});
- req. on ( 'response' , (res) => {
- res. on ( 'data' , (chunk) => console.log( 'data from server ' , chunk.toString()) );
- res. on ( 'end' , () => server. close () );
- });
- req.end ();
- });
- // The output is as follows:
- // server is on { address: '::' , family: 'IPv6' , port: 3000 }
- // data from server header
There is a lot worth exploring in the demo above, and a careless mistake can crash the service. Let's go through the pieces one by one, following node's official documentation.

6.1 http.Agent

HTTP is a stateless protocol, and originally each request opened its own TCP connection, which means a three-way handshake per request. The three-way handshake, slow start, and the four-way connection teardown are all time-consuming, so HTTP/1.1 introduced keep-alive to avoid repeatedly setting up connections. How are those TCP connections managed? That is the job of http.Agent. Let's look at the key parts of the source code:

- function Agent(options) {
- ...
- EventEmitter.call(this);
- ...
- self.maxSockets = self.options.maxSockets || Agent.defaultMaxSockets;
- self.maxFreeSockets = self.options.maxFreeSockets || 256;
- ...
- self.requests = {}; // Request queue
- self.sockets = {}; // TCP connection pool in use
- self.freeSockets = {}; // Idle connection pool
- self. on ( 'free' , function (socket, options) {
- ...
- // requests, sockets, freeSockets read and write operations
- self.requests[ name ].shift().onSocket(socket);
- freeSockets.push(socket);
- ...
- });
- }
- Agent.defaultMaxSockets = Infinity;
- util.inherits(Agent, EventEmitter);
- // About the socket related add, delete, modify and query operations
- Agent.prototype.addRequest = function (req, options) {
- ...
- if (freeLen) {
- var socket = this.freeSockets[ name ].shift();
- ...
- this.sockets[ name ].push(socket);
- ...
- } else if (sockLen < this.maxSockets) {
- ...
- } else {
- this.requests[ name ].push(req);
- }
- ...
- }
- Agent.prototype.createSocket = function (req, options, cb) { ... }
- Agent.prototype.removeSocket = function (s, options) { ... }
- exports.globalAgent = new Agent();
There are several points to note in the code above:

- By default, maxSockets is Infinity, so there is no upper limit on the number of TCP connections.
- The core of connection pool management is adding, removing, and looking up entries in sockets and freeSockets.
- globalAgent is used as the default agent for http.ClientRequest.
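Instead of changing globalAgent for the whole process, a request can be given its own agent. The sketch below only constructs and inspects such an agent; the host and port in the options object are placeholder values and no request is actually sent.

```javascript
const http = require('http');

// A dedicated agent: reuse idle sockets and cap concurrent sockets per host.
const agent = new http.Agent({ keepAlive: true, maxSockets: 5 });

console.log(agent.maxSockets);
console.log(http.globalAgent.maxSockets);

// Placeholder request options; passing `agent` bypasses http.globalAgent.
// http.request(options) would queue further requests once 5 sockets are busy.
const options = { host: '127.0.0.1', port: 3000, agent };
console.log(options.agent === agent);

agent.destroy(); // release any sockets the agent still holds
```

Scoping the limit to one agent keeps unrelated outgoing requests in the same process from competing for the same pool.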
Next, we can test how the agent limits requests:

- // req.js
- const http = require( 'http' );
- const server = http.createServer();
- server. on ( 'request' , (req, res) => {
- var i=1;
- setTimeout(() => {
- res.end ( 'ok' ,i++);
- }, 1000)
- });
- server.listen(3000, () => {
- var max = 20;
- for (var i = 0; i < max; i++) {
- var req = http.request({ host: '127.0.0.1' , port: 3000});
- req. on ( 'response' , (res) => {
- res. on ( 'data' , (chunk) => console.log( 'data from server ' , chunk.toString()) );
- res. on ( 'end' , () => server. close () );
- });
- req.end ();
- }
- });
- // Run `time node ./req.js` in the terminal; the result is:
- // real 0m1.123s
- // user 0m0.102s
- // sys 0m0.024s
- // Add the following line to req.js
- http.globalAgent.maxSockets = 5;
- // Then run `time node ./req.js` again; the result is:
- // real 0m4.141s
- // user 0m0.103s
- // sys 0m0.024s
When maxSockets is set to a specific value, the number of TCP connections is capped at that value and the remaining requests go into the requests queue. When a socket becomes free, the next request is taken from the queue and sent. Here, 20 requests of about 1 second each over 5 sockets run in 4 batches, which matches the roughly 4 seconds measured.

6.2 http.ClientRequest

When http.request is executed, a ClientRequest object is created. Although this object does not directly inherit from Stream.Writable, it inherits from http.OutgoingMessage, which implements the write and end methods, so it can be used in the same way as a stream.Writable.

- var req = http.request({ host: '127.0.0.1' , port: 3000, method: 'post' });
- req. on ( 'response' , (res) => {
- res. on ( 'data' , (chunk) => console.log( 'data from server ' , chunk.toString()) );
- res. on ( 'end' , () => server. close () );
- });
- // Use pipe directly to add data to the request
- fs.createReadStream( './data.json' ).pipe(req);
Next, let's look at the implementation of http.ClientRequest, which inherits from OutgoingMessage:

- const OutgoingMessage = require( '_http_outgoing' ).OutgoingMessage;
- function ClientRequest(options, cb) {
- ...
- OutgoingMessage.call(self);
- ...
- }
- util.inherits(ClientRequest, OutgoingMessage);
6.3 http.Server

http.createServer actually creates an http.Server object. The key source code is as follows:

- exports.createServer = function (requestListener) {
- return new Server(requestListener);
- };
- function Server(requestListener) {
- ...
- net.Server.call(this, { allowHalfOpen: true });
- if (requestListener) {
- this.addListener( 'request' , requestListener);
- }
- ...
- this.addListener( 'connection' , connectionListener);
- this.timeout = 2 * 60 * 1000;
- ...
- }
- util.inherits(Server, net.Server);
- function connectionListener(socket) {
- ...
- socket. on ( 'end' , socketOnEnd);
- socket. on ( 'data' , socketOnData)
- ...
- }
There are several points to note:

- The server is built on net.Server, which implements the actual server creation at the lower level.
- In the version analyzed here, the default server timeout is 2 minutes (this.timeout = 2 * 60 * 1000); newer node releases changed the default.
- After a TCP connection is accepted, connectionListener handles it in the same way net does, through the socket's events.
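The relationship between the two layers can be observed by listening to both the net-level and the HTTP-level event on one server. A minimal sketch, assuming a loopback address, an OS-assigned port, and a one-off agent (agent: false) so the connection is not kept alive:

```javascript
const http = require('http');
const net = require('net');

const server = http.createServer((req, res) => res.end('ok'));

// An http.Server is also a net.Server.
console.log(server instanceof net.Server);

// 'connection' is the net-level event; 'request' is the HTTP-level event.
server.on('connection', () => console.log('tcp connection accepted'));
server.on('request', (req) => console.log('http request:', req.method, req.url));

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
    res.resume(); // drain the body so that 'end' fires
    res.on('end', () => server.close());
  });
});
```

The 'connection' listener runs first because the HTTP parser only starts producing 'request' events once it has a socket to read from.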
6.4 http.ServerResponse

Here is how the official documentation at nodejs.org introduces the server-side response object:

This object is created internally by an HTTP server–not by the user. It is passed as the second parameter to the 'request' event. The response implements, but does not inherit from, the Writable Stream interface.

It is very similar to http.ClientRequest: it inherits from OutgoingMessage rather than Stream.Writable, but it implements the Writable interface and can be used as flexibly as a Stream.Writable:

- function ServerResponse(req) {
- ...
- OutgoingMessage.call(this);
- ...
- }
- util.inherits(ServerResponse, OutgoingMessage);
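The shared base class of ClientRequest and ServerResponse can be confirmed through the classes that the http module exports, without creating any request or response. A small sketch:

```javascript
const http = require('http');

const { ClientRequest, ServerResponse, OutgoingMessage } = http;

// Both classes have OutgoingMessage in their prototype chain.
const clientOk = ClientRequest.prototype instanceof OutgoingMessage;
const serverOk = ServerResponse.prototype instanceof OutgoingMessage;
console.log(clientOk, serverOk);

// The Writable-style methods are defined on OutgoingMessage itself.
console.log(typeof OutgoingMessage.prototype.write);
console.log(typeof OutgoingMessage.prototype.end);
```

This is why piping a readable stream into either object works: pipe only needs write and end plus the usual events on the destination, not a formal Stream.Writable ancestor.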
6.5 http.IncomingMessage

An IncomingMessage object is created by http.Server or http.ClientRequest and passed as the first argument to the 'request' and 'response' event respectively. It may be used to access response status, headers and data.

http.IncomingMessage is created internally in two places: as the request on the server side, and as the response to a client request. This class explicitly inherits from Stream.Readable.

- function IncomingMessage(socket) {
- Stream.Readable.call(this);
- this.socket = socket;
- this.connection = socket;
- ...
- }
- util.inherits(IncomingMessage, Stream.Readable);

7. Conclusion

The above is a rough analysis of the main network communication modules in Node, which gives a general picture of how network communication works there. It is far from enough, though, and it still cannot solve the various network problems that occur in Node applications. This article is just a beginning. I hope we can later dig deeper into the details and go down to the C++ level.