How should spaces and plus signs in URLs be encoded?

This article is reprinted from the WeChat public account "Gopher Guide"; the author is New World Grocery Store. To reprint this article, please contact the Gopher Guide public account.

It is generally agreed that URLs cannot contain literal spaces. However, the standards do not fully agree on how a space should be represented, which has led to different implementations in different languages.

RFC 2396 clearly states that spaces should be encoded as %20.

However, the W3C standard states that spaces can be replaced with + or %20.

Lao Xu was baffled on the spot. If a space is replaced by +, then + itself has to be encoded. In that case, why not simply encode the space directly? Of course, this is only Lao Xu's own doubt; we can no longer trace the historical background, nor can we change the facts. What we have to deal with now is whether a space becomes + or %20, and whether + itself needs to be encoded.

Three common URL encoding methods in Go

As a Gopher, the first thing we focus on is the implementation of the Go language itself, so let's first understand the similarities and differences between the three commonly used URL encoding methods in Go.

url.QueryEscape

  1. fmt.Println(url.QueryEscape( " +Gopher points to north" ))
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

When using url.QueryEscape encoding, spaces are encoded as +, and + itself is encoded as %2B.

url.PathEscape

  1. fmt.Println(url.PathEscape( " +Gopher points to north" ))
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

When using url.PathEscape encoding, spaces are encoded as 20%, while + is not encoded.

url.Values

  1. var query = url.Values ​​{ }
  2. query.Set ( "hygz" , " +Gopher points to north" )
  3. fmt.Println(query.Encode())
  4. // Output: hygz=+%2BGopher%E6%8C%87%E5%8C%97

When using the (Values).Encode method to encode, the space is encoded as +, and + itself is encoded as %2B. Further checking the source code of the (Values).Encode method shows that it still calls the url.QueryEscape function internally. The difference between the (Values).Encode method and url.QueryEscape is that the former only encodes the key and value in the query, while the latter encodes both = and &.
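The difference described above can be seen in a small sketch. The helper name encodeQuery is made up for illustration and is not part of net/url; it only wraps the url.Values calls already shown.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeQuery is a hypothetical helper that builds a query string from a
// single key/value pair. Values.Encode escapes the key and the value
// separately, so the "=" that joins them stays literal.
func encodeQuery(key, value string) string {
	q := url.Values{}
	q.Set(key, value)
	return q.Encode()
}

func main() {
	// The "=" and "&" inside the value are escaped; the joining "=" is not.
	fmt.Println(encodeQuery("k", "a=1&b=2"))
	// Output: k=a%3D1%26b%3D2

	// QueryEscape treats its whole argument as one component,
	// so every "=" and "&" is escaped.
	fmt.Println(url.QueryEscape("a=1&b=2"))
	// Output: a%3D1%26b%3D2
}
```

In other words, url.QueryEscape is meant for a single key or value, while (Values).Encode assembles the complete query string.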

For us developers, which of these three encoding methods should we use? Please read on; the sections below should give you the answer.

Implementation in different languages

Since the URL encoding of spaces and + in Go is implemented differently, does this also exist in other languages? Let's take PHP and JS as examples.

URL encoding in PHP

urlencode

  1. echo urlencode( ' +Gopher points to north' );
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

rawurlencode

  1. echo rawurlencode( " +Gopher points to north" );
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

PHP's urlencode and Go's url.QueryEscape functions have the same effect, but rawurlencode encodes both spaces and +.

URL encoding in JS

encodeURI

  1. encodeURI( '+Gopher pointer' )
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

encodeURIComponent

  1. encodeURIComponent( ' +Gopher points to north' )
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

JS's encodeURI and Go's url.PathEscape functions have the same effect, but encodeURIComponent encodes both spaces and +.

What should we do?

It is more recommended to use the url.PathEscape function encoding

In the previous article, we have summarized the encoding operations of +Gopher pointer in Go, PHP and JS. The following is a two-dimensional table to summarize whether the corresponding decoding operations are feasible.

Encoding \ Decoding   url.QueryUnescape  url.PathUnescape  urldecode  rawurldecode  decodeURI  decodeURIComponent
url.QueryEscape       Y                  N                 Y          N             N          N
url.PathEscape        N                  Y                 N          YY            Y          YY
urlencode             Y                  N                 Y          N             N          N
rawurlencode          Y                  YY                Y          Y             N          Y
encodeURI             N                  Y                 N          Y             Y          Y
encodeURIComponent    Y                  YY                Y          Y             N          Y

In the above table, YY and Y have the same meaning. Lao Xu only uses YY to indicate that url.PathEscape is recommended for encoding in Go, while rawurldecode and decodeURIComponent are recommended for decoding in PHP and JS, respectively.

In real-world development, Gophers will inevitably have to decode URLs as well. When that happens, check with whoever encodes the URL to agree on the appropriate decoding method.

Encode the value

Is there a universal approach that avoids URL encoding and decoding altogether? There certainly is! Take base32 encoding as an example. Its character set consists of the letters A-Z and the digits 2-7, so once the value has been base32-encoded, no URL encoding is needed.

Finally, I sincerely hope that this article can be of some help to all readers.

The examples in this article were run with PHP 7.3.29, Go 1.16.6, and the JS console of Chrome 94.0.4606.71.

References

https://www.rfc-editor.org/rfc/rfc2396.txt

https://www.w3schools.com/tags/ref_urlencode.ASP
