Design tiny url

Design tiny url

Design tiny url

For example, Maimai will not allow you to send too long URLs and will convert them into short links.

1.Scenario

Generate a short url based on a long url.

Such as http://www.javaedge.com => http://bit.ly/1ULoQB6

Restore the long url according to the short url and jump:

Questions to confirm with the interviewer:

Do long urls and short urls have to correspond one to one?

Short URL has not been used for a long time. Does it need to be released?

1.1 QPS Analysis

  • Ask about daily activity, such as Weibo 100M
  • Calculate the QPS of a tiny URL

Assume that each user posts an average of 0.1 microblogs with URLs per day (10 microblogs, one of which contains a link).

Average write QPS = 100M * 0.1 / 86400 = 100

Peak write qps = 100 * 2 = 200

  • Calculate the QPS of clicking a tiny URL

Assume that each user clicks 1 tiny url on average

Average write QPS = 100M * 1 / 86400 = 1k

Peak read qps = 1k * 2 = 2k

  • deduce The storage occupied by new URLs generated every day

100M * 0.1 = 10M records

The average length of each URL is 100, totaling 1G

1T hard drive can be used for 3 years

From the analysis of 2 and 3, we can see that distribution or sharding is not required. To support 2k QPS, one SSD MySQL is enough.

2. Service - logical block clustering and interface design

The system is actually very simple, it only needs one service: URL Service. Since tiny url has only one UrlService:

  • It is actually a small independent application
  • No need to worry about any other business functions

Method design:

UrlService.encode(long_url): encoding method

UrlService.decode(long_url): decoding method

There are currently two common mainstream styles for access port design:

  • GET /REST styleReturn a http redirect responseREST styleReturn a http redirect response

  • POST /data/shorten (not recommended, not in line with REST design style, but some people use it) return a short url

So, which service design does your company choose for its short-chain system?

3. Storage data access (best reflects practical experience)

  • select storage structure
  • Scheme refinement data table

3.1 SQL VS NoSQL

Do you need transactions? No, nosql+1

Do you need rich SQL queries? No, nosql+1

Want to be lazy? Tiny URL requires simple code, nosql+1

Is the qps high? 2k, not high. sql+1

How high is the scalability requirement? Storage and QPS are not high, and a single machine can handle it. sql+1

 - SQL requires you to write your own code to scale
- Nosql, all of this is done for you

Do you need a sequential ID? It depends on your algorithm.

  • SQL provides auto_increment sequential ID, i.e. 1, 2, 3
  • Nosql ID is not sequential

3.2 Algorithm

Convert long ur to a 6-digit short url. Given a long url, return a short url.

Implement two methods:

  • longToShort(url)​ Converts a long URL to a short URL starting with http://tiny.url/
  • shortToLong(url) Convert a short URL to a long URL shortToLong(url) Convert a short URL to a long URL

standard:

  • The key length of the short URL should be 6 (excluding the domain name and backslash). The only characters allowed are [a-zA-Z0-9]. For example: abcD9E[a-zA-Z0-9]. For example: abcD9E
  • Any two long URLs will not correspond to the same short URL, and vice versa.

Using two hash tables:

  • One is to map the short URL to the long URL
  • One is to map the long URL to the short URL

The short URL has a fixed format: "http://tiny.url/" + 6 characters, and the characters can be arbitrary.

To avoid duplication, we can use them in lexicographical order, or use a set to record whether it has been used based on random generation.

Use a hash function (not feasible)

For example, take the last 6 digits of the MD5 of the long URL:

  • quick
  • It is difficult to design a hash algorithm without hash collisions

Randomly generate shortURL + DB deduplication

Randomly pick a 6-digit short URL. If it has not been used before, bind it to the long URL.

 public String long2Short ( String url ) {
while ( true ) {
String shortURL = randomShortURL ( ) ;
if ( ! databse .filter ( shortURL = shortURL ) .exists ( ) ) {
database .create ( shortURL = shortURL , longURL = url ) ;
return shortURL ;
}
}

}

public class TinyUrl {

public TinyUrl ( ) {
long2Short = new HashMap < String , String > ( ) ;
short2Long = new HashMap < String , String > ( ) ;
}

/**
* @param url a long url
* @return a short url starts with http://tiny.url/
*/
public String longToShort ( String url ) {
if ( long2Short .containsKey ( url ) ) {
return long2Short .get ( url ) ;
}

while ( true ) {
String shortURL = generateShortURL ( ) ;
if ( ! short2Long .containsKey ( shortURL ) ) {
short2Long .put ( shortURL , url ) ;
long2Short .put ( url , shortURL ) ;
return shortURL ;
}
}
}

/**
* @param url a short url starts with http://tiny.url/
* @return a long url
*/
public String shortToLong ( String url ) {
if ( ! short2Long .containsKey ( url ) ) {
return null ;
}

return short2Long .get ( url ) ;
}

private String generateShortURL ( ) {
String allowedChars = "0123456789" + "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ;

Random rand = new Random ( ) ;
String shortURL = "http://tiny.url/" ;
for ( int i = 0 ; i < 6 ; i ++ ) {
int index = rand .nextInt ( 62 ) ;
shortURL += allowedChars .charAt ( index ) ;
}

return shortURL ;
}
}

Advantages: Simple implementation

Disadvantage: The speed of generating short links becomes slower as the number of short links increases.

Relational database table: only two columns, Short key and Long URL, are needed, and indexes are created for each column.

You can also use nosql, but you need to create two tables:

  • According to long query short key = longurl column = shorturl value = null or timestamp
  • According to short query long key = shorturl column = longurl value = null or timestamp


Base32 conversion (microblogging implementation)

Base62:

  • Treat the 6-digit short URL as a 62-bit number (0-9, az, AZ)
  • Each short url corresponds to an integer
  • This integer corresponds to the primary key of the DB table.

Different URLs that can be represented by 6 bits:

  • 5 bits = 62^5 = 0.9B = 900 million
  • 6 bits = 62^6 = 57B = 57 billion
  • 7 bits = 62^7 = 3.5T = 3.5 trillion

Advantages: High efficiency

Disadvantages: Strong reliance on global auto-increment id

 public class TinyUrl {
public static int GLOBAL_ID = 0 ;
private HashMap < Integer , String > id2url = new HashMap < Integer , String > ( ) ;
private HashMap < String , Integer > url2id = new HashMap < String , Integer > ( ) ;

private String getShortKey ( String url ) {
return url .substring ( "http://tiny.url/" .length ( ) ) ;
}

private int toBase62 ( char c ) {
if ( c >= '0' && c <= '9' ) {
return c - '0' ;
}
if ( c >= 'a' && c <= 'z' ) {
return c - 'a' + 10 ;
}
return c - 'A' + 36 ;
}

private int shortKeytoID ( String short_key ) {
int id = 0 ;
for ( int i = 0 ; i < short_key .length ( ) ; ++ i ) {
id = id * 62 + toBase62 ( short_key .charAt ( i ) ) ;
}
return id ;
}

private String idToShortKey ( int id ) {
String chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ;
String short_url = "" ;
while ( id > 0 ) {
short_url = chars .charAt ( id % 62 ) + short_url ;
id = id / 62 ;
}
while ( short_url .length ( ) < 6 ) {
short_url = "0" + short_url ;
}
return short_url ;
}

/**
* @param url a long url
* @return a short url starts with http://tiny.url/
*/
public String longToShort ( String url ) {
if ( url2id .containsKey ( url ) ) {
return "http://tiny.url/" + idToShortKey ( url2id .get ( url ) ) ;
}
GLOBAL_ID ++ ;
url2id .put ( url , GLOBAL_ID ) ;
id2url .put ( GLOBAL_ID , url ) ;
return "http://tiny.url/" + idToShortKey ( GLOBAL_ID ) ;
}

/**
* @param url a short url starts with http://tiny.url/
* @return a long url
*/
public String shortToLong ( String url ) {
String short_key = getShortKey ( url ) ;
int id = shortKeytoID ( short_key ) ;
return id2url .get ( id ) ;
}
}

Because we need to use an auto-increment id, we can only use a relational DB table:

id primary key, long url (index)

4. Scale

How to improve the response speed and make it as efficient as directly opening the original link.

To be clear, this is a business with more reads and less writes.

4.1 Cache Aside

The cache needs to store two types of data:

  • long2short (needed to generate new short url)
  • short2long (needed when querying short url)

4.2 CDN

Use geographic location information to speed up.

Optimize server access speed:

  • Different regions use different web servers
  • Resolve users in different regions to different servers through DNS

Optimize data access speed

  • Use centralized MySQL + distributed Redis
  • One MySQL with multiple Redis, Redis distributed across regions

4.3 When do you need multiple DB servers?

Insufficient cache resources or low hit rate

Too many write operations

More and more requests cannot be satisfied by cache

What can multiple DB servers optimize?

• Solve the problem of insufficient storage: storage

• Solve the problem of being too busy: qps

So what is the main problem with tiny url? Storage is no problem, the key is qps. So, how to shard?

Vertical splitting: assign multiple tables to multiple machines. This is not applicable, there are only two columns and they cannot be split further.

Horizontal split:

If id and shortURL are used as shard keys:

• When querying long2short, it can only be broadcast to N databases for querying

• Why do we need to check long2short? To avoid duplicate creation

• This is OK if you don't need to avoid duplicate creation

Use long url as shard key:

When performing short2long queries, the query can only be broadcast to N DBs.

4.3.1 Sharding Key Selection

If a long can correspond to multiple short

• Use cache to cache all long2short

• When creating a short URL for a long URL, if the cache miss occurs, a new short URL is created.

If a long can only correspond to a short

• If using a random generation algorithm

• Two tables, one for long2short and one for short2long

• Each mapping relationship is stored twice, so long2short and short2long queries can be supported at the same time

• If using base62 conversion

• There is a serious problem: how to maintain a global auto-incrementing ID between multiple machines?

• Generally, relational DB only supports global auto-increment ID on one machine.

4.4 Global auto-increment id

4.4.1 Dedicate a DB for auto-increment service

This DB does not store real data and is not responsible for other queries.

To avoid single point of failure, multiple DBs may be required.

4.4.2 Using ZooKeeper

But using a global auto-increment ID is not the best solution for tiny URLs. Generating a Distributed Sequence Number

4.5 Base62-based sharding strategy

Hash(long_url)%62 as shard key

And put hash(long_url)%62 directly into short url

If the original short key is AB1234, the current short key is

  • hash(long_url) % 62 + AB1234
  • If hash(long_url)%62=0, then it is 0AB1234

In this way, the shard key can be obtained through both short and long.

Disadvantage: The number of DB machines cannot exceed 62.

So, the final optimal architecture is:

Can 4.6 be further optimized?

Communication between web server and database.

The communication between the centralized server cluster and cross-regional web servers is slow: for example, the server in China needs to access the DB in the United States.

Why not allow Chinese servers to access Chinese DBs?

If data is repeatedly written to the Chinese DB, how to solve the consistency problem? It is difficult to solve!

Consider user habits:

  • When Chinese users visit, DNS will assign Chinese servers to them.
  • The websites visited by Chinese users are generally Chinese websites
  • So you can shard according to the regional information of the website
  • How to obtain the regional information of websites? Simply put the websites that users frequently visit into a table.
  • What should Chinese users do when visiting American websites?
  • When a Chinese server accesses a US database, it won't be too slow.
  • The majority of users are in the middle of the visit, and the optimization system is aimed at the main needs

So, the final structure is:

You can also maintain a whitelist of domain names to access the DB in the corresponding region.

5. User-defined short links

Implement a customer short URL so that customers can create their own short URLs. That is, you need to implement createCustom based on the previous method. createCustom.

Three methods need to be implemented:

  • long2Short(url) Convert a long URL into a short URL starting with http://tiny.url/ long2Short(url) Convert a long URL into a short URL starting with http://tiny.url/
  • short2Long(url) Convert a short URL to a long URL short2Long(url) Convert a short URL to a long URL
  • createCustom(url, key) Set the short URL of a long URL to http://tiny.url/ + keycreateCustom(url, key) Set the short URL of a long URL to http://tiny.url/ + key

Notice:

  • The key length of the short URL generated by long2Short should be equal to 6 (excluding domain names and backslashes). The only characters that can be used are [a-zA-Z0-9]. For example: abcD9EThe key length of the short URL generated by long2Short should be equal to 6 (excluding domain names and backslashes). The only characters that can be used are [a-zA-Z0-9]. For example: abcD9E
  • Any two long URLs will not correspond to the same short URL, and vice versa
  • If createCustom cannot complete the settings expected by the user, it should return "error". Otherwise, if the long URL is successfully mapped to the short URL, the short URL should be returned. If createCustom cannot complete the settings expected by the user, it should return "error". Otherwise, if the long URL is successfully mapped to the short URL, the short URL should be returned.

5.1 Based on Base62

Is it possible to directly add a new column custom_url in URLTable to record the corresponding custom url?

Not feasible! For most data, this column is actually empty, which wastes storage space.

Add a new table to store custom URLs: CustomURLTable.

Create a custom short link: query and insert in CustomURLTable

Create a short link based on the long link:

  • First check whether CustomURLTable exists
  • Then query and insert in URLTable

As before, two hash tables are used to handle the mapping between long URLs and short URLs. An additional process is to return "error" when the URL set by the user conflicts with the existing one. Note: If the URL set by the user is exactly the same as the existing one, the short URL should also be returned.

 public class TinyUrl2 {
private HashMap < String , String > s2l = new HashMap < String , String > ( ) ;
private HashMap < String , String > l2s = new HashMap < String , String > ( ) ;
private int cnt = 0 ;
private final StringBuffer tinyUrl = new StringBuffer ( "http://tiny.url/" ) ;
private final String charset = "qwertyuiopasdfghjklzxcvbnm1234567890QWERTYUIOPASDFGHJKLZXCVBNM" ;

private String newShortUrl ( ) {
StringBuffer res = new StringBuffer ( ) ;
for ( int i = 0 , j = cnt ; i < 6 ; i ++ , j /= 62 )
res .append ( charset .charAt ( j % 62 ) ) ;
cnt ++ ;
return tinyUrl + res .toString ( ) ;
}

/*
* @param long_url: a long url
* @param key: a short key
* @return: a short url starts with http://tiny.url/
*/
public String createCustom ( String long_url , String key ) {
String short_url = tinyUrl + key ;
if ( l2s .containsKey ( long_url ) ) {
if ( l2s .get ( long_url ) .equals ( short_url ) )
return short_url ;
else
return "error" ;
}
if ( s2l .containsKey ( short_url ) )
return "error" ;
l2s .put ( long_url , short_url ) ;
s2l .put ( short_url , long_url ) ;
return short_url ;
}

/*
* @param long_url: a long url
* @return: a short url starts with http://tiny.url/
*/
public String longToShort ( String long_url ) {
if ( l2s .containsKey ( long_url ) )
return l2s .get ( long_url ) ;
String short_url = newShortUrl ( ) ;
l2s .put ( long_url , short_url ) ;
s2l .put ( short_url , long_url ) ;
return short_url ;
}

/*
* @param short_url: a short url starts with http://tiny.url/
* @return: a long url
*/
public String shortToLong ( String short_url ) {
if ( s2l .containsKey ( short_url ) )
return s2l .get ( short_url ) ;
return "error" ;
}
}

5.2 Based on random generation algorithm

No need to make any changes, just create the custom url as a short url!

Reference: https://www.zhihu.com/question/29270034

<<:  How 5G will change engineering design

>>:  From 5G to 6G: The race between innovation and disruption

Recommend

China leads the world in quantum technology patents

[[388060]] Quantum technology has become the comm...

Talk about STM32 network interruption

[[380734]] 01 Introduction Network interrupt vect...

5G vs. WiFi 6: Tips for choosing the best wireless network option

There has been much to prove about 5G’s theoretic...

Healthcare and smart city services will lead 5G IoT adoption

Juniper Research predicts that by 2026, there wil...

Video and Network - Why is 700MHz the golden channel?

Labs Guide 700M is called the "golden freque...

How 5G will shape the future of construction

5G is an enabler that will deliver new capabiliti...

Software-based routing is eating into the traditional branch router market

As more and more enterprises begin to realize the...

Graphical explanation | A brief history of what is HTTP

[[344212]] This article is reprinted from the WeC...