Jeevan Joshi

What Is a URL Shortener and How Does It Work?

Have you ever wondered how a URL shortener, or tiny URL, works? The engineering behind it upholds the efficiency, scalability, and security of a system we often take for granted.

Stickman trying to understand the logic of URL Shortener

From generating a shortened version of a URL to fetching it and redirecting to its actual destination, the engineering behind it showcases the elegance of simplicity backed by powerful systems.

To understand the engineering behind it, let’s start with its crucial component: compression, or encoding. To achieve this compression, the system uses Base-62 or Base-64 encoding. So what is Base-62 or Base-64 encoding?

Many of us know how to convert a decimal number to a binary number. This is called Base-10 to Base-2 conversion, where we repeatedly divide the Base-10 number by 2 and record the remainders. For reference, have a look at the image below:

Illustration of Base 2 conversion

The above image shows how a Base-10 number gets converted to Base-2: we divide the Base-10 number by 2 (the base) repeatedly, and at each step the remainder, which is either 1 or 0, becomes the corresponding bit. Finally, we read the bits from bottom to top (reverse order), which gives us the Base-2 representation of the number.

e.g. 32 in Base-10 is represented as 100000 in Base-2.
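The repeated-division procedure described above can be sketched in a few lines of JavaScript. This is only an illustration: the function name `toBase2` is made up for this example, and it assumes a non-negative integer input.

```javascript
// Convert a non-negative integer to its Base-2 string by repeated division.
// `toBase2` is a hypothetical helper name used only for this illustration.
function toBase2(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  if (n === 0) return "0";

  let bits = "";
  while (n > 0) {
    // The remainder (0 or 1) is the next bit. Prepending it gives the
    // same result as reading the remainders from bottom to top.
    bits = String(n % 2) + bits;
    n = Math.floor(n / 2);
  }
  return bits;
}

console.log(toBase2(32)); // "100000"
```

Prepending each remainder avoids a separate reversal step at the end.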

Now imagine a Base-10 number such as 100 billion. Representing such large numbers in Base-10 becomes cumbersome, not just in storage but also in readability and transmission. To make the number more manageable and readable, we encode it in Base-64 or Base-62. In Base-64, each digit can take one of 64 possible values, which shortens the representation significantly.

Base-10: 100000000000 (100 billion)

Base-2: 1011101001000011101101110100000000000

Base-64:

Illustration of the calculation of Base 64 length of 100 billion
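The length calculation can also be reproduced in code. The sketch below counts how many digits a number needs in a given base by dividing repeatedly until nothing is left. The helper name `digitCount` is an assumption for illustration only.

```javascript
// Count how many digits a non-negative integer needs in a given base.
// `digitCount` is a hypothetical helper, not part of any library.
function digitCount(n, base) {
  let count = 0;
  do {
    n = Math.floor(n / base); // drop the lowest digit
    count += 1;
  } while (n > 0);
  return count;
}

const hundredBillion = 100000000000;
console.log(digitCount(hundredBillion, 10)); // 12
console.log(digitCount(hundredBillion, 64)); // 7
```

The result follows from 64^6 = 68,719,476,736 being smaller than 100 billion while 64^7 is larger, so seven Base-64 digits are needed.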

The length will be 7. Now let’s see how we use this in URL shortening.

A URL can be of any length. To make a tiny (short) URL for it, we use Base-64 encoding, whose alphabet consists of the digits 0 to 9, the uppercase and lowercase letters A to Z and a to z, and two special characters, underscore and hyphen (_, -). That is 10 + 26 + 26 + 2 = 64 characters in total.

Now, how do we compress the URL?

The URL itself is not compressed into a tiny URL. Instead, the index or ID (unique identifier) of its database entry is converted from Base-10 to Base-64. Have a look at the JSON object below:

Object with Base 10 ID

In the JSON block above, the unique identifier is 10032, which converts to 2sM. This ID is then used to make the tiny URL, for example: https://example.com/2sM

Object with Base 64 ID

Now the question arises: how does it get converted to “2sM”? The idea is simple. We keep an array or string of all the allowed Base-64 characters and apply some remainder arithmetic: we take the remainder of the integer ID divided by 64, add the character at that index to the resulting string, and then divide the number by 64 (discarding the fraction), repeating until it becomes 0. As in the Base-2 example, the digits come out lowest first, so the result is read in reverse order.

JavaScript code snippet of Base64 Encoding
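Since the original snippet is an image, here is a minimal sketch of the same idea. It assumes the alphabet is ordered as digits, lowercase letters, uppercase letters, then underscore and hyphen, which is the ordering that turns 10032 into 2sM. The names `ALPHABET` and `encodeId` are illustrative, not taken from the original code.

```javascript
// Assumed alphabet order: 0-9, a-z, A-Z, then "_" and "-" (64 characters).
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-";

// Encode a non-negative integer database ID as a short Base-64 string.
function encodeId(id) {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  if (id === 0) return ALPHABET[0];

  let code = "";
  let n = id;
  while (n > 0) {
    // The remainder picks the next character. Prepending keeps the
    // most significant digit on the left.
    code = ALPHABET[n % 64] + code;
    n = Math.floor(n / 64);
  }
  return code;
}

console.log(encodeId(10032)); // "2sM"
console.log("https://example.com/" + encodeId(10032)); // "https://example.com/2sM"
```

Working it through by hand: 10032 = 2 × 64² + 28 × 64 + 48, and positions 2, 28, and 48 of this alphabet are `2`, `s`, and `M`.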

By converting the database ID into a shorter, encoded string, we create a tiny representation of that document or entry. This tiny string acts as a key that maps back to the original URL stored in the database. The entire tiny URL system fundamentally relies on this logic. (YouTube video IDs look similar: short strings drawn from a URL-safe 64-character alphabet.)
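Mapping the key back to the original URL is just the reverse calculation. The sketch below decodes a short code into its numeric ID and looks it up in a `Map` that stands in for the database. The alphabet order, the `decodeCode` name, and the sample record are all assumptions for illustration.

```javascript
// Assumed alphabet order: 0-9, a-z, A-Z, then "_" and "-" (64 characters).
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-";

// Decode a short Base-64 code back into the numeric database ID.
function decodeCode(code) {
  if (code.length === 0) throw new RangeError("empty code");
  let id = 0;
  for (const ch of code) {
    const value = ALPHABET.indexOf(ch);
    if (value === -1) throw new RangeError("invalid character: " + ch);
    id = id * 64 + value; // shift left by one Base-64 digit, then add
  }
  return id;
}

// Hypothetical stand-in for the database table, keyed by numeric ID.
const urlsById = new Map([[10032, "https://example.com/some/very/long/path"]]);

console.log(decodeCode("2sM")); // 10032
console.log(urlsById.get(decodeCode("2sM"))); // "https://example.com/some/very/long/path"
```

A real service could equally store the short code itself as an indexed column and skip decoding. Which approach the original system uses is not stated.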

Now let’s understand how the whole process works.

When you hit a tiny URL (e.g. http://example.com/xyz), the server first checks the cache (Redis or any other) for an entry with the ID “xyz”. On a cache hit, it sends back the original URL with a redirect response. Otherwise, the server checks its database. If there is still no entry, it responds with a 404. If the entry exists, the server sends back the original URL, updates the cache (when the link is being requested in bulk), and tracks clicks, the IP address, and other information about the user from the request and its headers.

Flowchart of Process
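The flow above can be sketched with two in-memory `Map` objects standing in for the cache and the database. Everything here is a simplification under stated assumptions: the names `cache`, `database`, `clickLog`, and `resolve` are hypothetical, the redirect is assumed to be a 302, and the cache is filled on every miss rather than only for heavily requested links.

```javascript
// Hypothetical in-memory stand-ins for the cache (e.g. Redis) and the database.
const cache = new Map();
const database = new Map([["2sM", "https://example.com/some/very/long/path"]]);
const clickLog = [];

// Resolve a short code to a redirect response, or a 404 if it is unknown.
function resolve(code, headers = {}) {
  let url = cache.get(code);
  let source = "cache";

  if (url === undefined) {
    // Cache miss: fall back to the database.
    url = database.get(code);
    source = "database";
    if (url === undefined) return { status: 404 };
    cache.set(code, url); // simplification: cache every hit from the database
  }

  // Track the click using whatever the request headers provide.
  clickLog.push({ code, userAgent: headers["user-agent"] || null, at: Date.now() });
  return { status: 302, location: url, source };
}

console.log(resolve("2sM").source); // "database" (first request misses the cache)
console.log(resolve("2sM").source); // "cache"
console.log(resolve("nope").status); // 404
```

Checking the cache first keeps popular links from hitting the database on every click.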

After fetching the original URL, the website or app simply redirects you to the corresponding website, or provides a button that takes you there.

Ever wondered what else we can build with this logic? We can create a TTL (Time to Live) based mechanism for a link, which lets the database hold several links under a single identifier, with the server redirecting to whichever link has not yet expired. We could also build location-based URLs. Share your ideas in the comment section.
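As one way to picture the TTL idea, the sketch below stores several destinations under one short code, each with an expiry timestamp, and picks the first one that is still valid. The record shape (`url`, `expiresAt`) and the function name `pickActiveLink` are assumptions made up for this example.

```javascript
// Return the first destination that has not expired yet, or null if none is left.
// `links` is a hypothetical list of { url, expiresAt } records for one short code,
// where `expiresAt` is a timestamp in milliseconds.
function pickActiveLink(links, now = Date.now()) {
  const active = links.find((link) => link.expiresAt > now);
  return active ? active.url : null;
}

// Example records using small fake timestamps to keep the demo readable.
const links = [
  { url: "https://example.com/launch-week", expiresAt: 1000 },
  { url: "https://example.com/regular-page", expiresAt: 2000 },
];

console.log(pickActiveLink(links, 500)); // "https://example.com/launch-week"
console.log(pickActiveLink(links, 1500)); // "https://example.com/regular-page"
console.log(pickActiveLink(links, 2500)); // null
```

Passing `now` as a parameter makes the expiry logic easy to test without waiting for real time to pass.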

With that, I’ll wrap up. I hope you now understand the logic behind a URL shortener. I’ll soon be releasing an article entitled “The High-level and Low-level system design behind the Tiny URL”.

Thanks for reading.
