System Design URL Shortner

·

3 min read

motive: understanding partition and sharding

Approach 1: Hashing the URL

if the user gives input, it will hash the value with its size of 128 bits, but

drawback: if two users give the same URL, they will get the same hash value so we differentiate the count of two websites and how many people are visiting

Approach2:-Integer ID as short

example chakri/1234 chakri/1235 chakri/1236

with is predictable, someone can easily get what will be next URL

Approach3:- 3: We can explore integers directly and what we encode them into a string

total number of characters in short URL

\=(a-z)+(A-Z)+(0–9)

\=26+26+10

\=62with is less then <2power(6)

we will create a hashtable

000000->a

000001->b

000010->c…..

example: chakribp.

b=000001 p=001111

drawback: still, the user can predict if the user tries to request again and again, the user will get bp, bq, and W.

solution:

before the table is this

000000->a

000001->b

000010->c…..

Now we will assign randomly only over the database, knowing it

001100->m

001001->l

000010->z….

Still, there is a problem: what if some users back-to-back requests of 50 or more do reverse engineering?

Let's come to the storage part:

we will store the shortcode (with we generate) and URL

we can see for short code we are using 8bits and 120bits for URL (8+120)

What if we have 100 million users (8+120)x100=12. 8GB storage is not a big concern to use but we will struggle to handle the load

What happens when someone visits the API server?

When a user request comes to our API server, we will return 3a 01 redirect with the user URL

as we can see in the above diagram, the user gives a URL it hits to over API sever go to the database we have done sharding as per 100million users we can distribute the load we will get short URL before that we will send over consumer data thought queue we are using Kafka with is used for message stream with will handle large throughput it hit customer API we can show visually on grafana

Newly published URL are more likely to hit

we can cache short url that or propular and hit so we added redis to cache things

How was this short code/Url generated?

if we go one by one, easy to predict, then next number

So we will generate a random number, but collisions of numbers will occur because of that. We need a unique random number

from above diagram, we will get random ID from ID column.

that id current value will update with +1 like curr=curr+1 We will take value every time new value is added. In the next iteration, we get same ID, and the current value is updated

this is for small range lets take big range The maximum integer value will be 2 power(32), which is equal to 4billions

in our example, we get user monthly 100M per month; it take 40months==3years with is less

if we want to support more, we increase int 64bits

2 power(64)=1.84x10power(19)

84x10 power(19)

— — — — — — — — — ==>1.84x10 power(11)=>15billion years

100x10power(6)

if you have any doubt, feel free to reach out https://www.linkedin.com/in/bonthachakri/