link-shortener

Pet application to play with AWS and NoSQL databases.

Deployment and local run

The project uses serverless framework for deployment and testing lambdas locally and it requires AWS account to be configured.

To test a function locally, but using remote dynamodb, you can run:

npx sls invoke local -f functionName --path path/to/request

For other things, like emulating AWS locally (using serverless offline and local dynamodb) or deployment, check out npm scripts.

Generating shortlink

There are several ideas for that over the Internet.

Hash-based

The idea is simple: calculate hash(url + timestamp), take the first 7 characters of its Base62 representation and store it as a key. This gives us 62^7 (or 16^7 if we used hexadecimal representation instead) possibilities, which should be enough.

The hash can be e.g. md5, sha1, as we don't really need to care about security of these algorithms here.

We are adding timestamp (or some other salt) to the URL to generate different shortlinks for the same URLs submitted by different users.

The main issue with this approach is that we might have collisions, because of trunkating hash 7 characters. An idea to solve this would be to take some other part of the hash, when a collision occurs.

Id-based

The idea is simple, again. Instead of calculating the hash, we could use an autoincrementing id and convert it to Base62 number (potentially removing some easily-confused characters from the alphabet, like 0 and O, 1 and l). A string generated this way would be reversible - instead of storing generated values, we can simply store ids and convert incoming Base62 values to Base 10 ids and access them directly. The drawback is that it would be quite easy to guess what will be the next generated link and perform some crawling.

If this is undesirable, we could add some randomization part (e.g. salt, or use random values instead of monotonic ids) - in that case it might be necessary to store the calculated shortlink, as we might not be able to convert it to the decimal id (in case of using salt).

The main issue with this approach is that we're using distributed database (DynamoDB) which doesn't really have autoincrementing ids as it's an antipattern. We could potentially use an Atomic Counter, but that's rather not something it is intended for. A better alternative would be to have an id generation service, like Twitter Snowflake.

Since this is a pet-project (and id generation is not its main concern) and we don't really need to care about scale as much as Twitter or Instagram, we should be fine with id-based solution, where id is generated as current_timestamp + random_int(128) (where + is concatenation) and is Base58 encoded (without 0, O, l, I characters). If a collision ever occured, we can simply retry.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
src		src
.gitignore		.gitignore
README.md		README.md
example_request_expand.json		example_request_expand.json
example_request_shorten.json		example_request_shorten.json
package-lock.json		package-lock.json
package.json		package.json
serverless.yml		serverless.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

link-shortener

Deployment and local run

Generating shortlink

Hash-based

Id-based

About

Releases

Packages

Languages

a-mroz/link-shortener

Folders and files

Latest commit

History

Repository files navigation

link-shortener

Deployment and local run

Generating shortlink

Hash-based

Id-based

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages