Market Pulse Tech: ADN (API Delivery Network) gives superpowers to our APIs

HOW ADN HAS SUPERCHARGED OUR APIs

FEATURES

ADN supercharges our APIs to meet a 99th-percentile response time of less than 4ms at over 200K requests per minute.

Built to provide:
1. Instant Preheating (warming up)
2. Real-time purges (in less than 100 ms)
3. TTLs as low as 5 seconds
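
The three features above can be pictured with a minimal in-memory sketch. This is not ADN's code; the class name TtlCache, its methods preheat, get and purge, the FakeClock helper and the example key are all hypothetical, and the clock is injected only so the TTL behaviour can be followed without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy cache illustrating preheating, purging and short TTLs (hypothetical API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time in seconds)
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def preheat(self, key: str, value: bytes, ttl_seconds: float) -> None:
        """Store a response before any client asks for it."""
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[bytes]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def purge(self, key: str) -> bool:
        """Drop an entry immediately; returns True if something was removed."""
        return self._entries.pop(key, None) is not None


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
cache = TtlCache(clock)

cache.preheat("/quotes/ACME", b'{"price": 101.5}', ttl_seconds=5)
first_read = cache.get("/quotes/ACME")    # served from the cache

clock.now = 6.0                            # 6 s later, the 5 s TTL has lapsed
expired_read = cache.get("/quotes/ACME")  # None: the entry expired

cache.preheat("/quotes/ACME", b'{"price": 102.0}', ttl_seconds=5)
purged = cache.purge("/quotes/ACME")      # True: the entry was removed
after_purge = cache.get("/quotes/ACME")   # None: nothing left to serve
```

In this sketch the first read is a hit even though no client had requested the key before, which is the point of preheating.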

DIFFERENCE BETWEEN CDN AND ADN

A CDN follows a Hit-Miss model, whereas ADN is designed to always be a Hit, while still supporting the typical Hit-Miss model.

Request Mapping of ADN to multiple origins
A simple multiple-backend setup at the Phoenix level, inspired by the Varnish model, does the job for us.
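
As a rough illustration of mapping requests to multiple origins, the sketch below routes by path prefix. It is written in Python purely for illustration (the real mapping lives at the Phoenix level and is not shown in this post), and the prefixes, hostnames, and the function name pick_origin are made up.

```python
from typing import Dict

# Hypothetical route table: path prefix -> origin base URL.
ORIGINS: Dict[str, str] = {
    "/auth": "http://rails-origin.internal:3000",
    "/quotes": "http://phoenix-origin.internal:4000",
    "/history": "http://go-origin.internal:8080",
}
# Hypothetical fallback for paths that match no prefix.
DEFAULT_ORIGIN = "http://rails-origin.internal:3000"


def pick_origin(path: str) -> str:
    """Return the origin for the longest prefix matching on a path-segment boundary."""
    best = ""
    for prefix in ORIGINS:
        on_boundary = path == prefix or path.startswith(prefix + "/")
        if on_boundary and len(prefix) > len(best):
            best = prefix
    return ORIGINS.get(best, DEFAULT_ORIGIN)


quotes_origin = pick_origin("/quotes/ACME")   # the "/quotes" backend
unknown_origin = pick_origin("/unknown")      # falls back to the default
```

Matching on a segment boundary means a path such as /quotesX does not accidentally land on the /quotes backend.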

THE NEED

Fast, really fast changing data
For us, data is changing every 300ms. That means that most of our one-to-many APIs cannot cache data for more than 60 seconds.

CDNs didn’t work for our APIs
CDNs work great for static content, videos, images and for one-to-many APIs that benefit from a hit-miss model. But when it comes to dynamic, user-specific content that changes frequently, CDNs often struggle.

With Cloudflare, we have found:
1. Making millions of purges per day is ridiculously expensive.
2. Purges can take up to 30 seconds across all PoPs.

With Fastly, we have observed:
1. Purges aren’t reliable. We have seen this repeatedly.
2. They make one request to origin per PoP, which can still lead to multiple requests to origin.

Also, neither of these has preheating capabilities, so every miss sends traffic to the origin.

Origins bombarded at 9:15 am
When the stock markets open at 9:15am, 50% of our users (~150K) end up opening the app in less than 5 minutes.
This leads to a spike of about 200K requests per minute on all our APIs. 40% of these are one-to-many and 60% are one-to-one (user specific).
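
For a sense of scale, the split above works out as follows. The figures come from the text; the per-second number is an average that assumes the load is spread evenly across the minute, which a real spike will not be.

```python
PEAK_RPM = 200_000          # requests per minute at market open (from the text)
ONE_TO_MANY_PERCENT = 40    # share of one-to-many requests (from the text)
ONE_TO_ONE_PERCENT = 60     # share of one-to-one requests (from the text)

one_to_many_rpm = PEAK_RPM * ONE_TO_MANY_PERCENT // 100
one_to_one_rpm = PEAK_RPM * ONE_TO_ONE_PERCENT // 100
average_rps = PEAK_RPM // 60   # whole requests per second, averaged over a minute

print(one_to_many_rpm, one_to_one_rpm, average_rps)
# prints: 80000 120000 3333
```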

Problems with one-to-one types:
These include authentication token validation, subscription validity and user sync APIs that are called only once a day, when the user opens the app for the first time that day.
As you can see, these cannot benefit from a hit-miss model, as the result will be a Miss every day, and never a Hit. This leads to over 300K requests flooding in during a very short span at market opening hour.

In addition, these APIs make multiple DB calls and run on Ruby on Rails (RoR), with a typical average response time of 50-100ms.

With ADN, the response time now is less than 1ms.

Problems with one-to-many types:
As an example, let’s consider one of our APIs that needs to be purged every day, because it uses the previous day’s history of the stock to calculate today’s range.
At 9:15am, this leads to about 5-10K rpm. Now this may seem small, but it isn’t.

This request fetches data from a 45GB time series database, sanitises it and sends the result as JSON. The origin takes about 10ms to return. Now, that’s fine too. But when you get 10K rpm to the DB, the DB gets overloaded. And that increases the response time to 30-40ms, leading to nasty timeouts for clients.

With CDN + Origin, the response time was still about 30-40ms and would put a lot of stress on the origin.

With ADN, the response time now is less than 4ms, with zero stress on the origin.

BENEFITS OF ADN

Lightweight
It’s built on Varnish, Elixir and Redis with an extremely tiny codebase and minimal infrastructure needs.

Focus on code, not caching
ADN lets our developers focus on solving the core business problem, and leave caching headaches to the ADN.
We have no caching layer in our applications, apart from caching DB queries. There is no need for page caching, fragment caching, view caching, method caching.

Push vs pull mechanism
All CDNs or reverse proxies are built on a Hit/Miss model which inherently requires them to pull from the origin. This results in unpredictable traffic to the server.
With ADN, the origin server pushes responses to the ADN, which can lead to zero requests to the origin.
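
The difference can be seen in a small simulation. It is a sketch under stated assumptions rather than a description of ADN's internals: CountingOrigin, serve and the /sync/user-N keys are hypothetical, and it assumes the origin has pushed a response for every key before the spike begins. The origin still does the work of building those responses, but on its own schedule instead of in reaction to client traffic.

```python
from typing import Dict, List


class CountingOrigin:
    """Stand-in origin that counts how many requests reach it."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, key: str) -> str:
        self.requests += 1
        return f"response-for-{key}"


def serve(keys: List[str], cache: Dict[str, str], origin: CountingOrigin) -> int:
    """Serve each key from the cache, pulling from the origin on a miss.

    Returns the total number of requests that reached the origin.
    """
    for key in keys:
        if key not in cache:
            cache[key] = origin.fetch(key)
    return origin.requests


# 1,000 users, each calling a user-specific API exactly once (one-to-one).
user_keys = [f"/sync/user-{n}" for n in range(1000)]

# Pull: the cache starts cold, so every first request is a miss.
pull_origin = CountingOrigin()
pull_requests = serve(user_keys, {}, pull_origin)

# Push: responses were placed in the cache before any user asked.
push_origin = CountingOrigin()
pushed = {key: f"response-for-{key}" for key in user_keys}
push_requests = serve(user_keys, pushed, push_origin)

print(pull_requests, push_requests)
# prints: 1000 0
```

Because each user-specific key is requested only once, the pull model never gets a hit on it, while the pushed cache answers every request without touching the origin.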

Agnostic of backend technology
Our applications run on Elixir/Phoenix, Ruby on Rails and Golang. This approach allows us to support caching across all technologies, seamlessly.

One caching layer, many applications
We are able to work with a centralised caching layer, helping us optimise and manage it better, rather than every application having its own caching layer.

Co-location of origins not needed
We can maintain our ADN PoP (point of presence) closer to the user, without needing the origin to be closer to the user.

WHAT NEXT?

We are moving all our applications behind this layer and hoping to open source our solution. If this can help you or you wish to contribute to this project, get in touch with us.
