February 3, 2022
The competing consumer is an architectural pattern that helps deliver scalable, reliable, and highly available SaaS products, and software solutions in general. But instead of just writing about the pattern theoretically, we'll start off with an imaginary application where this pattern would apply.
Imaginary SaaS Product Inc. is a start-up that’s recently launched a new mobile application. In order to ensure that new users provide their actual email addresses, we want the system to send a verification email to newly registered users. From there, users will need to click on the personalized link included in that email to verify their email address. This is a step they need to accomplish before they can start using the app.
This feature has worked fine over the last couple of weeks: emails are sent as soon as the user registers. The registration code calls the email-server directly and tells it to send an email.
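That tightly coupled design can be sketched as follows. This is an illustration only, with made-up names; the key point is that the registration request blocks on a call to the email-server:

```python
# A minimal sketch of the naive design: the API calls the email-server
# synchronously while handling the registration request. If the email-server
# is slow or down, registration itself slows down or fails.
# All names here are illustrative, not actual product code.

def send_verification_email(smtp_send, address, link):
    # smtp_send stands in for a real SMTP call to the email-server.
    smtp_send(to=address, subject="Verify your email",
              body=f"Click to verify: {link}")

def register_user(db, smtp_send, address):
    user_id = db.create_user(address)                   # fast, local work
    link = f"https://example.com/verify/{user_id}"      # hypothetical URL
    send_verification_email(smtp_send, address, link)   # slow, remote work
    return user_id
```

The user's registration request cannot complete until `send_verification_email` returns, which is exactly the coupling described below.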
Over in Imaginary SaaS Product Inc.'s marketing department, the team has been working on getting articles written to drum up awareness for the product. Recently, a large publisher released a positive review of the application, and on the day the article was published, hundreds of thousands of its readers decided to download and register for the app.
Unfortunately, all but the first few hundred users failed to receive their registration email within the first few minutes of registering. Many new users also hit errors during the registration process, creating a poor first experience that led many to delete the app almost immediately.
There are actually a number of things that could cause problems with this architecture. However, the primary issue faced by Imaginary SaaS Product Inc. is that their system is not prepared to deal with heavy intermittent loads.
A single email-server can get congested. And when this server runs at maximum capacity, this can and will cause a variety of complications, delivering a poor experience to most new users.
The first problem in this scenario is that the system runs its own email server. While this may seem cheaper than using a cloud-based service, in the long run it's usually better to collaborate with providers who specialize in email delivery.
That said, the problem presented is a common one in the software development world: any task that depends on an external service, or that takes more than a few seconds to execute, can run into it. Let's work through a solution.
Looking at the architecture, we can identify that the problem is that the API is directly dependent on the email-server being online. In the industry, we call this concept coupling.
When the email-server is unavailable, this creates a number of problems.
In a scenario where the API attempts to reach the email-server but can't, requests are left incomplete. As requests build up, the server slows down, and eventually requests time out. Since the API can't complete the registration request, the end-user sees an error.
The Solution: To solve this problem, you want to remove this direct dependency, or to decouple them. While there are a variety of options to choose from, we recommend the simplest version for this scenario - introducing a message queue.
Think of the message queue as an old fashioned physical mailbox. When someone drops a letter in this box, that message will stay in there until the mailbox is physically emptied. The same goes for the message queue: when a message gets enqueued, it will remain there until it gets consumed.
The API is responsible for dropping letters in this mailbox. These letters contain instructions to send an email. This includes who the recipients are, the subject of the message, and the message itself.
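With the queue in place, the API's only email-related job is to enqueue such a letter and return. The sketch below uses Python's in-process `queue.Queue` purely as a stand-in for a real message broker (RabbitMQ, SQS, Azure Service Bus, and so on); the message fields mirror the ones listed above:

```python
import json
import queue

# Stand-in for a real message broker: messages stay in the queue until
# a consumer takes them out.
email_queue = queue.Queue()

def enqueue_verification_email(q, address, link):
    # The "letter": recipient, subject, and the message itself.
    message = json.dumps({
        "to": address,
        "subject": "Verify your email",
        "body": f"Click to verify: {link}",
    })
    q.put(message)  # fast and local: the registration request returns immediately

enqueue_verification_email(email_queue, "new.user@example.com",
                           "https://example.com/verify/123")
```

Note that nothing here talks to the email-server at all; registration no longer depends on it being online.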
The queue itself does not contain any logic, so it won't send any emails by itself. To solve this, we introduce a new service that is responsible for opening the mailbox and taking out one letter at a time. This service is called the email handler.
The email handler's task is simple: it reads each letter and performs the actual task that needs to be executed, which in this case is sending the email.
The logic to deal with availability problems can be built into the email handler. Since this now happens in the background, the user only has to wait for the API to enqueue the message and the email handler can send the message when the email-server is ready to take on new work.
In contrast, when the email-server is not available, the email handler can just leave the message in the queue and try it again at a later stage.
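A handler with that retry behavior might look like the sketch below. Again, `queue.Queue` stands in for a real broker; a production broker would use acknowledgements rather than re-enqueueing, but the idea is the same: a failed delivery leaves the message in the queue for a later attempt.

```python
import queue

def handle_one(q, send_email):
    """Take one message from the queue and try to deliver it.

    If the email-server is unavailable, the message goes back into the
    queue to be retried later. The user-facing API is unaffected either way.
    Sketch only: real brokers ack/nack instead of re-enqueueing.
    """
    message = q.get()
    try:
        send_email(message)
    except ConnectionError:
        q.put(message)  # leave it in the queue for a later attempt
```

Because this runs in the background, an email-server outage delays emails slightly but never breaks registration.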
Now that our architecture has improved, we can introduce an additional email handler. This step does not require any additional coding and can be done entirely through configuration.
The benefit: when the primary handler becomes unavailable, the secondary one keeps operating, and vice versa. As a result, you avoid having a single point of failure, which is good practice in general.
We also recommend that a second email server is used for the exact same reason. This way, both email handler 1 and email handler 2 consume messages from the same message queue, but never the same message.
What we want to avoid are instances where email handler 1 and email handler 2 consume the same message, therefore sending the email out twice.
Remember how we said it was important that the handler only opens a single letter at a time? This is because the handler must complete the entire action to successfully do its job. Only once a message is fully processed can it start on the next one.
This pattern is what we call a competing consumer. Email handler 1 and email handler 2 both compete to consume messages from the queue.
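The competition between the two handlers can be demonstrated with two threads pulling from one shared queue. `queue.Queue` hands each message to exactly one consumer, which is the property a real broker provides to prevent duplicate emails; the handler names and messages below are made up:

```python
import queue
import threading

def run_handler(name, q, delivered):
    # Each competing consumer loops: take one message, process it fully,
    # then take the next. A message is only ever seen by one handler.
    while True:
        message = q.get()
        if message is None:            # shutdown signal
            q.task_done()
            return
        delivered.append((name, message))  # "send" the email
        q.task_done()

q = queue.Queue()
delivered = []
for i in range(10):
    q.put(f"email-{i}")
q.put(None)                            # one stop signal per handler
q.put(None)

handlers = [threading.Thread(target=run_handler, args=(n, q, delivered))
            for n in ("handler-1", "handler-2")]
for t in handlers:
    t.start()
for t in handlers:
    t.join()
```

All ten messages are delivered, each exactly once, regardless of which handler happened to pick it up.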
Although we have solved a few of the problems that Imaginary SaaS Product Inc. experienced, the requirement to deliver emails within 1 minute of registration still exists. The system is still not prepared to deal with high intermittent loads.
When thousands of new users sign up at the same time, the queue may still not be processed as fast as we'd like. The great thing is that our architecture can now deal with this, and most cloud providers have tools to do it automatically.
For instance, when there are 50 messages in the queue, this may mean the system is running behind. To address this backlog, we can configure the cloud platform to spawn an additional email-server and email handler to absorb the higher load. When we reach 100 messages in the queue, we can spawn another pair.
This continues until the number of messages in the queue starts to decrease. When that happens, we can begin shutting down the additional email handlers until we are back to the original two.
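The scaling rule above can be expressed as a simple function of queue depth. The thresholds (one extra handler/email-server pair per 50 queued messages, with a floor of two handlers) come straight from the example; real cloud autoscalers express the same idea as metric-based scaling rules rather than application code:

```python
# Illustrative autoscaling rule, mirroring the thresholds in the text:
# 0-49 queued messages  -> 2 handlers (the baseline pair)
# 50-99                 -> 3 handlers
# 100-149               -> 4 handlers, and so on.
BASELINE_HANDLERS = 2
MESSAGES_PER_EXTRA_PAIR = 50

def desired_handler_count(queue_length):
    extra_pairs = queue_length // MESSAGES_PER_EXTRA_PAIR
    return BASELINE_HANDLERS + extra_pairs
```

As the queue drains, the same function naturally scales the handler count back down toward the baseline.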
We understand that a naïve approach can cost you thousands of users, so our policy is to not take shortcuts when implementing solutions. If you're looking for a scalable, beautiful, and reliable solution for your software - let's talk.