Messaging Services: Paradox of choice

Most applications today need to be fast; responses should come back in under 100 ms. It’s also common in any architecture for a group of services to interact with each other.

How do these complex, interdependent applications meet today’s needs? Several factors can affect response latency:

  • Current load
  • Downstream service performance
  • Security components
  • Geographic location of the incoming request and the backend
  • Size of the request payload
  • Application’s hosting stack and network

A few other issues with such tightly coupled interdependent services are:

  1. Single Point of Failure (SPOF). If one service is overloaded (or down), the entire system is impacted.
  2. Each service has to handle different connections and different protocols.

We could resolve the above issues by decoupling the services. This can be achieved by introducing a messaging system, which makes the overall system more resilient.

As shown above, the application “produces” and “consumes” messages via the messaging service.

If a service is down, the messages are persisted in the messaging system. When the service is back up, it simply continues processing the messages.
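The decoupling described above can be illustrated with a toy, in-memory stand-in for a messaging service. This is only a sketch: the class name `InMemoryBroker` and its methods are made up for illustration, and a real messaging service would persist messages to durable storage rather than to a Python `deque`.

```python
from collections import deque


class InMemoryBroker:
    """Toy stand-in for a messaging service: holds messages until consumed."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        # The producer returns immediately; it never talks to the consumer.
        self._queue.append(message)

    def drain(self, handler):
        # Deliver everything that accumulated, oldest first.
        processed = 0
        while self._queue:
            handler(self._queue.popleft())
            processed += 1
        return processed

    def pending(self):
        return len(self._queue)


broker = InMemoryBroker()

# The consumer is "down": the producer keeps publishing anyway.
for order_id in (1, 2, 3):
    broker.publish({"order_id": order_id})

# The consumer comes back up and works through the backlog.
handled = []
broker.drain(handled.append)
```

The producer never needs to know whether the consumer is running; the backlog simply waits in the broker.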

Tools/Services available for decoupling today:

Different Cloud platforms have different flavors of messaging systems.

AWS has

Amazon MQ, Amazon SQS, Amazon SNS, Amazon Pinpoint, Amazon Kinesis, AWS IoT Message Broker

Azure offers

Azure Storage Queues, Azure Service Bus Queues, Azure Service Bus Topics, Azure Notification Hubs, Azure Event Grid, Azure Event Hubs, Azure IoT Hub, Azure Logic Apps, Azure SignalR Service

GCP has Pub/Sub

Apache Kafka is another popular event streaming platform.

There are also other traditional messaging options like RabbitMQ and the Jakarta Messaging API (formerly Java Message Service, or JMS API).

We have a plethora of choices here. How do we decide which one to choose?

Are you processing an Event or a Message?

This distinction is used extensively in the Microsoft Azure documentation, for obvious reasons: Azure itself offers at least 9 different options.

An event is a lightweight notification of a condition or a state change. Events often arrive as a continuous stream of data (like temperature readings or stock prices). We may or may not choose to act on an event.

A message (or a command) is raw data produced by one service to be consumed by another. An “action” is expected here (a bank transaction, booking an appointment).

Apache Kafka, Azure Event Hubs, and Amazon Kinesis Data Streams are ideal for events.

Azure Service Bus, RabbitMQ, Jakarta Messaging API, and Google Pub/Sub are ideal for messages.
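One way to picture the difference is in how a receiver treats each kind of item. The sketch below uses made-up types (`TemperatureReading`, `TransferFunds`) and an arbitrary 30-degree threshold, all purely illustrative: the event may be ignored, while the command must either be carried out or explicitly rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureReading:
    """An event: a fact that something happened; nobody is obliged to react."""
    sensor_id: str
    celsius: float


@dataclass(frozen=True)
class TransferFunds:
    """A message/command: the sender expects the receiver to act on it."""
    from_account: str
    to_account: str
    amount: int


def handle(item, balances):
    """Return a short description of what was done with the item."""
    if isinstance(item, TemperatureReading):
        # Events may be ignored; here we only react above a threshold.
        return "alert" if item.celsius > 30.0 else "ignored"
    if isinstance(item, TransferFunds):
        # Commands must be acted upon (or explicitly rejected).
        if balances[item.from_account] < item.amount:
            return "rejected"
        balances[item.from_account] -= item.amount
        balances[item.to_account] += item.amount
        return "transferred"
    raise TypeError(f"unknown item: {item!r}")
```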

Scalability or Reliability?

Scalability

Scalability is a measure of a system’s ability to increase or decrease its capacity as demand changes. When we have a stream of events coming in, an unpredictable heavy load can impact the system.

What we need is a system that is fast to scale out (adding more equivalent functional components in parallel to spread out the load) with no downtime.

Streaming platforms like Kafka are designed for this situation. Kafka is built around a horizontally and linearly scalable commit log (also referred to as a write-ahead log or transaction log). A new broker node (server) can be added without downtime or customer impact, and there is no fixed limit on the number of machines in a cluster.
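The idea behind a horizontally scalable log is that it is split into partitions that can live on different machines. The toy sketch below shows only the routing part: records are appended to a partition chosen by hashing the key. The class `PartitionedLog` is hypothetical, and CRC32 is used here simply as a convenient stable hash, not because any particular platform uses it.

```python
import zlib


class PartitionedLog:
    """Toy append-only log split into partitions, loosely modeled on a commit log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        index = self.partition_for(key)
        self.partitions[index].append((key, value))
        # Return the partition and the offset of the new record within it.
        return index, len(self.partitions[index]) - 1
```

Because records with the same key always land in the same partition, ordering is preserved per key while the overall load is spread across partitions.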

Reliability

Reliability is the ability to control how many times a message is read (or consumed) and to ensure there is no data loss.

Messaging services like Service Bus offer duplicate detection, FIFO ordering, and at-most-once or at-least-once delivery, depending on the receive mode.
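Duplicate detection combined with FIFO ordering can be sketched in a few lines. The `DedupQueue` class below is a made-up illustration, not any vendor's API: it remembers every message ID forever, whereas a managed service would typically remember IDs only for a bounded time window.

```python
class DedupQueue:
    """Toy FIFO queue that drops messages whose ID was already accepted."""

    def __init__(self):
        self._messages = []
        self._seen_ids = set()

    def send(self, message_id, body):
        # A retried send with the same ID is discarded.
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self._messages.append((message_id, body))
        return True

    def receive_all(self):
        # Messages come out in the order they were accepted (FIFO).
        delivered, self._messages = self._messages, []
        return delivered
```

A sender that times out and retries can safely resend with the same ID; the consumer still sees the message once.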

Some financial institutions even have insurance for each message. (It’s that critical!)

This does NOT mean streaming platforms are “not reliable” or that messaging services cannot scale. Kafka supports “exactly once” semantics (provided you have the right configuration, such as idempotent producers and transactions). Most cloud providers ensure that their services scale with ease.
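For Kafka, that configuration usually centers on a handful of client settings. The dictionaries below are a sketch of what they might look like: the property names are standard Kafka client settings, but the broker address, group name, and transactional ID are placeholders, and a real application also has to use the client's transaction calls around its sends.

```python
# Producer side: idempotent writes plus transactions.
producer_config = {
    "bootstrap.servers": "localhost:9092",      # assumed broker address
    "enable.idempotence": True,                 # broker de-duplicates retried sends
    "acks": "all",                              # wait for all in-sync replicas
    "transactional.id": "payments-producer-1",  # hypothetical, must be stable per producer
}

# Consumer side: only read records from committed transactions.
consumer_config = {
    "bootstrap.servers": "localhost:9092",      # assumed broker address
    "group.id": "payments-consumers",           # hypothetical consumer group
    "isolation.level": "read_committed",
    "enable.auto.commit": False,                # commit offsets explicitly
}
```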

The point is that Azure Service Bus, RabbitMQ, and Jakarta Messaging have design patterns built in for reliability (e.g., ordering, transaction script, dead letter queue, message priority).
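Of the patterns listed above, the dead letter queue is perhaps the easiest to sketch: a message that keeps failing is moved aside instead of blocking the rest of the queue. The function and handler names below are hypothetical, and the retry limit of three is an arbitrary choice.

```python
def process_with_dead_letter(messages, handler, max_attempts=3):
    """Try each message up to max_attempts; park persistent failures."""
    processed, dead_letters = [], []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(message))
                break
            except Exception as error:
                if attempt == max_attempts:
                    # Give up and keep the message for later inspection.
                    dead_letters.append((message, str(error)))
    return processed, dead_letters


def parse_amount(message):
    # Hypothetical handler: fails on payloads that are not integers.
    return int(message)
```

Calling `process_with_dead_letter(["10", "oops", "25"], parse_amount)` processes the two valid messages and parks `"oops"` together with its error text.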

Kafka and Event Hubs, on the other hand, are ideal for streams. Streams can be replayed and read again (imagine data coming in like logs). The data is persisted until the configured retention time expires.

In certain scenarios it’s OK if the system is slow (scaling takes time) as long as delivery is guaranteed. In other cases, the system has to scale so as to collect as much data as possible.

A majority of developers look for a middle ground. You can always choose one and build (code) the features it lacks.
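Replay and retention can be illustrated with a toy stream in which reading never removes data and only age does. `RetainedStream` is a made-up class, and timestamps are passed in explicitly (as plain numbers of seconds) to keep the sketch deterministic.

```python
class RetainedStream:
    """Toy stream: records stay readable until they age past the retention."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []  # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        self._records.append((self._next_offset, now, value))
        self._next_offset += 1

    def expire(self, now):
        # Drop records older than the retention period.
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

    def read_from(self, offset):
        # Reading does not remove anything, so any consumer can replay.
        return [(o, v) for (o, _, v) in self._records if o >= offset]
```

Contrast this with a queue, where a message is typically gone once it has been consumed and acknowledged.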

With the right set of configurations, you could have the best of both worlds.

Protocol

AMQP has been a standard industry protocol for a while now. It is used by traditional message brokers like RabbitMQ.

Kafka, on the other hand, uses a custom binary protocol on top of TCP. Event Hubs supports AMQP (along with the Kafka protocol). Traditional messaging systems focus more on reliability.

This parameter matters mainly when you are migrating an existing application from one service to another. Less configuration is required (such as opening ports in a firewall) if both services use a similar protocol, for example Kafka and Event Hubs.

Conclusion

It’s really hard (and probably unfair) to box any messaging service into a category. Every service offers certain unique features and is capable of consuming and/or producing both events and messages.

It’s about how you define the problem, and these are two key parameters to think about while making that decision:

Event vs. Message

Scalability vs. Reliability

A summary of popular messaging services:

“There is no single right answer or path forward, but there is one right way to frame the problem.” (Clayton M. Christensen)

“If you define the problem correctly, you almost have the solution.” (Steve Jobs)
