Around a year ago I first came across the concepts of event sourcing and CQRS, and I have been fascinated by the possibilities ever since. I have since embarked on building an event-driven microservice architecture at Works, and I have written a detailed essay outlining some of the design considerations we encountered. The essay is titled “Building Out an Antifragile Microservice Architecture @ Works — Design Consideration” and I strongly recommend reading it for more background.
This blog post gives an overview of how we use event sourcing, Command Query Responsibility Segregation (CQRS), and Apache Kafka to build a scalable, data-driven infrastructure. I should note up front that I do not have extensive experience building event-sourced applications, so feedback from anyone with firsthand experience would be greatly appreciated.
The Old Way of Doing Things
Most data-management applications keep users working against the latest data by continually updating records in place. The classic CRUD (Create, Read, Update, Delete) approach retrieves data from a store, modifies it, and commits the changes inside transactions that lock the data. This approach has several problems:
- In an event-driven system, a distributed transaction spanning both the database and the message broker requires a two-phase commit (2PC). 2PC is expensive and tends to hurt transaction throughput.
- In a collaborative environment with multiple users working at the same time, update-in-place makes conflicts likely: two users modifying the same piece of data simultaneously can overwrite each other's changes.
- History is lost, unless a separate auditing system records the specifics of each operation in its own log.
Event sourcing provides an alternative to this traditional way of maintaining business entities. Instead of storing only the current state of an entity, it stores the history of state changes and reconstructs the present state from them. Each change is recorded as an individual, atomic event, which makes tracking and updating an entity simpler and more reliable: appending an event is a single atomic write, so no two-phase commit is needed, and because every event is persisted, no change is ever lost.
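To make the idea concrete, here is a minimal sketch of reconstructing an entity's current state by replaying its event history. The event shapes (`UserCreated`, `EmailChanged`) are hypothetical, chosen only for illustration:

```python
# Minimal event-sourcing sketch: current state is never stored directly,
# it is derived by folding over the full history of events.

def apply(state, event):
    """Apply a single event to the current state, returning the new state."""
    kind = event["type"]
    if kind == "UserCreated":
        return {"id": event["id"], "email": event["email"]}
    if kind == "EmailChanged":
        return {**state, "email": event["email"]}
    return state  # unknown event types are ignored

def replay(events):
    """Rebuild the present state by replaying the event history in order."""
    state = None
    for event in events:
        state = apply(state, event)
    return state

history = [
    {"type": "UserCreated", "id": 1, "email": "a@example.com"},
    {"type": "EmailChanged", "id": 1, "email": "b@example.com"},
]
print(replay(history))  # {'id': 1, 'email': 'b@example.com'}
```

Note that `apply` returns a new state rather than mutating in place; each event application is an atomic step, mirroring the atomicity argument above.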
An Event Store is a database-like structure designed specifically to persist the events of an event-driven application. It also behaves like a message broker: it provides an API for event subscription, and every subscriber receives the events appended to the store. The Event Store is the cornerstone of an event-sourced microservice architecture, serving as the central data store. For a more detailed explanation of event sourcing, I recommend Martin Fowler’s article on the subject, as well as the “Introduction to Event Sourcing” article.
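The two roles described above (append-only persistence plus broker-style notification) can be sketched with a toy in-memory event store. The class and method names are illustrative, not the API of any real Event Store product:

```python
# Toy in-memory event store: an append-only log that also notifies
# subscribers, combining the "database" and "message broker" roles.

class EventStore:
    def __init__(self):
        self.log = []           # the append-only event log
        self.subscribers = []   # callbacks invoked on every new event

    def subscribe(self, callback):
        """Register a component that wants to receive all future events."""
        self.subscribers.append(callback)

    def append(self, event):
        """Persist the event, then push it to every subscriber."""
        self.log.append(event)
        for notify in self.subscribers:
            notify(event)

store = EventStore()
received = []
store.subscribe(received.append)        # a subscriber can be any callable
store.append({"type": "UserCreated", "id": 1})
```

A real event store would add durability, per-stream reads, and replayable subscriptions, but the append-then-notify shape is the core of it.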
Why choose event sourcing?
Keeping the full event history is the foundation for several advantages. Some examples:
- Audit trail. Events are persisted forever and document the system’s entire progression over time, providing a comprehensive audit trail of all system activity.
- Integration with other systems. When the application’s state changes, the event store can notify interested components, and it keeps a complete record of every event it has published to the rest of the infrastructure.
- Time travel. Because the full record of past events for a domain object is available, the system’s state at any point in time can be reconstructed. This lets the organization answer questions about the past that were never anticipated.
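The “time travel” point follows directly from replaying only a prefix of the history. A sketch, assuming each event carries a timestamp and a dict of changed fields (both hypothetical shapes):

```python
# Point-in-time reconstruction: replay only the events that occurred at
# or before the requested moment to get a historical snapshot.

def state_at(events, as_of):
    """Fold events with timestamp <= as_of into a snapshot dict."""
    state = {}
    for event in events:
        if event["ts"] > as_of:
            break  # events are stored in order, so we can stop early
        state.update(event["data"])
    return state

history = [
    {"ts": 1, "data": {"name": "Ada"}},
    {"ts": 2, "data": {"email": "ada@example.com"}},
    {"ts": 3, "data": {"email": "ada@works.example"}},
]
print(state_at(history, as_of=2))  # {'name': 'Ada', 'email': 'ada@example.com'}
```

A CRUD system that only stores current state simply cannot answer this query; the information is gone after each update.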
Command Query Responsibility Segregation (CQRS)
Command Query Responsibility Segregation (CQRS) divides a system’s operations into two kinds. An object should either execute commands or answer queries, never both: queries return information without altering the object’s state, while commands modify the object’s state without returning information. The result is a much clearer picture of a system’s state and of what changes it.
CQRS makes implementing an event-sourced system far more efficient. In event sourcing, the current state is stored as a sequence of events, so retrieving it directly is expensive: the state must be reconstructed by replaying every event leading up to the present. To reduce this overhead, two models are used: a write model in the form of an event store, and a read model in the form of a conventional database. Whenever an event is saved, a separate process refreshes the read store, a read-only database used to serve queries. By applying the CQRS pattern, writing and reading are completely decoupled: writes go to the event store, reads come from the read store.
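The write-model/read-model split can be sketched in a few lines. All names here (`handle_create_user`, `project`, `get_user`) are illustrative, and the projector runs inline for brevity, whereas the post describes it as a separate process:

```python
# CQRS sketch: commands append to the event log (write model), a projector
# refreshes the read store (read model), and queries only touch the latter.

event_log = []    # write model: append-only sequence of events
read_store = {}   # read model: query-optimized current state

def handle_create_user(user_id, email):
    """Command: modifies state (by appending an event), returns nothing."""
    event_log.append({"type": "UserCreated", "id": user_id, "email": email})

def project(event):
    """Projector: keeps the read store in sync with the event log."""
    if event["type"] == "UserCreated":
        read_store[event["id"]] = {"email": event["email"]}

def get_user(user_id):
    """Query: reads from the read store only; never replays events."""
    return read_store.get(user_id)

handle_create_user(1, "a@example.com")
for e in event_log:        # in production this runs as a separate process
    project(e)
```

Because the projector is decoupled, the read store is eventually consistent with the event log; a query issued between the append and the projection may see stale data, which is an inherent trade-off of this pattern.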
Kafka as an Event Store
Kafka is a message broker (comparable to AMQP, JMS, NATS, RabbitMQ). It serves two kinds of clients:
- producers, which publish messages to Kafka
- consumers, which subscribe to streams of messages from Kafka
What makes Kafka both interesting and practical as an event store is its log-like structure. As a quick reference on how Kafka works:
- Producers publish messages to Kafka topics (for example, a users topic).
- Every message published to a topic is appended to the end of one of the topic’s partitions. Appending is the only write operation Kafka supports; messages are never updated in place.
- Each topic partition is a log: a totally ordered sequence of messages.
- There is no ordering guarantee across partitions within a topic; partitions are independent of one another.
- The data for each partition is stored on disk and replicated across a number of machines (determined by the topic’s replication factor), so the system is fault-tolerant and can recover from the loss of a machine.
- Within a single partition, message offsets increase monotonically (the offset is the log position). Consumers read messages sequentially, starting from any specified offset, which makes consumption very cheap.
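The points above can be modeled in a few lines. This toy `Partition` class captures only the semantics (append-only writes, monotonically increasing offsets, sequential reads from an offset), not Kafka's actual storage or network protocol:

```python
# A toy model of a single Kafka-style partition: an append-only log where
# each message gets an offset, and consumers read sequentially from one.

class Partition:
    def __init__(self):
        self.log = []

    def append(self, message):
        """The only write Kafka supports: append to the end of the log.
        Returns the offset assigned to the message."""
        self.log.append(message)
        return len(self.log) - 1

    def read_from(self, offset):
        """Sequential consumption starting at a given offset."""
        return self.log[offset:]

p = Partition()
p.append("user-1 created")        # offset 0
p.append("user-1 email changed")  # offset 1
print(p.read_from(1))             # ['user-1 email changed']
```

Sequential reads from an offset are what make Kafka consumption cheap: the broker streams a contiguous chunk of the log rather than seeking individual records.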
Using Kafka for event sourcing
At Works, each microservice owns a single bounded context, and each bounded context maps to a specific Kafka topic. As an example, consider our user-management-service, which is responsible for managing all of our users. We began by working out all of the domain events that could be published to the users topic, such as UserCreatedEvent.
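A hedged sketch of how a domain event such as UserCreatedEvent (the one event named in this post) might be modeled; the fields and metadata are assumptions, not our actual schema:

```python
# Hypothetical shape of a domain event. Immutable (frozen), carrying the
# payload plus the metadata an event store typically needs.

from dataclasses import dataclass, field, asdict
import time
import uuid

@dataclass(frozen=True)
class UserCreatedEvent:
    user_id: str
    email: str
    event_type: str = "UserCreatedEvent"
    occurred_at: float = field(default_factory=time.time)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

evt = UserCreatedEvent(user_id="u-1", email="ada@example.com")
print(asdict(evt)["event_type"])  # UserCreatedEvent
```

Freezing the dataclass reflects a core rule of event sourcing: events are facts about the past and must never be mutated after they are recorded.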
Once a domain event is published, every microservice interested in it will consume it, not just the user management service. The following figure illustrates a simplified user-creation request.
The API gateway receives a POST request from the end user and makes a remote procedure call (RPC) to the user management service’s CreateUser endpoint. CreateUser performs several checks to validate the user’s input; if any errors are detected, they are relayed back to the user via the API gateway. On successful validation, a UserCreatedEvent is published to one of the three partitions of the users topic. The partition is chosen by the user’s id, so that all events for a given user land in the same partition and can be aggregated correctly: Kafka provides no ordering guarantee for messages sent to different partitions, and our design accounts for this.
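Key-based partition selection can be sketched as follows. Kafka’s default partitioner hashes the message key (with murmur2) modulo the partition count; this toy version uses a stable CRC purely to show the idea that the same key always maps to the same partition:

```python
# Sketch of key-based partitioning: hashing the user id picks the
# partition, so all events for one user stay in one partition, ordered.

import zlib

NUM_PARTITIONS = 3  # the three partitions of the users topic in the example

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a message key to a partition number."""
    return zlib.crc32(key.encode()) % num_partitions

# Every event keyed by the same user id maps to the same partition:
print(partition_for("user-42") == partition_for("user-42"))  # True
```

Different users may share a partition (that is fine), but one user’s events are never split across partitions, which is exactly the per-user ordering guarantee the design relies on.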
Several microservices subscribe to the UserCreatedEvent: the user management service itself, plus our notification services for email and Slack. On receiving the event, the user management service invokes its userCreate handler, which creates the user in our Postgres database. The Email Notification service then informs the user via email, and the Slack Notification service notifies the administrators in Slack.
In our example, we run three instances of the user management service, each subscribed to one of the three partitions. Kafka’s consumer groups ensure that each published message is delivered to exactly one instance and processed only once. If the users topic had ten partitions instead, the three instances would divide them among themselves (4, 3, and 3). Because the number of instances that can consume in parallel is capped by the number of partitions of the topic, it is advisable to over-partition from the start; we began with 50 partitions per topic.
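The way a consumer group spreads partitions over instances can be sketched in the spirit of Kafka’s range assignor (this is a simplification of the real rebalancing protocol): each instance gets a contiguous slice, and when partitions don’t divide evenly, the first instances take one extra.

```python
# Sketch of range-style partition assignment within a consumer group.

def assign(partitions: int, instances: int):
    """Return, for each instance, the list of partition numbers it owns."""
    base, extra = divmod(partitions, instances)
    assignment, start = [], 0
    for i in range(instances):
        count = base + (1 if i < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

# Ten partitions over three instances: 4 + 3 + 3.
print(assign(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

With three partitions and three instances, each instance owns exactly one partition; a fourth instance would sit idle, which is why the partition count caps parallelism.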
Read operations, such as listing all users or retrieving a single user, are served from the read store (a Postgres database).
Despite its numerous advantages over traditional methods, event sourcing is notoriously challenging to implement. The difficulties include:
- There is a learning curve, since it is a new and unfamiliar programming model.
- There is currently a severe shortage of material on building event-sourced applications.
To explore these challenges, I am planning to build a Twitter clone using event sourcing and document the experience in a series of blog posts. I am still undecided on which language to use, Node.js or Golang, so I am asking for feedback in the comments section; I would appreciate any input that helps me make an informed decision.
Related Articles
- The Log: What every software engineer should know about real-time data’s unifying abstraction
- Turning the database inside-out
- Why local state is a fundamental primitive in stream processing
- The Kappa Architecture
- Using logs to build a solid data infrastructure
- The log is the lifeblood of your data pipeline