About a year ago I became interested in event sourcing and CQRS, and the topic has held my attention ever since. I am now building an event-driven microservice architecture at Works and have documented the design considerations in a longer essay, “Design Consideration: Building Out an Antifragile Microservice Architecture @ Works”. I recommend reading it for further details.
In this blog post, I outline how we have used event sourcing, Command Query Responsibility Segregation (CQRS), and Apache Kafka to build a scalable, data-oriented infrastructure. Note that I do not yet have much experience building event-sourced applications, so I welcome feedback from anyone with hands-on expertise.
The Traditional Approach
Most applications that manage data keep it continuously updated so that users always work with the current state. In the conventional CRUD (Create, Read, Update, Delete) model, the application reads data from a store, modifies it, and writes the changes back, often using transactions that lock the data. This approach has several drawbacks:
- In an event-driven system, a distributed transaction that spans both the database and the message broker requires a two-phase commit (2PC), which reduces transaction throughput. To learn more about two-phase commit and its implications, watch this video.
- In a collaborative setting with many concurrent users, update conflicts become more likely, because several users may modify the same item of data at the same time.
- Unless an additional auditing mechanism records the details of each operation in a separate log, history is lost.
Event sourcing offers an alternative that avoids two-phase commit (2PC) by taking a different, event-centric approach to persisting business entities. Instead of storing only the current state of an entity, the application stores the sequence of state-changing events and reconstructs the current state by replaying them. Saving an event is a single append operation and is therefore atomic, and because every event is recorded as it happens, no change is lost.
An event store is a database designed for storing events in an event-driven architecture. It also behaves like a message broker: it exposes an API that lets services subscribe to events, and it delivers events to all subscribers. The event store is the central data store in an event-sourced microservice architecture. For a more detailed explanation of event sourcing, I suggest reading Martin Fowler’s article on the subject, as well as the “Introduction to Event Sourcing” article.
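To make the idea of rebuilding state from events concrete, here is a minimal sketch in Python. The event types (UserCreated, EmailChanged, UserDeactivated) and the User shape are invented for illustration and are not taken from our codebase; the point is only that current state is a fold over the event history.

```python
from dataclasses import dataclass


# Hypothetical event types, invented for this illustration.
@dataclass(frozen=True)
class UserCreated:
    user_id: str
    email: str


@dataclass(frozen=True)
class EmailChanged:
    user_id: str
    email: str


@dataclass(frozen=True)
class UserDeactivated:
    user_id: str


@dataclass
class User:
    user_id: str
    email: str
    active: bool = True


def apply(state, event):
    """Return the state that results from applying a single event."""
    if isinstance(event, UserCreated):
        return User(event.user_id, event.email)
    if isinstance(event, EmailChanged):
        return User(state.user_id, event.email, state.active)
    if isinstance(event, UserDeactivated):
        return User(state.user_id, state.email, False)
    raise ValueError(f"unknown event: {event!r}")


def replay(events):
    """Rebuild the current state by applying every event in order."""
    state = None
    for event in events:
        state = apply(state, event)
    return state


history = [
    UserCreated("u1", "a@example.com"),
    EmailChanged("u1", "b@example.com"),
    UserDeactivated("u1"),
]
current = replay(history)
```

Nothing in the history is ever updated in place: a change of email is a new event appended to the end, and the current state is derived from the whole sequence.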
Why Choose Event Sourcing?
Event history is a significant advantage of event sourcing, but it is not the only one. Others include:
Fulfill Auditing Requirements: Events are durable and record how the entire system has evolved over time. They therefore provide a complete audit trail of all system activity.
Integration with Other Parts of the System: When the application’s state changes, the event store can notify interested components. The event store also keeps a complete record of every event it has published to other parts of the infrastructure.
Reconstruct Past State: Because the log holds every past event for a domain object, the system’s state can be rebuilt as of any point in time. This makes it possible to answer historical questions about the system after the fact.
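As a rough illustration of rebuilding state as of an earlier moment, the sketch below replays a timestamped log only up to a chosen instant. The event names, payloads, and dates are hypothetical.

```python
from datetime import datetime

# Hypothetical log, ordered by time: (timestamp, event name, payload).
events = [
    (datetime(2024, 1, 1), "UserCreated", {"email": "a@example.com"}),
    (datetime(2024, 2, 1), "EmailChanged", {"email": "b@example.com"}),
    (datetime(2024, 3, 1), "UserDeactivated", {}),
]


def state_at(log, as_of):
    """Replay events up to and including `as_of`; later events are ignored."""
    state = {}
    for timestamp, name, payload in log:
        if timestamp > as_of:
            break  # the log is ordered, so nothing later can apply
        if name == "UserCreated":
            state = {"email": payload["email"], "active": True}
        elif name == "EmailChanged":
            state = {**state, "email": payload["email"]}
        elif name == "UserDeactivated":
            state = {**state, "active": False}
    return state


mid_february = state_at(events, datetime(2024, 2, 15))
```

Here mid_february reflects the email change but not the later deactivation, which is the kind of historical question a store holding only the current state cannot answer.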
Command Query Responsibility Segregation (CQRS)
Command Query Responsibility Segregation (CQRS) divides a system into two parts: one that handles commands and one that handles queries. An object should either execute commands or answer queries, but not both. A query returns data without modifying state, whereas a command modifies state without returning data. The result is a clearer understanding of a system’s state and of what changes it.
The CQRS pattern makes an event-sourced system more practical to implement. With event sourcing, the current state is stored as a sequence of events, so retrieving it means replaying every event that led up to it, which makes reads slow. Using two models reduces this overhead: an event store serves as the write model and a conventional database serves as the read model. When an event is saved to the event store, a separate handler updates the read store, a database used exclusively for queries. In this way CQRS segregates reads and writes: reads are served from the read store and writes go to the event store.
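The split between the two models can be sketched as follows. This is a simplified, in-memory illustration: the class names EventStore and UserReadStore and the event dictionaries are assumptions made for the example, and the read store is updated synchronously here, whereas in a real system that update would normally run as a separate, asynchronous process.

```python
class EventStore:
    """Write model: an append-only list of events with subscribers."""

    def __init__(self):
        self.events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        # Commands end here: the event is stored, then handed to subscribers.
        self.events.append(event)
        for handler in self._subscribers:
            handler(event)


class UserReadStore:
    """Read model: current state per user, kept up to date from events."""

    def __init__(self):
        self.users = {}

    def handle(self, event):
        if event["type"] == "UserCreated":
            self.users[event["user_id"]] = {"email": event["email"]}
        elif event["type"] == "EmailChanged":
            self.users[event["user_id"]]["email"] = event["email"]

    def get_user(self, user_id):
        # Queries read precomputed state; no replay is needed.
        return self.users.get(user_id)


event_store = EventStore()
read_store = UserReadStore()
event_store.subscribe(read_store.handle)

event_store.append({"type": "UserCreated", "user_id": "u1", "email": "a@example.com"})
event_store.append({"type": "EmailChanged", "user_id": "u1", "email": "b@example.com"})
user = read_store.get_user("u1")
```

The write side never answers queries and the read side never accepts commands; the only link between them is the stream of events.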
Kafka as an Event Store
Kafka is a message broker or message queue, comparable to NATS, RabbitMQ, or brokers built on AMQP or JMS. It serves two types of clients:
- Producers: publish messages to Kafka topics.
- Consumers: subscribe to streams of messages in Kafka topics.
What makes Kafka appealing, and useful as an event store, is that it is structured as a log. The accompanying diagram gives a quick overview of how Kafka works.
- Producers write messages to Kafka topics, such as the users topic.
- A message published to a Kafka topic is appended to the end of one of that topic’s partitions. Appending is the only write operation Kafka supports.
- Each partition of a topic is a log (a totally ordered sequence of events).
- Partitions are independent of one another, so there is no ordering guarantee across the partitions of a topic.
- The data for each partition is stored on disk and replicated across multiple machines (the count is determined by the replication factor of the partition’s topic), so the system is fault tolerant and can recover from the loss of a single machine.
- Within a partition, each message has a monotonically increasing offset (its position in the log). A client consumes messages sequentially, starting from a specified offset, which makes consuming cheap.
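The properties listed above can be modelled with a toy, in-memory log. This is not Kafka’s API, only a sketch of the data structure: the Topic and Partition classes and their methods are invented for illustration.

```python
class Partition:
    """An append-only log; a message's offset is its position in the log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read_from(self, offset):
        """Return (offset, message) pairs sequentially, starting at `offset`."""
        return list(enumerate(self._log[offset:], start=offset))


class Topic:
    """A named set of independent partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def publish(self, partition, message):
        return self.partitions[partition].append(message)


users = Topic("users", num_partitions=3)
offsets = [users.publish(0, m) for m in ("created", "email-changed", "deactivated")]
other = users.publish(1, "created")  # offsets restart per partition
tail = users.partitions[0].read_from(1)
```

Offsets are only meaningful within one partition, which is why ordering is guaranteed inside a partition but not across partitions.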
Employing Kafka for Event Sourcing
At Works, each microservice corresponds to a single bounded context, and each bounded context has its own Kafka topic. Take our user-management-service as an example: it is responsible for managing all our users. We started by identifying every event that could be published to the users topic and drew up a list of domain events, UserCreatedEvent among them.
These domain events are published to the users topic, and every microservice that is interested in them subscribes, including the user management service itself. The following diagram offers a simplified view of how a user request is processed.
When an end user submits a POST request, the API gateway makes a remote procedure call (RPC) to the CreateUser endpoint of the user management service and returns the response to the user. The CreateUser endpoint runs several checks to validate the user’s input. If validation fails, the API gateway relays the error back to the user. If validation succeeds, a UserCreatedEvent is published to one of the three partitions of the users topic, chosen according to predetermined criteria. Because there is no ordering guarantee across partitions, all events that belong to the same user are sent to the same partition so that they can be aggregated in order. Our system is built around this principle.
Several microservices react to the events published on the topic. The user management service, the Email Notification service, and the Slack Notification service all subscribe to the UserCreatedEvent. When the event arrives, the user management service invokes its userCreate handler, which creates the user in our Postgres database. The Email Notification service and the Slack Notification service, in turn, notify the user by email and the administrators via Slack.
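One common way to keep a user’s events in a single partition is to derive the partition from a key such as the user ID. The sketch below assumes that approach and uses CRC32 as a stand-in hash; it is not the hash Kafka’s own partitioner uses, and the partition_for helper and the UserEmailChangedEvent name are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition deterministically (stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Simulated partitions: partition number -> events in arrival order.
partitions = defaultdict(list)

events = [
    ("user-1", "UserCreatedEvent"),
    ("user-2", "UserCreatedEvent"),
    ("user-1", "UserEmailChangedEvent"),  # hypothetical follow-up event
]
for user_id, name in events:
    partitions[partition_for(user_id)].append((user_id, name))

user_1_partition = partition_for("user-1")
user_1_events = [n for u, n in partitions[user_1_partition] if u == "user-1"]
```

Because the same key always maps to the same partition, the events for user-1 end up in one log in the order they were published, even though events for different users may be spread across partitions.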
In our example, we run three replicas of the user management service, each consuming from one of the three partitions. This ensures that only one replica receives a given message and that it is processed just once. If a topic such as “users” has ten partitions, the three replicas share the load, with each replica consuming from three or four partitions. Because each partition is consumed by only one replica at a time, the number of replicas that can do useful work is capped by the number of partitions in the topic. It is therefore advisable to over-partition from the start; we began with 50 partitions per topic.
Read operations, such as listing all users or fetching a single user, are served from the read store (our Postgres database).
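The arithmetic behind sharing partitions among replicas can be sketched with a simple round-robin assignment. This is a simplification written for illustration, not the code of any of Kafka’s built-in assignors, and the replica names are made up.

```python
def assign(num_partitions, replicas):
    """Distribute partition numbers over replicas in round-robin order."""
    assignment = {replica: [] for replica in replicas}
    for partition in range(num_partitions):
        owner = replicas[partition % len(replicas)]
        assignment[owner].append(partition)
    return assignment


replicas = ["replica-a", "replica-b", "replica-c"]

ten = assign(10, replicas)
# replica-a: [0, 3, 6, 9], replica-b: [1, 4, 7], replica-c: [2, 5, 8]

two = assign(2, replicas)
# replica-c receives no partition and sits idle
```

With ten partitions and three replicas, one replica takes four partitions and the other two take three each. With fewer partitions than replicas, the surplus replicas have nothing to consume, which is the reason for choosing a generous partition count up front.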
Although event sourcing has many advantages over conventional approaches, it is difficult to implement. The difficulties include:
- There is a learning curve, since it is a new and unfamiliar programming model.
- There are currently very few resources available for building event-sourced applications.
Given these difficulties, I plan to build a Twitter-like application using event sourcing and document my experience in a series of blog posts. I have not yet decided between Node.js and Golang as the programming language, so I would appreciate any feedback or insights in the comments section that could guide the decision.
Recommended Readings on the Topic
- This is the record of all events that occurred.
- Flipping the database to enable a comprehensive analysis from all perspectives
- Why is local state considered such a fundamental building block in stream processing?
- The Kafka-inspired Strategy
- Constructing a robust structure using logs
- The log serves as the lifeblood of your data pipeline.