Designing for Resilience in a Workplace Microservice Architecture

I joined Works on May 3, 2022. Before that, I worked as a technical trainer, developing the skills of junior Ruby/Rails engineers. Today I lead a team of five experienced programmers tasked with migrating all Works systems to the resilient Antifragile Microservice Pattern.

This post outlines some of the design choices we made while implementing the Antifragile Microservice Architecture.

Real-Time Interprocess Communication (IPC)

With many microservices in play, it is crucial that they exchange data promptly. To keep response times low for users regardless of how many services a given request touches, we needed to minimize network and serialization overhead. These are the IPC alternatives we considered:

  • REST (JSON or XML) – Cross-Platform
  • SOAP (XML) – Cross-Platform
  • Thrift (Binary) – Cross-Platform (originated at Facebook)
  • Java RMI (Binary) – JVM only
  • Avro (Binary) – Cross-Platform
  • Protocol Buffers (Binary) – Cross-Platform (originated at Google)

Although Representational State Transfer (REST) is an excellent way for applications to talk to each other, it is less suitable for interprocess communication, where speed is crucial: a JavaScript Object Notation (JSON) payload is significantly larger than its binary equivalent, and serializing and deserializing it is expensive. Extensible Markup Language (XML) is even less efficient in this regard, which ruled out the Simple Object Access Protocol (SOAP). Java Remote Method Invocation (RMI) uses a binary payload and would have been a viable option, but it only works on the Java Virtual Machine (JVM).
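The payload-size argument can be seen with a toy example: encoding the same record as JSON text versus a fixed binary layout. Here Python's standard-library `struct` stands in for a real binary protocol such as Protocol Buffers, and the record fields are hypothetical:

```python
import json
import struct

# A hypothetical user record exchanged between services.
record = {"id": 123456, "age": 30, "score": 4.5}

# Text encoding: field names and punctuation travel with every message.
json_payload = json.dumps(record).encode("utf-8")

# Binary encoding: the schema lives outside the payload, so only the
# raw values are sent (one 8-byte int, one 2-byte int, one 8-byte float).
binary_payload = struct.pack("<qhd", record["id"], record["age"], record["score"])

print(len(json_payload), len(binary_payload))  # the binary form is much smaller
```

The gap widens as messages grow, and the binary form also skips the cost of parsing text back into typed values.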

After careful consideration of the three cross-platform Remote Procedure Call (RPC) serialization formats for distributed systems (Thrift, Avro, and Protocol Buffers), we chose Protocol Buffers for its superior Interface Definition Language (IDL) and its friendlier approach to schema evolution. All three produce concise binary payloads that are far cheaper to convert into objects than JSON, so the IDL and schema evolution were the deciding factors.
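To make the IDL and schema-evolution points concrete, here is a hypothetical Protocol Buffers definition (all names are illustrative, not our actual schema). Each field carries a stable tag number; new fields get fresh tags that old readers simply ignore, and removed fields have their tags reserved so they are never reused:

```protobuf
// user.proto — a hypothetical schema; all names are illustrative.
syntax = "proto3";

package works.users;

message User {
  int64 id = 1;
  string name = 2;
  // Schema evolution: this field was added later under a fresh tag,
  // so older readers ignore it and services can upgrade independently.
  string email = 3;
  // Tags and names of deleted fields are reserved, never reused.
  reserved 4;
  reserved "legacy_flag";
}
```

The same IDL later doubles as the service contract once gRPC enters the picture, which is part of why the two fit together so well.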

Tools for Microservices Development

Choosing a microservice framework from the array of options available can be daunting, and many frameworks are tied to a single platform: the Netflix OSS toolkit requires the JVM, Seneca relies on Node.js, and Go kit is written in Go. After careful evaluation, we identified gRPC as the best fit, since it supports all the languages we use, including Go, Node.js, Python, Ruby, and Java.

Here at Works, we firmly believe gRPC is the right framework for building microservices. Its performance, open-source design, mobile support, and HTTP/2 transport make it an excellent choice, and its use of Protocol Buffers as the message format dovetails with the IPC decision above. To understand why we endorse gRPC, we encourage you to read about its underlying principles and motivation.

Event-Driven Data Management

In our pursuit of an antifragile microservice architecture, we concluded that each service must own its database rather than share a common one. That decision raises two main challenges:

  • How to carry out business transactions that must remain consistent across multiple services.
  • How to implement queries that join data owned by different services.

After weighing these challenges, we decided to move to an event-driven design (choreography). In this model, whenever a microservice makes a significant change to one of its business entities, it publishes an event announcing the change. Other microservices subscribe to those events; a subscriber may update its own entities in response, which can in turn cause further events to be published. The following article proved extremely informative for understanding event-driven architecture.

Event-Based Data Management for Microservices
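The choreography described above can be sketched in a few lines. This is a minimal in-process stand-in for a message broker; the topic names, services, and event shapes are hypothetical:

```python
from collections import defaultdict

# Topic name -> list of handler callables; a toy stand-in for a broker.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:
        handler(event)

audit_log = []

# Billing service: reacts to order events and publishes its own event.
def on_order_created(event):
    publish("invoice.created", {"order_id": event["order_id"], "amount": event["amount"]})

# Audit service: records every invoice event it sees.
def on_invoice_created(event):
    audit_log.append(event)

subscribe("order.created", on_order_created)
subscribe("invoice.created", on_invoice_created)

# The order service announces a change to one of its business entities...
publish("order.created", {"order_id": 42, "amount": 99.0})
# ...and the chain of subscriptions does the rest, with no central coordinator.
```

Notice there is no orchestrator: each service only knows which events it emits and which it consumes, which is exactly what makes choreography loosely coupled.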

Having opted for an event-driven design, we needed a message broker that lets services both publish and subscribe to events. The final candidates are outlined below.

NATS is a robust and dependable messaging platform. Written in Go, it delivers exceptional speed and performance, and my experience with it has been very positive. Nonetheless, I believe some questionable design decisions were made during its development, including:

  • At-most-once delivery only (fire and forget)
  • No persistence

The fire-and-forget approach and the absence of a persistence layer appear to be deliberate simplifications of message delivery, but they mean NATS cannot guarantee that a message is ever delivered.

RabbitMQ has long been a dependable, flexible, and straightforward messaging platform. Although it is slower than NATS, Kafka, and Redis, it remains a fitting choice for most messaging needs. However, it does not support event sourcing, so we opted against it.

As a result, Kafka was the most suitable option for our requirements due to the following:

  • Kafka’s message delivery is very fast (second only to NATS).
  • Kafka persists messages in an append-only log.
  • Kafka naturally supports event sourcing, and we can replay events if required.

Given that event sourcing played a crucial role in our choice of Kafka, I wish to elaborate further on the topic and our future plans for its implementation.
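In essence, event sourcing means the append-only log is the source of truth, and current state is just a fold over that log. A minimal sketch, with a hypothetical account balance as the state and hypothetical event names:

```python
# The log is the source of truth: an append-only sequence of events.
log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def replay(events):
    """Rebuild current state by folding over the event log from the start."""
    balance = 0
    for event in events:
        if event["type"] == "deposited":
            balance += event["amount"]
        elif event["type"] == "withdrawn":
            balance -= event["amount"]
    return balance

balance = replay(log)  # current state derived entirely from the log
```

Because Kafka retains the log, a new consumer (or one whose local state was lost) can replay from the first offset and arrive at exactly the same state, which is the property we were after.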

Testing Approach

With numerous microservices in a system, testing can be daunting. To ensure that modifying a microservice does not break any dependent microservices or the API gateway, we follow a three-stage testing procedure that helps us keep the system dependable and running smoothly.

  • Each component of a microservice undergoes extensive unit testing to verify that it functions correctly in isolation; this underpins the proper operation of the microservice as a whole.
  • We run end-to-end tests across all microservices, including the API gateway, with a comprehensive test suite that validates correct behavior; the suite is described in our blog post on deciding between microservices and monolithic architectures.
  • Acceptance tests live in a separate repository and are written in a user-story style; we use the godog Go package to define and run these scenarios.
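To illustrate the first stage, here is a unit test in the style we aim for: one component, tested in isolation, with no network or database. The function under test is hypothetical (our actual services are not written in Python; this is just a language-neutral sketch):

```python
# Stage 1: a unit test exercises one component of one microservice in
# isolation. apply_discount is a hypothetical pricing component.
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(200.0, 10) == 180.0   # normal case
    assert apply_discount(99.99, 0) == 99.99    # boundary: no discount

test_apply_discount()
```

Stage two would call the running service through the API gateway, and stage three phrases the same behavior as a user story, which in our case is driven by godog.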

At present, we are exploring patterns that can help us build outstanding test suites; this will likely be an ongoing journey.

Continuous Integration/Continuous Deployment (CI/CD) Pipeline

At Works, CircleCI automatically builds and tests every microservice after each commit; building a service and its tests in the CircleCI environment requires connections to a local PostgreSQL instance and an external Kafka instance. Our Continuous Integration/Continuous Delivery (CI/CD) pipeline itself runs on ConcourseCI and is triggered by every push to the development branch.

When a change lands on the development branch, ConcourseCI pulls it and immediately runs the tests. On success, a new Docker image is built, tagged with a semantic version, and pushed to Google Container Registry. The service is then deployed to a staging environment alongside the other microservices, where ConcourseCI checks out the acceptance-test repository and runs the acceptance tests after each successful deployment. Once those pass, a final Docker image is built and pushed to the Google Container Registry; the release to production is triggered manually. At present, our pipeline follows this process.
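The semantic-version tagging step can be sketched as a small helper. The registry path and service name below are hypothetical, and real pipelines typically derive the bump kind from commit messages or tags:

```python
def bump(version, part):
    """Return the next semantic version; part is 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

# After a successful test run, the pipeline produces a tag like:
tag = f"gcr.io/works-staging/orders:{bump('1.4.2', 'patch')}"
```

Tagging every image with an immutable version (rather than `latest`) is what makes the manual production release safe: staging and production deploy the exact same artifact.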

Deployment Methods

In monolithic application deployment, it is a common procedure to run several identical instances of the application. This typically entails deploying N real or virtual servers, with M duplicates of the program installed on each one.

A microservices application, by contrast, can comprise dozens or even hundreds of services, built with different languages and frameworks, each running independently with its own infrastructure, resources, scaling, and monitoring requirements.

Outlined below are a few potential microservice deployment patterns.

  • Multiple Service Instances per Host
  • Service Instance per Host (or Virtual Machine)
  • Service Instance per Container
  • Serverless Deployment

After thorough deliberation, we concluded that the Service Instance per Container pattern, with Docker as the container engine, was the most appropriate solution for our requirements. The recurring factors that influenced this decision were:

  • We can write each service in whatever language we want (unlike serverless deployment).
  • Multiple microservices can share a single host (unlike the Service Instance per Host/VM pattern).
  • Each microservice runs in isolation, with its own CPU and memory allocation (unlike the Multiple Service Instances per Host pattern).

Having selected the Service Instance per Container pattern, we then needed a container orchestration platform to manage a variety of tasks, including:

  • Scaling each microservice horizontally to serve more users
  • Streamlining deployments and updates
  • Service discovery and load balancing of requests
  • Self-healing, for example restarting failed containers

After further research, we identified two suitable options, Kubernetes and Marathon, both open source and free to use. Since Google offers a hosted Kubernetes solution in the form of Google Container Engine, we concluded it was the best choice and elected to use it.
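The orchestration duties listed above map naturally onto Kubernetes primitives. As a rough sketch (the image, names, ports, and numbers are all hypothetical), a single Deployment manifest covers horizontal scaling, per-service resource isolation, and automatic restarts:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 3                 # horizontal scaling: one line to serve more users
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders           # a Service selects this label for discovery/load balancing
    spec:
      containers:
        - name: orders
          image: gcr.io/works-prod/orders:1.4.3   # hypothetical image
          resources:
            limits:
              cpu: "500m"     # each microservice gets its own CPU/memory budget
              memory: 256Mi
          livenessProbe:      # failed containers are restarted automatically
            httpGet:
              path: /healthz
              port: 8080
```

A separate Service object (not shown) would handle discovery and distribute requests across the three replicas.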


The transition of Works systems to an Antifragile Microservice Architecture has been a thrilling venture for the core team. Since embarking on this project, we have acquired invaluable knowledge and enhanced our capabilities considerably. To ensure that we are making the most informed and advantageous decisions, we consistently evaluate our strategies and outcomes. Though we have encountered challenges along the way, I have found the journey to be extremely fulfilling.

Each week I will publish a new blog post to delve deeper into the topics above, along with an examination of technologies I did not have a chance to touch on in this post, such as circuit breakers, monitoring, and metrics.

Join the Top 1% of Remote Developers and Designers

Works connects the top 1% of remote developers and designers with the leading brands and startups around the world. We focus on sophisticated, challenging tier-one projects which require highly skilled talent and problem solvers.