Strategies for managing data in microservices

In this post, we’ll look at some common patterns for managing data in a distributed microservice architecture. Managing data in a monolithic application is fairly easy and well understood, but in a microservice architecture it can be a lot more challenging and different patterns are needed. By just re-using patterns from the monolithic world we often end up with poor results and this anti-pattern is often known as the “distributed monolith”.

Private data-stores and synchronous calls between service interfaces

Each microservice manages their own data in private data-stores. Every piece of data is owned by a single service. When a service needs to exchange data with other services it calls their public api by using REST, gRPC, etc. Designing stable, reusable and encapsulated API’s are crucial.

Synchronous communication

Pros

Easy to implement
Very well understood pattern and programming paradigm
Fairly easy to follow the program flow between the services assuming good logging and tracing facilities are in place
Easy to reason about eventual consistency in a synchronous architecture

Cons

Availability can be significantly decreased if one or more services stops responding. This can have a ripple-effect throughout the architecture if there are many runtime-dependencies. This can be mitigated to some extent by using circuit breaker and similar patterns.
Transactions spanning across multiple services are hard to implement. Distributed transactions should be avoided because of the CAP theorem. Can be mitigated by using the saga pattern, workflows or compensating transactions for example.
Implementing queries that “joins” across multiple services are challenging. Can result in poor performance as you typically need to ask one service before you can call another.
Poor scalability and performance because of synchronous calls. Can to some extent be mitigated by using caching.

Private data-stores and asynchronous events

Each microservice manages their own data in private data-stores. Every piece of data is owned by a single service, but data can be replicated across services as reference-data. Each microservice stores the necessary reference-data it needs in an optimized data structure. Services can publish business events when state changes by the use of message-oriented middleware (ActiveMQ, RabbitMQ, Kafka, etc) so that the other services can update their reference-data accordingly in an asynchronously manner. The events will typically include an event-type plus data describing what has happened. This pattern is often referred to as event-driven architecture.

Eventdriven architecture

Pros

Great scalability and performance
Availability - each microservice operates on its own data and are not runtime-dependent on others.
“Quality of service” guarantees are provided by message-oriented middleware (guaranteed delivery, FIFO, etc)
Easy to add new microservices that can start subscribing to events

Cons

Can be more complex to implement
Asynchronous communication can be more challenging than synchronous communication
Need for a middleware component for messaging
Harder to reason about eventual consistency as updates happens asynchronously
Single point of failure in the message-oriented middleware

Event-sourcing and Streaming

This pattern is somewhat similar to “private data-store, async events”, but it reduces the need for storing data locally in the services. Services publishes business events to an event-store. Services can subscribe to, and process, events or they can do queries directly towards the event store. This pattern is often used in conjunction with CQRS. There are some very interesting technologies emerging in this field, like KSQL which is a query language for Kafka.

Event-sourcing

Pros

Highly scalable and performant
Event-store provides strong audit capabilities
Potential for high recoverability as events can be re-processed
Easy to add new microservices that can read from and publish to the event-store
Flexible pattern that can spur new ideas, creativity and opportunities.

Cons

Arguably somewhat immature technologies, but gaining a lot of traction
Can be more complex to implement

Shared metadata-libraries

This is an old but useful pattern, even in a distributed microservice world. Read-only and immutable metadata is stored in libraries and used by the different microservices. Examples: Countries, states, colours, etc.

Metadata libraries

Pros

Very well-understood and easy to implement
Productivity gain
Shared model across services

Cons

Change in data often requires each services to be re-built.

Closing words…

As always there’s no silver bullet when it comes to managing data in microservices and software architecture is an evolutionary process as well. It’s important to choose the right tools for the job and different use-cases requires different patterns. Some data needs some degree of synchronous communication and some data fits more closely with asynchronous communication. It’s highly likely that you’ll find data in both categories, and data that’s somewhere in between, in every software system. Each of the patterns can easily be used in conjunction with each other.

One thing that’s certain is that you have to have a bigger toolbox in the microservice-world and things like events and asynchronous communication should definitely be first-class citizens in that toolbox.

This post was inspired by a great talk by Randy Shoup at QCon New York 2017: Managing data in microservices

Back to posts

Nothin but code 👨‍💻

Strategies for managing data in microservices

Private data-stores and synchronous calls between service interfaces

Private data-stores and asynchronous events

Event-sourcing and Streaming

Shared metadata-libraries

Closing words…