Data Mesh and Its Distributed Data Architecture: A Comprehensive Guide

In recent years, there has been a remarkable surge in the amount of data being generated. To help conceptualise this growth, it is useful to consider the “Three V’s” of data: Volume, Variety, and Velocity. As the quantity of data increases, companies are increasingly expected to capitalise on it, extracting deeper insights and utilising it more effectively and efficiently.

For decades, organisations around the world have relied on the same methods for data analysis. Today, however, when organisations often need near-instantaneous analysis, the limitations of the traditional approach are becoming increasingly clear. Addressing them calls for a new data analysis paradigm, and this is where data mesh comes into play. Data mesh offers an alternative to centralised data lakes and warehouses that promises to be simpler, faster, and better suited to the real-time analysis needs of today’s organisations.

Data Mesh Fundamentals

In the past, organisations used records to identify opportunities for improvement and uncover patterns. In the digital age, however, the traditional approach of first building reports and then delivering them later in the day no longer suffices. Data mesh is a modern approach to analytics architecture that recognises the need for real-time insights: in today’s business environment, teams and staff across the organisation need access to up-to-date analytics to optimise performance.

In today’s data landscape, many companies store their growing volumes of data in data lakes. A data lake is a single repository where all types of data can be stored. However, this centralised architecture is not always the best solution. Data mesh, a more recent development, lets organisations divide their data into several distinct domains, each managed by its own team. This more granular approach to storage and management places each domain’s data with the people closest to it, so they can manage its quality and security directly.

There are four pillars of data mesh:

  1. Domain Ownership
  2. Data as a Product
  3. Self-Serve Data Infrastructure
  4. Federated Computational Governance

Domain Ownership

The primary objective of this principle is to ensure that data is owned by the people best placed to make the most of it. This raises the question of how to divide that ownership: who owns, and who may access, which information?

Zhamak Dehghani’s 2020 article, “Data Mesh Principles and Logical Architecture,” introduces a distinctive way of decomposing an organisation’s data. Data mesh uses the boundaries between business domains as the primary axis of division, which suits today’s businesses, most of which are already organised into specialised functions. Examples of such domains include Order, Shipping, Inventory, Billing, and Customer Service.
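To make domains as the axis of decomposition more concrete, here is a minimal Python sketch, assuming a hypothetical in-memory registry in which every dataset is owned by exactly one of the example domains. The class, method, and dataset names (`DomainRegistry`, `register`, `owner_of`, `orders_placed`, and so on) are illustrative only and are not taken from any data mesh product.

```python
from dataclasses import dataclass, field


@dataclass
class DomainRegistry:
    """Hypothetical registry mapping each dataset to exactly one owning domain."""

    owners: dict[str, str] = field(default_factory=dict)

    def register(self, domain: str, dataset: str) -> None:
        # A dataset may have only one owner; claiming it for another domain is an error.
        current = self.owners.get(dataset)
        if current is not None and current != domain:
            raise ValueError(f"{dataset!r} is already owned by {current!r}")
        self.owners[dataset] = domain

    def owner_of(self, dataset: str) -> str:
        # Consumers ask the registry which domain to go to for a dataset.
        return self.owners[dataset]

    def datasets_of(self, domain: str) -> list[str]:
        # List everything a single domain is accountable for, sorted by name.
        return sorted(d for d, owner in self.owners.items() if owner == domain)


registry = DomainRegistry()
registry.register("Order", "orders_placed")
registry.register("Order", "order_cancellations")
registry.register("Shipping", "shipments")
registry.register("Inventory", "stock_levels")
registry.register("Billing", "invoices")
registry.register("Customer Service", "support_tickets")

print(registry.owner_of("shipments"))  # Shipping
print(registry.datasets_of("Order"))   # ['order_cancellations', 'orders_placed']
```

The single-owner rule is the design choice that matters here: it keeps accountability for each dataset unambiguous, which is the point of dividing data along domain lines.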

Specialised “data teams” within each division would be responsible for ensuring that the data they generate meets the company’s standards. Each team would be made up of professionals with expertise in that domain. Because naming conventions are standardised across teams, the data teams can access high-quality, well-organised data from many different sources.

Data as a Product

The data your domain shares should be treated as a high-quality product, and the other data teams in your organisation as your internal customers.

According to an article by Zhamak Dehghani, adopting a “data as a product” mindset can help address the problems of data quality and data silos. Gartner uses the term “dark data” for information that companies acquire, process, and store in the regular course of business but generally fail to use for other purposes.

The emergence of domain data teams creates the need for a new role, the domain data product owner, who makes sure that other domain teams receive data in a form they can readily use. This person is responsible for keeping the data standardised and properly packaged, because decentralisation and the local autonomy it brings can otherwise lead to data silos and serious data quality problems. A proactive approach is therefore needed to avoid data chauvinism.
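One way to picture “data as a product” is a small descriptor that every shared dataset must carry, so that consumers in other domains know who owns it, what it contains, and what schema to expect. The sketch below assumes a hypothetical `DataProduct` descriptor and a minimal packaging check that a product owner might run before publishing; the fields and example values are illustrative, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain publishes alongside its shared data."""

    name: str
    domain: str
    owner: str  # the domain data product owner accountable for this product
    description: str
    schema: dict[str, str]  # column name -> type, e.g. {"order_id": "string"}


def packaging_problems(product: DataProduct) -> list[str]:
    """Return the reasons a product is not yet ready for other domains to consume."""
    problems = []
    if not product.owner:
        problems.append("no product owner assigned")
    if not product.description:
        problems.append("missing description for consumers")
    if not product.schema:
        problems.append("schema is not declared")
    return problems


orders = DataProduct(
    name="orders_placed",
    domain="Order",
    owner="order-team-lead",
    description="One row per order placed on the web shop.",
    schema={"order_id": "string", "placed_at": "timestamp", "total": "decimal"},
)
draft = DataProduct("shipments", "Shipping", "", "", {})

print(packaging_problems(orders))  # []
print(packaging_problems(draft))
```

Returning a list of reasons rather than a single boolean gives the owner something actionable to fix before other teams depend on the product.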

Self-Serve Data Infrastructure

Any data project depends on accurate and timely information. When starting a new project, it is advisable to load historical data into the database first and then update it continuously as new changes or events occur. This ensures the project works from the most up-to-date and relevant information available.
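The backfill-then-update pattern described above can be sketched in a few lines: load a historical snapshot once, then apply each new event in order so the state stays current. This assumes hypothetical in-memory records keyed by `order_id` and events shaped as `upsert`/`delete` dictionaries; a real platform would use a database and a message stream instead.

```python
def backfill(snapshot: list[dict]) -> dict[str, dict]:
    """Load historical rows into a state table keyed by primary key."""
    return {row["order_id"]: dict(row) for row in snapshot}


def apply_event(state: dict[str, dict], event: dict) -> None:
    """Apply one change event (hypothetical shape: op + record) to the state."""
    key = event["record"]["order_id"]
    if event["op"] == "upsert":
        # Merge new field values over whatever the snapshot or earlier events held.
        state.setdefault(key, {}).update(event["record"])
    elif event["op"] == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown op: {event['op']}")


snapshot = [
    {"order_id": "A1", "status": "placed"},
    {"order_id": "A2", "status": "placed"},
]
events = [
    {"op": "upsert", "record": {"order_id": "A1", "status": "shipped"}},
    {"op": "upsert", "record": {"order_id": "A3", "status": "placed"}},
    {"op": "delete", "record": {"order_id": "A2"}},
]

state = backfill(snapshot)
# Events must be applied in the order they occurred for the state to be correct.
for event in events:
    apply_event(state, event)

print(state)
```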

Typically, the historical data arrives as a comma-separated values (CSV) file, which brings its own complications. If the snapshot files and the new event streams use dissimilar schemas or different field names, it can be difficult to reconcile them into a single matching schema. Furthermore, this work is rarely automated.
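When the CSV snapshot and the event stream name the same fields differently, one common fix is an explicit mapping from each source’s field names onto a single target schema. The sketch below assumes hypothetical column names (`OrderID` and `Status` in the snapshot, `order_id` and `state` in the events). It uses only Python’s standard `csv` module and raises an error for unmapped fields rather than guessing.

```python
import csv
import io

# Hypothetical mappings from each source's field names to one shared target schema.
SNAPSHOT_FIELDS = {"OrderID": "order_id", "Status": "status"}
EVENT_FIELDS = {"order_id": "order_id", "state": "status"}


def normalise(record: dict, mapping: dict[str, str]) -> dict:
    """Rename fields to the target schema, refusing fields with no mapping."""
    unknown = set(record) - set(mapping)
    if unknown:
        raise KeyError(f"unmapped fields: {sorted(unknown)}")
    return {mapping[name]: value for name, value in record.items()}


snapshot_csv = "OrderID,Status\nA1,placed\nA2,placed\n"
snapshot_rows = [
    normalise(row, SNAPSHOT_FIELDS)
    for row in csv.DictReader(io.StringIO(snapshot_csv))
]
event_rows = [normalise({"order_id": "A1", "state": "shipped"}, EVENT_FIELDS)]

# Both sources now share the same field names and can be combined.
print(snapshot_rows + event_rows)
```

Failing loudly on unknown fields is deliberate: a silently dropped or misnamed column is exactly the kind of mismatch that makes this process hard to automate safely.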

Self-serve data infrastructure is designed to solve this problem by providing historical and real-time data on demand across all the databases within an organisation. Implementing it successfully requires a sophisticated, centrally provided, self-service infrastructure platform.

Federated Computational Governance

Global naming conventions are essential if the many autonomous data teams, each expert in a different domain, are to communicate and cooperate successfully. Such standards let these teams exchange information efficiently, making their combined work more productive.
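Because naming conventions are meant to be enforced automatically rather than by review alone, they are often expressed as machine-checkable rules. The sketch below assumes a hypothetical convention of `<domain>.<product>.v<major>` in lowercase snake_case (for example `billing.invoices.v2`); the pattern is illustrative, not a published standard.

```python
import re

# Hypothetical global convention: <domain>.<product>.v<major>, lowercase snake_case,
# with major versions starting at 1.
PRODUCT_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.v[1-9][0-9]*")


def is_valid_product_name(name: str) -> bool:
    # fullmatch requires the whole name to match, not just a prefix.
    return PRODUCT_NAME.fullmatch(name) is not None


for name in [
    "billing.invoices.v2",      # valid
    "Shipping.Shipments.v1",    # uppercase letters
    "inventory.stock_levels",   # missing version
    "order.orders_placed.v0",   # versions start at v1
]:
    print(name, is_valid_product_name(name))
```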

Dehghani’s work defines federated computational governance explicitly. Much of a data mesh’s value comes from combining independent data products, for example by joining them or performing graph, set, or other operations on them at scale, to produce higher-order datasets, insights, or machine intelligence. That requires a governance model that embraces decentralisation and domain self-sovereignty, achieves interoperability through global standards, supports a dynamic topology, and automates the execution of decisions by the platform. This decentralised, automated model of governance is what is called federated computational governance.

Under this model, a federation of domain data product owners and data platform product owners, each retaining autonomy and decision-making authority within their own domain, establishes a single set of global standards: rules that apply to all data products and their interfaces.
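The “computational” part of federated computational governance means the globally agreed rules are encoded so that the platform can evaluate them automatically against every data product, while each domain keeps control over everything the rules do not cover. Below is a minimal sketch of that idea, assuming hypothetical products described as plain dictionaries and two illustrative global policies (every product names an owner, and any column tagged as personal data is masked). Neither the policies nor the field names come from a specific governance tool.

```python
from typing import Callable

Product = dict
Policy = Callable[[Product], list[str]]


def requires_owner(product: Product) -> list[str]:
    return [] if product.get("owner") else ["no owner named"]


def pii_must_be_masked(product: Product) -> list[str]:
    # Hypothetical rule: every column tagged "pii" must also declare masked=True.
    return [
        f"column {col['name']!r} is PII but not masked"
        for col in product.get("columns", [])
        if "pii" in col.get("tags", []) and not col.get("masked", False)
    ]


# The shared rule set agreed by the federation; domains may add local checks.
GLOBAL_POLICIES: list[Policy] = [requires_owner, pii_must_be_masked]


def evaluate(products: list[Product], policies: list[Policy]) -> dict[str, list[str]]:
    """Run every global policy against every product; return violations per product."""
    report = {}
    for product in products:
        violations = [v for policy in policies for v in policy(product)]
        if violations:
            report[product["name"]] = violations
    return report


products = [
    {
        "name": "billing.invoices.v2",
        "owner": "billing-team",
        "columns": [{"name": "email", "tags": ["pii"], "masked": True}],
    },
    {
        "name": "customer_service.tickets.v1",
        "owner": "",
        "columns": [{"name": "phone", "tags": ["pii"]}],
    },
]

print(evaluate(products, GLOBAL_POLICIES))
```

Representing each policy as a plain function keeps the global rule set small and easy to review, while the platform, not a central team, does the work of applying it to every product.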

In summary, data mesh is an approach to data management that seeks to replace the traditional centralised strategy for processing and exchanging data between departments with a decentralised, domain-oriented one. Its aims are to reduce overhead, streamline the process, and make the transfer and exchange of data between teams more effective and efficient.

Despite its promise, implementing data mesh successfully still takes substantial effort from many people. Data mesh is still in its infancy, and it is likely to be some time before it can supplant existing data architecture frameworks.
