How do data strategies work and do companies even need them? This question is on the minds of many business leaders today. At the same time, they are under increasing pressure to remain competitive and innovative. A key factor in achieving this goal is the effective use of data: it allows companies to identify efficiency reserves in processes and to better understand customers to adapt products and services or even develop new offerings.
Once there is consensus on what you want to achieve with the data, a data strategy can be derived from it — that is, a concrete framework for action to structure measures and pursue the overarching goal, the “North Star”. A data strategy turns the many different individual decisions in different areas of the company into a coherent effort to achieve a common goal.
In the following example, we look at a company that aims to be the leading online optician: To successfully sell glasses online, the company needs to combine data from different sources, ranging from product data from lens and frame manufacturers to marketing data and medical data from customers, as well as information on warehouse and production capacity utilization.
In addition, the online optician needs to encode a significant amount of expertise from opticians into its product data. This expertise determines, for example, the optimal frame size, taking into account factors such as the customer’s vision and the geometry of the lenses. By coding this knowledge into the product data, the company can ensure that customers receive the right glasses.
With all this data in place, the various departments involved in the process — procurement, production and e-commerce — can work seamlessly together to deliver a high-quality result, optimize material usage planning and make forecasts for future developments.
No data strategy, no data value
A data strategy is not just about managing data, but about how business knowledge can best be stored in and understood through data. A data strategy is accompanied by an appropriate organizational culture that provides mechanisms for balancing the needs of different stakeholders, including promoting collaboration and knowledge sharing.
Without a data strategy to structure various efforts, the value added from data in any organization of a certain size or complexity falls far short of the possibilities. In such cases, data is only used locally or aggregated along relatively rigid paths. The result? The company remains slow to respond to necessary changes. Without such a strategy, technical concepts and architectures can hardly increase this value either.
A well-thought-out data strategy can be formulated in various ways. It encompasses several different facets, such as availability, searchability, security, protection of personal data, cost control, etc. However, four key aspects that form the basis for a data strategy can be identified from a variety of data-related projects: identity, bitemporality, networking and federalism.
The four key aspects of a data strategy
1. Identity
The first central element of a data strategy is identity: How are the entities, the attributes and their values identified, that is, how is it possible to decide unambiguously which physical or virtual artifact a data record refers to and/or whether several data records refer to the same thing? Who is responsible for deciding whether two entities are identical? What meaning is associated with identity?
In the example of our online optician, we must ask ourselves whether a type of frame or a frame in a certain size, color or material, or even a specific individual frame, has to be identified.
In the simplest cases, identity is determined by a single source of truth (SSOT), a central reference point that identifies entities. All downstream systems can then use the identity from the SSOT. However, this simple model cannot be applied to much of a company’s data. Product data is a typical example: for historical reasons, the data of products in different categories is identified in different systems and under the responsibility of different departments. Sometimes, differently composed keys are used for identification if, for example, product variants use the same code but still have to be specified by color or size attributes. The data strategy must therefore answer how entities, attributes and attribute values are identified.
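A composed key and a cross-system identity mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ProductKey`, `LOCAL_TO_SHARED`, `resolve` and `same_entity`, as well as the system names and codes, are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductKey:
    """Composed key: the code alone is not unique across variants,
    so color and size are part of the identity (hypothetical model)."""
    code: str
    color: Optional[str] = None
    size: Optional[str] = None


# Hypothetical mapping from (system, local identifier) to a shared key.
# Who is allowed to maintain this mapping is the question of responsibility.
LOCAL_TO_SHARED = {
    ("erp", "F-100/TORT/M"): ProductKey("F-100", "tortoise", "M"),
    ("shop", "sku-48213"): ProductKey("F-100", "tortoise", "M"),
    ("shop", "sku-48214"): ProductKey("F-100", "havana", "M"),
}


def resolve(system: str, local_id: str) -> Optional[ProductKey]:
    """Return the shared identity for a local identifier, or None if unknown."""
    return LOCAL_TO_SHARED.get((system, local_id))


def same_entity(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """Two records refer to the same thing only if both resolve to the same key."""
    key_a, key_b = resolve(*a), resolve(*b)
    return key_a is not None and key_a == key_b
```

In this sketch, unknown identifiers never count as identical to anything; whether that is the right default is itself a decision the data strategy has to make.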
2. Bitemporality
The second central element of a data strategy is bitemporality, which loosely translates as “two-temporality”. It separates the time of the data query from the time to which the query refers: Is a product currently (i.e. right now, at the time of the query) available? Will it (probably) be available in 14 days? Was the product available on March 3, 2024, at 10:17 CET and if so, which precursor products went into the product sold on March 3, 2024? What did the supply chain for it look like? The data strategy must therefore clarify how information about the current status of an entity is distinguished from information about an earlier or expected future status.
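One common way to model this is to store two timestamps with every fact: the time the statement refers to and the time it was recorded. The following Python sketch assumes exactly that; `AvailabilityFact`, `availability`, the product code and the sample facts are hypothetical, and the timestamps are naive datetimes without time zone for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AvailabilityFact:
    product: str
    available: bool
    valid_from: datetime   # time the statement refers to
    recorded_at: datetime  # time the statement entered the system


def availability(facts: list[AvailabilityFact], product: str,
                 valid_at: datetime, known_at: datetime) -> Optional[bool]:
    """Answer 'was/is/will the product be available at valid_at,
    according to what was known at known_at?'"""
    candidates = [
        f for f in facts
        if f.product == product
        and f.recorded_at <= known_at   # only facts already known
        and f.valid_from <= valid_at    # only facts already in effect
    ]
    if not candidates:
        return None  # nothing known about that point in time
    # Latest effective fact wins; later recordings correct earlier ones.
    latest = max(candidates, key=lambda f: (f.valid_from, f.recorded_at))
    return latest.available


# Hypothetical sample: available from March 1, discontinuation from
# March 20 announced (recorded) on March 5.
FACTS = [
    AvailabilityFact("F-100", True, datetime(2024, 3, 1), datetime(2024, 2, 20)),
    AvailabilityFact("F-100", False, datetime(2024, 3, 20), datetime(2024, 3, 5)),
]
```

With these sample facts, a query about March 3, 2024, 10:17, asked with the knowledge of that same moment, reports the product as available. A query about March 25 asked on March 10 reports it as unavailable, because the discontinuation recorded on March 5 is known by then.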
3. Networking
The third aspect of a data strategy clarifies which information can be linked, i.e. meaningfully related to each other. For example, similar or matching products can be linked in e-commerce in order to recommend them.
4. Federalism
Federalism is, by definition, an organizational principle based on the extensive independence of individual units which together nevertheless form a whole. This fourth aspect of a data strategy thus concerns the organization of responsibility for the data under consideration. Federalism means that, although responsibility for the data lies at the local level, it is also clear beyond this level how the data is handled. Federal structures define the extent of responsibility at each level. This means that it is clear who is allowed to do what with data and how it must be stored and made available so that other levels can also access it.
A very specific example, namely brand logos, illustrates these four elements of a data strategy:
The brand is identified first, and only then the logo in a specific form (e.g. file format or resolution). The data for the brand and logo are obviously linked, as the identification already makes clear. If the logo changes, for example in the case of rebranding, this link keeps the reference consistent.
Taking into account bitemporality ensures that the logo change is implemented at a defined point in time in all systems involved: Bitemporality takes into account notifications of expected changes and allows a query for future content to cache the new logos in preparation. Conversely, a query directed into the past can also be useful, but typically more so for ingredients, prices, delivery conditions, etc. than for a brand logo.
Which brands exist, how the formats are identified, etc. cannot be decided by purchasing or marketing alone. This requires the necessary networking.
These decisions must be made by a common authority, an institution in the federal system of the data strategy.
The importance of a data strategy for companies is also demonstrated by product data. It has many sources, for example, the company’s own product development, but also external manufacturers and intermediaries. It is an essential part of important business processes, is needed in a wide variety of departments and is used in a variety of systems and contexts:
- In catalogs, together with price information, availability, images
- In inventory management
- In returns management together with warehouse and logistics information
- For planning and reporting
  - At the item or category level
  - For aggregating sales and revenue figures together with temporal or spatial criteria
- In master data
- Order processing
- E-commerce
- Controlling systems
- In many other contexts
If it is not clear what product data identifies, how it is related to other data, and which rules apply to its cross-domain provision, an impenetrable data swamp will result, which cannot be used productively or analytically. This applies all the more the larger and more differentiated a company is. Many large companies operate numerous e-commerce systems, a number of production lines in different countries, and manage different brands and product categories.
Data is encoded employee knowledge
A data strategy also determines how companies encode the knowledge about their products, services, processes and business models. This makes solutions possible that also allow for automated decision support. To sell glasses online, a lot of specialized optician knowledge must be encoded so that the customer does not make serious mistakes when configuring their glasses. The optimal size of the progressive lenses depends, among other things, on the visual acuity and the lens geometry. To successfully sell glasses online, this experiential knowledge of opticians must be encoded in the product data, and the various departments (procurement, production, e-commerce) must maintain, connect and use this data.
A knowledge graph captures the meaning of the data and plays a special role in identifying and linking the data: Dave McComb’s three-layer knowledge graph model expands a typical two-layer view of schemas or classes on the one hand and data or instances on the other. McComb introduces a middle layer that takes on a hybrid role and refers to these three layers as concepts, categories and data.
Katariina Kari, lead ontologist at Inter Ikea Systems, and her team have introduced a knowledge graph of this kind in a very practical way. We will use this example as a guide but apply it to the online optician example.
- The top layer contains the central concepts, for example, “frame” with “properties.” The number of concepts is in the hundreds. They are closely coordinated and subject to rigorous central governance.
- At the middle level, the category “color” is defined as a property with the characteristics “Tortoise” or “Havana.” The number of categories typically runs into the thousands, but the categories can be subdivided thematically, and corresponding subject matter experts define the individual thematic areas.
- McComb describes the lowest layer as data, and this layer includes everything that is colored, such as the bridge of a pair of glasses. The number of entities at the data level potentially runs into the millions. The data layer breaks down into areas, each of which is subject to the control of the domains. The principle of federalism is particularly evident here.
The integration of categories and, in particular, data into the entire landscape is done via references to the higher levels, so that networking is possible across them. For example, all frames whose bridge is colored tortoise can be linked. The e-commerce system can then suggest similar products based on such shared properties.
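The three layers and the linking via shared categories can be sketched with plain Python data structures. This is a deliberately simplified illustration, not McComb’s or Ikea’s actual model: the names `CONCEPTS`, `CATEGORIES`, `FRAMES`, `validate` and `similar_frames`, the `bridge_color` attribute and the sample entities are all assumptions.

```python
# Top layer: concepts, few in number and centrally governed.
CONCEPTS = {"Frame", "Color"}

# Middle layer: categories, each assigned to a concept,
# maintained by subject matter experts.
CATEGORIES = {"Tortoise": "Color", "Havana": "Color"}

# Data layer: entities owned by the domains; they reference the layers above.
FRAMES = [
    {"id": "frame-1", "concept": "Frame", "bridge_color": "Tortoise"},
    {"id": "frame-2", "concept": "Frame", "bridge_color": "Havana"},
    {"id": "frame-3", "concept": "Frame", "bridge_color": "Tortoise"},
]


def validate(entities: list[dict]) -> None:
    """Reject entities that reference unknown concepts or categories."""
    for entity in entities:
        if entity["concept"] not in CONCEPTS:
            raise ValueError(f"unknown concept: {entity['concept']}")
        if CATEGORIES.get(entity["bridge_color"]) != "Color":
            raise ValueError(f"unknown color: {entity['bridge_color']}")


def similar_frames(frame_id: str, entities: list[dict]) -> list[str]:
    """Return the ids of other frames that share the same bridge color."""
    by_id = {entity["id"]: entity for entity in entities}
    color = by_id[frame_id]["bridge_color"]
    return [
        entity["id"] for entity in entities
        if entity["bridge_color"] == color and entity["id"] != frame_id
    ]
```

Because every entity refers to a category defined one layer up, two domains that never coordinated directly can still be linked, as long as both use the shared category “Tortoise”.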
Elements of the data strategy correspond with data mesh principles
The concept of data mesh, which is currently the subject of much discussion and was developed by Zhamak Dehghani, the technology director of the IT consultancy ThoughtWorks, is nothing more than a specific manifestation of a data strategy. This socio-technical concept is based on four principles: domain ownership, data as a product, self-service data platforms and federated governance. We will discuss this concept in relation to the four key aspects of identity, bitemporality, networking and federalism.
Domain ownership
This principle states that responsibility for data should not be borne by a central data team, but rather in the domains in which it is created. In concrete terms, this means that the team that is responsible for an end-to-end business function is also responsible for the data that is created in connection with that business function.
Data as a product
Collecting, processing and providing data is not an end in itself, but must — like any product for its user — create value. However, this also requires strategic planning, a suitable product-market fit and the marketing of the respective data product: data products focus on the data consumer and their needs but also balance the different wishes of different consumers. The form of a data product, for example as an API, as database access, or as a visualization, depends on the needs of the consumers, and different data products can certainly be generated from the same data for different needs.
Self-service data platform
To enable product teams to provide their data products quickly and efficiently, they need the right tools, and a kind of production and distribution line for data products. Ideally, these tools should be interlinked in such a way that consumers can easily link different data products. “Self-service” — or perhaps it would be better to say “in line with the principle of subsidiarity” — means that data owners can offer data products independently. Contrary to what the name “data platform” suggests, it is therefore equally a question of the available infrastructure and the organizational structure to set up teams in such a way that this independence can be realized.
This principle represents the greatest hurdle for the realization of a data mesh approach in terms of complexity, not because suitable data platforms are unavailable, but because the distribution of competencies within the organization must be rebalanced accordingly.
Federated governance
To generate added value, the data mesh approach emphasizes data products under local responsibility. In line with our points above, the added value arises precisely in the networking of different domains, in the relationship between data producers and consumers. There are areas, dictated if by nothing else then by external regulation on security, data protection, etc., that cannot be regulated locally by the data owners. There must be overarching structures and guidelines that determine how data is organized and used in larger contexts. The federal principle of subsidiarity applies here: as in the interaction between municipalities, states and the federal government, decisions are made at the lowest institutional level whose competence is sufficient for them. If the individual or the smallest group at the lowest institutional level lacks the competence, a higher authority takes action.
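The subsidiarity rule can be illustrated as a simple lookup from the most local level upwards. The level names and topics in this Python sketch are purely hypothetical examples; a real organization would define its own.

```python
from typing import Optional

# Hypothetical governance levels, ordered from most local to most central,
# each with the topics it is competent to decide.
LEVELS = [
    ("domain team", {"data product schema", "refresh schedule"}),
    ("category stewards", {"color categories", "product taxonomy"}),
    ("central governance", {"core concepts", "access policy",
                            "personal data protection"}),
]


def deciding_level(topic: str) -> Optional[str]:
    """Return the lowest level competent for the topic, or None if no
    level claims it (a gap the governance design would have to close)."""
    for name, competences in LEVELS:
        if topic in competences:
            return name
    return None
```

The ordering of `LEVELS` carries the principle: a topic only reaches a higher level if no lower level is competent for it.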
Identity, bitemporality, networking and federalism in a data mesh
Depending on the business requirements and the complexity of the data streams in a company, a data mesh can be the most sensible way to implement a data strategy. All too often, the technical side is emphasized more than the sociological side. However, we also see that the four principles of domain ownership, data as a product, self-service data platform and federated governance provide little concrete orientation: What does a data product contain? How is it related to other data products? What should a self-service data platform enable?
This brings us back to the four key aspects of a data strategy: identity, bitemporality, networking and federalism. These key aspects focus the data strategy on specific points and can thus, for example, provide structure for the realization of a data mesh:
Which identities are exposed in the data products? Which data products need to reference common identities to enable networking? Do data products only have to be realized “for the moment”, or also for a look forward or back (keyword: bitemporality)?
And above all this, there is the question: Who has the authority to identify entities? In this context, authority means the business, technical and design knowledge as well as the generally recognized mandate to design the corresponding information spaces.
The data mesh approach explicitly applies the federal principle to governance, i.e. to administration, including the design of administration. With our understanding of federalism, we go further and explicitly include the design of the data spaces: the creation and maintenance of concepts, categories and data in a knowledge graph is also organized as a federal structure. The category level can be broken up and implemented locally. In particular, different sub-areas of the second level can be managed by different teams. The data level is then created locally in the domains and is subject to the respective owner of a data product.
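The questions about exposed identities, temporal scope and ownership could be made explicit in a small descriptor per data product. The following Python sketch is one possible shape, not part of the data mesh concept itself; `DataProduct`, `TemporalScope`, `shared_identities` and the example products are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TemporalScope(Enum):
    CURRENT_ONLY = "current"     # answers only "for the moment"
    BITEMPORAL = "bitemporal"    # supports looking forward and back


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner_domain: str                   # who is responsible locally
    exposed_identities: frozenset[str]  # identities consumers can join on
    temporal_scope: TemporalScope


def shared_identities(a: DataProduct, b: DataProduct) -> frozenset[str]:
    """Identities both products expose; an empty result means
    the two products cannot be linked directly."""
    return a.exposed_identities & b.exposed_identities


# Hypothetical examples from the online optician.
catalog = DataProduct("catalog", "e-commerce",
                      frozenset({"product_key", "brand_id"}),
                      TemporalScope.BITEMPORAL)
stock = DataProduct("stock", "production",
                    frozenset({"product_key", "warehouse_id"}),
                    TemporalScope.CURRENT_ONLY)
```

Here `catalog` and `stock` can be linked through the common `product_key`, which is only useful if a recognized authority has defined what that key identifies.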
A data strategy requires a culture
In recognition of Peter Drucker’s adage “culture eats strategy for breakfast,” a corresponding culture is also an essential prerequisite for a successful data strategy. (Corporate) culture encompasses the intangible foundations of an organization’s creative achievements.
Regarding data culture, for example, the question arises of how federal structures are designed: Does an organization tend to emphasize central responsibility or local responsibility? Do federal levels also correspond to hierarchical levels, i.e. are decisions escalated through management, or are competent committees (with decision-making authority) put together differently? How is the decentralized competence of the domains balanced against centrally provided platforms, which should be usable with the shortest possible learning curve for users from the domains but have to be operated at considerable expense?
Moving step by step to the ‘North Star’
Companies that are rethinking their data strategy should develop a North Star but then proceed in a very pragmatic way. The North Star represents the desired end state: Do you want to increase efficiency, improve products or services based on insights from existing data or open up new business areas? If the goal of a data strategy and corresponding initiatives is not clear, then the realization is doomed to failure. Only when the direction is clear can practically realizable steps lead to success.
The organization can be carefully modified, for example, to establish federal governance structures, implement central control of the top ontology layer, and adapt and improve it in interaction with the domains. The domains must be empowered to independently implement data products, with a central definition of the policies that must apply to all, for example with regard to identity and access management. And here, in the creation of a platform (whether planned or emerging from only loosely coordinated initiatives to reduce communication overhead), the data strategy approaches the classic IT strategy, particularly concerning cloud architectures.
Conclusion: A data strategy for informed decisions
Competitiveness through innovation requires a well-thought-out data strategy. By focusing on the key aspects of identity, bitemporality, networking and federalism, companies can unlock the potential of their data and make informed decisions.
This is not just about collecting and analyzing data, but about creating a culture of data-driven decision-making. It requires the ability to strike a balance between centralization and decentralization. In this context, federalism, a core element of our society, becomes the structuring element.
Dr. Christian Betz designs solutions to create knowledge and value from data. Efficient data management, artificial intelligence and interactive data visualizations are his tools of choice. He develops digital strategies and architectures for new solutions at Randstad Digital Germany AG.