date 2020-02-01

Recently we have seen many “x-Ops” management practices appear on the scene, all derivatives of DevOps, which aim to coordinate the output of development and operations teams into a smooth, consistent and fast flow of software releases. Another emerging practice, DataOps, strives for an equally smooth, consistent and fast flow of data through companies. Like many things nowadays, DataOps is being driven by the large internet companies, which process petabytes and exabytes of information on a daily basis.

Photo: Joe McKendrick

Such an unimpeded data flow is increasingly vital for companies that want to become more data-driven and want to scale artificial intelligence and machine learning so that these technologies can have a strategic impact.

Awareness of DataOps is high. A recent survey of 300 companies by 451 Research shows that 72 percent are actively working with DataOps and the remaining 28 percent plan to do so in the coming year. A majority, 86 percent, are increasing their spending on DataOps projects over the next 12 months. Most of this spending goes to analytics, self-service data access, data virtualization, and data preparation efforts.

In the report, 451 Research analyst Matt Aslett defines DataOps as “The alignment of people, processes, and technology to enable a more flexible and automated approach to data management.”

The catch, as Andy Palmer and a team of co-authors write in their latest report, Getting DataOps Right, published by O’Reilly, is that “most companies are not prepared, often because of behavioral norms such as territorial data hoarding, and because they lag behind in their technical capabilities, often being stuck with cumbersome extract, transform and load (ETL) and master data management (MDM) systems.” In most companies, data is locked away, disconnected and usually inaccessible. There is also an abundance of completely undiscovered data that decision makers are not even aware of.

Here are some of Palmer’s recommendations for building and shaping a well-functioning DataOps ecosystem:

Keep it open: The DataOps ecosystem must resemble DevOps ecosystems, in which many best-of-breed free and open source tools and proprietary tools are expected to work together via APIs. This also includes carefully evaluating and selecting from the range of tools that the major internet companies have developed.

Automate it all: Collecting, ingesting, organizing, storing and surfacing huge amounts of data at a near-real-time pace has become almost impossible for humans to manage. Let the machines do it, Palmer insists. Areas that are ripe for automation include “operations, repeatability, automated testing, and data release.” Look at the ways in which DevOps facilitates the automation of the software build, test and release process, he notes.
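To make “automated testing and data release” concrete, here is a minimal sketch of a release gate. It borrows the DevOps idea of running automated tests before a release, but applies it to a batch of data. The `Check` class, the `run_checks` and `release` functions, and the order schema are all hypothetical illustrations. They are not taken from Palmer’s report or from any particular DataOps tool.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record shape: one dict per row.
Row = dict[str, object]


@dataclass
class Check:
    """A named, automated data test that must pass before release."""
    name: str
    predicate: Callable[[list[Row]], bool]


def run_checks(rows: list[Row], checks: list[Check]) -> list[str]:
    """Return the names of all checks that failed, in declaration order."""
    return [c.name for c in checks if not c.predicate(rows)]


def release(rows: list[Row], checks: list[Check]) -> bool:
    """Gate the release: publish only if every check passes."""
    failures = run_checks(rows, checks)
    if failures:
        print("release blocked:", ", ".join(failures))
        return False
    # A real pipeline would publish to a warehouse or topic here.
    print(f"released {len(rows)} rows")
    return True


CHECKS = [
    Check("non_empty", lambda rows: len(rows) > 0),
    Check("no_null_customer_id",
          lambda rows: all(r.get("customer_id") is not None for r in rows)),
    Check("amount_non_negative",
          lambda rows: all(isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
                           for r in rows)),
    Check("unique_order_id",
          lambda rows: len({r.get("order_id") for r in rows}) == len(rows)),
]

# Hypothetical sample batches: one clean, one with several defects.
GOOD = [{"order_id": 1, "customer_id": "c1", "amount": 10.0},
        {"order_id": 2, "customer_id": "c2", "amount": 5}]
BAD = [{"order_id": 1, "customer_id": None, "amount": -3},
       {"order_id": 1, "customer_id": "c2", "amount": 5}]

if __name__ == "__main__":
    release(GOOD, CHECKS)  # prints: released 2 rows
    release(BAD, CHECKS)   # prints: release blocked: no_null_customer_id, amount_non_negative, unique_order_id
```

Because the checks are plain data, they can be versioned and reviewed like code. A scheduler or CI job can then run them on every new batch without human intervention.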

Process data in both batch and streaming modes: Although DataOps is about real-time data delivery, there is still a place, and a reason, for batch mode. “The success of Kafka and similar design patterns has validated that a healthy next-generation data ecosystem offers the possibility to simultaneously process data from source to consumption in both batch and streaming modes,” Palmer notes.
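One way to read this recommendation is that the same transformation logic should serve both modes. The minimal sketch below shows that idea under stated assumptions. A plain Python iterator stands in for a Kafka topic, so no Kafka client is used. The event schema and the names `enrich`, `run_batch` and `run_stream` are hypothetical.

```python
from typing import Iterable, Iterator


def enrich(event: dict) -> dict:
    """Shared transformation used by both modes (hypothetical schema)."""
    return {**event, "amount_cents": round(event["amount"] * 100)}


def run_batch(events: list[dict]) -> int:
    """Batch mode: the full, bounded dataset is available up front."""
    enriched = [enrich(e) for e in events]
    return sum(e["amount_cents"] for e in enriched)


def run_stream(events: Iterable[dict]) -> Iterator[int]:
    """Streaming mode: events arrive one at a time; emit a running total after each."""
    total = 0
    for e in events:
        total += enrich(e)["amount_cents"]
        yield total


EVENTS = [{"id": 1, "amount": 12.5}, {"id": 2, "amount": 3.25}, {"id": 3, "amount": 0.1}]

if __name__ == "__main__":
    print(run_batch(EVENTS))               # 1585
    print(list(run_stream(iter(EVENTS))))  # [1250, 1575, 1585]
```

Once the stream has been fully consumed, its final running total matches the batch result. This is the property that lets one codebase serve both historical reprocessing and live feeds.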

Track data lineage: Trust in the data is the most important element in a data-driven enterprise, and without it the enterprise can simply cease to function. That is why well-thought-out data governance and a metadata (data about data) layer are important. “A focus on data lineage and tracking of processing in the data ecosystem results in increased reproducibility and confidence in the data,” says Palmer.
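A minimal sketch of lineage tracking is shown below. It assumes a simple design in which every processing step is recorded with its name, its parameters and content hashes of its input and output. Those hashes let you verify afterwards which data flowed into which step. The `LineageLog` class and the two example steps are hypothetical illustrations, not a specific metadata product.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


def fingerprint(data) -> str:
    """Stable content hash of JSON-serializable data (keys sorted for determinism)."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageLog:
    """Metadata layer: one entry per processing step."""
    entries: list[dict] = field(default_factory=list)

    def apply(self, step_name: str, fn: Callable, data, **params):
        """Run a step and record what went in, what came out, and how."""
        result = fn(data, **params)
        self.entries.append({
            "step": step_name,
            "params": params,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        return result


def drop_nulls(rows, column):
    return [r for r in rows if r.get(column) is not None]


def convert_currency(rows, rate):
    return [{**r, "amount": round(r["amount"] * rate, 2)} for r in rows]


raw = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 4.0}]
log = LineageLog()
clean = log.apply("drop_nulls", drop_nulls, raw, column="amount")
eur = log.apply("convert_currency", convert_currency, clean, rate=0.9)

if __name__ == "__main__":
    for entry in log.entries:
        print(entry)
```

Because each step’s output hash must equal the next step’s input hash, a broken or tampered chain is easy to detect. Re-running the pipeline on the same raw data should reproduce the same fingerprints.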

Have layered interfaces: Everyone touches data in different ways. “Some power users need access to data in its raw form, while others just want answers to well-formulated questions,” Palmer says. A layered set of services and design patterns is therefore required for the different user personas. Palmer outlines three kinds of services to meet these multi-layered requirements:

“Data access services that are ‘view’ abstractions over the data and are essentially SQL or SQL-like interfaces. This is the power-user level that data scientists prefer.

“Messaging services that provide the foundation for stateful data exchange, event processing and the orchestration of data exchange.

“REST services built on or wrapped around APIs, offering the ultimate flexibility for direct access to and exchange of data.”
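The layering can be sketched with Python’s built-in `sqlite3` module. This is a minimal illustration under assumptions, not Palmer’s design. The raw table serves users who need raw access. A SQL view plays the role of the “view” abstraction. A small function returns a JSON answer of the kind a REST endpoint might serve to someone who just wants an answer. The messaging layer is omitted for brevity. The table, view and function names are hypothetical.

```python
import json
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory store with a raw table and a curated view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Raw layer: data as it arrived.
        CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL, status TEXT);
        INSERT INTO raw_orders VALUES
            (1, 'EU', 120.0, 'shipped'),
            (2, 'US', 80.0, 'cancelled'),
            (3, 'EU', 45.5, 'shipped'),
            (4, 'US', 200.0, 'shipped');
        -- Data access layer: a SQL view that hides raw-table details.
        CREATE VIEW shipped_revenue_by_region AS
            SELECT region, SUM(amount) AS revenue
            FROM raw_orders
            WHERE status = 'shipped'
            GROUP BY region;
    """)
    return conn


def revenue_answer(conn: sqlite3.Connection, region: str) -> str:
    """Answer layer: a JSON payload such as a REST endpoint might return."""
    row = conn.execute(
        "SELECT revenue FROM shipped_revenue_by_region WHERE region = ?", (region,)
    ).fetchone()
    return json.dumps({"region": region, "shipped_revenue": row[0] if row else 0.0})


if __name__ == "__main__":
    conn = build_store()
    print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])  # raw access
    print(conn.execute("SELECT * FROM shipped_revenue_by_region ORDER BY region").fetchall())
    print(revenue_answer(conn, "EU"))
```

Each persona queries the layer that suits it, and all three layers read the same underlying data, so their answers stay consistent.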

Managers increasingly rely on their technology leaders and teams to transform their organizations into data-driven digital entities that can respond to events and opportunities almost immediately. The best way to achieve this, especially with the lean budgets and limited support that often come with this mandate, is to coordinate the way data flows from source to storage.