Date: 2020-01-03

Credit: Accenture

For the year ahead, we see the cloud, artificial intelligence and data management as the megaforces of the data and analytics agenda. And so, picking up where our Big on Data colleague Andrew Brust left off last week, we are looking at some of the underlying issues that are shaping adoption.

In the world of data and analytics, you cannot start a conversation today without bringing in the cloud and artificial intelligence. Yesterday, in Part I, we ticked the cloud checkbox: we explored how the coming generational change in business applications will change the context in which companies evaluate cloud deployment. Today we turn our attention to the central building block: what is happening with databases, and what we expect will become this year's issue in AI.

Now it's data, not big data

But first, a little context. Until now, we framed our annual outlooks around Big Data because, until recently, it was considered exceptional. The definition of Big Data was introduced by Doug Laney, now with Caserta, when he was with the analyst firm Meta Group in 2001. Big Data was novel because processing it was beyond the data storage technologies and BI analytical tools of the time.

Today, Big Data is just Data, because necessity has become the mother of invention. As we will note below, the database universe has expanded well beyond the core relational model to cover a wide spectrum of platforms and data types. So now we just call it data, and we have renamed our annual outlook accordingly. Of course, we are not the first to make that observation: Gartner dropped Big Data from its Hype Cycle in 2015.

Now let's go back to our regularly scheduled program.

Taking AI out of the black box

Among the industry observations Andrew reported last week was the perception that AI has gone mainstream in analytics. In fact, analytics is just the tip of the iceberg, as consumers, machines and organizations consume AI-powered services every day. But as the consumption of AI results spreads through the services that drive the economy, there has been growing concern about ethics, biases and other assumptions that can easily skew the algorithms and the selection of data that power AI.

Today, AI can hardly be considered intelligent. While data sets and models can be complex, the decisions lack human context. AI can make yes/no decisions, detect patterns and provide predictive or prescriptive recommendations, but for the foreseeable future, unlike humans, AI will not be able to learn something in one context and apply it to another. Yet even when making simple decisions, such as approving a loan or making recommendations, AI can still cause harm. Former Wall Street quant Cathy O'Neil raised awareness of potential AI bias with her 2016 book Weapons of Math Destruction.

The selection and handling of data is another source of risk. Get a large enough data set and you can always find at least some pattern. For example, collect the eating habits of a sufficiently large group of licensed drivers and you may find some risk-related patterns. But since correlation is not always causation, determining whether those patterns are relevant enough to change underwriting standards, or are simply sampling artifacts, still requires a human in the loop.

As AI becomes more and more mainstream, companies will be held responsible for decisions made with the help of AI algorithms, regardless of how powerful or limited their capabilities are. Over the past year, we have seen early stabs at making AI "explainable" from IBM, Google, H2O.ai and others.
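The point about patterns appearing in any large enough data set can be illustrated with a small simulation. This is a minimal sketch, not taken from the original analysis: the function names, the row and feature counts, and the use of Gaussian noise are all assumptions chosen for illustration. Every column is independent random noise, so any correlation the search finds is spurious by construction.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_rows=50, n_features=500, seed=0):
    """Return the largest |correlation| between a noise target and noise features.

    The target and every feature are independent random draws, so whatever
    correlation is found here reflects sampling luck, not a real relationship.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    print(f"strongest spurious correlation: {strongest_chance_correlation():.2f}")
```

With hundreds of candidate attributes and only a few dozen rows, the best-looking attribute will typically show a noticeable correlation with the target even though none exists, which is why a human still has to judge whether a pattern is meaningful.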

As expected, since these are still early days for AI explainability and bias detection, the capabilities are still quite rudimentary: they generally operate at the level of an individual feature or attribute, akin to seeing the trees but not the forest. Vendor overview pages and videos give a realistic picture of what is possible today.

For example, current capabilities can statistically identify which features of a model most influenced the outcome (for example, generating a decision or prediction, or recognizing an image or text). For extremely simple models, such as the last step in a chain of decisions in regulated sectors such as finance or healthcare, they can generate "reason codes." They can also identify which attributes or features should be tracked to detect possible bias (similar to the way data security tools identify PII data). And based on these findings, today's tools can carry out a "disparate impact analysis," which is a fancy term for identifying whether the model was biased against a particular segment of people. In some cases, the capabilities to interpret or explain models are limited to a single framework such as TensorFlow. As for anything more ambitious, today there are at best educated guesses for extrapolating more holistic explanations of why models make decisions.
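To make "disparate impact analysis" concrete, here is a minimal sketch of one common way to compute it: compare the rate of favorable outcomes between two groups. The function names, the sample loan decisions and the group labels are hypothetical, and the 0.8 cutoff is the widely cited "four-fifths" rule of thumb rather than anything the tools mentioned above are confirmed to use.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Compute the share of favorable outcomes per group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions, protected, reference):
    """Ratio of the protected group's approval rate to the reference group's."""
    rates = approval_rates(decisions)
    return rates[protected] / rates[reference]


# Hypothetical loan decisions: (group label, was the loan approved?)
sample = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

ratio = disparate_impact_ratio(sample, protected="B", reference="A")
# Assumption: flag the model when the ratio falls below the four-fifths threshold.
flagged = ratio < 0.8
```

In this made-up sample, group B is approved half as often as group A, so the model would be flagged for review. A ratio like this says nothing about why the disparity exists, which is consistent with the feature-level limits described above.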

Our view is that model explainability, or interpretability, is ripe for development. Look for announcements here. Behind all the noise of AI-related product announcements this year, we expect that collaborative data science and AI tools and cloud-based AutoML services will up their explainability game. Today, most of these services can document changes in models over time, and they will probably use model lineage data as a starting point for developing capabilities that articulate why models make decisions. Initially, these capabilities are likely to present their findings through statistical visualizations, which require a data scientist to translate. Later, they will probably add natural language capabilities aimed at business people.

AI explainability will not only be about technology; it will also involve best practices. One of the interesting lessons we learned from listening to Patrick Hall of H2O.ai is that if you want your model to be explainable, don't make it too complex. Data scientists could learn a thing or two from application developers.

However, by the end of the year we will still be far from holistic explanations that go beyond individual features or attributes. AI explainability will be a work in progress for some time.

Credit: Ovum

Clash of the Titans: specialized versus multimodel databases

After Y2K, the relational database became the de facto enterprise standard, but as data volumes and data types exploded, so did a new generation of platforms, from key-value to document, graph, columnar stores, blockchain and more. It has reached the point where Amazon's portfolio now lists 15 different database platforms.

And that has opened a debate among platform providers that should sound familiar: the old single-platform versus best-of-breed debate has now spread from applications to the database space. On one side, Amazon promotes the strategy of choosing the right database for the job; on the other, players like Oracle, Microsoft and even SAP have promoted the Swiss Army knife approach. Traditionally, database platforms such as Oracle or SQL Server have approached multimodel by extending their SQL query capabilities or adding capabilities such as in-database R or Python support.

With the new generation of cloud-born databases, many store data in a canonical format and then expose it through APIs. Microsoft Azure Cosmos DB is the poster child for this approach, but if you look below the surface, you will find that some of the third-party cloud-native specialized database platforms also use APIs prominently in their architectures.

In a previous life as an Ovum analyst, we predicted in 2014 that the coming era of database diversity would also lead to database overlap (see diagram). Specialized databases would continue to thrive, but they would add capabilities that overlap with other forms of data, such as relational databases that query JSON documents, or document-oriented databases that offer SQL-like query languages. This is useful for leveraging the large base of SQL developers and giving them additional query capabilities. However, the fact that, for example, Oracle or IBM Db2 could query JSON was not intended to replace the need for MongoDB; instead, we viewed these as edge cases, where a line-of-business organization working with a customer transaction database also wanted the ability to query nonrelational data in the customer profile.
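The idea of one canonical copy of the data exposed through several APIs can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a description of how Azure Cosmos DB or any other product is implemented: the class names, the dict-based canonical form and the `links` field are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalStore:
    """Holds each record once, as a plain dict keyed by id (the assumed canonical form)."""
    records: dict = field(default_factory=dict)

    def put(self, record_id, record):
        self.records[record_id] = record


class KeyValueAPI:
    """Key-value view: fetch a whole record by id."""

    def __init__(self, store):
        self.store = store

    def get(self, record_id):
        return self.store.records.get(record_id)


class DocumentAPI:
    """Document view: filter records by field values."""

    def __init__(self, store):
        self.store = store

    def find(self, **criteria):
        return [
            rec for rec in self.store.records.values()
            if all(rec.get(key) == value for key, value in criteria.items())
        ]


class GraphAPI:
    """Graph view: follow the hypothetical 'links' field to neighboring records."""

    def __init__(self, store):
        self.store = store

    def neighbors(self, record_id):
        record = self.store.records.get(record_id, {})
        return [
            self.store.records[n]
            for n in record.get("links", [])
            if n in self.store.records
        ]


store = CanonicalStore()
store.put("c1", {"name": "Ada", "city": "Boston", "links": ["c2"]})
store.put("c2", {"name": "Grace", "city": "Boston", "links": []})

by_key = KeyValueAPI(store).get("c1")
in_boston = DocumentAPI(store).find(city="Boston")
friends = GraphAPI(store).neighbors("c1")
```

Each API class is a thin facade over the same store, so adding a new access path does not require copying or reshaping the data. Real platforms also have to deal with indexing, consistency and query planning, which this sketch leaves out entirely.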
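The edge case described above, a relational customer table that also needs to query nonrelational profile data, can be sketched with SQLite's JSON functions from Python's standard library. This is a stand-in for illustration only: the table, columns and sample rows are made up, Oracle and Db2 have their own JSON syntax, and the example assumes a SQLite build with the JSON functions available (the default in recent versions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)"
)
# The relational columns hold the core record; 'profile' holds a JSON document.
conn.executemany(
    "INSERT INTO customers (id, name, profile) VALUES (?, ?, ?)",
    [
        (1, "Ada", '{"tier": "gold", "channels": ["email", "sms"]}'),
        (2, "Grace", '{"tier": "silver", "channels": ["email"]}'),
    ],
)

# One SQL statement mixes a relational column with a value pulled out of JSON.
rows = conn.execute(
    "SELECT name, json_extract(profile, '$.tier') AS tier "
    "FROM customers "
    "WHERE json_extract(profile, '$.tier') = 'gold'"
).fetchall()
conn.close()
```

Here `rows` holds the single gold-tier customer. The SQL developer never leaves SQL, which is the appeal of the overlap; it does not make the relational engine a replacement for a document database on document-heavy workloads.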

By the way, in that same investigation, we raised the question of who would be the "owner" of the query. Enter the current era of data catalogs.

As we noted in Part I of our 2020 outlook, our view that companies will increasingly consider cloud-native as their default deployment option will only intensify this age-old debate. Our view is that there is no single silver bullet and no binary answer.

Don't get us wrong: fit-for-purpose databases are here to stay. If the use case is tightly focused on a single type of data, a database promoted as multimodel will be overkill. There is also the matter of highly sophisticated capabilities, such as writing extremely complex SQL statements that require multiple table joins, or graph queries that span three hops. For those, it is better to stick with best of breed.

But we also expect that edge cases requiring a combination of data access approaches will become much more common. Pair an asset management transaction system with IoT data to plan maintenance, or a supply chain planning system with mobile and IoT data, and you have a case ripe for extensibility.

And that is where we would like to see cloud-native database providers step up. Because some of their platforms already use APIs to expose data, they should exploit the potential of providing multiple paths to the data, combining SQL, JSON, graph and/or search, for example. It's not just about extending SQL. We expect to hear more about the cross-cutting capabilities of each of the leading cloud database providers this year.

Our 2020 data outlook consists of two parts. Part I covers hybrid cloud as the default.