The BS Behind Most AI/NLP Solutions for BI/Analytics
Most of today's AI and NLP products are built on “machine learning” (ML), a class of algorithmic models that learn from the data they are given. The global enterprise AI market, according to Allied Market Research, is expected to generate $53.06 billion by 2026. While that is an impressive number, I've personally come across too many instances to count where vendors claimed to offer “AI/NLP solutions” that leveraged, at best, very basic machine learning and offered none of the benefits of AI.
Wayne Butterfield at ISG shed some light on this, pointing out that “AI covers over 200 different disciplines, so it’s not uncommon to be using a branch of AI in a tool.” In other words, if the threshold for having AI is simply that your software employs a module or two with algorithms that use some form of machine learning, then AI is now table stakes. But this isn’t any sort of meaningful definition of AI--certainly not when you need software to make critical business decisions. Consequently, I see a lot of companies getting stuck with platforms that don’t deliver on their promises.
In evaluating platforms, how does one sift through the “AI/NLP B.S.” that’s so prevalent? Consider the following.
Text-based Search is NOT the Same as Natural Language
Google, Bing and other search engines have nicely transitioned from keyword-based search to natural language search, and since many of us are old enough to have grown up in the ‘keyword era’, we probably find ourselves searching both ways on these platforms. In the keyword-based approach, the algorithm breaks down your query into what it has been programmed to assess as the most important terms. If you want to understand the limits of keyword-based search, think about all the off-base outreach you receive on LinkedIn, where someone attempts to recruit you to a job you’re totally unqualified for, or sell you something that has nothing to do with your work just because of the presence of a few key search terms in your profile. It’s simply counting key terms--there’s no understanding of context or intent.
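To make the keyword-counting problem concrete, here is a minimal sketch in Python. It is not how any particular search engine or recruiting tool works; the function name, the sample profiles and the query are all made up for illustration. It simply counts how often the query's terms appear in a profile.

```python
import re

def keyword_score(query: str, document: str) -> int:
    """Count how many words in the document match any term in the query."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = re.findall(r"[a-z0-9]+", document.lower())
    return sum(1 for word in words if word in terms)

# Hypothetical recruiter search and two hypothetical profiles.
job_query = "senior python java developer"
retail_profile = "Sold Python books and Java coffee makers at a retail store"
engineer_profile = "Senior software engineer building backend services"

print(keyword_score(job_query, retail_profile))
print(keyword_score(job_query, engineer_profile))
```

The retail worker outscores the engineer because the words "Python" and "Java" happen to appear in the profile. Nothing in the scoring knows what either word means in context.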
Natural language search, on the other hand, involves more sophisticated machine learning models. You talk to a computer the same way you’d talk to a colleague or friend, and the program discerns the intent and context of what you’re asking for. Here’s a crude example. In the early days of search, a query for “Chicago Bulls” might pull up pictures of Chicago’s buildings and livestock. Today’s more sophisticated natural language search algorithms understand, however, that your intent is to search for a sports team, and that you are probably not interested in an online bovine encounter. Ironically, while every 10-year-old kid on Google takes this capability for granted, business analysts querying their data management systems still have to put up with the equivalent of photos of cows in Chicago.
Does your platform understand complex queries with components that modify each other and may have no meaning (or a totally different meaning) when taken independently? Can it handle qualifiers like “biggest, highest, oldest” and ordered items like the “top 100 retail locations by sales”? Does it build on previous questions for context? If so, it is probably giving you advanced NLP capabilities. If not, it’s probably just text-based search in the guise of NLP.
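One way to see what "handling" a ranked query involves is to write down the structure hiding in the sentence: a limit, an entity, a metric and a sort direction. The sketch below is a deliberately naive, hand-written pattern in Python, with hypothetical names throughout. It is not an NLP model. It shows the target structure, and it shows how quickly a fixed pattern falls over when the phrasing changes.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class RankedQuery:
    limit: int         # how many rows the user wants
    entity: str        # what is being ranked
    metric: str        # what it is ranked by
    descending: bool   # True for "top", False for "bottom"

# Only matches the literal shape "<top|bottom> <N> <entity> by <metric>".
PATTERN = re.compile(r"^(top|bottom)\s+(\d+)\s+(.+?)\s+by\s+(.+)$", re.IGNORECASE)

def parse_ranked_query(text: str) -> Optional[RankedQuery]:
    match = PATTERN.match(text.strip())
    if match is None:
        return None  # the phrasing does not fit the one pattern we know
    direction, limit, entity, metric = match.groups()
    return RankedQuery(int(limit), entity, metric, direction.lower() == "top")

print(parse_ranked_query("top 100 retail locations by sales"))
print(parse_ranked_query("which stores sold the most last year"))
```

The first question parses cleanly. The second asks for essentially the same kind of answer and returns nothing, which is the gap a genuine natural language system is supposed to close.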
Pre-defining EVERYTHING is NOT Intelligent
Software applications can repeat steps perfectly, so they can perform feats which, in a demo setting, may seem mind-blowing. But when you purchase the software and try to use it on your own data, it produces less-than-stellar results. If this scenario sounds familiar, it’s probably because the platform relied more on hard-coded rules than machine learning. Sadly, in data management, when you peel back the onion you find that many vendors' version of AI is little more than a bunch of predefined SQL queries--the system simply calls on one that has some commonalities with your question and executes it. Not surprisingly, the results it returns are usually pretty useless.
On the other end of the spectrum, you have platforms that are only as “intelligent” as you make them. In other words, if you spell out every bit of minutiae, the platform may in turn provide a cogent answer. At that point, however, you might as well have coded it yourself for all the work that you’ve put into asking the question.
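The "predefined SQL queries" pattern described above can be sketched in a few lines of Python. This is an assumed, simplified reconstruction of the behavior, not any vendor's actual code; the canned queries, table names and matching rule are all hypothetical. The system picks whichever stored query shares the most words with the question.

```python
import re

# Hypothetical canned queries, keyed by the description they were written for.
CANNED_QUERIES = {
    "total sales by region": "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "number of customers by region": "SELECT region, COUNT(*) FROM customers GROUP BY region",
}

def tokens(text: str) -> set:
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def pick_canned_query(question: str) -> str:
    """Return the stored SQL whose description overlaps most with the question."""
    question_words = tokens(question)
    best = max(CANNED_QUERIES, key=lambda desc: len(question_words & tokens(desc)))
    return CANNED_QUERIES[best]

question = "which region had the biggest drop in sales per customer"
print(pick_canned_query(question))
```

The question asks about a drop in sales per customer, but the system returns total sales by region because "region" and "sales" overlap. The query runs without error and answers a question nobody asked.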
AI-based models, as JP Baritugo of Pace Harmon put it, “autonomously find patterns in the data (‘inputs’) to provide insights, predictions, and prescriptions (‘outputs’) that could have significant business impact.”
When There is no Training, the Model Never Gets Smarter
On a related note, in traditional hard-coding, algorithms take data as input and carry out pre-defined instructions--but they don’t change their behavior based on that data. Machine learning algorithms, on the other hand, learn from the data, and as a general rule increase in accuracy with more data. This process is called “training” in ML parlance, and when the model learns from labeled examples it is called “supervised learning.” A true AI or NLP driven software platform, therefore, should be learning from your data as it goes. In data management, for instance, it would be gaining understanding of context and intent based not just on a user’s current query, but on every query that any user within the company has ever run on the platform, and on every data warehouse, database or data lake that it’s connected to.
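The difference between a fixed rule and a system that trains on past queries can be shown with a toy example. The Python sketch below assumes a hypothetical setup in which past queries have been labeled with the business area they referred to. The class, labels and sample queries are invented for illustration, and the "model" is a simple word-vote counter, far cruder than anything a real platform would use.

```python
from collections import Counter, defaultdict

def hard_coded_intent(query: str):
    """A fixed rule: it never changes, no matter how many queries it sees."""
    return "sales" if "revenue" in query.lower() else None

class IntentLearner:
    """Toy supervised learner: remembers which labels each word appeared with."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # word -> Counter of labels

    def train(self, query: str, label: str) -> None:
        for word in query.lower().split():
            self.word_counts[word][label] += 1

    def predict(self, query: str):
        votes = Counter()
        for word in query.lower().split():
            votes.update(self.word_counts.get(word, {}))
        return votes.most_common(1)[0][0] if votes else None

learner = IntentLearner()
before = learner.predict("bookings last quarter")   # nothing learned yet

# Hypothetical labeled history from other users in the company.
learner.train("bookings by region", "sales")
learner.train("bookings this month", "sales")
learner.train("headcount by office", "hr")

after = learner.predict("bookings last quarter")
print(before, after, hard_coded_intent("bookings last quarter"))
```

After seeing how colleagues use the word "bookings," the learner maps the new query to sales. The hard-coded rule still returns nothing, and it always will until someone edits the code.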
In sum, you don’t have to be a machine learning expert to spot an AI phony--you can tell from how it functions. If it isn’t adaptive and geared for constant improvement based on the data you give it, and if it doesn’t discern context and intent from your queries, it’s not AI/NLP.
To learn more about how natural language search is being used to speed up data discovery, read on.
Author’s note: there are ‘unsupervised’ machine learning models that, while they do learn from data input, don’t ‘train’ on data sets in the way just described. What we’re talking about here is ‘supervised learning’ within machine learning.