When is deep learning overkill?
Apr 07, 2022 • 8 min read
Deep learning is perhaps one of the most efficient AI tools for businesses looking to succeed in highly digitized and fast-paced markets. Computers can use algorithmic models to analyze large amounts of structured and unstructured data better and faster than the average human, leading to greater accuracy in data-driven decision-making. But is deep learning the right fit for every business problem? Not exactly.
There is a common saying that “if all you have is a hammer, everything looks like a nail.” Depending on the industry, that hammer can be a variety of things; for instance, the nurse’s low-strength antibiotic, the chef’s basket of onions, or the carpenter’s actual hammer. In modern data science applications, the proverbial hammer is often one of the more popular neural-network-based approaches, especially when such approaches are the data scientist’s own area of research.
While machine learning techniques can solve problems that previously seemed insurmountable, they can also be misused in situations where a simpler parametric approach or even a rules-based approach may be better suited. It’s important to know not only how to use an AI tool (our hammer) but also when to use it.
In this post, we outline a handful of situations where deep learning wasn’t the be-all and end-all for businesses (but was still successful in certain sub-cases). These include:
- Pharmaceutical companies hoping to implement Next Best Action algorithms;
- A financial technology company looking for a one-size-fits-all solution for different NLP problems;
- Our experiences with recommender systems and other data science fields.
Next best action: a long-term goal
In 2021, a pharmaceutical company contacted Grid Dynamics about creating a Next Best Action (NBA) approach for their product. After reading our blog post on reinforcement learning, they felt that there would be great value in seeing how an NBA model could increase their adherence metrics. If we could ensure that patients continued to take their medication as prescribed, we could improve not only health outcomes but customer loyalty as well.
We were tasked with creating an implementation road map that laid out the steps needed to bring the algorithmic approach to fruition. Given that we had just completed a Proof of Concept for a different client within a related industry, it seemed easy to repeat the same process for this new client. However, being a data scientist often means putting aside personal biases and adjusting to the data, rather than making the data adjust to you.
We explored their entire system in depth with the help of subject matter experts from the client and learned the ways in which their agents communicate with patients. Additionally, we gained a better understanding of what kind of data they had in each of their data management systems and dug into the potential limitations of that data. As a result, we discovered that much of the necessary data was either missing or limited in quality, which ultimately impacted the quality of patient engagement. Agents made decisions on a case-by-case basis with few overarching business rules. Their existing ML model was built to help where it could, even if its performance was less than ideal. To return to the idiom, this may be a screw we have to drill rather than a nail we should hammer.
Rather than pushing through our NBA architecture, we listened to the stakeholders’ requests and used the company’s existing data to find a path that would maximize value for the company in both the short and long term. Implementing the NBA model turned out to be just one of several recommendations, rather than our sole recommendation. While we saw Next Best Action as the long-term solution, we also recommended data pipelining, feature engineering, improvement of the existing model, and the implementation of a Lifetime Value model in the short term.
Although the client in the related industry saw a 30% improvement once we finished implementing the Next Best Action model, it was clear that this pharmaceutical company’s situation was less analogous than it first seemed. The related case had clearer objective metrics, a higher volume of more robust data, and clearly defined business rules. However, as the data pipelining process, business rules, and component models mature, we can grow more and more confident that the NBA model will be able to elevate this client’s processes to the next level.
Simplicity is key: A launching point
At a financial technology company we work with, regulations require the company to report how often customers complain, by which means, about what, and how often the company is at fault in those complaints. The complaints can range from withheld funds to technical issues with the service. As a team, we came up with many methods, including Longformer language models, that ultimately performed very well on the basic complaint identification cases.
However, just because an algorithm works well for some use cases does not mean it will work well for all of them. Even within the same project, there is rarely a one-size-fits-all model.
A notable example where the bleeding-edge Longformer approach struggled was the initial at-fault labeling. The model simply could not achieve both high precision and high accuracy given the circumstances. After careful consideration, we elected to use a simpler model instead, built on a TF-IDF vectorizer and a linear SVM. This model looked at each sentence in the text individually and then assigned an at-fault label to the entire interaction if any of those sentences was deemed to be an admission of fault. Additionally, in order to improve our minority class accuracy, we upsampled positive cases and downsampled negative cases.
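To make the sentence-level idea concrete, here is a minimal sketch assuming scikit-learn. The training sentences, labels, and helper names (`split_sentences`, `train_sentence_model`, `interaction_at_fault`) are hypothetical illustrations and are not taken from the actual project; the resampling step is left out to keep the example short.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical sentence-level training data (1 = admission of fault).
TRAIN_SENTENCES = [
    "This was our mistake.",
    "We apologize, the error was on our side.",
    "It was our fault that the transfer failed.",
    "Thank you for contacting support.",
    "Your balance is available online.",
    "Please restart the app and try again.",
    "The payment will arrive in three days.",
    "Can you confirm your email address?",
    "Have a nice day.",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0, 0, 0, 0]


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def train_sentence_model(sentences, labels):
    """Fit a TF-IDF vectorizer followed by a linear SVM on individual sentences."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(sentences, labels)
    return model


def interaction_at_fault(model, interaction_text):
    """Label the whole interaction at-fault if ANY sentence is predicted positive."""
    sentences = split_sentences(interaction_text)
    if not sentences:
        return False
    return bool((model.predict(sentences) == 1).any())


model = train_sentence_model(TRAIN_SENTENCES, TRAIN_LABELS)
```

Calling `interaction_at_fault(model, some_text)` then returns a single label for the whole interaction. The "any sentence" rule favors catching admissions of fault over avoiding false alarms, so a real system would likely tune a decision threshold rather than rely on the default prediction.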
While it may seem that cutting-edge techniques would always outperform methods from the last millennium, not every data science problem is a nail that needs to be hammered. The simpler model vastly outperformed the language model, due in large part to the small quantity of data (only hundreds of positive cases) and the class imbalance (99% of the data was in the negative class). Only once there were several more months of specially labeled data and a change in the metric of choice did a transformer approach (RoBERTa) begin to outperform the older model.
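A small sketch shows why a 99% negative class makes both the choice of metric and the resampling described earlier matter. It uses only the Python standard library; the 1,000 synthetic labels and the `rebalance` helper are assumptions for illustration, not the project's data or code.

```python
import random


def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall and accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


def rebalance(rows, labels, n_per_class, seed=0):
    """Upsample positives (with replacement) and downsample negatives (without)."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    pos_up = rng.choices(pos, k=n_per_class)
    neg_down = rng.sample(neg, k=n_per_class)
    return pos_up + neg_down, [1] * n_per_class + [0] * n_per_class


# Synthetic example: 1,000 interactions, 1% of them positive.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * len(y_true)

# A model that never flags anything is 99% accurate but finds no positives.
precision, recall, accuracy = precision_recall_accuracy(y_true, always_negative)

# Rebalanced training set: 100 positives (with repeats) and 100 negatives.
rows = list(range(len(y_true)))
balanced_rows, balanced_labels = rebalance(rows, y_true, n_per_class=100)
```

Here `accuracy` comes out at 0.99 while `recall` is 0.0, which is why accuracy alone says little on data this skewed. Resampling should only be applied to the training split, so that evaluation still reflects the real class balance.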
Simply put, a good deal of cutting edge models rely on the recent rise in data quantity and computing power to achieve their better results[1]. When one or both of those are unavailable, falling back to more classical techniques is the way to go.
This type of cost-benefit analysis and decision making occurs in all sorts of industries. When one of our engineers was developing part of a geospatial data analysis framework to help petroleum engineers assess where and how they should develop wells, he too had to balance classical and deep learning approaches. Some of the potential drill sites had robust enough data to perform the analysis with feedforward neural networks. For others, linear regression with variable transformation, radial basis function approaches, or kriging with various semivariograms (which describe the spatial autocorrelation of points as a function of distance) performed best. It is also worth mentioning that kriging is based on explicit geostatistical models, whereas neural networks are universal approximators, so kriging tends to give more interpretable and explainable results in geospatial modeling. In the end, the team decided to let users choose the model from the product framework themselves once they had seen the results on their data for all the candidate models, along with the corresponding model quality metrics. The team provided suggestions (e.g., that a given model’s results were more likely to be variable due to the low amount of data), but users were the ones who decided what their most important conditions were. For some, a straighter, simpler drilling pattern (oftentimes the result of traditional parametric methods) was preferable to a more complicated pattern that scored higher on certain metrics.
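For readers unfamiliar with semivariograms, the sketch below computes an empirical one with NumPy using the classical estimator: half the mean squared difference between values, taken over all pairs of points whose separation falls in a given distance bin. The function name, the toy coordinates, and the bin edges are assumptions for illustration and are not part of the framework described above.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Return gamma(h) per distance bin: 0.5 * mean((z_i - z_j)**2) over pairs in the bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Enumerate every unordered pair of points exactly once (i < j).
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    # Bins without any pairs stay NaN so they are not mistaken for zero variance.
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (dists >= bin_edges[b]) & (dists < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sq_diffs[in_bin].mean()
    return gamma


# Toy data: four points on a line, with a value that grows linearly with x.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

On this toy data, semivariance grows with distance, which is the signature of spatial autocorrelation: nearby points are more alike than distant ones. In kriging, a parametric model is fitted to such a curve and then used to weight neighboring observations when predicting at a new location.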
A common trend we can see in all these cases is that additional data is often the key to turning deep learning from being computationally expensive with mediocre results at best to being the go-to method. This is not just limited to the quantity of data, but also its type and robustness. When one of our engineers was working for a large Russian retailer, his attempts to use deep learning as part of his recommender solution provided little lift over traditional methods, while also requiring more time and resources. A collaborative filtering method followed by a LightGBM ranking algorithm built on Spark proved to be the fastest, most effective solution for both online and offline learning. But when the products have rich descriptions or higher quality images, deep learning begins to flex its muscles (and the golden hammer begins to shine).
Recommender systems and other data science fields
For large fashion and parts retailers, we can gain a lot of insight by combining deep learning Natural Language Processing (NLP) methods such as BERT on the text with Convolutional Neural Networks (CNNs) on the images. With this combination of text and image data, deep learning can be the fuel needed to rocket user recommendations into the stratosphere.
There are numerous additional examples where deep learning proved to be the right tool. These include modern visual search (Convolutional Neural Networks), object detection and classification (Convolutional Neural Networks), supply chain and price optimization (Reinforcement Learning), and recommender systems. Nonetheless, there are other cases where neural networks were either not relevant or performed poorly relative to classical methods and newer non-deep-learning methods. Among other things, these include computationally complex shipping problems (where Mixed-Integer Linear Programming is the preferred method) and safety stock calculations (where gradient boosting often performs best).
Going forward
What all of these cases have in common is that they illustrate the need to be patient, thorough, and thoughtful about models and action plans within the field of data science. As data scientists, we must adjust to the data we have in order to solve the challenging problems we face. Sometimes that means applying data augmentation techniques such as SMOTE or back-translation, sampling techniques such as oversampling or undersampling, or general feature engineering. Sometimes that means simply collecting more data and creating an interim solution until then. We recommend that businesses:
- Consider the data that they have and ensure that as much of it is at the data scientists’ disposal as possible;
- Start with classical methods to provide baseline or fallback models;
- And only thereafter begin experimenting with deep learning models and methodologies.
By following these three basic steps, businesses can put themselves in a better position to succeed in whatever field they are in. Whether you are a business, a student, or a hobbyist, make sure to consider everything in your data science toolkit, and not just the shiny new deep learning hammer.
[1] The major exceptions are some biological models (which must contend with p >> n) and certain NP-hard problems.