In the past decade, the amount of information processed by commercial organizations has grown exponentially. This information, known as big data, can bring many benefits to businesses, but only if it is handled correctly. Without proper preparation and processing, big data adds little value.
The impact of data science is especially visible in retail and e-commerce. Industry analysts estimate that more than 75 percent of organizations will use it in 2021 to improve workflows and decision-making. The pandemic has accelerated digital transformation, so more retailers are turning to big data to improve service, implement smart automation, and get a better return on investment (ROI). This direction of software development is becoming more relevant every day.
In this article, we’ll consider the application of data science in retail, the challenges of implementing it, and the specifics of deploying such solutions.
How Is Data Science Being Applied in Retail in 2021?
Data science is a branch of informatics that studies the problems of analysis, processing, and presentation of data. It combines data processing, statistical analysis, data mining techniques, and artificial intelligence (AI) applications. Such systems are increasingly used in the field of retail and consumer packaged goods (CPG). According to IBM, 62 percent of retailers assert that innovations give them valuable competitive advantages.
In 2021, data science solutions have already been implemented in the processes of almost all industries. They help generate added value, find new ways to promote products, and create an amazingly accurate portrait of the target customer. Data science in the retail industry is most commonly used for the following purposes:
It is essential for a retailer to meet customer expectations, since the right customer experience boosts sales and helps to target the audience accurately. Data science tools allow you to do this with less time and effort and with greater accuracy.
Data science in retail is also effective for:
- Forecasting customer churn: algorithms find patterns in individual customers’ data and predict how likely each customer is to leave;
- Predicting customer lifetime value (CLV): models relate clients’ choices and behaviors to their long-term value, which helps to create better retention offers; and
- Analyzing the customer’s path to purchase, logic of product selection, and buying habits.
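To make the CLV idea above more concrete, here is a minimal sketch of a simple historical CLV estimate. It is a heuristic stand-in, not a predictive model: it assumes CLV can be approximated as average order value times orders per year times an expected customer lifespan. The `CustomerHistory` fields and the `expected_lifespan_years` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomerHistory:
    # Hypothetical fields; a real system would derive them from order data.
    total_revenue: float   # revenue from this customer so far
    order_count: int       # number of orders placed
    active_years: float    # time between first and last order, in years


def simple_clv(history: CustomerHistory, expected_lifespan_years: float) -> float:
    """Heuristic CLV: average order value * orders per year * expected lifespan."""
    if history.order_count == 0 or history.active_years <= 0:
        return 0.0
    avg_order_value = history.total_revenue / history.order_count
    orders_per_year = history.order_count / history.active_years
    return avg_order_value * orders_per_year * expected_lifespan_years


# Example: 24 orders worth 1,200 in total over 2 years, expected to stay 5 years.
loyal = CustomerHistory(total_revenue=1200.0, order_count=24, active_years=2.0)
print(simple_clv(loyal, expected_lifespan_years=5.0))
```

An ML-based approach would replace the fixed lifespan with a predicted one, for example by combining this estimate with a churn model.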
Recommendations Based on Big Data Analysis
One of the most representative retail data science examples is improved recommendations. Today, a significant proportion of purchases are driven by recommendation engines based on buying history. Machine learning (ML) models and algorithms analyze items that have already been put in the shopping cart, have been viewed or liked before, or have been seen or ordered recently by others. This data becomes the source for automatically generated recommendations of other goods.
ML models can be an excellent tool for retailers to anticipate customers’ behavior and increase revenue. To better demonstrate how data analysis improves recommendations, we have presented our case of implementing such a solution. To see it, read the section with our practical advice below.
Many retail data science projects are devoted to forming pricing tactics that adapt flexibly to changing market conditions. Data-driven price management allows merchants to attract customers during off-peak (“empty”) hours and capitalize on demand. These initiatives can raise profit margins by as much as 7 percent in just a year, with an average ROI of 200 percent or more.
Competitive advantage can also be achieved through:
- In-depth comparative retail analytics: data science helps to collect information about competitors’ products electronically via algorithms that explore various websites. For example, Groupon’s discount site processes one terabyte of raw data daily. The company uses big data platforms and a major IT framework to quickly collect and analyze info from hundreds of websites in real time.
- Personalized pricing: it helps to set fair prices that meet buyers’ expectations and stimulate demand. For instance, retailers use ML-based dynamic pricing to spend less time tracking competitors’ rates and warehouse prices and to determine the best price point. You can see how the combination of AI tools and algorithms automates pricing in CHI Software’s case.
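As a rough illustration of automated price tracking, the sketch below uses a simple rule instead of an ML model: slightly undercut the cheapest competitor, but never drop below a minimum margin over cost. The function name `suggest_price` and the `min_margin` and `undercut` parameters are assumptions made for this example; a production system would learn such parameters from demand data.

```python
def suggest_price(
    cost: float,
    competitor_prices: list[float],
    min_margin: float = 0.10,   # assumed minimum markup over cost
    undercut: float = 0.01,     # assumed discount relative to the cheapest rival
) -> float:
    """Rule-based price suggestion: undercut the cheapest competitor, keep a margin floor."""
    floor = cost * (1 + min_margin)
    if not competitor_prices:
        # No market information: fall back to the margin floor.
        return round(floor, 2)
    target = min(competitor_prices) * (1 - undercut)
    return round(max(floor, target), 2)


# Competitors charge 15.00 and 14.00 for an item that costs us 10.00.
print(suggest_price(10.0, [15.0, 14.0]))
```

The margin floor is the important design choice here: it keeps an automated rule from following competitors into loss-making prices.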
Better Logistics and Inventory Management
Data science helps to apply advanced mathematical techniques, data analysis platforms, and ML algorithms that improve inventory management and predictive logistics analytics. Merchants can foresee the behavior of machines, people, and even weather. When companies use data analysis in their logistics operations, they can:
- Identify correlations across the supply chain, reduce downtime, and reach optimal distribution of goods;
- Predict the quantity of specific goods customers may want to purchase and build up the required stock (for instance, for times of crisis); and
- Achieve better routing, customer satisfaction, and inventory planning.
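Demand prediction of the kind listed above can start from very simple time-series techniques. The following sketch uses simple exponential smoothing, a standard baseline method, to forecast the next period’s demand from past sales. The function name and the `alpha` default are illustrative assumptions; real inventory systems usually also account for seasonality, promotions, and external factors such as weather.

```python
def exponential_smoothing_forecast(sales: list[float], alpha: float = 0.5) -> float:
    """Forecast next-period demand with simple exponential smoothing.

    alpha close to 1 reacts quickly to recent sales; alpha close to 0 smooths more.
    """
    if not sales:
        raise ValueError("at least one sales observation is required")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = sales[0]
    for observed in sales[1:]:
        # Blend the newest observation with the running estimate.
        level = alpha * observed + (1 - alpha) * level
    return level


weekly_units_sold = [100.0, 120.0, 110.0]
print(exponential_smoothing_forecast(weekly_units_sold))
```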
Other applications of data science for retail:
- Robust fraud detection. Crimes committed by retail employees lead to the loss of colossal amounts of money each year, but AI and data analysis systems can indicate “fishy” activities and make internal processes more transparent.
- Forecasting trends via social media. A huge online environment where people express themselves is precious for sellers. Natural language processing (NLP) tools can extract a vast amount of unstructured data and identify tendencies with high accuracy. Take a look at Nordstrom, one of America’s biggest fashion retailers. It uses social data (particularly Pinterest pins) to find popular items and then promote such products in the shop.
Challenges Retailers Have to Face to Implement Data Science
Despite the attractive prospects, applying data science in retail comes with problems. Among the most important challenges are:
The Issue of Storing User Data
Sellers use multi-platform tools, run social media campaigns, execute online financial transactions, and get a wealth of info about their customers. Most of it is placed in the cloud for processing, which poses leakage threats and risks of confidentiality loss. Government agencies impose sanctions on retailers who do not comply with the legal requirements for protecting this data.
A multi-stage data protection system helps to solve the problem. It can include biometric authentication, multi-factor protection of payment gateways, data storage in a secure cloud, etc. Therefore, you should consider implementing such solutions when developing products based on data processing and AI.
A separate issue is the rules for collecting corporate data. Retailers are responsible for safeguarding business info. Therefore, IT experts should find a way to control the modification, management, deletion, and storage of records (paper and electronic forms) and company archives in general when implementing big data solutions.
Dangerous Predictive Analytics Errors
The power of data science is expanding rapidly. Analysis of “sensitive” data puts buyers’ privacy at risk and can lead to dangerous incidents in targeted advertising and in sending offers based on confidential information.
When algorithms study the customer’s purchase history, they can draw a conclusion that is offensive and unacceptable. For example, in a well-known Target store incident, the analytics software offered maternity products to a schoolgirl, just as she started buying unscented cosmetics. It naturally angered her parents, and the company received a lot of bad PR.
Overly precise offers also scare customers, as people perceive them as an invasion of privacy (“Why does this store know so much about me?”). Therefore, when using predictive ML models to stimulate demand, one should not forget about moderation, empathy, and, of course, the quality of data processing.
A Growing Number of Data Sources
At first, retailers served customers in offline stores and by phone. Then websites, mobile apps, and email newsletters were added. Today, the list is supplemented by targeted ads in messengers, intelligent chatbots, orders in social networks, and more — and who knows what will happen tomorrow?
The wide variety of ways to communicate with customers means that AI algorithms and systems have to aggregate data from disparate sources, each of which can contain voluminous information. However, not all data analysis tools are capable of processing it properly and formulating the right strategies. Retailers must leverage advanced relational search engines, artificial intelligence, and powerful cloud solutions to handle so much unstructured data.
Discriminatory Lending Decisions
Today, any lender must comply with the Equal Credit Opportunity Act (ECOA), which states that discrimination on racial, religious, ethnic, or other grounds is unacceptable in making decisions about credit, its term, rate, and other parameters.
However, today, lending decisions are often made automatically based on predictive models. Credit institutions need to use them with caution, as big data algorithms can make decisions that are logical in form but discriminatory in nature. To ensure that applicants’ rights are protected, lenders need to be careful and evaluate predictive models in terms of equality of opportunity.
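One simple way to start evaluating a lending model along these lines is to compare approval rates between applicant groups. The sketch below computes per-group approval rates and the ratio of the lowest to the highest rate (often called a disparate impact ratio). It is only one of several possible fairness checks and is not a legal test; the group labels and function names are hypothetical.

```python
from collections import defaultdict


def approval_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of approved applications per group, from (group, approved) pairs."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest approval rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved in any group
    return min(rates.values()) / highest


# Hypothetical model outputs for two applicant groups.
model_decisions = (
    [("group_a", True)] * 4 + [("group_a", False)]
    + [("group_b", True)] * 2 + [("group_b", False)] * 3
)
rates = approval_rates(model_decisions)
print(rates, disparate_impact_ratio(rates))
```

A ratio far below 1.0 does not prove discrimination by itself, but it is a signal that the model and its input features deserve a closer review.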
Data Science Use Cases in Retail
Using aggregated customer data only seems innovative: retailers started using it in sales a century ago. The first such cases date back to 1923, when Arthur C. Nielsen, Sr. founded a company that researched buyers’ behavior. Today, the Nielsen Corporation is a globally known consumer data analytics firm.
Data Analytics in the 20th Century
The breakthrough inventions of the 1980s helped to broaden data science’s scope. Among the highlights of that time was the Walmart experience of improving supply chains and processes. Together with Procter & Gamble and other partners, the chain of stores created a software system where the number of goods, their dispatch from warehouses, and invoicing were controlled automatically. It allowed Walmart to cut costs, optimize inventory, and simplify control of goods flow.
Osco, Unica, and some other companies also experimented with data mining, and then came an absolute flood of data. Everyone realized that predictive sales analytics offered an incredible edge over the competition, and a brave new world began with Amazon.
Founded in 1994, the company has become a global giant, actively using collected data to manage inventory, generate individual recommendations, and improve customer service. Amazon analyzed data, predicted future demand patterns, and created offers related to items similar to previously purchased ones. To optimize pricing, the company used algorithms that recommended goods based on customers’ buying habits.
The New Age of Data Science
In the 2000s, Staples, Discover, Orbitz, and other online retailers joined Amazon in leveraging big data. Retailers realized the potential of the data collected from online shoppers and social media.
A striking example of the 2010s data science use cases in retail is the British company Argos. As part of boosting digital marketing, it began to monitor comments customers posted on social media. The Bandwidth platform offered data for analyzing consumer sentiment. Using it, Argos took action to meet buyers’ expectations, and its income grew significantly.
Among the latest data science retail use cases, we should note:
- Netflix. The company uses global data on user preferences and habits. Based on their analysis, Netflix makes recommendations for content and offers videos of interest to a specific person.
- Valve’s Steam. This video game distribution platform has implemented a successful pricing policy based on big data analysis. Game prices are constantly adjusted depending on the gamer’s enthusiasm, how often they visit the platform, and many other factors. This strategy is based on ML models.
- Starbucks. The company uses data analytics to decide where to open its next coffee shop. Starbucks analyzes the area’s demographic parameters, traffic, audience behavior, and ability to pay to determine whether a new shop will be profitable, and how profitable it will be.
How to Apply Data Science in Retail Business
The development pipeline itself can be represented as:
- Research and discovery phase,
- Data mining and modeling,
- Development and testing, and
- Delivery and maintenance.
Research and Discovery Phase
First off, we have to define the business and technical requirements, the scope of technologies needed to create our product, and the team lineup. The data analysis team includes data scientists, ML engineers, business analysts, and dataset labeling experts. To develop digital products, you’ll need a project manager, solution architect, user interface (UI) designer, and frontend and backend developers. Depending on the product, you may also have to involve software testing engineers, DevOps engineers, etc.
Then, we can move on to data processing. To work with data, scientists and software engineers use machine learning tools. ML is a class of artificial intelligence methods whose characteristic feature is that they do not solve a problem directly but learn by working through many similar problems.
Data Mining and Modeling
We need to collect the relevant data we have on our clients, taking into account legal restrictions on the sensitive information that can be stored. In order not to break the law, you can resort to data anonymization. It can be done, for instance, by replacing personal details with a common user ID that is shared across every system you use.
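A common user ID of this kind can be derived with a keyed hash, so that the same customer maps to the same ID in every system without exposing the underlying email or phone number. The sketch below is one possible approach, not a compliance recipe: strictly speaking, it is pseudonymization rather than full anonymization, because anyone holding the key can recompute the mapping. The function name and the example key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous ID from a personal identifier using HMAC-SHA256."""
    # Normalize so that "Ann@Example.com " and "ann@example.com" map to the same ID.
    normalized = user_identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# The key below is a placeholder; in practice it would come from a secrets store.
key = b"replace-with-a-secret-from-your-key-store"
crm_id = pseudonymize("Ann@Example.com ", key)
web_id = pseudonymize("ann@example.com", key)
print(crm_id == web_id)  # the same customer gets the same ID in both systems
```

Using a keyed hash instead of a plain one makes it harder for an outsider to guess IDs by hashing lists of known email addresses.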
To create data science projects in retail, you’ll need to analyze the data belonging to a specific business area to generate essential information and add business value. There are two basic methods in data analysis:
- Data visualization: visual presentation of data in the form of graphs and charts. It illustrates conclusions based on data, allowing you to compare analysis results, see patterns and trends, etc.
- Hidden data extraction: the search for trends and patterns in information. It is data that is not clearly visible but is of interest to the business.
Even advanced ML algorithms won’t work without appropriately collected and prepared data. Most often, engineers use the Cross-Industry Standard Process for Data Mining (CRISP-DM). It is divided into business understanding, data understanding and preparation, modeling, evaluation, and deployment. In most cases, engineers apply the steps cyclically and repeat them several times (except for deployment).
Next, you have to create, train, and test the ML model. Training means feeding data to the model so that it adjusts its statistical weights and can then perform the needed actions automatically. You can use various methods for training. We at CHI Software apply deep learning or reinforcement learning techniques to understand the problem better and get accurate results.
Once the model is trained, we have to test it. The test dataset must be kept separate from the training data so that we can detect overfitting, a situation in which the model picks up artifacts in the training data that do not reflect real patterns. Then we can evaluate and deploy our model to use it in the application.
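The testing step can be illustrated with a holdout split: part of the data is set aside, the model never sees it during training, and accuracy is compared on both parts. The sketch below uses only the standard library and a deliberately overfit “model” that memorizes its training rows; the data is synthetic noise invented for this example, and the helper names are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of (features, label) rows that the predict function gets right."""
    return sum(predict(features) == label for features, label in rows) / len(rows)


# Synthetic data: labels are pure noise, so there is no real pattern to learn.
rng = random.Random(0)
rows = [((i,), rng.random() < 0.5) for i in range(100)]
train_rows, test_rows = train_test_split(rows)

# A deliberately overfit "model": it memorizes training rows and guesses False otherwise.
memory = dict(train_rows)


def memorizer(features):
    return memory.get(features, False)


train_accuracy = accuracy(memorizer, train_rows)
test_accuracy = accuracy(memorizer, test_rows)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

A model that scores perfectly on training data but noticeably worse on held-out data has learned artifacts, not patterns, which is exactly the gap this comparison is meant to reveal.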
Development and Testing
To create deep learning and ML-based software, you have to develop a minimum viable product (MVP) with a set of basic features and then further develop the product with improvements — the full-fledged application or another digital system.
To simplify and speed up the process, developers often use ready-made platforms, third-party application programming interfaces (APIs), frameworks, libraries, and other third-party tools. They help to implement features more easily and to add payment services, chatbots, and other functions to the product. Many developers split an app into services that run on multiple servers and communicate with each other through APIs. The small, independent services that support the main application are called microservices.
In general, data science-driven app development is no different from developing other products (except for CRISP-DM). You have to:
- Think over the architecture of the solution,
- Design the user interface,
- Create the frontend and backend, and
- Test the finished product with the help of QA engineers.
When the system is thoroughly tested, you can deliver it or place the application in app stores. In the future, it will likely be necessary to regularly update the system (not always, but usually) so that it supports the latest OS versions, and improve the models by adjusting and supplementing the previously collected data.
We at CHI Software have fulfilled many big data and ML-based projects and have accumulated extensive practical experience in solving problems in this area. Here are some tips to improve your solutions in data science for retail stores:
Adjustment of Recommendation Engines
To manage and adjust recommendations according to the customers’ choices, you can use three main techniques:
- Collaborative filtering that makes predictions of what consumers might like based on many other users’ preferences;
- Content-based filtering that focuses on the products, not the buyers, and recommends items with similar attributes or characteristics; or
- A hybrid recommendation system that combines both techniques and their results.
The advantage of managing recommendations via ML is gaining efficiency without conducting hundreds of A/B tests to make decisions. The algorithm determines which products are shown to each user in a personalized way. This result can be achieved by training the model in advance, classifying the data beforehand, and setting parameters that relate the items to each other. ML algorithms refine the selection after implementation and optimize it repeatedly. An extensive data history greatly improves the accuracy of recommendations.
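To show how the first technique works in practice, here is a minimal sketch of item-based collaborative filtering with cosine similarity. It scores items a user has not rated by how similar their rating patterns are to items the user already rated. The data layout (`item -> {user -> rating}`), the function names, and the sample ratings are assumptions for illustration; production engines work with far larger, sparser data and more elaborate models.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two items' user-rating vectors."""
    shared_users = set(a) & set(b)
    dot = sum(a[user] * b[user] for user in shared_users)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user: str, ratings: dict[str, dict[str, float]], top_n: int = 3) -> list[str]:
    """Rank unrated items by similarity to items the user rated, weighted by rating."""
    rated_items = [item for item, by_user in ratings.items() if user in by_user]
    scores: dict[str, float] = {}
    for candidate, by_user in ratings.items():
        if user in by_user:
            continue  # skip items the user already rated
        score = sum(
            cosine(by_user, ratings[item]) * ratings[item][user] for item in rated_items
        )
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical ratings: item -> {user -> rating}.
ratings = {
    "tent": {"ann": 5, "bob": 4},
    "sleeping_bag": {"ann": 5, "bob": 5, "cat": 4},
    "blender": {"dan": 5},
    "stove": {"bob": 4, "cat": 5},
}
print(recommend("ann", ratings))
```

Here the stove is suggested to Ann because the people who rated it also rated her camping items, while the blender shares no raters with anything she bought and is left out.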
To detect fraud in retail and financial transactions, developers use data science and ML techniques such as deep neural networks (DNNs). By processing a vast amount of data from online transactions, the software can predict fraudulent transactions and prevent them.
You can also detect click fraud using ML-based advanced attribution models. They continuously record how the user behaves online, integrating data collected from the site and external platforms (e.g., different advertising campaign channels). Models allow you to create a realistic scenario, monitor each channel’s profitability, reallocate budgets, or evaluate an affiliate network’s effectiveness.
Processing Data from Social Networks
Social network data is mainly unstructured, and filtering and preparing it is a complicated task. You can turn to natural language processing (NLP) to extract information and to ML to understand it, gaining an edge over the competition. However, you need to strike a balance between using data to win customers’ loyalty and respecting their privacy.
Among the possibilities of NLP are:
- Text classification and clustering;
- Extracting information (e.g., proper names) and named entity recognition;
- Sentiment analysis (e.g., negative or positive comments on social media or user reactions);
- Information retrieval (search and ranking by keywords among public documents);
- Spam detection, topic modeling, etc.
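Sentiment analysis from the list above can be sketched with a tiny lexicon-based classifier. This is a simplified stand-in, not how production NLP systems usually work (they typically rely on trained models): it counts positive and negative words from two small word lists that are invented for this example.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of scored terms.
POSITIVE_WORDS = {"love", "great", "excellent", "fast", "happy"}
NEGATIVE_WORDS = {"hate", "broken", "slow", "terrible", "refund"}


def sentiment(comment: str) -> str:
    """Label a comment as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(word in POSITIVE_WORDS for word in words) - sum(
        word in NEGATIVE_WORDS for word in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


comments = [
    "Love the new jacket, great fit!",
    "Delivery was slow and the zipper is broken.",
    "It arrived on Tuesday.",
]
for comment in comments:
    print(sentiment(comment), "|", comment)
```

A word-counting approach misses negation and sarcasm (“not great” still counts as positive), which is one reason retailers move to ML-based models once they have enough labeled comments.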
How We Implement Data Science Solutions
As an example of how technologies and tools can be combined, here are two of our use cases of data science applications in retail.
Solution for consumer analytics:
- Client’s problem: needing tools to process the big data of TV viewing and people’s preferences, and collect user feedback;
- Our task: to create an easy-to-use environment for collecting, processing, and delivering big data for the client’s purposes (task estimation and prioritization, code development, unit and integration testing, and user feedback collection);
- Technologies used: Scala, Java, Apache Spark, Apache Hadoop, Amazon Web Services;
- Result: the solution developed by CHI Software allows our client to receive the prepared data for further use, broaden segmentation, and offer a higher level of precision to reach deeper and boost consumer engagement.
Solution for price analysis and financial calculation:
- Client’s problem: the absence of software for finding the best market prices, which was needed to analyze a growing dataset more efficiently;
- Our task: creating a solution for price analysis and financial calculation that includes data pipelines building from raw data ingesting and processing to complex analytics;
- Technologies used: Scala, Apache Spark, Spark GraphX, Kafka, and the AWS stack. Most prominent: Amazon Redshift (BI), Amazon EMR, AWS CloudFormation, Akka Actors, and Akka Streams;
- Result: we successfully built data pipelines and a solution for price analysis and financial calculation that allows the client to increase calculation accuracy and revenue significantly.
More and more retail companies are looking to apply data science to their operations and marketing campaigns. They test and implement different data mining models, processing everything they know about customers, from their previous purchases to their searches. And it works: data science truly gives retailers the tools to improve the customer shopping experience, better manage risk, increase productivity, and, as a result, generate more revenue.
Therefore, to keep up with competitors, you cannot stand aside and ignore a trend that will remain for a long time. Try data science, and you will understand how much it has to offer retailers right now.