Setting the Scene
In the retail sector, assortment planning is paramount. This process entails striking a balance across a wide array of products, ranging from budget-friendly to high-end, with a focus on either proprietary brands or specialized items tailored to a particular customer demographic. Our case fell into the former category: streamlining an extensive product range to improve operational efficiency and the consumer experience.
Auchan is celebrated for its vast selection of products, and stepping into an Auchan hypermarket ensures that customers will find exactly what they need, thanks to a wide variety of options available for each requirement. However, what once was a significant advantage has gradually become less beneficial. An overabundance of products can dilute sales, escalate costs, and potentially muddle the shopping experience.
To manage this extensive assortment, we categorized the offerings into segments from A1 to A10 for every need category, referred to as “box size.” Each outlet could then align with a box size for each category of need, which, depending on the size, would offer access to either a broader or more limited selection of products.
These categorizations were established by industry experts several years ago and have yet to be revisited. That’s where our intervention comes into play.
Challenge Introduction
Our initiative aimed to optimize the assortment size across various “needs,” such as beverages, household goods, juices, etc., to amplify sales and foster customer loyalty. Achieving this demanded a comprehensive insight into product demand, consumer behavior, and profitability to guarantee a balanced and attractive mix of needs.
In discussions about assortment, the emphasis is not on individual products but on the optimal number of "needs" to be fulfilled. The challenge lies in redefining the optimal number of products per need within the assortment for a specific stratification level: identifying how many products a need category (e.g., fresh juices) requires to maximize sales, enhance the average basket size, and improve customer retention.
However, the challenge extends further to include the optimal number of products and the ideal needs distribution, aiming for a harmonious assortment while mindful of potential cannibalization and cross-selling opportunities.
The goal is to shrink our overall selection while optimally sizing each need category within this reduced product portfolio.
The Constraints
There were specific constraints in the assortment process, including maintaining a lower product count than the existing range while ensuring no increase in the variety of fresh products due to logistical challenges. This required a strategic approach that could adapt to shifting market dynamics.
Regarding figures, the strategy aimed to balance fresh and non-perishable items and cut the overall selection by 5% to 10%.
Moreover, we employed upper and lower limits associated with each need category as defined by a prior market analysis to delineate the optimization’s scope.
These limits were crucial to the optimization process, providing a foundation for evaluating various combinations in each iteration.
The Dataset
Utilizing comprehensive data on store choices, sales trends, and customer demographics, we crafted a framework to foresee the ideal range size of product offerings. This model incorporates diverse elements such as customer segmentation, distribution of product types (for example, premium, economical, national brands, etc.), and pricing to forecast sales and consumer engagement results. I structured the data in the following manner:
Each entry corresponds to six months, outlining a need, a store, a stratification tier, and key performance indicators (KPIs) like:
- Total customer count by segment (premium, regular, occasional).
- Comprehensive product count and type distribution (e.g., store brand, national brands, high-end, budget-friendly).
- Average, minimum, and maximum product pricing per need.
- Overall sales and revenue, including breakdowns by customer segment, sociodemographics, and store-location attributes.
We also monitored changes in these KPIs from the preceding semester.
Insights on Our ML Target
With the data represented as percentage changes from the prior semester, we framed questions for our algorithm such as: how would the average basket size per consumer change if the total product count for a need were X, given its progression and its product-type distribution?
We varied this central question by altering the target metrics, including:
- The evolution of the average basket size per consumer.
- Variations in average expenditure per customer.
- Increases in the total count of unique customers.
- Expansion of the premium customer segment.
These inquiries bolstered our aim for multi-objective optimization.
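As a concrete sketch, these percentage-change targets can be derived per (store, need) pair with a grouped `pct_change`. The column names and figures below are hypothetical stand-ins for the real dataset:

```python
import pandas as pd

# Hypothetical columns: one row per (store, need, semester) with KPI snapshots.
df = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "need": ["juices", "juices", "juices", "juices"],
    "semester": ["2023-S1", "2023-S2", "2023-S1", "2023-S2"],
    "avg_basket": [12.0, 13.2, 8.0, 7.6],
    "n_products": [40, 36, 25, 27],
})

df = df.sort_values(["store", "need", "semester"])
grp = df.groupby(["store", "need"])

# Target: percentage change of the average basket vs. the previous semester.
df["target_basket_evol"] = grp["avg_basket"].pct_change()
# A feature expressed the same way: evolution of the product count.
df["n_products_evol"] = grp["n_products"].pct_change()
```

Expressing both the targets and the key features as semester-over-semester changes keeps every need on the same scale, whatever its absolute volume.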
Multi-Layered Optimization Approach
Our solution framework consisted of three fundamental layers, each contributing to the refinement and efficacy of our optimization strategy:
- Machine Learning Layer: This foundational layer applied predictive analytics to gauge the effects of assortment modifications on key performance indicators, using features such as product quantity and product-type distribution.
Using time series cross-validation and LightGBM, we customized our machine learning models to anticipate the dynamic market reaction to changes in the assortment. These models facilitated our comprehension of the influence of assortment size on consumer purchasing patterns, empowering data-informed decisions to fine-tune our retail approach.
Our approach resembles a forecasting model with distinct target differentiation. This adjustment mitigates trend and seasonality effects and enables a consistent estimation scale across all needs, which exhibit highly variable trends. Considering that some stores might shift from an A3 to an A7 positioning or even completely change their product focus, the variance from one semester to the next could be significantly high, introducing instability.
To address this issue, we consistently adjust our target variable to equivalent days compared to the previous semester, ensuring a meaningful comparison of changes in requirements regardless of their duration.
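A minimal sketch of that adjustment, assuming each semester row carries the number of days the need was actually on offer (the helper name and figures are illustrative):

```python
def equivalent_days_evolution(kpi_now, days_now, kpi_prev, days_prev):
    """Compare per-day KPI rates so semesters with different effective
    durations remain comparable."""
    rate_now = kpi_now / days_now
    rate_prev = kpi_prev / days_prev
    return rate_now / rate_prev - 1.0

# E.g. revenue of 10,500 over 150 on-offer days vs. 12,000 over 180 days:
# per-day rates are 70 vs. ~66.7, i.e. a +5% evolution despite the lower total.
print(equivalent_days_evolution(10_500, 150, 12_000, 180))
```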
Another critical aspect is our approach to KPIs: although the models predict a continuous numerical value, the evaluation is closer to categorical. We recognize that the predicted evolution may not be exact, but it should at least match the actual direction of change. Our principal KPIs therefore include:
- Sign accuracy: A measure of the accuracy in predicting the direction of change, whether negative, positive, or neutral.
- Bucket accuracy: the share of forecasts that fall into the same evolution bucket as the actual change. The idea is that predicting a 5% change when the actual change is 4% is acceptable, as long as the direction and rough magnitude are right.
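A hedged sketch of how these two metrics might be computed; the flat-change tolerance and the bucket edges below are illustrative assumptions, not the production values:

```python
import numpy as np

def sign_accuracy(y_true, y_pred, tol=0.01):
    """Share of predictions whose direction (down / flat / up) matches,
    treating changes within +/-tol as flat."""
    def direction(x):
        x = np.asarray(x)
        return np.where(np.abs(x) <= tol, 0, np.sign(x))
    return float(np.mean(direction(y_true) == direction(y_pred)))

def bucket_accuracy(y_true, y_pred, edges=(-0.10, -0.05, 0.0, 0.05, 0.10)):
    """Share of predictions falling in the same evolution bucket
    (e.g. "between +5% and +10%") as the actual change."""
    return float(np.mean(np.digitize(y_true, edges) == np.digitize(y_pred, edges)))

y_true = np.array([0.04, -0.06, 0.12, 0.00])
y_pred = np.array([0.045, -0.04, 0.08, 0.02])
print(sign_accuracy(y_true, y_pred), bucket_accuracy(y_true, y_pred))
```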
Weighted Mean Absolute Error: A Unique Continuous Metric
Weighted Mean Absolute Error is the only continuous metric we employ to assess evolutions; it weights each absolute error so that the metric reflects gross value rather than raw percentage changes.
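One plausible reading of this metric, sketched below: weight each absolute error on the predicted evolution by the row's gross business value (here assumed to be the previous semester's revenue), so that errors on large needs and stores count for more. The weights and figures are illustrative:

```python
import numpy as np

def weighted_mae(y_true_evol, y_pred_evol, weights):
    """Mean absolute error on predicted evolutions, weighted by each
    row's gross value (assumed here: previous-semester revenue)."""
    err = np.abs(np.asarray(y_true_evol) - np.asarray(y_pred_evol))
    return float(np.average(err, weights=weights))

# Hypothetical example: two small needs and one large one. The large
# need dominates the score even though its error is zero.
print(weighted_mae([0.10, -0.05, 0.02], [0.08, -0.01, 0.02],
                   weights=[1_000, 2_000, 50_000]))
```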
Security Layer: Enhancing Feasibility through Kernel Density Estimation
By applying Kernel Density Estimation to stratified sociodemographic data, this layer guarantees the practicability of our recommendations. It refines proposals to ensure a realistic and executable product assortment, thus strengthening the resilience of our strategy. We calculate the kernel density histogram for the needs of all stores, aggregated and analyzed through a sociodemographic perspective. This histogram is then normalized and utilized as a foundational element for oversight.
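A minimal sketch of this layer using `scipy.stats.gaussian_kde`; the observed sizes, the candidate grid, and the damping rule are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observed assortment sizes for one need across stores in the same
# sociodemographic cluster (hypothetical values).
observed_sizes = np.array([28, 30, 31, 33, 35, 36, 38, 40, 42, 45], dtype=float)

kde = gaussian_kde(observed_sizes)

# Evaluate the density on the candidate grid and normalize to [0, 1]:
# candidate sizes near commonly observed ones keep their predicted score,
# while outlandish ones are damped.
grid = np.arange(20, 61)
density = kde(grid)
plausibility = density / density.max()

def damp(predicted_uplift, candidate_size):
    """Scale a model-predicted uplift by how plausible the size is."""
    return predicted_uplift * plausibility[candidate_size - 20]
```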
Maximization Layer: Synthesizing Insights for Optimal Solutions
In this phase, we amalgamate insights from preceding layers, leveraging the NSGA-III algorithm for exhaustive multi-objective optimization. This method enables us to navigate a broad spectrum of potential solutions, pinpointing configurations that harmonize customer contentment with operational efficiency. Orchestrated by Optuna through a callback mechanism, each “trial” formulates a new optimal size for each need. These configurations are influenced by predefined low and high thresholds per need, established based on market analysis by stakeholders. These thresholds reflect competitive considerations, our offer’s penetration for that need, and market share.
Upon generating a new combination, we employ a callback to adapt most of the algorithm's parameters based on the assortment of products and their distribution. This information then forms a new set of predictors for forecasting this combination. Each model, aligned with its specified target, calculates the percentage change for that target. This result is then adjusted by the density vector from the Security Layer (the kernel density histogram) to penalize improbable combinations.
Before concluding the process, we verify adherence to specified constraints—the balance between fresh and dry goods and the scope of the offer. Should these criteria be met, the trial is documented; otherwise, it is deemed unsuccessful. Subsequently, we proceed through 3,000 to 15,000 iterations, incorporating an early stopping and density pruning mechanism.
About NSGA-III
The NSGA-III (Non-dominated Sorting Genetic Algorithm III) is an evolutionary algorithm for addressing intricate multi-objective optimization challenges. It employs a reference point-based method to maintain diversity, partitioning the objective space into multiple segments to ensure representation from each sector in the final solution set. This strategy provides a diverse collection of Pareto-optimal solutions.
Results and Impact
Although tangible results are forthcoming, our theoretical framework anticipates a potential 3–4% uplift in turnover alongside a 4–5% expansion in the customer base. These forecasts highlight the effectiveness of integrating machine learning with operational research in retail assortment planning. We’ve affirmed the consistency of our propositions with business stakeholders and remain committed to refining these outcomes and adapting to market fluctuations.
Future Directions: Pioneering Beyond the Existing Framework
Looking forward, we aspire to enhance our methodology by:
- Delving into Causal Forecasting Models: To accurately capture the dynamics between product cannibalization and cross-selling effects.
- Benchmarking Optimization Tools: Comparing the efficacy and adaptability of Optuna against alternative solvers like Pyomo.
- Strategic Product Selection: Determining the optimal products to choose in line with the recommended needs size.