1. 08:30 — Submodular Order Functions and Their Applications

We define a new class of set functions that, in addition to being monotone and subadditive, admit a very limited form of submodularity defined over a permutation of the ground set. We refer to this permutation as a submodular order and give fast, best-possible approximation algorithms for offline and online maximization of submodular order functions. We show the utility of this new notion in a variety of applications, including static and dynamic assortment optimization, online allocation of reusable resources, and large-scale optimization in the streaming model.
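As a point of reference for the ordered-marginals idea, here is a hedged toy illustration (not from the paper): an ordinary monotone submodular coverage function has diminishing marginal values along the nested prefixes of any permutation, so every permutation is a submodular order for it. The sets and the chosen permutation below are made up for illustration.

```python
# Toy illustration (hypothetical example, not the paper's construction):
# a monotone submodular coverage function satisfies the ordered
# diminishing-returns property along the prefixes of ANY permutation,
# making it a simple special case of a submodular order function.

def coverage(sets, S):
    """f(S) = number of elements covered by the sets indexed by S."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

# Ground set of 4 "products", each covering some elements.
sets = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}, 3: {"a", "d"}}
order = [2, 0, 3, 1]  # a candidate submodular order (a permutation)

# Check diminishing marginals along nested prefixes of the order:
# f(A + e) - f(A) >= f(B + e) - f(B) for prefixes A of B and e after B.
for i in range(len(order)):
    for j in range(i, len(order)):
        A, B = order[:i], order[:j]
        for e in order[j:]:
            mA = coverage(sets, A + [e]) - coverage(sets, A)
            mB = coverage(sets, B + [e]) - coverage(sets, B)
            assert mA >= mB
print("ordered diminishing returns holds")
```

The interest of the definition lies in the converse direction: functions that satisfy this property only for one particular permutation, without being submodular in general.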

2. 09:00 — Optimizing and Learning Assortment Decisions in the Presence of Platform Disengagement

We consider a problem in which customers repeatedly interact with a platform. During each interaction, the customer is shown an assortment of items and selects among them according to a Multinomial Logit choice model. The probability that a customer interacts with the platform in the next period depends on the customer's past purchase history. The goal of the platform is to maximize the total revenue obtained from each customer over a finite time horizon.
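For concreteness, a minimal sketch of the Multinomial Logit (MNL) choice probabilities assumed throughout this session: a customer offered assortment S picks item i with probability proportional to its attraction value, with the outside "no purchase" option normalized to attraction 1. The utilities below are illustrative, not from the talk.

```python
# Minimal sketch of MNL choice probabilities (illustrative utilities).
import math

def mnl_choice_probs(utilities, assortment):
    """P(pick i | assortment S) = v_i / (1 + sum_{j in S} v_j), where
    v_i = exp(utility_i) and the no-purchase option has attraction 1."""
    v = {i: math.exp(utilities[i]) for i in assortment}
    denom = 1.0 + sum(v.values())
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom  # None = leave without making a selection
    return probs

utilities = {1: 0.5, 2: 0.0, 3: -0.3}  # hypothetical item utilities
probs = mnl_choice_probs(utilities, [1, 2])
print(probs)  # probabilities over {item 1, item 2, no purchase}
```

Offering a larger assortment raises the total purchase probability but splits it across more items, which is the tension the rest of the abstract exploits.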

First, we study a non-learning version of the problem in which customer preferences and return probabilities are completely known. We formulate the problem as a dynamic program and prove structural properties of the optimal policy. Next, we provide a formulation in a contextual episodic reinforcement learning setting, where the parameters governing contextual customer preferences and return probabilities are unknown and must be learned over multiple episodes. We develop an algorithm based on the principle of optimism under uncertainty for this problem and provide a regret bound.
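The known-parameter dynamic program can be sketched as follows. This is a hedged toy version under assumptions of my own: a tiny stylized action space, a purchase-dependent return probability, and made-up numbers; the talk's actual model and state space may differ.

```python
# Hedged toy sketch (all numbers and the action space are hypothetical):
# a finite-horizon dynamic program for the known-parameter version.
# The probability of returning next period depends on whether the
# customer purchased, which the recursion folds into the continuation.
from functools import lru_cache

T = 5                                            # potential interactions
ASSORTMENTS = [("small",), ("small", "large")]   # stylized action space
# (expected revenue given a purchase, purchase probability) per assortment
PARAMS = {("small",): (1.0, 0.6), ("small", "large"): (1.5, 0.4)}
RET = {True: 0.9, False: 0.5}   # P(return | purchased this period)

@lru_cache(maxsize=None)
def V(t):
    """Maximum expected revenue from interaction t onward."""
    if t == T:
        return 0.0
    best = 0.0
    for S in ASSORTMENTS:
        rev, p_buy = PARAMS[S]
        val = p_buy * (rev + RET[True] * V(t + 1)) \
            + (1 - p_buy) * RET[False] * V(t + 1)
        best = max(best, val)
    return best

print(round(V(0), 3))
```

The recursion makes the structural tension visible: an assortment with lower immediate revenue can win when its higher purchase probability keeps the customer engaged for future periods.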

Previous approaches that address user disengagement often constrain exploration. In our model, however, where disengagement is non-permanent, the optimal solution simply offers larger assortments at the beginning of the horizon, and exploration remains unconstrained during the learning process. We numerically illustrate model insights and demonstrate regimes where our algorithm outperforms naive myopic learning algorithms.

3. 09:30 — Learning to Price Under Competition for Multinomial Logit Demand

We consider a sequential price competition problem in which multiple sellers of the same product compete on prices under an unknown multinomial logit (MNL) demand model. At the beginning of each period, each seller (simultaneously) posts a price and then observes their own realized demand and the posted prices of the other sellers; they do not observe the demand of other sellers. The goal is to find a pricing policy, as a function of historical observations, that maximizes the expected revenue over the selling horizon. This problem was introduced in van de Geer et al. (2019) as part of the Dynamic Pricing Challenge at the 2017 INFORMS Revenue Management & Pricing Conference; to our knowledge, prior to this work no algorithm with provable guarantees was known. Gallego et al. (2006) study the corresponding single-period price competition problem under a known MNL model and show that it admits a unique pure-strategy Nash equilibrium. We give a stochastic gradient-based online learning algorithm for the sequential setting and show that it converges to this unique Nash equilibrium. In particular, we reformulate the learning problem with multiple sellers under a stationary MNL demand as a problem of learning under a non-stationary logit demand for a single seller.
Furthermore, we show that for any seller $i$, our algorithm achieves an expected regret of at most $O(T^{2/3} \log T \log^2 N)$ compared to the expected revenue of seller $i$ under full information, where the sequence of prices of the other sellers is fixed in all periods and the price of seller $i$ is optimized in each period. Here, $N$ and $T$ are the numbers of sellers and sequentially arriving customers, respectively. This full-information benchmark is stronger than the benchmark of expected revenue for the seller at Nash equilibrium.
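The gradient-ascent flavor of the approach can be sketched for a single seller with competitor prices held fixed. This is a hedged stand-in, not the paper's algorithm: it uses a deterministic finite-difference gradient rather than a stochastic gradient built from realized demand, and the qualities, price sensitivity, and price box below are invented for illustration.

```python
# Hedged sketch (not the paper's algorithm): projected gradient ascent
# on one seller's MNL revenue with competitor prices held fixed.
# All parameters (qualities a, sensitivity b, price box) are made up.
import math

a = [1.0, 0.8, 0.6]      # qualities of sellers 0, 1, 2 (illustrative)
b = 1.0                  # price sensitivity
p_others = [1.2, 1.1]    # fixed prices of sellers 1 and 2
LO, HI = 0.0, 5.0        # feasible price interval

def revenue(p0):
    """Expected revenue of seller 0: p0 * MNL purchase probability."""
    vs = [math.exp(a[0] - b * p0)] + \
         [math.exp(a[i + 1] - b * q) for i, q in enumerate(p_others)]
    return p0 * vs[0] / (1.0 + sum(vs))

p, eta, h = 1.0, 0.1, 1e-5
for _ in range(500):
    # two-point finite-difference gradient (stand-in for a stochastic
    # gradient estimated from realized demand)
    grad = (revenue(p + h) - revenue(p - h)) / (2 * h)
    p = min(HI, max(LO, p + eta * grad))  # projected ascent step

print(round(p, 3), round(revenue(p), 4))
```

In the multi-seller game, every seller runs such an update simultaneously, so each one faces an effectively non-stationary demand curve; the abstract's reformulation handles exactly that.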

4. 10:00 — Maximum Load Assortment Optimization

Motivated by modern-day applications such as Attended Home Delivery and Preference-based Group Scheduling, where decision makers wish to steer a large number of customers toward choosing the exact same alternative, we introduce a novel class of assortment optimization problems, referred to as Maximum Load Assortment Optimization. In such settings, given a universe of substitutable products, we face a stream of customers, each choosing between either selecting a product out of an offered assortment or opting to leave without making a selection. Assuming that these decisions are governed by the Multinomial Logit choice model, we define the random load of any underlying product as the total number of customers who select it. Our objective is to offer an assortment of products to each customer so that the expected maximum load across all products is maximized. We consider both static and dynamic formulations. In the static setting, a single offer set is carried throughout the entire process of customer arrivals, whereas in the dynamic setting, the decision maker offers a personalized assortment to each customer, based on the entire information available at that time. The main contribution of this paper resides in proposing efficient algorithmic approaches for computing near-optimal static and dynamic assortment policies. In particular, we develop a polynomial-time approximation scheme (PTAS) for the static formulation. Additionally, we demonstrate that an elegant policy utilizing weight-ordered assortments yields a 1/2-approximation. Concurrently, we prove that such policies are sufficiently strong to provide a 1/4-approximation with respect to the dynamic formulation, establishing a constant-factor bound on its adaptivity gap. Finally, we design an adaptive policy whose expected maximum load is within factor $1-\epsilon$ of optimal, admitting a quasi-polynomial time implementation.
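The static objective is easy to state but awkward to evaluate in closed form; a Monte Carlo sketch makes it concrete. This is a hedged illustration with made-up MNL weights, estimating $\mathbb{E}[\max_i L_i]$ where $L_i$ is the number of customers who select product $i$ under a fixed assortment.

```python
# Hedged illustration (weights are hypothetical): Monte Carlo estimate
# of the expected maximum load of a STATIC assortment under MNL choice,
# i.e. E[max_i (# of customers who select product i)].
import random
from collections import Counter

def expected_max_load(weights, assortment, n_customers,
                      n_sims=20000, seed=0):
    rng = random.Random(seed)
    items = list(assortment)
    denom = 1.0 + sum(weights[i] for i in items)  # 1 = no-purchase weight
    probs = [weights[i] / denom for i in items]
    total = 0
    for _ in range(n_sims):
        loads = Counter()
        for _ in range(n_customers):
            u, acc, choice = rng.random(), 0.0, None  # None = leaves
            for item, p in zip(items, probs):
                acc += p
                if u < acc:
                    choice = item
                    break
            if choice is not None:
                loads[choice] += 1
        total += max(loads.values(), default=0)
    return total / n_sims

weights = {1: 1.0, 2: 0.5, 3: 0.5}
print(round(expected_max_load(weights, [1, 2], n_customers=10), 2))
```

The simulation also hints at why the problem differs from classical revenue-maximizing assortment planning: adding a product can raise total purchases yet lower the expected maximum load by splitting demand.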