TY - JOUR

T1 - Bayesian Inventory Control

T2 - Accelerated Demand Learning via Exploration Boosts

AU - Chuang, Ya Tang

AU - Kim, Michael Jong

N1 - Publisher Copyright: Copyright © 2023, INFORMS.

PY - 2023/9/1

Y1 - 2023/9/1

N2 - We investigate Bayesian inventory control problems where parameters of the demand distribution are not known a priori but need to be learned using right-censored sales data. A Bayesian framework is adopted for demand learning, and the corresponding control problem is analyzed via Bayesian dynamic programming (BDP). In the Bayesian setting, it is known that the BDP-optimal decision is the sum of the myopic-optimal decision and a nonnegative “exploration boost.” The goal of this paper is to (i) identify those applications in which adding an exploration boost is important and (ii) characterize the form of the exploration boost. In contrast to recent research suggesting that ignoring the exploration boost (i.e., adopting the myopic policy) can perform reasonably well in certain settings, we show that for applications with moderate time horizons and high parameter uncertainty, the optimality gap between the myopic policy and the BDP-optimal policy can be arbitrarily large and, in particular, grows in proportion to the posterior index of dispersion of the unknown mean demand. With regard to characterizing the form of the BDP-optimal exploration boost, we prove that the exploration boost is also proportional to the posterior index of dispersion of the unknown mean demand. This characterization expresses in clear terms the way in which statistical learning and inventory control are jointly optimized: when there is a high degree of parameter uncertainty (encoded as a large posterior index of dispersion), inventory decisions are boosted to induce a higher chance of observing more sales data so as to resolve statistical uncertainty more quickly (i.e., accelerated demand learning), and failing to do so will necessarily lead to poor performance.

UR - http://www.scopus.com/inward/record.url?scp=85173990414&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85173990414&partnerID=8YFLogxK

U2 - 10.1287/opre.2023.2467

DO - 10.1287/opre.2023.2467

M3 - Article

AN - SCOPUS:85173990414

SN - 0030-364X

VL - 71

SP - 1515

EP - 1529

JO - Operations Research

JF - Operations Research

IS - 5

ER -