TY - JOUR
T1 - Bayesian Inventory Control
T2 - Accelerated Demand Learning via Exploration Boosts
AU - Chuang, Ya Tang
AU - Kim, Michael Jong
N1 - Publisher Copyright:
Copyright © 2023, INFORMS.
PY - 2023/9/1
Y1 - 2023/9/1
N2 - We investigate Bayesian inventory control problems where parameters of the demand distribution are not known a priori but need to be learned using right-censored sales data. A Bayesian framework is adopted for demand learning, and the corresponding control problem is analyzed via Bayesian dynamic programming (BDP). In the Bayesian setting, it is known that the BDP-optimal decision is equal to the myopic-optimal decision plus a nonnegative “exploration boost.” The goal of this paper is to (i) identify those applications in which adding an exploration boost is important and (ii) characterize the form of the exploration boost. In contrast to recent research suggesting that ignoring the exploration boost (i.e., adopting the myopic policy) can perform reasonably well in certain settings, we show that for applications with moderate time horizons and high parameter uncertainty, the optimality gap between the myopic policy and the BDP-optimal policy can be arbitrarily large and, in particular, grows in proportion to the posterior index of dispersion of the unknown mean demand. With regard to characterizing the form of the BDP-optimal exploration boost, we prove that the exploration boost is also proportional to the posterior index of dispersion of the unknown mean demand. This characterization expresses in clear terms the way in which statistical learning and inventory control are jointly optimized: when there is a high degree of parameter uncertainty (encoded as a large posterior index of dispersion), inventory decisions are boosted to induce a higher chance of observing more sales data so as to more quickly resolve statistical uncertainty (i.e., accelerated demand learning), and failing to do so necessarily leads to poor performance.
AB - We investigate Bayesian inventory control problems where parameters of the demand distribution are not known a priori but need to be learned using right-censored sales data. A Bayesian framework is adopted for demand learning, and the corresponding control problem is analyzed via Bayesian dynamic programming (BDP). In the Bayesian setting, it is known that the BDP-optimal decision is equal to the myopic-optimal decision plus a nonnegative “exploration boost.” The goal of this paper is to (i) identify those applications in which adding an exploration boost is important and (ii) characterize the form of the exploration boost. In contrast to recent research suggesting that ignoring the exploration boost (i.e., adopting the myopic policy) can perform reasonably well in certain settings, we show that for applications with moderate time horizons and high parameter uncertainty, the optimality gap between the myopic policy and the BDP-optimal policy can be arbitrarily large and, in particular, grows in proportion to the posterior index of dispersion of the unknown mean demand. With regard to characterizing the form of the BDP-optimal exploration boost, we prove that the exploration boost is also proportional to the posterior index of dispersion of the unknown mean demand. This characterization expresses in clear terms the way in which statistical learning and inventory control are jointly optimized: when there is a high degree of parameter uncertainty (encoded as a large posterior index of dispersion), inventory decisions are boosted to induce a higher chance of observing more sales data so as to more quickly resolve statistical uncertainty (i.e., accelerated demand learning), and failing to do so necessarily leads to poor performance.
UR - http://www.scopus.com/inward/record.url?scp=85173990414&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85173990414&partnerID=8YFLogxK
U2 - 10.1287/opre.2023.2467
DO - 10.1287/opre.2023.2467
M3 - Article
AN - SCOPUS:85173990414
SN - 0030-364X
VL - 71
SP - 1515
EP - 1529
JO - Operations Research
JF - Operations Research
IS - 5
ER -