3.1. Data and Sample Construction
The empirical setting is the Dominick’s Finer Foods public scanner database, using the cereal movement file and the cereal UPC dictionary [6]. The raw weekly movement file (wcer) contains 6,602,582 store–UPC–week observations, and the UPC dictionary contains 490 product records prior to cleaning. After merging, each observation includes store, UPC, week, units sold (MOVE), quantity (QTY), price (PRICE), sale status, a data-quality flag, and product descriptors. The analysis focuses on ready-to-eat cereal, identified by commodity code 311.
The Stage 1–4 cleaning pipeline standardizes variable names, preserves UPC identifiers, merges the movement and UPC files, and constructs per-unit price. Because the Dominick’s price field can reflect multi-unit deals, the cleaned pipeline defines per-unit item price as
$$p_{sjt} = \frac{\mathrm{PRICE}_{sjt}}{\mathrm{QTY}_{sjt}}$$
for store $s$, UPC $j$, and week $t$, when multi-buy variation is not separately identified. This choice matters even though non-unit quantities are rare (the mean of QTY in the clean panel is 1.002), because ignoring them would mechanically overstate effective unit price in bundled promotions.
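As a minimal sketch of the per-unit price construction, assuming that PRICE is the total price charged for QTY units (so a two-for-$5.00 deal is recorded as PRICE = 5.00, QTY = 2). The function name and the fallback for invalid quantities are illustrative assumptions, not part of the original pipeline.

```python
def unit_price(price: float, qty: int) -> float:
    """Per-unit price for a record whose PRICE covers QTY units."""
    if qty < 1:
        # Assumption for this sketch: treat an invalid quantity as a single unit.
        qty = 1
    return price / qty


# Two hypothetical movement records: a single-unit price and a 2-for-$5.00 deal.
records = [
    {"PRICE": 3.19, "QTY": 1},
    {"PRICE": 5.00, "QTY": 2},
]
unit_prices = [unit_price(r["PRICE"], r["QTY"]) for r in records]
```

Using the raw PRICE field for the second record would treat the bundle price of 5.00 as if it were the price of one box, which is the overstatement the text warns about.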
The main sample is restricted to observations with positive price and nonnegative unit sales. The cleaning workflow also retains standard product observations and excludes merchandise-like placeholders that cannot be interpreted as meaningful cereal package sizes. The merged notebook reports a 100% UPC match rate and then constructs the core variables used throughout the analysis.
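Regular price is defined at the store × UPC level as the median observed per-unit price, $\bar{p}_{sj} = \operatorname{median}_t(p_{sjt})$, where $p_{sjt}$ is the per-unit price of UPC $j$ in store $s$ and week $t$. Discount depth is therefore measured as
$$\tilde{d}_{sjt} = \frac{\bar{p}_{sj} - p_{sjt}}{\bar{p}_{sj}},$$
which is then clipped at zero to obtain
$$d_{sjt} = \max\{0, \tilde{d}_{sjt}\}.$$
The dependent variable in the static analysis is constructed from weekly unit sales (MOVE).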
The treatment is coded through mutually exclusive discount-depth bins: 0–5% (omitted baseline), 5–10%, 10–20%, and 20%+.
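The regular-price, depth, and bin definitions above can be sketched as follows. This is an illustration under stated assumptions: depth is taken to be the proportional shortfall of the unit price below the store × UPC median, and bins are assumed to be closed on the left (a depth of exactly 0.05 falls in 5–10%). The source does not state the edge convention, and all names here are hypothetical.

```python
from statistics import median

# Upper edges of the first three bins; anything at or above 0.20 is "20%+".
BIN_EDGES = [(0.05, "0-5%"), (0.10, "5-10%"), (0.20, "10-20%")]


def discount_depth(price: float, regular: float) -> float:
    """Proportional discount relative to regular price, clipped at zero."""
    return max(0.0, (regular - price) / regular)


def depth_bin(depth: float) -> str:
    """Assign a clipped depth to one of the mutually exclusive bins."""
    for upper, label in BIN_EDGES:
        if depth < upper:
            return label
    return "20%+"


# Hypothetical unit prices for one store x UPC pair across six weeks.
prices = [3.00, 3.00, 2.25, 3.00, 2.80, 3.20]
regular = median(prices)  # store x UPC regular price
depths = [discount_depth(p, regular) for p in prices]
labels = [depth_bin(d) for d in depths]
```

In this example the week priced above the median (3.20) has a negative raw depth, which clipping maps to zero and therefore to the omitted 0–5% baseline.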
After basic cleaning, the constructed panel contains 4,751,202 observations. Trimming removes the 1% tails of and and caps at 0.80, leaving a final trimmed panel of 4,639,362 observations. The identifying variation is substantial. Among 71,245 UPC-weeks, 97.1% include at least two stores, 69.4% exhibit positive within-UPC-week depth variation, and 54.1% display cross-store depth-bin variation within a UPC-week. The preferred static sample uses the top 100 UPCs ranked by cumulative sales, producing 2,685,320 observations across 93 stores, 100 UPCs, and 366 weeks. The corrected balanced event-study sample contains 293,335 observations.
Table 1. Sample construction and estimation samples.
| Stage / sample | Observations | Notes |
| --- | --- | --- |
| Raw weekly movement file (wcer) | 6,602,582 | Store × UPC × week cereal movement records |
| UPC dictionary (upccer) | 490 | Product records before cleaning |
| Constructed clean panel | 4,751,202 | After merge, positive price, nonnegative sales, and unit-price construction |
| Trimmed analysis panel | 4,639,362 | 1% tails of and removed; capped at 0.80 |
| Main static sample (top 100 UPCs) | 2,685,320 | Preferred baseline fixed-effects estimating sample |
| Event-study working sample (top 50 UPCs) | 1,431,668 | Pre-event panel before episode construction |
| Balanced corrected event sample | 293,335 | Top 50 UPCs; full 17-week windows around promotion start |
Table 2. Summary statistics for key variables in the clean panel.
| Variable | N | Mean | SD | Min | Median | 95th pct. | 99th pct. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| price | 4,751,202 | 3.114 | 0.764 | 0.05 | 3.14 | 4.37 | 4.85 |
| qty | 4,751,202 | 1.002 | 0.058 | 1.00 | 1.00 | 1.00 | 1.00 |
| Per-unit price | 4,751,202 | 3.113 | 0.765 | 0.05 | 3.14 | 4.36 | 4.84 |
| Regular price | 4,751,202 | 3.118 | 0.708 | 0.25 | 3.15 | 4.25 | 4.75 |
| Discount depth | 4,751,202 | 0.037 | 0.073 | 0.00 | 0.00 | 0.169 | 0.389 |
| MOVE | 4,751,202 | 19.565 | 58.725 | 1.00 | 13.00 | 46.00 | 135.00 |
Table 3. Promotion-depth bins in the trimmed sample.
| Depth bin | Count | Share (%) |
| --- | --- | --- |
| 0–5% | 3,474,890 | 74.90 |
| 5–10% | 537,027 | 11.57 |
| 10–20% | 469,218 | 10.11 |
| 20%+ | 158,227 | 3.41 |
Figure 1. Distribution of promotion-depth bins. The figure shows that most observations are regular-price or shallow-discount weeks, while very deep discounts are comparatively rare.
To describe promotion dynamics beyond depth intensity, promotion episodes are defined as consecutive weeks in which the promotion threshold indicator equals one for a given store–UPC pair. Table 4 reports the duration distribution, and Table 5 summarizes episode length and depth.
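The episode definition amounts to a run-length computation on a weekly promotion flag within each store–UPC pair. The sketch below is illustrative only: the function name is hypothetical, the flag is taken as given (the threshold that defines it is not assumed here), and a missing week is assumed to end an episode.

```python
def promotion_episodes(weeks, on_promo):
    """Return (start_week, end_week, length) for each run of consecutive promoted weeks."""
    episodes = []
    start = None  # first week of the currently open episode, if any
    last = None   # most recent promoted week in the open episode
    for week, flag in sorted(zip(weeks, on_promo)):
        if flag and start is not None and week == last + 1:
            last = week  # the open episode continues
            continue
        if start is not None:
            # The run was broken by a non-promoted or missing week: close it.
            episodes.append((start, last, last - start + 1))
            start = None
        if flag:
            start = last = week  # a new episode opens
    if start is not None:
        episodes.append((start, last, last - start + 1))
    return episodes


# Hypothetical store-UPC history; week 7 is missing from the data.
weeks = [1, 2, 3, 4, 5, 6, 8, 9]
flags = [False, True, True, False, False, True, True, True]
episodes = promotion_episodes(weeks, flags)
```

Here the promoted weeks 6 and 8 are not merged because week 7 is unobserved; whether to bridge such gaps is a design choice that affects the duration distribution.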
3.2. Econometric Design
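The baseline static design is a high-dimensional fixed-effects model estimated on the top-50 and top-100 UPC subsamples. Identification comes from within-week, within-product, and within-store differences in discount depth, with the fixed effects flexibly absorbing confounding demand and supply conditions. The static specification is
$$y_{sjt} = \sum_{b \in \{5\text{–}10\%,\; 10\text{–}20\%,\; 20\%+\}} \beta_b \, D^{b}_{sjt} + \alpha_{sj} + \gamma_{st} + \delta_{jt} + \varepsilon_{sjt},$$
where $y_{sjt}$ is the dependent variable for store $s$, UPC $j$, and week $t$; $D^{b}_{sjt}$ equals one when discount depth falls in bin $b$, with 0–5% omitted; $\alpha_{sj}$ denotes store × UPC fixed effects, $\gamma_{st}$ denotes store × week fixed effects, and $\delta_{jt}$ denotes UPC × week fixed effects. Estimation uses AbsorbingLS, and standard errors are clustered two ways, by store and by UPC.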
Under this specification, the coefficients measure the incremental association between discount depth and weekly unit sales relative to the 0–5% reference category. Intuitively, the model compares the same UPC across stores in the same week and the same store across UPCs in the same week, net of persistent store-specific product heterogeneity.
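The dynamic design builds promotion episodes from a threshold-based promotion indicator. A clean promotion start is defined as a promotional week preceded by a non-promotional week and by an eight-week pre-window with no earlier promotion for the same store–UPC pair. Non-overlapping starts are retained, event time is indexed from $\tau = -8$ to $\tau = +8$, and the corrected event-study sample is restricted to balanced 17-week windows. The promotion-start event-study specification is
$$y_{et} = \sum_{\tau = -8,\; \tau \neq \tau_0}^{8} \theta_{\tau} \, \mathbf{1}\{t - t_e = \tau\} + \mu_e + \lambda_t + \varepsilon_{et},$$
where $e$ indexes promotion-start events, $t_e$ is the start week of event $e$, $\tau_0$ is the omitted reference period, $\mu_e$ are event fixed effects, and $\lambda_t$ are calendar-week fixed effects.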
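The clean-start and balanced-window rules can be sketched for a single store–UPC pair as below. Several details are assumptions made for illustration rather than features confirmed by the text: the window is taken to be symmetric (8 weeks before, the start week, 8 weeks after), overlapping windows are resolved by keeping the earlier start, and a window counts as balanced only if every one of its 17 weeks is observed.

```python
PRE, POST = 8, 8  # assumed symmetric window: 8 + 1 + 8 = 17 weeks


def clean_starts(promo_by_week):
    """promo_by_week maps week -> bool (promotion flag) for one store-UPC pair."""
    starts = []
    last_kept = None
    for week in sorted(promo_by_week):
        if not promo_by_week[week]:
            continue
        window = range(week - PRE, week + POST + 1)
        if any(w not in promo_by_week for w in window):
            continue  # unbalanced: some week of the 17-week window is unobserved
        if any(promo_by_week[w] for w in range(week - PRE, week)):
            continue  # a promotion occurs inside the eight-week pre-window
        if last_kept is not None and week - last_kept <= PRE + POST:
            continue  # window would overlap the previously kept event
        starts.append(week)
        last_kept = week
    return starts


def event_window(start):
    """(calendar week, event time) pairs for one retained start."""
    return [(start + tau, tau) for tau in range(-PRE, POST + 1)]


# Hypothetical history over weeks 1..40 with promotions in weeks 10, 11, 20, 30, 38.
history = {w: w in {10, 11, 20, 30, 38} for w in range(1, 41)}
starts = clean_starts(history)
```

In this example week 11 fails the pre-window rule, week 20 is dropped because its window overlaps the event at week 10, and week 38 is dropped because its post-window runs past the observed data.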
The paper evaluates three empirical hypotheses. H1 predicts monotone positive sales lift across depth bins. H2 predicts diminishing marginal returns, which in the present coarse-bin design would imply that the incremental gain from 10–20% to 20%+ is smaller than the gain from 5–10% to 10–20%. H3 predicts a post-promotion dip consistent with inventory drawdown. The static model provides strong evidence on H1 and mixed evidence on H2, while the promotion-end event study provides direct evidence on H3.
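Stated in terms of the static bin coefficients, H1 and H2 correspond to the following restrictions. The notation is assumed here for illustration: $\beta_{5\text{–}10}$, $\beta_{10\text{–}20}$, and $\beta_{20+}$ denote the coefficients on the three non-baseline depth bins, each measured relative to the omitted 0–5% category.

$$\text{H1:}\quad 0 < \beta_{5\text{–}10} < \beta_{10\text{–}20} < \beta_{20+}$$

$$\text{H2:}\quad \beta_{20+} - \beta_{10\text{–}20} \;<\; \beta_{10\text{–}20} - \beta_{5\text{–}10}$$

Because the bins differ in width and the top bin is open-ended, the H2 comparison is between increments across unequal depth ranges, which is one reason a coarse-bin test of diminishing returns can be inconclusive.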
Stage 9 and Stage 10 robustness checks target three concerns: regular-price mismeasurement, treatment-coding choices, and sample contamination. The first replaces the store × UPC median regular price with the 75th percentile. The second redefines bins more coarsely. The third excludes nearly chain-wide promotions, restricts attention to stable UPCs, and trims likely stockout-risk observations.