Submitted: 14 June 2025
Posted: 17 June 2025
Abstract
Keywords:
1. Introduction
1.1. The Evolving Landscape of E-Commerce and the Imperative for Optimization
- Subjectivity and the HiPPO Syndrome: Most decisions are made according to the Highest Paid Person's Opinion or the dominant internal voices, rather than objective data on user preferences and behavior (Goodhart, 1984; Feast & Cielen, 2021).
- Assumption-Driven Design: Designs are based on assumptions about what users want or need, which may not align with actual observed user behavior. These assumptions often stem from incomplete customer data or flawed mental models of the customer.
- No Causal Attribution: In a typical relaunch, many large changes happen simultaneously. It is very hard, if not impossible, to say which design decisions improved or degraded performance after the relaunch, which hampers learning and subsequent improvement efforts.
- High Risk Profile: A flawed all-at-once relaunch can cripple the business by alienating existing customers, introducing critical usability flaws, or damaging search engine rankings. There is no opportunity to learn and iterate before full market exposure; it is a high-stakes gamble.
- Resource Misallocation: Concentrating major design, development, and marketing effort on elements that ultimately do not improve performance, and may even harm it, wastes resources and incurs substantial opportunity cost.
1.3. A/B Testing as a Scientific Approach to Relaunch Optimization
1.4. Research Objectives and Contribution
- Establish a sound conceptual and economic framework for the use of A/B testing in e-commerce website relaunches, framing experimentation as an investment in information and risk reduction.
- Examine advanced methods, Bayesian experimental design in particular, for improving the statistical power, decision efficiency, and business relevance of relaunch A/B tests.
- Present tangible evidence, through an extensive examination of case studies (both real and archetypal, well-founded examples), of the measurable impact of A/B testing on primary metrics such as conversion rate and AOV, on user engagement as a secondary metric, and, importantly, on risk mitigation during website relaunches.
- Provide actionable strategies for e-commerce businesses and conversion rate optimization (CRO) agencies to integrate A/B testing effectively into their relaunch processes, moving from a project-based mindset to one of continuous optimization.
- Contribute to the academic literature by synthesizing insights from economics (particularly information economics and experimental economics), statistics, and marketing science to formalize the study of A/B testing-driven e-commerce relaunches as a strategic imperative.
- This study builds on existing work but offers a more in-depth, economically grounded view of the complete relaunch lifecycle, with A/B testing as the focal strategy. It goes beyond single-test analyses by advocating a complete optimization paradigm intended to maximize expected return and minimize downside risk for a critical e-commerce relaunch.
2. Methodological Framework for A/B Testing-Driven Relaunches
2.1. Theoretical Economic Framework: Experimentation as an Investment Under Uncertainty
- Prior Beliefs (Priors): The initial assessment, based on existing data or expert judgment, of how likely a design change is to improve performance.
- Cost of Experimentation (C_exp): Direct costs and the short-term negative impact of inferior variations.
- Value of Information (VoI): The reduction in uncertainty about decision quality produced by the test's output. VoI is greatest when initial uncertainty is large and the correct choice has large downstream consequences (Howard, 1966).
- Exploration vs. Exploitation Trade-off: The basic trade-off of allocating resources (traffic) between testing new, uncertain options and deploying the current best-known option. This trade-off is fundamental to the multi-armed bandit problems that serve as theoretical lenses for dynamic allocation strategies in A/B testing (Robbins, 1952; Scott, 2010; Thompson, 1933); a minimal simulation sketch follows.
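To make this trade-off concrete, the sketch below simulates a two-variant Beta-Bernoulli bandit with Thompson sampling. It is a minimal illustration, not a production allocator; the traffic volume and "true" conversion rates are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" conversion rates -- unknown to the experimenter in practice.
true_cvr = {"A": 0.018, "B": 0.0215}

# Beta(1, 1) priors, stored as [alpha, beta] per variant.
posterior = {variant: [1, 1] for variant in true_cvr}
allocation = {variant: 0 for variant in true_cvr}

for _ in range(50_000):  # illustrative traffic volume
    # Thompson sampling: draw one value from each posterior and
    # route the visitor to the variant with the highest draw.
    draws = {v: rng.beta(a, b) for v, (a, b) in posterior.items()}
    chosen = max(draws, key=draws.get)
    allocation[chosen] += 1

    # Simulate the visitor's conversion and update the posterior counts.
    converted = rng.random() < true_cvr[chosen]
    posterior[chosen][0] += int(converted)
    posterior[chosen][1] += int(not converted)

print("Traffic allocation:", allocation)
print("Posterior mean CVRs:",
      {v: round(a / (a + b), 4) for v, (a, b) in posterior.items()})
```

As evidence accumulates, the sampler shifts traffic toward the stronger variant, automatically balancing exploration of the uncertain option against exploitation of the current best.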
2.2. Bayesian Experimental Design for Relaunch A/B Testing
- Intuitive Interpretation of Results: Bayesian methods yield direct probability statements about hypotheses, such as "there is a 98% probability that Variation B has a higher conversion rate than Variation A," as well as about the magnitude of the difference between variations. This is more useful for business decision-making than p-values and confidence intervals, which are often misinterpreted (Goodman, 2008).
- Incorporation of Prior Knowledge: Priors (probability distributions representing beliefs about parameters before observing new data) can be formally incorporated. For a relaunch, this might include data from previous smaller tests, industry benchmarks, qualitative user research insights, or even expert opinion (though the latter should be used cautiously). This is particularly useful when testing radical redesigns where initial data might be sparse or when baseline rates are well-established.
- Cumulative Learning and Adaptive Stopping: Rather than requiring fixed, pre-determined sample sizes, Bayesian monitoring allows results to be reviewed as data flow in and tests to be stopped as soon as sufficient evidence has accumulated (for example, when the "probability to beat original" crosses a predefined threshold, or when the "expected loss" of choosing one variant over another falls below an acceptable level). This speeds up the decision process and saves valuable traffic when one variant performs much better.
- Handling Small Sample Sizes (with appropriate caution): While still requiring adequate data for robust inference, Bayesian methods can offer more stable and informative inferences with smaller samples compared to frequentist methods, especially when informative priors are justified and used responsibly. This can be relevant when initially testing a completely new site on a low traffic segment.
- Value-Based Decision Making: Savage (1954) developed a framework in which utility/loss functions can be explicitly integrated into any decision (Bayesian or not). The expected value of the decision replaces an abstract judgment about the optimality of a choice. Statistics can thereby directly serve the business by supporting decisions that reflect the true aims of the study (e.g., maximizing expected profit rather than simply finding a "significant" difference).
A typical Bayesian A/B testing workflow involves:
- Defining prior probability distributions for the KPI of interest (e.g., conversion rate, θ) for every alternative (Control θA, Variation θB). These may be non-informative (e.g., uniform) or informative.
- Selecting a likelihood function that describes the data-generating process (e.g., binomial for conversions, Gaussian for continuous metrics like AOV).
- Observing data from the A/B test (e.g., the number of visitors and conversions for each version) and computing the posterior distributions for θA and θB via Bayes' theorem: P(θ|Data) ∝ P(Data|θ) × P(θ).
- Making decisions based on these posterior distributions, for example by computing P(θB > θA | Data), the expected uplift E[θB − θA | Data], or the expected loss incurred if the chosen variant is in fact the inferior one. A minimal sketch of this workflow appears after this list.
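Below is a minimal sketch of this four-step workflow for a conversion-rate test, using conjugate Beta(1, 1) priors with a binomial likelihood and Monte Carlo draws from the resulting posteriors. The visitor and conversion counts are illustrative assumptions, not data from any real test.

```python
import numpy as np

rng = np.random.default_rng(7)

# Step 3 inputs: illustrative observed data for each version.
visitors_a, conversions_a = 40_000, 720   # Control   (CVR ~ 1.80%)
visitors_b, conversions_b = 40_000, 860   # Variation (CVR ~ 2.15%)

# Steps 1-3: Beta(1, 1) prior + binomial likelihood => Beta posterior.
post_a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, 200_000)
post_b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, 200_000)

# Step 4: decision quantities computed from the posterior draws.
p_b_beats_a   = (post_b > post_a).mean()               # P(thetaB > thetaA | Data)
expected_lift = (post_b - post_a).mean()               # E[thetaB - thetaA | Data]
expected_loss = np.maximum(post_a - post_b, 0).mean()  # cost of shipping B if A is better

print(f"P(B beats A):      {p_b_beats_a:.3f}")
print(f"Expected uplift:   {expected_lift:.4%}")
print(f"Expected loss (B): {expected_loss:.5%}")
```

An adaptive stopping rule of the kind described above would, for instance, end the test once the expected loss of the leading variant drops below a pre-agreed "threshold of caring."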
2.3. Data Sources and Case Study Selection Criteria
- Publicly Available Case Studies: Reputable A/B testing platforms (e.g., VWO, Optimizely, Convert.com), CRO agencies (e.g., Conversion Rate Experts, Speero), and e-commerce businesses often publish detailed case studies. These are critically evaluated for methodological rigor, clarity of reporting, statistical validity, and verifiability of claims. When primary sources are not directly accessible, secondary reports are acknowledged.
- Academic Research and Meta-Analyses: Existing peer-reviewed studies on A/B testing, website optimization, experimental economics, and consumer behavior in digital environments are integrated.
- Clear articulation of the business problem, hypothesis, and specific goals of the test within the relaunch context.
- Detailed description of the control and variation(s), highlighting the key differences.
- Reported sample sizes (where available), test duration, and definition of primary and secondary KPIs.
- Reported statistical significance (e.g., p-values) or Bayesian equivalents (e.g., probability to beat original, credible intervals, expected loss).
- Quantifiable impact on relevant business metrics (e.g., CVR, AOV, revenue per visitor).
- Relevance to common e-commerce relaunch challenges. Archetypal examples are used to illustrate common principles where specific public data is limited and are clearly identified as such.
2.4. Determining Test Duration, Sample Size, and Statistical Power
- Type I Error (False Positive): Concluding a new design is better when it is not. The probability of this error is denoted by α (the significance level), with α typically set at 0.05 or lower.
- Type II Error (False Negative): Failing to detect a truly better design. The probability of this error is denoted by β. Statistical power (1-β) is the probability of correctly detecting a true effect, typically aimed at 0.80 or higher.
- Key parameters for sample size calculation in a frequentist framework include the following (a worked sketch follows the list):
- Baseline Conversion Rate (BCR): The performance of the current site (Control).
- Minimum Detectable Effect (MDE): The smallest improvement (absolute or relative) deemed business-relevant and worth detecting. For a full relaunch, the MDE might be set higher (e.g., 5-10% relative lift) than for minor element tests, reflecting the investment and strategic importance.
- Significance Level (α).
- Statistical Power (1-β).
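The sketch below shows how these four parameters combine in the standard two-proportion sample size approximation; the 1.8% baseline rate and 10% relative MDE are illustrative assumptions.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_arm(bcr: float, mde_rel: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors per arm for a two-sided two-proportion z-test."""
    p1 = bcr
    p2 = bcr * (1 + mde_rel)           # MDE expressed as a relative lift
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = norm.ppf(power)           # critical value for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Illustrative: 1.8% BCR, 10% relative MDE, alpha = 0.05, power = 0.80.
print(sample_size_per_arm(0.018, 0.10))  # ~89,800 visitors per arm
```

Because the MDE enters the denominator squared, halving the detectable effect roughly quadruples the required traffic, which is why relaunch tests often set a higher MDE than minor element tests.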
2.5. Phased Relaunch vs. Full Relaunch A/B Testing Strategies
- Iterative Phased Relaunch: The new design vision is broken down into key components or sections (e.g., homepage redesign, new navigation structure, revised product page layout, streamlined checkout process). Each component is redesigned and A/B tested against its counterpart on the old site. Winning changes are incrementally rolled out, gradually evolving the site towards the new vision.
  - Pros: Lower risk per individual test; easier to attribute impact of specific changes to observed KPI movements; learning accumulates iteratively; allows for adjustments to the overall vision based on early test results.
  - Cons: Can be a slower overall relaunch process; potential for a "Frankenstein" design if elements tested in isolation do not cohere well aesthetically or functionally; may miss holistic synergistic effects (or negative interactions) of a completely new, unified experience.
- Full Experience Relaunch A/B Test: The complete new website design (Variation B) is developed and then A/B tested against the entire old website design (Control A). Typically, traffic is split (e.g., 50/50, or a smaller percentage to the new site initially for risk management, like 90/10, then scaled up).
  - Pros: Directly measures the overall aggregate impact of the new experience, capturing all interaction effects between changed elements; provides a clear "go/no-go" signal for the entire redesign.
  - Cons: Higher risk if the new design performs significantly worse, as a larger portion of traffic is exposed; more complex to implement technically, since two entirely different site versions must be served consistently (a bucketing sketch follows this list); if the new design under-performs, it can be challenging to pinpoint specific causal factors without further granular testing. Requires robust infrastructure and careful planning for deployment and rollback.
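Serving two whole site versions consistently usually relies on sticky, deterministic bucketing, so that a returning user always sees the same experience and the exposure share can be ramped up safely. The following is a minimal hash-based sketch under assumed identifiers, not a description of any particular platform's implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, exposure: float) -> str:
    """Deterministically map a user to a variant; `exposure` is the share
    of traffic routed to the new site (e.g., 0.10 for a 90/10 split)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "new_site" if bucket < exposure else "old_site"

# Start the full-experience test at a cautious 10% ramp; scaling up later
# only requires raising `exposure`, and earlier assignments stay stable
# because each user's bucket value never changes.
print(assign_variant("user-42", "relaunch-2025", exposure=0.10))
```

Keying the hash on both the experiment name and the user ID keeps assignments independent across experiments while remaining stable for any given user.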
3. Results: Empirical Evidence of A/B Testing Efficacy in Relaunches
3.1. Aggregate Performance Lifts from Comprehensive Relaunch A/B Tests
- Case Study 1: "TechGadgetPro" (Archetypal B2C Electronics Retailer)
  - Background: This archetypal case represents an established online retailer facing declining conversion rates (baseline CVR: 1.8%), an outdated site design, and a poor mobile user experience. The strategic goal of the relaunch is to modernize the overall design, significantly improve mobile usability, and thereby increase overall CVR and revenue. This scenario is common in the e-commerce industry.
  - Methodology: A full A/B test is conducted, splitting traffic 50/50 between the old site (Control) and a completely redesigned new site (Variation). The test runs for 6 weeks. The new site features responsive design, streamlined navigation, enhanced product imagery, and an intuitive checkout.
  - Primary KPI: Overall e-commerce conversion rate.
  - Secondary KPIs: AOV, bounce rate, mobile CVR, add-to-cart rate.
  - Results (illustrative of potential outcomes):
    - The new site (Variation) achieves an overall CVR of 2.15%, a +19.44% relative uplift (statistically significant, p-value < 0.01).
    - Mobile CVR improves from 0.9% to 1.5% (+66.7% uplift).
    - AOV increases by +3.5%.
    - Homepage bounce rate decreases by 12%.
  - Economic Implication: For a site with $10M annual revenue, such a CVR lift could translate to an additional ~$1.94M annually (see the sketch after this case). This archetypal example highlights the potential scale of impact and risk mitigation benefits of testing a full redesign.
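The arithmetic behind this implication is simple enough to verify directly; the sketch below reuses the archetypal figures from the case (CVR uplift only, ignoring the additional AOV lift):

```python
baseline_cvr, new_cvr = 0.018, 0.0215   # archetypal figures from Case Study 1
annual_revenue = 10_000_000             # assumed annual revenue

relative_uplift = (new_cvr - baseline_cvr) / baseline_cvr
# Holding traffic and AOV constant, revenue scales linearly with CVR.
added_revenue = annual_revenue * relative_uplift

print(f"Relative CVR uplift:  {relative_uplift:.2%}")    # 19.44%
print(f"Added annual revenue: ${added_revenue:,.0f}")    # ~$1,944,444
```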
- Case Study 2: Earth Class Mail (B2B Lead Generation by Conversion Rate Experts)
  - Background: Earth Class Mail, a provider of virtual mailroom services, was acquired by an investor group aiming for significant growth. Conversion Rate Experts (CRE) was engaged to improve traffic, leads, and revenue (Convert.com, 2024).
  - Methodology: CRE employed their research-heavy methodology, including consumer surveys, usability analysis (e.g., Hotjar), Google Analytics data review, and stakeholder interviews. Based on this, new versions of key pages were designed and A/B tested using Convert Experiences software. For example, research revealed many users learned of Earth Class Mail from "The 4-Hour Workweek," so a quote from the book was added for credibility (Convert.com, 2024).
  - Results:
    - A/B testing on the landing page yielded a 61% increase in leads.
    - A/B testing on the pricing page, with a new design built in Convert Experiences, generated 57% more leads than the original.
    - The cumulative effect of these (and potentially other) validated changes resulted in over $1.5 million in increased annual revenue for Earth Class Mail (Convert.com, 2024).
  - Implication for Relaunches: This case study directly demonstrates how a research-backed, A/B testing-driven approach to optimizing key pages within a broader growth initiative can lead to substantial increases in conversions and revenue for B2B services. It highlights the power of validating specific changes with empirical data.
3.2. Iterative Improvements within a Phased Relaunch Strategy
- Navigation Optimization during Relaunch:
  - Case Study 3: Furniture E-commerce Site (Archetypal Navigation Test)
  - Background: An online furniture retailer identifies through analytics that users struggle with a complex navigation menu, leading to high bounce rates. This is a common issue addressed in phased relaunches.
  - Methodology: Based on principles often discussed by A/B testing platforms (e.g., VWO, n.d., offers various ideas for homepage and navigation testing), a conceptual test is designed:
    - Control: Existing complex mega-menu.
    - Variation: Redesigned, wider navigation displaying main subcategories more directly to reduce clicks and improve discoverability.
  - Results (conceptual/illustrative of potential outcomes): Such navigation optimizations frequently yield positive results in practice. An illustrative uplift for a successful test of this kind might fall in the range of +3-5% in overall conversion rate; a hypothetical +3.85% lift (as sometimes anecdotally discussed for such changes) would be a strong, positive outcome, though actual results are always specific to the site and implementation.
  - Economic Rationale: Improving navigation reduces user friction and cognitive load. A/B testing platforms often feature general discussions or examples where simplifying navigation or making options more discoverable leads to better engagement and conversion.
- Product Page Enhancements during Relaunch:
  - Case Study 4: Residence Supply (Product Page Overhaul)
    - Background: Residence Supply, a home improvement products provider, struggled with converting site traffic into sales and needed to address bottlenecks in their conversion funnel, particularly on product pages (OptiMonk, n.d.).
    - Methodology: They decided to overhaul their product pages. Instead of manual updates, they used "Smart Product Optimizer" to streamline the process and add new compelling descriptions automatically to thousands of product pages.
    - Results: This automated product page optimization led to a 17.4% conversion rate increase and a 3.1% increase in revenue (OptiMonk, n.d.).
  - Case Study 5: Clarks (Footwear Retailer - Highlighting Free Delivery)
    - Background: Clarks sought to increase completed orders by highlighting their free delivery policy for orders over £50, a benefit many users were unaware of (abtest.design, 2024).
    - Methodology: The team tested a variation of the product and delivery page that emphasized the free delivery offer against the original version.
    - Results: This A/B test resulted in a 2.6% increase in conversions (abtest.design, 2024).
- Value Proposition Communication & Trust Signals during Relaunch:
  - Case Study 6: Crown & Paw (Homepage Headline A/B Test)
    - Background: Crown & Paw, specializing in personalized pet portraits, experienced high traffic but low conversion rates (OptiMonk, n.d.).
    - Methodology: They began by A/B testing their homepage headline to find the most effective message for their audience.
    - Results & Analysis: This headline A/B test led to a 16% increase in orders. Further AI-optimized product pages resulted in an additional 12% increase in orders compared to the original pages and a 43% increase in revenue (OptiMonk, n.d.).
    - Relaunch Implication: This demonstrates that even seemingly simple elements like homepage headlines, when optimized through A/B testing as part of a relaunch or CRO initiative, can significantly impact conversions by better communicating the core value proposition.
3.3. Risk Mitigation and Avoidance of Negative Outcomes through Relaunch A/B Testing
- Case Study 7: "HomeGoodsOnline" (Archetypal Risk Mitigation Scenario)
  - Background: An archetypal scenario where internal stakeholders strongly favor an aesthetically driven design change (e.g., ultra-minimalist product listing pages) that inadvertently harms usability. This conceptual example illustrates a common challenge.
  - Methodology: An A/B test compares the existing, functional PLP (Control) with the new minimalist PLP (Variation), which hides key information like price until hover.
  - Results (illustrative of potential negative outcomes): The minimalist Variation shows a significant drop in add-to-cart rates (e.g., -22%) and overall CVR (e.g., -15%). Session recordings reveal user frustration.
  - Outcome: The A/B test prevents the rollout of a detrimental design, saving potentially significant revenue and preserving user experience. This conceptual example illustrates a crucial, yet often undocumented, benefit of A/B testing: avoiding costly mistakes.
- General Observation on Risk in Relaunches (Industry Insight): It is widely acknowledged in the CRO industry that a substantial percentage of proposed changes, even those believed to be improvements, perform neutrally or negatively when A/B tested (Kohavi et al., 2020). Without testing, these detrimental changes deployed during a relaunch can collectively degrade overall site performance.
3.4. Summary of Patterns from Case Studies
- Substantial Uplifts Are Possible: Targeted A/B tests, whether on specific elements like headlines and product page layouts (e.g., Crown & Paw, Residence Supply) or more comprehensive page redesigns (e.g., Earth Class Mail), can yield significant improvements in key metrics.
- Risk Mitigation is a Key Value: The archetypal "HomeGoodsOnline" scenario, coupled with industry understanding (Kohavi et al., 2020), underscores A/B testing's crucial role in safeguarding against detrimental changes that might otherwise be deployed based on intuition.
- Context Matters: The varied outcomes in different tests (e.g., free shipping impact can differ) emphasize that principles must be applied and tested within each unique business context.
- Iterative Gains Add Up: Improvements validated on specific site sections contribute to overall enhanced performance, supporting the value of phased approaches or component-wise optimization within larger relaunches.
- Data Trumps Intuition: Several cases highlight how data-driven results can contradict internal opinions or prevailing design trends, reinforcing the necessity of empirical evidence.
4. Discussion
4.1. Theoretical Implications for E-Commerce Optimization and Economic Models of Experimentation
- Value of Information in Reducing Uncertainty: E-commerce relaunches are inherently decisions made under substantial uncertainty regarding user response. A/B testing directly addresses this by providing empirical data, thereby reducing uncertainty and increasing the probability of making value-maximizing choices. This aligns with Stigler's (1961) work on the economics of information.
- Validation of Experimental Economics Principles: The success of A/B testing in optimizing complex systems like e-commerce websites aligns with the core principles of experimental economics, which advocate for the use of controlled trials to determine causal impacts of interventions in economic environments (Smith, 1994; List, 2011).
- Exploration-Exploitation in Practice: The decision to test a new relaunch design versus sticking with the old (or iterating further) is a direct application of the exploration-exploitation trade-off (March, 1991). A/B testing provides the data to inform when to shift from exploring new designs to exploiting a proven superior one.
- Fat-Tailed Distribution of Innovation Returns: The observation that some A/B tests yield modest or no gains, while others can yield exceptionally large returns (e.g., Earth Class Mail), is consistent with the idea that the impact of business innovations often follows a fat-tailed distribution (Kohavi et al., 2020). This is critically important for relaunch strategy because it justifies broad experimentation: while many individual tested ideas within a relaunch may yield small or no measurable gains, the potential for a few "big wins" to deliver massive ROI can offset the costs of numerous smaller experiments and drive significant overall improvement. This encourages a portfolio approach to testing ideas during a relaunch, rather than relying on a single, untested "silver bullet" redesign.
- Behavioral Economics Insights: A/B testing often reveals consumer behaviors that deviate from purely "rational" models, highlighting the impact of psychological biases, heuristics, and framing effects (Kahneman, 2011). A/B tested relaunches can systematically leverage these behavioral insights.
4.2. Practical Implications for E-Commerce Relaunch Strategy and Execution
- Adopt an Experimentation Culture: The most significant implication is the need for a cultural shift towards embracing experimentation as a core competency, moving away from opinion-based decisions to a "test and learn" mindset.
- Strategic Integration of A/B Testing in Relaunch Planning: A/B testing should be an integral part of the relaunch strategy from the outset, including budget and time allocation.
- Prioritize User-Centric Hypotheses: Successful A/B tests are driven by strong, user-centric hypotheses grounded in data (analytics, user feedback, usability studies).
- Combine Qualitative and Quantitative Insights: A synergistic approach, where qualitative insights inform test hypotheses and quantitative results validate them, is most powerful.
- Invest in Robust A/B Testing Infrastructure and Expertise: Effective A/B testing requires appropriate tools, skilled analysts, and capable developers.
- Define Business Significance, Not Just Statistical Significance: The Minimum Detectable Effect (MDE) should be tied to tangible business goals and ROI, beyond mere statistical significance (e.g., achieving a result where the p-value < α, with α typically set at 0.05).
- Plan for Post-Test Learning, Iteration, and Monitoring: Losing variants provide valuable insights. Winning variants require long-term monitoring to ensure sustained lifts and watch for unintended consequences.
- Systematically Segment Results: Analyzing results across key user segments can reveal valuable insights and opportunities for personalization.
4.3. Addressing the "Full Redesign" A/B Test Challenge and Risk Stratification
- Phased Rollout of the Test: Start by exposing a small percentage of traffic (e.g., 5-10%) to the new design.
- Prior Iterative Learning: A "full redesign A/B test" is often best approached as the culmination of previous iterative learnings.
- Clear Rollback Plan: A robust and tested rollback plan is essential.
- Focus on Transformational Change: Such large-scale tests are most justified when the new design represents a fundamental shift.
4.4. Limitations and Methodological Considerations
- Publication Bias: Case studies, particularly those from commercial sources, tend to highlight successes. The true average impact of A/B tested relaunches across all businesses might be more modest if failures, inconclusive results, or very small wins were reported with the same frequency.
- Generalizability of Specific Uplifts: Specific percentage uplifts are highly context-dependent (industry, traffic, baseline performance, nature of changes, audience). Principles are transferable, but outcomes are not guaranteed to replicate.
- Long-Term Effects, Novelty, and Change Aversion:
  - Short-term A/B test results can be influenced by the novelty effect, where users initially react positively to any change simply because it is new, or by change aversion, where users initially react negatively to unfamiliar interfaces even if they are objectively better. These behavioral phenomena can temporarily skew results, making a losing variant appear to win, or vice versa.
  - To mitigate these biases and ascertain the true, sustainable impact of a relaunch, long-term tracking of KPIs for several weeks or even months after implementing a winning variant is crucial. Cohort analysis is an indispensable tool here: by comparing the behavior and performance metrics of the user cohort exposed to the new design over an extended period against cohorts who only experienced the old design (or a control group not part of the initial test), businesses can distinguish genuine, durable improvements from temporary behavioral shifts attributable to novelty or initial resistance to change. This rigorous post-test analysis is essential for validating the long-term success of a relaunch (a minimal cohort-analysis sketch appears after this list).
- Interaction Effects in Phased Relaunches: When iteratively testing and implementing winning elements, their combined effect might not be strictly additive due to complex interaction effects. A holistic design vision remains important.
- Organizational Culture and Resource Constraints: A genuine culture of experimentation and adequate resources (time, budget, expertise) are prerequisites. These can be significant hurdles, especially for smaller businesses.
- Scope of Testing and Strategic Priorities: It's often impractical to A/B test every single change in a major relaunch. Strategic decisions, guided by risk assessment and potential impact, must determine which elements warrant rigorous testing.
- Variability Across Sectors and Business Models: The impact and feasibility of A/B testing-driven relaunches can differ significantly. B2C e-commerce with high traffic volumes may allow for rapid testing of many variations, while B2B sites with longer sales cycles and lower traffic (like Earth Class Mail, which still benefited greatly) may require different approaches, longer test durations, or a focus on micro-conversions and lead quality. Subscription models versus one-time purchase models also present different optimization challenges and KPI focuses. Market maturity and competitive intensity within a sector can also influence the urgency and types of changes tested, potentially leading to different risk/reward calculations for experimentation.
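To illustrate how the cohort comparison described above might be operationalized, the sketch below computes week-by-week conversion rates per cohort from a hypothetical event export; the file name and column names (`user_id`, `cohort`, `week`, `converted`) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical export: one row per user-week, with columns
# user_id, cohort ("new_design" / "old_design"), week (weeks since
# exposure), and converted (0/1).
events = pd.read_csv("post_relaunch_events.csv")

weekly_cvr = (events
              .groupby(["cohort", "week"])["converted"]
              .mean()
              .unstack("cohort"))

# A novelty effect appears as an early positive gap that decays toward
# zero; change aversion as an early deficit that recovers over time.
weekly_cvr["gap"] = weekly_cvr["new_design"] - weekly_cvr["old_design"]
print(weekly_cvr)
```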
4.5. Future Research Directions
- Longitudinal Studies of A/B Tested Relaunches: Tracking performance over extended periods (1-3 years) to assess sustained impact and ROI versus traditional methods.
- Advanced Segmentation and Personalization: Developing methodologies for identifying heterogeneous treatment effects across fine-grained user segments during relaunch tests, potentially using machine learning.
- Quantifying the Economic Value of Risk Mitigation More Precisely: Developing formal models to estimate the "avoided loss" or "option value" of A/B testing.
- Optimizing Testing Sequences and Managing Interaction Effects: Research into optimal sequencing of tests during phased relaunches to maximize positive interactions.
- The Role of AI and Automation: Exploring how AI can assist in hypothesis generation, variation design, dynamic traffic allocation, or real-time personalization based on A/B test data.
- Integrating A/B Testing with Broader Business Strategy: Research on how insights from relaunch A/B tests inform product development, pricing, and marketing communications.
5. Conclusions
References
- abtest.design. (2024, September 11). Highlighting free delivery. Retrieved May 28, 2025, from https://abtest.design/tests/highlighting-free-delivery.
- Arrow, K. J. (1971). Essays in the Theory of Risk-Bearing. North-Holland Publishing Company.
- Convert.com. (2024, June 4). Convert Case Study: CRE and Earth Class Mail. Retrieved May 28, 2025, from https://www.convert.com/case-studies/conversion-rate-experts/.
- Conversion Rate Experts. (n.d.). Homepage. Retrieved May 28, 2025, from conversion-rate-experts.com.
- Deng, A., Xu, Y., Kohavi, R., & Walker, T. (2013). Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. In Proceedings of the sixth ACM international conference on Web search and data mining (WSDM '13) (pp. 123-132). ACM.
- Feast, G., & Cielen, D. (2021). Practical A/B Testing. O'Reilly Media.
- Goodhart, C. A. E. (1984). Problems of monetary management: The U.K. experience. In A. Courakis (Ed.), Inflation, Depression, and Economic Policy in the West (pp. 111-146). Mansell.
- Goodman, S. N. (2008). A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology, 45(3), 135-140.
- Gronau, Q. F., Ly, A., & Wagenmakers, E. J. (2020). Informed Bayesian inference for the A/B test. Journal of Statistical Software, 95(10), 1-39.
- Growth Rock. (2019, January 11). E-commerce Free Shipping Case Study: How much can it increase... Retrieved May 28, 2025, from https://growthrock.co/ecommerce-free-shipping-case-study/.
- Howard, R. A. (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2(1), 22-26.
- Johari, R., Koomen, P., Pekelis, L., & Walsh, D. (2017). Peeking at A/B tests: Why it matters, and what to do about it. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17) (pp. 1517-1525). ACM.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Kohavi, R., & Longbotham, R. (2017a). Online controlled experiments and A/B testing. In C. Sammut & G. I. Webb (Eds.), Encyclopedia of Machine Learning and Data Mining (pp. 910-918). Springer.
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
- Kotler, P., & Keller, K. L. (2016). Marketing Management (15th ed.). Pearson Education Limited.
- Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
- List, J. A. (2011). Why economists should conduct field experiments and why they haven't. Journal of Economic Perspectives, 25(3), 3-16.
- Manzi, J. (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. Basic Books.
- March, J. G. (1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71-87.
- Moe, W. W., & Fader, P. S. (2004). Capturing evolving visit behavior in clickstream data. Journal of Interactive Marketing, 18(1), 5-19.
- OptiMonk. (n.d.). Conversion Rate Optimization Case Studies: 6 Success Stories and ... Retrieved May 28, 2025, from https://www.optimonk.com/conversion-rate-optimization-case-studies/.
- Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527-535.
- Savage, L. J. (1954). The Foundations of Statistics. John Wiley & Sons.
- Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6), 639-658.
- Shankar, V., Kleijnen, M., Ramanathan, S., Rizley, R., Holland, S., & Morrissey, S. (2016). Mobile shopper marketing: Key issues, current insights, and future research avenues. Journal of Interactive Marketing, 34, 37-48.
- Smith, V. L. (1994). Economics in the laboratory. Journal of Economic Perspectives, 8(1), 113-131.
- Stigler, G. J. (1961). The economics of information. Journal of Political Economy, 69(3), 213-225.
- Stucchio, C. (2015). Bayesian A/B Testing at VWO. VWO Whitepaper from vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf.
- Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285-294.
- VWO. (n.d.). Resources/Case Studies. Retrieved May 28, 2025, from vwo.com/resources/.
- Wald, A. (1947). Sequential Analysis. John Wiley & Sons.
- Wasp Barcode Technologies. (n.d.). Homepage. Retrieved May 28, 2025, from https://www.waspbarcode.com.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
