Introduction
Repeated Measures ANOVA
Repeated measures ANOVA (RMA), also called dependent ANOVA, is a method for comparing two or more means of related groups, or (more commonly) the same group measured at two or more times, contexts, or conditions. Like independent ANOVA it may be used for two or more means, and also uses the assumption of normality of outcomes at each level if the independent variable. Sphericity is defined as the equality of variances of the differences between each pair of treatment levels, and is also an assumption for the use of parametric RMA (Field, 2013). The assumption of sphericity therefore requires at least three levels of the independent variable. Sphericity is assessed by Mauchly’s test of sphericity, and this is standard output for software computing RMA. If it is non-significant, sphericity may be assumed. If significant, corrections exist. These include the Greenhouse-Geisser and Huynh-Feldt corrections (Field, 2013). For violations of normality, several methods exist such as Friedman’s ANOVA. Whereas the assumption of independent observations exists for independent ANOVA, RMA assumes related observations (i.e., relatedness). Hence, this method is typically used in studies where participants are measured at two or more timepoints, contexts, or conditions. It may also be potentially used to study related samples (e.g., comparing the same traits of several generations of plants or bacteria). Repeated measures ANOVA also requires complete data for all timepoints. Any case with incomplete data will be excluded from analysis, and standard repeated measures ANOVA may not be used. For studies requiring repeated measurements, where the violations of related observations and complete data occur, methods such as multilevel models may be used. Finally, for analyses with more than two levels, post hoc comparisons may be used. A brief discussion of post hoc comparisons and their descriptions is listed below.
Table 1.
List of ANOVA pairwise comparisons/post hoc tests.
Table 1.
List of ANOVA pairwise comparisons/post hoc tests.
| Pairwise comparison |
Description of method |
| Bonferroni |
Single step. Strong control of Type I error/family-wise error rate (FWER). However, strong control may lead to Type II error. |
| Dunnett |
Single step. Designed to compare treatment groups to a control group. |
| Benjamini-Hochberg/False Discovery Rate (BH/FDR) |
Controls false discovery rate (FDR) rather than FWER; more powerful than Bonferroni, Holm. |
| Benjamini-Yekutieli (BY) |
Same as above. |
| Hochberg |
Step-up. Similar to Tukey, but better for unbalanced designs. More powerful than Bonferroni. |
| Holm |
More powerful, step-down modification of Bonferroni procedure, while controlling FWER. |
| Hommel |
Step up. Generally more powerful than Holm. |
| Sidak |
Single step, though a step down version also exists. Stringent Type I error control, but more powerful than Bonferroni. |
| Scheffe |
Single step. Flexible, with lower power compared to Tukey for pairwise comparisons. |
| Tukey HSD (Honestly Significant Difference) |
Single step. Widely used method across several fields; use for balanced designs. |
| Games-Howell |
Single step. Use when the assumption for equal variances is violated. |
| Dunn |
Single step. Non-parametric test; use with non-parametric ANOVA. |
| Fisher’s Least Significant Differences |
Single step. Does not control for inflation of Type I error; generally not recommended for post hoc testing. |
The R environment
R is a free and open-source environment for statistical computing, supported by a worldwide network of scientists and programmers. It contains add-on packages for statistics, data science, and related areas such as geographical information systems. These qualities make R a powerful tool for teaching and data analysis. However, R requires coding which may intimidate new users, and may compound statistical anxiety in introductory statistics courses using R. R may be used with an integrated development environment such as RStudio (Posit, 2025) to more easily manage data, packages, and objects.
Repeated Measures ANOVA in R
There are several packages that implement RMA in R. The rstatix package (Kassambara, 2025) is a comprehensive option for both parametric and nonparametric RMA. Other packages include afex (Singmann, et al., 2025) combined with datawizard (Patil, et al, 2022), and emmeans (Lenth & Piakowski, 2025) for parametric RMA. The ARTool package (Wobbrock at al., 2011) combined with emmeans are also a good option for nonparametric analysis. The WRS2 package also offers a robust option as well as robust post hoc tests. The tidyr package (Wickham et al., 2024) is used to demonstrate reformatting of data from long to wide, and vice versa. The ggstatsplot package (Patil, 2021) will be used for visualization.
For Type I sums of squares in ANOVA analysis, the order in which predictors are entered matters. This method is not recommended for repeated measures designs, as they do not evaluate main effects or interactions. The base R package uses this variant only (Field, Miles & Field, 2012). Conversely, Type II sums of squares takes all main effects into consideration, while ignoring higher order (interaction) effects. Type III sums of squares evaluates the effects (main and interaction), taking all other effects into context, and is also used for unbalanced designs. It is the default used by commercial packages that implement RMA like SPSS, and for R packages such as rstatix (Field, Miles & Field, 2012; Kassambara, 2025). The second problem with using the base R functions is that they do not produce tests of sphericity. Hence repeated measures ANOVA using the base R package will be omitted from this tutorial.
The packages presented here use the long data format. This means that the time or context variable has one dedicated column with all of its levels, and each observation’s timepoint is in a separate row. Hence, for a study that measures four timepoints, each participant will have four rows of data. An ID number is used so that the software can discern every participant across all levels. This is in contrast to the wide format, where every level of time or condition has its own dedicated column, and therefore each participant has a dedicated row across several columns. This latter format is used for packages such as SPSS (Armonk, NY), and JASP (JASP Team, 2025). Syntaxes for conversion from wide long format and vice versa will be shown.
Objects
An object is a way of summarizing, storing, and managing some entity created, or found in R. For example, an object may be used for a dataset. For example, data1<-c(2, 4, 6, 17) assigns the object “data1” to four scores. In the table below, a dataset is converted from long to wide format, and a new object is assigned to the newly formatted data. Users are encouraged to create and manage objects at their discretion.
Data
The premise for this dataset is that a group of students engaged in a four-phase learning program designed to improve their performance in an undergraduate applied statistics course. Student performance was measured at each phase to evaluate the program’s effectiveness.
Method
Once the datasets have been uploaded into RStudio, the following steps may be taken to reformat data if needed, and for analysis. For parametric analysis, rstatix as well as afex and emmeans will be used. The rstatix package, WRS2, as well as ARTool and emmeans will be used. The syntaxes for converting from long to wide and vice versa using the tidyr package are shown below.
Table 2.
Syntaxes for conversion of formats.
Table 2.
Syntaxes for conversion of formats.
| Syntax |
Purpose |
| library(tidyr) |
Loads the package |
| pivot_wider(rma, names_from = phase, values_from = score) |
Converts data from long format to wide format. |
| rma_wide<-pivot_wider(rma, names_from = phase, values_from = score) |
Assigns a new object to the newly created data. |
| View(rma_wide) |
Checks to make sure dataset was properly converted. |
| rma_long<-pivot_longer(rma_wide, cols = -ID, names_to = "phase", values_to = "score") |
Converts data from wide to long format. |
Parametric Repeated Measures ANOVA
Repeated Measures ANOVA Using the Rstatix Package
The following syntaxes are used to compute RMA. Both the rstatix and afex packages require the long format for analysis. The ggstatsplot package is used to create a custom plot to visualize the analysis.
Table 3.
Parametric Repeated Measures ANOVA syntax using rstatix.
Table 3.
Parametric Repeated Measures ANOVA syntax using rstatix.
| Package |
Syntax |
Purpose |
| |
library(rstatix) library(afex) library(emmeans) |
|
| rstatix |
rma_desc<-rma %>% group_by(phase) %>% get_summary_stats(score) |
Creates object for descriptive statistics table. |
| |
rma_desc |
Shows descriptive statistics. |
| |
rm_rstatix<-anova_test(data = rma, dv = score, wid = ID, within = phase) |
Repeated measures ANOVA in rstatix. General eta squared is the default effect size. |
| |
rm_rstatix0<-anova_test(data = rma, dv = score, wid = ID, within = phase, effect.size = "pes") |
Similar to the above. Although the general interpretation is similar, note the difference between general and partial eta squared. |
| |
rm_rstatix_post_hoc<- rma%>% pairwise_t_test(score~phase, paired = TRUE, p.adjust.method = "holm") |
Post hoc tests using holm adjustment. Other options include Bonferroni, Hochberg, Hommel, and others. |
Figure 1.
Descriptive statistics.
Figure 1.
Descriptive statistics.
The default effect size for rstatix is general eta squared as shown below. Note the difference from partial eta squared, which tends to be more upward biased (Ellis, 2017).
Figure 2.
RM ANOVA summary table.
Figure 2.
RM ANOVA summary table.
If the assumption of Sphericity were met, the ANOVA table could be used, and reported as follows: F(3, 87) = 181, p < .01, η2g = .48. However, Mauchly’s Test of Sphericity is violated. Hence, either the Greenhouse-Geisser or Huynh-Feldt adjustment should be reported. The Greenhouse-Geisser adjustment is reported as follows: F(1.89. 54.68) = 181, p < .01, η2g = .48. In other words, the F value from the second line of the output is reported, along with the Greenhouse-Geisser adjusted degrees of freedom and the p-value from the “Sphericity Corrections” section. Alternatively, the Huynh-Feldt adjustment can be reported as follows: F(2.01, 58.36) = 181, p < .01, η2g = .48. Feel free to round as appropriate, e.g. F(2, 58.4) = 181, p < .01, η2g = .48.
If partial eta squared is chosen as the effect size, then report as follows: F(1.9. 54.7) = 181, p < .01, η2p = .86. Note that the latter report uses the Greenhouse-Geisser adjustment, and the Huynh-Feldt adjustment may be reported instead.
Figure 3.
ANOVA table with partial eta squared effect size.
Figure 3.
ANOVA table with partial eta squared effect size.
Finally, the output for the Holm post hoc test is shown below. Pairwise comparisons show that all means differed from one another. The descriptive statistics table shows that the score increased significantly from Phase 1 through Phase 4.
Figure 4.
Post hoc tests using Holm adjustment for multiple comparisons.
Figure 4.
Post hoc tests using Holm adjustment for multiple comparisons.
Other options for post hoc testing are shown in
Table 2. Simply use the adjustment you wish in the “
p.adjust.method” command in all lower-case letters. For example, the Hochberg method is shown below.
Figure 5.
Post hoc tests using Hochberg adjustment for multiple comparisons.
Figure 5.
Post hoc tests using Hochberg adjustment for multiple comparisons.
Repeated Measures ANOVA Using Afex and Emmeans
The afex package may also be used to compute RMA in tandem with the emmeans package to compute post hoc analysis. Visualization using the ggstatsplot package is also shown in this section.
Table 4.
Parametric Repeated Measures ANOVA syntax using datawizard, afex and emmeans.
Table 4.
Parametric Repeated Measures ANOVA syntax using datawizard, afex and emmeans.
| Package |
Syntax |
Purpose |
| |
library(afex) library(datawizard) library(emmeans) library(ggstatsplot) |
|
| datawizard |
describe_distribution(rma, select = "score", by = "phase") |
Descriptive statistics |
| afex |
rm_afex<-aov_ez(data = rma, id = "ID", dv = "score", within = "phase", anova_table = list(es = "pes")) |
Repeated measures ANOVA in afex. |
| |
rm_afex |
Shows ANOVA table with any adjustments. Greenhouse-Geisser is the default method. |
| |
summary(rm_afex) |
Same as above, but with more detail. |
| |
rm_afex_HF<- aov_ez(data = rma, id = "ID", dv = "score", within = "phase", anova_table = list(es = "pes", correction = "HF")) |
Use this syntax to get Huynh-Feldt adjustment instead. |
| emmeans |
emm<-emmeans(rm_afex, ~ phase) |
Shows means and other parameters for each level of the ANOVA. |
| |
pairs(emm) |
Pairwise comparisons; Tukey post hoc is the default. |
| |
rm_afex_hochberg = pairs(emm, adjust = "hochberg") |
Creates an object for Hochberg post hoc tests. Bonferroni, Hommel, Holm, Scheffe, Sidak, and other options may be found here. |
| ggstatsplot |
rmplot<-ggwithinstats(data = rma, x = phase, y = score, type = "parametric", pairwise.comparisons = TRUE, p.adjust.method = "holm", title = "Scores by Phase") |
Visualization. |
Figure 6.
Descriptive statistics using the datawizard package.
Figure 6.
Descriptive statistics using the datawizard package.
Note that the “aov_ez” command provides an abridged ANOVA table, while the summary command offers more detailed output. However, the abridged output mentions that the Greenhouse-Geisser adjustment is used, and also gives the adjusted degrees of freedom for this method. Therefore, this command intuitively suggests that Mauchly’s Test of Sphericity is violated.
Figure 7.
Repeated measures ANOVA table using afex.
Figure 7.
Repeated measures ANOVA table using afex.
The more detailed output below does provide an actual value for Mauchly’s Test, and p-values for both adjustments, but does not offer the adjusted degrees of freedom for either.
Figure 8.
Detailed afex ANOVA output.
Figure 8.
Detailed afex ANOVA output.
Post hoc testing may be obtained using the syntaxes below using the emmeans package. The Tukey adjustment is the default method.
Figure 9.
Tukey post hoc comparisons using emmeans.
Figure 9.
Tukey post hoc comparisons using emmeans.
Other post hoc tests may be obtained by specifying the adjustments as shown below.
Figure 10.
Hochberg and Hommel post hoc comparisons using emmeans.
Figure 10.
Hochberg and Hommel post hoc comparisons using emmeans.
Finally, visualization using the ggstatsplot package is shown below.
Figure 11.
Visualization using ggstatsplot.
Figure 11.
Visualization using ggstatsplot.
Options for Violated Normality Assumptions
Friedman’s ANOVA Using Afex
Friedman’s ANOVA is a popular non-parametric option for heavily skewed data, or small sample sizes. The rstatix package also performs Friedman’s ANOVA. The syntaxes are shown below, with ggstatsplot used for visualization.
Table 5.
Friedman’s ANOVA using rstatix.
Table 5.
Friedman’s ANOVA using rstatix.
| Package |
Syntax |
Purpose |
| rstatix |
friedman_rstatix<-friedman_test(data = rma, score~phase|ID) |
Friedman’s ANOVA in rstatix. |
| |
fes<-friedman_effsize(data = rma, score~phase|ID) |
Generates Kendall’s W. |
| |
friedman_post_rstatix<-pairwise_wilcox_test(rma, score~phase, paired = TRUE, p.adjust.method = "holm") |
Pairwise Wilcoxon tests with Holm adjustments. |
| |
pairwise_wilcox_test(rma, score~phase, paired = TRUE, p.adjust.method = "bonferroni") |
Same as above, except with Bonferroni adjustment. |
| ggstatsplot |
friedmanplot<-ggwithinstats(data = rma, x = phase, y = score, type = "nonparametric", pairwise.comparisons = TRUE, p.adjust.method = "holm", title = "Scores by Phase") |
Visualization. This package is helpful as it also gives an APA style report and post hoc comparisons. |
The Friedman test and Kendall’s W effect size outputs are shown below. They may be reported as follows: χ2Friedman(3, n = 30) = 80.5, p < .001, W = .90
Figure 12.
Friedman’s ANOVA output using rstatix.
Figure 12.
Friedman’s ANOVA output using rstatix.
Figure 13.
Pairwise Wilcoxon post hoc comparisons with Holm adjustments.
Figure 13.
Pairwise Wilcoxon post hoc comparisons with Holm adjustments.
Robust Repeated Measures ANOVA Syntax Using WRS2
The WRS2 package is another alternative. It offers robust and bootstrapped options for RMA and post hoc tests in case of violated assumptions. The syntaxes are shown in the table below.
Table 6.
Syntax for repeated measures ANOVA using WRS2.
Table 6.
Syntax for repeated measures ANOVA using WRS2.
| Package |
Syntax |
Purpose |
| |
library(WRS2) |
|
| WRS2 |
robust_RM_WRS2<-rmanova(y = rma$score, groups = rma$phase, blocks = rma$ID) |
Robust repeated measures ANOVA |
| |
with(rma, (rmanova(y = score, groups = phase, blocks = ID))) |
Alternate syntax for above. |
| |
rmmcp(y = rma$score, groups = rma$phase, blocks = rma$ID) |
Post hoc tests for trimmed means. |
| |
rmanovab(y = rma$score, groups = rma$phase, blocks = rma$ID) |
Bootstrapped robust ANOVA. |
| |
rmanovab(y = rma$score, groups = rma$phase, blocks = rmaf$ID, nboot = 2000) |
Same as above, specifying 2000 bootstrapped samples. |
| |
pairdepb(y= rma$score, groups = rma$phase, blocks = rma$ID, nboot = 2000) |
Bootstrapped post hoc tests for the above ANOVA. |
The first option for WRS2 is repeated measures ANOVA with 20% trimmed means. The null hypothesis is rejected, and should be reported as follows: F(2.78, 47.24) = 186.2, p < .001. The interpretation is exactly the same as the Friedman’s ANOVA in rstatix. However, a disadvantage of the WRS2 package is that it does not offer an effect size for RMA.
Figure 14: Robust one-way repeated measures ANOVA using
WRS2
The WRS2 package also allows bootstrapped robust ANOVA and post hoc tests, as shown below. The overall null hypothesis is also rejected using this method. The bootstrapped robust RMA showed a significant effect of phase on score, as the test statistic (186.2) exceeded the critical value of 3.0, thus rejecting the null hypothesis. Note the interpretation of the post hoc tests is also similar to the rstatix analysis.
Figure 15.
Bootstrapped robust one-way repeated measures ANOVA using WRS2.
Figure 15.
Bootstrapped robust one-way repeated measures ANOVA using WRS2.
Discussion
R offers several options for one-way RMA analysis. In this tutorial, the rstatix package was used as it is a comprehensive package for parametric and nonparametric options. It offers descriptive statistics, parametric and nonparametric RMA, effect sizes, and post hoc tests. The afex package also offers parametric RMA and effect sizes. However, another package such as datawizard may be used to get descriptive statistics, and emmeans may be used for post hoc tests. The WRS2 package offers robust and bootstrapped RMA and post hoc tests, although it does not offer descriptive statistics or an effect size. The ARTool package may be used in tandem with the effectsize and emmeans packages to offer effect sizes and post hoc tests respectively.
Limitations
While four options are shown for RMA in this paper, it is not a comprehensive list of packages for this method. Users are encouraged to explore other packages at their discretion. This method is included in this tutorial, as it naturally progresses from the paired t-test, as independent ANOVA naturally progresses from the independent t-test. That said, RMA is not taught in all introductory courses, and instructors may choose to include or omit as they see fit. Multilevel models are an option for violations of the assumptions of relatedness and incomplete data. As this content is meant for an introductory course, these are not included.
Conclusions
R offers several options for RMA analysis and visualization. The rstatix package offers the most comprehensive option for parametric and nonparametric analyses. The afex package is a good option for parametric analysis, while WRS2 and ARTool offer options for violated assumptions. Readers are encouraged to practice using these packages and explore others at their own discretion.
References
- Ben-Shachar, M., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. [CrossRef]
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
- Ellis, P. D. (2017). The essential guide to effect sizes : statistical power, meta-analysis and the interpretation of research results. Cambridge University Press.
- Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Sage Publications.
- Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE.
- Kassambara, A. (2025). rstatix: Pipe-friendly framework for basic statistical tests. R package version 0.7.2.
- Kay, M., Elkin, L., Higgins, J., & Wobbrock, J. (2025). ARTool: Aligned rank transformation for nonparametric factorial ANOVAs. [CrossRef]
- Lenth, R., & Piaskowski, J. (2025). Emmeans: Estimated marginal means, aka least-squares means. [CrossRef]
- Mair, P., & Wilcox, R. (2019). Robust statistical methods in R using the WRS2 package. Behavior Research Methods, 52. [CrossRef]
- Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167. [CrossRef]
- Patil, I., Makowski, D., Ben-Shachar, M. S., Wiernik, B. M., Bacher, E., & Ludecke, D. (2022). datawizard: An R Package for Easy Data Preparation and Statistical Transformations. Journal of Open Source Software, 7(78), 4684. [CrossRef]
- Posit (2025). RStudio Desktop. Posit. https://posit.co/download/rstudio-desktop/.
- R Core Team (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
- Singmann H., Bolker B., Westfall J., Aust F., & Ben-Shachar, M. (2025). afex: Analysis of factorial experiments. R package version 1.5-0, https://github.com/singmann/afex.
- Wickham, H., Vaughan, D., & Girlich, M. (2024). tidyr: Tidy Messy Data [R package tidyr version 1.1.3] https://doi.org/10.32614/CRAN.package.tidyr. R package version 1.3.1. [CrossRef]
- Wobbrock, J. O., Findlater, L., Gergle, D., & Higgins, J. J. (2011). The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems - CHI ’11. [CrossRef]
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).