We have argued that IRBs resemble other regulatory bodies that can generate unintended consequences when the costs they impose are borne by others and when the benefits of their interventions are difficult to observe or verify directly. This echoes prior work that has examined IRBs through the lens of public choice and bureaucratic behaviour, most notably Zywicki (2007). Building on this perspective, we consider how these costs propagate through the research ecosystem to shape the kinds of knowledge that are ultimately produced. We argue that compliance costs directly reduce research activity, that uneven burdens distort the composition of research, and that self-reinforcing institutional dynamics extend IRBs’ reach and entrench their influence. These effects shape which findings are produced, the methodologies used to produce them, and which actors are best positioned to participate in the system. Most importantly, IRBs shape not only what research is conducted, but also what research never happens.
All this implies that the consequences of IRB expansion cannot be adequately assessed by focusing solely on whether identifiable (or imagined) harms were prevented in particular cases. The real question is not simply whether IRBs prevent harm, but how they alter the balance between the risks of conducting research and the risks of not conducting it. Answering this question requires research. A review of 52 studies on IRBs concluded that, although there is evidence that IRBs can impede research, the available data are insufficient to quantify these effects for the purposes of IRB policy reform (Silberman & Kahn, 2011). The following sections outline a set of empirical predictions that follow from this framework and consider potential strategies for mitigating these costs as they become clearer.
3.2. Evidence and Implications
One potential problem with the decisions of HRECs is that the criterion on which they ultimately rest is what seems plausible to IRB panellists, in domains where harms are counterfactual and feedback is structurally weak. But perfectly plausible beliefs about complex processes can be wrong, because the available facts are often equally compatible with many plausible beliefs until more is known. Like the presumptions of IRBs, our focus on process characteristics may seem plausible, and its usefulness must therefore hinge on the degree to which it describes reality. Our framework predicts distinct patterns arising from the costs, distortions, decisions, and selection effects it describes.
For example, if IRB oversight increases the cost of conducting research, then fields subject to greater regulatory intensity should exhibit lower knowledge output. This could be measured through indicators of research quantity and quality, such as the number of replications, time to completion, publication counts, and citation rates. These effects should be observable using variation in IRB intensity across time, institutions, and domains. Quasi-experimental designs such as difference-in-differences could compare trends in output between areas that experienced substantial increases in oversight and those that did not. These effects may be particularly evident in domains where independent replication is already resource-intensive, which would amplify the impact of additional compliance costs.
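The difference-in-differences comparison described above can be illustrated with a minimal 2x2 sketch. It assumes outcomes have already been aggregated per unit (for example, hypothetical annual publication counts per department) for one period before and one period after an increase in oversight. The function name and all numbers are invented for illustration; a real analysis would more likely use a regression with unit and period fixed effects and clustered standard errors, and its causal reading rests on the parallel-trends assumption.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means.

    Each argument is a list of outcome values (e.g. hypothetical annual
    publication counts per department) for one group in one period.
    Returns the change in the treated group minus the change in the
    comparison group. Reading this as the effect of increased oversight
    assumes parallel trends in the absence of that increase.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented numbers for illustration only; not real data.
effect = did_estimate(
    treated_pre=[10, 12, 11],   # fields that later faced heavier oversight
    treated_post=[8, 9, 7],
    control_pre=[10, 11, 12],   # fields whose oversight stayed stable
    control_post=[11, 12, 10],
)
print(effect)
```

With these invented figures the treated group's mean falls from 11 to 8 while the comparison group stays at 11, so the estimate is a relative drop of 3 units.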
A decline in the perceived quality of academic research is also suggested by the increasing decoupling of industry from academia. For instance, Springer Nature (2017) reported that pharmaceutical firms have more than halved their average scientific publication output (from 29 to 12 papers per firm annually) while reducing basic research’s share of total R&D from 26% to 22% (1980–2006), increasingly outsourcing discovery to selective academic collaborations rather than providing broad support. Surveys of UK industry–academia links reveal a 33% drop in industrial placements alongside a shift toward targeted postdoctoral work, reflecting the pharmaceutical industry’s pivot away from early-stage academic partnerships. Such trends amplify IRB-induced cost inflation by eroding external demand for academic outputs, further contracting research ecosystems. Anecdotally, a major reason for this is a decline in the perceived replicability of academic findings. Surveys of traditional industry research partners, combined with thematic analysis, could help determine whether this decline in demand for university-led research is connected to the mechanisms adumbrated above.
Within scholarly fields, we predict that growing IRB intensity will distort the composition of research. The limited data at hand again heighten our concerns. Vazirani et al. (2024), for example, found that among surgical studies, randomized controlled trials and studies undergoing full board review took substantially longer to obtain IRB approval than less complex designs. A review also found that studies with more controversial protocols attracted relatively more revisions and longer waiting times from IRBs (Silberman & Kahn, 2011). More generally, we expect designs that are costlier to get approved, such as experimental or intervention-based studies and those pioneering novel methodologies, to become relatively less common than lower-friction alternatives. This can be tested by comparing the prevalence of experimental versus correlational studies, or of harder-to-approve versus easier-to-approve samples, across fields with differing levels of IRB burden.
For instance, just as we predict that Ethics Creep will lead researchers to substitute correlational for experimental designs, we also predict (along with Rice, 2011) that they will increasingly use animal models instead of human participants. This is because HRECs are often separate from the committees that oversee animal research, and differences in their efficiency may make approval easier to obtain from the latter. Comparisons between vertebrate and invertebrate research also offer a potentially informative case, as the degree of oversight differs markedly between them. All types of substitution should be most pronounced toward oversight regimes that are administratively simpler or more predictable, rather than merely less stringent. If observed shifts reflect general scientific progress, similar trends should appear regardless of oversight burden; if they reflect regulatory cost pressures, they should be more pronounced where oversight burdens are greater.
Another testable implication concerns the statistical properties of published findings, particularly those associated with the replication crisis. If increased compliance costs reduce the likelihood of independent replication while simultaneously raising the stakes of producing publishable results, then fields subject to greater IRB burden should exhibit stronger or earlier signatures of selective reporting and result inflation. These may include increased evidence of publication bias (e.g., funnel plot asymmetry), excess significance relative to statistical power, or patterns consistent with “p-hacking”. Such patterns would be consistent with a reduction in the disciplining role of replication and an increased reliance on statistical thresholds as proxies for evidentiary strength. Importantly, this prediction is again comparative: such indicators should increase more sharply in domains where ethical oversight has expanded most, relative to those where constraints have remained comparatively stable. By linking institutional features of research governance to well-established metrics of research reliability, this approach provides a further avenue for evaluating whether IRBs influence not only the volume and composition of research, but also its credibility.
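Funnel plot asymmetry of the kind mentioned above is commonly assessed with Egger's regression test, which regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests small-study effects such as publication bias. The sketch below is a minimal pure-Python illustration under that assumption; the function name and the study-level data are hypothetical. Comparing such intercepts across fields with differing IRB burden is one way the comparative prediction might be operationalized. It returns point estimates only, whereas established meta-analysis software also reports the intercept's standard error and p-value.

```python
def egger_intercept(effects, std_errors):
    """Egger-style regression for funnel-plot asymmetry (point estimates only).

    Regresses standardized effects (effect / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, slope). A formal test
    would add the intercept's standard error and a t-test on n - 2 df.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least 3 studies with matching standard errors")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical study-level data for one field (invented for illustration).
ses = [0.05, 0.1, 0.2, 0.3, 0.4]
unbiased = [0.2 for _ in ses]               # same effect at every precision
inflated = [0.2 + 0.5 * se for se in ses]   # smaller studies report larger effects
print(egger_intercept(unbiased, ses))
print(egger_intercept(inflated, ses))
```

In the invented data the first intercept should be close to zero, while the second should be close to 0.5, reflecting the size-dependent inflation deliberately built into the inflated effects.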
The kinds of inconsistency and apparent arbitrariness reported anecdotally by Schrag (2011) are precisely what would be expected of a review system insulated from feedback and operating under locally evolving precedent. The picture of a mosaic landscape in which IRB practices become increasingly “rudderless and inefficient” (Zywicki, 2012) implies that we should observe low consistency and increasing divergence in review outcomes: identical protocols should receive materially different decisions across institutions. This prediction could be tested by presenting the same proposals to different review panels. A further implication is that this variation should be smaller among people no longer embedded in active IRB structures, providing a way to separate institutional from individual sources of judgment.
Current evidence hints that this prediction will be borne out. Hirshon et al. (2002) examined a single, low-risk emergency medicine protocol submitted to multiple IRBs. They found marked variability in risk classification, review requirements, and demanded modifications across institutions. Because the protocol itself was held constant, this dispersion cannot plausibly be attributed to differences in study design or participant population. Instead, it is consistent with a decision-making environment characterized by weak feedback, locally evolving precedent, and limited mechanisms for cross-institutional calibration. Similarly, Gonsalus et al. (2007) compared the time taken by three IRBs in the same city to review identical low-risk protocols and found that decision times ranged from 10 to 77 days. Another study compared IRBs’ decisions about whether studies should be exempt from review and found only 75 per cent agreement across IRBs, with 16.7 per cent of studies that some IRBs considered fully exempt being forced into full review by others (Tsan & Van Hook, 2022). In theory, consistency could be maintained by additional layers of oversight aimed at establishing best practice, but in practice this is unlikely given the scale and growth of IRB operations: in the United States, fewer than one per cent of IRBs under the auspices of the Department of Health and Human Services are inspected annually (U.S. Government Accountability Office [GAO], 2023). From the political-economic perspective developed here, such variability is not merely an administrative inconvenience but an almost inevitable outcome. This contrasts with what might be expected of a well-functioning regulatory system, in which best scientific practice with regard to “safety” should be universal rather than reflecting local aesthetics.
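The cross-panel test proposed above, in which the same protocols are sent to several review panels, could be summarized with standard inter-rater agreement statistics. The sketch below computes mean pairwise agreement (comparable in spirit to a raw agreement rate such as the 75 per cent reported above) and Fleiss' kappa, which additionally corrects for agreement expected by chance. It is an illustrative assumption on our part, not a procedure used in the cited studies; the function name, category labels, and decisions are hypothetical, and it assumes every protocol is rated by the same number of panels.

```python
from collections import Counter


def panel_agreement(decisions):
    """Mean pairwise agreement and Fleiss' kappa across review panels.

    `decisions` holds one list per protocol; each inner list gives the
    category (e.g. "exempt", "expedited", "full") assigned by each panel.
    Every protocol must be rated by the same number of panels (>= 2).
    Returns (mean_pairwise_agreement, fleiss_kappa).
    """
    n_raters = len(decisions[0])
    if n_raters < 2 or any(len(d) != n_raters for d in decisions):
        raise ValueError("each protocol needs the same number (>= 2) of panels")
    n_subjects = len(decisions)
    category_totals = Counter()
    per_subject = []
    for ratings in decisions:
        counts = Counter(ratings)
        category_totals.update(counts)
        # Share of ordered panel pairs that agree on this protocol.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_subject.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_bar = sum(per_subject) / n_subjects
    total = n_subjects * n_raters
    # Chance agreement from the overall share of each category.
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1:
        # All panels used a single category, so kappa is undefined.
        return p_bar, float("nan")
    return p_bar, (p_bar - p_e) / (1 - p_e)


# Hypothetical decisions by three panels on three protocols (invented).
decisions = [
    ["exempt", "exempt", "expedited"],
    ["full", "full", "full"],
    ["exempt", "expedited", "full"],
]
agreement, kappa = panel_agreement(decisions)
print(f"pairwise agreement {agreement:.2f}, Fleiss' kappa {kappa:.2f}")
```

Under the framework's prediction, kappa computed on real cross-institutional decisions would sit well below 1, and could be compared with the same statistic for reviewers no longer embedded in active IRB structures.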
As with other political-economic analyses of how regulatory bodies can diminish competition (e.g., Sowell, 1980; Stigler, 1975), the framework predicts selection and concentration effects within the research ecosystem. Higher fixed compliance costs should favour larger and more established research groups while discouraging entry by smaller or newer teams. These dynamics should be observable in patterns of authorship, funding allocation, and collaboration networks, particularly where new partnerships introduce approval risk or administrative delay. Over time, such effects should increase concentration and reduce the diversity of research approaches. One aspect of this concerns the predicted consequences for researcher characteristics. We expect Ethics Creep to have added to growing bureaucratic burdens generally, and we suspect that, by diverting research time towards compliance, it has caused a disproportionate number of highly talented early-career researchers to leave academia, hindering innovation and favouring those most tolerant of administrative load over those with the most creative talent (as has been argued by Eysenck, 1995).
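One conventional way to track the concentration predicted above is the Herfindahl-Hirschman index (HHI) from industrial economics: the sum of squared shares of some output across groups. Its use here is our suggestion rather than something the framework itself specifies, and the function name and figures below are hypothetical. Computed per field and per year on, say, publication counts or grant totals by research group, a rising index in high-burden fields relative to low-burden ones would be consistent with the predicted concentration effect.

```python
def herfindahl_index(output_by_group):
    """Herfindahl-Hirschman index of concentration.

    `output_by_group` lists a non-negative quantity per research group
    (e.g. hypothetical publication counts or grant totals). Returns the sum
    of squared shares: 1/N when N groups are equal, approaching 1 as output
    concentrates in a single group.
    """
    total = sum(output_by_group)
    if total <= 0:
        raise ValueError("total output must be positive")
    return sum((x / total) ** 2 for x in output_by_group)


# Hypothetical figures only, not real data.
before = herfindahl_index([25, 25, 25, 25])  # four equally productive groups
after = herfindahl_index([70, 10, 10, 10])   # one dominant group
print(round(before, 2), round(after, 2))
```

Four equal groups give an index of 0.25, while one group producing 70 per cent of output raises it to about 0.52.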
The systemic effects considered above may also have testable psychological ramifications. Measures of constructs such as creativity, intellectual independence, and fluid intelligence could be compared between researchers exiting different fields and those who remain, to see whether our account of self-sorting holds up. It would also be informative to test whether the timing of indicators of Ethics Creep within a field predicts such traits among researchers at the margin of leaving. These traits are expected to correlate positively with outstanding scientific achievement and negatively with conformist rule-following (Eysenck, 1995, 1997). More surreptitious research designs may be required to explore our other corollaries, such as the prediction that interactions with IRBs can encourage the cultivation of dishonest tendencies. There is already some evidence for this (e.g., Keith-Spiegel & Koocher, 2005), suggesting that these dynamics deserve to be better understood. Similarly, anecdotes (e.g., Foster, 2015a, 2015b) suggest that our hypothesis, that the power of HRECs tempts panellists to abuse it, is worthy of serious study.
Finally, our framework implies that the effects of IRBs may be identified more directly through exogenous changes in regulatory intensity and through behavioural experiments that simulate review environments. Manipulations of administrative burden or review stringency could be used to examine how researchers adjust their design choices, collaboration strategies, and willingness to pursue particular topics under different constraint regimes. In other words, whereas many of the predictions above measure the impacts of IRBs on decision-making ex post, such experiments could indicate the extent to which IRBs shape decisions ex ante, for example when researchers internalize their local board’s risk aversion through experience and alter their designs accordingly.
Taken together, these predictions provide multiple avenues for assessing whether IRBs influence not only the conduct of individual studies, but also the structure and direction of scientific knowledge production. This matters because many of the impacts we predict will not be obvious to individual researchers, which underlines the need for any steps taken to mitigate Ethics Creep to be calibrated on evidence rather than perception. If we can identify the ways in which IRBs distort research, we replace ignorance with uncertainty that is at least tractable. Even a malfunctioning compass can be useful if we know how it is wrong, whereas a compass whose error pattern is unknown is much less helpful. In the same way, understanding how IRBs affect different fields can improve both the interpretation of existing findings and the design of future research.