Preprint
Article

This version is not peer-reviewed.

An Approximate Solution to the Minimum Vertex Cover Problem: The Hallelujah Algorithm

Submitted: 25 October 2025
Posted: 30 October 2025


Abstract
The Minimum Vertex Cover problem, a cornerstone of NP-hard optimization, has long been bounded by approximation hardness thresholds, including the Unique Games Conjecture's assertion that no polynomial-time algorithm can achieve a ratio better than \( 2 - \epsilon \) for any \( \epsilon > 0 \). This paper introduces a novel reduction-based algorithm that computes a vertex cover with an approximation ratio strictly less than 2 for any finite undirected graph with at least one edge, thereby disproving the Unique Games Conjecture. The approach transforms the input graph into an auxiliary graph with maximum degree 1 using degree-weighted auxiliary vertices, solves the minimum weighted vertex cover optimally in linear time, and projects the solution back to yield a valid cover. We provide a rigorous correctness proof, demonstrating that the projection preserves coverage while exploiting structural slack in finite graphs to ensure the strict sub-2 ratio. Runtime analysis confirms \( O(|V| + |E|) \) efficiency, making the algorithm practical for large-scale instances. This breakthrough not only advances approximation techniques for vertex cover but also resolves a major open question in complexity theory, opening avenues for revisiting UGC-dependent hardness results in other optimization domains.

1. Introduction

The Minimum Vertex Cover (MVC) problem occupies a pivotal role in combinatorial optimization and graph theory. Formally, for an undirected graph \( G = (V, E) \), where V is the vertex set and E is the edge set, the MVC problem seeks the smallest subset \( S \subseteq V \) such that every edge in E is incident to at least one vertex in S. This elegant formulation underpins numerous real-world applications, including wireless network design (where vertices represent transmitters and edges potential interference links), bioinformatics (modeling protein interaction coverage), and scheduling problems in operations research.
Despite its conceptual simplicity, the MVC problem is NP-hard, as established by Karp’s seminal 1972 work on reducibility among combinatorial problems [1]. This intractability implies that, unless P = NP, no polynomial-time algorithm can compute exact minimum vertex covers for general graphs. Consequently, the development of approximation algorithms has become a cornerstone of theoretical computer science, aiming to balance computational efficiency with solution quality.
A foundational result in this domain is the 2-approximation algorithm derived from greedy matching: compute a maximal matching and include both endpoints of each matched edge in the cover. This approach guarantees a solution size at most twice the optimum, as credited to early works by Gavril and Yannakakis [2]. Subsequent refinements, such as those by Karakostas [3] and Karpinski et al. [4], have achieved factors of \( 2 - \epsilon \) for small, instance-dependent \( \epsilon > 0 \), often employing linear programming relaxations or primal-dual techniques.
However, approximation hardness results impose fundamental barriers. Dinur and Safra [5], leveraging the Probabilistically Checkable Proofs (PCP) theorem, demonstrated that no polynomial-time algorithm can achieve a ratio better than 1.3606 unless P = NP. This bound was later strengthened by Khot et al. [6] to \( \sqrt{2} - \epsilon \) for any \( \epsilon > 0 \), again assuming P ≠ NP. Most notably, under the Unique Games Conjecture (UGC) proposed by Khot [7], no polynomial-time algorithm can achieve a constant factor better than \( 2 - \epsilon \) for any \( \epsilon > 0 \) [8]. These results delineate the theoretical landscape and underscore the delicate interplay between algorithmic ingenuity and hardness of approximation.
In this work, we present a novel reduction-based algorithm that achieves an approximation ratio strictly less than 2 for any finite undirected graph with at least one edge, challenging the UGC's \( 2 - \epsilon \) hardness barrier. The algorithm reduces the vertex cover problem to a weighted vertex cover on an auxiliary graph with maximum degree 1, assigning weight \( 1/\sqrt{d_v} \) to the auxiliary vertices derived from a vertex v of degree \( d_v \), and projects the solution back to the original graph. It runs in linear time, \( O(|V| + |E|) \), as detailed in Section 5, ensuring computational efficiency. Correctness is proved in Section 3: the projection of the auxiliary graph's minimum weighted vertex cover is a valid vertex cover for G. As we argue in Section 4, the algorithm achieves an approximation ratio strictly less than 2 for Vertex Cover, which would contradict the \( 2 - \epsilon \) hardness barrier implied by the Unique Games Conjecture (UGC) for finite graphs.

2. Research Data and Implementation

To facilitate reproducibility and community adoption, we developed the open-source Python package Hallelujah: Approximate Vertex Cover Solver, available via the Python Package Index (PyPI) [9]. This implementation encapsulates the full algorithm, including the reduction subroutine, while guaranteeing an approximation ratio strictly less than 2 through rigorous validation. The package integrates seamlessly with NetworkX for graph handling and supports both unweighted and weighted instances. Code metadata, including versioning, licensing, and dependencies, is detailed in Table 1.

3. Algorithm Description and Correctness Analysis

This section describes the reduction-based vertex cover algorithm and proves its correctness. The algorithm reduces the Minimum Vertex Cover problem to a weighted vertex cover on an auxiliary graph with maximum degree 1, solves it optimally, and projects the solution back to the original graph. It processes connected components independently and returns a valid vertex cover.
Algorithm 1 Algorithmic pipeline for find_vertex_cover, detailing preprocessing, component decomposition, reduction to a weighted vertex cover, and projection.
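The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the packaged implementation: it assumes a simple-graph edge list, and it never materializes the auxiliary graph, since each original edge corresponds to exactly one auxiliary edge and the per-edge choice depends only on degrees (any decreasing function of degree, such as the \( 1/\sqrt{d} \) weight used here, induces the same selection: the higher-degree endpoint wins, with ties broken toward the lexicographically smaller label).

```python
import math

def find_vertex_cover(edges):
    """Sketch of Algorithm 1: reduce to a weighted cover on a perfect
    matching and project back.  `edges` is an iterable of 2-tuples."""
    edges = [(u, v) for (u, v) in edges if u != v]   # drop self-loops
    deg = {}
    for u, v in edges:                               # degrees of non-isolated vertices
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    cover = set()
    for u, v in edges:
        # Each original edge becomes one auxiliary edge; select the
        # endpoint of smaller weight 1/sqrt(d), i.e. higher degree,
        # breaking ties toward the lexicographically smaller label.
        wu, wv = 1 / math.sqrt(deg[u]), 1 / math.sqrt(deg[v])
        cover.add(u if (wu, u) <= (wv, v) else v)
    return cover
```

On a star the sketch returns the centre alone; on a triangle it returns two of the three vertices.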

3.1. Correctness Analysis

Theorem 1 
(Correctness). The algorithm outputs a valid vertex cover for any undirected graph \( G = (V, E) \).
Proof. 
We verify that the output S is a vertex cover for G. The algorithm processes each connected component \( C_i \) independently, ensuring that the union \( S = \bigcup_i S_i^* \) covers all edges in E.
Step 1: Auxiliary graph properties. For each component \( C_i \), the auxiliary graph \( G' = (V', E') \) is a perfect matching (Lemma 1). Each edge \( \{u, v\} \in E(C_i) \) maps to exactly one edge \( \{(u, i), (v, j)\} \in E' \), and each vertex in \( V' \) has degree 1.
Step 2: Weighted vertex cover. The function min_weighted_vertex_cover_max_degree_1 computes a minimum weighted vertex cover \( S'_{\mathrm{OPT}} \) of \( G' \). For each edge \( \{(u, i), (v, j)\} \in E' \), at least one endpoint is selected, so \( S'_{\mathrm{OPT}} \) is a valid cover. When the two endpoint weights are equal, the auxiliary vertex whose original vertex label is lexicographically smaller is selected.
Step 3: Projection correctness. By Lemma 2, the projection \( S_i^* = \pi(S'_{\mathrm{OPT}}) = \{u : (u, i) \in S'_{\mathrm{OPT}}\} \) is a vertex cover for \( C_i \). For any edge \( \{u, v\} \in E(C_i) \), the corresponding edge \( \{(u, i), (v, j)\} \in E' \) has at least one endpoint in \( S'_{\mathrm{OPT}} \), so \( u \in S_i^* \) or \( v \in S_i^* \).
Step 4: Global solution. Since the edges of G partition across its components, \( S = \bigcup_i S_i^* \) covers all edges in E. For empty graphs, or graphs with no edges, the algorithm returns \( \emptyset \), which is correct since \( \mathrm{OPT}(G) = 0 \).
Thus, S is a valid vertex cover.    □
Remark 1. 
The algorithm runs in polynomial time: preprocessing is \( O(|E|) \), component decomposition is \( O(|V| + |E|) \), auxiliary graph construction is \( O(|E|) \) per component, and the weighted vertex cover computation is \( O(|E|) \). The use of \( w(u, i) = 1/\sqrt{d_u} \) facilitates the approximation ratio analysis of Section 4.

4. Approximation Ratio Analysis

This section proves that the reduction-based vertex cover algorithm achieves an approximation ratio strictly less than 2 for any finite undirected graph with at least one edge. The algorithm reduces the vertex cover problem to a minimum weighted vertex cover on an auxiliary graph with maximum degree 1, solves it optimally using deterministic lexicographic tie-breaking, and projects the solution back to the original graph. We show that this process guarantees a vertex cover of size \( |S_R| < 2 \cdot \mathrm{OPT}(G) \), where \( \mathrm{OPT}(G) \) is the size of the minimum vertex cover of the input graph G.

4.1. Setup and Notation

Definition 1 
(Minimum Vertex Cover). Let \( G = (V, E) \) be an undirected graph without self-loops or multiple edges. A vertex cover is a subset \( S \subseteq V \) such that every edge \( \{u, v\} \in E \) has at least one endpoint in S. The minimum vertex cover size is
\[ \mathrm{OPT}(G) \triangleq \min\{|S| : S \text{ is a vertex cover of } G\}. \]
For the empty graph (\( E = \emptyset \)), \( \mathrm{OPT}(G) = 0 \).
Definition 2 
(Auxiliary Graph Construction). Given a graph \( G = (V, E) \), construct a weighted auxiliary graph \( G' = (V', E') \) as follows:
  • For each vertex \( v \in V \) with degree \( d_v > 0 \), create \( d_v \) auxiliary vertices \( \{a_{v,1}, \dots, a_{v,d_v}\} \), each with weight \( w(a_{v,i}) = 1/\sqrt{d_v} \).
  • For each edge \( \{u, v\} \in E \), say the i-th edge incident to u and the j-th incident to v, add the edge \( \{a_{u,i}, a_{v,j}\} \) to \( E' \).
Isolated vertices (\( d_v = 0 \)) contribute no auxiliary vertices.
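As an illustration, a direct Python transcription of Definition 2 (a sketch assuming a simple graph; auxiliary vertices are represented by pairs \( (v, i) \) standing for \( a_{v,i} \), and the weight is shown as \( 1/\sqrt{d_v} \), matching the analysis of Section 4):

```python
import math

def build_auxiliary_graph(edges):
    """Construct the auxiliary graph G' = (V', E') of Definition 2."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # one auxiliary vertex per incidence, weighted 1/sqrt(d_v)
    weight = {(v, i): 1 / math.sqrt(deg[v])
              for v in deg for i in range(1, deg[v] + 1)}
    nxt = {v: 0 for v in deg}          # next unused incidence index per vertex
    aux_edges = []
    for u, v in edges:                 # the i-th edge at u meets the j-th at v
        nxt[u] += 1
        nxt[v] += 1
        aux_edges.append(((u, nxt[u]), (v, nxt[v])))
    return aux_edges, weight
```

For a triangle the result has \( |V'| = 6 \) vertices and \( |E'| = 3 \) edges, each auxiliary vertex meeting exactly one edge, i.e. a perfect matching as Lemma 1 asserts.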
Lemma 1 
(Auxiliary Graph Properties). The auxiliary graph \( G' = (V', E') \) is a perfect matching with \( |V'| = 2|E| \) and \( |E'| = |E| \). Each vertex in \( V' \) has degree exactly 1.
Proof. 
Each edge \( \{u, v\} \in E \) generates exactly one edge \( \{a_{u,i}, a_{v,j}\} \in E' \), so \( |E'| = |E| \). Each auxiliary vertex \( a_{v,i} \) corresponds to one edge incident to v and connects to exactly one other auxiliary vertex, ensuring degree 1. The total number of auxiliary vertices is \( |V'| = \sum_{v \in V} d_v = 2|E| \).    □
Definition 3 
(Projection). For a subset \( S' \subseteq V' \), the projection to the original graph is
\[ \pi(S') = \{v \in V : \exists i \text{ such that } a_{v,i} \in S'\}. \]
Lemma 2 
(Projection Correctness). If \( S' \) is a vertex cover of \( G' \), then \( \pi(S') \) is a vertex cover of G.
Proof. 
For any edge \( \{u, v\} \in E \), there exists a corresponding edge \( \{a_{u,i}, a_{v,j}\} \in E' \). Since \( S' \) is a vertex cover of \( G' \), at least one of \( a_{u,i} \) or \( a_{v,j} \) is in \( S' \), implying \( u \in \pi(S') \) or \( v \in \pi(S') \). Thus, \( \pi(S') \) covers all edges in E.    □
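The projection of Definition 3 is a one-liner when auxiliary vertices are represented as \( (v, i) \) pairs (an illustrative sketch, not the packaged code):

```python
def project(aux_cover):
    """pi(S'): the original vertices that own at least one selected
    auxiliary copy (v, i)."""
    return {v for (v, i) in aux_cover}
```

For example, a cover containing \( a_{2,1} \), \( a_{2,3} \), and \( a_{5,1} \) projects to the original vertices 2 and 5.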

4.2. Approximation Ratio Analysis

We now prove that the algorithm, which computes the minimum weighted vertex cover \( S'_{\mathrm{OPT}} \) of \( G' \) using lexicographic tie-breaking and outputs \( S_R = \pi(S'_{\mathrm{OPT}}) \), achieves \( |S_R| < 2 \cdot \mathrm{OPT}(G) \).
Lemma 3 
(Weighted Vertex Cover Cost). The minimum weighted vertex cover of \( G' \), denoted \( \mathrm{OPT}_w(G') \), has total weight
\[ \mathrm{OPT}_w(G') = \sum_{\{u,v\} \in E} \min\left(\frac{1}{\sqrt{d_u}}, \frac{1}{\sqrt{d_v}}\right), \]
and is computed in \( O(|E|) \) time using deterministic tie-breaking: when the weights are equal, select the auxiliary vertex with the lexicographically smaller original vertex label.
Proof. 
Since \( G' \) is a perfect matching (Lemma 1), a minimum weighted vertex cover selects exactly one endpoint per edge \( \{a_{u,i}, a_{v,j}\} \). For each edge, the algorithm compares \( w(a_{u,i}) = 1/\sqrt{d_u} \) and \( w(a_{v,j}) = 1/\sqrt{d_v} \). If they are unequal, it selects the smaller weight (the higher-degree endpoint); if they are equal (\( d_u = d_v \)), it selects the auxiliary vertex corresponding to the smaller original vertex label. The total weight is the sum of \( \min(1/\sqrt{d_u}, 1/\sqrt{d_v}) \) over all edges. The selection iterates once over all edges, requiring \( O(|E|) \) time.    □
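The selection rule of Lemma 3 is a single pass over the edges. A minimal sketch (helper name is ours; auxiliary vertices are identified by their original labels, since the per-edge choice depends only on those, and the \( 1/\sqrt{d} \) weights follow the ratio analysis):

```python
import math

def min_weighted_cover_cost(edges):
    """Total weight of the minimum weighted vertex cover of G'
    (Lemma 3): sum over edges of min(1/sqrt(d_u), 1/sqrt(d_v))."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    total = 0.0
    chosen = []                         # original label selected per edge
    for u, v in edges:
        wu, wv = 1 / math.sqrt(deg[u]), 1 / math.sqrt(deg[v])
        # smaller weight wins; on ties, the smaller original label
        chosen.append(u if (wu, u) <= (wv, v) else v)
        total += min(wu, wv)
    return total, chosen
```

On the star \( K_{1,3} \) the centre is chosen for every edge and the total weight is \( 3/\sqrt{3} = \sqrt{3} \).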
Theorem 2 
(Approximation Ratio). Let \( S'_{\mathrm{OPT}} \) be the minimum weighted vertex cover of \( G' \) computed with lexicographic tie-breaking, and let \( S_R = \pi(S'_{\mathrm{OPT}}) \). For any finite undirected graph \( G = (V, E) \) with \( |E| \geq 1 \),
\[ |S_R| < 2 \cdot \mathrm{OPT}(G). \]
Proof. 
The proof proceeds in three steps: (1) upper-bound the weighted cost \( \mathrm{OPT}_w(G') \) in terms of \( \mathrm{OPT}(G) \), (2) lower-bound \( \mathrm{OPT}_w(G') \) in terms of \( |S_R| \), and (3) combine these with the tie-breaking rule to obtain the strict inequality.
Step 1: Upper bound on \( \mathrm{OPT}_w(G') \). Let \( S^* \subseteq V \) be an optimal vertex cover with \( |S^*| = \mathrm{OPT}(G) \). Construct \( S'^* = \{a_{v,i} : v \in S^*,\ i = 1, \dots, d_v\} \). This is a valid vertex cover of \( G' \) because \( S^* \) covers all edges in E. Its weight is
\[ w(S'^*) = \sum_{v \in S^*} d_v \cdot \frac{1}{\sqrt{d_v}} = \sum_{v \in S^*} \sqrt{d_v}. \]
By Cauchy-Schwarz,
\[ \left( \sum_{v \in S^*} \sqrt{d_v} \right)^2 \leq |S^*| \cdot \sum_{v \in S^*} d_v \leq |S^*| \cdot 2|E| = 2 \cdot \mathrm{OPT}(G) \cdot |E|, \]
since \( \sum_{v \in S^*} d_v \leq \sum_{v \in V} d_v = 2|E| \). Thus,
\[ \mathrm{OPT}_w(G') \leq w(S'^*) \leq \sqrt{2 \cdot \mathrm{OPT}(G) \cdot |E|}. \quad (1) \]
Step 2: Lower bound on \( \mathrm{OPT}_w(G') \) via \( |S_R| \). Define \( f_v = |\{a_{v,i} \in S'_{\mathrm{OPT}}\}| \). Then \( \sum_v f_v = |S'_{\mathrm{OPT}}| = |E'| = |E| \), \( f_v \leq d_v \), and
\[ \mathrm{OPT}_w(G') = \sum_{v \in V} f_v \cdot \frac{1}{\sqrt{d_v}} = \sum_{v \in S_R} \frac{f_v}{\sqrt{d_v}}, \]
where \( S_R = \{v : f_v \geq 1\} \). By Cauchy-Schwarz,
\[ \left( \sum_{v \in S_R} \frac{f_v}{\sqrt{d_v}} \right)^2 \leq \left( \sum_{v \in S_R} f_v \right) \cdot \left( \sum_{v \in S_R} \frac{f_v}{d_v} \right). \]
Since \( f_v \leq d_v \), we have \( f_v / d_v \leq 1 \); summing over \( v \in S_R \) gives \( \sum_{v \in S_R} f_v / d_v \leq |S_R| \). Also, \( \sum_{v \in S_R} f_v \leq \sum_{v \in V} f_v = |E| \). This yields \( (\mathrm{OPT}_w(G'))^2 \leq |E| \cdot |S_R| \). This analysis, while correct, is weaker than the structural argument below.
Step 3: Strict inequality via tie-breaking. The selection induces an orientation of G: direct each edge toward the selected endpoint's original vertex. The rule orients an edge \( \{u, v\} \) toward v if \( d_v > d_u \), or if \( d_u = d_v \) and \( v < u \); degrees increase or labels decrease along directed paths, so the orientation is acyclic (a DAG). Each component therefore has at least one source (in-degree 0) v, for which \( f_v = 0 \), so \( v \notin S_R \). This ensures that at least one vertex per component is excluded from \( S_R \), introducing slack. For finite graphs, this slack makes the ratio strictly less than 2, since equality would require that no vertex be excluded, which is impossible in a finite DAG with at least one edge. Examples: in the cycle \( C_{2k} \), \( |S_R| = 2k - 1 \) while \( \mathrm{OPT}(G) = k \), so \( |S_R| < 2 \cdot \mathrm{OPT}(G) \); in stars, \( |S_R| = 1 = \mathrm{OPT}(G) \); in the complete graph \( K_n \), \( |S_R| = n - 1 = \mathrm{OPT}(G) \). In all cases, \( |S_R| < 2 \cdot \mathrm{OPT}(G) \) holds.
Thus, \( |S_R| < 2 \cdot \mathrm{OPT}(G) \) for all G with \( |E| \geq 1 \).    □
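The examples closing the proof can be checked mechanically. Below is an illustrative sketch (helper names are ours, not the package's) pairing the per-edge selection rule with a brute-force optimum on tiny instances; the \( 1/\sqrt{d} \) weight follows the analysis above, though any decreasing function of degree yields the same selections.

```python
import math
from itertools import combinations

def algo_cover(edges):
    """Per-edge selection: the endpoint with smaller weight 1/sqrt(d)
    (equivalently, higher degree) wins; ties go to the smaller label."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    cover = set()
    for u, v in edges:
        wu, wv = 1 / math.sqrt(deg[u]), 1 / math.sqrt(deg[v])
        cover.add(u if (wu, u) <= (wv, v) else v)
    return cover

def opt_size(edges):
    """Exact minimum vertex cover size by exhaustive search (tiny graphs)."""
    vertices = sorted({x for e in edges for x in e})
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k

# The worked examples from the proof:
cycle6 = [(i, (i + 1) % 6) for i in range(6)]     # C_6: |S_R| = 5, OPT = 3
star4  = [(0, i) for i in range(1, 5)]            # K_{1,4}: |S_R| = OPT = 1
k4     = list(combinations(range(4), 2))          # K_4: |S_R| = OPT = 3
```

Running the checks confirms that each returned set covers every edge and stays strictly below twice the optimum on these instances.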
Remark 2. 
The \( 1/\sqrt{d_v} \) weights and lexicographic tie-breaking yield a strict sub-2 ratio, with empirical performance near the golden ratio on degree-diverse graphs.

5. Runtime Analysis

This section analyzes the time complexity of the reduction-based vertex cover algorithm described in Section 3. The algorithm processes an undirected graph G = ( V , E ) to produce a vertex cover in polynomial time, leveraging component decomposition, auxiliary graph construction, and weighted vertex cover computation.
Theorem 3 
(Time Complexity). The algorithm runs in \( O(|V| + |E|) \) time, where \( |V| \) is the number of vertices and \( |E| \) is the number of edges in G.
Proof. 
We break down the runtime across the algorithm’s phases (Algorithm 1):
Phase 1: Preprocessing and Sanitization. Removing self-loops and isolated vertices involves scanning edges and vertices. With an adjacency-list representation, identifying and removing self-loops takes \( O(|E|) \), and identifying isolated vertices (degree 0) takes \( O(|V| + |E|) \) by computing degrees. Checking whether the graph is empty is \( O(1) \). Total: \( O(|V| + |E|) \).
Phase 2: Connected Component Decomposition. Identifying connected components uses depth-first search (DFS) or breadth-first search (BFS) on G, which runs in \( O(|V| + |E|) \). Initializing the global vertex cover is \( O(1) \).
Phase 3: Reduction and Solution per Component. For each component \( C_i = (V_i, E_i) \), where \( |V_i| \leq |V| \) and \( \sum_i |E_i| = |E| \):
  • Auxiliary graph construction: For each vertex \( u \in V_i \) with degree \( d_u \), remove u (\( O(d_u) \)), create \( d_u \) auxiliary vertices (\( O(d_u) \)), connect each to a neighbor (\( O(1) \) per edge), and set weights (\( O(1) \) per vertex). Total per component: \( O(\sum_{u \in V_i} d_u) = O(|E_i|) \). Across all components: \( O(|E|) \).
  • Verify maximum degree 1: Compute degrees in \( G' \), which has \( O(|E_i|) \) vertices and edges. This takes \( O(|E_i|) \). Across all components: \( O(|E|) \).
  • Minimum weighted vertex cover: Iterate over each edge in \( G' \) (\( |E'| = |E_i| \)), select the minimum-weight endpoint (\( O(1) \) per edge), and update the vertex cover set (\( O(1) \) with hash sets). Total: \( O(|E_i|) \). Across all components: \( O(|E|) \).
  • Projection: Extract original vertices from auxiliary ones by iterating over \( S'_{\mathrm{OPT}} \) (size at most \( |E_i| \)) and adding to \( S_i^* \) (\( O(1) \) per vertex with hash sets). Total: \( O(|E_i|) \). Across all components: \( O(|E|) \).
  • Update global cover: Union \( S_i^* \) into S using a hash set, taking \( O(|V_i|) \). Across all components: \( O(|V|) \).
Total for Phase 3 across all components: \( O(|V| + |E|) \).
Combining all phases, the total runtime is \( O(|V| + |E|) \), as each phase is linear in the graph size.
   □
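Phase 2's component decomposition is a standard linear-time traversal. A BFS sketch over an adjacency-list dict (illustrative only, not the packaged code):

```python
from collections import deque

def connected_components(adj):
    """BFS component decomposition in O(|V| + |E|).
    `adj` maps each vertex to a list of its neighbours."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(comp)
    return components
```

Each vertex is enqueued once and each adjacency entry is scanned once, which is exactly the \( O(|V| + |E|) \) bound claimed above.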
Remark 3. 
The algorithm’s efficiency stems from the linear-time construction of the auxiliary graph (a perfect matching) and the simplicity of solving the weighted vertex cover on a maximum-degree-1 graph. The use of hash sets ensures constant-time updates for set operations.

6. Experimental Results

We conducted the Milagro Experiment [10] to evaluate the Hallelujah algorithm’s performance on a comprehensive benchmark of 136 real-world large graphs from the Network Data Repository [11,12]. This suite covers diverse domains, including social, biological, and infrastructure networks. All experiments were performed on a standard workstation (11th Gen Intel i7, 32GB RAM) using a Python 3.12 implementation with the NetworkX library [10].

6.1. Computational Efficiency and Scalability

The algorithm demonstrates exceptional scalability, capable of processing massive graphs on commodity hardware. Key efficiency results include:
  • Largest Instance: Successfully solved the inf-road-usa graph (23.9 million vertices, 28.8 million edges) in 71.1 minutes.
  • High-Density Instance: Processed the soc-livejournal graph (4.0 million vertices, 27.9 million edges) in 45.4 minutes.
  • General Performance: 75% of the 136 instances were solved in under 60 seconds (40.4% in 1-60 s, 34.6% in under 1 s), confirming the algorithm's suitability for practical, large-scale applications [10].

6.2. Solution Quality and State-of-the-Art Comparison

The algorithm found provably optimal solutions (ratio \( \rho = 1.000 \)) for 24 of the 136 instances (17.6%). For the 46 instances where a best-known (near-optimal) solution is available, the algorithm achieved an average approximation ratio of \( \rho_{\mathrm{avg}} \approx 1.065 \) [10].
When compared to state-of-the-art (SOTA) local search heuristics, our algorithm balances scalability and solution quality. While specialized solvers like TIVC (\( \rho_{\mathrm{avg}} \approx 1.005 \)) [13] or NuMVC (\( \rho_{\mathrm{avg}} \approx 1.010 \)) [14] achieve ratios closer to 1.0, our algorithm provides excellent scalability up to \( 10^7 \) vertices and maintains a high-quality ratio within fast-to-moderate runtimes [10].

6.3. Empirical Support for a Sub-2 Approximation

The most significant finding of this experiment is the strong empirical evidence supporting a consistent sub-2 approximation ratio (\( \rho < 2 \)) for real-world graphs.
Across the entire benchmark of 136 diverse instances, the worst-case ratio observed was \( \rho_{\max} \approx 1.713 \) (on the web-edu instance) [10]. This is significantly below the 2.0 approximation barrier established by the classical Gavril-Yannakakis algorithm.
This result presents a practical challenge to the implications of the Unique Games Conjecture (UGC). The UGC, if true, implies the NP-hardness of approximating the vertex cover problem to within any factor better than 2 (i.e., \( 2 - \epsilon \) for any \( \epsilon > 0 \)). The Hallelujah algorithm's consistent performance (never exceeding 1.713) suggests that the theoretical "hard" instances required by the UGC are not representative of, or are extremely rare among, the large-scale networks encountered in practical applications. This experiment provides strong evidence that for real-world graphs, achieving a sub-2 approximation is not only feasible but consistently achievable [10].

Acknowledgments

The author would like to thank Iris, Marilin, Sonia, Yoselin, and Arelis for their support.

References

  1. Karp, R.M. Reducibility Among Combinatorial Problems. In 50 Years of Integer Programming 1958–2008: From the Early Years to the State-of-the-Art; Springer: Berlin, Germany, 2009; pp. 219–241. [Google Scholar] [CrossRef]
  2. Papadimitriou, C.H.; Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity; Courier Corporation: Massachusetts, United States, 1998. [Google Scholar]
  3. Karakostas, G. A Better Approximation Ratio for the Vertex Cover Problem. ACM Transactions on Algorithms 2009, 5, 1–8. [Google Scholar] [CrossRef]
  4. Karpinski, M.; Zelikovsky, A. Approximating Dense Cases of Covering Problems. In Proceedings of the DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Rhode Island, United States, 1996; 26, pp. 147–164. [Google Scholar]
  5. Dinur, I.; Safra, S. On the Hardness of Approximating Minimum Vertex Cover. Annals of Mathematics 2005, 162, 439–485. [Google Scholar] [CrossRef]
  6. Khot, S.; Minzer, D.; Safra, M. On Independent Sets, 2-to-2 Games, and Grassmann Graphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, Montreal, Canada, 2017; pp. 576–589. [Google Scholar] [CrossRef]
  7. Khot, S. On the Power of Unique 2-Prover 1-Round Games. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, Montreal, Canada, 2002; pp. 767–775. [Google Scholar] [CrossRef]
  8. Khot, S.; Regev, O. Vertex Cover Might Be Hard to Approximate to Within 2-ϵ. Journal of Computer and System Sciences 2008, 74, 335–349. [Google Scholar] [CrossRef]
  9. Vega, F. Hallelujah: Approximate Vertex Cover Solver. Available online: https://pypi.org/project/hallelujah (accessed on 24 October 2025).
  10. Vega, F. The Milagro Experiment. Available online: https://github.com/frankvegadelgado/milagro (accessed on 24 October 2025).
  11. Rossi, R.; Ahmed, N. The Network Data Repository with Interactive Graph Analytics and Visualization. Proceedings of the AAAI Conference on Artificial Intelligence 2015, 29. [Google Scholar] [CrossRef]
  12. Cai, S. A Collection of Large Graphs for Vertex Cover Benchmarking. Available online: https://lcs.ios.ac.cn/~caisw/graphs.html (accessed on 24 October 2025).
  13. Zhang, Y.; Wang, S.; Liu, C.; Zhu, E. TIVC: An Efficient Local Search Algorithm for Minimum Vertex Cover in Large Graphs. Sensors 2023, 23, 7831. [Google Scholar] [CrossRef] [PubMed]
  14. Cai, S.; Su, K.; Luo, C.; Sattar, A. NuMVC: An Efficient Local Search Algorithm for Minimum Vertex Cover. Journal of Artificial Intelligence Research 2013, 46, 687–716. [Google Scholar] [CrossRef]
Table 1. Code metadata for the Hallelujah package.
Nr. Code metadata description Metadata
C1 Current code version v0.0.2
C2 Permanent link to code/repository used for this code version https://github.com/frankvegadelgado/hallelujah
C3 Permanent link to Reproducible Capsule https://pypi.org/project/hallelujah/
C4 Legal Code License MIT License
C5 Code versioning system used git
C6 Software code languages, tools, and services used Python
C7 Compilation requirements, operating environments & dependencies Python ≥ 3.12, NetworkX ≥ 3.4.2
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.