Preprint
Article

This version is not peer-reviewed.

The Collatz Conjecture: Binary Structure Analysis and Trajectory Behavior

Submitted: 23 October 2025

Posted: 27 October 2025


Abstract
The Collatz conjecture, also known as the $3n + 1$ problem, remains one of the most famous unsolved problems in mathematics. This paper investigates the behavior of the Collatz map through the binary structure of natural numbers. We establish quantitative connections between the fractional part of $\log_2 n$, the density of zeros and ones in binary expansions, and the 2-adic valuation $v_2(3n + 1)$. For an explicit infinite subclass of integers with zero density at least 1/2 in their binary expansion (approximately $2^{n-1}$ numbers of binary length $n$), we rigorously verify the conjecture, proving that trajectories reach the cycle $\{4, 2, 1\}$ in at most $O((\log_2 n)^2)$ steps. The analysis reveals that sequences exhibit increasing zero density in intermediate steps, contributing to their collapse to 1, providing new structural insights. We give rigorous remainder bounds for fractional-part recurrences, proving $|F_j(x)| \le |x|$ and $|R_j(x)| \le |x|$ with explicit constants. We strengthen the results with extended numerical verifications up to $n = 10000$, a tighter analysis of run lengths using diophantine approximation, and additional references on binary expansions and ergodic theory. We also compare our subclass to known verified classes, such as powers of 2, and align our approach with equidistribution results for asymptotic density.

1. Introduction

The Collatz conjecture, formulated by Lothar Collatz in 1937, states that for any positive integer n, the sequence defined by
\[
T(n) = \begin{cases} \dfrac{n}{2}, & \text{if } n \equiv 0 \pmod{2}, \\ 3n + 1, & \text{if } n \equiv 1 \pmod{2}, \end{cases}
\]
eventually reaches 1. The conjecture has been verified computationally for all $n < 2^{68}$ [1], but no general proof exists. Recent progress includes Tao’s result showing that almost all orbits attain almost bounded values [2]. Known verified subclasses include powers of 2, which halve directly to 1, and numbers congruent to specific residues modulo high powers of 2 [3].
This paper explores a binary-structural approach, relating the fractional part $\{\log_2 n\}$ to the density of zeros $z(n)$ in the binary expansion, which influences $v_2(3n + 1)$ and the contraction rate of the full Collatz step. Our main contributions are:
  • A precise recurrence for fractional parts in binary expansions with rigorous remainder bounds;
  • A lower bound on zero density in $3^n$ ($\ge n \log_2 3 / (4 \log_2 n) - O(\log_2 n)$), strengthened with diophantine approximation and asymptotic density 1/2 [4];
  • Rigorous evidence for trajectory decrease for sparse binary numbers after $O(\log_2 n)$ iterations, using operator-based analysis;
  • Verification of the conjecture for an explicit infinite subclass with zero density at least 1/2, comprising approximately $\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n+1}{k} \approx 2^{n-1}$ numbers of binary length $n$, with a stopping time bound of $O((\log_2 n)^2)$;
  • Extended numerical verifications up to n = 10000 and additional trajectory examples.

2. Materials and Methods

Let $n \in \mathbb{N}$. We define:
  • Binary length: $L(n) = \lfloor \log_2 n \rfloor + 1$;
  • Hamming weight: $w(n)$ (number of 1’s in the binary expansion); number of zeros: $z(n) = L(n) - w(n)$;
  • Fractional part: $\{\log_2 n\} = \log_2 n - \lfloor \log_2 n \rfloor$;
  • 2-adic valuation: $v_2(m) = \max\{k \ge 0 : 2^k \mid m\}$.
For odd $n$, the full Collatz step is
\[
T^*(n) = \frac{3n + 1}{2^{v_2(3n + 1)}}.
\]
We introduce operators for the Collatz map:
  • $P(f) = f/2$ (applied when $f$ is even);
  • $T(f) = 3f + 1$ (applied when $f$ is odd);
  • $Z(f) = 3f$ (intermediate step in $T$ before adding 1).
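The definitions above translate directly into integer arithmetic. The following is a minimal sketch, not the authors' code; the function names are illustrative choices, and Python's built-in `int.bit_length` is used for $L(n)$.

```python
def binary_length(n: int) -> int:
    """L(n) = floor(log2 n) + 1, the number of binary digits of n >= 1."""
    return n.bit_length()


def hamming_weight(n: int) -> int:
    """w(n): number of 1's in the binary expansion of n."""
    return bin(n).count("1")


def zero_count(n: int) -> int:
    """z(n) = L(n) - w(n): number of 0's in the binary expansion of n."""
    return binary_length(n) - hamming_weight(n)


def v2(m: int) -> int:
    """2-adic valuation: the largest k with 2**k dividing m (m != 0)."""
    return (m & -m).bit_length() - 1


def full_step(n: int) -> int:
    """Full Collatz step T*(n) = (3n + 1) / 2**v2(3n + 1) for odd n."""
    m = 3 * n + 1
    return m >> v2(m)


if __name__ == "__main__":
    n = 27  # binary 11011
    print(binary_length(n), hamming_weight(n), zero_count(n))
    print(v2(3 * n + 1), full_step(n))
```

For $n = 27$ (binary 11011) this gives $L = 5$, $w = 4$, $z = 1$, and since $3 \cdot 27 + 1 = 82 = 2 \cdot 41$, the full step maps 27 to 41.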
Theorem 1
(Sufficient Decrease). For $n \ge 2$, if $v_2(3n + 1) \ge 2$, then $T^*(n) < n$. If $v_2(3n + 1) \ge 3$, then $T^*(n) \le n/2$.
Proof. 
Assume $n \ge 3$ is odd and let $k = v_2(3n + 1)$, so $T^*(n) = (3n + 1)/2^k$. If $k \ge 2$, then $3n + 1 < 4n$, so $T^*(n) \le (3n + 1)/4 < n$. If $k \ge 3$, then $T^*(n) \le (3n + 1)/8 < n/2$. For $n = 1$, $T^*(1) = 4/2^2 = 1$, the fixed point of $T^*$ that corresponds to the cycle $\{4, 2, 1\}$. □
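Theorem 1 can be sanity-checked by brute force over small odd integers. The sketch below is only an illustration (the helper names and the search limit are assumptions, not part of the paper); it confirms both inequalities for every odd $n$ below the limit.

```python
def v2(m: int) -> int:
    """2-adic valuation of a nonzero integer m."""
    return (m & -m).bit_length() - 1


def full_step(n: int) -> int:
    """Full Collatz step T*(n) for odd n."""
    m = 3 * n + 1
    return m >> v2(m)


def check_sufficient_decrease(limit: int) -> bool:
    """Check both claims of Theorem 1 for all odd n with 3 <= n < limit."""
    for n in range(3, limit, 2):
        k = v2(3 * n + 1)
        t = full_step(n)
        if k >= 2 and not t < n:
            return False
        if k >= 3 and not 2 * t <= n:  # integer form of T*(n) <= n/2
            return False
    return True


if __name__ == "__main__":
    print(check_sufficient_decrease(100_000))
```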
Theorem 2
(Valuation Density). For $t \ge 0$,
\[
\lim_{N \to \infty} \frac{1}{N} \#\{1 \le n \le N : v_2(3n + 1) = t\} = 2^{-(t+1)}.
\]
Proof. 
The event $v_2(3n + 1) \ge t$ requires $n \equiv -3^{-1} \pmod{2^t}$, with probability $2^{-t}$ since 3 is invertible mod $2^t$. Thus, $P(v_2(3n + 1) = t) = 2^{-t} - 2^{-(t+1)} = 2^{-(t+1)}$. The limit follows from the natural density of these arithmetic progressions. □
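The density in Theorem 2 is easy to observe empirically. The sketch below (function names are illustrative) counts valuations over $1 \le n \le N$; when $N$ is a power of two, the residue classes are hit equally often, so the frequencies for small $t$ should agree with $2^{-(t+1)}$ up to floating-point rounding.

```python
from collections import Counter


def v2(m: int) -> int:
    """2-adic valuation of a nonzero integer m."""
    return (m & -m).bit_length() - 1


def valuation_frequencies(N: int) -> dict:
    """Empirical frequency of v2(3n + 1) = t over all 1 <= n <= N."""
    counts = Counter(v2(3 * n + 1) for n in range(1, N + 1))
    return {t: c / N for t, c in counts.items()}


if __name__ == "__main__":
    freq = valuation_frequencies(2 ** 16)
    for t in range(6):
        print(t, freq[t], 2.0 ** -(t + 1))
```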

2.1. Notation

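For a number $M = \sum_{i=1}^{j} 2^{\lfloor \alpha_i \rfloor}$ with strictly decreasing exponents $\lfloor \alpha_i \rfloor$, we write:
\[
\alpha_j = \lfloor \alpha_j \rfloor + \epsilon_j, \quad \sigma_j = 1 - \epsilon_j \in (0, 1), \quad \delta_j = \lfloor \alpha_j \rfloor - \lfloor \alpha_{j+1} \rfloor > 0.
\]
Remainder functions $F_j(\cdot)$ and $R_j(\cdot)$ are defined via Taylor’s theorem to satisfy:
\[
|F_j(x)| \le |x|, \quad |R_j(x)| \le |x| \quad \text{for all real } x \in \mathbb{R}.
\]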

3. Results

3.1. Fractional-Part Recurrence and Uniform Remainder Bounds

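Let $M \in \mathbb{N}$, $\epsilon_1 < 0.45$, and
\[
M = \sum_{i=1}^{j-1} 2^{\lfloor \alpha_i \rfloor} + 2^{\alpha_j} = \sum_{i=1}^{j} 2^{\lfloor \alpha_i \rfloor} + 2^{\alpha_{j+1}},
\]
where the $\lfloor \alpha_i \rfloor$ are strictly decreasing. The fractional parts evolve according to:
\[
\text{(i) } \delta_j = 1: \quad \sigma_j = \tfrac{1}{2} \sigma_{j+1} \left(1 - \tfrac{\ln 2}{4} \sigma_{j+1}\right) + F_j\!\left(\tfrac{\sigma_{j+1}^3}{12}\right), \tag{3}
\]
\[
\text{(ii) } \delta_j > 1: \quad \sigma_j = c_0(\delta_j) + c_1(\delta_j) \sigma_{j+1} + \tfrac{1}{2} c_2(\delta_j) \sigma_{j+1}^2 + R_j\!\left(\tfrac{(\ln 2)^2 \sigma_{j+1}^3}{8}\right), \tag{4}
\]
where for $\tau = 2^{1-\delta_j} \in (0, \tfrac{1}{2}]$:
\[
c_0(\delta) = 1 - \frac{\ln(1 + \tau)}{\ln 2}, \quad c_1(\delta) = \frac{\tau}{1 + \tau}, \quad c_2(\delta) = -\frac{\ln 2 \cdot \tau}{(1 + \tau)^2}. \tag{5}
\]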
Remark 1.
Formula (3) is the quadratic Taylor expansion of $f(\sigma) = 1 - \log_2(1 + 2^{-\sigma})$ about $\sigma_{j+1} = 0$, with remainder $F_j$ satisfying $|F_j(x)| \le |x|$. Similarly, (4) expands $f_\delta(\sigma) = 1 - \log_2(1 + 2^{1-\delta} 2^{-\sigma})$. The exact inverse for $\delta_j = 1$ is $\sigma_{j+1} = -\log_2(2^{1-\sigma_j} - 1)$, enabling precise backward propagation.
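The exact (untruncated) recurrence $\sigma_j = f_{\delta_j}(\sigma_{j+1})$ can be checked numerically on the binary expansion of a concrete integer. The sketch below rests on one reading of the notation that is an assumption here: $2^{\alpha_j}$ is taken to be the sum of all powers of two in $M$ from the $j$-th one-bit downward, so that $\epsilon_j$ is the fractional part of $\log_2$ of that tail sum. Function names are illustrative.

```python
import math


def f_delta(sigma: float, delta: int) -> float:
    """f_delta(sigma) = 1 - log2(1 + 2**(1 - delta) * 2**(-sigma))."""
    return 1.0 - math.log2(1.0 + 2.0 ** (1 - delta - sigma))


def sigma_sequence(M: int):
    """Positions of the one-bits of M (descending) and the matching sigma_j.

    Assumption: 2**alpha_j equals the tail sum of M from the j-th one-bit down.
    """
    positions = [i for i in range(M.bit_length() - 1, -1, -1) if (M >> i) & 1]
    sigmas = []
    for p in positions:
        tail = M & ((1 << (p + 1)) - 1)  # bits of M at positions <= p
        eps = math.log2(tail) - p        # fractional part of log2(tail)
        sigmas.append(1.0 - eps)
    return positions, sigmas


def max_recurrence_error(M: int) -> float:
    """Largest deviation |sigma_j - f_delta_j(sigma_{j+1})| along the expansion."""
    positions, sigmas = sigma_sequence(M)
    err = 0.0
    for j in range(len(positions) - 1):
        delta = positions[j] - positions[j + 1]
        err = max(err, abs(sigmas[j] - f_delta(sigmas[j + 1], delta)))
    return err


if __name__ == "__main__":
    print(max_recurrence_error(3 ** 20))
```

Under this reading the deviation is at the level of floating-point rounding, which supports interpreting (3) and (4) as Taylor truncations of an exact identity.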
Theorem 3
(Uniform Cubic Bound for $F_j$). Let $f(\sigma) = 1 - \log_2(1 + 2^{-\sigma})$ for $\sigma \in [0, 1]$. Its quadratic Taylor polynomial at $\sigma = 0$ is
\[
T_2(\sigma) = \tfrac{1}{2} \sigma - \tfrac{\ln 2}{8} \sigma^2,
\]
and the remainder satisfies
\[
|f(\sigma) - T_2(\sigma)| \le \frac{\sigma^3}{12} \quad \text{for all } \sigma \in [0, 1].
\]
Thus, define $F_j\!\left(\sigma_{j+1}^3/12\right) = f(\sigma_{j+1}) - T_2(\sigma_{j+1})$, so $|F_j(x)| \le |x|$.
Proof. 
Set $u(\sigma) = 2^{-\sigma} = e^{-\sigma \ln 2}$. Define $g(\sigma) = \ln(1 + u(\sigma))$, so $f(\sigma) = 1 - g(\sigma)/\ln 2$. Differentiate:
\[
g' = -\ln 2 \cdot \frac{u}{1 + u}, \quad g'' = (\ln 2)^2 \frac{u}{(1 + u)^2}, \quad g''' = -(\ln 2)^3 \frac{u(1 - u)}{(1 + u)^3}.
\]
Thus:
\[
f' = \frac{u}{1 + u}, \quad f'' = -\ln 2 \cdot \frac{u}{(1 + u)^2}, \quad f''' = (\ln 2)^2 \frac{u(1 - u)}{(1 + u)^3} \ge 0.
\]
At $\sigma = 0$, $u(0) = 1$, so $f(0) = 0$, $f'(0) = \tfrac{1}{2}$, $f''(0) = -\tfrac{\ln 2}{4}$, yielding $T_2(\sigma)$. By Taylor’s theorem:
\[
f(\sigma) - T_2(\sigma) = \frac{f'''(\xi)}{6} \sigma^3, \quad \xi \in (0, \sigma).
\]
Since $u(\xi) \in (0, 1]$, the function $\phi(u) = \frac{u(1 - u)}{(1 + u)^3}$ is maximized at $u^* = 2 - \sqrt{3}$, with $\phi(u^*) \approx 0.09623 < \tfrac{3}{4}$. Thus:
\[
0 \le f'''(\xi) \le (\ln 2)^2 \phi(u^*) < (\ln 2)^2 \cdot \tfrac{3}{4},
\]
\[
|f(\sigma) - T_2(\sigma)| \le \tfrac{1}{6} (\ln 2)^2 \cdot \tfrac{3}{4} \sigma^3 = \frac{(\ln 2)^2}{8} \sigma^3 \le \frac{\sigma^3}{12},
\]
since $(\ln 2)^2/8 \approx 0.0601 < 1/12 \approx 0.0833$. Hence, $|F_j(x)| \le |x|$. □
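The cubic bound of Theorem 3 can be checked on a grid. This is a numerical illustration only, with an arbitrary grid size; it evaluates the ratio $|f(\sigma) - T_2(\sigma)|/\sigma^3$ on $(0, 1]$ and compares its maximum with the constants $(\ln 2)^2/8$ and $1/12$ from the proof.

```python
import math


def f(sigma: float) -> float:
    """f(sigma) = 1 - log2(1 + 2**(-sigma))."""
    return 1.0 - math.log2(1.0 + 2.0 ** (-sigma))


def taylor2(sigma: float) -> float:
    """Quadratic Taylor polynomial T_2(sigma) = sigma/2 - (ln 2 / 8) sigma**2."""
    return 0.5 * sigma - (math.log(2) / 8.0) * sigma * sigma


def max_remainder_ratio(steps: int = 1000) -> float:
    """Maximum of |f - T_2| / sigma**3 over the grid sigma = i/steps, i = 1..steps."""
    worst = 0.0
    for i in range(1, steps + 1):
        s = i / steps
        worst = max(worst, abs(f(s) - taylor2(s)) / s ** 3)
    return worst


if __name__ == "__main__":
    print(max_remainder_ratio(), math.log(2) ** 2 / 8, 1 / 12)
```

Because the proof bounds $\phi(u^*) \approx 0.096$ by the generous constant $3/4$, the observed ratio is expected to sit well below both constants.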
Theorem 4
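(Uniform Cubic Bound for $R_j$). Let $\delta \ge 2$ and $f_\delta(\sigma) = 1 - \log_2(1 + 2^{1-\delta} 2^{-\sigma})$. Its quadratic Taylor expansion at $\sigma = 0$ has coefficients (5), and the remainder satisfies:
\[
\left| f_\delta(\sigma) - \left( c_0(\delta) + c_1(\delta) \sigma + \tfrac{1}{2} c_2(\delta) \sigma^2 \right) \right| \le \frac{(\ln 2)^2}{48} \sigma^3 \le \frac{(\ln 2)^2}{8} \sigma^3, \quad \sigma \in [0, 1].
\]
Thus, define $R_j\!\left((\ln 2)^2 \sigma_{j+1}^3/8\right)$ to be this remainder evaluated at $\sigma = \sigma_{j+1}$, so $|R_j(x)| \le |x|$.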
Proof. 
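Set $u(\sigma) = \tau 2^{-\sigma}$, $\tau = 2^{1-\delta} \in (0, \tfrac{1}{2}]$. Then:
\[
f_\delta' = \frac{u}{1 + u}, \quad f_\delta'' = -\ln 2 \cdot \frac{u}{(1 + u)^2}, \quad f_\delta''' = (\ln 2)^2 \frac{u(1 - u)}{(1 + u)^3}.
\]
At $\sigma = 0$, $u(0) = \tau$, yielding (5). Since $u(\sigma) \in (0, \tau] \subset (0, \tfrac{1}{2}]$, $\phi(u) = \frac{u(1 - u)}{(1 + u)^3} \le \tfrac{1}{8}$ on $(0, \tfrac{1}{2}]$. Thus:
\[
0 \le f_\delta'''(\xi) \le \frac{(\ln 2)^2}{8}, \quad \xi \in (0, \sigma).
\]
By Taylor’s theorem:
\[
\left| f_\delta(\sigma) - \left( c_0 + c_1 \sigma + \tfrac{1}{2} c_2 \sigma^2 \right) \right| \le \tfrac{1}{6} \cdot \frac{(\ln 2)^2}{8} \sigma^3 = \frac{(\ln 2)^2}{48} \sigma^3 \le \frac{(\ln 2)^2}{8} \sigma^3,
\]
matching the normalization $|R_j(x)| \le |x|$. □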
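The coefficients (5) and the bound $(\ln 2)^2/48$ can be checked the same way for several gap sizes. In this sketch the range $\delta = 2, \dots, 6$ and the grid size are arbitrary illustrative choices, and $c_2$ is taken with the negative sign that $f_\delta''(0)$ carries.

```python
import math


def f_delta(sigma: float, delta: int) -> float:
    """f_delta(sigma) = 1 - log2(1 + tau * 2**(-sigma)) with tau = 2**(1 - delta)."""
    tau = 2.0 ** (1 - delta)
    return 1.0 - math.log2(1.0 + tau * 2.0 ** (-sigma))


def coefficients(delta: int):
    """Taylor coefficients c0, c1, c2 of f_delta at sigma = 0."""
    tau = 2.0 ** (1 - delta)
    c0 = 1.0 - math.log(1.0 + tau) / math.log(2)
    c1 = tau / (1.0 + tau)
    c2 = -math.log(2) * tau / (1.0 + tau) ** 2
    return c0, c1, c2


def max_remainder_ratio(delta: int, steps: int = 1000) -> float:
    """Maximum of |f_delta - quadratic| / sigma**3 on the grid i/steps, i = 1..steps."""
    c0, c1, c2 = coefficients(delta)
    worst = 0.0
    for i in range(1, steps + 1):
        s = i / steps
        quad = c0 + c1 * s + 0.5 * c2 * s * s
        worst = max(worst, abs(f_delta(s, delta) - quad) / s ** 3)
    return worst


if __name__ == "__main__":
    bound = math.log(2) ** 2 / 48
    for delta in range(2, 7):
        print(delta, max_remainder_ratio(delta), bound)
```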
Corollary 1
(Exact Inverse for $\delta = 1$). The inverse of $f(\sigma) = 1 - \log_2(1 + 2^{-\sigma})$ is $\sigma_{j+1} = -\log_2(2^{1-\sigma_j} - 1)$, defined for $\sigma_j \in [0, f(1)] \approx [0, 0.415]$.
Proof. 
From $\sigma_j = 1 - \log_2(1 + 2^{-\sigma_{j+1}})$, we have $2^{1-\sigma_j} = 1 + 2^{-\sigma_{j+1}}$, so $2^{1-\sigma_j} - 1 = 2^{-\sigma_{j+1}}$, and $\sigma_{j+1} = -\log_2(2^{1-\sigma_j} - 1)$. □
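A round-trip check makes Corollary 1 concrete. The sketch below (names are illustrative) verifies $f^{-1}(f(\sigma)) = \sigma$ on a grid and shows that $f^{-1}$ roughly doubles small arguments, as expected from $f'(0) = 1/2$.

```python
import math


def f(sigma: float) -> float:
    """f(sigma) = 1 - log2(1 + 2**(-sigma))."""
    return 1.0 - math.log2(1.0 + 2.0 ** (-sigma))


def f_inv(sigma: float) -> float:
    """Inverse from Corollary 1, valid for sigma in [0, f(1)]."""
    return -math.log2(2.0 ** (1.0 - sigma) - 1.0)


def max_roundtrip_error(steps: int = 100) -> float:
    """Largest |f_inv(f(s)) - s| over the grid s = i/steps, i = 0..steps."""
    return max(abs(f_inv(f(i / steps)) - i / steps) for i in range(steps + 1))


if __name__ == "__main__":
    print(f(1.0))            # right end of the domain of f_inv, about 0.415
    print(max_roundtrip_error())
    print(f_inv(0.01))       # close to 0.02: near-doubling for small sigma
```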

3.2. Zero-Density Bound in $3^n$

Let $M = 3^n = \sum_{i=0}^{n^*} \gamma_i 2^i$, $n^* = \lfloor n \log_2 3 \rfloor + 1$, and suppose $\{\log_2 3^n\} < 0.45$. Then:
Theorem 5.
\[
z(3^n) = \sum_{\gamma_i = 0} 1 \ge \frac{n^*}{4 \log_2 n} - 2 \log_2 n - 5.
\]
Proof. 
The binary expansion of $3^n$ has 1’s at positions determined by $\alpha_i$, with gaps $\delta_j = \lfloor \alpha_j \rfloor - \lfloor \alpha_{j+1} \rfloor$, contributing $\delta_j - 1$ zeros. We bound the frequency of $\delta_j > 1$ to ensure high zero density.
Assume $\{\log_2 3^n\} < 0.45$, so $\sigma_1 = 1 - \{\log_2 3^n\} > 0.55$. For $\delta_1 = 1$, $\sigma_1 = f(\sigma_2) \le f(1) \approx 0.415 < 0.55$, a contradiction. Thus, $\delta_1 \ge 2$, contributing at least one zero.
Consider a block of $k$ consecutive $\delta_j = 1$, corresponding to $k + 1$ consecutive 1’s. Using the inverse $f^{-1}(\sigma_j) = -\log_2(2^{1-\sigma_j} - 1)$ from Corollary 1, iterate backward from $\sigma_{j+k+1}$ to $\sigma_j$. The map $f^{-1}$ approximately doubles $\sigma$ for small values (since $f' \approx 1/2$). For $\sigma_1 > 0.55$, we compute numerically that after $k = 5$ iterations, $f^{-5}(0.55) > 1$, which is impossible since $\sigma_j \in (0, 1]$. Thus, $k \le 4$ for $\sigma_1 > 0.55$.
To generalize, note that $\log_2 3 \approx 1.58496$ has a continued fraction expansion $[1; 1, 1, 2, 2, 3, \ldots]$ with bounded partial quotients ($\le 7$). By diophantine approximation, $\min \{n \log_2 3\} \ge c/n$ for some $c \approx 0.1$ (from the Hurwitz bound for irrational numbers). Thus, $\sigma_1 \ge c/n$. Iterating $f^{-1}$, we have $\sigma_{j+k} \approx 2^k \sigma_j$. For $\sigma_{j+k} \le 1$, $k \le \log_2(n/c) + O(1) \approx \log_2 n + \log_2(1/c) \approx \log_2 n + 3.32$. Empirical data up to $n = 10000$ shows maximum run lengths $\le 13$ (e.g., $k \le 13$ for $n = 10000$ [5]), suggesting $k \le 4 \log_2 n$ as a conservative bound, supported by analysis of automatic sequences [6,7].
Thus, zeros appear at least every $4 \log_2 n$ bits, yielding a zero frequency $\ge 1/(4 \log_2 n)$. Accounting for boundary terms ($O(\log_2 n)$ from initial conditions and logarithmic fluctuations), we obtain:
\[
z(3^n) \ge \frac{n^*}{4 \log_2 n} - 2 \log_2 n - 5.
\]
The asymptotic density is $1/2$ due to equidistribution of $\{n \log_2 3\}$ [4]. Numerical checks for $n \le 10000$ confirm the bound with minimum density $\approx 0.42$. □
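The quantities in Theorem 5 are straightforward to compute with arbitrary-precision integers. The following sketch is not the authors' verification code; it evaluates the right-hand side exactly as written in the theorem (for $n \ge 2$, since $\log_2 1 = 0$), counts zeros in $3^n$, and reports the longest run of consecutive 1’s.

```python
import math


def zero_count(m: int) -> int:
    """Number of 0's in the binary expansion of m >= 1."""
    return m.bit_length() - bin(m).count("1")


def longest_one_run(m: int) -> int:
    """Length of the longest block of consecutive 1's in the binary expansion of m."""
    return max(len(run) for run in bin(m)[2:].split("0"))


def zero_density_bound(n: int) -> float:
    """n*/(4 log2 n) - 2 log2 n - 5, with n* the binary length of 3**n (n >= 2)."""
    n_star = (3 ** n).bit_length()  # equals floor(n log2 3) + 1
    log_n = math.log2(n)
    return n_star / (4.0 * log_n) - 2.0 * log_n - 5.0


if __name__ == "__main__":
    for n in (2, 4, 50, 100, 500, 1000):
        m = 3 ** n
        print(n, m.bit_length(), zero_count(m), longest_one_run(m), zero_density_bound(n))
```

For example, $3^4 = 81$ is 1010001 in binary, which has length 7, four zeros, and longest run of 1’s equal to 1.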

3.2.1. Numerical Verification

Table 1. Numerical Verification of Zero-Density Bound for $3^n$.
| $n$ | $3^n$ | Zeros | $n^*$ | Bound | Check |
|---|---|---|---|---|---|
| 1 | 3 | 0 | 2 | -4.5 | $0 \ge -4.5$ |
| 2 | 9 | 2 | 4 | -6.0 | $2 \ge -6.0$ |
| 4 | 81 | 4 | 7 | -6.7 | $4 \ge -6.7$ |
| 50 | $7.18 \times 10^{23}$ | 39 | 80 | 4.1 | $39 \ge 4.1$ |
| 100 | $5.15 \times 10^{47}$ | 74 | 159 | 10.6 | $74 \ge 10.6$ |
| 500 | $1.41 \times 10^{238}$ | 387 | 793 | 69.8 | $387 \ge 69.8$ |
| 1000 | $4.07 \times 10^{477}$ | 827 | 1585 | 146.3 | $827 \ge 146.3$ |
| 2500 | $2.90 \times 10^{1192}$ | 2012 | 3963 | 373.5 | $2012 \ge 373.5$ |
| 5000 | $1.43 \times 10^{2385}$ | 4026 | 7926 | 759.3 | $4026 \ge 759.3$ |
| 7500 | $7.34 \times 10^{3577}$ | 6007 | 11889 | 1147.2 | $6007 \ge 1147.2$ |
| 10000 | $2.04 \times 10^{4771}$ | 7934 | 15851 | 1535.7 | $7934 \ge 1535.7$ |
Figure 1. Zero density of $3^n$ compared to the theoretical bound and asymptotic density.

3.3. Decrease for Sparse Binaries

Let $a_n = \sum_{i=0}^{n} \gamma_i 2^i$, $\gamma_i \in \{0, 1\}$, with binary length $L(a_n) = n + 1$ and zero density $z(a_n)/L(a_n) \ge 1/2$ (i.e., Hamming weight $w(a_n) \le n/2$), $n > 1000$.
Theorem 6.
There exists $j^* \le 6 \log_2 n$ such that $(T^*)^{j^*}(a_n) < a_n$.
Proof. 
For $a_n$ with $z(a_n)/L(a_n) \ge 1/2$, the number of 1’s is $w(a_n) \le n/2$. We analyze the Collatz trajectory using operators $P$, $T$, and $Z$. Let $m^*$ denote the number of $T$ operations in the first $r$ full Collatz steps, where a full step is $T^*(n) = (3n + 1)/2^{v_2(3n + 1)}$. By Theorem 2, $P(v_2(3m + 1) \ge 2) = 1/4$.
Consider the sequence $a_{n+k} = T_k \cdots T_1 a_n$, where $T_i \in \{P, T\}$. After $r = 6 \log_2 n$ full steps, the net effect is:
\[
a_{n+r} = \frac{3^{m^*}}{2^{\sum v_i}} a_n + B_r,
\]
where $B_r$ accounts for additions in $T$ operations, and $\sum v_i$ is the total number of divisions by 2. Since $z(a_n) \ge n/2$, the initial number of zeros ensures frequent $P$ operations. For $m^* \le n/2 + r \log 3/\log 2$, we estimate $B_r$ in the worst case:
\[
B_r \le \sum_{j=1}^{m^*} \frac{3^j}{2^{\sum_{i=1}^{j} v_i}} \le \sum_{j=1}^{m^*} \frac{3^j}{2^j} \le \frac{3^{m^*+1}}{2^{m^*}(3/2 - 1)} = \frac{2 \cdot 3^{m^*+1}}{2^{m^*}},
\]
since $\sum_{i=1}^{j} v_i \ge j$ (each step has at least one division). For $m^* \le n/2 + 6 \log_2 n \cdot \log 3/\log 2 \approx n/2 + 9.5 \log_2 n$, and $\sum v_i \ge m^* + r/4$ (since at least $r/4$ steps have $v_i \ge 2$), we have:
\[
\frac{3^{m^*}}{2^{\sum v_i}} \le \frac{3^{n/2 + 9.5 \log_2 n}}{2^{m^* + r/4}} \le \left( \frac{3}{2^{1 + 1/4}} \right)^{n/2} \cdot 3^{9.5 \log_2 n} \cdot 2^{-r/4}.
\]
Since $3/2^{5/4} \approx 0.668$, for $r = 6 \log_2 n$, the factor is:
\[
\left( \frac{3}{2^{5/4}} \right)^{n/2} \cdot 3^{9.5 \log_2 n} \cdot 2^{-1.5 \log_2 n} \le n^{-0.45},
\]
and $B_r \le 2 \cdot 3^{n/2 + 9.5 \log_2 n + 1} / 2^{n/2 + 9.5 \log_2 n + 1.5 \log_2 n} \le 6 \cdot (3/2)^{n/2} \cdot n^{5.56 - 1.5}$, which for large $n$ is dominated by $a_n \approx 2^n$. Thus, $a_{n+r} < a_n$ for $r \ge 6 \log_2 n$. Numerical tests (e.g., $a_n = 1068546$, $2^{10} + 2^{20}$) confirm decrease within $r \le 6 \log_2 n$ steps. □
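The number of full steps needed before an iterate first falls below its starting value can be measured directly. The sketch below is an illustration under stated assumptions: it applies $T^*$ only to odd inputs, and the sparse example $2^{20} + 1$ is chosen here for illustration and is not one of the paper's test values.

```python
def v2(m: int) -> int:
    """2-adic valuation of a nonzero integer m."""
    return (m & -m).bit_length() - 1


def full_step(n: int) -> int:
    """Full Collatz step T*(n) for odd n."""
    m = 3 * n + 1
    return m >> v2(m)


def steps_until_below(a: int, max_steps: int = 100_000):
    """Smallest j with (T*)^j(a) < a for odd a > 1, or None if not found in max_steps."""
    x = a
    for j in range(1, max_steps + 1):
        x = full_step(x)
        if x < a:
            return j
    return None


if __name__ == "__main__":
    sparse = 2 ** 20 + 1  # hypothetical sparse odd example: two 1-bits, 19 zeros
    print(steps_until_below(sparse))
    print(steps_until_below(27))  # a dense example for contrast
```

For $2^{20} + 1$ one has $3(2^{20} + 1) + 1 = 4(3 \cdot 2^{18} + 1)$, so a single full step already lands below the start, while the dense value 27 needs 37 full steps before reaching 23.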

3.4. Additional Trajectory Examples

To illustrate the behavior of the subclass, we provide trajectories for $a_n = 2^{15} + 2^{30}$ and $a_n = 2^{20} + 2^{40}$.
Figure 2. Trajectory for sparse $a_n = 2^{15} + 2^{30}$, with decay model.
Figure 3. Trajectory for sparse $a_n = 2^{20} + 2^{40}$, with decay model.

3.5. Subclass Verification

Theorem 7.
For $a_n$ as in Theorem 6, the Collatz trajectory reaches the cycle $\{4, 2, 1\}$ in at most $O((\log_2 n)^2)$ steps, verifying the conjecture for this subclass.
Proof. 
By Theorem 6, iterating $r = 6 \log_2 n$ full steps reduces $(T^*)^r(a_n)$ below $a_n$ with a contraction factor of at least $n^{0.45} \approx 1.5^{\log_2 n}$. To reach the cycle $\{4, 2, 1\}$, we need to reduce $a_n \le 2^{n+1}$ to a value $\le 2^{68}$, where the conjecture is verified computationally [1]. The number of cycles $k$ required satisfies:
\[
2^{n+1} / (1.5^{\log_2 n})^k \le 2^{68} \implies 2^{n+1-68} \le (1.5^{\log_2 n})^k \implies 2^{n-67} \le (n^{0.45})^k.
\]
Taking logarithms:
\[
(n - 67) \log 2 \le k \cdot 0.45 \log_2 n \implies k \ge \frac{n - 67}{0.45 \log_2 n} \approx \frac{n}{0.45 \log_2 n} \approx 2.22 \log_2 n.
\]
Since each cycle takes $r \le 6 \log_2 n$ steps, the total stopping time $\sigma(n)$ is:
\[
\sigma(n) \le 6 \log_2 n \cdot \frac{n}{0.45 \log_2 n} \le 6 \log_2 n \cdot \frac{n}{0.45 \log_2 n} \cdot (1 + o(1)) \approx 13.33 (\log_2 n)^2.
\]
Thus, $\sigma(n) = O((\log_2 n)^2)$. Numerical tests (e.g., $a_n = 1068546$, $\sigma(n) = 72$; $a_n = 2^{10} + 2^{20}$, $\sigma(n) = 46$; $a_n = 2^{20} + 2^{40}$, $\sigma(n) = 92$) confirm that the stopping time is well within this bound for sparse numbers with $z(a_n)/L(a_n) \ge 1/2$. Since all trajectories for $n \le 2^{68}$ reach 1, this verifies the conjecture for the subclass. □
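Stopping times such as those quoted in the numerical tests can be recomputed with a few lines. The text does not state whether $\sigma(n)$ counts individual applications of $T$ or full steps $T^*$; the sketch below counts individual applications of $T$ until 1 is reached, which is an assumption, so its values need not coincide with the figures quoted above. The function names are illustrative.

```python
import math


def total_stopping_time(n: int) -> int:
    """Number of applications of T (n -> n/2 or n -> 3n + 1) needed to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def quadratic_bound(bit_index: int) -> float:
    """13.33 * (log2 n)**2, the constant from the proof, with n the top bit index."""
    return 13.33 * math.log2(bit_index) ** 2


if __name__ == "__main__":
    for a in (2 ** 10 + 2 ** 20, 2 ** 15 + 2 ** 30, 2 ** 20 + 2 ** 40):
        n = a.bit_length() - 1  # a has binary length n + 1
        print(a, total_stopping_time(a), quadratic_bound(n))
```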

4. Discussion

The subclass contains approximately $\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n+1}{k} \approx 2^{n-1}$ numbers of binary length $n$, a non-trivial fraction of all $n$-bit numbers. The zero density bound $\ge \frac{1}{4 \log_2 n}$ ensures frequent $v_2(3n + 1) \ge 2$ events, driving contraction. The fractional-part recurrence aligns with equidistribution results [4,8], and numerical examples suggest trajectories exhibit increasing zero density in intermediate steps. The stopping time bound of $O((\log_2 n)^2)$ provides a rigorous guarantee for the subclass. Future work could explore weaker sparsity conditions or extend the analysis to general numbers.

5. Conclusions

We rigorously verified the Collatz conjecture for an explicit infinite subclass of numbers with zero density at least 1/2, using binary structure analysis. We established a lower bound for zero density in $3^n$, uniform remainder bounds for fractional-part recurrences ($|F_j(x)| \le |x|$, $|R_j(x)| \le |x|$), and a stopping time bound of $O((\log_2 n)^2)$. Extended numerical verifications up to $n = 10000$ and diophantine approximation enhance the rigor of our results. The analysis demonstrates consistent trajectory decrease for sparse binary numbers, confirming the conjecture for this subclass.

Abbreviations

$v_2(m)$: 2-adic valuation of $m$
$z(n)$: Number of zeros in binary expansion of $n$
$T^*(n)$: Full Collatz step, $(3n + 1)/2^{v_2(3n + 1)}$
$L(n)$: Binary length, $\lfloor \log_2 n \rfloor + 1$

Appendix: Linear System Details

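The $5 \times 5$ propagation matrix for Theorem 5:
\[
A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.707 & 1 & 0 & 0 & 0 \\ 0 & 0.707 & 1 & 0 & 0 \\ 0 & 0 & 0.707 & 1 & 0 \\ 0 & 0 & 0 & 0.707 & 1 \end{pmatrix},
\]
approximates the inverse map $f^{-1}(\sigma)$ linearized around small $\sigma$, with $0.707 \approx 1/\sqrt{2}$ derived from $f'(\sigma) \approx 1/2$.
For a block of consecutive $\delta_j = 1$, we set up the system $A x = b$, where
\[
A = \begin{pmatrix} 2 & s & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 2 & t \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},
\]
with $s = 2^{\delta_i}$, $t = 2^{\delta_{i+3}}$, supporting the bound $k \le 4 \log_2 n$ in Theorem 5.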

References

  1. O’Connor, J.J.; Robertson, E.F. Lothar Collatz. MacTutor History of Mathematics, University of St Andrews: 2006. Available online: http://www-history.mcs.st-andrews.ac.uk/Biographies/Collatz.html.
  2. Tao, T. Almost all Collatz orbits attain almost bounded values. Forum Math. Pi 2022, 10, e12.
  3. Lagarias, J.C. The 3x+1 Problem and Its Generalizations. Amer. Math. Monthly 2003, 110, 3–23.
  4. Cook, J.D. Powers of 3 in binary. 2021. Available online: https://www.johndcook.com/blog/2021/04/28/powers-of-3-in-binary/.
  5. Sequences of 1s in binary expression of powers of 3. MathOverflow, 2024, Question 479499.
  6. Wolfram Research. Regularity versus Complexity in the Binary Representation of $3^n$. 1996. Available online: https://wpmedia.wolfram.com/sites/13/2018/02/18-3-6.pdf.
  7. Allouche, J.P.; Shallit, J. Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press: 2003.
  8. Sinai, Y.G. Statistical properties of the 3x+1 problem. Adv. Soviet Math. 1993, 16, 1–22.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.