Preprint
Article

This version is not peer-reviewed.

An Erdős-Révész Type Law for the Length of the Longest Match of Two Coin-Tossing Sequences

A peer-reviewed article of this preprint also exists.

Submitted:

15 November 2024

Posted:

18 November 2024


Abstract
Consider a coin-tossing sequence, i.e., a sequence of independent variables taking values 0 and 1 with probability 1/2. The famous Erdős-Rényi (1970) law of large numbers implies that the longest run of ones in the first $n$ observations has a length $R_n$ that behaves like $\log_2(n)$ as $n$ tends to infinity. Erdős and Révész (1976) refined this result by giving a description of the Lévy upper and lower classes of the process $R_n$. In another direction, Arratia and Waterman (1985) extended the Erdős-Rényi result to the longest matching subsequence (with shifts) of two coin-tossing sequences, finding that it behaves asymptotically like $2\log_2(n)$. The present paper gives some Erdős-Révész-type results in this situation, obtaining a complete description of the upper classes and a partial result on the lower ones.

1. Introduction

Consider a coin-tossing sequence $(X_n)$, i.e., a sequence of independent random variables satisfying $P(X_n = 0) = P(X_n = 1) = 1/2$. Let $R_n$ be the length of the longest head-run, i.e., the largest integer $r$ for which there is an $i$, $0 \le i \le n - r$, with $X_{i+j} = 1$ for $j = 1, \dots, r$. A result of Erdős and Rényi [2] implies that
$$\lim_{n \to \infty} \frac{R_n}{\log(n)} = 1 \tag{1}$$
(throughout this paper, $\log$ will denote base 2 logarithms. The notation $\log_k$ will be used for its iterates: $\log_2(x) = \log(\log(x))$, $\log_{k+1}(x) = \log(\log_k(x))$. Also $C$ and $c$, with or without index, are used to denote generic constants that may have different values at each occurrence). The simple result (1) has seen a number of improvements. Erdős and Révész [3] gave a detailed description of the asymptotic behavior of $R_n$. In order to formulate their result, let us recall
Definition 1
(Lévy classes). Let $(Y_n)$ be a sequence of random variables. We say that a sequence $(a_n)$ of real numbers belongs to
  • The upper-upper class of $(Y_n)$ ($\mathrm{UUC}(Y_n)$), if, with probability 1 as $n \to \infty$, $Y_n \le a_n$ eventually.
  • The upper-lower class of $(Y_n)$ ($\mathrm{ULC}(Y_n)$), if, with probability 1 as $n \to \infty$, $Y_n > a_n$ for infinitely many $n$.
  • The lower-upper class of $(Y_n)$ ($\mathrm{LUC}(Y_n)$), if, with probability 1 as $n \to \infty$, $Y_n < a_n$ for infinitely many $n$.
  • The lower-lower class of $(Y_n)$ ($\mathrm{LLC}(Y_n)$), if, with probability 1 as $n \to \infty$, $Y_n \ge a_n$ eventually.
Of course, these definitions work best if the sequence $(Y_n)$ obeys some zero-one law.
Their result is as follows:
Let $(a_n)$ be a nondecreasing integer sequence. Then
  • $(a_n) \in \mathrm{UUC}(R_n)$ if $\sum_n 2^{-a_n} < \infty$,
  • $(a_n) \in \mathrm{ULC}(R_n)$ if $\sum_n 2^{-a_n} = \infty$,
  • for any $\epsilon > 0$, $a_n = \log(n) - \log_3(n) + \log_2(e) - 1 + \epsilon \in \mathrm{LUC}(R_n)$,
  • for any $\epsilon > 0$, $a_n = \log(n) - \log_3(n) + \log_2(e) - 2 - \epsilon \in \mathrm{LLC}(R_n)$.
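To get a feel for these quantities, the following is a minimal simulation sketch, not part of the original paper. The function names, the seed, the sample size and the choice $\epsilon = 0.1$ are all assumptions made for illustration. It computes $R_n$ for one simulated sequence and evaluates the two lower-class levels from the result above, with all logarithms taken to base 2.

```python
import math
import random

def longest_head_run(bits):
    """Return the length of the longest run of 1s in a 0/1 sequence."""
    best = current = 0
    for b in bits:
        current = current + 1 if b == 1 else 0
        best = max(best, current)
    return best

def lower_class_levels(n, eps=0.1):
    """Evaluate the LLC and LUC levels at n (hypothetical helper, base 2 logs)."""
    log3_n = math.log2(math.log2(math.log2(n)))   # log_3(n), needs n > 4
    log2_e = math.log2(math.log2(math.e))         # log_2(e) = log(log(e))
    base = math.log2(n) - log3_n + log2_e
    return base - 2 - eps, base - 1 + eps         # (LLC level, LUC level)

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n = 1 << 16                     # illustrative sample size
tosses = [rng.randint(0, 1) for _ in range(n)]
r_n = longest_head_run(tosses)
llc_level, luc_level = lower_class_levels(n)
print(r_n, r_n / math.log2(n), llc_level, luc_level)
```

A single sample path only illustrates the orders of magnitude involved; the class statements themselves are almost sure statements about the whole sequence $(R_n)$.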
Arratia and Waterman [1] extend Erdős and Rényi’s result in another direction: they consider two independent coin-tossing sequences $(X_n)$ and $(Y_n)$ and look for the longest matching subsequences when shifting is allowed. Formally, let $M_n$ be the largest integer $m$ for which there are $i, j$ with $0 \le i, j \le n - m$ and $X_{i+k} = Y_{j+k}$ for all $k = 1, \dots, m$. They prove that, with probability 1,
$$\lim_{n \to \infty} \frac{M_n}{\log(n)} = 2.$$
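The definition of the longest match with shifts can be made concrete with a short sketch. This is an illustration under assumptions, not the authors' method: the function name `longest_match`, the seed and the sample size are chosen here, and the quadratic-time dynamic program is just one simple way to evaluate the quantity.

```python
import math
import random

def longest_match(x, y):
    """Largest m such that some block x[i:i+m] equals some block y[j:j+m]."""
    best = 0
    prev = [0] * (len(y) + 1)   # prev[j] = match length ending at x[i-1], y[j-1]
    for xi in x:
        cur = [0] * (len(y) + 1)
        for j, yj in enumerate(y):
            if xi == yj:
                # Extend the match that ended one step earlier in both sequences.
                cur[j + 1] = prev[j] + 1
                if cur[j + 1] > best:
                    best = cur[j + 1]
        prev = cur
    return best

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n = 1000                        # illustrative size; the loop costs O(n^2)
x = [rng.randint(0, 1) for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]
m_n = longest_match(x, y)
print(m_n, m_n / math.log2(n))  # the ratio tends to 2 as n grows
```

For moderate $n$ the ratio can still be visibly away from 2, which is consistent with the lower-order correction terms studied in this paper.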
In the present paper, we will make this more precise by giving a description for the upper classes of $(M_n)$ and also some results on its lower classes:
Theorem 1.
Let $(a_n)$ be a nondecreasing integer sequence. We have
  • $(a_n) \in \mathrm{UUC}(M_n)$ if $\sum_n n 2^{-a_n} < \infty$.
  • $(a_n) \in \mathrm{ULC}(M_n)$ if $\sum_n n 2^{-a_n} = \infty$.
  • for some $c$, $a_n = 2\log(n) - \log_3(n) + c \in \mathrm{LUC}(M_n)$.
  • for some $c$, $a_n = 2\log(n) - \log_2(n) - \log_3(n) + c \in \mathrm{LLC}(M_n)$.
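As a hedged illustration of the two upper-class criteria, consider the following pair of sequences. They are examples chosen here, not taken from the paper; $\epsilon > 0$ is arbitrary, and the rounding is added only to obtain nondecreasing integer sequences.

```latex
% Illustrative sequences (an assumption of this example); \log is base 2, \log_2 = \log\log.
\begin{align*}
a_n &= \lceil 2\log(n) + (1+\epsilon)\log_2(n) \rceil
  &&\Rightarrow\quad n\,2^{-a_n} \le \frac{1}{n\,(\log n)^{1+\epsilon}},
  &&\sum_n n\,2^{-a_n} < \infty, \\
a_n &= \lfloor 2\log(n) + \log_2(n) \rfloor
  &&\Rightarrow\quad n\,2^{-a_n} \ge \frac{1}{n\,\log n},
  &&\sum_n n\,2^{-a_n} = \infty.
\end{align*}
```

By the first two statements of Theorem 1, the first sequence belongs to the upper-upper class of $(M_n)$ and the second to the upper-lower class.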

2. Discussion

We leave the proof of Theorem 1 for later and rather discuss some of the concepts that are connected to this problem. One of them is the so-called independence principle: in many, though not all, situations, one may pretend that the waiting times until a given pattern of length $l$ is observed have an exponential distribution with parameter $2^{-l}$, and that the waiting times for different patterns are independent. Móri [4] and Móri and Székely [6] give an account of this principle and its limitations. In our case, all results but the lower-lower class one are more or less in tune with this principle.
Another closely related question is that of the number $N(n, l)$ of different length $l$ subsequences of $(X_1, \dots, X_n)$. This question does not seem to have received much attention in the literature; one remarkable result by Móri [5] states that for $l = \log(n) - \log_2(n)$ and $n$ large enough, all $2^l$ possible patterns occur as subsequences of $(X_1, \dots, X_n)$. The independence principle would suggest that $N(n, \log(n))/n$ is bounded away from 0 with probability one, and this or even the easier $N(n, \log(n)) \ge n (\log_2(n))^{-c}$ eventually would serve to remove the double log term from the LLC result. Unfortunately, we are only able to get $N(n, \log(n)) \ge c n / \log(n)$, which is also implied by Móri’s result.
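The quantity $N(n, l)$ is easy to compute empirically. The sketch below is an illustration only: the function name, the seed and the sample size are assumptions, and "subsequence" is read as a contiguous block, as in the rest of the paper. It prints the ratio $N(n, \log(n))/n$ and compares the count at the shorter length $\log(n) - \log_2(n)$ with the number of possible patterns.

```python
import math
import random

def count_distinct_blocks(bits, l):
    """N(n, l): number of distinct contiguous length-l blocks in bits."""
    return len({tuple(bits[i:i + l]) for i in range(len(bits) - l + 1)})

rng = random.Random(1)          # fixed seed, chosen arbitrarily
n = 1 << 14                     # illustrative sample size
x = [rng.randint(0, 1) for _ in range(n)]

l_full = int(math.log2(n))                                # l = log(n)
l_short = int(math.log2(n) - math.log2(math.log2(n)))     # l = log(n) - log_2(n)

print(count_distinct_blocks(x, l_full) / n)               # ratio N(n, log n) / n
print(count_distinct_blocks(x, l_short), 2 ** l_short)    # count vs. all patterns
```

Storing the blocks in a set keeps the sketch short; for much longer sequences one would encode each block as an integer with a rolling update instead of slicing.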

3. Proofs

Proof of the upper-upper class result. 
Both upper class statements are fairly easy to prove. First observe that under our assumptions, the convergence of
$$\sum_{n=1}^{\infty} n 2^{-a_n}$$
is equivalent to that of
$$\sum_{k=1}^{\infty} n_k^2 2^{-a_{n_k}}$$
with $n_k = 2^k$.
Now, define events
$$A_k = [M_{n_k} \ge a_{n_{k-1}}].$$
$A_k$ occurs if in one of the $(n_k + 1 - a_{n_{k-1}})^2$ pairs of sequences
$$\big((X_{i+1}, \dots, X_{i+a_{n_{k-1}}}), (Y_{j+1}, \dots, Y_{j+a_{n_{k-1}}})\big)$$
both sequences agree. That gives the trivial upper bound
$$P(A_k) \le n_k^2 2^{-a_{n_{k-1}}},$$
so, by our assumptions, $\sum_k P(A_k) < \infty$, and the Borel-Cantelli lemma implies that, with probability 1, only finitely many events $A_k$ occur. Thus, for sufficiently large $k$, $M_{n_k} \le a_{n_{k-1}}$, and for $n_{k-1} \le n \le n_k$, we have
$$M_n \le M_{n_k} \le a_{n_{k-1}} \le a_n.$$
This shows that $(a_n) \in \mathrm{UUC}(M_n)$, as claimed. □
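The union bound used in this proof can be checked exactly for very short sequences. The following sketch is an illustration under assumptions: the helper names are hypothetical and the length $n = 6$ is chosen only so that all $4^n$ pairs of sequences can be enumerated. It compares the exact probability that a match of length $a$ exists with the number of block pairs times $2^{-a}$.

```python
from itertools import product

def has_match(x, y, a):
    """True if some length-a block of x equals some length-a block of y."""
    blocks_x = {x[i:i + a] for i in range(len(x) - a + 1)}
    return any(y[j:j + a] in blocks_x for j in range(len(y) - a + 1))

def exact_match_probability(n, a):
    """P(M_n >= a), by enumerating all 4^n pairs of 0/1 sequences of length n."""
    seqs = list(product((0, 1), repeat=n))
    hits = sum(has_match(x, y, a) for x in seqs for y in seqs)
    return hits / 4 ** n

def union_bound(n, a):
    """Number of block pairs times the probability that one fixed pair agrees."""
    return (n + 1 - a) ** 2 * 2.0 ** (-a)

for a in range(3, 7):
    print(a, exact_match_probability(6, a), union_bound(6, a))
```

The bound is crude for small $a$, where it can exceed 1, and becomes tight as $a$ approaches $n$, which is the regime the Borel-Cantelli argument relies on.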
Proof of the upper-lower class result. 
We may assume without loss of generality that $n^2 2^{-a_n} \le 1/4$.
Again, let $n_k = 2^k$. We want to use the second Borel-Cantelli lemma, so we define independent events
$$A_k = [\exists\, i, j : n_{k-1} < i, j \le n_k : X_{i+s} = Y_{j+s},\ s = 0, \dots, a_{n_k} - 1].$$
This is the union of the events
$$B_{ij} = [X_{i+s} = Y_{j+s},\ s = 0, \dots, a_{n_k} - 1]$$
with $n_{k-1} < i, j \le n_k$. We endow the set of pairs $(i, j)$ with the lexicographic order. For a subset $I$ of the real numbers, Bonferroni’s inequality gives
$$P\Big(\bigcup_{(i,j) \in I \times I} B_{ij}\Big) \ge \sum_{(i,j) \in I \times I} P(B_{ij}) - \sum_{\substack{(i,j), (i',j') \in I \times I \\ (i,j) < (i',j')}} P(B_{ij} \cap B_{i'j'}). \tag{11}$$
Let $d((i,j),(i',j')) = \max(|i - i'|, |j - j'|)$. If $d((i,j),(i',j')) \ge a_{n_k}$, then $P(B_{ij} \cap B_{i'j'}) = 2^{-2a_{n_k}}$, otherwise $P(B_{ij} \cap B_{i'j'}) = 2^{-(a_{n_k} + d((i,j),(i',j')))}$.
Setting $I = \{ i : n_{k-1} < i \le n_k : 2 \mid i \}$ in (11) yields, after some calculation,
$$P(A_k) \ge \frac{1}{48} n_k^2 2^{-a_{n_k}},$$
and $\sum_k P(A_k) = \infty$. Borel-Cantelli implies that, with probability 1, infinitely many events $A_k$ occur. Thus, for infinitely many $k$, $M_{n_k} \ge a_{n_k}$, so $(a_n) \in \mathrm{ULC}(M_n)$. □
For the lower class results, we first prove some lemmas:
Lemma 1.
For any $c \in (0, 1)$, with probability 1 eventually
$$c\,\frac{n}{\log(n)} \le N(n, \log(n)) \le n.$$
Proof of Lemma 1. 
The lower part is a direct consequence of Móri’s result: this states that, for sufficiently large $n$, $N(n, \log(n) - \log_2(n)) = 2^{\log(n) - \log_2(n)} = \frac{n}{\log(n)}(1 + o(1))$ and, obviously, $N(n, \log(n)) \ge N(n, \log(n) - \log_2(n))$, as extending two different sequences from length $\log(n) - \log_2(n)$ keeps them different; it can only happen that some of them are extended beyond index $n$. □
Lemma 2.
Let $S$ be a set of $m < 2^l$ sequences of length $l < n$, and let $A$ be the event that none of the sequences in $S$ occurs as a subsequence of $(X_1, \dots, X_n)$. Assume $n m l^2 2^{-2l} < \gamma < 1$. Then there are positive constants $C_1$, $C_2$, $c_1$ and $c_2$ (depending on $\gamma$) such that
$$C_1 \exp(-c_1 m n 2^{-l}) \le P(A) \le C_2 \exp(-c_2 m n 2^{-l}).$$
Proof of Lemma 2. 
The assumptions imply that we can find $n'$ such that both $n' m 2^{-l} < 1$ and $\frac{n}{n'} l m 2^{-l} < 1$. The probability that there is a sequence from $S$ in a coin-tossing sequence of length $n'$ can be trivially bounded above by $n' m 2^{-l}$, and a Bonferroni-type argument like the one we have seen before shows that this is bounded below by $c n' m 2^{-l}$ (with $c$ depending on $\gamma$, of course). So, we get an upper bound for the probability of $A$ from the probability that there is no sequence from $S$ in any of the $n/n'$ blocks of length $n'$, amounting to
$$(1 - c n' m 2^{-l})^{n/n'} \le \exp(-c n m 2^{-l}).$$
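The last inequality is the elementary bound $1 - x \le e^{-x}$ applied to each of the $n/n'$ blocks. Spelled out as a short worked step, writing $n'$ for the block length and $c$ for the constant from the Bonferroni-type lower bound:

```latex
% Worked step: 1 - x \le e^{-x} with x = c n' m 2^{-l}, raised to the power n/n'.
\left(1 - c\,n' m\,2^{-l}\right)^{n/n'}
  \le \left(\exp\!\left(-c\,n' m\,2^{-l}\right)\right)^{n/n'}
  = \exp\!\left(-c\,n\,m\,2^{-l}\right).
```

The block length $n'$ cancels in the exponent, which is why the final bound depends only on $m n 2^{-l}$.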
For the lower bound, the probability that there is no sequence from $S$ in any block of length $n'$ cannot directly be used as a lower bound for the probability of $A$, as there may be sequences crossing the border between two blocks. We can fix this by subtracting the sum of the probabilities of a sequence crossing a border multiplied by the product of the probabilities of all blocks except the two adjacent to the border in question. Thus, the lower bound looks like
$$(1 - n' m 2^{-l})^{n/n' - 2} \left( (1 - n' m 2^{-l})^{2} - \frac{n}{n'}\, m l\, 2^{-l} \right).$$
This can be bounded below by
$$C_1 \exp(-c_1 n m 2^{-l})$$
as claimed. □
Proof of the lower-lower class result. 
Combining Lemmas 1 and 2, we get an "almost" upper bound for the probability (more to the point, an upper bound for the conditional probability with respect to the event that the number of different sequences of length $a_n$ among $Y_1, \dots, Y_n$ is at least $c n / \log(n)$) that the longest match is shorter than $a_n$, amounting to
$$C_1 \exp\left(-c\,\frac{n^2}{\log(n)\, 2^{a_n}}\right).$$
Letting $n = 2^k$ again, and substituting $a_n$ as defined in the theorem, choosing the constants appropriately we get an upper bound $O(k^{-\alpha})$ with $\alpha > 1$, giving a convergent series again, so another appeal to Borel-Cantelli finishes this part. □
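The substitution in the last step can be written out as follows. This is a sketch under stated assumptions: integer rounding of $a_n$ is ignored, $c'$ stands for the constant in the exponential bound, and $c$ for the constant in the definition of $a_n$ (both names are introduced here only to keep the two generic constants apart).

```latex
% Sketch only: rounding ignored; \log is base 2, \exp is the natural exponential.
\begin{align*}
n = 2^k,\quad a_n = 2k - \log(k) - \log_2(k) + c
  &\;\Rightarrow\; 2^{a_n} = \frac{2^{c}\, n^2}{k \log(k)}, \\
\frac{n^2}{\log(n)\, 2^{a_n}}
  &= \frac{n^2}{k} \cdot \frac{k \log(k)}{2^{c}\, n^2} = 2^{-c} \log(k), \\
\exp\!\left(-c'\, 2^{-c} \log(k)\right)
  &= k^{-\alpha}, \qquad \alpha = c'\, 2^{-c} \log(e).
\end{align*}
```

Hence $\alpha > 1$ as soon as $c$ is chosen sufficiently negative, which is what "choosing the constants appropriately" amounts to in this reading.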
Proof of the lower-upper class result. 
The main obstacle in this proof is the need to find independent or almost independent events that can be fed into an appropriate Borel-Cantelli lemma. We go for conditional independence. With an appropriate constant $C$, we start with a sufficiently large $\tau_0$ and define $\tau_k$ as the minimum of $\tau_{k-1} + C 2^k \log(k)$ and the smallest $n$ for which there is a match of length $k$ between $X_1, \dots, X_n$ and $Y_{\tau_{k-1}+1}, \dots, Y_n$ or between $Y_1, \dots, Y_n$ and $X_{\tau_{k-1}+1}, \dots, X_n$. We let $A_k$ denote the event that $\tau_k - \tau_{k-1} > \delta 2^k \log(k)$. By our construction, $\tau_k = O(2^k \log(k))$, and we can use this upper bound as $m$ in Lemma 2. This gives a lower bound $\exp(-c \delta \log(k))$ for $P(A_k)$. By Borel-Cantelli, we conclude that, with probability one, infinitely many of the events $A_k$ occur. This puts us almost at our goal; the only possible problem is that there may be a match of length $k$ crossing one of the boundaries $\tau_k$. The probability for this is easily bounded above by $O(k 2^{-k} \log(k))$, and another Borel-Cantelli argument shows that eventually this does not happen. □

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Arratia, R.; Waterman, M.S. An Erdős-Rényi law with shifts. Adv. Math. 1985, 55, 13–23.
  2. Erdős, P.; Rényi, A. On a new law of large numbers. J. Analyse Math. 1970, 23, 103–111.
  3. Erdős, P.; Révész, P. On the length of the longest head-run. Coll. Math. Soc. J. Bolyai: Topics in Information Theory; Csiszár, I., Elias, P., Eds.; 1976, 23, 219–228.
  4. Móri, T. Large deviation results for waiting times in repeated experiments. Acta Math. Hung. 1985, 45, 213–221.
  5. Móri, T. On the waiting time till each of some given patterns occurs as a run. Probab. Th. Rel. Fields 1991, 87, 313–323.
  6. Móri, T.; Székely, G. Asymptotic independence of pure head stopping times. Stat. Probabil. Lett. 1984, 2, 5–8.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.