An Erdős-Révész Type Law for the Length of the Longest Match of Two Coin-Tossing Sequences

Karl Grill

doi:10.20944/preprints202411.1254.v1

Submitted: 15 November 2024

Posted: 18 November 2024


Abstract

Consider a coin-tossing sequence, i.e., a sequence of independent variables taking values 0 and 1 with probability 1/2. The famous Erdős-Rényi (1970) law of large numbers implies that the longest run of ones in the first n observations has a length Rn that behaves like log2(n) as n tends to infinity. Erdős and Révész (1976) refined this result by giving a description of the Lévy upper and lower classes of the process Rn. In another direction, Arratia and Waterman (1985) extended the Erdős-Rényi result to the longest matching subsequence (with shifts) of two coin-tossing sequences, finding that it behaves asymptotically like 2log2(n). The present paper gives some Erdős-Révész-type results in this situation, obtaining a complete description of the upper classes and a partial result on the lower ones.

Keywords: coin-tossing; runs; matching subsequences; strong asymptotics

Subject: Computer Science and Mathematics - Probability and Statistics

MSC: 60F15

1. Introduction

Consider a coin-tossing sequence $(X_{n})$, i.e., a sequence of independent random variables satisfying $P(X_{n} = 0) = P(X_{n} = 1) = 1/2$. Let $R_{n}$ be the length of the longest head-run, i.e., the largest integer $r$ for which there is an $i$, $0 \leq i \leq n - r$, for which $X_{i + j} = 1$ for $j = 1, \dots, r$. A result of Erdős and Rényi [2] implies that

$$\lim_{n \to \infty} \frac{R_{n}}{\log(n)} = 1 \tag{1}$$

(throughout this paper, $\log$ will denote base 2 logarithms. The notation $\log_{k}$ will be used for its iterates: $\log_{2}(x) = \log(\log(x))$, $\log_{k + 1}(x) = \log(\log_{k}(x))$. Also, $C$ and $c$, with or without index, are used to denote generic constants that may have different values at each occurrence). The simple result (1) has seen a number of improvements. Erdős and Révész [3] gave a detailed description of the asymptotic behavior of $R_{n}$. In order to formulate their result, let us recall

Definition 1 (Lévy classes). Let $(Y_{n})$ be a sequence of random variables. We say that a sequence $(a_{n})$ of real numbers belongs to

The upper-upper class of $(Y_{n})$ ($UUC(Y_{n})$) if, with probability 1 as $n \to \infty$, $Y_{n} \leq a_{n}$ eventually.
The upper-lower class of $(Y_{n})$ ($ULC(Y_{n})$) if, with probability 1 as $n \to \infty$, $Y_{n} > a_{n}$ for infinitely many $n$.
The lower-upper class of $(Y_{n})$ ($LUC(Y_{n})$) if, with probability 1 as $n \to \infty$, $Y_{n} < a_{n}$ for infinitely many $n$.
The lower-lower class of $(Y_{n})$ ($LLC(Y_{n})$) if, with probability 1 as $n \to \infty$, $Y_{n} \geq a_{n}$ eventually.

Of course, these definitions work best if the sequence $(Y_{n})$ obeys some zero-one law.

Their result is as follows:

Let $(a_{n})$ be a nondecreasing integer sequence. Then

$(a_{n}) \in UUC(R_{n})$ if $\sum_{n} 2^{-a_{n}} < \infty$,
$(a_{n}) \in ULC(R_{n})$ if $\sum_{n} 2^{-a_{n}} = \infty$,
for any $\varepsilon > 0$, $a_{n} = \lfloor \log(n) - \log_{3}(n) + \log_{2}(e) - 1 + \varepsilon \rfloor \in LUC(R_{n})$,
for any $\varepsilon > 0$, $a_{n} = \lfloor \log(n) - \log_{3}(n) + \log_{2}(e) - 2 - \varepsilon \rfloor \in LLC(R_{n})$.
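To make the quantity $R_{n}$ concrete, the following is a minimal Python sketch that computes the longest head-run of a simulated coin-tossing sequence and prints it next to $\log(n)$ (base 2). The function name `longest_head_run`, the seed, and the sample sizes are illustrative choices and do not come from the paper.

```python
import math
import random


def longest_head_run(bits):
    """Length of the longest run of consecutive 1s in a 0/1 sequence."""
    best = current = 0
    for b in bits:
        # Extend the current run on a head (1), reset it on a tail (0).
        current = current + 1 if b == 1 else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only for reproducibility
    for n in (2**10, 2**14, 2**18):
        tosses = [rng.randint(0, 1) for _ in range(n)]
        # Columns: n, observed R_n, log2(n)
        print(n, longest_head_run(tosses), round(math.log2(n), 2))
```

A single simulated path only illustrates the order of magnitude; it says nothing about the almost sure fluctuations described by the four classes.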

Arratia and Waterman [1] extend Erdős and Rényi’s result in another direction: they consider two independent coin-tossing sequences $(X_{n})$ and $(Y_{n})$ and look for the longest matching subsequences when shifting is allowed. Formally, let $M_{n}$ be the largest integer $m$ for which there are $i, j$ with $0 \leq i, j \leq n - m$ and $X_{i + k} = Y_{j + k}$ for all $k = 1, \dots, m$. They prove that, with probability 1,

$$\lim_{n \to \infty} \frac{M_{n}}{\log(n)} = 2. \tag{2}$$
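The definition of the longest match with shifts coincides with the longest common contiguous block of the two sequences, so it can be computed by a standard dynamic program. The sketch below is one way to do this in Python; the function name `longest_match`, the seed, and the value of $n$ are assumptions made for illustration only.

```python
import math
import random


def longest_match(x, y):
    """Largest m such that x[i:i+m] == y[j:j+m] for some shifts i, j."""
    best = 0
    # prev[j] = length of the match ending at the previous x position and y[j-1]
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0] * (len(y) + 1)
        for j, yj in enumerate(y):
            if xi == yj:
                # A match ending here extends the match ending one step earlier
                # in both sequences.
                cur[j + 1] = prev[j] + 1
                if cur[j + 1] > best:
                    best = cur[j + 1]
        prev = cur
    return best


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, only for reproducibility
    n = 512
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    # Columns: n, observed M_n, 2*log2(n)
    print(n, longest_match(x, y), round(2 * math.log2(n), 2))
```

The running time is quadratic in $n$, which is adequate for small experiments; suffix-based methods would be preferable for long sequences.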

In the present paper, we will make this more precise by giving a description of the upper classes of $(M_{n})$ and also some results on its lower classes:

Theorem 1.

Let $(a_{n})$ be a nondecreasing integer sequence. We have

$(a_{n}) \in UUC(M_{n})$ if $\sum_{n} n 2^{-a_{n}} < \infty$.
$(a_{n}) \in ULC(M_{n})$ if $\sum_{n} n 2^{-a_{n}} = \infty$.
for some $c$, $a_{n} = \lfloor 2 \log(n) - \log_{3}(n) + c \rfloor \in LUC(M_{n})$.
for some $c$, $a_{n} = \lfloor 2 \log(n) - \log_{2}(n) - \log_{3}(n) + c \rfloor \in LLC(M_{n})$.
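As an illustration of the two upper-class criteria, one may test them on the family $a_{n} = \lfloor 2 \log(n) + \gamma \log_{2}(n) \rfloor$. The parameter $\gamma > 0$ is introduced here only for the example and does not appear in the paper. Using $x - 1 < \lfloor x \rfloor \leq x$:

```latex
% gamma > 0 is an illustrative parameter; log is base 2 and \log_2 is its second iterate.
\begin{align*}
2^{-(2\log(n) + \gamma \log_{2}(n))} \;\le\; 2^{-a_{n}} \;&<\; 2 \cdot 2^{-(2\log(n) + \gamma \log_{2}(n))}, \\
\frac{1}{n\,(\log(n))^{\gamma}} \;\le\; n\,2^{-a_{n}} \;&<\; \frac{2}{n\,(\log(n))^{\gamma}} .
\end{align*}
```

Since $\sum_{n} 1/(n (\log(n))^{\gamma})$ converges exactly when $\gamma > 1$, the first two parts of Theorem 1 would place this sequence in $UUC(M_{n})$ for $\gamma > 1$ and in $ULC(M_{n})$ for $0 < \gamma \leq 1$.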

2. Discussion

We leave the proof of Theorem 1 for later and rather discuss some of the concepts that are connected to this problem. One of them is the so-called independence principle: in many, though not all, situations, one may pretend that the waiting times until a given pattern of length $l$ is observed have an exponential distribution with parameter $2^{-l}$, and that the waiting times for different patterns are independent. Móri [4] and Móri and Székely [6] give an account of this principle and its limitations. In our case, all results but the lower-lower class one are more or less in tune with this principle.

Another closely related question is that of the number $N(n, l)$ of different length-$l$ subsequences of $(X_{1}, \dots, X_{n})$. This question does not seem to have received much attention in the literature; one remarkable result by Móri [5] states that for $l = \lceil \log(n) - \log_{2}(n) \rceil$ and $n$ large enough, all $2^{l}$ possible patterns occur as subsequences of $(X_{1}, \dots, X_{n})$. The independence principle would suggest that $N(n, \log(n)) / n$ is bounded away from 0 with probability one, and this, or even the easier $N(n, \log(n)) \geq n (\log_{2}(n))^{-c}$ eventually, would serve to remove the double log term from the $LLC$ result. Unfortunately, we are only able to get $N(n, \log(n)) \geq c n / \log(n)$, which is also implied by Móri’s result.
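The quantity $N(n, l)$ is easy to explore numerically. The Python sketch below counts distinct blocks of length $l$, reading "subsequence" as a contiguous block, which is an interpretation consistent with the definitions of runs and matches above. The name `count_distinct_blocks`, the seed, and the parameters are illustrative assumptions.

```python
import random


def count_distinct_blocks(bits, l):
    """N(n, l): number of distinct contiguous blocks of length l in bits."""
    # Each block is stored as a tuple so that it can be placed in a set.
    return len({tuple(bits[i:i + l]) for i in range(len(bits) - l + 1)})


if __name__ == "__main__":
    rng = random.Random(2)  # fixed seed, only for reproducibility
    k = 12
    n = 2**k
    tosses = [rng.randint(0, 1) for _ in range(n)]
    count = count_distinct_blocks(tosses, k)  # l = log2(n)
    # Columns: n, N(n, log n), N(n, log n) / n
    print(n, count, round(count / n, 3))
```

Such an experiment can only suggest whether $N(n, \log(n)) / n$ stays away from 0 for moderate $n$; it does not replace the almost sure statement that the discussion asks for.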

3. Proofs

Proof of the upper-upper class result.

Both upper class statements are fairly easy to prove. First observe that under our assumptions, the convergence of

$$\sum_{n = 1}^{\infty} n 2^{-a_{n}} \tag{3}$$

is equivalent to that of

$$\sum_{k = 1}^{\infty} n_{k}^{2} 2^{-a_{n_{k}}} \tag{4}$$

with $n_{k} = 2^{k}$.
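The equivalence of (3) and (4) follows from a condensation argument. A short sketch, using only that $(a_{n})$ is nondecreasing and that the block $n_{k-1} < n \leq n_{k}$ contains $n_{k-1}$ terms:

```latex
\begin{align*}
\sum_{n = n_{k-1}+1}^{n_{k}} n\,2^{-a_{n}}
  &\le n_{k-1} \cdot n_{k}\, 2^{-a_{n_{k-1}}} = 2\, n_{k-1}^{2}\, 2^{-a_{n_{k-1}}}, \\
\sum_{n = n_{k-1}+1}^{n_{k}} n\,2^{-a_{n}}
  &\ge n_{k-1} \cdot n_{k-1}\, 2^{-a_{n_{k}}} = \tfrac{1}{4}\, n_{k}^{2}\, 2^{-a_{n_{k}}} .
\end{align*}
```

Summing over $k$ bounds the series (3) above and below by constant multiples of the series (4), so both converge or both diverge.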

Now, define events

$$A_{k} = [M_{n_{k}} \geq a_{n_{k - 1}}]. \tag{5}$$

$A_{k}$ occurs if, in one of the $(n_{k} + 1 - a_{n_{k - 1}})^{2}$ pairs of sequences

$$\bigl((X_{i + 1}, \dots, X_{i + a_{n_{k - 1}}}), (Y_{j + 1}, \dots, Y_{j + a_{n_{k - 1}}})\bigr), \tag{6}$$

both sequences agree. That gives the trivial upper bound

$$P(A_{k}) \leq n_{k}^{2} 2^{-a_{n_{k - 1}}}, \tag{7}$$

so, by our assumptions, $\sum_{k} P(A_{k}) < \infty$, and the Borel-Cantelli lemma implies that, with probability 1, only finitely many events $A_{k}$ occur. Thus, for sufficiently large $k$, $M_{n_{k}} \leq a_{n_{k - 1}}$, and for $n_{k - 1} \leq n \leq n_{k}$, we have

$$M_{n} \leq M_{n_{k}} \leq a_{n_{k - 1}} \leq a_{n}. \tag{8}$$

This shows that $(a_{n}) \in UUC(M_{n})$, as claimed. □
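The union bound behind (7) can be compared with a simulated match probability. The sketch below is a rough Monte Carlo check under illustrative parameters; the helper names `has_match`, `union_bound`, and `estimate_match_probability`, as well as the values of $n$, $a$, and the number of trials, are assumptions and not part of the paper.

```python
import random


def has_match(x, y, a):
    """True if some length-a block of x equals some length-a block of y."""
    blocks_x = {tuple(x[i:i + a]) for i in range(len(x) - a + 1)}
    return any(tuple(y[j:j + a]) in blocks_x for j in range(len(y) - a + 1))


def union_bound(n, a):
    """Number of index pairs times the probability 2**-a that one pair agrees."""
    return (n + 1 - a) ** 2 * 2.0 ** (-a)


def estimate_match_probability(n, a, trials, rng):
    """Fraction of simulated pairs of sequences with a match of length a."""
    hits = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        hits += has_match(x, y, a)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(3)  # fixed seed, only for reproducibility
    n, a, trials = 64, 14, 200
    # Columns: simulated frequency, union bound
    print(estimate_match_probability(n, a, trials, rng), union_bound(n, a))
```

With so few trials the estimate is noisy; the point is only that the bound counts every pair of starting positions once and ignores overlaps between them.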

Proof of the upper-lower class result.

We may assume without loss of generality that $n^{2} 2^{-a_{n}} \leq 1/4$.

Again, let $n_{k} = 2^{k}$. We want to use the second Borel-Cantelli lemma, so we define independent events

$$A_{k} = [\exists\, i, j : n_{k - 1} < i, j \leq n_{k} : X_{i + s} = Y_{j + s},\ s = 0, \dots, a_{n_{k}} - 1]. \tag{9}$$

This is the union of the events

$$B_{i j} = [X_{i + s} = Y_{j + s},\ s = 0, \dots, a_{n_{k}} - 1] \tag{10}$$

with $n_{k - 1} < i, j \leq n_{k}$. We endow the set of pairs $(i, j)$ with the lexicographic order. For a subset $I$ of the real numbers, Bonferroni’s inequality gives

$$P\Bigl(\bigcup_{(i, j) \in I \times I} B_{i j}\Bigr) \geq \sum_{(i, j) \in I \times I} P(B_{i j}) - \sum_{\substack{(i, j), (i', j') \in I \times I \\ (i, j) < (i', j')}} P(B_{i j} \cap B_{i' j'}). \tag{11}$$

Let $d((i, j), (i', j')) = \max(|i - i'|, |j - j'|)$. If $d((i, j), (i', j')) \geq a_{n_{k}}$, then $P(B_{i j} \cap B_{i' j'}) = 2^{-2 a_{n_{k}}}$, otherwise $P(B_{i j} \cap B_{i' j'}) = 2^{-(a_{n_{k}} + d((i, j), (i', j')))}$.

Setting $I = \{i : n_{k - 1} < i \leq n_{k} : 2 \mid c\}$ in (11) yields, after some calculation,

$$P(A_{k}) \geq \frac{1}{48} n_{k}^{2} 2^{-a_{n_{k}}}, \tag{12}$$

and $\sum_{k} P(A_{k}) = \infty$. Borel-Cantelli implies that, with probability 1, infinitely many events $A_{k}$ occur. Thus, for infinitely many $k$, $M_{n_{k}} \geq a_{n_{k}}$, so $(a_{n}) \in ULC(M_{n})$. □

For the lower class results, we first prove some lemmas:

Lemma 1.

For any $c \in (0, 1)$, with probability 1 eventually

$$c \frac{n}{\log(n)} \leq N(n, \log(n)) \leq n. \tag{13}$$

Proof of Lemma 1.

The lower part is a direct consequence of Móri’s result: this states that, for sufficiently large $n$,

$$N(n, \lfloor \log(n) - \log_{2}(n) \rfloor) = 2^{\lfloor \log(n) - \log_{2}(n) \rfloor} = \frac{n}{\log(n)} (1 + o(1))$$

and, obviously, $N(n, \log(n)) \geq N(n, \log(n) - \log_{2}(n))$, as extending two different sequences from length $\log(n) - \log_{2}(n)$ keeps them different; it can only happen that some of them are extended beyond index $n$. □

Lemma 2.

Let $S$ be a set of $m < 2^{l}$ sequences of length $l < n$, and let $A$ be the event that none of the sequences in $S$ occurs as a subsequence of $(X_{1}, \dots, X_{n})$. Assume $n m l^{2} 2^{-2 l} < \gamma < 1$. Then there are positive constants $C_{1}$, $C_{2}$, $c_{1}$ and $c_{2}$ (depending on $\gamma$) such that

$$C_{1} \exp(-c_{1} m n 2^{-l}) \leq P(A) \leq C_{2} \exp(-c_{2} m n 2^{-l}). \tag{14}$$

Proof of Lemma 2.

The assumptions imply that we can find $n'$ such that both $n' m 2^{-l} < 1$ and $\frac{n}{n'} l m 2^{-l} < 1$. The probability that there is a sequence from $S$ in a coin-tossing sequence of length $n'$ can be trivially bounded above by $n' m 2^{-l}$, and a Bonferroni-type argument like the one we have seen before shows that this is bounded below by $c n' m 2^{-l}$ (with $c$ depending on $\gamma$, of course). So, we get an upper bound for the probability of $A$ from the probability that there is no sequence from $S$ in any of the $n / n'$ blocks of length $n'$, amounting to

$$(1 - c n' m 2^{-l})^{n / n'} \leq \exp(-c n m 2^{-l}). \tag{15}$$
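The inequality in (15) is an instance of the elementary bound $1 - x \leq e^{-x}$. A short sketch, writing $x$ for the per-block lower bound (the symbol $x$ is introduced here only for the computation) and treating $n / n'$ as an integer for simplicity:

```latex
% x = c n' m 2^{-l} lies in [0, 1) by the choice of n'
\begin{align*}
(1 - x)^{n / n'} \;\le\; \left(e^{-x}\right)^{n / n'}
  \;=\; \exp\!\left(-x\,\frac{n}{n'}\right)
  \;=\; \exp\!\left(-c\, n\, m\, 2^{-l}\right),
\qquad x = c\, n'\, m\, 2^{-l} .
\end{align*}
```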

For the lower bound, the probability that there is no sequence from $S$ in any block of length $n'$ cannot directly be used as a lower bound for the probability of $A$, as there may be sequences crossing the border between two blocks. We can fix this by subtracting the sum of the probabilities of a sequence crossing a border multiplied by the product of the probabilities of all blocks except the two adjacent to the border in question. Thus, the lower bound looks like

$$(1 - n' m 2^{-l})^{n / n' - 2} \Bigl((1 - n' m 2^{-l})^{2} - \frac{n}{n'} m l 2^{-l}\Bigr). \tag{16}$$

This can be bounded below by

$$C_{1} \exp(-c_{1} n m 2^{-l}) \tag{17}$$

as claimed. □

Proof of the lower-lower class result.

Combining Lemmas 1 and 2, we get an "almost" upper bound for the probability (more to the point, an upper bound for the conditional probability with respect to the event that the number of different sequences of length $a_{n}$ among $Y_{1}, \dots, Y_{n}$ is at least $c n / \log(n)$) that the longest match is shorter than $a_{n}$, amounting to

$$C_{1} \exp\Bigl(-c \frac{n^{2}}{\log(n)} 2^{-a_{n}}\Bigr). \tag{18}$$

Letting $n = 2^{k}$ again, substituting $a_{n}$ as defined in the theorem, and choosing the constants appropriately, we get an upper bound $O(k^{-\alpha})$ with $\alpha > 1$, giving a convergent series again, so another appeal to Borel-Cantelli finishes this part. □
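The substitution in the last step can be spelled out as follows. To avoid a clash with the generic constant $c$ in (18), the constant from the theorem is written $c_{0}$ here; this renaming is made only for the computation.

```latex
% c_0: the constant from Theorem 1 (renamed); c: the generic constant in (18); n = 2^k.
\begin{align*}
a_{n} &\le 2\log(n) - \log_{2}(n) - \log_{3}(n) + c_{0}, \\
\frac{n^{2}}{\log(n)}\, 2^{-a_{n}}
  &\ge \frac{n^{2}}{\log(n)} \cdot \frac{\log(n)\,\log_{2}(n)}{n^{2}}\, 2^{-c_{0}}
  = 2^{-c_{0}} \log_{2}(n) = 2^{-c_{0}} \log(k), \\
C_{1} \exp\!\Bigl(-c\, \frac{n^{2}}{\log(n)}\, 2^{-a_{n}}\Bigr)
  &\le C_{1} \exp\!\bigl(-c\, 2^{-c_{0}} \log(k)\bigr)
  = C_{1}\, k^{-\alpha}, \qquad \alpha = \frac{c\, 2^{-c_{0}}}{\ln 2} .
\end{align*}
```

Choosing $c_{0}$ sufficiently negative makes $\alpha > 1$, which is the convergence needed for the Borel-Cantelli lemma.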

Proof of the lower-upper class result.

The main obstacle in this proof is the need to find independent or almost independent events that can be fed into an appropriate Borel-Cantelli lemma. We go for conditional independence. With an appropriate constant $C$, we start with a sufficiently large $\tau_{0}$ and define $\tau_{k}$ as the minimum of $\tau_{k - 1} + C \sqrt{2^{k} \log(k)}$ and the smallest $n$ for which there is a match of length $k$ between $X_{1}, \dots, X_{n}$ and $Y_{\tau_{k - 1} + 1}, \dots, Y_{n}$ or between $Y_{1}, \dots, Y_{n}$ and $X_{\tau_{k - 1} + 1}, \dots, X_{n}$. We let $A_{k}$ denote the event that $\tau_{k} - \tau_{k - 1} > \delta \sqrt{2^{k} \log(k)}$. By our construction, $\tau_{k} = O(\sqrt{2^{k} \log(k)})$, and we can use this upper bound as $m$ in Lemma 2. This gives a lower bound $\exp(-c \delta \log(k))$ for $P(A_{k})$. By Borel-Cantelli, we conclude that, with probability one, infinitely many of the events $A_{k}$ occur. This puts us almost at our goal; the only possible problem is that there may be a match of length $k$ crossing one of the boundaries $\tau_{k}$. The probability for this is easily bounded above by $O(k \sqrt{2^{-k} \log(k)})$, and another Borel-Cantelli argument shows that eventually this does not happen. □

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Arratia, R.; Waterman, M.S. An Erdős-Rényi law with shifts. Adv. Math. 1985, 55, 13–23.
[2] Erdős, P.; Rényi, A. On a new law of large numbers. J. Analyse Math. 1970, 23, 103–111.
[3] Erdős, P.; Révész, P. On the length of the longest head-run. Coll. Math. Soc. J. Bolyai: Topics in Information Theory; Csiszár, I., Elias, P., Eds. 1976, 23, 219–228.
[4] Móri, T. Large deviation results for waiting times in repeated experiments. Acta Math. Hung. 1985, 45, 213–221.
[5] Móri, T. On the waiting time till each of some given patterns occurs as a run. Probab. Th. Rel. Fields 1991, 87, 313–323.
[6] Móri, T.; Székely, G. Asymptotic independence of pure head stopping times. Stat. Probabil. Lett. 1984, 2, 5–8.


© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

