Possibly Wrong — GeistHaus

Comments on: What Every Experimenter Must Know About Randomization

possiblywrong May 18, 2026

Introduction T. Kelly writes in [1] about “What Every Experimenter Must Know About Randomization.” In a running example scenario throughout the paper, Kelly describes a hypothetical poultry farmer who, wondering whether exposing his turkeys to television will fatten them up, … Continue reading →

Show full content

Introduction

T. Kelly writes in [1] about “What Every Experimenter Must Know About Randomization.” In a running example scenario throughout the paper, Kelly describes a hypothetical poultry farmer who, wondering whether exposing his turkeys to television will fatten them up, conducts a test using $N$ turkey subjects, from which $N/2$ are randomly selected to receive the TV treatment, with the remaining $N/2$ control turkeys getting no screen time.

How should the farmer randomize the assignment of turkeys to the treatment and control groups? The main point of the paper– very subtly emphasized in underlined, bold italics– is to “never use a PRNG [pseudo-random number generator] for random assignment.“

I don’t really agree with this guidance at all. I think the arguments presented in its favor are flawed, or at least potentially misleading, on practical, mathematical, and even philosophical grounds. But the paper does present at least one very interesting and subtle observation that I think deserves highlighting and clarification.

Balls into bins

To make the case against using a pseudo-random number generator, Kelly provides a C program, balls_into_bins.c, that simulates the random partition of $N=32$ turkeys into two groups of 16, with one group receiving the treatment and the other being the control group. If we start with all 32 turkeys arranged in a fixed initial order, then uniformly randomly shuffle them, we can designate the leftmost 16 turkeys in the treatment group, and the rightmost 16 turkeys in the control group.

(We don’t actually have to shuffle all 32 turkeys; once we have selected the treatment group– in a uniformly random order– we don’t need to continue rearranging the control turkeys. That is, Kelly effectively implements the first 16 iterations of the mirror_shuffle version of the Fisher-Yates shuffle.)

Kelly’s program then iterates over all $2^{32}$ possible seeds of the C standard library random number generator via srand(seed), executing this simulated random assignment once for each seed. The idea is to show the distribution of possible assignments, assuming that our farmer conducts the experiment by selecting a “true random” value used as the seed for the PRNG to then (pseudo)randomly shuffle the turkeys to select an assignment.

For a 32-bit seed and $N=32$ turkeys, these numbers are small enough that we can manage both the time to simulate each seed (i.e., each “ball”), as well as the memory to track all ${32 \choose 16}=601,080,390$ possible treatment assignments (i.e., “bins”), recording how many times each assignment is selected (i.e., how many balls end up in each bin). Kelly’s results are shown in red in the figure below, indicating the number of bins containing a given number of balls.

For example, the red point at (28,1) indicates that there was a single assignment that was generated 28 times by each of 28 different random seeds. The red point at (0,474467), on the other hand, indicates that 474,467 assignments were never generated, never appearing even once among all $2^{32}$ possible seeded treatment selections. As Kelly writes:

The resulting distribution of balls into bins will be far from uniform: Some bins end up with more balls than others, and some bins end up empty— even, perhaps surprisingly, when the number of balls is considerably larger than the number of bins.

That is, there are $2^{32}/{32 \choose 16} \approx 7.1$ times as many balls as bins, so that the presumably ideal behavior would instead be to distribute the balls as evenly as possible, so that every one of the possible assignments would be selected 7 or 8 times.

Comparing generators

There is a useful point here, which we’ll get to shortly. But first, it wasn’t clear to me whether Kelly is arguing that the non-uniformity illustrated above is because we are using a PRNG. That is certainly not the case… but at the same time, some PRNGs are better than others, and at first glance it seemed like a rather disingenuous straw man to use the one PRNG that most everyone agrees shouldn’t be used “for serious random number generation needs,” namely the one in the C standard library.

For example, how do things look if we try this experiment on Windows, using the MSVCRT/UCRT implementation of rand()? This took some work, since Kelly’s code does not compile due to lines like this:

static_assert(RAND_MAX == INT_MAX); // would be very weird otherwise

This assertion is, well, weird. Not only is there no language standard guarantee that these constants be equal– and on Windows, they aren’t– neither is there anything in the code that depends on this assertion, so it’s unclear why it’s there in the first place.

The program also uses the stdc_count_ones function defined in the stdbit.h header, which is new enough that it may not be available depending on how up to date your environment is, on either Linux or Windows. But here as well, we don’t really need this function, or even an equivalent replacement, to get the job done.

So to get things working, I wrote balls_into_bins.cpp (code is on GitHub, where I’ve included Kelly’s original balls_into_bins.c for side-by-side comparison), with the same behavior, but which (1) works on both Linux and Windows, and (2) allows us to select which of several random number generation approaches we want to evaluate:

The Linux GLIBC pseudo-random generator that reproduces Kelly’s original results exactly.
The Windows MSVCRT/UCRT pseudo-random generator.
The “true” random RDSEED hardware-based generator using the _rdseed64_step function.

The hardware random number generator is understandably much slower; on my laptop it produces only a little over half a million 64-bit words per second. But eventually we see the output shown in blue above, with “non-uniform” behavior very similar to the GLIBC PRNG.

In other words, we should expect to see some treatment assignments selected more often than others. And we should expect to see some assignments not selected at all; with $s=2^{32}$ seeds (“balls”) and $r={32 \choose 16}$ possible assignments (“bins”), the expected number of empty bins is

$r(1-\frac{1}{r})^s \approx r e^{-\frac{s}{r}} \approx 473936$

which matches pretty closely with the 474,467 empty bins using the GLIBC generator, and the 473,435 empty bins from my particular output using RDSEED.

However, there is clearly something different going on with the Windows UCRT implementation, shown in green above. One interesting “feature” of the UCRT behavior is that we only see even numbers. That is, each assignment (“bin”) is always randomly generated an even number of times– possibly zero times, or twice, or four times, etc., but never exactly once, nor exactly thrice, etc. (Obligatory puzzle: why?)

Repeatability

So what, exactly, is the point being made here? Certainly a poor PRNG like the Windows UCRT generator can cause unintended bias. But the GLIBC PRNG is better behaved, at least from the perspective of the balls-into-bins experiment described here. For example, for both the GLIBC output as well as the “true” random RDSEED output, a chi-squared test would fail to reject the null hypothesis that the selected assignments constitute a truly (uniformly) random sample.

However, there is a key difference between the two generators, namely their repeatability. That is, although it’s true that the RDSEED hardware random generator “misses” nearly half a million possible turkey treatment assignments in the above output, if we were to run the program again with the RDSEED generator, we would once again fail to select roughly half a million assignments… but it would be a different subset of missed assignments. But with the GLIBC PRNG, it is the same persistent subset of exactly 474,467 specific assignments that will never be selected, no matter what seed we use, no matter how many times we run the program. For any one of those persistently-missed assignments, the probability that the farmer selects it for his experiment should ideally be $1/{32 \choose 16}$ , but instead it’s zero.

“True” vs. pseudo-random numbers

I think it’s an interesting question when, and just how much, this matters in any practical sense for many real experimental scenarios. There are certainly scenarios where it can matter— but in this running example of the farmer experimenting on his turkeys, it’s less clear to me what actual problem might arise from what effectively amounts to bootstrapping from a single prior uniformly sampled multiset from the population of treatment assignments.

But if we did want to “fix” our farmer’s methodology, there are at least a couple of simple ways to do so. For example, Kelly’s setup assumes that the farmer starts by generating 32 “truly” random bits with which to seed a pseudo-random number generator. Maybe those seed bits are obtained by repeatedly flipping a fair coin; but if we’re willing to assume access to 32 “truly random” bits to seed the PRNG, then why subsequently bother with the PRNG at all? As Kelly points out, we only need $\log_2 {32 \choose 16} \approx 29.2$ bits of entropy, so we need on average at most 2 more, or at most 32 coin flips (on average), to select a random integer and unrank it to determine the corresponding assignment, rather than wasting a lot more coin flips Fisher-Yates shuffling turkeys around.

The above alternative abandons pseudo-random numbers altogether, and only uses a “true” random source of numbers to generate an assignment. I’ve been putting “true” in quotes here, since Kelly never defines or explains exactly what is meant by “true” randomness. It doesn’t seem to require anything quantum mechanical or otherwise theoretical, for example; indeed, the requirements seem pretty lax:

Random assignment does not require a computer. Physical coins or urns are reasonable options sanctioned by many statisticians today, for the same reasons that casinos and lotteries continue to use dice, roulette wheels, and other mechanical contraptions.

Kelly is even okay with “re-using” randomness for an experiment, as long as it’s “unrelated” to that past use:

Regarding “canned” entropy (e.g., from an old-fashioned table of random numbers), I’m aware of no reason why such entropy would lose its usefulness merely from being recorded and stored for a period of time. Similarly, regarding “recycled” entropy that has already been used for purposes unrelated to the one at hand, I’m aware of no reason why such entropy would be any less useful than if it had not been used before for any purpose.

This is what I find confusing and contradictory. A not-completely-terrible intuitive description of a PRNG is “an old-fashioned table of random numbers”… but one that is so large that we never need to re-use any of it. That is, for example, suppose that when we complete our undergraduate statistics education, we seed a pseudo-random number generator, one time, and use that single stream of random numbers for the duration of our lifetime, or at least for our entire career as an experimenter. When we need some random numbers, we extract them, then store the subsequent generator state on disk. When we need some more random numbers, we restore the generator state from disk, extract some more random numbers, rinse and repeat.

If our experiments are “slow” like testing turkeys, and not “fast” like computer simulations, then such a single pseudo-random stream is plenty long enough to support more experiments on more test subjects than our farmer can possibly conduct in their lifetime. And if we are worried, as Kelly writes, that “the outcome is a foregone conclusion, completely devoid of uncertainty, no matter how random the sequence might appear,” then why is an old-fashioned published book of random numbers acceptable?

Reference:

Kelly, T., What Every Experimenter Must Know About Randomization, ACM Queue Magazine, 23(6), November/December 2025. [PDF]

http://possiblywrong.wordpress.com/?p=5309

Extensions

Analysis of Biscuits dice game

possiblywrong Feb 24, 2026

Introduction In the dice game “Biscuits,” a player rolls 12d6+1d8+1d10+1d12, then selects one or more of the rolled dice to set aside, rolling the remainder again, continuing this process until eventually all 15 dice have been set aside. The player’s … Continue reading →

Show full content

Introduction

In the dice game “Biscuits,” a player rolls 12d6+1d8+1d10+1d12, then selects one or more of the rolled dice to set aside, rolling the remainder again, continuing this process until eventually all 15 dice have been set aside. The player’s score is the sum of the final values shown on the dice, with the highest score winning the game.

A rolled six on a d6– or an eight on the d8, or generally the maximum value on a die– is referred to as a biscuit… and as the rules suggest, “you want biscuits and lots of them!”, where “sometimes you’ll roll several biscuits all at once, you can set them all aside.”

You can set them all aside… but should you? That is, suppose that you roll sixes on multiple d6s at once, but relatively small values for the d8, d10, and/or d12. Should you set aside all of those biscuits, banking the highest possible score for those dice, or is it worth it to “give up” some of that guaranteed score in exchange for more future opportunities to roll better scores for the d8, d10, and d12?

Results

Not to beat around the bush, the latter turns out to be the case. Strategy in this game appears to be pretty interesting; let’s suppose that our objective is to maximize our expected total score (more on this later). Python code is available on GitHub to evaluate all possible moves that we might make from any given roll. For a most extreme example situation, suppose that on the initial roll we get biscuits for all 12d6, but we roll 1 for each of the other dice:

>>> analyze(((0, 0, 0, 0, 0, 12),
             (1, 0, 0, 0, 0, 0, 0, 0),
             (1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)))
 93.836 points by keeping d6 (6 ), 
 93.761 points by keeping d6 (6 6 ), 
 93.691 points by keeping d6 (6 6 6 ), 
 93.625 points by keeping d6 (6 6 6 6 ), 
 93.565 points by keeping d6 (6 6 6 6 6 ), 
 93.509 points by keeping d6 (6 6 6 6 6 6 ), 
 93.462 points by keeping d6 (6 6 6 6 6 6 6 ), 
 93.449 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 6 6 ), 
 93.427 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 6 ), 
 93.426 points by keeping d6 (6 6 6 6 6 6 6 6 ), 
 93.407 points by keeping d6 (6 6 6 6 6 6 6 6 6 ), 
 93.407 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 ), 
 87.292 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 6 6 ), d8 (1 ), 
 87.167 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 6 ), d8 (1 ), 
 87.064 points by keeping d8 (1 ), 
 87.037 points by keeping d6 (6 6 6 6 6 6 6 6 6 6 ), d8 (1 ), 
...

Optimal strategy is to keep just a single d6, sacrificing all 11 other biscuits in exchange for more future rolls to hopefully improve the contribution from the d8, d10, and d12. Even more interesting is that the expected total score isn’t monotonic in how many biscuits we might choose to set aside: keeping one d6 is best, and keeping all 12 is worse… but not as bad as keeping just 10 of them.

The figure below shows the distribution of possible total scores using this strategy, sampled from 10,000 simulated games.

Distribution of final total score in Biscuits using expectation-maximizing strategy, with expected score of about 93.9 (out of 102 possible) shown in red.

I was surprised at how well this strategy performs, with an expected score of about 93.9 (shown in red), and in a few simulated games even reaching the maximum possible score of 102.

Implementation

As discussed previously here, we can efficiently enumerate all possible outcomes from rolling a given subset of the dice as weak compositions of the total number of dice of each type. To compute optimal strategy for a given roll, we consider all possible non-empty subsets of dice to keep, recursively evaluating the expected score of each such move, memoizing results as we go.

As an aside, the Python code was not the starting point. Quite often I find it useful to do my “thinking” in Mathematica, then convert the resulting algorithm into a more widely accessible form. It’s always interesting to see how much more concise– but often much less readable– the Mathematica implementation usually is, as shown below.

rolls[n_, d_] := Map[
  Differences[Append[Prepend[#, 0], n + d]] - 1 &,
  Subsets[Range[n + d - 1], {d - 1}]
  ]

score[ns_, ds_] := score[ns, ds] = Total@Map[
  (Times @@ Multinomial @@@ #) Max[0, Map[
    Total[Total /@ (Range[ds] #)] + score[ns - (Total /@ #), ds] &,
    Rest[Tuples[Tuples /@ Range[0, #]]]
    ]] &,
  Tuples[MapThread[rolls, {ns, ds}]]
  ] / Times @@ (ds^ns)

score[{12, 1, 1, 1}, {6, 8, 10, 12}]

Open questions

I think there are a lot of interesting additional questions to pursue here. For example, is there a concise description of optimal strategy, or even an approximation of it, that a human player can implement with simple mental calculation?

Can the calculation of optimal strategy be made more efficient? Interestingly, both the Mathematica and Python implementations took about a day and a half to execute on my laptop. Add any more dice and things would get out of hand.

Finally, the strategy analyzed above is only optimal in the sense of maximizing expected score. This is not the same thing as maximizing the probability of beating another player. Similar to the approach utilized here, ideally we would evaluate the second player’s strategy first, with the utility function parameterized by the first player’s hypothetical final score. Then, once we know all of the possible expected values for the second player, we could compute the first player’s optimal strategy and overall value of the game, using the (negated) second player’s expected returns as the utility function. How different do the resulting strategies look as a result, compared with maximizing expectation?

http://possiblywrong.wordpress.com/?p=5292

Extensions

Analysis of Mastermind-like bottle match game

possiblywrong Jan 23, 2026

Show full content

During the holidays, my niece introduced me to a “bottle match” game, variants of which have apparently made the rounds online for a while now, but it was new to me. She secretly arranges seven wooden bottle-shaped pegs, each of a different color, in a row. I then make a series of guesses of the ordering of the colors (arranging a second set of seven bottles visible to everyone), with each guess scored with a number indicating how many bottles are in the correct position. The objective is to get all seven bottles in the correct order in as few guesses as possible.

This is very similar to the board game Mastermind: one player arranges a secret “code,” and another player makes a series of guesses, with scores providing partial information about how close each guess is to the solution.

Watching family members play the bottle match game, I wondered how inefficient we were being in our guesses. Our exploration of the space of possible solutions seemed pretty timid, making “small” single-transposition changes to the arrangement of bottles at each step, even backtracking to previous guesses at times just to confirm remembered scores.

With $n=7$ bottles, a coarse lower bound of at least $\lceil\log_{n-1}n!\rceil=5$ guesses are necessary in the worst case. How close to this bound can we get?

I wrote a program to try to answer this question, taking advantage of the similarity to Mastermind to analyze both games in the process. The resulting ~100 lines of Python code are on GitHub.

The greedy heuristic strategy used for both games is that described by Knuth (2): at each turn, make a guess that minimizes the maximum number of possible secret codes that might remain, over all possible scores that the guess might receive. Break any ties by first preferring codes that are possible (so that we have a chance of winning on the next move)… and in the case of the bottle match game, prefer a code that is “closest” to the previous guess, in the sense of requiring the least amount of additional shifting bottles around to turn the previous guess into the new guess.

This distance metric on permutations can be hard to compute, depending on what sort of shifting operations are allowed. The “block interchange” metric described by Christie (1) seems reasonably realistic for this problem, while being efficiently computable: one operation involves taking two “blocks” of consecutive bottles and swapping them, as shown in the example in the figure below.

Example of a single block interchange operation, swapping the block of 3 bottles in the gray box with the block of 2 bottles in the black box.

As a sanity check, evaluating the resulting strategy for the Mastermind game across all $6^4=1296$ possible secret codes, replicates the results in Knuth’s paper: an expected number of guesses of 5801/1296, or about 4.476, never requiring more than a maximum of 5 guesses.

For the bottle match game, we can guess the secret code in 34169/5040, or about 6.78 guesses on average, never requiring more than a maximum of 8 guesses. It’s interesting how often it is advantageous– at least in this greedy heuristic minimax sense– to guess a code that is not consistent with the scores observed so far, as shown in the example output below.

>>> import mastermind
>>> game = mastermind.ColorMatch()
>>> game.play((6, 5, 4, 3, 2, 1, 0), show=True)
Guess (0, 1, 2, 3, 4, 5, 6) in 5040 remaining: score 1
Guess (2, 3, 4, 5, 6, 1, 0) not in 1855 remaining: score 3
Guess (2, 5, 0, 1, 6, 3, 4) not in 106 remaining: score 1
Guess (4, 3, 0, 1, 6, 5, 2) not in 31 remaining: score 0
Guess (2, 1, 4, 5, 0, 3, 6) not in 8 remaining: score 1
Guess (6, 5, 4, 3, 2, 1, 0) in 2 remaining: score 7

References:

Christie, D., Sorting permutations by block-interchanges, Information Processing Letters, 60(4) November 1996, p. 165-169.
Knuth, D., The Computer as Master Mind, Journal of Recreational Mathematics, 9(1) 1976-77.

http://possiblywrong.wordpress.com/?p=5275

Extensions

Riddler Solution: Can You Skillfully Ski The Slopes?

possiblywrong Dec 13, 2025

Introduction In Zach Wissner-Gross’s Fiddler on the Proof, he mentions the following problem from nearly five years ago, when it was still FiveThirtyEight’s The Riddler: You are one of finalists in a skiing championship. There are two rounds: in the … Continue reading →

Show full content

Introduction

In Zach Wissner-Gross’s Fiddler on the Proof, he mentions the following problem from nearly five years ago, when it was still FiveThirtyEight’s The Riddler:

You are one of $n=2$ finalists in a skiing championship. There are two rounds: in the first round, each finalist skis down the mountain, recording their finishing time; this is repeated in the second round, with the least total time for both runs winning the championship. This is the top talent in the world, so the time for every run is independent and identically distributed. Having learned that you have the best time in the first round, what is the probability that you will win the championship? For extra credit, what if $n=30$ ?

In the original problem, the finishing times were normally distributed. The neat feature of this problem is a cocktail-napkin solution for $n=2$ that does not depend on this distribution. However, the extra credit for larger $n$ — for normally distributed run times– appears to require resorting to simulation or approximation, although Josh Silverman provides some really interesting recent asymptotic analysis.

The motivation for this post is to capture my notes on the exact solution for any $n$ for the case where finishing times are uniformly distributed. (This feels to me like a slightly more natural version of the problem, without the unrealistic negative tails.)

Solution

Define random variables $X$ and $Y$ to be the finishing times for our first and second run, respectively, and similarly define $R_i$ and $S_i$ to be the first and second round times for each competitor $i \in \{1,2,\ldots,n-1\}$ . Define event $A_i$ to be $X+Y<R_i+S_i$ (i.e., we beat competitor $i$ overall) and event $B_i$ to be $X<R_i$ (i.e., we beat competitor $i$ in the first round). Then we want to compute the probability

$P(\bigcap\limits_{i=1}^{n-1}A_i | \bigcap\limits_{i=1}^{n-1}B_i)$

Applying Bayes’ theorem, and re-ordering and re-grouping the intersection of events in the numerator, we have

$\frac{P(\bigcap\limits_{i=1}^{n-1}A_i \cap B_i)}{P(\bigcap\limits_{i=1}^{n-1}B_i)}$

There are two key observations: first, by symmetry, the probability in the denominator– that we are the leader after the first round– is $1/n$ . Second, in the numerator, if we condition on fixed times for our two runs, say for example $X=x=0.3$ and $Y=y=0.8$ , then the $n-1$ events $A_i \cap B_i$ are independent– so we can multiply them– and they all have the same probability.

The following figure shows this example situation, where the area of the shaded region is the probability $P(A_i \cap B_i)=P(x+y<R_i+S_i \land x<R_i)$ that we beat a particular competitor in both rounds.

This shaded region has two different shapes depending on whether $x+y$ is less than or greater than 1; putting everything together, the desired probability is

$n \int_{x=0}^1(\int_{y=0}^{1-x}(1-x-\frac{1}{2}y^2)^{n-1}\mathrm{d}y + \int_{y=1-x}^1((1-x)(1-y)+\frac{1}{2}(1-x)^2)^{n-1}\mathrm{d}y)\mathrm{d}x$

Applying binomial expansion to the powers and moving the integrals inside the resulting summation gives the following formula:

$n\sum\limits_{k=0}^{n-1}{n-1 \choose k}\frac{1}{2^k}(\frac{(-1)^k}{(2k+1)(n+1+k)}+\frac{(n-1-k)!(n-1+k)!}{(2n)!})$

The figure below shows this probability vs. the number of skiers $n$ .

For 5 or 6 skiers, it’s roughly a coin toss whether you retain your first-round lead. When $n=30$ , the probability is about 0.226.

http://possiblywrong.wordpress.com/?p=5231

Extensions

Variants of Fitch Cheney’s Trick

possiblywrong Nov 14, 2025

Nearly 15 years ago, I wrote about the following trick attributed to William “Fitch” Cheney, Jr., in [1]: Effect 1: While the magician turns their back or even leaves the room, a spectator shuffles a standard deck of cards, and … Continue reading →

Show full content

Nearly 15 years ago, I wrote about the following trick attributed to William “Fitch” Cheney, Jr., in [1]:

Effect 1: While the magician turns their back or even leaves the room, a spectator shuffles a standard deck of $n=52$ cards, and selects an (unordered) subset of any $m=5$ cards and hands them to the magician’s assistant. For example:

The assistant arranges $k=4$ of these cards in a row left to right, as shown below, keeping the remaining selected card hidden.

Upon observing this arrangement, the magician announces the fifth selected card, in this case the ace of diamonds.

The article linked above describes how the trick works in practice (including Python source code demonstrating both the assistant and the magician), as well as a proof that the following condition on $(n,m,k)$ is necessary and sufficient for the trick to be feasible, at least in principle:

${n \choose m} \leq (n)_k$

Intuitively, the LHS binomial coefficient counts the number of spectator selections that may be presented to the assistant, and the RHS falling factorial counts the number of arrangements that may be presented to the magician.

The motivation for this post is to observe that this basic argument, applying the König-Egerváry theorem to a bi-regular graph to find a saturating matching, is more generally applicable: it should be possible to perform other “Fitch-like” tricks, whose feasibility rests on similar counting arguments. For example:

Effect 2: A spectator rolls $m=6$ dice each with $n=6$ sides (maybe from their Farkle set), and arranges the dice in a row left to right, in any desired order. The assistant covers one of the dice by placing a cup over it, leaving the ordered arrangement otherwise undisturbed. The magician observes this arrangement with the $k=m-1$ values shown, and announces the value on the covered di(c)e.

This trick is feasible if and only if:

$n^m \leq n^k {m \choose k}$

For another example of this same form, $(n,m,k)=(10,10,9)$ corresponds to writing down a 10-digit number, the assistant covering one of the digits, and the magician announcing the hidden digit.

The linked proof shows that both of these tricks are feasible in principle— but it’s a separately interesting challenge to come up with a practical scheme for performing them. In this case, there is a relatively simple solution left as a puzzle for the reader; but things can get complicated quickly, as in this puzzling.stackexchange.com post that discusses the $(6,9,7)$ case.

Effect 3: A spectator selects any $m=27$ cards from a deck of $n=52$ cards, and arranges them face-up in a row left to right, in any desired order. (More practically, imagine shuffling and cutting strictly more than half the deck, and fanning the cards face-up in a row.) The assistant flips one of the cards face-down, leaving the ordered arrangement otherwise undisturbed. The magician observes this arrangement with the $k=m-1$ face-up cards shown, and announces the face-down card(s).

In this case, the trick is feasible if and only if:

$(n)_m \leq (n)_k {m \choose k}$

Of these two variants, I think this one would be the more powerful effect. Note that unlike Fitch’s original trick, the spectator gets to not only choose the cards, but to arrange them in any chosen order as well. The assistant’s only freedom is in which card(s) to flip face-down.

Here again, the obligatory puzzle is to determine a practical scheme for performing this trick. I’m not sure how much it helps as a hint to note that, in this particular case of $k=m-1$ where the assistant flips exactly one card face-down, for the trick to be possible we need the spectator to select and arrange strictly more than half of the cards in the deck. (So that, for example, it is perhaps useful that the spectator is guaranteed to select at least one pair of cards of the same rank and color in their arrangement?)

Reference:

Kleber, M., The Best Card Trick. Mathematical Intelligencer, 24 (2002). [PDF]

http://possiblywrong.wordpress.com/?p=5213

Extensions

The criminal coupon collector

possiblywrong Oct 25, 2025

Introduction Alice and Bob ride their bicycles together to work each day. They each secure their bicycles using chain combination locks with dials, with 10 decimal digits on each dial. “I notice you always take longer than I do to … Continue reading →

Show full content

Introduction

Alice and Bob ride their bicycles together to work each day. They each secure their bicycles using chain combination locks with $m=6$ dials, with 10 decimal digits on each dial.

“I notice you always take longer than I do to lock your bike,” Alice says, “and I spin each of my dials independently and uniformly after locking– making sure I don’t accidentally leave it unlocked. What takes you so long?”

“I spin my dials after locking, too, ” Bob says, “but I also make sure that none of the individual dials on the lock happens to be correct, giving it extra random spins if necessary. No sense in helping a thief get lucky.” (In other words, for example, if Bob’s combination is 123456, then he may leave the combination spun to 451234, but not 151234, since the first digit would be correct.)

Eve is the neighborhood thief, who has noticed these two expensive-looking bicycles at this same location every morning for a while now, and she happens to overhear this conversation. After watching Alice and Bob lock their bikes and enter their building, Eve casually strolls over and observes the sequence of digits on each of the locks. She has an idea.

Puzzle 1: Assuming Eve is able to observe the combination on Alice’s lock and try exactly one combination each morning after Alice leaves her bike for work, what is the expected number of days until she is able to steal the bike?

Puzzle 2: Similarly observing the displayed combination on Bob’s lock, what is the expected number of days until Eve is able to steal Bob’s bike?

Multiple coupon collectors

Stealing Alice’s bike is an interesting problem in its own right. But the focus of and motivation for this post is to capture my notes on the second problem of stealing Bob’s bike. The practice of ensuring that none of the individual combination dials is randomly accidentally correct, while well-intentioned, ends up making Eve’s job much easier. Despite the lock having one million possible combinations, Eve needs less than 40 days’ worth of observations on average to steal the bike.

We can rephrase the problem in terms of multiple coupon collectors: there are $m$ collectors, one for each dial, each independently trying to collect $s=10-1=9$ coupon types, one for each incorrect digit on that dial. Each day, all $m$ collectors draw a coupon; we want to compute the expected number of days until all collectors have each obtained all coupon types. Put another way, we want the expected value of the maximum of $m$ independent random variables, each distributed according to the standard solo coupon collector problem with $s$ coupon types.

The cumulative distribution for this maximum $N$ is

$P(N \leq n) = (\sum\limits_{k=0}^s (-1)^k{s \choose k}(1-\frac{k}{s})^n)^m$

with expected value

$E(N) = -\sum\limits_{\mathbf{k}>\mathbf{0}} \frac{\prod\limits_{i=1}^m (-1)^{k_i}{s \choose k_i}}{1-\prod\limits_{i=1}^m (1-\frac{k_i}{s})}$

where the summation ranges over tuples $\mathbf{k}$ of $m$ non-negative integers that are not all zero, corresponding to a number of “unknown incorrect” digits for each dial. In Python:

from fractions import Fraction
from itertools import islice, product
from math import comb, prod

def probability_coupon_draws(n, s, m=1):
    """CDF(# of coupon draws from s types for all of m collectors)."""
    return sum((-1) ** k * comb(s, k) * (1 - Fraction(k, s)) ** n
               for k in range(s + 1)) ** m

def expected_coupon_draws(s, m=1):
    """E(# of coupon draws from s types for all of m collectors)."""
    return -sum(prod((-1) ** k_i * comb(s, k_i) for k_i in k) /
                (1 - prod(1 - Fraction(k_i, s) for k_i in k))
                for k in islice(product(range(s + 1), repeat=m), 1, None))

The result is about 39.5891 observations on average until Eve knows all 9 incorrect digits for each dial, and thus knows the entire correct combination.

http://possiblywrong.wordpress.com/?p=5200

Extensions

Binomial proportion estimation with unknown and varying number of trials

possiblywrong Oct 12, 2025

Here is a neat trick: our friend, Paul, has asked us for help estimating his skill at the shooting range. Every day for the past several years, he has diligently recorded the number of successful shots he took that day. … Continue reading →

Show full content

Here is a neat trick: our friend, Paul, has asked us for help estimating his skill at the shooting range. Every day for the past several years, he has diligently recorded the number $x_i$ of successful shots he took that day. Assuming that every shot is an iid Bernoulli trial with success probability $p$ , can we estimate $p$ ?

Unfortunately, no. Paul only recorded the number of successful shots each day. If he hit the target $x_i=20$ times on a given day, was that out of 20 attempts, or 25, or 150? Not only did Paul not record that corresponding total number $n_i$ of attempted trials each day, but that total varied from one day to the next: some days it might be just one “stand” of 25 shots, other days it might be several hours with hundreds of shots.

It might seem like we’re stuck. However, Paul mentions that he doesn’t shoot alone. His friend, Quinn, has accompanied him on each of these trips to the shooting range. In friendly competition, they each take the same number of shots, and whoever has the most hits buys a round on the way home. Fortunately, Quinn has kept a similar record of his daily hits (but again, not daily total trials).

This is interesting: Quinn takes each shot with some different success probability $q$ , that is just as unknown to us as $p$ . How can similarly incomplete information about Quinn possibly help us learn something about Paul?

It turns out that it can; following is Python code implementing a simpler one of several different estimates of Paul’s success probability $p$ , given a list of pairs of Paul’s and Quinn’s hit counts.

def estimate_p(xy: list[tuple[int, int]]) -> float:
    """Estimate Bernoulli probability from pairs of success counts."""
    sx  = sum(x     for x, y in xy)
    sy  = sum(y     for x, y in xy)
    sxy = sum(x * y for x, y in xy)
    sx2 = sum(x * x for x, y in xy)
    return 1 - sx2 / sx + sxy / sy

The figure below shows this estimate vs. Paul’s true success probability, for each of 100 simulated experiments, where we randomly choose true success probabilities for Paul and Quinn.

Simulation of 100 experiments (one point each) with uniformly random p and q, showing resulting estimated vs. true probability p.

Code for these experiments and plotting results is on GitHub.

http://possiblywrong.wordpress.com/?p=5181

Extensions

A trick-taking game

possiblywrong Aug 28, 2025

There has not been a puzzle here in a while, so… Alice is playing a card game with her nephew, Bob, using a standard deck of playing cards. Alice starts with the 26 red cards in her hand, and Bob … Continue reading →

Show full content

There has not been a puzzle here in a while, so…

Alice is playing a card game with her nephew, Bob, using a standard deck of playing cards. Alice starts with the 26 red cards in her hand, and Bob starts with the 26 black cards in his hand. In each of 26 rounds (or tricks), Alice and Bob simultaneously select and play one card from their remaining hand. The player with the higher-ranked card takes the trick; cards of the same rank are discarded with neither player taking the trick. After all cards have been played, Alice wins (loses) one point for every trick won (lost).

At this point, by symmetry, this seems like a fair game. To skew things slightly in her nephew’s favor, Alice suggests a wrinkle: aces are low… except for one of Bob’s aces, the ace of spades, which is high, ranking higher than all other cards.

What are Alice’s and Bob’s optimal strategies and corresponding expected score in this game?
Suppose that instead of a point for every trick, the player with the most tricks wins the game (with a score of +1, or 0 if both players take the same number of tricks). What are Alice’s and Bob’s optimal strategies and expected score?
How does this generalize to arbitrary initial hands? That is, suppose Alice and Bob start with known card ranks $a_1 \leq a_2 \leq \ldots \leq a_n$ and $b_1 \leq b_2 \leq \ldots \leq b_n$ , respectively (with all cards known to both players).

http://possiblywrong.wordpress.com/?p=5178

Extensions

Fiddler Solution: Can You Hack Bowling?

possiblywrong Jul 18, 2025

The following problem is from the Fiddler (the “spiritual successor to FiveThirtyEight’s The Riddler column”), involving the game of tenpin bowling with traditional scoring: If you knock down 100 pins over the course of a game, you are guaranteed to … Continue reading →

Show full content

The following problem is from the Fiddler (the “spiritual successor to FiveThirtyEight’s The Riddler column”), involving the game of tenpin bowling with traditional scoring:

If you knock down 100 pins over the course of a game, you are guaranteed to have a score that’s at least 100. But what’s the minimum total number of pins you need to knock down such that you can attain a score of at least 100?

I thought this was an interesting problem, in part because traditional scoring in bowling is a bit of a mess. Every pin that you knock down scores at least one point, but some pins may yield additional bonus points… based on previously knocked down pins.

My first thought was to transform the puzzle into an integer linear programming (ILP) problem. In hindsight, the solution ends up having a nice enough form to suggest a more elegant cocktail-napkin proof. But those scoring rules are just messy enough to want some more brute-force verification from the computer. Or at least that’s what I told myself to justify the interesting coding problem.

The Mathematica code is on GitHub; it’s not much, just a few dozen lines. The solution is simple to describe: we can score at least 100 points by knocking down just 44 pins, rolling four consecutive strikes, immediately followed by knocking down four pins, and no more, for a total score of 30+30+24+14+4=102.

We certainly can’t fully brute-force the problem by examining all possible games: the number of distinct “scoreboards” in a game of bowling is

${12 \choose 2}^9({12 \choose 2}+10 + 10 \cdot 11 + {12 \choose 2}-11) = 5,726,805,883,325,784,576$

However, at the other extreme, I also wasn’t able to reduce the problem to just a single ILP instance. Not only is the total score (which we are bounding) not a linear function of the variable numbers of pins knocked down by each roll, but even the total number of pins knocked down (which we are trying to minimize) is not a simple linear function of the variables, because of the possible extra frames after a strike or spare in the tenth frame.

Instead, we can trade execution time for preserved code simplicity, by evaluating not just one ILP instance, but $3^9 \cdot 4=78,732$ of them, by grouping game outcomes according to the sequence of frame “types”: strike, spare, or open. For example, the eventual solution described above occurs in a game of the form (strike, strike, strike, strike, open, open, open, open, open, open). Having fixed the type of each frame in the game, now we can describe the number of pins knocked down and the resulting total score as linear functions of variables indicating the number of pins knocked down by each roll.

http://possiblywrong.wordpress.com/?p=5168

Extensions

A maximally inefficient Monte Carlo estimate of pi

possiblywrong Jun 26, 2025

Introduction Let’s play a game: I will repeatedly flip a fair coin, showing you the result of each flip, until you say to stop, at which point you win an amount equal to the fraction of observed flips that were … Continue reading →

Show full content

Introduction

Let’s play a game: I will repeatedly flip a fair coin, showing you the result of each flip, until you say to stop, at which point you win an amount equal to the fraction of observed flips that were heads. What is your strategy for deciding when to stop?

This weekend 6/28 is “Two-Pi Day” (or Tau Day, if you like). Celebration in the classroom can take many forms: for example, we can write a program to calculate an approximation of the value of $\pi$ using Monte Carlo simulation. Or we can find some mathematical problem where the occurrence of $\pi$ is surprising or unexpected. I think the game described above is an interesting case where we can do both at the same time.

Throwing darts

The most common approach to estimating $\pi$ using Monte Carlo simulation involves selecting random points uniformly distributed in a square; imagine throwing darts at a square dartboard, as shown in the figure below.

Monte Carlo simulation of 5000 uniformly random points in a square. Red points lie within the inscribed circular sector, and the remaining blue points lie outside the sector.

Let the indicator random variable $X_1$ equal 1 if a dart lands within the inscribed circle– or within the inscribed 90-degree circular sector shown above, which has the same area– otherwise $X_1=0$ . Then $E(X_1)=P(X_1=1)$ is the ratio of areas $\pi / 4$ , so that we can estimate $\pi$ by Monte Carlo estimation of $E(Y_1)$ , where $Y_1=4X_1$ . In Python:

import random

def dart():
    x, y = random.random(), random.random()
    return 4 * int(x * x + y * y < 1)

n = 1000000
print(sum(dart() for _ in range(n)) / n)

Note how simple the simulation of dart() is, just a couple of random numbers and a few arithmetic operations. Indeed, this might seem perhaps too simple: we don’t really need the randomness of Monte Carlo simulation here, and could instead integrate over an evenly spaced grid of thrown darts. And despite the simple arithmetic implementation of the simulation, the “pi-ness” of the process– there is a circle clearly visible in the diagram– is rather unsurprising.

Throwing noodles

Another common approach to estimating $\pi$ is Buffon’s noodle. Imagine throwing uncooked straight spaghetti noodles, each one foot long, onto a floor marked with parallel lines spaced one foot apart, with the random variable $X_2$ equal to the number of times a particular noodle crosses a line on the floor.

Monte Carlo simulation of 100 Buffon’s noodles. Red noodles cross a line, and blue noodles do not.

It turns out that $E(X_2)=2/\pi$ . (Note this is an expected value, not just a probability, because it works even if we cook the noodles, so that $X_2$ can take on non-negative integer values beyond just 0 and 1.) We can similarly estimate $\pi$ by Monte Carlo simulation of this noodle-throwing process, appropriately transforming the resulting estimate of $E(X_2)$ . Again in Python:

import random
import math

def noodle():
    y = random.random()
    angle = math.pi * random.random()
    angle_0 = math.asin(y)
    return int(angle_0 < angle < math.pi - angle_0)

n = 1000000
print(2 / (sum(noodle() for _ in range(n)) / n))

As a method of computing an approximation of $\pi$ , this feels a bit circular (!) to me. We are using math.pi, the very value we are trying to estimate, in the implementation of the simulation! (I’m also skipping over the error analysis, which is more complicated than for the other simulation approaches discussed here, due to the reciprocal in the estimate.)

Flipping coins

Coming back now to the Chow-Robbins coin-flipping game, unfortunately fully specifying an optimal strategy with maximum possible expected return (known to be approximately 0.793) remains a complex open problem. But let’s consider a strategy that, although it is suboptimal, is very simple to describe, still performs reasonably well, and turns out to have interesting behavior relevant to this discussion: let’s suppose that we stop when the number of heads $h$ first exceeds the number of tails $t=h-1$ , so that we win $X_3=h/(h+t)=(t+1)/(2t+1)$ .

Then it’s a nice exercise to show that our expected return using this strategy is

$E(X_3)=\sum\limits_{t=0}^{\infty} \frac{1}{t+1}{2t \choose t}\frac{1}{2^{2t+1}}\frac{t+1}{2t+1} = \frac{\pi}{4} \approx 0.7854$

So we can estimate $\pi$ in the same way we did with the dartboard. In Python:

import random

def coin_flips():
    heads = 0
    flips = 0
    while heads <= flips - heads:
        heads += random.getrandbits(1)
        flips += 1
    return 4 * heads / flips

total = 0
n = 0
while True:
    total += coin_flips()
    n += 1
    print(f'Estimate after {n} iterations = {total / n}')

This works… sort of. There are a lot of interesting things happening here. First, the variance of our estimate is less than one-third that for a dart throw, so that we should need correspondingly fewer Monte Carlo iterations to achieve comparable accuracy.

Also, each individual coin flip is a lot less computationally expensive than each dart throw. The latter requires 128 random bits to generate two floating-point values for the $(x,y)$ position of a dart, while a coin flip is a single random bit. The ability and relative speed of extraction of a single random bit varies from language to language and library to library. For example, even sticking to Python, it’s interesting to compare the above random.getrandbits(1) with the theoretically equivalent but much slower random.randint(0, 1).

But these potential benefits are dwarfed by the overall cost of each Monte Carlo iteration simulating a single playout of the game, which generally involves more than just a single coin flip. You may notice that the output from the above code scrolls by with not-really-blazing speed… but periodically there is a significant pause, sometimes for a long time.

The problem is that the expected number of coin flips– that is, the average duration of a single Monte Carlo iteration– is infinite. In other words, even if we know that we need, say, roughly 12 million iterations of the game to achieve 3 digits of accuracy in our estimate of $\pi$ , we can’t even estimate how long those 12 million iterations will take to execute.

All is not entirely lost, however. Note that although the expected duration of a game is infinite, the program above is never in any danger of being well and truly hung in the inner loop. That is, even during one of those long pauses, we are guaranteed— with probability 1– to observe eventual completion of the current game and subsequent starting of the next new game.

We are right on the razor’s edge of hanging, though. That is, suppose that we play the game with a biased coin, so that the probability of heads is $p<1/2$ . Then there is a positive probability $(1-2p)/(1-p)$ that a game will never end, with a meandering deficit of heads that is never overcome. (Which means that the endless loop through games above is guaranteed, with probability 1, to eventually encounter and get stuck on such a never-ending game.)

On the other hand, if $p>1/2$ , then the situation improves. For example, let’s suppose that we play the game with a biased coin with $p=3/4$ . Then the expected return from this game is $\pi/(2\sqrt{3})$ , so we can still estimate $\pi$ by playing this modified game:

import random
import math

def coin_flips():
    heads = 0
    flips = 0
    while heads <= flips - heads:
        heads += int(random.getrandbits(2) != 0)
        flips += 1
    return 2 * math.sqrt(3) * heads / flips

n = 120000
print(sum(coin_flips() for _ in range(n)) / n)

Better yet, not only is the expected number of coin flips (i.e., random bits) per game finite, it’s just 2 flips on average per game, so that each game is relatively fast to simulate. Better still, the variance is now less than 12% of the dartboard’s, so that we don’t even have to play as many games to achieve the same accuracy.

http://possiblywrong.wordpress.com/?p=5122

Extensions

How shallow is the cut card effect?

possiblywrong May 13, 2025

Introduction This is a follow-up to an article from nearly a decade ago, answering what I didn’t recognize at the time as an interesting question, buried in an otherwise offhand remark. To repeat the setup, suppose that we shuffle together … Continue reading →

Show full content

Introduction

This is a follow-up to an article from nearly a decade ago, answering what I didn’t recognize at the time as an interesting question, buried in an otherwise offhand remark. To repeat the setup, suppose that we shuffle together $d=6$ decks of playing cards, and play a series of rounds of blackjack, each using the same fixed CDZ- playing strategy, stopping at the “cut card”, a marker inserted in the shoe indicating when to reshuffle. Let’s place the cut card at “penetration” $p=0.75$ , that is, stop and reshuffle when at most $d(1-p)=1.5$ decks or 78 cards remain in the shoe. Let the random variable $X_n$ be the outcome of the n-th round. The figure below shows estimates of $E(X_n)$ , in percent of initial wager, from just 1000 simulated shuffled shoes.

Expected return vs. number of rounds dealt from the shoe, playing fixed CDZ- strategy optimized for full 6-deck shoe. The mean of 1000 randomly shuffled shoes is shown in blue, with standard error of the sample mean shown in gray.

That steep drop in expected return late in the shoe is the “cut card effect.” An intuitive, hand-waving explanation goes something like this: although the position of the cut card is fixed, the number of rounds needed to reach it varies from one shuffled shoe to the next. If near the end of the shoe we have played a larger-than-typical number of rounds, then each of those rounds must have consumed fewer-than-typical number of cards on average, and so were likely rich in tens. The remainder is thus poor in tens, a disadvantage for the player.

Inspection of the above figure suggests that the cut card effect “begins” at approximately round $n=42$ or so (for this particular choice of rules, playing strategy, and penetration). The motivation for this post is to show that the cut card effect actually “begins” much earlier in the shoe, at round $n=19$ in this case.

Extended True Count Theorem

The cut card effect is interesting because, at first glance, it seems to contradict a very useful theorem, first proved by Thorp, that is easy to state incorrectly, so for now I’ll state it vaguely:

Theorem: For each round $n$ played with the same fixed strategy, $E(X_n)=E(X_1)$ , provided we do not run out of cards.

In other words, compute the exact expected return from playing our fixed strategy off the top of the shoe: this is $E(X_1)$ . Now compute the expected return from playing a second round, after having played a first round off the top of the shoe: this is $E(X_2)$ , and the above theorem states that these two expected values are equal. Similarly, the expected return from playing the third round into the shoe is $E(X_3)=E(X_2)=E(X_1)$ , etc.

Looking again at the figure above, the small variations in estimated expected return for small $n$ are purely due to sampling error. If we could in principle evaluate not just 1000 randomly shuffled arrangements of cards in the shoe, but all $(52d)!$ possible arrangements, computing each expected return exactly, then the above theorem states that this curve would be exactly constant for those small $n$ .

But clearly something different is happening for $n \geq 42$ , so what counts as “small” $n$ ?

Integer linear programming

This is where the vague “not running out of cards” comes in. There are two conditions that must be satisfied to guarantee $E(X_n)=E(X_1)$ : we must be guaranteed to reach round $n$ , and we must be guaranteed to be able to complete round $n$ .

I think the latter condition is simpler to describe. We need to ensure that, no matter how the previous $n-1$ rounds play out, we have enough cards left in the shoe to complete the subsequent round $n$ , with neither the player nor the dealer running out of cards. For these rules and CDZ- playing strategy, we will consume at most 33 cards in any single round, so as long as we don’t start the round before the cut card, there is no danger of “running out of cards.”

It’s the former condition that is trickier, and is the heart of the problem here: we must, with probability 1, complete $n-1$ prior rounds before reaching the cut card. Otherwise, to even define the random variable $X_n$ , we must condition the population of possible shuffled arrangements of cards in the shoe to exclude those cases where we reach the cut card too soon.

So, how many rounds are we guaranteed to be able to play before reaching the cut card? This is an integer linear programming problem: for each of the 3,054,067 possible subsets of card ranks consumed in a round, let $a_{r,k}$ be the number of cards of rank $r$ consumed. Then the problem is to find non-negative integers $x_k$ :

minimizing $\sum_k x_k$

subject to $\sum_k a_{r,k}x_k \leq 4d, 1 \leq r < 10$

$\sum_k a_{10,k}x_k \leq 16d$

$\sum_k (\sum_r a_{r,k})x_k \geq 52dp$

The code is on GitHub: C++ to compute all possible subsets of cards comprising the player hands in a round, and Mathematica to do the same for the dealer, to compute the outer product of player and dealer outcomes, and to solve the linear programming problem.

Interpreting results

The resulting minimum value of the objective function is 18. This means that it is possible– albeit unlikely– to reach the cut card after having played just 18 rounds. If we could eliminate the sampling error in the above figure, the actual exact expected return would be constant for $E(X_1)=E(X_2)=\ldots =E(X_{18})$ .

But the extended true count theorem has nothing to say about the expected return from round $n=19$ or later. Empirically, these mid-shoe departures from $E(X_1)$ are almost certainly very small… but they are just as almost certainly non-zero.

At a blackjack table, imagine next to the dealer is a display indicating the number $n$ of the round to be played, that increments by one after each round, and resets to 1 after each reshuffle. Further imagine that we walk up to the table in the middle of the shoe, and observe the displayed round number, with no information about any cards that have been dealt since the last shuffle. If we observe $n \leq 18$ , then we can be confident that the expected return from playing the upcoming round is the same as if we walked up to a full shoe: $E(X_n)=E(X_1)$ . On the other hand, if we observe, say, $n=19$ , then the expected return $E(X_{19})$ is almost certainly not exactly equal to $E(X_1)$ … but we can’t say for sure, let alone the magnitude or even the sign of the difference.

In practice, such a display does not exist; if we walk up to a table cold we don’t know how many rounds have been played up to that point. But suppose instead that we observe not the number of the round, but the number of cards left in the shoe before the cut card. How can we tell when we are in the “safe zone” of no cut card effect, with expected return exactly equal to $E(X_1)$ ?

If there are 234 cards left in the shoe, before the 75% penetration cut card, that’s a full shoe, so the expected return from the round is $E(X_1)$ . But suppose that there are as many as 162 cards still left in the shoe. Seems like that should still be plenty… but with even that many cards remaining, we are already “in the cut card effect,” where the extended true count theorem provides no information about the expected return from the next round. (It’s an exercise for the reader to show that this minimum of 162 cards remaining is the solution to another linear programming problem, but one that we can solve in our head.)

http://possiblywrong.wordpress.com/?p=5080

Extensions