GeistHaus
log in · sign up

https://mikespivey.wordpress.com/feed

rss
10 posts
Polling state
Status active
Last polled May 19, 2026 05:46 UTC
Next poll May 20, 2026 07:53 UTC
Poll interval 86400s
Last-Modified Tue, 14 Apr 2026 03:14:33 GMT

Posts

Optimizing Against the Field: A Bracket Pool Strategy Using Team Seeds, Part 1
gamesoptimizationprobabilitysportsstatistics
This year’s NCAA basketball tournament has just ended, which means a lot of people are currently cursing their bracket picks. That also means it’s a great time for a retrospective: How could one go about picking a better bracket? There … Continue reading →
Show full content

This year’s NCAA basketball tournament has just ended, which means a lot of people are currently cursing their bracket picks. That also means it’s a great time for a retrospective: How could one go about picking a better bracket? There are all kinds of considerations here, but in this post we’re going to focus on just one feature: team seedings.

In the NCAA tournament all 64 teams are placed in one of four regional tournaments, and the teams in each regional tournament are given seeds numbered 1 through 16. The regional tournaments are set up so that #1 seeds play #16 seeds, #2 seeds play #15 seeds, and so forth, for the first round. In the second round, the winner of 1 vs. 16 plays the winner of 8 vs. 9, the winner of 2 vs. 15 plays the winner of 7 vs. 10, and so on, continuing through four rounds. After the fourth round, fifteen teams from each region have been defeated, and the remaining team is the winner of the region and goes to the Final Four to play against the other regional winners.

The NCAA men’s tournament expanded to 64 teams in 1985. This gives us 40 years’ worth of information on how teams actually perform relative to their seeds. In this series of blog posts, we will use this information to investigate how you can improve your chances of winning your tournament bracket pool using information from the team seeds.

(For this purposes of this project, we are ignoring the play-in teams in the men’s tournament. Also, we will not include information from the NCAA women’s tournament, since the dynamics appear to be different: The top seeds and a small number of programs tend to dominate in a way that does not happen with the men’s tournament.)

Historical Winning Percentages

The first thing to do is figure out the historical win rate for the various seed matchups. Kaggle’s Machine Learning Mania 2026 contains the all the data we need (much more than we need, in fact). We extract the data and keep only the yearly seed information from 1985 on, including which seeds won which matchups.

After all the data extraction, cleaning, and computation involved, we end up with a 16 x 16 matrix for seed vs. seed win/loss records from 1985 to 2025. The full matrix is given in the file below. A few details are worth pointing out.

First, the #1 seed has a winning record of 98.8% against the #16 seed. From 1985 to 2025, inclusive, there have been 40 NCAA tournaments (41 years less one for Covid-19), which means 160 games that have pitted a #1 seed against a #16. We know that a #16 seed has only won twice (UMBC over Virginia in 2018 and Fairleigh Dickinson over Purdue in 2023), giving a 158/160, or 98.75%, winning percentage for the #1 seed here. The 0.988 that appears in the table serves as a sanity check on our calculations.

In addition, several higher-numbered seeds have winning records against lower-numbered seeds (noted in orange). Many of these are based on only one or a few games, but one in particular is not. With 160 matchups, the #9 seeds actually have a winning record against the #8 seeds: 51.9%. There are also many, many seed vs. seed matchups that have never occurred in the tournament (noted in light yellow). Finally, several seeds have 100% winning records against particular other seeds (noted in light blue if not already colored orange). While historically accurate, these are based on only a few games and are suspicious from a modeling/analysis standpoint.

win_rate_matrixDownload

Smoothing Winning Percentages with Bradley-Terry

The sparsity of the historical win rate matrix and the nonzero entries based on only few matchups are problems for us if we’re trying to do a full seed-vs-seed analysis. All is not lost, though: We can smooth the matrix in some fashion, using the historical information we do have to estimate the information we do not.

There are a few options available to us here, but perhaps the simplest robust choice is to use the Bradley-Terry model. In our terms, this model says that the probability that seed i defeats seed j in the tournament is

P(\text{seed } i \text{ defeats seed } j ) = \dfrac{p_i}{p_i + p_j},

where p_i and p_j are the overall strengths (in some sense) of teams i and j, respectively. And we have a good way to estimate the p_i value for each seed i: Use maximum likelihood estimation and the historical seed vs. seed win data so that the ratio formula above best explains all observed matchups simultaneously. (This process is described in detail in the Bradley-Terry model page.)

Applying the Bradley-Terry model gives us the updated win-rate matrix below. This matrix looks much more reasonable: Every matchup has a win probability, and the probabilities seem reasonable (the #1 vs. #16 win rate of 98.8% is even preserved). The oddly high percentages below the diagonal have been smoothed out, leaving only three win rates in the orange, all of which seem plausible.

bt_win_rate_matrixDownload

Now, how do we use this model to pick a good bracket? We will continue the discussion in the next post, where we discuss how to maximize your expected bracket score.

mzspivey
http://mikespivey.wordpress.com/?p=3140
Extensions
Mean Residual Life for the Weibull Distribution
probabilityspecial functionsstatistics
A few years ago I answered a question on Cross Validated about the existence of a closed form for the mean residual life of the Weibull distribution, a popular distribution in reliability analysis. The short answer is that it depends … Continue reading →
Show full content

A few years ago I answered a question on Cross Validated about the existence of a closed form for the mean residual life of the Weibull distribution, a popular distribution in reliability analysis. The short answer is that it depends on what you mean by closed form. There is an expression for the Weibull MRL in terms of the upper incomplete gamma function \Gamma(s,x), but that’s generally not considered to be closed form. On the other hand, many modern programming languages like Python and C# have numerical packages that include \Gamma(s,x), so if your interest is numerical computation you can use one of these instead of a general numerical integration routine to implement the Weibull MRL. (Side note: Both of the linked implementations are to the regularized upper incomplete gamma function.)

On to the derivation. The mean residual life function m(x) is, for a general probability model,

\displaystyle m(x) = \dfrac{\int_x^\infty S(t) \, dt}{S(x)},

where

S(x) is the survival function S(x) = 1 - F(x), and F(x) is the cumulative distribution function for the probability model.

The Weibull probability density function is

\displaystyle F(x) = 1 - e^{-(x/\eta)^{\beta}}, \:\: x > 0,

so the Weibull mean residual life function is

m(x) = \displaystyle \dfrac{\int_x^{\infty} e^{-(t/\eta)^{\beta}} \, dt}{e^{-(x/\eta)^{\beta}}}.

Now, use the substitution u = (\frac{t}{\eta})^\beta. Then du = \frac{\beta}{\eta} (\frac{t}{\eta})^{\beta - 1} \, dt.

Since \frac{t}{\eta} = u^{1/\beta}, this yields du = \frac{\beta}{\eta} (u^{1/\beta})^{\beta - 1} \, dt = \frac{\beta}{\eta} u^{1 - 1/\beta} \, dt, so that dt = \frac{\eta}{\beta} u^{1/\beta - 1} \, du. Then

\begin{aligned} m(x) &= e^{(x/\eta)^{\beta}} \int_{(x/\eta)^{\beta}}^\infty \frac{\eta}{\beta} u^{1/\beta - 1} e^{-u} \, du \\ &= \frac{\eta}{\beta}e^{(x/\eta)^{\beta}} \Gamma \left(\frac{1}{\beta}, \left(\frac{x}{\eta}\right)^\beta \right).  \end{aligned}

This can be rewritten slightly by using a recurrence property of the upper incomplete gamma function:

\begin{aligned} m(x) &= \eta e^{(x/\eta)^{\beta}} \left( \Gamma \left(1 + \frac{1}{\beta}, \left(\frac{x}{\eta}\right)^{\beta}\right)  - \left( \left(\frac{x}{\eta}\right)^{\beta} \right)^{1/\beta} e^{-(x/\eta)^{\beta}} \right) \\ &= \eta e^{(x/\eta)^{\beta}}  \Gamma \left(1 + \frac{1}{\beta}, \left(\frac{x}{\eta}\right)^{\beta} \right) - x. \end{aligned}

Either of these two representations of m(x) for the Weibull model is fine.

Be careful with using numerical implementations of the upper incomplete gamma function, though. The Python and C# methods I linked to in the first paragraph both implement the regularized upper incomplete gamma function. Basically, that means they implement \Gamma(s,x) / \Gamma(s) rather than \Gamma(s,x). If you use one of those methods, make sure you multiply the result by \Gamma(s) to match the formulas given here.

mzspivey
http://mikespivey.wordpress.com/?p=3090
Extensions
An Interview Question about Car Replacements
interview questionsprobability
Recently I came across the following on a site that provides questions on probability and statistics for tech interviews. I solved it, and so I thought I would give my solution here. Suppose there is a new vehicle launch upcoming. … Continue reading →
Show full content

Recently I came across the following on a site that provides questions on probability and statistics for tech interviews. I solved it, and so I thought I would give my solution here.

Suppose there is a new vehicle launch upcoming. Initial data suggests that any given day there is either a malfunction with some part of the vehicle or possibility of a crash, with probability p which then requires a replacement. Additionally, each vehicle that has been around for n days must be replaced. What is the long-term frequency of vehicle replacements?

I’m going to interpret “long-term frequency” as the expected amount of time that a vehicle is in operation before it is replaced. With that interpretation, the probability that a vehicle will need to be replaced after k days, if k < n, is p(1-p)^{k-1}, as the vehicle will need to work fine for the first k-1 days (with probability 1-p for each day) and then break or crash on day k, with probability p. The probability that a vehicle will need to be replaced after n days is (1-p)^{n-1}, as this is the probability that the vehicle will work fine for n-1 days. (It doesn’t matter whether it breaks on day n or not; it will still be replaced after day n.) Letting q = 1-p, the expected amount of time that a vehicle is in operation is thus (using the definition of expected value) given by

\setlength\arraycolsep{2pt}\begin{array}{rl} \displaystyle \sum_{k=1}^{n-1} k p q^{k-1} + n q^{n-1} &\displaystyle= p\sum_{k=0}^{n-1} k q^{k-1} + n q^{n-1} = p \sum_{k=0}^{n-1} \frac{d}{dq} q^k + n q^{n-1} \smallskip \\ &\displaystyle= p \frac{d}{dq} \sum_{k=0}^{n-1}  q^k + n q^{n-1} = p \frac{d}{dq} \frac{1-q^n}{1-q} + n q^{n-1} \smallskip \\ &\displaystyle= p \frac{(1-q) (- n q^{n-1}) + (1-q^n)}{(1-q)^2} + n q^{n-1} =  p \frac{p(- n q^{n-1}) + (1-q^n)}{p^2} + n q^{n-1} \smallskip \\ &\displaystyle= \frac{1-q^n}{p}.\end{array}

Here, in the first line we use the trick of rewriting k q^{k-1} as \frac{d}{dq} (q^k) and then swapping the order of summation. In the second line we use the geometric sum formula, and in the third line we use the quotient rule for derivatives.

mzspivey
http://mikespivey.wordpress.com/?p=3053
Extensions
Two Fallacious Induction Arguments
proof techniques
In this post I’m going to discuss two “proofs” by induction. That is, they’ll both be attempts at proving a statement, but they’ll both have flaws in their arguments. My hope is to alert people to potential errors when attempting … Continue reading →
Show full content

In this post I’m going to discuss two “proofs” by induction. That is, they’ll both be attempts at proving a statement, but they’ll both have flaws in their arguments. My hope is to alert people to potential errors when attempting to prove a statement by induction.

Claim 1: For all n, n = n+1.

“Proof”: Let P(n) denote that claim that n = n+1. Suppose that P(k) is true for some k; that is, suppose k = k+1. Now, P(k+1) is the claim that k + 1 = k+2. If P(k) is true, then k+1 = (k+1) + 1 = k+2, which proves that the truth of P(k) implies the truth of P(k+1). By the principle of mathematical induction, then, P(n) is true for all n; that is, n = n+1 for all n.

Claim 2: For all n \geq 1, all people in a group of size n share the same birthday.

“Proof”: Let P(n) denote the claim that all people in a group of size n have the same birthday. Clearly, P(1) is true, as all people in a group of size 1 have the same birthday. Now, suppose that P(k) is true; that is, all people in a group of size k have the same birthday. Suppose now that we have a group of k+1 people. Single out one of those people; perhaps his name is Al. Without Al, we have a group of k people, all of whom must, by our inductive hypothesis, share the same birthday. Now, swap Al with one of those other k people. Now Al is in a group of k people. Again, by hypothesis all of them must share the same birthday. So Al must have the same birthday as all the others in his group. Thus these k+1 people all share the same birthday, and so P(k+1) is true. By the principle of mathematical induction, then, every member of a group of size n must have the same birthday.


Clearly, both Claim 1 and Claim 2 are incorrect. The question, then, is this: Where are the errors in their “proofs”?

For the “proof” of Claim 1, the argument that P(k) implies P(k+1) is actually valid. The logic error lies in the failure to prove a base case. And in fact, P(n) is false for any base case you can think of, so any attempt to prove a base case will fail. And if there’s no valid base case, then the chain-like logic of induction doesn’t have an initial link to start with. Thus the “proof” is invalid. The moral of the story here is that, while proving the base case may seem perfunctory, it’s actually crucial to the logic of induction.

For the “proof” of Claim 2, the error is more subtle. The argument for the base case is valid. The argument for P(k) implies P(k+1) is also valid—except for one, single, solitary value of k. When k = 1, and we remove Al from the group of two people, we then have Al and a group of size 1. Swapping Al with that one other person does put Al in a group of size 1, and he does share the same birthday as everyone else in the group (namely, just himself), but he doesn’t necessarily share the same birthday with that one other person. So P(1) does not imply P(2), even though P(k) does imply P(k+1) for all larger values of k. Thus while the first link in a chain-like induction argument holds, the attempt to attach a second link to the first one fails. So we can’t attach any more links, and so the “proof” is invalid. The moral of the story here is that, for an induction argument to work, P(k) implies P(k+1) has to be true for all values of k (well, values at least as large as the value for k used in the base case). If it fails for even one value of k, then the induction chain breaks at that point, and the statement may not be true for any values larger than k.

mzspivey
http://mikespivey.wordpress.com/?p=3036
Extensions
Some Corrections to My Paper “A Combinatorial View of Sums of Powers”
combinatoricsnumber theorypublishingStirling numbers
A few months ago Mathematics Magazine published a paper of mine, “A Combinatorial View of Sums of Powers.” In it I give a combinatorial interpretation for the power sum , together with combinatorial proofs of two formulas for this power … Continue reading →
Show full content

A few months ago Mathematics Magazine published a paper of mine, “A Combinatorial View of Sums of Powers.” In it I give a combinatorial interpretation for the power sum \sum_{k=1}^n k^m, together with combinatorial proofs of two formulas for this power sum. (An earlier version of some of the results in this paper actually appeared in a blog post from several years ago.)

Recently, however, I received an email from Michael Maltenfort at Northwestern University, pointing out a couple of errors in the paper. I’m recording those errors here.


The most serious mistake Maltenfort pointed out to me is the paper’s claim that m is a nonnegative integer. The formulas actually require m to be a positive integer. For example, in the paper I state the following.

Theorem 1 (Original). The power sum \sum_{k=1}^n k^m is the number of functions f: [m+1] \mapsto [n+1] such that, for all i \in [m], f(i) < f(m+1).

However, when m = 0 this doesn’t work. The power sum as I stated it entails adding up n copies of 1 and so evaluates to n. On the other hand, the set [m] is the set \{1, 2, \ldots, m\}. This means that all n+1 functions from [1] to [n+1] vacuously satisfy the condition in the theorem. The theorem as stated in the paper is off by 1, then, for the case m = 0.

Maltenfort suggests changing the lower index on the sum to start at k = 0, so that we have the following.

Theorem 1 (Updated). The power sum \sum_{k=0}^n k^m is the number of functions f: [m+1] \mapsto [n+1] such that, for all i \in [m], f(i) < f(m+1).

As long as we are fine with the convention that 0^0 = 1 (and I am a fan of this convention in a combinatorial setting), this allows the formula to hold in the case m = 0 as well.

Maltenfort also points out that the two formulas for the power sum need to be updated in order for them to hold for the m = 0 case. My paper has

Theorem 2 (Original): \displaystyle \sum_{k=1}^n k^m = \sum_{k=0}^m \binom{n+1}{k+1} \left\{ m \atop k \right\} k!, and

Theorem 3 (Original): \displaystyle \sum_{k=1}^n k^m = \sum_{k=0}^{m-1} \left\langle m \atop k \right\rangle \binom{n+k+1}{m+1}.

These should be updated to

Theorem 2 (Updated): \displaystyle \sum_{k=0}^n k^m = \sum_{k=0}^m \binom{n+1}{k+1} \left\{ m \atop k \right\} k!, and

Theorem 3 (Updated): \displaystyle \sum_{k=0}^n k^m = \sum_{k=0}^m \left\langle m \atop k \right\rangle \binom{n+k+1}{m+1}.


The other mistake in the paper is in the definition of the induced permutation \pi. The definition for \pi in the paper has the requirement

\displaystyle f(\pi(i)) = f(\pi(j)) \implies \pi(i) < \pi(j).

That should instead be

\displaystyle f(\pi(i)) = f(\pi(j)) \text{ and } i < j  \implies \pi(i) < \pi(j).

Thanks to Professor Maltenfort for these corrections!


References

Spivey, Michael Z. “A Combinatorial View of Sums of Powers,” Mathematics Magazine, 90 (2): 125-131, 2021.

mzspivey
http://mikespivey.wordpress.com/?p=3004
Extensions
A Connection Between the Riemann Zeta Function and the Power Sum
Bernoulli numbersnumber theory
The Riemann zeta function can be expressed as , for complex numbers s whose real part is greater than 1. By analytic continuation, can be extended to all complex numbers except where . The power sum is given by . … Continue reading →
Show full content

The Riemann zeta function \zeta(s) can be expressed as \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, for complex numbers s whose real part is greater than 1. By analytic continuation, \zeta(s) can be extended to all complex numbers except where s = 1.

The power sum S_a(M) is given by S_a(M) = \sum_{n=1}^{M-1} n^a. (Sometimes M is given as the upper bound, but for this post it’s more convenient to use M-1.)

Recently I learned of a really nice relationship between the zeta function \zeta(-a) and the power sum S_a(M) for a \in \mathbb{N}. It’s due to Mináč [1], and I reproduce his proof here.

Theorem: For positive integers a, \displaystyle \zeta(-a) = \int_0^1 S_a(x) \, dx.

Now, of course M in S_a(M) is a positive integer, so what do we mean by integrating S_a(x) from 0 to 1? Well, it’s also known that S_a(M) is a polynomial of degree a+1 in M. For example,

\displaystyle S_1(M) = \sum_{n=1}^{M-1} n = \frac{M(M-1)}{2},

\displaystyle S_2(M) = \sum_{n=1}^{M-1} n^2 = \frac{M(M-1)(2M-1)}{6}.

So what we mean when we’re integrating S_a(x) from 0 to 1 is that we’re integrating the polynomial representation of S_a(x) from 0 to 1.

Proof. First, we need a few known facts about the Bernoulli numbers and Bernoulli polynomials:

  1. The known representation of the power sum in terms of Bernoulli polynomials: \displaystyle S_a(M) = \frac{B_{a+1}(M) - B_{a+1}(1)}{a+1}.
  2. The derivative of a Bernoulli polynomial: \displaystyle \frac{d}{dx} B_m(x) = m B_{m-1}(x).
  3. Some specific values of Bernoulli polynomials: \displaystyle B_m(1) = (-1)^m B_m(0) = (-1)^m B_m.
  4. All the odd Bernoulli numbers (except B_1) are 0.
  5. A relationship between the zeta function and Bernoulli numbers: \displaystyle \zeta(-a) = (-1)^a \frac{B_{a+1}}{a+1}.

Now, using these facts, we have

\displaystyle \int_0^1 S_a(x) \, dx =\int_0^1 \frac{B_{a+1}(x) - B_{a+1}(1)}{a+1} dx = \frac{B_{a+2}(1) - B_{a+2}(0)}{(a+1)(a+2)} + (-1)^a \frac{B_{a+1}}{a+1} = 0 + (-1)^a \frac{B_{a+1}}{a+1} = \zeta(-a). \blacksquare

For example, Theorem 1 gives us that

\displaystyle 1 + 2 + 3 + \cdots = \zeta(-1) = \int_0^1 \left(\frac{x^2}{2} - \frac{x}{2}\right) \, dx = \left. \frac{x^3}{6} - \frac{x^2}{4} \right|_0^1 = \frac{1}{6} - \frac{1}{4} = - \frac{1}{12},

\displaystyle 1^2 + 2^2 + 3^2 + \cdots = \zeta(-2) = \int_0^1 \left(\frac{x^3}{3} - \frac{x^2}{2} + \frac{x}{6}\right) \, dx = \left. \frac{x^4}{12} - \frac{x^3}{6} + \frac{x^2}{12} \right|_0^1 = \frac{1}{12} - \frac{1}{6} + \frac{1}{12} = 0.

References

[1] Ján Mináč, “A remark on the values of the Riemann zeta function,” Expositiones Mathematicae, 12: 459-462, 1994.

mzspivey
http://mikespivey.wordpress.com/?p=2975
Extensions
Intuition for the Dual in Linear Programming
linear programmingoptimization
One of the most important theoretical results in linear programming is that every LP has a corresponding dual program. Where, exactly, this dual comes from can often seem mysterious. Several years ago I answered a question on a couple of … Continue reading →
Show full content

One of the most important theoretical results in linear programming is that every LP has a corresponding dual program. Where, exactly, this dual comes from can often seem mysterious. Several years ago I answered a question on a couple of Stack Exchange sites giving an intuitive explanation for where the dual comes from. Those posts seem to have been appreciated, so I thought I would reproduce my answer here.


Suppose we have a primal problem as follows.

Primal = \begin{Bmatrix} \max &5x_1 - 6x_2 \\ s.t. &2x_1 - x_2 = 1\\ &x_1 + 3x_2 \leq 9 \\  &x_1 \geq 0 \end{Bmatrix}


Now, suppose we want to use the primal’s constraints as a way to find an upper bound on the optimal value of the primal. If we multiply the first constraint by 9, the second constraint by 1, and add them together, we get 9(2x_1 - x_2) + 1(x_1 +3 x_2) for the left-hand side and 9(1) + 1(9) for the right-hand side. Since the first constraint is an equality and the second is an inequality, this implies

19x_1 - 6x_2 \leq 18.

But since x_1 \geq 0, it’s also true that 5x_1 \leq 19x_1, and so

\displaystyle 5x_1 - 6x_2 \leq 19x_1 - 6x_2 \leq 18.

Therefore, 18 is an upper-bound on the optimal value of the primal problem.

Surely we can do better than that, though. Instead of just guessing 9 and 1 as the multipliers, let’s let them be variables. Thus we’re looking for multipliers y_1 and y_2 to force

\displaystyle 5x_1 - 6x_2 \leq y_1(2x_1-x_2) + y_2(x_1 + 3x_2) \leq y_1(1) + y_2(9).

Now, in order for this pair of inequalities to hold, what has to be true about y_1 and y_2? Let’s take the two inequalities one at a time.


The first inequality: 5x_1 - 6x_2 \leq y_1(2x_1-x_2) + y_2(x_1 + 3x_2)

We have to track the coefficients of the x_1 and x_2 variables separately. First, we need the total x_1 coefficient on the right-hand side to be at least 5. Getting exactly 5 would be great, but since x_1 \geq 0, anything larger than 5 would also satisfy the inequality for x_1. Mathematically speaking, this means that we need 2y_1 + y_2 \geq 5.

On the other hand, to ensure the inequality for the x_2 variable we need the total x_2 coefficient on the right-hand side to be exactly -6. Since x_2 could be positive, we can’t go lower than -6, and since x_2 could be negative, we can’t go higher than -6 (as the negative value for x_2 would flip the direction of the inequality). So for the first inequality to work for the x_2 variable, we’ve got to have -y_1 + 3y_2 = -6.


The second inequality: y_1(2x_1-x_2) + y_2(x_1 + 3x_2) \leq y_1(1) + y_2(9)

Here we have to track the y_1 and y_2 variables separately. The y_1 variable comes from the first constraint, which is an equality constraint. It doesn’t matter if y_1 is positive or negative, the equality constraint still holds. Thus y_1 is unrestricted in sign. However, the y_2 variable comes from the second constraint, which is a less-than-or-equal to constraint. If we were to multiply the second constraint by a negative number that would flip its direction and change it to a greater-than-or-equal constraint. To keep with our goal of upper-bounding the primal objective, we can’t let that happen. So the y_2 variable can’t be negative. Thus we must have y_2 \geq 0.

Finally, we want to make the right-hand side of the second inequality as small as possible, as we want the tightest upper-bound possible on the primal objective. So we want to minimize y_1 + 9y_2.


Putting all of these restrictions on y_1 and y_2 together we find that the problem of using the primal’s constraints to find the best upper bound on the optimal primal objective entails solving the following linear program:

\begin{matrix} \text{Minimize } &y_1 + 9y_2 \\ \text{ subject to } &2y_1 + y_2 \geq 5 \\ &-y_1 + 3y_2 = -6\\ &y_2 \geq 0 \end{matrix}

And that’s the dual.


It’s probably worth summarizing the implications of this argument for all possible forms of the primal and dual. The following table is taken from p. 214 of Introduction to Operations Research, 8th edition, by Hillier and Lieberman. They refer to this as the SOB method, where SOB stands for Sensible, Odd, or Bizarre, depending on how likely one would find that particular constraint or variable restriction in a maximization or minimization problem.

             Primal Problem                           Dual Problem
             (or Dual Problem)                        (or Primal Problem)

             Maximization                             Minimization

Sensible     <= constraint            paired with     nonnegative variable
Odd          =  constraint            paired with     unconstrained variable
Bizarre      >= constraint            paired with     nonpositive variable

Sensible     nonnegative variable     paired with     >= constraint
Odd          unconstrained variable   paired with     = constraint
Bizarre      nonpositive variable     paired with     <= constraint
mzspivey
http://mikespivey.wordpress.com/?p=2941
Extensions
The Sum of Cubes is the Square of the Sum
number theory
It’s fairly well-known, to those who know it, that . In other words, the square of the sum of the first n positive integers equals the sum of the cubes of the first n positive integers. It’s probably less well-known … Continue reading →
Show full content

It’s fairly well-known, to those who know it, that

\displaystyle \left(\sum_{k=1}^n k \right)^2 = \frac{n^2(n+1)^2}{4} =  \sum_{k=1}^n k^3 .

In other words, the square of the sum of the first n positive integers equals the sum of the cubes of the first n positive integers.

It’s probably less well-known that a similar relationship holds for \tau, the function that counts the number of divisors of an integer:

\displaystyle \left(\sum_{d|n} \tau(d) \right)^2 = \sum_{d|n} \tau(d)^3 .

In this post we’re going to prove this formula for \tau.

First, it’s pretty easy to see what the value of \tau is for prime powers; i.e., integers of the form p^e, where p is prime. Since the only divisors of p^e are 1, p,  p^2, \ldots, p^e, the number of divisors of p^e is given by \tau(p^e) = e+1.

Let’s check the identity we’re trying to prove when n = p^e. We have

\displaystyle \left(\sum_{d|p^e} \tau(d) \right)^2 = \left(\tau(1) + \tau(p) + \tau(p^2) + \cdots + \tau(p^e)\right)^2 \\ = \left(1 + 2 + 3 + \cdots + (e+1)\right)^2 = \left( \frac{(e+1)(e+2)}{2}\right)^2.

We also have

\displaystyle \sum_{d|p^e} \tau(d)^3 = \tau(1)^3 + \tau(p)^3 +\tau(p^2)^3 + \cdots + \tau(p^e)^3 \\ = 1^3 + 2^3 + 3^3 + \cdots + (e+1)^3 = \left( \frac{(e+1)(e+2)}{2}\right)^2.

Clearly, the two expressions are equal, so the identity we’re trying to prove holds in the prime-power case. (And, in fact, the derivation uses the identity about the sum of the first several positive integers mentioned at the beginning of the post!)

Let f(n) = \sum_{d|n} \tau(d) and F(n) = \sum_{d|n} \tau(d)^3. What we’ve shown thus far is that f(p^e) = F(p^e).

Now, we’re going to pull some elementary number theory. One fact about \tau is that it is multiplicative; i.e., \tau(mn) = \tau(m) \tau(n) when m and n are relatively prime. This is one of the first properties you learn about \tau once you learn its definition, and we’re not going to prove it here.

It turns out that both f and F are multiplicative as well! First, the product of two multiplicative functions is also multiplicative. (This is a one-line proof using the definition of multiplicative.) So \tau(d)^3 is multiplicative.

Another property of multiplicative functions is that if g is multiplicative, then \sum_{d|n} g(d) is also multiplicative. This is a special case of the more general result that the Dirichlet convolution of two multiplicative functions is multiplicative. (In the Dirichlet convolution g \star h, take h to be the identity function; i.e., h(n) = 1.) This means that our functions f and F defined in the previous paragraph are both multiplicative.

Therefore, with the prime-power factorization of n given by n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}, we have

\displaystyle \left(\sum_{d|n} \tau(d) \right)^2 = f(n) = f(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = f(p_1^{e_1}) f(p_2^{e_2}) \cdots f(p_k^{e_k}) \\= F(p_1^{e_1}) F(p_2^{e_2}) \cdots F(p_k^{e_k}) = F(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = F(n) = \sum_{d|n} \tau(d)^3.

For an even further generalization, see Barbeau and Seraj [1]. (The proof we give in this post follows the same basic argument as the proof of Proposition 1 in Barbeau and Seraj’s paper.)

  1. Barbeau, Edward, and Samer Seraj, “Sum of cubes is square of sum,” Notes on Number Theory and Discrete Mathematics 19(1), 2013, pages 1–13.

mzspivey
http://mikespivey.wordpress.com/?p=2905
Extensions
Happy Birthday, Benoit Mandelbrot
complex numbersfractalspeople
Today’s Google doodle honors mathematician Benoit Mandelbrot. He would have been 96 today. If you’re interested in learning more about his life and work, the Google doodle link contains a short summary. If you want to go deeper, you can … Continue reading →
Show full content

Today’s Google doodle honors mathematician Benoit Mandelbrot. He would have been 96 today. If you’re interested in learning more about his life and work, the Google doodle link contains a short summary. If you want to go deeper, you can read the Wikipedia article on him.

I only have one small thing to add. A few years ago I wrote an interactive text-based game, A Beauty Cold and Austere. You can read some reviews of it and play it here, and you can read some of my thoughts about it here. The connection to Mandelbrot is that I used part of the Mandelbrot set for the cover art for the game, as you can see below.

abcacover

mzspivey
abcacover
http://mikespivey.wordpress.com/?p=2896
Extensions
No Integer Solutions to a Mordell Equation
number theory
Equations of the form are called Mordell equations.  In this post we’re going to prove that the equation has no integer solutions, using (with one exception) nothing more complicated than congruences. Theorem: There are no integer solutions to the equation … Continue reading →
Show full content

Equations of the form x^3 = y^2 + k are called Mordell equations.  In this post we’re going to prove that the equation x^3 = y^2 -7 has no integer solutions, using (with one exception) nothing more complicated than congruences.

Theorem: There are no integer solutions to the equation x^3 = y^2 -7.

Proof.

Case 1.  First, suppose that there is a solution in which x is even.  Since 2|x, 8|x^3.  Looking at the equation modulo 8, then, we have

\displaystyle 0 \equiv y^2 - 7 \pmod{8} \implies y^2 \equiv 7 \pmod{8}.

However, if we square the residues \{0, 1, 2, 3, 4, 5, 6, 7\} and reduce them modulo 8 we have \{0, 1, 4, 1, 0, 1, 4, 1\}.  So there is no integer y that when squared is congruent to 7 modulo 8.  Therefore, there is no solution to x^3 = y^2 -7 when is even.

Case 2.  Now, suppose there is a solution in which x is odd.  If so, then we have x \equiv 3 \pmod{4} or x \equiv 1 \pmod{4}.

If x \equiv 3 \pmod{4} then we have y^2 \equiv 27 + 7 \equiv 34 \equiv 2 \pmod{4}.  However, if we square the residues \{0, 1, 2, 3\} and reduce them modulo 4 we get \{0, 1, 0, 1\}.  So there is no integer y that when squared is congruent to 2 modulo 4.  Thus there is no solution to x^3 = y^2 - 7 when x \equiv 3 \pmod{4}.

Now, suppose x \equiv 1 \pmod{4}.  Applying some algebra to the original equation, we have

\displaystyle y^2 = x^3 + 7 \implies y^2 + 1 = x^3 + 8 = (x+2)(x^2 - 2x + 4).

By assumption, x + 2 \equiv 3 \pmod{4}.  If x + 2 has prime factors that are all equivalent to 1 modulo 4, then their product (i.e., x+2) would be equivalent to 1 modulo 4.  Thus x + 2 has at least one prime factor q that is congruent to 3 modulo 4.

However, if q|x+2, then q|y^2+1.   This means that y^2 \equiv -1 \pmod{q}.  However, thanks to Euler’s criterion, only odd primes q such that q \equiv 1 \pmod{4} can have a solution to y^2 \equiv -1 \pmod{q}.  Since q \equiv 3 \pmod{4}, y^2 \equiv -1 \pmod{q} has no solution.  Thus there is no solution to x^3 = y^2 - 7 when x \equiv 1 \pmod{4}, either.

Since we’ve covered all cases, there can be no solution in integers to the Mordell equation x^3 = y^2 + k.

mzspivey
http://mikespivey.wordpress.com/?p=2887
Extensions