GeistHaus
log in · sign up

Jigger Wit

Part of wordpress.com

stories
Matroids in 1906
Uncategorizedmathmathematicsmatroid
There is a precise correspondence between matroids and Alfred North Whitehead’s theory of dimension, as found in Mathematical Concepts of the Material World (MW), published in 1906. In brief, if (E,φ) is a geometrical system in the sense of Whitehead, where E is finite and φ-maximal, then E is a simple matroid. Within the context … Continue reading Matroids in 1906
Show full content
A. N. Whitehead

There is a precise correspondence between matroids and Alfred North Whitehead’s theory of dimension, as found in Mathematical Concepts of the Material World (MW), published in 1906. In brief, if (E,φ) is a geometrical system in the sense of Whitehead, where E is finite and φ-maximal, then E is a simple matroid. Within the context of Whitehead’s geometrical axioms, φ-maximal is Whitehead’s term for a simple matroid. Conversely, every simple matroid is a φ-maximal geometrical system in the sense of Whitehead.

Whitehead’s notation will be used for his theory, and Oxley’s notation will be used for matroids.

Hassler Whitney introduced matroids in 1935. I am not aware of any direct influence of Whitehead on Whitney. However, there are some significant indirect connections, as mathematicians in the same intellectual milieu. Both of them were at Harvard: Whitney as a young PhD (received in 1932), and Whitehead from 1924 until his retirement in 1937. Whitney’s PhD advisor was George Birkhoff, who had numerous interactions with Whitehead at Harvard. In fact, the Whiteheads lived two floors above the Birkhoffs on Memorial Drive, and they “were very congenial with them.” Harvard’s Society of Fellows grew out of a conversation between Whitehead and the biochemist L. J. Henderson (who, years earlier, had helped to recruit Whitehead to Harvard). When the Society was organized in 1933, George Birkhoff’s son Garrett became one of the inaugural Junior Fellows, and Whitehead became an inaugural Senior Fellow. Garrett Birkhoff was an early contributor to matroid theory, publishing the same year as Whitney in 1935. Incidently, another of the five inaugural Junior Fellows was Quine, who received his PhD under Whitehead at Harvard in 1932. The Society of Fellows met for dinner Monday nights.

Matroid theory was originally conceived as a common generalization of the notion of linear independence in vector spaces and circuits in graphs. The definition of a matroid can be presented in several ways – in terms of flats, a closure operator, a rank function, or independent sets. All of the definitions turn out to be equivalent, and the equivalence of these definitions (among others such as circuits) is the starting point of matroid theory.

Whitehead’s MW has an analogue of each of these fundamental concepts of matroid theory. His name for flats and closure was the φ-common region; his name for rank was the φ-dimension number; and his name for independent set was φ-axial. Within the context of his geometrical axioms, a φ-maximal set was his name for simple matroid. A significant part of MW is devoted to proving propositions about these concepts and their relationships.

The analogy between matroids and Whitehead’s theory is shown in the following table.

Matroid Whitehead closure operator φ-common region equal closure φ-equivalent rank φ-dimension number independent set φ-axial simple matroid φ-maximal

Whitehead’s theory includes examples that lie outside the scope of matroid theory. Yet numerous propositions in MW have exact counterparts in matroid theory. The two theorems below give the precise relationship between matroids and Whitehead’s theory. The first theorem states that every finite φ-maximal geometrical system in the sense of Whitehead is a simple matroid. The second theorem states conversely that every simple matroid is a finite φ-maximal geometrical system in the sense of Whitehead.

[Whitehead’s geometrical axioms] Let E be a set, and let some of the subsets of E be designated as φ-classes. We say that (E,φ) is geometrical (or a geometrical system) in the sense of Whitehead if the following five axioms hold.

  • (λ) E is a φ-class.
  • (μ) For all elements x of E, the singleton {x} is a φ-class.
  • (ν’) The φ-dimension number of E is finite, and the φ-common region of the empty set is the empty set.
  • (π) For all subsets u of E and for all φ-axial subsets v of its φ-common region, there exists a subset w of E such that the union v∪ w is φ-axial and φ-equivalent to u.
  • (ρ) For all φ-axial subsets u,v of E, if the cardinality of the union u∩ v is at least 2, then the union u∪ v is φ-maximal.

Whitehead defines φ to be geometrical if it satisfies five axioms (λ,μ,ν,π,ρ).Hp.φ. The axioms λ, μ, π and ρ are Whitehead’s axioms λ.Hp.φ, μ.Hp.φ, π.Hp.φ, and ρ.Hp.φ.

Whitehead’s axiom ν.Hp.φ is that E is three-dimensional. He makes this assumption because his interest is three-dimensional geometry, while observing that “the reasoning can be applied to higher dimensions, only more elaborate inductions and an extra axiom are required.” Whitehead does not spell out this extra axiom. We have replaced his axiom with finite-dimensionality (ν’).

Theorem. Let (E,φ) be geometrical in the sense of Whitehead. If E is finite and φ-maximal, then E is a simple matroid whose closure operator is the φ-common region operator. Moreover, a subset of E is a flat of the matroid iff it is an intersection of φ-classes. The independent sets of the matroid are the φ-axial sets (together with the empty set). Finally, the rank function on the matroid is equal to the φ-dimension number on every nonempty set.

Theorem. Let (E,I) be a simple matroid, with finite ground set E and independent sets I. Define the φ-classes to be the flats (i.e. closed sets) of the matroid. Then

  • For any subset u of the ground set E, the φ-common region of u is the matroid closure of u.
  • Two subsets u,v of the ground set E are φ-equivalent iff they have the same matroid closure.
  • A nonempty subset of the ground set E is φ-prime iff it is an independent set of the matroid.
  • The φ-dimension number of a nonempty subset of the ground set E is its matroid rank.
  • If E is nonempty, then E is φ-maximal. Moreover, every φ-prime set is φ-axial.
  • (E,φ) is geometrical in the sense of Whitehead.

The following pdf gives the proofs of these theorems.

whitehead-matroids-1906Download
tchales
http://jiggerwit.wordpress.com/?p=2020
Extensions
Babylonian Square Root of 2 in One Shot
Uncategorizedalgorithmsgeometryhistoryhistory of mathmathmathematicsscience
Introduction (A pdf version of this post appears at the end of this post.) We propose a one-shot calculation of √2 using Babylonian mathematics. An Old-Babylonian (OB) tablet YBC 7289 (from around 1800 B.C. to 1600 B.C.) contains the approximation B = 1 + 24/60 + 51/602 + 10/603 = 30547/21600 for √2. The approximation … Continue reading Babylonian Square Root of 2 in One Shot
Show full content
YBC 7289
Introduction

(A pdf version of this post appears at the end of this post.)

We propose a one-shot calculation of √2 using Babylonian mathematics. An Old-Babylonian (OB) tablet YBC 7289 (from around 1800 B.C. to 1600 B.C.) contains the approximation

B = 1 + 24/60 + 51/602 + 10/603 = 30547/21600

for √2. The approximation is very good: |B – √2| < 10-6. “This is indeed a remarkably good approximation. It was still used by Ptolemy in computing his table of chords almost two thousand years later” (Neugebauer). The tablet “demonstrates the greatest known computational accuracy obtained anywhere in the ancient world” (Beery). The tablet does not explain the method of calculation. It is a matter of speculative interest to give a plausible reconstruction of the method, based on our understanding of Babylonian mathematics. In this note, we propose a calculation leading to the value B that might plausibly have been done at that time.

The Babylonian number system was base sixty (sexagesimal). There was no decimal point. This means that a single written number N might represent any of the numbers N(60)n as n varies. For example the same numerals represent both the decimal number 30 and the fraction 0.5 = 30/60. The Babylonian mathematician had to keep mental track of the exponent n. In this post, numbers are written in base 10 except when indicated otherwise. Fractions are our modern notation, but not part of the Babylonian system.

The Babylonians knew the basic arithmetic operations: addition, subtraction, and multiplication. There was a concept of squaring and square roots, and it was understood that these are inverse operations. The square root operation should also be viewed as an extremely computationally expensive operation for numbers that are not perfect squares, and the aim of this note is how it might be that √2 was computed with such precision.

Babylonians computed reciprocals 1/b. Division was accomplished in two steps: a/b was computed as the product of a with the reciprocal 1/b. The vast majority of all reciprocals appearing in Old-Babylonian texts were reciprocals of regular numbers b, where we define a natural number b to be regular if it is a divisor of 60n for some n. In that special case 1/b = a/60n is exactly represented in sexagesimal, where ab=60n.

According to Mansfield, “Reciprocals of irregular numbers are more troublesome on account of their endless sexagesimal expression. By far the most common approach was to assert they have igi nu, which translates to no reciprocal.” Sachs states that the Babylonians rigged their problems in advance in such a way that the solution would not require division by an irregular number. In short, the Babylonians rigged their problems to avoid irregular division, and they tended to give up whenever an irregular division was required.

Calculation of the Square Root

We use the Babylonian square-root approximation √2= √(B02 – δ) ≈ B0 – δ/(2 B0), where δ=B02-2. Here B0 is an approximation to √2 and δ=B02-2 is the error term. The knowledge and use of this formula is well-attested. Simple geometric diagrams with areas of rectangles and squares give some geometric justification for the formula. Here and elsewhere in Old Babylonian texts, specific numeric values were used, foregoing variables. However, it was understood that other numeric values could be used in the numerical recipe.

The main challenge in using this square-root approximation is to find a suitable approximation to the reciprocal of B0 when B0 is an irregular number. We will use the approximation

(1+x)-1≈ (1-x), when |x|≪1.

This gives

1/B0 = (B0/2) (1/(1+δ/2)) ≈ B0(1-δ/2)/2 =: A0.

The approximation A0 might have been discovered in two stages. First, we observe that when B0 is close to √2 then the reciprocal of B0 is nearly B0/2. The error is the multiplicative factor B02/2=(1+δ/2), and its reciprocal can be approximated in the second stage using (1-δ/2). I am writing the formulas algebraically to make them more comprehensible, but the Babylonian calculation would have used specific numeric values, foregoing variables. Also, the formula would have been given as a procedure: first multiply B0 by its approximate reciprocal B0/2 to obtain (1+δ/2), then multiply by its approximate reciprocal (1-δ/2).

The constant B0:=85/60=1+25/60 is an Old Babylonian approximation to √2 (Neugebauer). I’ll add a few words of speculation about how this approximation might have been discovered. According to Neugebauer, tables of squares (listing n and n2) from n=1 to 60 were computed by the Babylonians. Going slightly beyond this range, it is easy to imagine that somebody sometime computed 852=7225 and saw the remarkable proximity to 7200 = 2(60)2. In sexagesimal, the “2” pops out on the right-hand side of the equation in a striking way:

B02=sxg(1,25)2 = sxg(2,0,25),

where we have used notation sxg(a1,…) for the number with sexagesimal digits (a1,…). More directly, perhaps somebody noticed directly from the tables of squares that 2(12)2=288 and 172=289 are consecutive. In short, B0 might well have been found rather than calculated. Whatever the method might have been, we take it as given that the Babylonians knew the approximation B0.

If we use the known value B0, then δ = B02-2 = 1/144. The approximation to the square root and A0 give an approximation that can be computed as an exactly representable sexagesimal number:

√2 ≈ B0 – δ A0/2 = B0 (δ2 – 2δ +8)/8 = 2815217/(21335)

    = 1 + 24/60 + 51/602 + 10/603 + 35/604 + 40/605 + 37/606 + 30/607.

Dropping the trailing least significant digits: + 35/604+…, we obtain B, the Babylonian approximation! (It is fortuitous that we improve the accuracy of the approximation by dropping the least significant digits.)

Comments

Most if not all previous reconstructions of the √2 calculation assume one or both of the following techniques:

  1. a general ability to compute high-precision reciprocals of irregular numbers;
  2. iterative use of a square root approximation (Heron’s method or Newton algorithm) in a scheme of successive approximations. (Recall that Heron’s method for square roots is the special case of Newton’s algorithm applied to the polynomial x2-d to compute √d. The Babylonian square root approximation is equivalent to a single iteration of Heron’s method with initial value B0.)

My reconstruction succeeds without using either technique. I have noted above the strong Babylonian aversion to reciprocals of irregular numbers. For the first technique, we only need the relatively easy approximation of reciprocals close to 1. There exists a Babylonian table attesting to approximate reciprocals of regular and irregular numbers of the form (1+x), for |x|≪ 1 (Neugebauer). In fact, the table gives one of the rare examples where reciprocals of irregular numbers were approximated. The table contains several mistakes, yet the correct answers are even more accurate than our (1-x) approximation. We have confined ourselves to approximating a reciprocal that falls within the range of the Babylonian table.

For the second technique, our scheme needs no iteration; our computation of √2 invokes the square root approximation only once. A plausible reconstruction should preferably avoid complex iterative schemes. Iteration was not a generally understood concept in Babylonian mathematics, although simple repetitions occur in simple contexts (such as procedures to multiply multi-digit numbers). Another OB example of iteration was the process of factoring a regular number by successively dividing out small factors. Knuth has studied algorithmic aspects of Babylonian mathematics. He describes the situation in these terms.

“Several ancient Babylonian methods, for doing such things as solving special sets of quadratic equations in two variables are also known. Genuine algorithms are involved in this case,… Many of these Babylonian algorithms predate Euclid by 1500 years, and they are the earliest known instances of written procedures for mathematics. But they do not have the stature of Euclid’s algorithm, since they do not involve iteration.” [Knuth, vol2, page 335]

Other speculative reconstructions of the approximation have been proposed. Mansfield proposes three iterations of Heron’s method. Bruins describes reciprocal algorithms. (The blog post leaves out footnotes and citations, which can be found in the pdf.)

sqrt2Download
tchales
http://jiggerwit.wordpress.com/?p=1962
Extensions
The Geometry of the Harpa Concert Hall
geometryarchitectureIcelandmath
I am in Reykjavík, and my hotel is a short walk from the Harpa concert hall. The facade of the Harpa is a tiling of elongated dodecahedra. Many have noted the similarity between the Harpa facade and the large basalt columns found in Iceland. However the designer of the facade has stated, “The bricks themselves … Continue reading The Geometry of the Harpa Concert Hall
Show full content
The Harpa Concert Hall in Reykjvavík.

I am in Reykjavík, and my hotel is a short walk from the Harpa concert hall.

The facade of the Harpa is a tiling of elongated dodecahedra. Many have noted the similarity between the Harpa facade and the large basalt columns found in Iceland. However the designer of the facade has stated, “The bricks themselves are not, as a matter of fact, meant to reflect the Icelandic basalt columns as some have speculated, but the idea of the intricate language of mathematics.”

The elongated dodecahedra of the Harpa.

A convex polyhedron whose translates are capable of a face-to-face tiling is called a parallelohedron. Such convex polyhedra in three dimensions were classified by Fedorov in 1885. All parallelohedra have easily recognized general features: (1) the polyhedron itself is centrally symmetric; (2) each face is a parallelogram or centrally symmetric hexagon; (3) the edges fall into groups of parallel edges of the same length. The Harpa polyhedron falls within this general classification, and these general features are visually evident in the Harpa.

The bottom cap of each Harpa polyhedron is formed by one centrally symmetric hexagon and two parallelograms.

Fedorov’s classification establishes that there are five families of parallelohedron. The most general of the five families is that of the truncated octahedra. Each polyhedron in this family has 14 faces, consisting of 8 hexagons (arranged into four opposite pairs) and 6 parallelograms (arranged into 3 opposite pairs). The polyhedron has 36 edges, arranged into six groups of six parallel edges, each edge within a group having the same length. The family is 11=6+(4*2-3) dimensional, determined by six edge lengths of the six parallel edge groups and four unit normals to the four pairs of hexagonal faces (mod the three-dimensional group of rotations).

Fedorov’s classification can be interpreted as degenerations of the family of truncated octahedral parallelohedra. Each degeneration is obtained by shrinking each of the edges within a parallel group to length zero.

The initial degeneration of the family of truncated octahedra gives the family of the elongated dodecahedron. Here “elongated” refers to the four hexagonal faces forming a equatorial belt that separates a “northern” cap of four parallelograms from a “southern” cap of four parallelograms. The Harpa polyhedron belongs to the family of elongated dodocahedra. The long “hexagonal columnar” form of the Harpa polyhedron is obtained by stretching one parallel group of edges to make them much longer than the others. (This stretching into long columns in the Harpa is not the same as the elongation referenced by the name “elongated dodecahedron.” In the Harpa polyhedron, the belt of four hexagons is at a tilt, extending into its top and bottom caps.)

The family of elongated dodecahedra can further degenerate into either the family of hexagonal prisms or the family of the rhombic dodecahedron. These families degenerate yet further into a family of parallelopipeds. This is Fedorov’s classification into five types: the family of the truncated octahedron, elongated dodecahedron, rhombic dodecahedron, hexagonal prism, and parallelopiped.

Note that there are several false claims in circulation about the mathematical structure of the Harpa tiling. Fedorov completed his classification in 1885. Any claim that the Harpa tiling was not known until the twentieth century is a historical inaccuracy.

tchales
http://jiggerwit.wordpress.com/?p=1939
Extensions
DeepSeek’s Expected Value
UncategorizedAIartificial-intelligencearXivDeepSeekeducationllmmachine learningmathematicsreinforcement learning
Abstract: A recent arXiv preprint from DeepSeek contains an equation for an expected value, which is used in reinforcement learning. We present the expected value formula and describe how it has been used to enhance language model reasoning. pdf version Introduction In an article on the recent AI model developed by the Chinese company DeepSeek, … Continue reading DeepSeek’s Expected Value
Show full content

Abstract: A recent arXiv preprint from DeepSeek contains an equation for an expected value, which is used in reinforcement learning. We present the expected value formula and describe how it has been used to enhance language model reasoning.

pdf version

Introduction

In an article on the recent AI model developed by the Chinese company DeepSeek, the Guardian reported:

The race for domination in artificial intelligence was blown wide open on Monday after the launch of a Chinese chatbot wiped $1tn from the leading US tech index, with one investor calling it a “Sputnik moment” for the world’s AI superpowers. — The Guardian, Monday, Jan 27, 2025.

DeepSeek-AI had posted an article on the arXiv five days earlier. The second sentence of the abstract asserted:

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.

DeepSeek’s article presents a key equation, defining an expected value that is used for reinforcement learning. Here, we explain that equation and how it is used to obtain a model (R1-Zero) with remarkable reasoning capabilities. This article is written by a mathematician for a math audience. I hope that others, too, might find it helpful.

Language Models

Let T be a finite set of tokens and let T* be the set of finite sequences of tokens; that is, T* is the set of strings formed from the given tokens. Let Δ(T*) be the probability simplex over T*:

Δ(T*) = {π:T*→[0,1] : ∑o∈ T*π(o)=1}.

A language model is a curried function

π:T* → Δ(T*),   q↦ o↦ π(o|q)∈[0,1],

where π(o|q) is the probability that the model gives the output o∈ T* in response to the input q∈ T*. The notation o|q is read o given q.

This definition of language model conforms with the intuitive idea of a large language model (LLM) such as ChatGPT, where the user enters text q at a prompt, and the LLM responds with text o. The response is not deterministic — repeated identical input prompts will yield different responses o. But some responses are more likely than others, according to the probability π(o|q) of the response o given input q.

A language model becomes a parametric language model, when there is a set Θ of trainable parameters. A parametric language model has the form

π: Θ× T* → Δ(T*),  (θ,q)↦ o↦πθ(o|q).

Typically, the set Θ⊆ℝd of parameters might be a convex subset of Euclidean space of some dimension d. When training is complete, the parameter θ∈Θ is fixed to provide a specific language model πθ. For example, θ might be the vector of weights of a trained neural network.

Optimization

DeepSeek’s reinforcement learning (RL) algorithm shows how an initial parameter θ=θ0 might be improved to a different parameter θf that produces “more reasonable” responses o to queries. Here, “reasonable” means achieving high accuracy on queries q where answers are unambiguously right or wrong. Deepseek improves the initial parameter θ0 through a sequence of nonlinear optimization problems. Deepseek defines an expected value (EV) function (Equation 1 below):

EV:Θ×Θ×Θ→ ℝ,  (θ,θn,θref)→ EV(θ,θn,θref).

Starting with an initial language model parameter θ0∈Θ and a reference parameter θref, we iteratively solve

θn∈argmaxθ∈Θ EV(θ,θn-1,θref), for   n=1,2,…,f.

(Recall that argmax is defined to be the set of inputs θ∈Θ which produce the maximum value of the objective function.) That is, each language model parameter θn is obtained by a maximization problem starting from the previous parameter θn-1, until reaching the final parameter θf, after thousands of iterations (say, f=104). To summarize, the expected value EV should be designed to have the property that its successive maximizations will produce language model parameters θ0,θ1,…,θf that generate increasing reasonable answers.

Reward

We assume as given a parametric language model π:Θ× T*→Δ(T*), initial language model parameters θ0,θθref∈Θ (for example, we might take θ0=θref to be the weights of a trained neural network), and a large bank Q⊂ T* of problems and their answers. The answers are unambiguously right or wrong. For example, Q might be a bank of math problems whose answers can be expressed in canonical form. One of the features of DeepSeek is that each answer o to a problem q is assigned a simple reward r(o|q) (as opposed to intricate reward modeling). The reward function is fixed once and for all. To give one possibility, it might be something as simple as the function

r(o|q) := δ1 + δ2,

where δ1 is a reward for correct formatting (say +0.1 if formatted corrected) and δ2 is a reward for the correct answer (say +1.0 if answered correctly, 0 otherwise).

Batches and advantages

The DeepSeek algorithm works with batches of answers to a given query. There is a fixed constant g (the batch size) such that each batch O is a random sample of size g of responses o to the same query q (via a given language model). If O is a batch of answers to the query q, the advantages are the standardized rewards, obtained by mean centering and variance scaling:

aO(o|q) := (r(o|q)-r̅)/σ,

where and σ are the mean and the standard deviation of the rewards in the batch O.

Cap function

The cap function

capε(x) := min(ε,x)

is suited for many engineering purposes such as bandwidth throttling, load control, risk control, and modeling physical limits. The cap limit ε >0 is a fixed constant (a hyperparameter with default value ε=0.2). The effect of the cap in a maximization objective function is to enforce the maxims that “Happiness is best in moderation” and “Better the devil you know than the devil you don’t.” In a capped maximization, incremental improvements are welcome, but windfall improvements are suspect. Without a cap, an excessively large update might occur in a single iteration.

The cap function
Value

The value of some new parameter θ relative to an old parameter θn (given query q, batch of answers O, and o∈ O) is defined as follows.

valueO(o|q,θ,θn,θref):= |a|capε(sgn(a)(tn-1)) – β d(tref), (Equation 1)

where we have used abbreviations for the advantage a=aO(o|q), the sign ∈{-1,0,1} of a, and likelihood ratios:

tn=πθ(o|q)/πθn(o|q),  tref=πθ(o|q)/πθref(o|q).

Formula (1) may be interpreted as capping the product of the advantage a and relative change (tn-1), then penalizing any departure from the reference. The penalty term β d(tref) will be explained below. The sign of a in Equation 1 means that better than average answers are valued, and worse than average answers are disparaged. Stated more mathematically, the function (Equation 1) is increasing in tn when a>0 and decreasing when a<0. The effect of the sign is a reflection of the cap function through the vertical axis tn=1. The greater the magnitude |a| of the advantage, the greater the influence on value.

Unlike some RL approaches, DeepSeek’s expected value function uses a conservative optimization strategy using both capping and penalty functions. This approach avoids over-optimization and encourages gradual performance gains across many iterations.

Expected value

The function EV is the expected value E of the total value:

EV(θ,θn,θref) = Eq,O( ∑o∈ O valueO(o|q,θ,θn,θref)).

The questions q are selected according to a fixed probability distribution on the bank of problems Q. For each q, the batch of answers O is a random sample (of fixed batch size #O=g) drawn according to the probability distribution o↦πθn(o|q).

Penalty

The penalty function is defined to be d(t):=t-1-ln(t), for t>0. By taking derivatives, we see that d is non-negative, convex, and attains its minimum d(1)=0 uniquely at t=1. It is also clear from the behavior of the logarithm that d(0+)=d(+∞)=+∞. Taking β>0 to be a fixed constant (say a hyperparameter with default value β=1), we see that the penalty term -β d(t)≤0 becomes increasingly negative as t moves away from t=1. Interpreting t=πθ(o|q)/πθref(o|q) as a likelihood ratio, we see that the penalty term will penalize any divergence from the reference parameter θref: don’t stray far from the well-trodden path. Both the cap and the penalty enforce a very conservative optimization strategy. This completes the specification of DeepSeek’s expected value function (EV).

The penalty -βd

Reasoning process

Each response o to a problem q∈ Q is required to follow the format specified by a template for DeepSeek-R1-Zero. “The assistant first thinks about the reasoning process … and then provides the user with the answer. The reasoning process and answer are enclosed within <think></think> and <answer></answer> tags, respectively” [arXiv:2501.12948, p6].

The reward function r(o|q) depends only on the correctness of the answer (and on correct formatting); the reasoning process between the <think> element tags is discarded. However, autoregressive models generate answers sequentially, and a structured reasoning process increases probabilities of a correct answer.

Endnote

DeepSeek uses the more complicated function f(t):= min(t,clip(t,1-ε,1+ε)) in its formulas, where the clipping function restricts the range of a function to an interval [a,b]. However, we can simplify DeepSeek’s formula by using the clip-cap identity f(t) = 1 + capε(t-1), for all ε>0, which replaces the clip with a cap. Also, transforming the objective function via an increasing affine function has no effect on the argmax.

DeepSeekDownload

tchales
http://jiggerwit.wordpress.com/?p=1897
Extensions
AI-algorithm for the digits of the cube root of 2. Or is it?
UncategorizedAIalgorithmsartificial-intelligencemachine learningmathmathematicsprogrammingpython
I received an email from Josef Urban this morning. His group has been using AI to find algorithms that generate the sequences found in the Sloane’s Online Encyclopedia of Integer Sequences (OEIS). The input is a finite sequence of integers. The output is an algorithm that produces the same sequence of digits. Using AI, they … Continue reading AI-algorithm for the digits of the cube root of 2. Or is it?
Show full content

I received an email from Josef Urban this morning. His group has been using AI to find algorithms that generate the sequences found in the Sloane’s Online Encyclopedia of Integer Sequences (OEIS). The input is a finite sequence of integers. The output is an algorithm that produces the same sequence of digits. Using AI, they have been able to generate algorithms for more than 130,000 of the OEIS integer sequences.

Urban writes:

…here is a freshly invented AI creation (apparently) computing the cube root of 2 as a gift for your retirement:

https://oeis.org/A2580 : 1 2 5 9 9 2 1 0 4 9 8 9 4 8 7 3 1 6 4 7 6 7 2 1 0 6 0 7 2 7 8 2 2 8 3 5 0 5 7 0 2 5 1 4 6 4 7 0 1 5 0 7 9 8 0 0 8 1 9 7 5 1 1 2 1 5 5 2 9 9 6 7 6 5 1 3 9 5 9 4 8 3 7 2 9 3 9 6 5 6 2 4 3 6 2 5 5 0 9 4 1 5 4 3 1 0 2 5

size 72, time 9598378: loop(loop2(((y div x) + x) div 2, y, (2 + 2) + y, x, loop(loop2(((((y * y) div (x + x)) + x) * y) div ((x + y) + 2), y, 2, (2 + (2 * (2 + 2))) * x, loop((2 + (2 * (2 + 2))) * x, y, 2)), y + y, 2)), x, 1) mod (2 + (2 * (2 + 2)))

...I have checked that it works for the first 1024 digits. It's the first program we got for cube root of 2

Here is the Python translation of our code for A2580:
def f5(X,Y):
x = 2
for y in range (1,Y + 1):
x = (2 + (2 * (2 + 2))) * x
return x

def f4(X,Y):
x,y = (2 + (2 * (2 + 2))) * X, f5(X,Y)
for z in range (1,2 + 1):
x,y = (((((y * y) // (x + x)) + x) * y) // ((x + y) + 2)), y
return x

def f3(X,Y):
x = 2
for y in range (1,(Y + Y) + 1):
x = f4(x,y)
return x

def f2(X,Y):
x,y = X, f3(X,Y)
for z in range (1,((2 + 2) + Y) + 1):
x,y = (((y // x) + x) // 2), y
return x

def f1(X):
x = 1
for y in range (1,X + 1):
x = f2(x,y)
return x

def f0(X):
return f1(X) % (2 + (2 * (2 + 2)))

for x in range(32):
print(f0(x))

So why does this work? The basic mechanism is as follows. The fixed point equation for the function f4 (using real division instead of integer division) gives in Mathematica the fixed point (depending on y):

fixedPoint= (x /. Solve[x == (((((y*y)/(x + x)) + x)*y)/((x + y) + 2)), x] // First)

From the explicit formula provided by Mathematica, it is easy to compute the exact limit of y/fixedPoint as y tends to infinity. The limit is the cube root of 2. In this way, the cube root of 2 arises naturally out of the AI-algorithm. Even more directly, if we drop the 2 from the denominator of f4, because it is a lower-order term, the fixed point equation takes the form

Solve[x/y == (((y*y)/(x + x)) + x)/(x + y), y]

and this has real solution y/x = 2^(1/3), without taking limits.

However, this is not at all the end of the story. Urban’s AI-generated algorithm contains a lot of extra noise. It is disturbing that the algorithm uses integer division, that it uses strange ranges for the iterations, and that it uses (finite) integer values for y, rather than the limit as y tends to infinity. Given all this noise, is it really true that the algorithm correctly outputs every single digit of the cube root of 2? I’m not so sure.

This raises interesting questions about intelligible AI. We might view this AI-generated algorithm as a prototype of the kind of output that we might get from an AI-system. We get an answer, and there is no simple path to the verification of the answer. In this case, the fixed point was easy to compute. But what happens when no such simple trick can be found? We might be left hanging, not knowing whether to trust the algorithm or not.

tchales
http://jiggerwit.wordpress.com/?p=1886
Extensions
Letter to the Pitt University Times on Free Speech
Uncategorizedfree-speechgazanewspolitics
[In step with other universities, the University of Pittsburgh administration has recently problematized free speech.] I am responding to two fundamentally misguided articles in the University Times, “Protests, attacks raise issues about when does free speech become hate speech” (10/18/2024) and “Man charged…” (9/6/2024). Senate Council President Kear remarked, “We have been talking about free … Continue reading Letter to the Pitt University Times on Free Speech
Show full content

[In step with other universities, the University of Pittsburgh administration has recently problematized free speech.]

I am responding to two fundamentally misguided articles in the University Times, “Protests, attacks raise issues about when does free speech become hate speech” (10/18/2024) and “Man charged…” (9/6/2024).

Senate Council President Kear remarked, “We have been talking about free speech versus hate speech and its impact.” Free speech has no hate speech exception. The First Amendment protects sorrowful and jubilant speech, hateful and loving speech, bland and controversial speech, timid and bold speech.

“The increased rhetoric on campus surrounding the conflict in the Middle East has put the University in a difficult position.” On the contrary, the university is not in a difficult position — not in the slightest. Our commitment to free speech must be unwavering, especially in protection of political speech.

I am cynical about the articles’ narratives, which describe an assault by a man “wearing a keffiyeh,” a keffiyeh which we are told is “Palestinian-associated.” Notice how deftly the University Times places collective guilt for the assault on Palestinians. From there, the articles make a fast and loose association with the “two pro-Palestinian encampments” in Oakland and a “pro-Palestinian rally.” Before we know it, if the narratives are to be believed, free speech of a certain type is to blame. According to the University Times, the problematic type of speech is the kind found at pro-Palestinian rallies.

I, too, have been assaulted when leaving the Pitt campus — a troubling experience. Still, I have refused to engage in identity politics or to spin a narrative for political gain.

By right and long tradition, we cannot stomach restrictions on free speech. In the words of Dilbert creator Scott Adams, “I want to know the full list of what I’m not supposed to f**king say, and I’m going to say everything on that f**king list every single day” (without the asterisks).

tchales
http://jiggerwit.wordpress.com/?p=1881
Extensions
Easy Pieces in Geometry
Uncategorizedfour-color-theoremgame-of-hexgeometryjohn-nashmathpiet-hein
Problems worthy of attack, prove their worth by hitting back – Piet Hein In 2007, I wrote a collection of “Easy Pieces in Geometry” for a course I was teaching in discrete geometry. I presented an easy piece at the beginning of each lecture as a warm-up. I never polished the notes, and I am … Continue reading Easy Pieces in Geometry
Show full content

Problems worthy of attack, prove their worth by hitting back – Piet Hein

In 2007, I wrote a collection of “Easy Pieces in Geometry” for a course I was teaching in discrete geometry. I presented an easy piece at the beginning of each lecture as a warm-up. I never polished the notes, and I am reposting them here in unfinished form.

During the Nazi occupation of Denmark, Piet Hein invented the board game of Hex, while working on the four-color problem. Hein drew four countries along the sides of a diamond and a fifth country surrounding the four. He wondered if it might be possible to find a way to extend the four countries into the diamond in such a way that the countries on opposite sides share a border. If such a map could be found, all five countries would share borders with one another, and five colors would be required to color the map. This would give a counterexample to the famous four-color problem.

The first easy piece is the game of Hex, invented by the mathematician Piet Hein and reinvented by John Nash.

easy-pieces-geometry-2007Download
tchales
http://jiggerwit.wordpress.com/?p=1870
Extensions
Telegram and Crypto Wars
UncategorizedcryptographyDurovTelegram
April 15, 2021 [This post gives the full text of my presentation on the cryptography of Telegram that was delivered April, 2021. The presentation has become highly relevant in 2024 in connection with the charges that have been brought against Pavel Durov in Paris. Three of those charges relate to the unauthorized use of cryptography. … Continue reading Telegram and Crypto Wars
Show full content
Pavel Durov

April 15, 2021

[This post gives the full text of my presentation on the cryptography of Telegram that was delivered April, 2021. The presentation has become highly relevant in 2024 in connection with the charges that have been brought against Pavel Durov in Paris. Three of those charges relate to the unauthorized use of cryptography. Here I discuss the mathematician Nikolai Durov, who designed the cryptography for Telegram. He is Pavel Durov’s brother and cofounder of Telegram. I strongly condemn France’s criminalization of unsurveilled speech.]

Crypto Wars

April [2021] has been a terrible – a truly awful – month for internet security. For starters, 533 million phone numbers of Facebook accounts have been leaked online. If you have a Facebook account, your phone number might be compromised. Just days after the Facebook news, it was revealed that hackers are selling the data from 500 million LinkedIn accounts. And then less than a week ago [4/9/2021], it was demonstrated that hackers can take control of a computer running Zoom. If you are watching right now [in 2021] in a Zoom session on Windows or Mac, your computer is vulnerable. My apologies to all those I invited tonight [if watching by Zoom]! Fortunately, researchers found this vulnerability before any mischief could be done and reported it responsibly to Zoom.

How can we design better protections for our security and privacy? Today I will explain how a mathematician thinks about these things.

Mathematical Jeopardy

Cryptography (crypto for short) is the science of protecting communication to keep intruders out. Think of encryption as putting a message into a sealed envelope so that nobody can read it until it reaches its destination. Crypto is based on math, on computer science, and on engineering.

To see how crypto works, let’s play a little game called Math Jeopardy. (I promise not to spend more than one minute on math.) I give you the answer and you try to be the first to guess the question.

  • If the answer is 6, you will say, “Alex, what is 2 times 3?”
  • If the answer is 10, you will say, “Alex, what is [pause] 2 times 5?”
  • If the answer is 15, you will say, “Alex, what is [pause] 3 times 5?”

Are you ready for final jeopardy?

  • If the answer is 2,430,101, you will say, “Alex, what is [long pause] 1223 times 1987?”

As you see from the game of math jeopardy, the solutions rapidly become more difficult as the numbers get bigger. There was a $75,000 prize if you can find two numbers that multiply together to give this 270 digit number:

412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 (RSA-896).

That is the end of the math in my talk. This little game is significant for two reasons.

  1. Multiplication problems like these are easy to create. All you need is a mobile device.
  2. There is an amazingly clever way to use these multiplication problems to seal your messages into a privacy envelope. This means that secure communication is possible on a massive scale.

I can seal an envelope using the number 6, and it can only be unsealed using the numbers 2 and 3. When we seal the envelope with a big number like the one on my slide, this is very effective. We literally have “Safety in Numbers” (echoing the theme of this lecture series).

Strong Crypto

The process is so effective that no government in the world – no matter how many supercomputers it has – can break the crypto. Your private communications are secure. These algorithms are built into every browser and you use them every day without even being aware of it. This gives an enormous power to individuals: the power to communicate in private.

This is what we mean by strong crypto: encryption that is so effective that no government in the world can defeat it. It goes without saying that some governments do not want strong crypto in the hands of its citizens.

This has led to the Crypto Wars. A crypto war is not a physical battle, but a legal battle between a government on one side and private citizens on the other side. The government is fighting for increased surveillance and weakened cryptography. The private citizens are fighting for their right to privacy and strong crypto.

The crypto wars go back to the Cold War. Open cryptographic research was basically impossible back in the 1970s. At the time, every research paper related to cryptography was defined as a “weapon of war” by the United States government and was regulated as such by the International Traffic of Arms Regulation (ITAR). There was a danger of being prosecuted by the US government for presenting a crypto research paper at a conference. Eventually the US government backed off, and crypto research started to be done in the open. The crypto wars have continued off and on since then.

Snowden
Edward Snowden

In 2013, the whistleblower Edward Snowden revealed that the United States government was engaged in mass surveillance on its own citizens. A US intelligence agency had inserted backdoors in encryption. A backdoor means that you believe your communications are secure, when in fact they are not. The Snowden revelations made me passionate about cryptography.

Telegram

To bring the discussion up to date [in 2021], let me say a few words about a recent crypto war. I’ll use the example of the instant messaging service Telegram. If you are not a user of Telegram, that’s fine: it is an app that allows you to send private text messages that can be encrypted. Telegram is very popular: it has more than 500 million active monthly users [950 million in 2024].

As a mathematician, I am interested in Telegram because one of the two founders Nikolai Durov is a mathematician. He designed the crypto. I have met him. He is a brilliant mathematician. He won three gold medals in high school on the International Math Olympiad. He has two math PhDs. At one point, Telegram boasted that half its employees were math PhDs.

Nicholai Durov at the chalkboard explaining a model of Spec(Z)

Can we trust the crypto in Telegram? Designing good crypto is about as hard as designing a helicopter that flies. Unless you are an expert, it probably won’t fly. Security researchers have said that Telegram’s crypto is quite weird and idiosyncratic. But is the current version safe? A few months ago [in December 2020] researchers announced, “We can affirm that MTProto 2.0 [Telegram’s crypto] does not present any logical flaw.” [Here’s an update from 2023.] Their proof used what is called automated reasoning: the reasoning was not done in the traditional way on a chalkboard, but the reasoning was done by a completely automated computer process. The idea of replacing chalkboard with computer reasoning is an important trend in computer science and mathematics.

Telegram’s Crypto Wars

Getting back to Crypto Wars, Telegram has had three major crypto battles.

(1) First battle – the Kremlin: The founders are from Saint Petersburg, and The Kremlin went after Telegram because of its strong crypto. Last year, in the Washington Post, there was a fascinating account of this battle with headline: “How the founder of the Telegram messaging app stood up to the Kremlin – and won” [6/28/2020].

(2) Second battle: The United States Securities and Exchange Commission (SEC) went after the Telegram blockchain. Blockchains are the key technology behind crypto currencies. The SEC has started to take action against some cryptocurrencies that fail to register with the SEC. Telegram gave up their battle with the SEC and withdrew their blockchain.

(3) Third battle: This is not a battle with a government but with private organizations that are against free speech and against private speech. The anti-free speech movement often uses what I call it the “evil pizza” argument. The evil pizza argument goes like this. A terrorist walks into Dominoes pizza and orders a pizza. You then blame evil Dominoes pizza for nourishing a terrorist. Then the terrorist gets in his Toyota and drives off, so you blame evil Toyota for providing transport to a terrorist, Then the terrorist sends an instant message on his iPhone, and you blame Telegram and iPhone, and so forth.

When you have built the infrastructure for hundreds of millions of users, you get a broad cross section of society, which will include some bad apples. That does not make Dominoes Pizza bad, or Toyota bad, or Telegram bad. It just makes the “evil pizza” argument a bad argument.

[Now there is a fourth battle with France in 2024.]

Why Care?

In conclusion, why should we even care about privacy? In Orwell’s 1984, the main character Winston learned to love Big Brother. Why can’t we?

  • We care about medical privacy because of the stigma in society attached to our HIV/AIDS status and to our diagnosed psychological disorders.
  • We care about financial privacy, to avoid being doxxed and to avoid having our bank account drained by identity thieves.
  • We care about sexual privacy to keep the government out of our bedroom.
  • We care about privacy on social media, so that we can joke and laugh and play and let down our guard without worrying that a potential future employer will judge us for what we say in jest.

The mathematics of privacy helps to make this all possible.

Source: University of Pittsburgh, Dietrich School of Arts and Sciences, public lecture series “Science Revealed.” Panel discussion on “Safety in Numbers? The Use (and Misuse) of Data in Society?” April 15, 2021.

[Postscript: August 31, 2024. It is remarkable how the Washington Post has reversed its opinion of Pavel Durov since 2020. It might appear that free speech is only good when it is bad for the Kremlin.]

tchales
http://jiggerwit.wordpress.com/?p=1835
Extensions
The Empirical World Interpretation of Quantum Mechanics
Uncategorizedphysicsquantum mechanics
I have had a deterministic interpretation of quantum mechanics in mind for decades, and I no longer recall the date of origin. My interest in the interpretation of quantum mechanics began as an undergraduate during a course on the philosophy of physics that I took from Nancy Cartwright around 1981-1982. All seeming non-determinism is pushed … Continue reading The Empirical World Interpretation of Quantum Mechanics
Show full content

I have had a deterministic interpretation of quantum mechanics in mind for decades, and I no longer recall the date of origin. My interest in the interpretation of quantum mechanics began as an undergraduate during a course on the philosophy of physics that I took from Nancy Cartwright around 1981-1982.

All seeming non-determinism is pushed back to the initial conditions. No measurements as projection operators take place, but measurements act as boundary conditions that place empirical constraints on the initial condition.

The interpretation is quite elementary, and I would be surprised if it hasn’t been proposed and rediscovered many times in the past. The interpretation proposed here avoids the anti-empirical shortcomings of the many worlds interpretation of quantum mechanics.

headDownload
tchales
http://jiggerwit.wordpress.com/?p=1829
Extensions
Packings of Smoothed Polygons
geometryUncategorizedgeometry of numbersmathmathematicspackingssciencewriting
Thomas Hales Koundinya Vajjha and I have posted a new book on the arXiv about smoothed polygons. We use optimal control theory to prove that the most unpackable centrally symmetric convex disk in the plane is a smoothed polygon. A smoothed polygon is a polygon whose corners have been rounded in a special way by … Continue reading Packings of Smoothed Polygons
Show full content
Thomas Hales

Karl Reinhardt

Koundinya Vajjha and I have posted a new book on the arXiv about smoothed polygons.

We use optimal control theory to prove that the most unpackable centrally symmetric convex disk in the plane is a smoothed polygon. A smoothed polygon is a polygon whose corners have been rounded in a special way by arcs of hyperbolas. To be highly unpackable means that even densest packing of that disk has low density. A subset K of Euclidean space is centrally symmetric (centered at the origin) if -K = K; that is, K is equal to its own reflection through the origin (such as the pictured hexagon).

Motivated by Minkowski’s geometry of numbers, researchers had already begun to search for the most unpackable centrally symmetric convex disk (in brief, the most unpackable disk) by the early 1920s. In 1934, Karl Reinhardt (1895-1941) conjectured that the most unpackable disk is a smoothed octagon. Working independently of Reinhardt, but also motivated by the geometry of numbers, Kurt Mahler (1903-1988) attempted without success in 1946 to prove that the most unpackable disk must be a smoothed polygon. Our book proves what Mahler set out to prove: Mahler’s First conjecture on smoothed polygons. His second conjecture is identical to the Reinhardt conjecture, which remains open.

Minkowski’s Theorem

Hermann Minkowski (1864-1909) is remembered today in physics for introducing the idea of a four dimensional spacetime, which gave a geometric interpretation of Einstein’s theory of special relativity. He is remembered in mathematics for the hyperplane separation theorem, for the Minkowski sum (an addition of subsets of Euclidean space), and for the Brunn-Minkowski inequality, but he is primarily remembered for the geometry of numbers. All of these topics, relate to convexity, lattices, or quadratic forms in some way.

Minkowski

Minkowski

Minkowski’s celebrated theorem in the geometry of numbers states

Theorem (Minkowski). For every lattice L, every convex, centrally symmetric set (centered at zero) of volume greater than det(2L) (where det(L) is the covolume of the lattice) must contain a lattice point of L other than the origin.

Minkowski understood the fundamental significance of convexity and central symmetry in his theorem. Every subset of Euclidean space can be made convex by taking its convex hull. If K is any convex body, then there are two centrally symmetric bodies associated with it: the difference body DK = K+(-K), and the central symmetrization Δ(K)=DK/2. The central symmetrization is equal to the rescaling of the difference body by a factor of (1/2). If K is already centrally symmetric, central symmetrization leaves K unchanged, but the difference body DK scales K by a factor of two: DK=2K.

The figure shows that the following conditions are equivalent on a compact convex body K (blue triangle) and vector v (red dot):

  1. K is disjoint from K+v
  2. The central symmetrization Δ(K) (blue hexagon) is disjoint from Δ(K)+v.
  3. v is not in the difference body DK (yellow hexagon).

The equivalence of the first two properties shows that we obtain a packing of K with respect to a lattice L if and only if Δ(K) is a packing with respect to the same lattice. Because of the equivalence of 1 and 2, following Minkowski, we will restrict our attention to centrally symmetric K (centered at the origin).

The equivalence of the second and third properties tells us that the statement and proof of Minkowski’s theorem can be reformulated as a statement about packings. From that perspective of packings, we can prove Minkowski’s theorem as follows.

Proof of Minkowski’s theorem: We prove the contrapositive, starting from the assumption that the centrally symmetric convex set K (centered at 0) does not contain a nonzero element of L. Since K=D(K/2), by the equivalence of (3) and (2), this means that translates of K/2 by the lattice form a packing. The density of a packing is never greater than 1. This gives vol(K/2)/det(L) ≤ 1, which is equivalent to the inequality vol(K)≤ det(2L) to be proved. QED.

From the proof, we see that Minkowski’s theorem is really a theorem about bounds on the density of packings in two dimensions.

Building on Minkowski’s theorem, packings became a research interest in Göttingen in the early twentieth century. “Minkowski and his close friend David Hilbert (1862-1943) had a strong interest in geometry, which at roughly this same time had become associated with the University of Göttingen and especially Felix Klein (1849-1925), and which Klein was at some pains to distinguish from the approach of mathematicians like Karl Weierstrass, associated more with the University of Berlin, which was more focused on analysis,…” (Historian, Richard Jensen)

Reinhardt Conjecture

To improve Minkowski’s theorem, it was natural to ask how much the constant of Minkowski’s theorem might be improved to a smaller constant d(K) ≤ 1 for each specific centrally symmetric convex body, so that if K has area greater than d(K) det(2L) then K must contain a lattice point other than the origin. There is no improvement if K is a parallelogram or centrally symmetric hexagon, because these polygons tile the plane. What choice of K gives the constant improving Minkowski’s theorem by the most? This problem is still unsolved in general, but our result gives the best known result in this direction: We prove that in dimension 2, the constant d(K) is minimal for some smoothed polygon K.

By the early 1920s, Courant (1888-1972) (who had been Hilbert’s assistant in Göttingen) conjectured that the minimal Minkowski constant in two dimensions is obtained by a circular disk K. However, in 1934, Reinhardt (who had also been Hilbert’s assistant in Göttingen) showed that the smoothed octagon is an improvement over the circle. Stated in terms of lattice packings, Reinhardt’s conjecture asserts that the smoothed octagon is the centrally symmetric convex K with the property that its densest lattice packing is the least.

The problem is an example of a minimax optimization problem: a double optimization that first maximizes over density for each given disk K, then minimizes that density with respect to K. Minimax problems are common in game theory (notably von Neumann’s minimax theorem), where each player tries to minimize the other player’s best strategy. An example of a maximin problem (a double optimization in the other direction) is to find the diameter of a space (or graph); the distance between two points (or vertices) is realized by the minimal path between the two points, and diameter is the maximum distance between points.

Reinhardt proved the existence of a most unpackable disk. For this, he used Blaschke’s selection theorem, which is also used to prove the existence of a solution to the isoperimetric inequality. Blaschke (1885-1962) is another mathematician who passed through Göttingen in the early 20th century, and it is through him that we know a bit of the early history of the problem.

Second, Reinhardt proved that the answer has no corners. He proved this by showing that if the disk has corners, it is possible to shave a bit away from the corners in a way that decreases its area without changing the optimal packing.

Third, he gave evidence that the optimal rounding of a corner might be an arc of a hyperbola. If too little is clipped, then it is possible to clip further. If too much is clipped from a corner, then the area of the smallest centrally symmetric hexagon decreases, which pushes the maximum packing density up rather than down. The right amount of clipping should occur between these two suboptimal extremes: when each point of the clipping curve supports a centrally symmetric hexagon of smallest area. This condition uniquely characterizes the arc of a hyperbola.

Why an octagon? The answer cannot be a square or a hexagon, because these shapes tile, which is not what we are looking for. Courant first conjectured the answer to be a circle. But then Reinhardt found that an octagon beats a circle. If we increase the number of sides in a regular polygon even more, the shape becomes more circle-like, which is non-optimal.

We can also assume that the answer is not a long skinny shape. The problem is invariant under affine transformations, and an affine transformation can be used to transform the optimal shape so that the boundary meets the six vertices of a regular hexagon. Any convex disk whose boundary passes through the six vertices of a regular hexagon must be rather round.

Fejes Tóth’s Theorem

The first proof of the circle packing problem (that the hexagonal arrangement is best) was published by Axel Thue (1863-1922) in 1890, but his proof is not widely accepted. Rigorous proofs were given by Claude Ambrose Rogers (1920-2005) and Fejes Tóth (1915-2005) in the 1950s. Fejes Tóth’s book (finally available in English after 70 years) contains more than one proof of the circle packing problem. (My article “Cannonballs and Honeycombs” gives a particularly simple proof.) Today we can view the circle packing problem as a consequence of a more general theorem by Fejes Tóth, who gave a three-step procedure for finding the best packing of any centrally symmetric disk.

  • Place the disk inside a centrally symmetric hexagon.
  • Tile the hexagons to pack the disk K.
  • The smallest area centrally symmetric hexagon will always give the densest packing of K in the plane.

Corollary (Thue’s theorem). The best packing of circles is the regular honeycomb arrangement. Proof. The smallest area hexagon containing a circle is the regular hexagon. The tiling by regular hexagons is the honeycomb.

Because of Fejes Tóth’s result, the Reinhardt problem is now recognized as dealing with all packings rather than just lattice packings.

Optimal Control Theory

Calculus of Variations and Optimal Control theory are both used to solve optimization problems, where the answer is not a number but a curve. Since we seek the optimal shape of the boundary of K, it is natural to turn to these theories for an answer. In our situation, it is conjectured that the convexity constraints have major importance (because of the straight edges in the smoothed octagons). In constrained problems such as this, optimal control theory is preferred over calculus of variations. Optimal control leads us to treat the boundary of K as a trajectory in a dynamical system.

Why was this theorem hard to prove? First of all, the optimization problem has an infinite number of critical points: a sequence of smoothed polygons converging to the circle. This means that the optimization problem cannot be convex. Second of all, we prove that there exist smoothed polygonal curves with infinitely many sides that are extremal for the Reinhardt optimization problem. The lengths of the sides shrink to zero in a geometric progression. In optimal control theory, this kind of infinite geometric behavior is known as chattering. Proving the existence of chattering solutions (and then ruling them out) involved the careful study of the unstable manifold at a fixed point of a dynamical system.

Our book explores the many remarkable structures of this packing problem, formulated as a problem in optimal control theory on a Lie group, with connections to hyperbolic geometry and Hamiltonian mechanics. Bang-bang Pontryagin extremals to the optimal control problem are smoothed polygons. Extreme difficulties arise in the proof because of chattering behavior in the optimal control problem, corresponding to possible smoothed polygons with infinitely many sides that need to be ruled out. To analyze and eliminate the possibility of chattering solutions, the book introduces a discrete dynamical system (the Poincaré first recurrence map) and gives a full description of its fixed points, stable and unstable manifolds, and basin of attraction on a blowup centered at a singular set. Some proofs in this book are computer-assisted using the Mathematica computer algebra system.

Link to our book

tchales
Minkowski
http://jiggerwit.wordpress.com/?p=1777
Extensions