Secret Blogging Seminar

Congress proposes cutting of all funding to US academics who mentor Chinese students

David Speyer Dec 1, 2025 Updated Dec 1, 2025

I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: The Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of […]

Show full content

This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and is one fifth of the earth.

This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate armed services committee, who are in charge of reconciling the House and Senate versions of the bill.

http://sbseminar.wordpress.com/?p=6631

Extensions

Moonshine over the integers

Scott Carnahan Aug 27, 2024 Updated Aug 27, 2024

Show full content

I’d been meaning to write a plug for my paper A self-dual integral form for the moonshine module on this blog for almost 7 years, but never got around to it until now. It turns out that sometimes, if you wait long enough, someone else will do your work for you. In this case, I recently noticed that Lieven Le Bruyn wrote up a nice summary of the result in 2021. I thought I’d add a little history of my own interaction with the problem.

I first ran into this question when reading Borcherds and Ryba’s 1996 paper Modular Moonshine II during grad school around 2003 or 2004. Their paper gives a proof of Ryba’s modular moonshine conjecture for “small odd primes”, and it has an interesting partial treatment of the problem of finding a self-dual integral form of the monster vertex algebra with monster symmetry. More explicitly, the authors wanted the following data:

An abelian group $V_\mathbb{Z}$ graded by non-negative integers, with finitely generated free pieces in each degree.
A multiplication structure $V_\mathbb{Z} \otimes V_\mathbb{Z} \to V_\mathbb{Z} ((z))$ satisfying the vertex algebra axioms over the integers.
A faithful monster action by vertex algebra automorphisms.
An integer-valued inner product that is self-dual (i.e., it gives a unimodular lattice for each graded piece), monster-invariant, and invariant in a vertex-algebraic sense.
A vertex algebra isomorphism $V_\mathbb{Z} \otimes \mathbb{C} \to V^\natural$ from the complexification to the usual monster vertex algebra. This is the “integral form” property.

These properties would allow for the following developments:

The monster action would let them consider the action of centralizers on fixed-point subalgebras.
The self-dual form gives an isomorphism between degree 0 Tate cohomology for a prime order element and the quotient of fixed points by the radical of the induced bilinear form. This connects the Tate cohomology formulation in Borcherds-Ryba to the original formulation of Modular moonshine given by Ryba.
The integral form property lets them connect mod p data to the traces of elements on $V^\natural$ , which are known from monstrous moonshine.

Unfortunately, the technology for constructing such an object was missing at the time, so a large fraction of their paper was spent making some partial progress on this problem and working around the parts they couldn’t finish. As it happens, the first progress toward an integral form was earlier, in the 1988 book by Frenkel, Lepowsky, and Meurman where they constructed $V^\natural$ . After the initial construction, near the end of the book, they exhibited a monster-symmetric form over the rationals. Borcherds and Ryba showed that this construction could be refined to work over $\mathbb{Z}[1/2]$ , and they gave some tantalizing hints for refining this to an integral form. In particular, they pointed out that we can make a self-dual integral form from self-dual forms over $\mathbb{Z}[1/2]$ and $\mathbb{Z}[1/3]$ , if they are isomorphic over $\mathbb{Z}[1/6]$ . In algebraic geometry language, this is “descent by a Zariski cover”.

Unfortunately, it seems to be quite difficult to construct a self-dual integral form over $\mathbb{Z}[1/3]$ . The construction of $V^\natural$ by Frenkel, Lepowsky, and Meurman starts with the Leech lattice vertex algebra (which has an “easily constructed” self-dual integral form), and applies eigenspace decompositions for involutions in an essential way. In general, if you do a construction using eigenspace decomposition for a finite-order automorphism of a lattice, then you destroy self-duality over any ring where that order is not invertible. Recovering a self-dual object tends to require a lot of work by hand (e.g., adding a specific collection of cosets), which is impractical in an infinite dimensional structure.

Instead of the order 2 orbifold construction of Frenkel, Lepowsky, and Meurman, one can try an order 3 orbifold construction. Given such a construction, one can hope that it can be done over $\mathbb{Z}[1/3, \zeta_3]$ (now we know this is possible), and Borcherds and Ryba suggested a strategy for refining this to $\mathbb{Z}[1/3]$ (I still don’t know how to make their method work). Dong and Mason had tried to do an explicit order 3 orbifold construction in 1994, but after a massive calculation, they had to give up. The order 3 construction was eventually done in 2016 by Chen, Lam, and Shimakura using some newer technology in the form of pure existence theorems (in particular, using the regularity of fixed-point vertex subalgebras I proved with Miyamoto, and Huang’s modular tensor theorem). However, it was not clear how to do this construction over smaller rings.

Anyway, I had talked about the problem of constructing a self-dual integral form with Borcherds during grad school after reading his modular moonshine papers, and he mentioned that he had considered giving it to a grad student to figure out, but that it seemed “way too hard for a Ph.D. thesis”. After that, I just kept the problem in the fridge. Every so often, some new advance would come, and I would think about whether it would help with this question, and the answer would be “no”. Even after the existence of cyclic orbifolds over the complex numbers was established (I blogged about it here), the question of defining them over small rings in a way that ensured self-duality and monster symmetry was a seemingly impenetrable challenge.

The event that changed my outlook was a conversation with Toshiyuki Abe at a conference in Osaka in 2016. He kindly explained a paper he was writing with C. Lam and H. Yamada, and in particular, a way to produce $V^\natural$ “inside” an order 2p orbifold of the Leech lattice vertex algebra. Basically, you can take two copies of the Leech lattice vertex algebra, related by order 2p cyclic orbifold duality, and use them to generate a larger structure that contains $V^\natural$ . This was the advance I needed, because (after an easy generalization from order 2p to order pq) it let me produce self-dual forms of $V^\natural$ over small rings like $\mathbb{Z}[1/pq, \zeta_{pq}]$ without doing any explicit work.

After this, the pieces slowly fell into place. Once I had self-dual forms over enough small rings, I could try to glue them together to get a form over the integers. Using some results on maximal subgroups of the monster, I was able to show that the results of gluing are unique up to isomorphism and carry monster symmetry. However, I found out that the fundamentals of gluing are tricky if you’re not so good at commutative algebra. Perhaps there is a lesson here about the advantages of finding good collaborators.

Gluing problems

Suppose you have a diagram of commutative rings $R_l \leftarrow R \to R_r$ , together with an $R_l$ -module $M_l$ and an $R_r$ -module $M_r$ .

Question: What data and properties do we need to have a uniquely defined $R$ -module $M$ such that $R_l \otimes_R M \cong M_l$ and $M \otimes_R R_r \cong M_r$ ?

One rather obvious necessary condition is that we need $M_l \otimes_R R_r \cong R_l \otimes_R M_r$ , since both sides would be $R_l \otimes_R M \otimes_R R_r$ with different choices of parentheses. However, this is not sufficient, unless the diagram $R_l \leftarrow R \to R_r$ satisfies some additional properties.

If we consider this from the point of view of algebraic geometry, we have a diagram of schemes $\text{Spec} R_l \to \text{Spec} R \leftarrow \text{Spec} R_r$ and quasicoherent sheaves on the sources of the arrows. We would like to have a quasicoherent sheaf on $\text{Spec} R$ that pulls back to the sheaves we had. Clearly, if the scheme maps are not jointly surjective, then the sheaf on $\text{Spec} R$ will not be uniquely determined, since any point not in the image can be the support of a skyscraper sheaf.

We come to our first sufficient condition: If we have a Zariski cover, namely the two maps are open immersions that are jointly surjective, then a choice of isomorphism $M_l \otimes_R R_r \cong R_l \otimes_R M_r$ yields an $R$ -module $M$ together with isomorphisms $R_l \otimes_R M \cong M_l$ and $M \otimes_R R_r \cong M_r$ , and these data are unique up to unique isomorphism.

The problem in my situation was that I needed to glue modules using some maps that were not open immersions. When I wrote the first version of my paper, I was under the mistaken impression that I could glue sheaves on étale covers the same way we glue sheaves on Zariski covers (i.e., that we don’t need to consider fiber products of open sets with themselves), and this led to some strange conclusions. In particular, I thought I had constructed 4 possibly distinct integral forms.

After a referee asked for a reference for my claim, I realized that it was false! Here is a counterexample: Take a scheme with two connected components $X = U \cup V$ , and define a 2-element étale cover given by arbitrary surjective étale maps to each component: $\{ U_1 \to U, U_2 \to V \}$ . The gluing data gives no information, since the intersection is empty, so we can’t in general descend a sheaf along the surjective étale maps.

I eventually found a different sufficient condition: If both maps in the diagram $R_l \leftarrow R \to R_r$ are faithfully flat, then then a choice of isomorphism $M_l \otimes_R R_r \cong R_l \otimes_R M_r$ yields an $R$ -module $M$ together with isomorphisms $R_l \otimes_R M \cong M_l$ and $M \otimes_R R_r \cong M_r$ , and these data are unique up to unique isomorphism. The next problem was writing a solid proof of this new claim, and this required several more iterations with a referee because I wasn’t very careful.

Anyway, I am very grateful for the persistence and careful reading of referee 2, who prevented me from releasing a sloppy piece of work.

About the journal

I had thought about submitting my paper to a top-ranked journal, but my friend John Duncan asked me to submit it to a special SIGMA issue on Moonshine that he was editing. SIGMA is a “diamond open-access” ArXiv overly journal, and this suited my ideological leanings. Also, I had recently gotten tenure, so putting things in high-ranked journals suddenly seemed less important.

http://sbseminar.wordpress.com/?p=6611

Using discord for online teaching

David Speyer Mar 13, 2020 Updated Mar 13, 2020

On Wednesday, I asked several of my students what tools they use to collaborate online on their problem sets. Several of them mentioned Discord. I am currently trying to set up a Discord channel for my class. I imagine I am not the only one in this situation, so I am writing up my progress […]

Show full content

If you have questions to discuss, let’s do that in the comment thread here. And please promote this on twitter and wherever else math teacher’s gather!

I did a dry run with 6 of my students this afternoon. We spent 10 minutes introducing each other to the system, broke into two groups of 3 and spent 10 minutes solving a math problem (problem 19.2 here) and 30 minutes debriefing. Most students found it awkward but workable. Here are things I/we saw as good:

Connection was reliable; much less dropping out than the video conferencing tools they report their other courses are using.
The presence of the chat stream with integrated graphics and LaTeX encouraged people to write things down, just like we encourage students to write on blackboards when doing group work in class. This made it easier for me to jump in and out of conversations.
It was pretty easy for me to jump back and forth from one group to the other. (Not sure how I would do with 4 or more groups, though.)
We really did solve a nontrivial problem and have a nontrivial conversation about it!

Things we didn’t like

Both groups needed, at one point, to draw an equation or commutative diagram like this one:
$\mathbb{Q}(\cos \tfrac{2 \pi}{7} ) \cong \mathbb{Q}[x]/f(x) \mathbb{Q}[x] \cong \mathbb{Q}(\cos \tfrac{4 \pi}{7}).$
One group did the LaTeX, the other draw a commutative diagram in MS Paint and dropped it into the thread. Both expressed frustration about how much slower this was than drawing a picture in person.
Some students felt that it was awkward not seeing the people they were talking to.

http://sbseminar.wordpress.com/?p=6605

Extensions

Read Izabella Laba on diversity statements

David Speyer Dec 2, 2019 Updated Dec 2, 2019

Show full content

There has been a dispute running through mathematical twitter about diversity statements in academic hiring. Prompted by that, Izabella Laba has just written an excellent post, which affirms the importance of diversity as a goal, but lays out the many tricky issues with diversity statements. It makes a lot of points I would like to make, and raises others that I hadn’t thought of but should have. In case there is someone reading this blog but not reading Professor Laba’s, go check it out.

Sadly, this blog is still dead. I have a list of things I would like to write on it, but I have no idea when or if I will find the time.

http://sbseminar.wordpress.com/?p=6594

Why single variable analysis is easier

David Speyer Feb 22, 2018 Updated Feb 22, 2018

Jadagul writes: Got a draft of the course schedule for next year. Looks like I might get to teach real analysis. I probably need someone to talk me out of trying to do everything in R^n. A subsequent update indicates that the more standard alternative is teaching one variable analysis. This is my second go […]

Show full content

Jadagul writes:

Got a draft of the course schedule for next year. Looks like I might get to teach real analysis.

I probably need someone to talk me out of trying to do everything in R^n.

A subsequent update indicates that the more standard alternative is teaching one variable analysis.

This is my second go around teaching rigorous multivariable analysis — key points are the multivariate chain rule, the inverse and implicit function theorems, Fubini’s theorem, the multivariate change of variables formula, the definition of manifolds, differential forms, Stokes’ theorem, the degree of a differentiable map and some preview of de Rham cohomology. I wouldn’t say I’m doing a great job, but at least I know why it’s hard to do. I haven’t taught single variable, but I have read over the day-to-day syllabus and problem sets of our most experienced instructor.

Here is the conceptual difference: It is quite doable to start with the axioms of an ordered field and build rigorously to the Fundamental Theorem of Calculus. Doing this gives students a real appreciation for the nontrivial power of mathematical reasoning. I don’t want to say that it is actually impossible to do the same for Stokes’ theorem (according to rumor, Brian Conrad did it), but I never manage — there comes a point where I start waving my hands and saying “Oh yes, and throw in a partition of unity” or “Yes, there is an inverse function theorem for maps between $n$ -folds just like the one for maps between open subsets of $\mathbb{R}^n$ .” I think most students probably benefit from seeing things done carefully for a term first.

Below the fold, a list of specific topics much harder in more than one variable. If you have found ways not to make them harder, please chime in in the comments!

• No need for linear algebra. Just defining the multivariate derivative uses the concept of a linear map, and stating the chain rule requires you to compose them. If you want your students to ever be able to check the hypotheses of the inverse function theorem, they have to be able to check if matrices are invertible.

• One variable Riemann sums are so nice! If $u : [a,b] \to [c,d]$ is an increasing bijection, and $a=x_0 < x_1 < \cdots < x_N = b$ is a partition of $[a,b]$ , then $c=u(x_0) < u(x_1) < \cdots < u(x_N) =d$ is a partition of $[c,d]$ ; the $u$ -substitution formula for integrals follows immediately. If $u : [a_1, b_1] \times [a_2, b_2] \to [c_1, d_1] \times [c_2, d_2]$ is a smooth bijection, and we have a partition of $[a_1, b_1] \times [a_2, b_2]$ into rectangles, its image in $[c_1, d_1] \times [c_2, d_2]$ is quite hard to describe. This is why the change of variables formula is such a pain.

• Regions of integration: In one variable we always integrate over an interval. In many variables, we integrate over complicated regions, so we need to describe the geometry of complicated regions. If you want to give a region up into simpler pieces, you need to introduce some rudimentary notion of "measure zero", to make sure the boundaries you cut along aren't too large.

• Improper integrals: In one variable, we always take limits as the bounds of the integral go somewhere. In many variables, there are uncountably many different limiting processes which could define $\int_{\mathbb{R}^2} f(x,y)$ .

And that's not even getting into manifolds, or multilinear algebra…

http://sbseminar.wordpress.com/?p=6584

Extensions

A grad student fellowship

Noah Snyder Dec 18, 2017 Updated Dec 18, 2017

My CAREER grant includes funding for a thesis-writing fellowship for graduate students who have done extraordinary teaching and outreach during their time as a grad student. If you know any such grad students who are planning on graduating in the 2018-2019 school year please encourage them to apply. Deadline is Jan 31 and details here. […]

Show full content

While I’m shamelessly plugging stuff for early career people, Dave Penneys, Julia Plavnik, and I are running an MRC this summer in Quantum Symmetry. It’s aimed at people at -2 to +5 years from Ph.D. working in tensor categories, subfactors, topological phases of matter, and related fields, and the deadline is Feb 15.

http://sbseminar.wordpress.com/?p=6582

Fighting the grad student tax

David Speyer Dec 4, 2017 Updated Dec 14, 2017

I’m throwing this post up quickly, because time is of the essence. I had hoped someone else would do the work. If they did, please link them in the comments. As many of you know, the US House and Senate have passed revisions to the tax code. According to the House, but not the Senate […]

Show full content

I’m throwing this post up quickly, because time is of the essence. I had hoped someone else would do the work. If they did, please link them in the comments.

As many of you know, the US House and Senate have passed revisions to the tax code. According to the House, but not the Senate draft, graduate tuition remissions are taxed as income. Thus, here at U Michigan, our graduate stipend is 19K and our tuition is 12K. If the House version takes effect, our students would be billed as if receiving 31K without getting a penny more to pay it with.

It is thus crucial which version of the bill goes forward. The first meeting of the reconciliation committee is TONIGHT, at 6 PM Eastern Time. Please contact your congress people. You can look up their contact information here. Even if they are clearly on the right side of this issue, they need to be able to report how many calls they have gotten about it when arguing with their colleagues. Remember — be polite, make it clear what you are asking for, and make it clear that you live and vote in their district. If you work at a large public university in their district, you may want to point out the effect this will have on that university.

I’ll try to look up information about congress people who are specifically on the committee or otherwise particularly vulnerable. Jordan Ellenberg wrote a “friends only” facebook post relevant to this, which I encourage him to repost publicly, on his blog or in the comments here.

UPDATE According to the Wall Street Journal, the grad tax is out. Ordinarily, I thank my congress people when they’ve been on the right side of an issue and won. (Congress people are human, and appreciate thanks too!) In this case, I believe the negotiations happened largely in secret, so I’m not sure who deserves thanks. If anyone knows, feel free to post below.

http://sbseminar.wordpress.com/?p=6578

Extensions

… and Elsevier taketh away.

Scott Morrison Apr 10, 2017 Updated Apr 10, 2017

Readers may recall that during the 2013 “peak-Elsevier” period, Elsevier made an interesting concession to the mathematical community — they released all their old mathematical content (“old” here means a rolling 4-year embargo) under a fairly permissive licence. Unfortunately, sometime in the intervening period they have quietly withdrawn some of the rights they gave to […]

Show full content

Unfortunately, sometime in the intervening period they have quietly withdrawn some of the rights they gave to that content. In particular, they no longer give the right to redistribute on non-commercial terms. Of course, the 2013 licence is no longer available on their website, but thankfully David Roberts saved a copy at https://plus.google.com/u/0/+DavidRoberts/posts/asYgXTq9Y2r. The critical sentence there is

“Users may access, download, copy, display, redistribute, adapt, translate, text mine and data mine the articles provided that: …”

The new licence, at https://www.elsevier.com/about/our-business/policies/open-access-licenses/elsevier-user-license now reads

“Users may access, download, copy, translate, text and data mine (but may not redistribute, display or adapt) the articles for non-commercial purposes provided that users: …”

I think this is pretty upsetting. The big publishers hold the copyright on our collective cultural heritage, and they can deny us access to the mathematical literature at a whim. The promise that we could redistribute on a non-commercial basis was a guarantee that we could preserve the literature. If this is to be taken away, I hope that mathematicians will go to war again.

Hopefully Elsevier will soon come out with a “oops, this was a mistake, those lawyers, you know?” but this will only happen if we get on their case.

What to do:

Elsevier journal editors: please contact your Elsevier representations, and ask that the licence for the open archives be restored to what it was, to assure the mathematical community that we have ongoing access to the old literature.
Elsevier referees and authors: please contact your journal editors, to ask them to contact Elsevier. If you are currently refereeing or submitting, please bring up this issue directly.
Everyone: contact Elsevier, either by email or social media (twitter facebook google+).
Happily, as we have a copy of the 2013 licence, all the Elsevier open mathematics archive up to 2009 is still available for non-commercial redistribution under their terms. You can find these at https://tqft.net/misc/elsevier-oa/.

http://sbseminar.wordpress.com/?p=6543

Extensions

Bounds for sum free sets in prime power cyclic groups — three ways

David Speyer Jul 8, 2016 Updated Jul 8, 2016

A few weeks ago, I e-mailed Will Sawin excitedly to tell him that I could extend the new bounds on three-term arithmetic progression free subsets of to . Will politely told me that I was the third team to get there — he and Eric Naslund already had the result, as did Fedor Petrov. But […]

Show full content

A few weeks ago, I e-mailed Will Sawin excitedly to tell him that I could extend the new bounds on three-term arithmetic progression free subsets of $(\mathbb{Z}/p \mathbb{Z})^n$ to $(\mathbb{Z}/p^r \mathbb{Z})^n$ . Will politely told me that I was the third team to get there — he and Eric Naslund already had the result, as did Fedor Petrov. But I think there might be some expository benefit in writing up the three arguments, to see how they are all really the same trick underneath.

Here is the result we are proving: Let $q=p^r$ be a prime power and let $C_q$ be the cyclic group of order $q$ . Let $A \subset C_q^n$ be a set which does not contain any three term arithmetic progression, except for the trivial progressions $(a,a,a)$ . Then

$\displaystyle{|A| \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|.}$

The exciting thing about this bound is that it is exponentially better than the obvious bound of $q^n$ . Until recently, all people could prove was bounds like $q^n/n^c$ , and this is still the case if $q$ is not a prime power.

All of our bounds extend to the colored version: Let $(\bar{a}_i, \bar{b}_i, \bar{c}_i)$ be a list of $N$ triples in $((C_q)^n)^3$ such that $\bar{a}_i-2 \bar{b}_i+\bar{c}_i=0$ , but $\bar{a}_i-2\bar{b}_j+\bar{c}_k \neq 0$ if $(i,j,k)$ are not all equal. Then the same bound applies to $N$ . To see that this is a special case of the previous problem, take $\bar{a}_i = \bar{b}_i = \bar{c}_i$ . Once the problem is cast this way, if $q$ is odd, one might as well define $(a_i, b_i, c_i) = (\bar{a}_i, -2 \bar{b}_i, \bar{c}_i)$ , so our hypotheses are that $a_i+b_i+c_i = 0$ but $a_i+b_j+c_k \neq 0$ if $(i,j,k)$ are not all equal. We will prove our bounds in this setting.

My argument — Witt vectors

I must admit, this is the least slick of the three arguments. The reader who wants to cut to the slick versions may want to scroll down to the other sections.

We will put an abelian group structure $\oplus$ on the set $\mathbb{F}_p^r$ which is isomorphic to $C_q$ , using formulas found by Witt. I give an example first: Define an addition on $\mathbb{F}_3^2$ by

$\displaystyle{(x_0,x_1) \oplus (y_0, y_1) = (x_0+y_0, x_1+y_1 - x_0^2 y_0 - x_0 y_0^2)}$

The reader may enjoy verifying that this is an associative addition, and makes $\mathbb{F}_3^2$ into a group isomorphic to $C_9$ . For example, $(1,0) \oplus (1,0) \oplus (1,0) = (2,1) \oplus (1,0) = (0,1)$ and $(0,1) \oplus (0,1) \oplus (0,1) = (0,0)$ .

In general, Witt found formulas

$\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) = (S_0(x,y), S_1(x,y), \ldots, S_{r-1}(x,y))}$

such that $\mathbb{F}_p^r$ becomes an abelian group isomorphic to $C_q$ . If we define $x_e$ and $y_e$ to have degree $p^e$ , then $S_e$ is homogenous of degree $p^e$ . (Of course, Witt did much more: See Wikipedia or Rabinoff.)

Write

$\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) = (S_0(x,y,z), S_1(x,y,z), \ldots, S_{r-1}(x,y,z))}$ .

and set

$\displaystyle{\bar{F}(x,y,z) = \prod_{i=0}^{r-1} \left( 1-S_i(x,y,z)^{p-1} \right)}$ .

For example, when $q=9$ , we have

$\displaystyle{\bar{F} = \left( 1-(x_0+y_0+z_0)^2 \right) \left( 1-(x_1+y_1+z_1 - x_0^2 y_0 - x_0 y_0^2 - x_0^2 z_0 - x_0 z_0^2 - y_0^2 z_0 - y_0 z_0^2 + x_0 y_0 z_0)^2 \right)}$ .

So $\bar{F}(x,y,z)=0$ if and only if $(x_0, \ldots, x_{r-1}) \oplus (y_0, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) \neq 0$ in $C_q$ .

We now work with $3kn$ variables, $x_e^f$ , $y_e^f$ and $z_e^f$ , where $0 \leq e \leq k-1$ and $1 \leq f \leq n$ . Consider the polynomial

$\displaystyle{ \bar{G} = \prod_{f=1}^n \bar{F}(x^f, y^f, z^f) }$ .

Here each $\bar{F}$ is a polynomial in $3k$ variables.

So $\bar{G}$ is a polynomial on $(\mathbb{F}_p^k)^{3n}$ . We identify this domain with $C_q^{3n}$ . Then $\bar{G}(x,y,z)=0$ if and only if $x+y+z \neq 0$ in the group $C_q^n$ .

We define the degree of a monomial in the $x_e^f$ , $y_e^f$ and $z_e^f$ by setting $\deg(x_e^f)=\deg(y_e^f)=\deg(z_e^f)=p^e$ . In this section, “degree” always has this meaning, not the standard one. The degree of $S_e$ is $p^e$ ; the degree of $\bar{F}$ is $(p-1) + p(p-1) + p^2(p-1) + \cdots + p^{r-1}(p-1) = p^r-1 = q-1$ and the degree of $\bar{G}$ is $(q-1)n$ .

From each monomial in $\bar{G}$ , extract whichever of $\prod (x_e^f)^{i_e^f}$ , $\prod (y_e^f)^{j_e^f}$ or $\prod (z_e^f)^{k_e^f}$ has lowest degree. We see that we can write

$\displaystyle{ \bar{G}(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}$

where $A_s$ , $B_s$ and $C_s$ are monomials of degree $\leq (q-1)n/3$ .

The now-standard argument (I like Terry Tao’s exposition) shows that $N$ is bounded by three times the number of monomials $\prod (x_e^f)^{m_e^f}$ of degree $\leq (q-1)n/3$ . One needs to check that the argument works when the “degree” of a variable need not be $1$ , but this is straightforward.

Except we have a problem! There are too many monomials. To solve this issue, let $F$ be the polynomial obtained from $\bar{F}$ by replacing every monomial $u^{\bar{k}}$ by $u^k$ where $k \equiv \bar{k} \bmod p-1$ with $k=0$ if $\bar{k}=0$ and $1 \leq k \leq p-1$ if $\bar{k}>0$ . So $F$ coincides with $\bar{F}$ as a function on $\mathbb{F}_p^k$ , but $F$ uses smaller monomials. For example, the reader who multiplies out the expression for $\bar{F}$ when $q=9$ will find a term $2 x_0^6 y_0 z_0$ . In $F$ , this is replaced by $2 x_0^2 y_0 z_0$ . The polynomial $F$ does not have the nice factorization of $\bar{F}$ , but it is much smaller. For example, when $q=9$ , $\bar{F}$ has $137$ nonzero monomials and $F$ has $65$ . Replacing $\bar{F}$ by $F$ can only lower degree, so $\deg(F) \leq q-1$ . Now rerun the argument with $G(x,y,z) := \prod_{f=1}^n F(x^f,y^f,z^f)$ . Our new bound is three times the number of monomials $\prod (x_e^f)^{m_e^f}$ of degree $\leq (q-1)n/3$ , with the additional condition that all exponents $m_e^f$ are $\leq p-1$ .

Now, the monomial $\prod_e x_e^{m_e}$ has degree $\sum_e p^e m_e$ . Identify $[0,p-1]^r$ with $[0,q-1]$ by sending $(m_0, m_1, \ldots, m_{r-1})$ to $\sum m_e p^e$ . We can thus think of $[0,p-1]^{rn}$ as $[0,q-1]^n$ . We get the bound $N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|$ , just as in the prime case.

Naslund-Sawin — binomial coefficients

Let’s be much slicker. Here is how Naslund and Sawin do it (original here).

Notice that, by Lucas’s theorem, the function $x \mapsto \binom{x}{m}$ is a well defined function $C_q \to \mathbb{F}_p$ when $0 \leq m \leq q-1$ . Moreover, using Lucas again,

$\displaystyle{\binom{x-1}{q-1}=\begin{cases} 1 & x \equiv 0 \bmod q \\ 0 & \mbox{otherwise} \end{cases}}.$

Define a function $F: C_q^3 \to \mathbb{F}_p$ by

$\displaystyle{F(x,y,z) = \binom{x+y+z-1}{q-1} = \sum_{i+j+k = q-1} \binom{x-1}{i} \binom{y}{i} \binom{z}{k}}$

$\displaystyle{=\sum_{i+j+k \leq q-1} (-1)^{q-1-i-j-k} \binom{x}{i} \binom{y}{i} \binom{z}{k}}$ .

Here we have expanded by Vandermonde’s identity and used $\binom{x-1}{m} = \sum_{i \leq m} (-1)^{m-i} \binom{x}{i}$ .

Define a function $G: C_q^{3n} \to \mathbb{F}_p$ by $G(x,y,z) = \prod_{f=1}^n F(x^f,y^f,z^f)$ just as before. So $G(x,y,z)=0$ if and only if $x+y+z \neq 0$ in the abelian group $C_q^n$ . Expanding $G$ gives a sum of terms of the form $\prod_f \binom{x^f}{i^f} \binom{y^f}{j^f} \binom{z^f}{k^f}$ . Considering such a term to have “degree” $\sum_f (i^f+j^f+k^f)$ , we see that $G$ has degree $\leq (q-1)n$ .

As in the standard proof, factor out whichever of $\prod_f \binom{x^f}{i^f}$ , $\prod_f \binom{y^f}{j^f}$ or $\prod_f \binom{z^f}{k^f}$ , has least degree. We obtain

$\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}$

where $A_s$ , $B$ and $C$ are products of binomial coefficients and, taking $\deg \binom{w}{m}=m$ , we have $\deg A_s$ , $\deg B_s$ and $\deg C_s \leq (q-1)n/3$ .

We derive the bound $N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^f \leq n(q-1)/3 \}|$ , exactly as before.

Group rings — Petrov’s argument

I have taken the most liberties in rewriting this argument, to emphasize the similarity with the other arguments. The reader can see the original here.

Let $\Gamma = C_q^n$ . Let $\mathbb{F}_p^{\Gamma}$ be the ring of functions $\Gamma \to \mathbb{F}_p$ with pointwise operations, and let $\mathbb{F}_p[\Gamma]$ be the group ring of $\Gamma$ . We think of $\mathbb{F}_p[\Gamma]$ acting on $\mathbb{F}_p^{\Gamma}$ by $\left( \left( \sum a_{\sigma} \sigma \right) f \right)(w) = \sum a_{\sigma} f(w+\sigma)$ .

Let $\tau_1$ , $\tau_2$ , …, $\tau_n$ be generators for $\Gamma = C_q^n$ . Let $Y_{\leq d} \subset \mathbb{F}_p^{\Gamma}$ the functions annihilated by the operators $\prod_f (\tau_f-1)^{m_f}$ where $\sum m_f = d+1$ . For example, $Y_{\leq 1}$ is the functions $f: \Gamma \to \mathbb{F}_p$ which obey $f(w+\tau_i+\tau_j) - f(w+\tau_i)-f(w+\tau_j)+f(w)=0$ for any $w$ , $i$ and $j$ . We think of $Y_{\leq d}$ as polynomials of degree $\leq d$ , and the dimension of $Y_{\leq d}$ is the number of monomials in $n$ variables of total degree $\leq d$ where each variable has degree $\leq q-1$ .

Define $H: \Gamma \to \mathbb{F}_p$ by $H(0)=1$ and $H(w)=0$ otherwise. Define $G: \Gamma^3 \to \mathbb{F}_p$ by $G(x,y,z) = H(x+y+z)$ .

We write $\alpha_i$ , $\beta_i$ and $\gamma_i$ for the generators of the three factors in $\Gamma \times \Gamma \times \Gamma$ .
Then we have

$\displaystyle{ \left( \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G\right) (x,y,z) = }$

$\displaystyle{ \left( \prod (\tau_f-1)^{i_f+j_f+k_f} H\right) (x+y+z)}.$

So, if $i_f+j_f+k_f = q-1$ , then $\prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G=0$ as a function on $\Gamma^3$ .

On the other hand, we can expand $G(x,y,z) = \sum A_s(x) B_s(y) C_s(z)$ for $A_s$ , $B_s$ and $C_s$ in $\mathbb{F}_p^{\Gamma}$ . We see that, if $i_f+j_f+k_f = q-1$ , then

$\displaystyle{\sum_s \left( \prod (\alpha_f-1)^{i_f} A_s \right) (x) \left( \prod (\beta_f-1)^{j_f} B_s \right) (y) \left( \prod (\gamma_f-1)^{k_f} C \right) (z) =0}$ .

We make the familiar deduction: We can write $G$ in the form

$\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}$

where $A_s$ , $B_s$ and $C_s$ run over a basis for $Y_{\leq (q-1)n/3}$ .

Once more, we obtain the bound $N \leq 3 \dim Y_{(q-1)n/3}$ .

Petrov’s method has an advantage not seen in the other proofs: It generalizes well to the case that $\Gamma$ is non-abelian. For any finite group $\Gamma$ , let $I$ be a one-sided ideal in $\mathbb{F}_p[\Gamma]$ obeying $I^3=0$ . In our case, this is the ideal generated by $\prod_f (\tau_f-1)^{m_f}$ with $\sum m_f = (q-1)n/3+1$ . Then we obtain a bound $N \leq 3 \dim \mathbb{F}_p[\Gamma]/I$ for sum free sets in $\Gamma$ .

What’s going on?

I find Petrov’s proof immensely clarifying, because it explains why the arguments all give the same bound. We are all working with functions $C_q \to \mathbb{F}_p$ . I write them as polynomials in $r$ variables $x_e$ , Naslund and Sawin use binomial coefficients $\binom{x}{m}$ . The formulas to translate between our variables are a mess: For example, my $x_1$ is their $\tfrac{1}{p} (x-x^p) \bmod p$ . However, we both agree on what it means to be a polynomial of degree $\leq d$ : It means to be annihilated by $(\tau-1)^{d+1}$ .

In both cases, we take the indicator function of the identity and pull it back to $\Gamma^3$ along the addition map. The first two proofs use explicit identities to see that the result has degree $\leq (q-1)n$ . The third proof points out this is an abstract property of functions pulled back along addition of groups, and has nothing to do with how we write the functions as explicit formulas.

I sometimes think that mathematical progress consists of first finding a dozen proofs of a result, then realizing there is only one proof. My mental image is settling a wilderness — first there are many trails through the dark woods, but later there is an open field where we can run wherever we like. But can we get anywhere beyond the current bounds with this understanding? All I can say is not yet…

http://sbseminar.wordpress.com/?p=6165

Extensions

The growth rate for three-colored sum-free sets

David Speyer Jul 4, 2016 Updated Jul 4, 2016

There seems to be a rule that all progress on the cap set problem should be announced on blogs, so let me continue the tradition. Robert Kleinberg, Will Sawin and I have found the rate of growth of three-colored sum-free subsets of , as . We just don’t know it is that what we’ve found! […]

Show full content

There seems to be a rule that all progress on the cap set problem should be announced on blogs, so let me continue the tradition. Robert Kleinberg, Will Sawin and I have found the rate of growth of three-colored sum-free subsets of $(\mathbb{Z}/q \mathbb{Z})^n$ , as $n \to \infty$ . We just don’t know it is that what we’ve found!

The preprint is here.

Let me first explain the problem. Let $H$ be an abelian group. A subset $A$ of $H$ is said to be free of three term arithmetic progressions if there are no solutions to $a-2b+c=0$ with $a$ , $b$ , $c \in A$ , other than the trivial solutions $(a,a,a)$ . I’ll write $C_q$ for the cyclic group of order $q$ . Ellenberg and Giswijt, building on work by Croot, Lev and Pach have recently shown that such an $A$ in $C_3^n$ can have size at most $3 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in \{0,1,2 \}^n : \sum a_i \leq 2n/3 \}$ , which is $\approx 2.755^n$ . This was the first upper bound better than $3^n/n^c$ , and has set off a storm of activity on related questions.

Robert Kleinberg pointed out the argument extends just as well to bound colored-sets without arithmetic progressions. A colored set is a collection of triples $(a_i, b_i, c_i)$ in $H^3$ , and we see that it is free of arithmetic progressions if we have $a_i - 2 b_j + c_k = 0$ if and only if $i=j=k$ . So, if $a_i = b_i = c_i$ , then this is the same as a set free of three term arithmetic progressions, but the colored version allows us the freedom to set the three coordinates separately.

Moreover, once $a$ , $b$ and $c$ are treated separately, if $\# H$ is odd, we may as well replace $b_i$ by $-2b_i$ and just require that $a_i+b_i+c_i=0$ if and only if $i$ is odd. This is the three-colored sum-free set problem. Three-colored sum-free sets are easier to construct than three-term arithmetic-progression free sets, but the Croot-Lev-Pach/Ellenberg-Giswijt bounds apply to them as well*.

Our result is a matching of upper and lower bounds: There is a constant $\gamma(q)$ such that

(1) We can construct three-colored sum-free subsets of $C_q^n$ of size $\exp(\gamma n-o(n))$ and

(2) For $q$ a prime power, we can show that three-colored sum-free subsets of $C_q^n$ have size at most $\exp(\gamma n)$ .

So, what is $\gamma$ ? We suspect it is the same number as in Ellenberg-Giswijt, but we don’t know!

When $q$ is prime, Ellenberg and Giswijt establish the bound $3 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq (q-1)n/3 \}$ . Petrov, and independently Naslund and Sawin (in preparation), have extended this argument to prime powers.

In the set $\{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \}$ , almost all the $n$ -tuples have a particular mix of components. For example, when $q=3$ , almost all $n$ tuples have roughly $0.514n$ zeroes, $0.305 n$ ones and $0.181 n$ twos. The number of such $n$ tuples is roughly $\exp(n \eta(0.514, 0.305, 0.181))$ , where $\eta(p_0, p_1, \ldots, p_k)$ is the entropy $- \sum p_k \log p_k$ .

In general, let $(p_0, p_1, \ldots, p_{q-1})$ be the probability distribution on $\{ 0,1,\ldots, q-1 \}$ which maximizes entropy, subject to the constraint that the expected value $\sum j p_j$ is $(q-1)/3$ . Then almost all $n$ -tuples in $\{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \}$ have roughly $p_j n$ copies of $j$ , and the number of such $n$ -tuples is grows like $\exp(n \cdot \eta(p_0, \ldots, p_{q-1}))$ . I’ll call $(p_0, \ldots, p_{q-1})$ the EG-distribution.

So Robert and I set out to construct three-colored sum-free sets of size
$\exp(n \cdot \eta(p_0, \ldots, p_{q-1}))$ . What we were actually able to do was to construct such sets whenever there was an $S_3$ -symmetric probability distribution on $T:= \{ (a,b,c) \in \mathbb{Z}_{\geq 0} : a+b+c = q-1 \}$ such that $p_j$ was the marginal probability that the first coordinate of $(a,b,c)$ was $j$ , and the same for the $b$ and $c$ coordinates. For example, in the $q=3$ case, if we pick $(2,0,0)$ , $(0,2,0)$ and $(0,0,2)$ with probability $0.181$ and $(1,1,0)$ , $(1,0,1)$ and $(0,1,0)$ with probability $0.152$ , then the resulting distribution on each of the three coordinates is the EG-distribution $(0.514, 0.305, 0.181)$ , and we can realize the growth rate of the EG bound for $q=3$ .

Will pointed out to us that, if such a probability distribution on $T$ does not exist, then we can lower the upper bound! So, here is our result:

Consider all $S_3$ -symmetric probability distributions on $T$ . Let $(\psi_0, \psi_1,\ldots, \psi_{q-1})$ be the corresponding marginal distribution, with $\psi_j$ the probabilty that the first coordinate of $(a,b,c)$ will be $j$ . Let $\gamma$ be the largest value of $\eta(\psi_0, \ldots, \psi_{q-1})$ for such a $\psi$ . Then

(1) There are three-colored sum-free subsets of $C_q^n$ of size $\exp(\gamma n - o(n))$ and

(2) If $q$ is a prime power, such sets have size at most $\exp(\gamma n)$ .

Any marginal of an $S_3$ -symmetric distribution on $T$ has expected value $(q-1)/3$ , so our upper bound is at least as strong as the Ellenberg-Gisjwijt/Petrov-Naslund-Sawin bound. We suspect they are the same: That their optimal probability distribution is such a marginal. But we don’t know!

Here are a few remarks:

(1) The restriction to $S_3$ -symmetric distributions is a notational convenience. Really, all we need is that all three marginals are equal to each other. But we might as well impose $S_3$ -symmetry because, if all the marginals of a distribution are equal, we can just take the average over all $S_3$ permutations of that distribution.

(2) Our lower bound does not need $q$ to be a prime power. I’d love to know whether the upper bound can also remove that restriction.

(3) If the largest entropy of a marginal comes from a distribution $\pi_{abc}$ on $T$ with all $\pi_{abc}>0$ , then the marginal distribution is the EG distribution. The problem is about the distributions at the boundary; it seems hard to show that it is always beneficial to perturb inwards.

(4) For $q \geq 7$ , there is more than one distribution on $T$ with the required marginal. One canonical choice would be the one which has largest entropy given that marginal. If the optimal solution has all $\pi_{abc}>0$ , then one can show that it factors as $\pi_{abc} = \beta(a) \beta(b) \beta(c)$ for some function $\beta:\{0,1,\ldots,q-1\} \to \mathbb{R}_{>0}$ .

* One exception is that Fedor Petrov has lowered the bound for AP free sets in $C_3^n$ to $2 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq 2n/3 \}+1$ , whereas the bound for sum-free is still $3 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq 2n/3 \}$ . But, as you will see, I am chasing much rougher bounds here.

http://sbseminar.wordpress.com/?p=6020

Extensions