GeistHaus
log in · sign up

in theory

Part of wordpress.com

"Marge, I agree with you - in theory. In theory, communism works. In theory." -- Homer Simpson

stories
FOCS Test of Time Awards
theorythings that are excellent
The FOCS test of time award recognizes each year a paper, or papers, each from the FOCS conference of 10, 20, and 30 years earlier for the impact they have had on theoretical computer science. It is an important occasion … Continue reading →
Show full content

The FOCS test of time award recognizes each year a paper, or papers, each from the FOCS conference of 10, 20, and 30 years earlier for the impact they have had on theoretical computer science.

It is an important occasion to appreciate the work that has had a long term impact on our field and that, for the 30 year old papers, maybe is not even cited any more and has turned into a “it is well known that…”

If your work has been significantly affected by a great paper appeared in FOCS 2014, FOCS 2004, or FOCS 1994, and the paper fits the nomination criteria, it is a valuable contribution to the community to nominate it.

Here is the call for nominations.

luca
http://lucatrevisan.wordpress.com/?p=4701
Extensions
Forthcoming workshops in Bocconi
BocconiMilansciencetheoryaverage-case complexitycryptographythings that are excellent
For the Italy-based readers and others that are able to make it to Milan on short notice, I would like to publicize that next week, in Bocconi, there will be a very exciting interdisciplinary workshop on average-case problems from several … Continue reading →
Show full content

For the Italy-based readers and others that are able to make it to Milan on short notice, I would like to publicize that next week, in Bocconi, there will be a very exciting interdisciplinary workshop on average-case problems from several perspectives: (1) evidence for and against the existence of efficient algorithms in different regimes coming from statistical physics, (2) the evidence for and against the existence of efficient algorithms in different regimes coming from convex optimization techniques, (3) strong evidence of hardness coming from worst-case to averarge-case reductions, (4) applications to cryptography.

On May 31, we are going to hold a Theory Day with a very diverse range of talks by an excellent collection of invited speakers, which includes Cynthia Dwork.

Participation is free for both events, but registration is required.

luca
http://lucatrevisan.wordpress.com/?p=4692
Extensions
Feige’s Conjecture and the Magic of Kikuchi Graphs
maththeoryKikuchi graphspectral graph theorythings that are excellent
A question that I am very interested in is whether it is possible to study hypergraphs with techniques that are in the spirit of spectral graph theory. It is generally possible to “flatten” the adjacency tensor of a hypergraph into … Continue reading →
Show full content

A question that I am very interested in is whether it is possible to study hypergraphs with techniques that are in the spirit of spectral graph theory.

It is generally possible to “flatten” the adjacency tensor of a hypergraph into a matrix, especially if the hypergraph is {k}-uniform with {k} even, and spectral properties of this matrix give information about the hypergraph, but usually a large amount of information is lost in this process, and the approach can only be applied to rather dense hypergraphs.

If we have a 4-uniform hypergraph with {n} vertices, for example, meaning a hypergraph in which each hyperedge is a set of four elements, its {n \times n \times n \times n} adjacency tensor can be flattened to a {n^2 \times n^2} matrix. Unless the hypergraph has a number of hyperedges that is at least quadratic in {n}, however, such a matrix is too sparse to provide any useful information via basic spectral techniques.

Recently, a number of results of a “spectral hypergraph theory” flavor have appeared that, instead, apply spectral graph theory techniques to the Kikuchi graph associated to a hypergraph, leading to very impressive applications such as new lower bounds for locally correctable codes.

In this post I would like to show a simple but rather magical use of this approach, that gives a proof of Feige’s conjecture concerning a “Moore bound for hypergraphs”.

In an undirected graph, the girth is the length of the shortest simple cycle, and in the previous post we told the story of trade-offs between density of the graph and girth, such as the Moore bound.

In a hypergraph, an interesting analog to the notion of girth is the size of the smallest even cover, where an even cover is a set of hyperedges such that every vertex belongs to an even number of hyperedges in the set. The reader should spend a minute to verify that if the hypergraph is a graph, this definition is indeed equivalent to the girth of the graph.

To see why this is a useful property, the hyperedges of a {k}-uniform hypergraph with vertex set V can be represented as vectors in {{\mathbb F}_2^V} in a standard way: a vector {x\in {\mathbb F}_2^V} represents the set {\{ v : x_v \neq 0 \}}. Under this representation, an even cover is a collection of hyperedges whose corresponding vectors have a linear dependency, so a {k}-uniform hypergraph with {n} vertices, {m} hyperedges and such that there is no even cover of size {\leq L} corresponds to a construction of {m} vectors in {{\mathbb F}_2^n}, each of Hamming weight {k}, such that any {L} of them are linearly independent. Having large collections of sparse vectors that don’t have small linear dependencies is useful in several applications.

It is easy to study the size of even covers in random hypergraphs, and a number of results about CSP refutations and SoS lower bounds rely on such calculations. Feige made the following worst-case conjecture:

Conjecture 1 (Moore Bound for Hypergraphs) If a {k}-uniform hypergraph has {n} vertices and {n\left( \frac{n}{r} \right) ^{\frac k2 -1}} hyperedges, then there must exist an even cover of size {\tilde O( r )}

Where the “tilde” hides a polylogn multiplicative factor. For {k=3}, for example, the conjecture asserts that a hypergraph with {n} vertices and {m} hyperedges must contain an even cover of size {\tilde O(n^3/m^2)}. For {k=4}, the bound is {\tilde O(n^2/m)}.

The Feige conjecture was recently proved by Guruswami, Kothari and Manohar using Kikuchi graphs and their associated matrices. In this post, we will see a simplified proof by Hsieh, Kothari and Mohanty (we will only see the even {k} case, which is much easier to analyze, and we will not prove the case of odd {k}, which is considerably more difficult).

1. A spectral proof of the Moore bound in graphs

We will first provide a spectral proof of the Moore bound for graphs, and then we will use Kikuchi graphs/matrices to lift the argument to hypergraphs.

To see how to use spectral arguments to reason about girth, let us start from the simple case of {d}-regular graphs. We want to reason about cycles in a graph, and a spectral concept immediately comes to mind: if {A} is the adjacency matrix of the graph, then {{\rm tr}(A^L)} is the number of closed walks of length {L} in the graph. If the girth of {A} is more than {L}, then none of those closed walks can be, or contain, a cycle, and this restricts how many such walks can exist. All such walks happen within the local tree that surrounds the start node, and the walk can just traverse part of this local tree. Combining these observations suffices to obtain a non-trivial bound.

First of all, if a graph is {d}-regular, we have {||A||=d} and so

\displaystyle  {\rm tr}(A^L) \geq ||A^L|| \geq d^L

and we claim that if the girth is more than {L} we also have

\displaystyle  {\rm tr} (A^L) \leq n\cdot 2^L \cdot (d-1)^{L/2}

This is because a closed walk of even length {L}, provided that {L} is less than the girth, can be specified by indicating the starting node, of which there are {n}, then for every step of the walk whether we are moving one step closer or one step further from the starting node in the local tree around the node, for which we have {2^L} choices to enumerate, and, finally, for the {L/2} steps in which we move one step further, which edge we follow, for which we have {d-1} choices. When we move closer to the start node, only one edge can be followed and there are no choices to enumerate.

Combining the bounds and taking {L}-th roots, we have {2n^{1/L} \geq \sqrt {d}}, which gives the weak but respectable Moore-like bound

\displaystyle  L\leq 2\log_{d/4} n

The next step is to make such an argument work for irregular graphs. We would again like to reason in terms of the trace of the power of a matrix, to relate it to counting closed walks, and to show that the count is small if the girth is large. We will need, however, to find the proper normalization, because we cannot use just the adjacency matrix any more.

If {A} is the adjacency matrix of an irregular graph {G} of average degree {\bar d} and {D} is the diagonal matrix of degrees of {G}, so that {D_{v,v} = d_v} is the degree of vertex {v}, a common normalization is to look at the matrix {D^{-1}A}, or to the symmetric matrix {D^{-1/2} A D^{-1/2}}, which has the same trace and eigenvalues. This is also the transition matrix of the random walk in {G}, so that when we study the trace of powers of this matrix we end up considering the probability that a random walk of a certain length returns to the start vertex. We also know that

\displaystyle  || D^{-1/2} A D^{-1/2} || = 1

so that for every {L}

\displaystyle  {\rm tr}( (D^{-1/2} A D^{-1/2})^L) \geq ||( D^{-1/2} A D^{-1/2}) ^L|| \geq 1

If we try to analyze the trace of powers of this matrix, however, we run into some technical difficulties, that are greatly simplified by a neat trick which is not totally natural and that I will try to justify.

Let us think about the regular case as analyzed above, except that we work with {A/d} for consistency. The division by {d} contributes a {1/d^L} term to the trace of the {L}-th power, so we “gain” a factor of {1/d} in each step. We lose it almost entirely when we do a step away from the start node, because we have to account for {d-1} choices, but we gain it entirely when we take a step closer to the start node, because that is a forced move for which we do not need to enumerate choices. We end up gaining a factor of {1/d} for each backward step, of which there are {L/2}, and so we can cleanly say that the trace goes down by a factor of {d^{-L/2}}. We pay {n} for the choice of the start vertex and {2^L} for the choice of when to go forward and when to backward, and the overall bound is {n 2^L d^{-L/2}} which becomes {2 n^{1/L} / \sqrt d} when we take the {L}-th root.

The difficulty in replicating this argument with the same cleanliness in the irregular case is that, when we are enumerating what can happen when the closed walk passes through a vertex {v}, the gain from a backward step from {v} is {1/d_v}, which could be much larger than {1/{\bar d}}, and it is not straightforward how to “average” things, because we are multiplying terms that are not independent.

It would certainly be very convenient if the normalization in a step from vertex {v} was such that we always gain a factor that is both smaller than {1/{\bar d}}, to have a proper account of the backward steps, and smaller than {1/d_v} to account for the possible forward steps.

The trick is simply to enforce that with a different normalization: instead of working with {D^{-1} A} or {D^{-1/2} A D^{-1/2}}, we define {\Gamma := \bar d \cdot I + D} and work with {\Gamma ^{-1 }A} and {\Gamma^{-1/2} A \Gamma^{-1/2}}. With this normalization, a step backward from {v} contributes {1/({\bar d} + d_v)} which is certainly less than {1/{\bar d}} while the enumeration of steps forward can still not exceed a contribution of 1.

The only concern about this new normalization is that we divide {A} by a larger matrix and this could cause its norm to drop too much, but fortunately the norm is essentially preserved, as we can see by using the test vector {\Gamma ^{1/2} {\bf 1}}.

We have that the quadratic form is

\displaystyle  {\bf 1}^T \Gamma^{1/2} \Gamma ^{-1/2} A \Gamma^{-1/2} \Gamma ^{1/2} {\bf 1} = {\bf 1} ^TA {\bf 1} = {\bar d} n

while the norm squared of the test vector is

\displaystyle  {\bf 1}^T \Gamma^{1/2} \Gamma^{1/2} {\bf 1} = {\bf 1}^T ( \bar d \cdot I + D) {\bf 1} = 2 {\bar d } n

so that we have a good lower bound on the norm

\displaystyle  || \Gamma^{-1/2} A \Gamma^{-1/2} || \geq \frac 12

To bound the trace of our normalized matrix, we perform a similar enumeration as the one we did in the regular case. We have to enumerate all closed walks {v_0,v_1,\ldots,, v_{L-1},v_L} where {v_0 = v_L} and the edge {(v_i,v_{i+1}) } exists for {i=0,\ldots,L-1}. We have {n} choices for the start vertex {v_0} and then {2^L} choices for whether, at each step, we move one step closer to the start vertex in the local tree, in a unique way, or one step forward, in one of {d_{v_i}-1} possible ways. We will denote by {N_0(v_i)} the set containing the unique neighbor of {v_i} that is one step closer to the start vertex in the local tree of the start vertex, and by {N_1(v_i)} the set containing the {d_v-1} neighbors of {v_1} that are one step further. If {b_v \in \{0,1\}} represents the binary choice of whether to move further or closer to the start vertex in the walk from node {v}, then {N_{b_v} (v)} is the set of choices that we have to enumerate.

With this notation we can write the trace as

\displaystyle  \begin{array}{rcl} & & {\rm tr}( (\Gamma^{-1} A)^L) \\ & = & \sum_{b \in \{ 0,1\}^L} \sum_{v_0 \in [n]} \sum_{v_1 \in N_{b_0}(v_0)}\Gamma^{-1}_{v_0,v_0} \ldots \\ & & \hspace{20pt} \sum_{v_{L-1} \in N_{b_{L-2}} (v_{L-2}) }\Gamma^{-1} _ { v_{L-2} ,v_{L-2} } \cdot {\bf 1}(v_0 \in N_{b_{L-1}} (v_{L-1})) \cdot \Gamma_{v_{L-1},v_{L-1}} \end{array}

In order to bound the above expression, we see that we have {2^L} choices for {b}, then {n} choices for {v_0}, and for each of the {L/2} steps from a vertex {v_i} in which we go backward we have a term {\Gamma^{-1} _{v_i,v_i}} which is {1/(d_{v_i} + \bar d)< 1/\bar d}; for each of the {L/2} steps from a vertex {v_i} in which we go forward we have {d_{v_i}-1} branches, but each multiplied by {1/(d_{v_i} + \bar d)< 1/d_{v_i}} so that the total contribution is less than 1.

Overall we can say

\displaystyle  \frac 1{2^L} < {\rm tr}( (\Gamma^{-1} A)^L) \leq n \cdot 2^L \cdot \bar d^{-L/2}

so that rearranging and taking {L}-th roots we have

\displaystyle  4n^{1/L} > \sqrt {\bar d}

that is also equivalent to

\displaystyle  L < 2\log_{\bar d/16} n

which is again a respectable Moore-like bound for irregular graphs.

A final note about this spectral argument is that {\Gamma^{-1} A} can still be seen as the transition matrix of a sort of random walk, that is, one that from {v} has a probability {\frac {\bar d}{\bar d + d_v}} of stopping, and probability {\frac 1{\bar d + d_v}} of proceeding through each neighbor of {v}. The trace of the {L}-th power of {\Gamma^{-1}A} computes, up to a factor of {n}, the collision probability of the distribution of the {(L/2)}-th step of this walk starting from a random vertex, and the negative logarithm of this probability is called the Renyi entropy of the distribution. This is all to say that this spectral argument is not totally unrelated to the argument in the previous post, in which we computed the entropy of a non-backtracking random walk started at a random vertex.

2. Kikuchi matrices and hypergraphs

We now come to the magic of this post, where we define Kikuchi graphs and associated matrices, and we see how to transfer spectral graph theory techniques to hypergraphs.

If {H= ([n],F)} is a {k}-uniform hypergraph with {n} vertices, {k} even, and if we select an integer parameter {r \geq k/2}, the Kikuchi graph associated to {H} and {r} is a graph {G= (V,E)} with {|V| = {n \choose r}} vertices defined as follows. We think of each vertex of {G} as a set {S} of {r} vertices of {H}; if {e\in F} is a hyperedge of {H}, then for every set {S} of {r} vertices we have the edge {(S, S \oplus e)} in {G}, where we use {\oplus} to denote symmetric difference between sets. In other words, the edges of {G} are the pairs of sets of {r} vertices of {H} whose symmetric difference is a hyperedge of {H}.

This feels a lot like a Cayley graph, and in fact it can be seen as a vertex-induced subgraph of a Cayley graph, but this would not give a very enlightening perspective on the content of this post, so we will not pursue this concept furhter.

Necessarily, in an edge {(S,T)} of the Kikuchi graph corresponding to an hyperedge {e}, the sets {S} and {T} must each contain {k/2} vertices of {e} each, and so the construction as described only works for hypergraphs where the hyperedge cardinality is even. We already mentioned that everything becomes considerably more complex in the odd case, which we will not deal with.

For each hyperedge {e} of {H}, there are {{k \choose k/2} \cdot {n-k \choose r-k/2}} choices of a set {S} containing {r} vertices of {H} of which {k/2} are from {e}. Overall, if {H} has {m} hyperedges, {G} is going to have

\displaystyle  \frac m2 \cdot {k \choose k/2} \cdot {n-k \choose r-k/2}

undirected edges (the factor of 2 comes from the fact that without it we would count {(S, S\oplus e)} and {(S\oplus e, S)} as two different edges while it is the same undirected edge).

If {r=k/2}, then the Kikuchi graph construction is analogous to the standard “flattening” of a tensor to a matrix or of a hypergraph to a graph, and it is not immediately clear from the definitions what is gained by choosing a larger {r} and obtaining a larger graph that only seems more redundant and more complex to analyze.

An important gain is that for larger {r} the resulting graph is denser. Indeed the average degree of the Kikuchi graph is

\displaystyle  \bar d = \frac m2 \cdot {k \choose k/2} \cdot {n-k \choose r-k/2} \cdot {n \choose r } ^{-1}

and one can see that for { r > k} and {r\leq n/8} the bound

\displaystyle  \bar d \geq \frac m2 \left ( \frac r n \right)^{k/2}

holds, showing how larger choices of {r} yield denser graphs.

What about even covers and Feige’s conjecture? Suppose {H} does not have even covers of size {\leq L}, and let us look at a length-{L} closed walk {S_0,S_1,\ldots S_L} in {G}, where {S_0 = S_L} and the edges {(S_i,S_{i+1})} exist in {G} for each step {i=0,\ldots,L-1} of the walk. Let us call {e_i} the hyperedge of {H} corresponding to the edge {(S_i,S_{i+1})}, so that we have {S_i = S_{i+1} \oplus e_i}. The key observation is that, iterating the definitions of the edges we have

\displaystyle  S_L = S_0 \oplus e_0 \oplus e_1 \ldots \oplus e_{L-1}

but we also have {S_L=S_0} and so

\displaystyle  e_0 \oplus e_1 \ldots \oplus e_{L-1} = \emptyset

From the multiset of hyperedges {(e_0,\ldots e_{L-1} )} let us repeatedly remove pairs of identical hyperedges: this process will terminate either with a set of hyperedges occurring each once, or with an empty collection of hyperedges. The former case cannot happen, however, because this set of hyperedges would be a double cover of cadinality less than {L}, so the process must terminate with an empty set, meaning that the multiset {( e_0,\ldots e_{L-1} )} is such that every hyperedge occurs an even number of times.

Maybe it is not clear where we are going with this, but we have made all the key observations to be ready to prove the Feige conjecture, because we have related lack of even covers in the hypergraph with restricted properties of closed walks in the associated Kikuchi graph, and we already have spectral techniques that can be applied when we understand restrictions on closed walks in a graph.

3. Proof of the Feige Conjecture

The Feige conjecture will follow from the bounds formalized below:

Lemma 1 (Main) Let {H} be an order {k}, {k} even, hypergraph with {n} vertices and {m} edges. Suppose that there is no even cover of size {\leq L} in {H}. Construct the Kikuchi graph {G} of {H} with parameter {r}, where {k < r < n/8}. Let {A} be the adjacency matrix of {G}, let {D} be its diagonal matrix of degrees, let {\bar d} be the average degree of {G}, define {\Gamma := \bar d \cdot I + D}. Then

\displaystyle  \frac 1 {2^L} \leq || ( \Gamma^{-1/2} A \Gamma^{-1/2} )^L || \leq {\rm tr}( ( \Gamma^{-1} A)^L) \leq n^r \cdot 2^L \cdot \left( \frac L {\bar d } \right) ^{L/2}

We already proved the first two inequalities. To deal with the last one we will enumerate the closed walks of length {L} in {G}.

To enumerate all length-{L} closed walks {S_0,\ldots,S_{L-1}, S_L} with {S_L=S_0}, we call {e_i} the hyperedge associated to the edge {(S_i,S_{i+1})}, and we again associate a vector {b\in \{0,1\}^L} to each walk, but this time {b_i=0} when the hyperedge {e_i} is old, meaning it has already appeared in the sequence {e_0,\ldots,e_{L-1}} as some {e_j} with {j<1}. We let {b_i=1} if the hyperedge {e_i} is new, meaning that it appears for the first time in the sequence {e_0,\ldots,e_{L-1}}.

Note that now the Kikuchi graph does not necessarily have large girth: for example it is easy to see that it has a lot of length-4 cycles. From the discussion in the previous section, however, we know that the sequence {e_0,\ldots,e_{L-1}} is such that every hyperedge that appears in the sequence appears an even number of times and, in particular, the sequence involves at most {L/2} distinct hyperedges. This will be enough to obtain the claimed upper bound on the trace.

Analogously with the notation used in the previous section, We let {N_{0} (S_i) } be the set of neighbors of {S_i} that are possible as a next step of the closed walk assuming that the step {(S_i,S_{i+1})} corresponds to an old hyperedge and {N_{1} (S_i) } the set of neighbors of {S_i} that are possible as a next step of the closed walk assuming that the step {(S_i,S_{i+1})} corresponds to a new hyperedge.

With this notation, we can write the trace as

\displaystyle  \begin{array}{rcl} & & {\rm tr}( (\Gamma^{-1} A)^L) \\ & = & \sum_{b \in \{ 0,1\}^L} \sum_{S_0 \in {[n] \choose r}} \sum_{S_1 \in N_{b_0} (S_0)} \Gamma^{-1} _{S_0,S_0} \ldots \\ & & \hspace{15pt} \sum_{S_{L-1} \in N_{b_{L-2}} (S_{L-2}) }\Gamma^{-1} _{S_{L-2}, S_{L-2} } \cdot {\bf 1}( S_0 \in N_{b_{L-1}} (S_{L-1}) ) \cdot \Gamma^{-1} _{S_{L-1}, S_{L-1} } \end{array}

We have {2^L} choices for {b} and {{n \choose r} \leq n^r} choices for {S_0}. For each step {i} of the walk, if {b_i = 0} and the step uses an old hyperedge, then {N_0(S_i)} has at most {i-1 \leq L} possibilities, corresponding to hyperedges that have been seen so far in the walk and that can be repeated. Each choice contributes a value of {\Gamma^{-1}_{S_i,S_i} = \frac 1{d_{S_i} + \bar d} < \frac 1 {\bar d}}, for a total contribution that is at most {\frac L{\bar d}}. Because all hyperedges that are used in the walk are used an even number of times, there have to be at least {L/2} steps in which we have an old hyperedge. When {b_i=1} and we have a new hyperedge at step {i}, then {N_1(S_i)} can be at most the set of all {d_{S_i}} neighbors of {S_i}, and each contributes a value of {\Gamma^{-1}_{S_i,S_i} = \frac 1 {d_{S_i} + \bar d} < \frac 1 {d_{S_i}}} so that the total contribution is less than one. This proves the claimed upper bound

\displaystyle  2^L n^r \left( \frac L {\bar d} \right)^{L/2}

and establishes the Main Lemma.

After taking {L}-th roots, the Main Lemma says that

\displaystyle  n^{r/L} \geq \frac 14 \sqrt{ \frac {\bar d}{L}}

We instantiate the construction to

\displaystyle  r:= \frac L{\log_2 n}

which makes {n^{r/L} = 2}, so that the Main Lemma implies {\bar d \leq 64 L}. Recall that we also proved

\displaystyle  \bar d \geq \frac m2 \left ( \frac r n \right)^{k/2}

which combines to

\displaystyle  r\log_2 n = L \geq \frac {\bar d}{64} \geq \frac m{128} \left ( \frac r n \right)^{k/2}

or, equivalently, we have proved the upper bound

\displaystyle  m \leq 128 n \log n \left( \frac nr \right)^{k/2 - 1}

provided that there is no even cover of size {\leq L = r \log_2 n}, which is precisely the Feige conjecture.

Without all the commentary that we added, the even case of the Feige conjecture, already a significant problem that had been open and challenging for a while, is solved in half a page of simple calculations. The magic of the Kikuchi graph comes in the very simple way that even covers in the hypergraph translate to restrictions to what closed walks can look like in the associated graph, and these are restrictions that are usefully exploitable in trace arguments.

As we remarked, something else that is also very useful in the connection is that the size of the sets corresponding to vertices in the Kikuchi graph is a parameter that we can control, and it “densifies” the graph the more it is turned up.

The odd {k} case is considerably more involved, and it is where all the major work is done, though the even case already gives a glimpse of the power of these techniques.

I would like to remark that several technical steps above are taken verbatim from the paper of Hsieh, Kothari and Mohanty, as they are already as simplified and clear as they can be. I want to thank Lucas Pesenti for having explained this proof to me.

luca
http://lucatrevisan.wordpress.com/?p=4685
Extensions
The Moore Bound for Irregular Graphs
mathgirthinformation theorythings that are excellent
Guruswami, Kothari and Manohar recently solved a conjecture of Feige about even covers in hypergraphs, with beautiful techniques that have several applications. I would like to describe some of their ideas in a subsequent post or series of posts. Feige’s … Continue reading →
Show full content

Guruswami, Kothari and Manohar recently solved a conjecture of Feige about even covers in hypergraphs, with beautiful techniques that have several applications. I would like to describe some of their ideas in a subsequent post or series of posts.

Feige’s conjecture is about an hypergraph analog of the question of how big can the girth of a graph be relative to its density, and I would like to start by sharing a “proof from the book” about the latter problem.

In an undirected graph, the girth is the length of the shortest simple cycle. If a graph has odd girth {g}, and we perform a BFS in it starting from any vertex, and we stop it after no more than {(g-1)/2} hops, then we are guaranteed to keep finding new vertices at every step of the exploration: if we had two paths of length at most {(g-1)/2} from the same source to the same destination we would actually have a cycle of length {g-1}. If the graph is {d}-regular, this means that, in this truncated BFS, we see at least

\displaystyle   n \geq 1 + d \cdot \sum_{i = 0}^{\frac {g-1}2 -1} (d-1)^i \ \ \ \ \ (1)

different vertices (because what we are seeing are the first {\frac {g-1}2} levels of a complete {d}-ary tree). There is a similar expression for even {g}.

This remarkable property of high-girth graphs to look “locally tree-like” is very interesting and it is useful in certain applications, but it clearly puts a trade-off between the number nodes and the number of edges in high-girth regular graphs. Obviously, the total number of vertices of the graph has to be at least the bound (1), meaning that if {n} is the number of vertices and {d} is the degree then the girth can be at most something like {2\log_{d-1} n}.

There are lots of questions about existence and constructions of high-girth graphs, and about the tightness of the above bound, but another really good question, raised by Bollobas, is what happens in irregular undirected graphs. Does a lower bound like (1), which is called the Moore bound, hold in irregular graphs, if we replace the regular degree {d} by the average degree {\bar d = 2|E|/|V|}? It took about 30 years for this question to be positively resolved by Alon, Hoory and Linial.

A simplified proof was then found by Babu and Radhakrishnan, with an information-theoretic approach. I will try to explain how one might have conceived and developed the latter proof. I want to thank Lucas Pesenti for explaining the proof to me.

We have an undirected graph with {n} vertices and {n\bar d /2} edges, where {\bar d} is the average degree, the girth is an odd number {g = 1 + 2r}, and we would like to say that {n} is large, for the same reason why this is true in the regular case: if we start from any vertex and explore using BFS up to distance {r}, we always discover new vertices at each iteration, the average vertex has degree {\bar d} and, although we cannot say that all these local trees are big, maybe there needs to be some local tree that is big, with a size of the order of {\bar d^r}.

All the interesting part of the argument (the rest is just a few lines) is already there in proving that there is a tree whose number of leaves is at least

\displaystyle  n \geq {\bar d} \cdot ({\bar d}-1)^{r-1}

where we just aim to match the number of nodes in the last level of the regular case, and we will focus on this slightly restricted goal.

Let’s put ourselves in an optimistic state of mind and imagine what a reasonably simple proof of such a fact might look like. Maybe actually the average local tree, starting from a random root vertex, is large, and at least as large as if the graph was regular (in the reular case all the local trees are identical copies of truncated complete {d}-ary trees and the average size is just the size, which is the Moore bound). The average size of a local tree of depth {r} rooted at a random vertex is a complicated expression of the degree distribution of the graph, but if it needs to be at least as large as the (achievable) case in which the degrees are all equal, it sounds like we would like to write, or lower bound, the average in terms of a concave function of the degree distribution, so that the bound in terms of the regular case would fall from convexity inequalities.

At this point the idea might come that somehow we should think about entropy and not cardinality, because the former can be used to give a lower bound for the latter, it has nice convexity properties, and it might work better in a probabilistic setting. So the goal might be to take a random start vertex {s}, consider the distribution of a random node {v} at distance {r} from {s}, and then say that the entropy of this distribution (averaged over the choices of {s}), needs to be at least {\log_2 \bar d \cdot (\bar d-1)^{r-1} }.

This view has the additional advantage that now, instead of talking about a random local tree, which is a fairly messy object, we are talking about simply picking a random vertex {s} and then a random path of length {r} from {s}, where we call the destination of the path {v}.

In defining this probabilistic process correctly, however, we have to be careful not to think of this path as being generated as a random walk, because a random walk can go back, and loop, and it is not guaranteed to be a simple path, the way paths in BFS trees are.

We can use, instead, the non-backtracking random walk process. In an undirected graph {G=(V,E)}, the non-backtracking random walk is the probabilistic process where, at every step, we keep track of the vertex {v} where we are and the vertex {u} where we were at the previous step, and, for the next step, we choose a new vertex {w} among the neighbors of {v} except for {u}. The vertex {v} becomes the new “previous vertex” and so on.

This means that the states of the walk are the ordered pairs of vertices joined by an edge, of which there are {2 \cdot |E|}. The transition rules are that we have transitions of the form {(a,b) \rightarrow (b,c)} , where {\{a,b\}, \{b,c\}} are pairs of edges of {G} with a vertex in common, and such a transition happens with probability {1/(d_b-1)}, where we use {d_v} the denote the degree of a vertex {v}.

Notice that if you start a non-backtracking random walk at the root of a tree, the walk will move forward to a random child, then a random child of the random child, until it reaches a leaf and gets stuck. If you do an {r}-step non-backtracking random walk in a graph that has girth {2r+1}, then you will explore a random path and reach a random leaf in one of the local trees that we are interested in, so this is indeed the process that we want.

A final consideration, that is standard in this setting, is that, if we select the start vertex of a non-backtracking walk with probability proportional to its degree, then every step of the walk traverses a uniformly distributed edge, and, at every step, the “current vertex” remains distributed with probability proportional to its degree: these are very convenient conditions to work with.

At this point, we have all the pieces of what could be a proof: we have our undirected irregular graph {G=(V,E)} with {n} vertices of average degree {\bar d} and girth {2r+1}, we consider the process of picking a random start vertex {s} with probability proportional to its degree, we perform an {r}-step non-backtracking random walk {s=v_0,v_1,v_2,\ldots,v_r}, and we would like to say that the expectation over {s} of the entropy of {v_r} given {s} is at least {\log_2 \bar d \cdot (\bar d-1)^{r-1}}. We will now prove that and we will interleave the chain of equalities and inequalities with explanations:

\displaystyle  H( v_r | s )

\displaystyle  = H(v_1,\ldots,v_r | s)

because the setup is such that there is only one path of length {r} from {s} to {v_r}

\displaystyle  = H(v_1 | s) + H(v_2| sv_1) + \cdots H(v_r | sv_1\cdots v_r)

by the Chain Rule

\displaystyle  = \mathop{\mathbb E} \log_2 d_s + \mathop{\mathbb E} \log_2 d_{v_1} + \cdots + \mathop{\mathbb E} \log_2 d_{v_{r-1}}

because the distribution of vertex {v_{t+1}} given the previous vertices is just the uniform distribution of the neighbors of {v_t} with the exception of {v_{t-1}}, so it is a uniform distribution over {d_{v_t}-1} possibilities, whose entropy is clearly {\log_2 (d_{v_t}-1)}. The expectation is over the process of selecting the walk. Recall that the walk process is such that, at each step, the “current vertex” always has the same distribution, so, calling {D} the distribution over vertices in which a vertex of the graph is sampled with probability proportional to its degee, the above expression is

\displaystyle  = \mathop{\mathbb E}_{v\sim D} \log_2 d_v + (r-1) \cdot \log_2 (d_v-1)

\displaystyle  = \mathop{\mathbb E} _{v\sim D} \log_2 d_v \cdot (d_v-1)^{r-1}

where we used linearity of expectation to collect terms under one sign of expectation. Indeed we can now use convexity to get a bound in terms of the regular case. Maybe there is another way that is a bit more transparent, but, following Babu-Radhakrishnan, we open up the expectation and apply Jensen to the convex function {x \rightarrow x \log_2 x (x-1)^{r-1}}, and we have

\displaystyle  = \frac 1{\bar d n}\sum_v d_v \log_2 d_v (d_v -1)^{r-1}

\displaystyle  \geq \log_2 \bar d (\bar d -1)^{r-1}

Success!

I really like the heuristic of thinking about what the cleanest possible approach to a proof would be, make some optimistic assumptions which if true would simplify things, and see where it goes. It almost never works, but it can rarely lead to very nice outcomes.

In terms of the history of this particular result, I think that the proof of Alon et al. is already based on getting to a concave expression so that one establishes a lower bound in terms of the regular case, but it is developed fully combinatorially. The proof of Babu and Radhakrishnan is in some sense mathematically the same, but they recognized that the information-theoretic language simplifies everything quite a lot.

luca
http://lucatrevisan.wordpress.com/?p=4683
Extensions
Introducing Bocconi’s new M.Sc. in Artificial Intelligence
Milanteachingthings that are excellent
This September, Bocconi will start a new M.Sc. in Artificial Intelligence. It will be a two-year computer science degree meant for students with Bachelor degrees in computer science, engineering, math, statistics, physics and related quantitative fields. In the first year, … Continue reading →
Show full content

This September, Bocconi will start a new M.Sc. in Artificial Intelligence. It will be a two-year computer science degree meant for students with Bachelor degrees in computer science, engineering, math, statistics, physics and related quantitative fields.

In the first year, courses on algorithms, mathematical methods, optimization, information theory, and software engineering will build a foundation in math and CS, then courses on deep learning, reinforcement learning, natural language processing and computer vision and image processing will go in depth on machine learning and some of its applications. In the second year there are various options and elective courses, with the possibility to study, for example, cryptography and blockchains, or bio-medical applications. As common for the second year of Bocconi’s M.Sc. degrees, there will be options for exchange programs to spend a semester abroad. Students also take a seminar on ethics in AI, a project-oriented AI lab, and a foreign language (not English and not the student’s native language) course. The language of instruction is English.

Tomorrow at 5pm CET there will be an online information session: those interested can sign up here.

More information about the degree are at www.unibocconi.eu/ai-msc.

Applications open today and are due by May 25th.

luca
http://lucatrevisan.wordpress.com/?p=4670
Extensions
Postdoc Positions for 2023-24
Bocconitheorypostdoc
I am looking for three postdoctoral fellows for the next academic year to work with me at Bocconi. The positions offer an internationally competitive salary (up to 65,000 Euro per year, tax-free, plus relocation assistance and travel allowance), in a … Continue reading →
Show full content

I am looking for three postdoctoral fellows for the next academic year to work with me at Bocconi.

The positions offer an internationally competitive salary (up to 65,000 Euro per year, tax-free, plus relocation assistance and travel allowance), in a wonderful location. The strict application deadline is January 31, 2023. Each position is for one year, renewable to a second year.

Among the topics that I am interested in are spectral graph theory, average-case complexity, “applications” of semidefinite programming, random processes on networks, approximation algorithms, pseudorandomness and combinatorial constructions.

Please contact me if you are interested in these positions and you would like more information.

To apply, go to this link, and then look for the opening for 3 positions dated December 6, 2022, like the one below (unfortunately there isn’t a perma-link to the application form):

Bocconi’s Computing Sciences department has a sizable theory group, that includes Laura Sanità, who works on optimization and approximation algorithms, Alon Rosen, who works on the foundations of cryptography, Marek Elias, who works on online optimization, and Andrea Celli, who works on algorithmic game theory. Next year, Adam Polak who works on fine-grained complexity and analysis of algorithms, will also join us.

Speaking of Alon Rosen, he is also recruiting postdocs for the next academic year, and he has two open positions, that you can find at the same link looking for two positions dated December 14 with an application deadline of February 28:

luca
http://lucatrevisan.wordpress.com/?p=4654
Extensions
Workshop on Fairness in AI
BocconitechnologyAlgoritmi + Inclusivithings that are excellent
Next Monday, June 27, I am organizing a workshop on issues around fairness, bias and discrimination in AI and Machine Learning. Here is a link to the program. Remote participation is possible (link in the website), and in-person participation is … Continue reading →
Show full content

Next Monday, June 27, I am organizing a workshop on issues around fairness, bias and discrimination in AI and Machine Learning.

Here is a link to the program. Remote participation is possible (link in the website), and in-person participation is free but we ask people to register so we can print badges and order the appropriate number of coffee breaks.

This workshop is being organized in partnership with EDGE, an Italian NGO that works on LGBT rights, and it is the first event of their initiative “A+I: Algoritmi + Inclusivi”, which will feature an awareness campaign and a series of video interviews that will start after the summer.

In next week’s workshop, Oreste Pollicino from Bocconi will talk about the perspective of the legal community around algorithmic discrimination, Symeon Papadopoulos from ITI Patras will give a survey on issues of fairness in image processing and image understanding, Sanghamitra Dutta from J.P. Morgan AI will talk about how to use the theory of causality to reason about fairness, Debora Nozza and Dirk Hovy from Bocconi will talk about issues of fairness in language models and natural language processing, and Omer Reingold from Stanford and Cynthia Dwork from Harvard will talk about modeling and achieving fairness in prediction models.

The last morning session will be a panel discussion moderated by Damiano Terziotti from EDGE about perspectives from the social sciences and from outside academia. It will feature, among others, Brando Benifei, a member of the EU parliament who has played a leading role in the 2021 draft EU regulations on AI. The other panel members are Alessandro Bonaita, who is a data science lead in Generali (Italy’s largest insurance company), Luisella Giani, who is leading a technology consulting branch of Oracle for Europe, Middle East and Africa, Cinzia Maiolini, who is in the national secretariat of CGIL, an Italian Union, and Massimo Airoldi from the University of Milan.

If you are in or near Milan next week, come to what is shaping up to be a memorable event!

luca
http://lucatrevisan.wordpress.com/?p=4640
Extensions
Workshop in Milan Next Week
Milantheorycryptographyspectral graph theorythings that are excellent
As previously announced, next week Alon Rosen and I are organizing a workshop at Bocconi, which will actually be the union of two workshops, one on Recent Advances in Cryptography and one on Spectral and Convex Optimization Techniques in Graph … Continue reading →
Show full content

As previously announced, next week Alon Rosen and I are organizing a workshop at Bocconi, which will actually be the union of two workshops, one on Recent Advances in Cryptography and one on Spectral and Convex Optimization Techniques in Graph Algorithms. Here is the program. In short:

  • where: Bocconi University’s Roentgen Building (via Roentgen 1, Milano), Room AS01
  • when: June 15-18
  • what: talks on cryptography and graph algorithms, including two hours devoted to Max Flow in nearly-linear time
  • how: register for free
luca
http://lucatrevisan.wordpress.com/?p=4635
Extensions
The First XL Computer Scientist
ItalytheoryPseudorandomnessthings that are excellentXL
Some time ago, I received a message to the effect that I was being considered for membership in the “Academy of the XL”, to which my reaction was, hey, we have all gone out of shape during the pandemic, and … Continue reading →
Show full content

Some time ago, I received a message to the effect that I was being considered for membership in the “Academy of the XL”, to which my reaction was, hey, we have all gone out of shape during the pandemic, and body-shaming is never… then it was explained to me that, in this context, “XL” means “forty” and that the Academy of the Forty is Italy’s National Academy of Science.

Italy has a wonderfully named, and well-known within the country, National Academy of Arts and Science, the Accademia dei Lincei, which means something like academy of the “eagle-eyed” (literally, lynx-eyed), that is, people that can see far. The Accademia dei XL is much less well known, although it has a distinguished 240-year history, during which people like Guglielmo Marconi and Enrico Fermi were members. More recently, the much beloved Rita Levi-Montalcini, Holocaust survivor, Nobel Laureate, and Senator-for-life, was a member. Current members include Nobel Laureates Carlo Rubbia and Giorgio Parisi. Noted algebraist Corrado De Concini is the current president.

Be that as it may, the academicians did vote to make me a member, their first computer scientist ever. Next week, at the inauguration of their 240th academic year, I will speak to the other members about randomness and pseudorandomness in computation.

luca
http://lucatrevisan.wordpress.com/?p=4630
Extensions