GeistHaus

https://feeds.feedburner.com/barmaley-exe-blog-feed

rss
26 posts
Polling state
Status active
Last polled May 19, 2026 01:25 UTC
Next poll May 20, 2026 04:55 UTC
Poll interval 86400s
Last-Modified Sun, 17 May 2026 22:37:46 GMT

Posts

Reciprocal Convexity to reverse the Jensen Inequality
posts, math

Jensen's inequality is a powerful tool often used in mathematical derivations and analyses. It states that for a convex function $f(x)$ and an arbitrary random variable $X$ we have the following upper bound: $$ f\left(\mathbb{E} X\right) \le \mathbb{E} f\left(X\right) $$
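As a quick numerical sanity check of the inequality above, the sketch below compares both sides on an empirical distribution. The choice of $f(x) = x^2$, the sample values, and the helper name `jensen_gap` are illustrative assumptions and do not come from the post.

```python
from statistics import fmean


def jensen_gap(f, samples):
    """Return E[f(X)] - f(E[X]) for the empirical distribution of `samples`."""
    mean_of_f = fmean([f(x) for x in samples])
    f_of_mean = f(fmean(samples))
    return mean_of_f - f_of_mean


# Hypothetical example: f(x) = x**2 is convex, so the gap cannot be negative.
samples = [-2.0, 0.0, 1.0, 5.0]
gap = jensen_gap(lambda x: x * x, samples)
print(gap)
```

For this particular $f$ the gap is exactly the (population) variance of the samples, which is one way to see why it is never negative.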

However, oftentimes we want the …

tag:None,2021-05-02:/posts/2021-05-02-reciprocal-convexity-to-reverse-the-jensen-inequality.html
Not every REINFORCE should be called Reinforcement Learning
posts, machine learning, RL, REINFORCE

Deep RL is hot these days. It's one of the most popular topics in the submissions at NeurIPS / ICLR / ICML and other ML conferences. And while the definition of RL is pretty general, in this note I'd argue that the famous REINFORCE algorithm alone is not enough to label your …

tag:None,2020-11-29:/posts/2020-11-29-reinforce-is-not-rl.html
A simpler derivation of f-GANs
posts, machine learning, gan

I have been looking at the derivation of $f$-GANs as part of my research, and found an easier way to derive the lower bound without invoking convex conjugate functions.

$f$-GANs are a generalization of standard GANs to arbitrary $f$-divergences. Given a convex function $f$, the $f$-divergence, in turn, can …

tag:None,2019-12-01:/posts/2019-12-01-a-simpler-derivation-of-f-gans.html
Thoughts on Mutual Information: Alternative Dependency Measures
posts, machine learning, mutual information

This post finishes the discussion started in Thoughts on Mutual Information: More Estimators with a consideration of alternatives to the Mutual Information.

Mutual Information

Let's step back a bit and take a critical look at the MI. One of its equivalent definitions says that it's a KL-divergence between the …

tag:None,2019-09-15:/posts/2019-09-15-thoughts-on-mutual-information-alternative-dependency-measures.html
Thoughts on Mutual Information: Formal Limitations
posts, machine learning, mutual information

This post continues the discussion started in Thoughts on Mutual Information: More Estimators. This time we'll focus on the drawbacks and limitations of these bounds.

Let's start with the elephant in the room: a year ago an interesting preprint was uploaded to arXiv: Formal Limitations on the Measurement of …

tag:None,2019-08-14:/posts/2019-08-14-thoughts-on-mutual-information-formal-limitations.html
Thoughts on Mutual Information: More Estimators
posts, machine learning, mutual information

In this post I'd like to show how Self-Normalized Importance Sampling (IWHVI and IWAE) and Annealed Importance Sampling can be used to give (sometimes sandwich) bounds on the MI in many different cases.

Mutual Information (MI) is an important concept from Information Theory that captures the idea of information …

tag:None,2019-08-10:/posts/2019-08-10-thoughts-on-mutual-information-more-estimators.html
Importance Weighted Hierarchical Variational Inference
posts, machine learning, variational inference, neural samplers

This post finishes the discussion on Neural Samplers for Variational Inference by introducing some recent results (including mine).

Also, there's a talk recording of me presenting this post's content, so if you prefer videos to text, check it out.

Quick Recap

It all started with an aspiration for a …

tag:None,2019-05-10:/posts/2019-05-10-importance-weighted-hierarchical-variational-inference.html
Neural Samplers and Hierarchical Variational Inference
posts, machine learning, variational inference, neural samplers

This post sets the background for the upcoming post on my work on more efficient use of neural samplers for Variational Inference.

Variational Inference

At the core of Bayesian Inference lies the well-known Bayes' theorem, relating our prior beliefs $p(z)$ with those obtained after observing some data $x$:

$$ p(z …

tag:None,2019-04-26:/posts/2019-04-26-neural-samplers-and-hierarchical-variational-inference.html
Stochastic Computation Graphs: Fixing REINFORCE
posts, machine learning, deep learning, stochastic computation graphs series, REINFORCE

This is the final post of the stochastic computation graphs series. Last time we discussed models with discrete relaxations of stochastic nodes, which allowed us to employ the power of reparametrization.

These methods, however, possess one flaw: they consider different models, thus introducing inherent bias – your test-time discrete model …

tag:None,2017-11-12:/posts/2017-11-12-stochastic-computation-graphs-fixing-reinforce.html
Stochastic Computation Graphs: Discrete Relaxations
posts, machine learning, deep learning, variational inference, stochastic computation graphs series

This is the second post of the stochastic computation graphs series. Last time we discussed models with continuous stochastic nodes, for which there are powerful reparametrization techniques.

Unfortunately, these methods don't work for discrete random variables. Moreover, it looks like there's no way to backpropagate through discrete stochastic nodes, as …

tag:None,2017-10-28:/posts/2017-10-28-stochastic-computation-graphs-discrete-relaxations.html
Stochastic Computation Graphs: Continuous Case
posts, machine learning, deep learning, stochastic computation graphs series, REINFORCE

Last year I covered some modern Variational Inference theory. These methods are often used in conjunction with Deep Neural Networks to form deep generative models (VAE, for example) or to enrich deterministic models with stochastic control, which leads to better exploration. Or you might be interested in amortized inference.

All …

tag:None,2017-09-10:/posts/2017-09-10-stochastic-computation-graphs-continuous-case.html
On No Free Lunch Theorem and some other impossibility results
posts, semi-mathematical, machine learning, artificial intelligence

The more I talk to people online, the more I hear about the famous No Free Lunch Theorem (NFL theorem). Unfortunately, quite often people don't really understand what the theorem is about, and what its implications are. In this post I'd like to share my view on the NFL theorem …

tag:None,2017-07-23:/posts/2017-07-23-no-free-lunch-theorem.html
Matrix and Vector Calculus via Differentials
posts, math

Many tasks of machine learning can be posed as optimization problems. One comes up with a parametric model, defines a loss function, and then minimizes it in order to learn optimal parameters. One very powerful tool of optimization theory is the use of smooth (differentiable) functions: those that can be …

tag:None,2017-01-29:/posts/2017-01-29-matrix-and-vector-calculus-via-differentials.html
Neural Variational Inference: Importance Weighted Autoencoders
posts, machine learning, deep learning, variational inference, modern variational inference series

Previously we covered Variational Autoencoders (VAE), a popular inference tool based on neural networks. In this post we'll consider a follow-up work from Toronto by Y. Burda, R. Grosse and R. Salakhutdinov, Importance Weighted Autoencoders (IWAE). The crucial contribution of this work is the introduction of a new lower bound on the marginal …

tag:None,2016-07-14:/posts/2016-07-14-neural-variational-importance-weighted-autoencoders.html
Neural Variational Inference: Variational Autoencoders and Helmholtz machines
posts, machine learning, deep learning, variational inference, modern variational inference series

So far there has been little that is "neural" in our VI methods. Now it's time to fix that, as we're going to consider Variational Autoencoders (VAE), a paper by D. Kingma and M. Welling that made a lot of buzz in the ML community. It has 2 main contributions: a new …

tag:None,2016-07-11:/posts/2016-07-11-neural-variational-inference-variational-autoencoders-and-helmholtz-machines.html
Neural Variational Inference: Blackbox Mode
posts, machine learning, deep learning, variational inference, modern variational inference series

In the previous post we covered Stochastic VI: an efficient and scalable variational inference method for exponential family models. However, there are many more distributions than those belonging to the exponential family. Inference in these cases requires a significant amount of model analysis. In this post we consider Black Box Variational Inference …

tag:None,2016-07-05:/posts/2016-07-05-neural-variational-inference-blackbox.html
Neural Variational Inference: Scaling Up
posts, machine learning, deep learning, variational inference, modern variational inference series

In the previous post I covered the well-established classical theory developed in the early 2000s. Since then technology has made huge progress: now we have much more data, and a great need to process it and process it fast. In the big data era we have huge datasets, and cannot afford too …

tag:None,2016-07-04:/posts/2016-07-04-neural-variational-inference-stochastic-variational-inference.html
Neural Variational Inference: Classical Theory
posts, machine learning, deep learning, variational inference, modern variational inference series

As a member of the Bayesian methods research group I'm heavily interested in the Bayesian approach to machine learning. One of the strengths of this approach is the ability to work with hidden (unobserved) variables which are interpretable. This power, however, comes at the cost of generally intractable exact inference, which limits the …

tag:None,2016-07-01:/posts/2016-07-01-neural-variational-inference-classical-theory.html
Exploiting Multiple Machines for Embarrassingly Parallel Applications
posts, gnu parallel, linux

While working on my machine learning project I needed to perform some quite computation-heavy calculations several times, each time with slightly different inputs. These calculations were CPU- and memory-bound, so spawning them all at once would just slow down the overall running time because of the increased amount …

tag:None,2014-08-01:/posts/2014-08-01-gnu-parallel.html
On Sorting Complexity
posts, algorithms, computer science

It's well known that the lower bound for the sorting problem (in the general case) is $\Omega(n \log n)$. The proof I was taught is somewhat involved and is based on paths in "decision" trees. Recently I've discovered an information-theoretic approach (or reformulation) to that proof.
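A standard form of the information-theoretic argument, sketched here as an assumption rather than as the post's own derivation: a single comparison yields at most one bit, and telling apart all $n!$ possible orderings requires at least $\log_2 n!$ bits. The helper name `min_comparisons` is hypothetical.

```python
import math


def min_comparisons(n):
    """Information-theoretic lower bound: ceil(log2(n!)) comparisons."""
    return math.ceil(math.log2(math.factorial(n)))


# Compare the bound with n * log2(n) for a few input sizes.
for n in (4, 16, 256):
    print(n, min_comparisons(n), round(n * math.log2(n)))
```

Since $\log_2 n! = \Theta(n \log n)$ by Stirling's approximation, the printed bound stays within a constant factor of $n \log_2 n$.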

First, let's state the problem: given …

tag:None,2014-05-01:/posts/2014-05-01-on-sorting-complexity.html
Namespaced Methods in JavaScript
posts, javascript, ecmascript 5

Once upon a time I was asked (well, actually the question wasn't for me only, but for the whole habrahabr community) whether it is possible to implement namespaced methods in JavaScript for built-in types like:

5..rubish.times(function() { // this function will be called 5 times
  console.log("Hi there!");
});

"some string …
tag:None,2013-05-23:/posts/2013-05-23-js-namespaced-methods.html
Crazy Expression Parsing
posts, python, madness

Suppose we have an expression like (5+5 * (x^x-5 | y && 3)) and we'd like to get some computer-understandable representation of that expression, like:

ADD Token[5] (MUL Token[5] (AND (BIT_OR (XOR Token[x] (SUB Token[x] Token[5])) Token[y]) Token[3]))

In case you don't know …

tag:None,2013-03-30:/posts/2013-03-30-crazy-expression-parsing.html
Memoization Using C++11
posts, C++, C++11, memoization, optimization

Recently I read the article Efficient Memoization using Partial Function Application. The author explains function memoization using partial application. When I was reading the article, I thought "Hmmm, can I come up with a more general solution?" And as suggested in the comments, one can use variadic templates to achieve it. So …
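The post's solution is written with C++11 variadic templates and is not reproduced in this excerpt. The Python sketch below only illustrates the general idea of memoizing a function of arbitrarily many arguments, with `*args` playing the role of the parameter pack. The names `memoize` and `binomial` are hypothetical, and the sketch assumes all arguments are hashable.

```python
import functools


def memoize(func):
    """Cache results of `func`, keyed by the tuple of positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []  # records every argument tuple that was actually computed


@memoize
def binomial(n, k):
    calls.append((n, k))
    if k == 0 or k == n:
        return 1
    return binomial(n - 1, k - 1) + binomial(n - 1, k)


print(binomial(30, 15), len(calls))
```

Because the recursive calls go through the memoized wrapper, each distinct `(n, k)` pair is computed only once.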

tag:None,2013-03-29:/posts/2013-03-29-cpp-11-memoization.html
Resizing Policy of std::vector
posts, C++, math

Some time ago, when Facebook open-sourced their Folly library, I was reading their docs and found something interesting. In the section "Memory Handling" they state

In fact it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any …
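The quoted claim can be illustrated with a small simulation. It rests on simplifying assumptions that are not taken from the post: every previously freed block is adjacent to the others and can be coalesced, the current block stays alive while its contents are copied, and sizes are treated as real numbers. The function name `first_reuse_step` is hypothetical.

```python
def first_reuse_step(growth, max_steps=64):
    """Return the first reallocation whose new buffer fits into the freed blocks.

    Returns None if no such reallocation occurs within `max_steps`.
    """
    sizes = [1.0]  # sizes[i] is the capacity after i reallocations
    for step in range(1, max_steps + 1):
        new_size = sizes[-1] * growth
        freed = sum(sizes[:-1])  # every block except the one still in use
        if freed >= new_size:
            return step
        sizes.append(new_size)
    return None


print(first_reuse_step(2.0))
print(first_reuse_step(1.5))
```

Under these assumptions a factor of 2 never finds room, because $1 + 2 + \dots + 2^{k-1} = 2^k - 1$ is always smaller than the next buffer, while a factor of 1.5 finds room at the fifth reallocation.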

tag:None,2013-02-10:/posts/2013-02-10-std-vector-growth.html