Sobolev.space — GeistHaus

Reciprocal Convexity to reverse the Jensen Inequality

Artem Sobolev May 1, 2021

Jensen's inequality is a powerful tool often used in mathematical derivations and analyses. It states that for a convex function $f(x)$ and an arbitrary random variable $X$ we have the following upper bound: $$ f\left(\mathbb{E} X\right) \le \mathbb{E} f\left(X\right) $$
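As a quick sanity check of the inequality above, here is a minimal numerical sketch. It assumes a standard normal $X$ and the convex function $f = \exp$; both choices, and the helper name `jensen_sides`, are illustrative and not taken from the post.

```python
import math
import random


def jensen_sides(f, samples):
    """Return (f(mean of samples), mean of f over samples)."""
    mean_x = sum(samples) / len(samples)
    mean_fx = sum(f(x) for x in samples) / len(samples)
    return f(mean_x), mean_fx


# Assumption for illustration: X ~ N(0, 1) and f = exp (a convex function).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

lhs, rhs = jensen_sides(math.exp, xs)
print(f"f(E X) = {lhs:.4f}, E f(X) = {rhs:.4f}, holds: {lhs <= rhs}")
```

Because the sample averages are expectations under the empirical distribution, the inequality holds for the samples themselves, not only in the limit.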

However, oftentimes we want the …

tag:None,2021-05-02:/posts/2021-05-02-reciprocal-convexity-to-reverse-the-jensen-inequality.html

Not every REINFORCE should be called Reinforcement Learning

Artem Sobolev Nov 28, 2020

Deep RL is hot these days. It's one of the most popular topics in the submissions at NeurIPS / ICLR / ICML and other ML conferences. And while the definition of RL is pretty general, in this note I'd argue that the famous REINFORCE algorithm alone is not enough to label your …

tag:None,2020-11-29:/posts/2020-11-29-reinforce-is-not-rl.html

A simpler derivation of f-GANs

Artem Sobolev Nov 30, 2019

I was looking at the derivation of $f$-GANs while doing some research, and found an easier way to derive their lower bound, without invoking convex conjugate functions.

$f$-GANs are a generalization of standard GANs to arbitrary $f$-divergences. Given a convex function $f$, the $f$-divergence, in turn, can …

tag:None,2019-12-01:/posts/2019-12-01-a-simpler-derivation-of-f-gans.html

Thoughts on Mutual Information: Alternative Dependency Measures

Artem Sobolev Sep 14, 2019

This post finishes the discussion started in Thoughts on Mutual Information: More Estimators with a consideration of alternatives to Mutual Information.

Mutual Information

Let's step back a bit and take a critical look at the MI. One of its equivalent definitions says that it's a KL-divergence between the …

tag:None,2019-09-15:/posts/2019-09-15-thoughts-on-mutual-information-alternative-dependency-measures.html

Thoughts on Mutual Information: Formal Limitations

Artem Sobolev Aug 13, 2019

This post continues the discussion started in Thoughts on Mutual Information: More Estimators. This time we'll focus on drawbacks and limitations of these bounds.

Let's start with the elephant in the room: a year ago an interesting preprint was uploaded to arXiv: Formal Limitations on the Measurement of …

tag:None,2019-08-14:/posts/2019-08-14-thoughts-on-mutual-information-formal-limitations.html

Thoughts on Mutual Information: More Estimators

Artem Sobolev Aug 9, 2019

In this post I'd like to show how Self-Normalized Importance Sampling (IWHVI and IWAE) and Annealed Importance Sampling can be used to give (sometimes sandwich) bounds on the MI in many different cases.

Mutual Information (MI) is an important concept from Information Theory that captures the idea of information …

tag:None,2019-08-10:/posts/2019-08-10-thoughts-on-mutual-information-more-estimators.html

Importance Weighted Hierarchical Variational Inference

Artem Sobolev May 9, 2019

This post finishes the discussion on Neural Samplers for Variational Inference by introducing some recent results (including mine).

Also, there's a talk recording of me presenting this post's content, so if you like videos more than texts, check it out.

Quick Recap

It all started with an aspiration for a …

tag:None,2019-05-10:/posts/2019-05-10-importance-weighted-hierarchical-variational-inference.html

Neural Samplers and Hierarchical Variational Inference

Artem Sobolev Apr 25, 2019

This post sets the background for the upcoming post on my work on more efficient use of neural samplers for Variational Inference.

Variational Inference

At the core of Bayesian Inference lies the well-known Bayes' theorem, relating our prior beliefs $p(z)$ with those obtained after observing some data $x$:

$$ p(z …

tag:None,2019-04-26:/posts/2019-04-26-neural-samplers-and-hierarchical-variational-inference.html

Stochastic Computation Graphs: Fixing REINFORCE

Artem Sobolev Nov 11, 2017

This is the final post of the stochastic computation graphs series. Last time we discussed models with discrete relaxations of stochastic nodes, which allowed us to employ the power of reparametrization.

These methods, however, possess one flaw: they consider different models, thus introducing inherent bias – your test-time discrete model …

tag:None,2017-11-12:/posts/2017-11-12-stochastic-computation-graphs-fixing-reinforce.html

Stochastic Computation Graphs: Discrete Relaxations

Artem Sobolev Oct 27, 2017

This is the second post of the stochastic computation graphs series. Last time we discussed models with continuous stochastic nodes, for which there are powerful reparametrization techniques.

Unfortunately, these methods don't work for discrete random variables. Moreover, it looks like there's no way to backpropagate through discrete stochastic nodes, as …

tag:None,2017-10-28:/posts/2017-10-28-stochastic-computation-graphs-discrete-relaxations.html

Stochastic Computation Graphs: Continuous Case

Artem Sobolev Sep 9, 2017

Last year I covered some modern Variational Inference theory. These methods are often used in conjunction with Deep Neural Networks to form deep generative models (VAE, for example) or to enrich deterministic models with stochastic control, which leads to better exploration. Or you might be interested in amortized inference.

All …

tag:None,2017-09-10:/posts/2017-09-10-stochastic-computation-graphs-continuous-case.html

ICML 2017 Summaries

Artem Sobolev Aug 13, 2017

Just like with NIPS last year, here's a list of ICML'17 summaries (updated as I stumble upon new ones)

Random ML&Datascience musing by Olga Liakhovich

tag:None,2017-08-14:/posts/2017-08-14-icml-2017.html

On No Free Lunch Theorem and some other impossibility results

Artem Sobolev Jul 22, 2017

The more I talk to people online, the more I hear about the famous No Free Lunch Theorem (NFL theorem). Unfortunately, quite often people don't really understand what the theorem is about, and what its implications are. In this post I'd like to share my view on the NFL theorem …

tag:None,2017-07-23:/posts/2017-07-23-no-free-lunch-theorem.html

Matrix and Vector Calculus via Differentials

Artem Sobolev Jan 28, 2017

Many tasks of machine learning can be posed as optimization problems. One comes up with a parametric model, defines a loss function, and then minimizes it in order to learn optimal parameters. One very powerful tool of optimization theory is the use of smooth (differentiable) functions: those that can be …

tag:None,2017-01-29:/posts/2017-01-29-matrix-and-vector-calculus-via-differentials.html

NIPS 2016 Summaries

Artem Sobolev Dec 30, 2016

I did not attend this year's NIPS, but I've gathered many summaries published online by those who did attend the conference.

tag:None,2016-12-31:/posts/2016-12-31-nips-2016-summaries.html

Neural Variational Inference: Importance Weighted Autoencoders

Artem Sobolev Jul 13, 2016

Previously we covered Variational Autoencoders (VAE), a popular inference tool based on neural networks. In this post we'll consider a follow-up work from Toronto by Y. Burda, R. Grosse and R. Salakhutdinov, Importance Weighted Autoencoders (IWAE). The crucial contribution of this work is the introduction of a new lower bound on the marginal …

tag:None,2016-07-14:/posts/2016-07-14-neural-variational-importance-weighted-autoencoders.html

Neural Variational Inference: Variational Autoencoders and Helmholtz machines

Artem Sobolev Jul 10, 2016

So far we have had little "neural" in our VI methods. Now it's time to fix that, as we're going to consider Variational Autoencoders (VAE), a paper by D. Kingma and M. Welling, which made a lot of buzz in the ML community. It has 2 main contributions: a new …

tag:None,2016-07-11:/posts/2016-07-11-neural-variational-inference-variational-autoencoders-and-helmholtz-machines.html

Neural Variational Inference: Blackbox Mode

Artem Sobolev Jul 4, 2016

In the previous post we covered Stochastic VI: an efficient and scalable variational inference method for exponential family models. However, there are many more distributions than those belonging to the exponential family. Inference in these cases requires a significant amount of model analysis. In this post we consider Black Box Variational Inference …

tag:None,2016-07-05:/posts/2016-07-05-neural-variational-inference-blackbox.html

Neural Variational Inference: Scaling Up

Artem Sobolev Jul 3, 2016

In the previous post I covered the well-established classical theory developed in the early 2000s. Since then technology has made huge progress: now we have much more data, and a great need to process it and process it fast. In the big data era we have huge datasets, and cannot afford too …

tag:None,2016-07-04:/posts/2016-07-04-neural-variational-inference-stochastic-variational-inference.html

Neural Variational Inference: Classical Theory

Artem Sobolev Jun 30, 2016

As a member of the Bayesian methods research group, I'm heavily interested in the Bayesian approach to machine learning. One of the strengths of this approach is the ability to work with hidden (unobserved) variables which are interpretable. This power, however, comes at the cost of generally intractable exact inference, which limits the …

tag:None,2016-07-01:/posts/2016-07-01-neural-variational-inference-classical-theory.html

Exploiting Multiple Machines for Embarrassingly Parallel Applications

Artem Sobolev Jul 31, 2014

While working on my machine learning project I needed to run some quite computation-heavy calculations several times, each time with slightly different inputs. These calculations were CPU and memory bound, so spawning them all at once would just slow down the overall running time because of the increased amount …

tag:None,2014-08-01:/posts/2014-08-01-gnu-parallel.html

On Sorting Complexity

Artem Sobolev Apr 30, 2014

It's well known that the lower bound for the sorting problem (in the general case) is $\Omega(n \log n)$. The proof I was taught is somewhat involved and is based on paths in "decision" trees. Recently I've discovered an information-theoretic approach (or reformulation) to that proof.
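For a numeric feel for this bound, here is a minimal sketch of the standard counting argument, which is not necessarily the formulation the post goes on to develop. A comparison sort must distinguish all $n!$ input orderings, and each comparison yields at most one bit, so at least $\lceil \log_2 n! \rceil$ comparisons are needed in the worst case. The function name `min_comparisons` is hypothetical.

```python
import math


def min_comparisons(n):
    """Counting lower bound: ceil(log2(n!)) comparisons in the worst case."""
    return math.ceil(math.log2(math.factorial(n)))


# Compare the exact counting bound with n * log2(n) for a few sizes.
for n in (4, 16, 256, 1024):
    print(n, min_comparisons(n), round(n * math.log2(n)))
```

By Stirling's approximation, $\log_2 n! = n \log_2 n - n \log_2 e + O(\log n)$, which is where the $\Omega(n \log n)$ rate comes from.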

First, let's state the problem: given …

tag:None,2014-05-01:/posts/2014-05-01-on-sorting-complexity.html

Namespaced Methods in JavaScript

Artem Sobolev May 22, 2013

Once upon a time I was asked (well, actually the question wasn't only for me, but for the whole habrahabr community) whether it is possible to implement namespaced methods in JavaScript for built-in types, like:

5..rubish.times(function() { // this function will be called 5 times
  console.log("Hi there!");
});

"some string …

tag:None,2013-05-23:/posts/2013-05-23-js-namespaced-methods.html

Crazy Expression Parsing

Artem Sobolev Mar 29, 2013

Suppose we have an expression like (5+5 * (x^x-5 | y && 3)) and we'd like to get some computer-understandable representation of that expression, like:

ADD Token[5] (MUL Token[5] (AND (BIT_OR (XOR Token[x] (SUB Token[x] Token[5])) Token[y]) Token[3]))

In case you don't know …

tag:None,2013-03-30:/posts/2013-03-30-crazy-expression-parsing.html

Memoization Using C++11

Artem Sobolev Mar 28, 2013

Recently I read an article, Efficient Memoization using Partial Function Application. The author explains function memoization using partial application. While reading the article, I thought "Hmmm, can I come up with a more general solution?" And as suggested in the comments, one can use variadic templates to achieve it. So …

tag:None,2013-03-29:/posts/2013-03-29-cpp-11-memoization.html

Resizing Policy of std::vector

Artem Sobolev Feb 9, 2013

Some time ago, when Facebook open-sourced their Folly library, I was reading their docs and found something interesting. In the section "Memory Handling" they state:

In fact it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any …

tag:None,2013-02-10:/posts/2013-02-10-std-vector-growth.html

https://feeds.feedburner.com/barmaley-exe-blog-feed