
Clueless Fundatma

Part of feedburner.com

A random walk through a subset of things I care about. Science, math, computing, higher education, open source software, economics, food etc.

LaTeX and arxiv

Posting a LaTeX manuscript on arxiv is straightforward.

  1. Compile your document (say, main.tex). It is okay to leave all your figures in a "figs/" subfolder. Unlike some outlets, you don't have to flatten your directory structure.
  2. Apart from main.tex, main.bbl [bibliography], and figures, you may delete all other files. You need the bbl file because arxiv does not run bibtex.
  3. Zip the folder, and upload on arxiv.

If you have a supplementary information document (say, si.tex) and you use the "xr" package to cross-reference between main.tex and si.tex, then a few extra steps are required.

arxiv compiles all tex files in the zipped folder in alphabetical order, so "main.tex" must sort before "si.tex". Keep this in mind if your tex files have different names.
  1. Compile main.tex and si.tex several times on your machine, so that inter-document cross-references work as desired.
  2. Apart from main.tex, si.tex, main.bbl, si.bbl, main.aux, si.aux, and figures, you may delete all other files. [xr uses main.aux and si.aux.]
  3. Rename main.tex to main_ren.tex, main.bbl to main_ren.bbl, si.tex to si_ren.tex, and si.bbl to si_ren.bbl. Do not rename the *.aux files, and do not recompile.
  4. Zip the folder, and upload on arxiv.
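
The packaging step can be scripted. Below is a minimal Python sketch, assuming the figures live in a "figs/" subfolder as in the example above. The names build_arxiv_zip, KEEP_SUFFIXES, and figs_dir are hypothetical helpers made up for illustration, not part of any arxiv tooling. It only builds the zip and does not delete or rename anything.

```python
import zipfile
from pathlib import Path

# Assumption: these are the only top-level files worth uploading
# (sources, bibliographies, and the aux files that xr reads).
KEEP_SUFFIXES = {".tex", ".bbl", ".aux"}


def build_arxiv_zip(project_dir, zip_path, figs_dir="figs"):
    """Zip the files to upload and return the archived names, sorted."""
    project_dir = Path(project_dir)
    keep = [p for p in project_dir.iterdir()
            if p.is_file() and p.suffix in KEEP_SUFFIXES]
    figs = project_dir / figs_dir
    if figs.is_dir():
        # Keep the figure folder as-is; no need to flatten it.
        keep += [p for p in figs.rglob("*") if p.is_file()]
    names = sorted(p.relative_to(project_dir).as_posix() for p in keep)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(project_dir / name, arcname=name)
    return names
```

Calling build_arxiv_zip("paper", "upload.zip") returns the list of archived files, which is worth inspecting before uploading.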

Large PDFs with Matplotlib
Vector graphics outputs (SVG/PDF) of scatterplots with thousands of points are bloated compared with raster formats like PNG. Scrolling through a PDF document that includes such figures is a painful affair.
The reason is fairly obvious: vector file size scales with the number of data points, while raster file size scales with the number of pixels.

There are many potential solutions. The simplest is to rasterize only the large set of scatter points using the rasterized=True flag:
plt.plot(x, y, 'o', alpha=0.1, rasterized=True)

The resulting PDF is much lighter.
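
To see the effect, here is a small sketch that writes the same scatterplot to an in-memory PDF with and without rasterization and compares the sizes. The function name scatter_pdf_size, the number of points, and the dpi are arbitrary choices for illustration; the actual savings depend on your data and resolution.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_pdf_size(rasterized, n_points=100_000, dpi=100, seed=0):
    """Return the size in bytes of a PDF scatterplot with n_points markers."""
    rng = np.random.default_rng(seed)
    x, y = rng.normal(size=(2, n_points))
    fig, ax = plt.subplots()
    ax.plot(x, y, "o", alpha=0.1, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", dpi=dpi)  # dpi sets the raster resolution
    plt.close(fig)
    return buf.getbuffer().nbytes


vector_size = scatter_pdf_size(rasterized=False)
raster_size = scatter_pdf_size(rasterized=True)
print(f"vector: {vector_size} bytes, rasterized: {raster_size} bytes")
```

Only the markers are rasterized; axes, ticks, and labels remain vector objects in the PDF.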

Merging BibTeX bibliography files
bibtex, latex

Suppose you want to merge two bib files (f1.bib and f2.bib) that have considerable overlap. One easy solution using JabRef works as described below.

Suppose the target bibliography file without duplicates is merge.bib.

1. Copy f1.bib to merge.bib [cp f1.bib merge.bib]

2. Open merge.bib with JabRef

3. Then click File > Import into current database and select the other file [f2.bib]

4. You get a dialog box which allows you to manually decide what entries/versions you want to retain. If both f1.bib and f2.bib are of comparable quality, you can select "Deselect all duplicates" which automatically unselects duplicated entries.

5. Hit "OK" and save the modified database [Ctrl-S]
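
If you would rather script the merge, a rough Python sketch follows. This is not what JabRef does: it treats two entries as duplicates only when their citation keys match exactly, keeps the f1.bib version, and silently drops @string/@preamble/@comment blocks. The function names split_entries and merge_bib are hypothetical.

```python
import re

# Matches the start of a regular entry such as "@article{key,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def split_entries(bib_text):
    """Yield (key, raw_entry) pairs by matching braces from each entry start."""
    for match in ENTRY_START.finditer(bib_text):
        depth = 0
        for pos in range(match.start(), len(bib_text)):
            if bib_text[pos] == "{":
                depth += 1
            elif bib_text[pos] == "}":
                depth -= 1
                if depth == 0:  # closing brace of the whole entry
                    yield match.group(2), bib_text[match.start():pos + 1]
                    break


def merge_bib(text1, text2):
    """Keep every entry of text1, then add entries of text2 with unseen keys."""
    merged = dict(split_entries(text1))
    for key, entry in split_entries(text2):
        merged.setdefault(key, entry)
    return "\n\n".join(merged.values()) + "\n"
```

Read f1.bib and f2.bib as text, pass both to merge_bib, and write the result to merge.bib. For anything beyond exact key matches, the JabRef dialog is the safer route.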

Two useful Matplotlib utilities

 1. Latexify_py

latexify is a Python package that compiles a fragment of Python source code to the corresponding LaTeX expression.


2. Pylustrator
Pylustrator offers an interactive interface to find the best way to present your data in a figure for publication. Added formatting and styling can be saved as automatically generated code. Pylustrator can also compose multiple subfigures into the panels of a single figure.
See the YouTube demo.


LaTeX to Word
latex

Often I have a document in LaTeX, and somebody else needs an editable copy in Word. Here is a list of hacks I have learnt to use:

1. If the document is relatively free of math and figures then the simplest course is often to compile a PDF, and "import" the PDF into MS Word. This works out remarkably well in many cases.

2. The same thing above applies to figures. You can now directly drop PDF images into a Word doc.

3. If you have lots of equations, then it is worthwhile to use pandoc:

pandoc mydoc.tex -o mydoc.docx

More sophisticated options for carrying over cross-references and the bibliography exist. See this as well.

4. Many journals accept PDF figures. If they need TIFF, then you can use Adobe Acrobat online to do this conversion. In my experience, this produces smaller files compared to other automatic converters including ImageMagick.

Recursively Clean LaTeX Debris in all Sub-Folders
quicktip

Often, I have a big folder like Lectures/ which may have sub-folders based on topics, and each topic might have additional folders. To clean auxiliary LaTeX files in one fell swoop, use

find ./ \( -iname "*.bbl" -o -iname "*.aux" -o -iname "*.log" -o -iname "*.blg" -o -iname "*.nav" -o -iname "*.snm" -o -iname "*.toc" -o -iname "*.vrb" -o -iname "*.out" -o -iname "*.synctex.gz" -o -iname "_minted*" \) -delete

Note that -delete removes a _minted* directory only if it is already empty.
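
A Python sketch of the same cleanup is given below, for cases where a dry run is useful before deleting anything. The function name clean_latex_debris and the dry_run flag are hypothetical. Unlike the find command, it removes _minted* directories together with their contents.

```python
import shutil
from pathlib import Path

# Same extensions as in the find command above.
DEBRIS_SUFFIXES = (".bbl", ".aux", ".log", ".blg", ".nav", ".snm",
                   ".toc", ".vrb", ".out", ".synctex.gz")


def clean_latex_debris(root, dry_run=True):
    """Return debris paths under root; delete them unless dry_run is True."""
    root = Path(root)
    targets = [p for p in root.rglob("*")
               if (p.is_file() and p.name.lower().endswith(DEBRIS_SUFFIXES))
               or (p.is_dir() and p.name.startswith("_minted"))]
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path, ignore_errors=True)
            elif path.exists():  # may already be gone with its parent folder
                path.unlink()
    return sorted(targets)
```

Run clean_latex_debris("Lectures") first to list what would go, then repeat with dry_run=False.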


RegEx Help
linux, regex

This ML based regex generator is quite handy! 

https://www.autoregex.xyz/home

Lectures on Graphical Models
lectures, probability

Christopher Bishop has an excellent set (1, 2, and 3) of introductory lectures on "Probabilistic Graphical Models". They are well-motivated and cover topics that include:

  • directed and undirected graphs
  • conditional independence
  • factor graphs
  • inference using factor graphs and sum/product rules

QuickTip: Extracting pages from PDF on Linux

On a Mac OSX system, the default app Preview allows you to cut and paste pages from a PDF.

On Linux you can use PDFChain to manipulate PDFs. If you simply want to extract a certain range, then qpdf is quite handy.

A CLI solution is to use ghostscript as described here:

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
       -dFirstPage=1 -dLastPage=15 -sOutputFile=outfile.pdf inpfile.pdf

You can make the interface friendlier by saving a function in your bashrc as described in the article.
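
As an alternative to a bash function, the same command can be wrapped in Python. This is a sketch that assumes gs is on your PATH. The function names gs_extract_command and extract_pages are hypothetical; the flags are exactly the ones shown above.

```python
import subprocess


def gs_extract_command(inpfile, outfile, first, last):
    """Build the Ghostscript argument list that extracts pages first..last."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dSAFER",
            f"-dFirstPage={first}", f"-dLastPage={last}",
            f"-sOutputFile={outfile}", inpfile]


def extract_pages(inpfile, outfile, first, last):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(gs_extract_command(inpfile, outfile, first, last), check=True)
```

For example, extract_pages("inpfile.pdf", "outfile.pdf", 1, 15) reproduces the command above.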


Matplotlib: Lines Connecting Points and Boxes

This gist has python functions that help Matplotlib draw lines connecting points, and to draw boxes.

def drawBox(xlim, ylim):
    # corners of the box; the first corner is repeated to close the outline
    pts = [[xlim[0], ylim[0]], [xlim[1], ylim[0]],
           [xlim[1], ylim[1]], [xlim[0], ylim[1]],
           [xlim[0], ylim[0]]]
    x, y = zip(*pts)
    return x, y

def connectPoints(pts):
    x, y = zip(*pts)
    return x, y

Quicktip: Batch convert LibreOffice documents to PDF

To convert all the DOCX files in the current working directory to PDF,

 lowriter --headless --convert-to pdf *.docx

Similarly, to convert ODT files,

 lowriter --headless --convert-to pdf *.odt

QuickTip: LaTeX multiline equations with explanations

Sometimes you want to write a sequence of steps, and write the explanation for each step next to it.

abc = xyz    pythagoras rule
    = uvw    triangle inequality
    = ABC

It is easy to do this with the amsmath package as detailed in this StackOverflow question. Load the package in the preamble,
\usepackage{amsmath}

and put each explanation in a second alignment column:
\begin{align*}
abc &= xyz && \text{pythagoras rule} \\
    &= uvw && \text{triangle inequality} \\
    &= ABC
\end{align*}



Smooth Transition Between Functions
math

 Stitching together two functions is sometimes required as a way to transition from one dependence to another. The following schematic describes the idea pictorially:


Two different approaches are considered in this PDF (or this Jupyter Notebook).
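
For a quick feel of the idea, here is one common recipe, which is not necessarily either of the approaches in the linked PDF: blend the two functions with a smooth weight that switches from 0 to 1 around a crossover point. The names smooth_blend, x0, and width are hypothetical.

```python
import math


def smooth_blend(f, g, x0, width):
    """Return h(x) that follows f for x << x0 and g for x >> x0."""
    def h(x):
        # Smooth step: ~0 far left of x0, 0.5 at x0, ~1 far right of x0.
        w = 0.5 * (1.0 + math.tanh((x - x0) / width))
        return (1.0 - w) * f(x) + w * g(x)
    return h


# Example: linear dependence below x0 = 1, quadratic dependence above.
h = smooth_blend(lambda x: x, lambda x: x**2, x0=1.0, width=0.1)
```

A smaller width gives a sharper transition; a larger one spreads the crossover over a wider range.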


Trapezoidal rule in log-log space
python, SciComp
Consider the problem described in this Stack Overflow post. You have a function whose smoothness is apparent on a log-log plot, often along with a large domain of integration. It seems worthwhile to "integrate in logspace", whatever that means.
This Jupyter notebook probes this question and makes some recommendations.
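
One possible reading of "integrating in logspace", which may or may not match the notebook's recommendations, is to assume that the function is a power law y = a x^m between neighboring nodes, i.e. a straight line on a log-log plot, and to integrate each piece exactly. On an interval [x1, x2] this gives (x2 y2 - x1 y1)/(m + 1) for m != -1, and x1 y1 ln(x2/x1) for m = -1. A sketch, with the hypothetical name loglog_trapz and the assumption that all x and y are positive and x is increasing:

```python
import math


def loglog_trapz(x, y):
    """Integrate y(x) assuming a power law y = a * x**m on each interval."""
    total = 0.0
    for x1, x2, y1, y2 in zip(x[:-1], x[1:], y[:-1], y[1:]):
        m = math.log(y2 / y1) / math.log(x2 / x1)  # local log-log slope
        if abs(m + 1.0) < 1e-12:
            # y ~ 1/x: the antiderivative is a logarithm
            total += x1 * y1 * math.log(x2 / x1)
        else:
            total += (x2 * y2 - x1 * y1) / (m + 1.0)
    return total


# Example: y = x**-2 on [1, 1000] with only 13 log-spaced nodes.
xs = [10 ** (k / 4) for k in range(13)]
ys = [v ** -2 for v in xs]
print(loglog_trapz(xs, ys))  # exact integral is 1 - 1/1000
```

For exact power laws this rule has no discretization error, which is why so few nodes suffice here.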

Quicktip: Reindent Python Scripts
python
Suppose part of a python file uses spaces for indentation, while another part uses tabs. Python 3 typically rejects such a file with a TabError. So the question is how to fix it.

One answer is to use the python script reindent.py. Stick it in some folder (~/bin/) in the default path and make it executable (chmod +x reindent.py).

The usage is straightforward:

reindent.py -n file.py

modifies the original file in place. The -n flag skips the .bak backup that reindent.py otherwise writes.
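
To check whether a file is affected before (or after) reindenting, you can ask Python to compile it. A small sketch, with the hypothetical name has_inconsistent_tabs; other syntax errors are deliberately left to propagate.

```python
def has_inconsistent_tabs(source):
    """Return True if Python 3 rejects source for mixing tabs and spaces."""
    try:
        compile(source, "<check>", "exec")
    except TabError:
        return True
    return False


mixed = "if True:\n    x = 1\n\ty = 2\n"   # spaces on one line, a tab on the next
clean = "if True:\n    x = 1\n    y = 2\n"
print(has_inconsistent_tabs(mixed), has_inconsistent_tabs(clean))
```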

Matplotlib: Saving TIFF and JPG formats
python
With pillow installed, on my LinuxMint installation:

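import matplotlib
matplotlib.use('TkAgg') # backend; set it before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0,1)
plt.plot(x, x**2)
# savefig takes format=, not fmt=; pil_kwargs is passed on to pillow
plt.savefig('test.tiff', dpi=300, format="tiff", pil_kwargs={"compression": "tiff_lzw"})
plt.savefig('test.jpg', dpi=300, pil_kwargs={"quality": 95})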


LaTeX: Cross-referencing between Different Documents
latex, quicktip
Problem: I have a manuscript TeX file (main.tex), and an independent supporting information file (si.tex). I want to cross-reference (using \label and \ref) items across the two files.

For example, I might want to reference figure 1 from si.tex in main.tex.

Solution: As this SO answer suggests, the answer lies in the CTAN package xr.

In main.tex, just include "si.tex" as an external document, and all its labels become visible!

\usepackage{xr}
\externaldocument{si}

Parameter Uncertainty in Numpy Polyfit
python
Say you want to fit a line to (x,y) data. With polyfit, you can say,
coeff = np.polyfit(x, y, 1)
With numpy 1.7 and greater, you can also request the estimated covariance matrix,
coeff, cov = np.polyfit(x, y, 1, cov=True)
The standard errors of the parameters are the square roots of the diagonal elements,
print(np.sqrt(np.diag(cov)))
This report referenced in the SO page is quite useful!
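
Putting the pieces together on synthetic data (a noisy line with slope 2 and intercept 1; the data, seed, and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # line + noise

coeff, cov = np.polyfit(x, y, 1, cov=True)
stderr = np.sqrt(np.diag(cov))

# coeff is ordered from the highest power down: [slope, intercept]
for name, value, err in zip(("slope", "intercept"), coeff, stderr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```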

Learning Gaussian Processes
technical
I've been reading up on Gaussian process modeling for machine learning.

For someone seeing these concepts for the first time, I would recommend the following sequence based on my experience:

1. A Visual Exploration of Gaussian Processes

It hits the key points of what makes multinormal distributions special (conditionals and marginals are normal too!), and the visuals help build intuition.

1a. Gaussian Processes for Dummies

You might not need this, but I like this essay because it is jargon-free, and focuses on how to get things going. There is python code at the end, which you can play with.

2. Chapter 2 of Gaussian Processes for Machine Learning

This "bible" is astonishingly well-written. If you are familiar with linear algebra and some statistics, this is a breezy read. Plus, all the important formulae and algorithms you see in different articles are available here in one place!

3. If you like videos, then this YouTube lecture might be worth watching!

QuickTip: Toggling to Previous View in PDF Readers
quicktip
I use Preview (on my Mac laptop) and Foxit Reader (on my Linux Desktop) to read PDFs.

While reading papers, I often find myself clicking on links to citations. This takes me to the reference section. After looking up the citation, I like to go back to the previous location on the paper (right before clicking on the link).

How to go back to the "previous view" isn't well documented.

In Preview, the shortcuts to go back and forward are "Cmd + [" and "Cmd + ]", respectively.

In Foxit Reader for Linux (v2.4 and above) the corresponding shortcuts are "Alt + Left Arrow" and "Alt + Right Arrow".

QuickTip: Math Font in Matplotlib
python, quicktip
Matplotlib (v2 and higher) uses "mathtext" to render math by default. It is quite capable, but I don't like the default font, and prefer the classic "Computer Modern" font.

You can fix this globally by adding the following line to your matplotlibrc file or custom style file (use the command matplotlib.get_configdir() to find its location):

mathtext.fontset : cm

If you want to render all text using LaTeX (this slows down rendering somewhat), then use:

text.usetex : true
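
If you prefer not to touch the rc file, the same setting can be applied per script through rcParams. A minimal sketch; the Agg backend and the example formula are arbitrary choices for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Same effect as "mathtext.fontset : cm" in the rc file, for this script only.
plt.rcParams["mathtext.fontset"] = "cm"

fig, ax = plt.subplots()
ax.set_title(r"$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$")
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)  # rendered with Computer Modern math
plt.close(fig)
```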


Snip Math
math
Mathpix Snip looks like an amazing tool.

You take a screenshot of some math and get it rendered in LaTeX.

The process as illustrated on their website:



It is available for download for all major operating systems.

Zero and Infinity
pop philosophy
A triangle has three corners.

A pentagon has five. A decagon has ten.

As the number of corners becomes large, the polygon becomes more "circular".

When the number of corners is infinity, the polygon is a circle - a shape with zero corners!