Tafkas Blog — GeistHaus

Apr 17, 2024

Django projects often call for a robust, powerful setup to ensure a smooth development and deployment process. Cookiecutter Django is a popular framework that aims to offer Django users a comprehensive, out-of-the-box setup, including configurations for databases, templates, and much more. Cookiecutter Django exclusively supports PostgreSQL, reflecting its intention for production-level applications where PostgreSQL's advanced features can be a real asset. However, there might be scenarios where a developer wants to use SQLite instead, perhaps for small-scale applications, quick prototypes, or simply due to personal preference or familiarity with SQLite.

https://blog.tafkas.net/2024/04/17/using-cookiecutter-django-with-sqlite/

Maintenance of Apache Iceberg Tables

Jan 31, 2024

In the world of data warehousing and large-scale data analysis, table formats like Apache Iceberg play a pivotal role in managing massive datasets. If you're dealing with Iceberg tables, maintaining and optimizing them can lead to significant performance improvements. Let's delve into the three key ways to maintain Iceberg tables for optimized performance and data management. Partitions: Partitioning your Iceberg tables is one of the simplest and most effective methods to boost performance.

https://blog.tafkas.net/2024/01/31/maintenance-of-apache-iceberg-tables/

Fix Python Abort libcrypto dylib on macOS 10.15 Catalina

Dec 12, 2019

After the release of macOS 10.15.2 two days ago, I upgraded my Mac at work to the latest version today. Immediately after running pip to install some packages, I was greeted with an abort. I checked the crash reporter to find the offender: Application Specific Information: /usr/lib/libcrypto.dylib abort() called Invalid dylib load. Clients should not load the unversioned libcrypto dylib as it does not have a stable ABI. After poking around for a bit, I figured out it was because of the asn1crypto library.

https://blog.tafkas.net/2019/12/12/fix-python-abort-libcrypto-dylib-on-macos-10.15-catalina/

Creating a grid based on Geohashes

Sep 28, 2018

When dealing with geospatial data, it is sometimes useful to have a grid at hand that represents the given data. One way to create such a grid is to use Geohashes. Geohashes are a hierarchical spatial data structure that subdivides space into grid-shaped buckets; they are one of the many applications of the Z-order curve and of space-filling curves in general. A Geohash is an encoded character string that is computed from geographic coordinates.

https://blog.tafkas.net/2018/09/28/creating-a-grid-based-on-geohashes/
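
To make the encoding idea in this excerpt concrete, here is a minimal sketch of the standard Geohash encoding in plain Python. It is not taken from the post: the function names `geohash_encode` and `child_cells` are assumptions chosen for illustration, and the grid construction in the original article may work differently.

```python
# Base32 alphabet used by Geohash (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0         # bits collected for the current character
    bit_count = 0
    use_lon = True   # bits alternate between longitude and latitude, longitude first

    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the interval
            rng[0] = mid
        else:
            bits <<= 1               # lower half of the interval
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def child_cells(prefix: str) -> list:
    """Return the 32 finer grid cells contained in the cell named `prefix`."""
    return [prefix + c for c in BASE32]
```

Because every additional character subdivides a cell into 32 smaller ones, truncating a Geohash yields the coarser cell that contains it, which is what makes the scheme usable as a grid.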

Upgrading PostgreSQL Using Homebrew

Oct 19, 2017

On October 5th the PostgreSQL Global Development Group announced the release of PostgreSQL 10. It comes with a tremendous number of new features, such as table partitioning, logical replication, improved parallel queries, stronger password hashing, durable hash indexes, and more. A nice list, including explanations, can be found on Robert Haas’ blog. This post explains how to upgrade to the latest version of PostgreSQL on macOS using Homebrew. At the time of this writing I was using macOS 10.

https://blog.tafkas.net/2017/10/19/upgrading-postgresql-using-homebrew/

Attending PyCon US Development Sprints

May 29, 2017

After three great days at PyCon US 2017 in Portland, OR, Hendrik and I decided to participate in the development sprints following the conference. The code sprints are an essential part of PyCon, and a chance to meet some of the maintainers and contributors of various open source projects. For us it was the first time attending a code sprint. The day before the sprint there was a session helping people to set up Git and Python (including virtual environments) and to get familiar with version control.

https://blog.tafkas.net/2017/05/29/attending-pycon-us-development-sprints/

Open Sourcing Google Adwords Downloader

Mar 1, 2017

Google Ads, the globally renowned advertising platform, empowers numerous businesses to strategically place ads, reach prospective customers, and grow their presence. The kaleidoscope of data that Google Ads provides forms the bedrock of insightful business decisions, higher return on investment, and the optimization of AdWords campaigns. While Google Ads features a user-friendly interface for data access and management, some tasks often benefit from a programmatic approach. In response to this need, Google provides the AdWords API.

https://blog.tafkas.net/2017/03/01/open-sourcing-google-adwords-downloader/

Contact me

Feb 19, 2017

If you want to get in contact with me, shoot me an email or connect via https://twitter.com/Tafkas https://github.com/Tafkas https://gitlab.com/Tafkas https://linkedin.com/in/stadeschuldt https://google.com/+ChristianStadeSchuldt

https://blog.tafkas.net/page/contact/

About me

Feb 19, 2017

https://blog.tafkas.net/page/about/

Monitoring Jupyter Notebooks with Munin

Jan 28, 2017

Recently, I set up Jupyter Notebooks on a server at work. The idea was to create an environment where every team member could run analyses using Python and share the results with the rest. After reading the documentation, I found out that the Jupyter Notebook web application comes with a Contents API. I quickly put together a little Munin script that collects some statistics about the current notebooks. The graph shows the total number of notebooks on the server as well as the currently open notebooks:

https://blog.tafkas.net/2017/01/28/monitoring-jupyter-notebooks-with-munin/
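
As a rough illustration of how such a Munin plugin could be structured, the sketch below counts notebook entries in a directory listing shaped like a Contents API response and formats the result in Munin's plugin output format. It is not the script from the post: the function names, the field names `total` and `running`, and the sample listing are assumptions, and fetching the listing over HTTP is left out.

```python
def count_notebooks(directory_model: dict) -> int:
    """Count notebook entries in one directory listing (not recursive)."""
    entries = directory_model.get("content") or []
    return sum(1 for item in entries if item.get("type") == "notebook")


def munin_output(total: int, running: int, config: bool = False) -> str:
    """Format values for Munin; with config=True, describe the graph instead."""
    if config:
        return "\n".join([
            "graph_title Jupyter notebooks",
            "graph_vlabel notebooks",
            "graph_category jupyter",
            "total.label total notebooks",
            "running.label open notebooks",
        ])
    return f"total.value {total}\nrunning.value {running}"


# Hypothetical listing in the shape of a Contents API directory model.
sample_listing = {
    "type": "directory",
    "content": [
        {"type": "notebook", "name": "analysis.ipynb"},
        {"type": "file", "name": "data.csv"},
        {"type": "notebook", "name": "report.ipynb"},
    ],
}

# The number of open notebooks is passed in here as an assumed value.
print(munin_output(count_notebooks(sample_listing), running=1))
```

Munin calls a plugin once with the argument `config` to learn about the graph and then without arguments to read the values, which is why the formatter has two modes.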

Auto-timestamping SQLAlchemy entities

Mar 7, 2016

There are a lot of cases when we want to track time when an entity was created or updated. Here is a simple recipe to make some or all of your SQLAlchemy entities auto-timestamping. To achieve this, we will provide a mixin class. from datetime import datetime from sqlalchemy import Column, DateTime, event class TimeStampMixin(object): """ Timestamping mixin """ created_at = Column(DateTime, default=datetime.utcnow) created_at._creation_order = 9998 updated_at = Column(DateTime, default=datetime.utcnow) updated_at.

https://blog.tafkas.net/2016/03/07/auto-timestamping-sqlalchemy-entities/

MensaBot - A Slack Bot For Lunch Information

Jan 25, 2016

Every day at work around noon the question of where to get lunch comes up. Normally, we choose between different restaurants in the vicinity of the office. One exception is the HU Mensa (university cafeteria). Despite being really cheap, the food quality there varies a lot, and it really depends on the daily menu whether a visit is worthwhile. To tackle this issue I decided to spend another IT Open Space putting together a little script that will help us in the future.

https://blog.tafkas.net/2016/01/25/mensabot-a-slack-bot-for-lunch-information/

Codebase Cop - A Slack Bot Watching Over Your Tickets

Oct 10, 2015

At Project-A we are using Codebase as a project management tool together with its version control. Just as with any other tool, you can create tickets and organize them in sprints. Our usual (very simplified) workflow includes: sprint planning for tickets; prioritizing tickets; developers working on tickets; product managers verifying that the tickets were implemented as intended. Unfortunately, sometimes your backlog keeps growing and tickets are no longer valid, outdated or, in the worst case, just forgotten.

https://blog.tafkas.net/2015/10/10/codebase-cop-a-slack-bot-watching-over-your-tickets/

Identifying redundant edges in a dependency graph

Sep 22, 2015

An ETL import graph is built on the logical dependencies of the jobs on each other. So typically a SQL transformation job depends on all the previous jobs that create the tables used in the query. But once there are a certain number of jobs, dependencies often get a bit more complicated and some of them become redundant in the process. A simple example can be seen in the dependency graph in the figure, where the three red edges are redundant.

https://blog.tafkas.net/2015/09/22/identifying-redundant-edges-in-a-dependency-graph/
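
One way to spot such redundant edges, assuming the dependency graph is acyclic, is to check for every edge whether its target is still reachable from its source when that edge is ignored. The sketch below is an illustration of that idea and not necessarily the approach of the post; the function name and the job names are made up.

```python
from collections import defaultdict


def redundant_edges(edges):
    """Return edges (u, v) of a DAG for which another path from u to v exists."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)

    def reachable(start, target, skip_edge):
        # Depth-first search that ignores the edge under test.
        stack = [start]
        seen = set()
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return sorted((u, v) for u, v in set(edges) if reachable(u, v, (u, v)))


# Hypothetical jobs: the report depends on the load job only via the transform.
jobs = [
    ("load_orders", "transform_orders"),
    ("transform_orders", "report"),
    ("load_orders", "report"),
]
print(redundant_edges(jobs))
```

Removing all edges found this way leaves the transitive reduction of the graph, which keeps the same execution order constraints with fewer edges.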

A cost-based scheduler for ETL pipelines

Apr 17, 2015

To speed up the ETL data pipeline, you should try to run jobs in parallel. Obviously, in most cases not all jobs can run at the same time, since there are dependency constraints between the jobs and limits on the server's capacity (number of processors and/or IO bandwidth). So assuming the server allows you to run n jobs in parallel, there is often the situation that the dependencies give you the option to run any of a set of m different jobs with m > n.

https://blog.tafkas.net/2015/04/17/a-cost-based-scheduler-for-etl-pipelines/
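
The excerpt stops before describing how the scheduler chooses among the m runnable jobs, so the following is only a sketch of one common heuristic, stated as an assumption: prefer the jobs that start the most expensive chain of downstream work. The names `pick_jobs`, `costs` and `downstream`, as well as the sample numbers, are hypothetical.

```python
from functools import lru_cache


def pick_jobs(ready, costs, downstream, n):
    """Pick up to n ready jobs, preferring the most expensive downstream chain.

    ASSUMPTION: priority = own cost + cost of the costliest path of dependents.
    """

    @lru_cache(maxsize=None)
    def path_cost(job):
        followers = downstream.get(job, ())
        return costs[job] + max((path_cost(d) for d in followers), default=0)

    # Highest path cost first; the job name breaks ties deterministically.
    return sorted(ready, key=lambda job: (-path_cost(job), job))[:n]


# Hypothetical runtimes in minutes and one dependency: "d" waits for "b".
costs = {"a": 5, "b": 1, "c": 2, "d": 10}
downstream = {"b": ["d"]}
print(pick_jobs(["a", "b", "c"], costs, downstream, n=2))
```

In this example the cheap job "b" is picked first because it unblocks the expensive job "d", which a scheduler that only looked at each job's own cost would miss.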

Run a website on the Raspberry Pi using Middleman

Feb 22, 2015

Once you have set up a web server like Apache or nginx on the Raspberry Pi, it is time to create a website. From here there are several options: a CMS that relies on a database, some purely hand-crafted pages, or static pages generated by a script. I chose the latter for several reasons. Static sites have a lot of advantages: there is no database to slow requests down; they offer greater security, as they do not contain dynamic content and so are immune to the most common attacks; they are flat text files, which makes them ideal for use with version control systems such as Git; and they have a low footprint on the server, as it only serves raw HTML files. But there are also some limitations:

https://blog.tafkas.net/2015/02/22/run-a-website-on-the-raspberry-pi-using-middleman/

Monitoring a Synology Diskstation with Munin

Jan 15, 2015

I have been using Munin to monitor the health of my Raspberry Pi for a while now. As I have more devices installed in my network, I was looking for a way to monitor these devices as well. As Munin uses a client-server model, you are required to install the Munin node on the device to be monitored. Every five minutes the Munin server polls its clients for the values and creates charts using RRDTool.

https://blog.tafkas.net/2015/01/15/monitoring-a-synology-diskstation-with-munin/

SolarPi - A Flask powered photovoltaic monitor

Nov 19, 2014

After collecting some photovoltaic data using PikoPy and some readings from the residential meter, it was time to put everything together. The data is collected by a couple of scripts triggered by a cronjob every five minutes. $ crontab -l */5 * * * * python /home/solarpi/kostal_piko.py */5 * * * * python /home/solarpi/collect_meter.py */15 * * * * python /home/solarpi/collect_weather.py The results are then written into a SQLite database.

https://blog.tafkas.net/2014/11/19/solarpi-a-flask-powered-photovoltaic-monitor/
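
What writing a reading into SQLite might look like for one of those cron-triggered scripts is sketched below with Python's built-in `sqlite3` module. The table name `pv_readings`, its columns, and the function name are assumptions for illustration; the actual SolarPi schema is not shown in this excerpt.

```python
import sqlite3
from datetime import datetime, timezone


def store_reading(conn, power_w, created_at=None):
    """Insert one power reading (in watts) with a UTC timestamp."""
    timestamp = (created_at or datetime.now(timezone.utc)).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pv_readings ("
            "created_at TEXT NOT NULL, power_w REAL NOT NULL)"
        )
        conn.execute(
            "INSERT INTO pv_readings (created_at, power_w) VALUES (?, ?)",
            (timestamp, power_w),
        )


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
store_reading(conn, 1234.5)
store_reading(conn, 1180.0)
row_count = conn.execute("SELECT COUNT(*) FROM pv_readings").fetchone()[0]
```

Since each script run is short-lived, opening the database, inserting a single row, and committing right away keeps every cron invocation independent of the others.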

PikoPy: A python package for working with a Piko Inverter from Kostal

Aug 1, 2014

The first step of my plan, building a Raspberry Pi based photovoltaic monitoring solution, is finished. I created a python package that works with the Kostal Piko 5.5 inverter (and theoretically should work with other Kostal inverters as well) and offers a clean interface for accessing the data: import pikopy #create a new piko instance p = Piko('host', 'username', 'password') #get current power print p.get_current_power() #get voltage from string 1 print p.

https://blog.tafkas.net/2014/08/01/pikopy-a-python-package-for-working-with-a-piko-inverter-from-kostal/

Determine your Fitbit stride length using a GPS watch

Jul 15, 2014

I have been carrying my Fitbit One with me for a little over two years, and it keeps tracking my daily steps. It also tracks the distance I cover by multiplying those steps by the stride length, which you can either provide explicitly or set implicitly by entering your height. In the winter of 2012 I bought my first GPS watch, a ~Garmin Forerunner 410~ (replaced by a Garmin Forerunner 920XT), to help me track my running (and other outdoor) activities.

https://blog.tafkas.net/2014/07/15/determine-your-fitbit-stride-length-using-a-gps-watch/
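
Since the Fitbit computes distance as steps multiplied by stride length, the stride length can be recovered by dividing a GPS-measured distance by the steps counted over the same activity. The sketch below shows that arithmetic; the function names and the sample numbers are made up for illustration and are not measurements from the post.

```python
def stride_length_m(gps_distance_m: float, steps: int) -> float:
    """Stride length in metres from a GPS distance and the matching step count."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return gps_distance_m / steps


def fitbit_distance_m(steps: int, stride_m: float) -> float:
    """Distance as the tracker would estimate it: steps times stride length."""
    return steps * stride_m


# Hypothetical run: 10 km according to the GPS watch, 8,000 steps on the Fitbit.
stride = stride_length_m(10_000, 8_000)
print(f"stride length: {stride:.2f} m")
```

Averaging the result over several activities should smooth out GPS noise, and running and walking strides are best computed separately because they differ.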

A Raspberry Pi photovoltaic monitoring solution

Jul 3, 2014

A friend of mine had a photovoltaic system (consisting of 14 solar panels) installed on his rooftop last year. As I was looking for another Raspberry Pi project, I convinced him that I would set up a reliable monitoring solution that would give him access to the data in real time. The current setup comes with an inverter by the company Kostal. The Kostal Piko 5.5 runs an internal web server showing statistics like current power, daily energy, total energy plus specific information for each string.

https://blog.tafkas.net/2014/07/03/a-raspberry-pi-photovoltaic-monitoring-solution/

Analyzing Sleep with Sleep Cycle App and R

Jan 28, 2014

I have been tracking my sleep for almost two years now using my Fitbit. I started with the Fitbit Ultra and then moved on to the Fitbit One after it came out. In October 2013 I found out about the Sleep Cycle app for the iPhone. For weeks, Sleep Cycle was listed as the best-selling health app in Germany, where currently (as of January 2014) it is in second place.

https://blog.tafkas.net/2014/01/28/analyzing-sleep-with-sleep-cycle-app-and-r/

Berlin Marathon 2014 Participants

Nov 12, 2013

After the 2013 Berlin Marathon sold out in less than four hours, the organizers decided to alter the registration process for 2014. First there was a pre-registration phase, followed by a random selection from the pool of registrants to receive a spot. Those who were selected had to register by November 11th, 2013. Any spots that were not confirmed by the 11th would be offered to pre-registered candidates according to the order in which they were randomly selected.

https://blog.tafkas.net/2013/11/12/berlin-marathon-2014-participants/

Installing Oracle Java 7 SDK on the Raspberry Pi

Sep 28, 2013

Two days ago the official hard-float Oracle Java 7 JDK was announced on the official Raspberry Pi blog. Prior to this there was only the OpenJDK implementation, which was lacking in performance. Furthermore, the Raspberry Pi Foundation announced that future Raspbian images would ship with Oracle Java by default. If you want to give it a spin you can install the JDK with: $ sudo apt-get update && sudo apt-get install oracle-java7-jdk

https://blog.tafkas.net/2013/09/28/installing-oracle-java-7-sdk-on-the-raspberry-pi/

Using htop to monitor system processes on the Raspberry Pi

Jul 9, 2013

If you work a lot on the command line you are probably familiar with the top utility to see what process is taking the most CPU or memory. There’s a similar utility called htop, which is an advanced, interactive system-monitor utility that can be used as a replacement tool for the default process monitoring command ‘top’ on a Linux ecosystem. This interactive process viewer provides a real-time, dynamic view of what’s happening on your Raspberry Pi system.

https://blog.tafkas.net/2013/07/09/using-htop-to-monitor-system-processes-on-the-raspberry-pi/

Five years of Weight Tracking

Jun 22, 2013

After I moved back from New Jersey in June 2008 I started to track my body weight more seriously. My routine usually consists of getting up and, after finishing in the bathroom, stepping on my scale. That way I try to ensure that the conditions for each weighing are as similar as possible. I recorded my weight on paper and would eventually put everything into a spreadsheet for further analysis.

https://blog.tafkas.net/2013/06/22/five-years-of-weight-tracking/

Downloading Fitbit Data using Google Spreadsheets

May 12, 2013

One of the most important features in quantified self is the ability to export your data in an open format. Fitbit lets you download your personal data if you subscribe to a premium membership. Alternatively, they provide an API at dev.fitbit.com/ that allows developers to interact with Fitbit data in their own applications, products and services. In a blog post at quantifiedself.com Mark Levitt shows how to export your Fitbit data into Google Spreadsheets.

https://blog.tafkas.net/2013/05/12/downloading-fitbit-data-using-google-spreadsheets/

Getting the Raspberry Pi temperature from the command-line

Apr 11, 2013

If you are overclocking your Raspberry Pi or you are just curious how hot this little guy gets, there are two ways to get the internal temperature, assuming you are running Raspbian as your operating system. Method 1: $ /opt/vc/bin/vcgencmd measure_temp This gives you the temperature in degrees Celsius: temp=54.1'C Method 2: If you need the temperature to be more precise (e.g. for storing it in a database or for further processing) use the following command:

https://blog.tafkas.net/2013/04/11/getting-the-raspberry-pi-temperature-from-the-command-line/
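
If the output of Method 1 is to be processed further, the `temp=54.1'C` string has to be turned into a number first. The sketch below does that with a regular expression; the function name is an assumption, and it only handles the output format quoted in this excerpt, not the second method.

```python
import re


def parse_vcgencmd_temp(output: str) -> float:
    """Extract degrees Celsius from output such as "temp=54.1'C"."""
    match = re.fullmatch(r"temp=(-?\d+(?:\.\d+)?)'C", output.strip())
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


# Sample string taken from the excerpt, with the trailing newline a shell adds.
temperature = parse_vcgencmd_temp("temp=54.1'C\n")
```

On a Raspberry Pi the input string could come from `subprocess.run` calling the command shown in Method 1, which is left out here so the example runs anywhere.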

Using SSH Public Key Authentication on the Raspberry Pi

Jan 5, 2013

If you log into your Raspberry Pi using ssh, it will prompt you for a password. Having to do this multiple times a day is very annoying. To ease the pain and enhance security, you can use public key authentication instead. To do so, you create a pair of keys on your client and store the public key on your Raspberry Pi. Then you set up authentication by key. Afterwards, the user can log into the Raspberry Pi using their private key.

https://blog.tafkas.net/2013/01/05/using-ssh-public-key-authentication-on-the-raspberry-pi/

Charting Sunrise and Sunset in Highcharts

Nov 26, 2012

In order to visually enhance my temperature logging I added some JavaScript that computes sunrise and sunset for the 24h, 28h, weekly and monthly chart. Then I use this information to plot vertical bands on the chart indicating the effects of the sun on temperatures (and humidities): To add the bands to your Highchart, just get the sunrise and sunset values for a particular day and push them onto the xAxis.

https://blog.tafkas.net/2012/11/26/charting-sunrise-and-sunset-in-highcharts/

Gathering and Charting Temperatures using RRDTool and Highcharts

Oct 3, 2012

tl;dr: Check out the charts on my RaspberryPi. For quite a long time I was looking for a way to monitor and record the temperature and humidity at my apartment. What was missing was a convenient, preferably wireless solution. After receiving my RaspberryPi I started to look into that more intensively. USB-WDE1 Receiver: The USB Weather Data Receiver USB-WDE1 wirelessly receives data from various ELV weather sensors at 868 MHz.

https://blog.tafkas.net/2012/10/03/gathering-and-charting-temperatures-using-rrdtool-and-highcharts/

Setting up Dynamic DNS on the Raspberry Pi

Sep 9, 2012

Once you have set up your Raspberry Pi, chances are that you want to access it from a remote machine or host a little web site on it. The problem is that your provider usually gives you a dynamic IP, which changes every time you connect to the Internet. In Germany most (A|V)DSL providers reset your connection every 24h. The solution for this is dynamic DNS (DDNS), which automatically updates the name server in the Domain Name System (DNS).

https://blog.tafkas.net/2012/09/09/setting-up-dynamic-dns-on-the-raspberry-pi/

A (not so) safe betting strategy for winning at roulette

Jun 9, 2009

A couple of years ago I was on a trip to Budapest with a couple of friends. While roaming the streets we passed by a casino, and my friend insisted that there was a perfect strategy that would only lead to winning at roulette tables. Curious as I was, I had him explain his theory. The system basically works as follows: First, you place a coin on red. If red wins, take your winnings and start over.

https://blog.tafkas.net/2009/06/09/a-not-so-safe-betting-strategy-for-winning-at-roulette/
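
The excerpt ends before saying what happens when red loses. The sketch below assumes the classic martingale rule, doubling the stake after every loss, which is an assumption and not confirmed by the text shown here. The function name and the sample spin sequences are hypothetical.

```python
def play_martingale(spins, bankroll, base_bet=1):
    """Play through `spins` (True means red won) and return the final bankroll."""
    bet = base_bet
    for red_won in spins:
        if bet > bankroll:
            break                 # the next bet can no longer be covered
        if red_won:
            bankroll += bet       # take the winnings and start over
            bet = base_bet
        else:
            bankroll -= bet
            bet *= 2              # ASSUMPTION: double the stake after a loss
    return bankroll


# Two losses followed by a win: the player ends one coin ahead.
after_short_streak = play_martingale([False, False, True], bankroll=10)

# A longer losing streak: the stake outgrows the bankroll before any win.
after_long_streak = play_martingale([False, False, False, False], bankroll=10)
```

Each completed cycle nets only one base bet, while a single long losing streak can consume most of the bankroll, which hints at why the strategy in the title is "(not so) safe".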

Results of the St. Pat's 10 Miler and 5K

Mar 23, 2008

Recently I ran the St. Pat's 10 Miler in Atlantic City, NJ. It was my first official running event ever and I enjoyed it a lot. Shortly after the race the official results were posted on the Internet. The data included not only the numbers and times of the participants but also their gender and age. The distribution of finishing times shows that most runners finished at around 90 minutes:

https://blog.tafkas.net/2008/03/23/results-of-the-st.-pats-10-miler-and-5k/