m1el — GeistHaus

Oct 30, 2021

Recently, Casey Muratori has implemented a proof-of-concept terminal, which is designed around fast input processing and rendering. This is an important and commendable effort, since the vast majority of software performs tens to thousands of times slower than it can.[citation needed]

One of the design choices of refterm is to use a hash table to cache glyph rendering. The key in this table is the UTF-8 string of a single glyph. To avoid string allocations, the hash value of the glyph bytes is used as a key instead.

When Casey got asked about the possibility of hash collisions on stream, he responded with a claim that the hash function used in refterm is “a strong hash”, and the complexity to find collision is about 2^64 operations. After analyzing the code for the hash function used in refterm, a few flaws were found in the hash function, and O(1) attacks were discovered.

All the source code presented here was the most recent refterm code at the moment of writing this post, commit 20a063f3700ac4c2913076864cb560301e76b2dc. The hash function ComputeGlyphHash is located in refterm_example_source_buffer.c:166.

Operation of the hash

The hash value is initialized to input length
The hash value is xored with the provided seed value, which is always equal to DefaultSeed
For each full 16 bytes chunk, the following bit mixing method is used:
- the hash value is xored with the input chunk
- the hash value is fed four times through _mm_aesdec_si128 with zero key
For the last chunk, it is zero padded to 16 bytes
And fed through the same bit mixing method as described above

In FP terms, this could be described as:

ComputeHash(Input) = ZeroPad(Input, 16).chunks_exact(16)
    .fold(Length ^ DefaultSeed, |Hash, Chunk| AesDec4Times(Hash ^ Chunk, Zero))

Flaw 1: Padding weakness

The input padding is done by zero-padding. This means that if the last block ends with zeros, the result of processing this block is going to be the same as if that block was right-trimmed of nul bytes.

char Temp[16];
__movsb((unsigned char *)Temp, At, Overhang);
__m128i In = _mm_loadu_si128((__m128i *)Temp);
In = _mm_and_si128(In, _mm_loadu_si128((__m128i *)(OverhangMask + 16 - Overhang)));

Since the hash value of input is initialized with the input length, which is then xored with the first block, the attack is a little bit more complex. The first byte (in little-endian systems) needs to be xored with the xor of input lengths. For example, if the input is "" (empty string), and we append a nul byte "\x00", since the length is 1, the first byte will be xored to "\x01". If we use the value 0x01 for the first byte, the produced hash of the string "\x01" will be the same as the hash of the string "".

Example of collisions:

message: "",      hash: [41148b628ab49fce6d55977058831078]
message: "\x01",  hash: [41148b628ab49fce6d55977058831078]
message: "A",     hash: [86dd41d6a69d8f979dc9c5ca6fc27cce]
message: "B\x00", hash: [86dd41d6a69d8f979dc9c5ca6fc27cce]
message: "BAAAAAAAAAAAAAAA",     hash: [32b05cfcaa74f585fc6c4fd7cef4a99]
message: "CAAAAAAAAAAAAAAA\x00", hash: [32b05cfcaa74f585fc6c4fd7cef4a99]

Mitigating Padding Attack

One of the mitigations for the padding attack is to initialize padding to some value dependending on overhang length. If done correctly, no inputs of different length will produce the same padded block. A computationally inexpensive solution would be to set padding bytes to overhang length. This only works if the overhang chunk is always calculated.

Example code:

    // Read the last block
    char Temp[16];
    __movsb((unsigned char *)Temp, At, Overhang);
    __m128i In = _mm_loadu_si128((__m128i *)Temp);
    // Compute padding depending on overhang length to prevent padding attacks
    __m128i Mask = _mm_loadu_si128((__m128i *)(OverhangMask + 16 - Overhang));
    // _mm_blendv_epi8 requires SSE4, so we use `andnot` + `and` instead
    // In = _mm_blendv_epi8(Mask, In, _mm_set1_epi8((char)Overhang));
    __m128i Padding = _mm_andnot_si128(Mask, _mm_set1_epi8((char)Overhang));
    In = _mm_or_si128(_mm_and_si128(Mask, In), Padding);
    // At this point, In has been padded securely.

Before:
Input: []   -> Padded [00 00 00 00...]
Input: [00] -> Padded [00 00 00 00...]
After:
Input: []   -> Padded [00 00 00 00...]
Input: [00] -> Padded [00 01 01 01...]

Flaw 2: Invertible hashing operations

AES encryption is symmetric encryption. Which means that encryption and decryption functions are invertible, given that you know the encryption key. Since the key used for bit mixing operations is constant (all zeros), the bit mixing operations are easily invertible.

NextHash = AesDec4Times(PreviousHash ^ Chunk, 0)

Given AesDec4Times is invertible,

InvAesDec4Times(NextHash, 0) = PreviousHash ^ Chunk.
Chunk = PreviousHash ^ InvAesDec4Times(NextHash, 0).
PreviousHash = Chunk ^ InvAesDec4Times(NextHash, 0).

This equation allows us to choose two out of three variables arbitrarily. Given fixed PreviousHash and NextHash, we can calculate a Chunk that will produce NextHash after running one round of ComputeGlyphHash. Also, given NextHash and Chunk, we can calculate the PreviousHash value.

This is pretty bad news for this hash function, since it allows us to:

Invert a round of hashing, thus calculating the exact value for one input hashes. Given Hash, for one chunk we know that InputLength ^ DefaultSeed ^ Chunk = InvAesDec4Times(Hash), thus InputLength ^ Chunk = InvAesDec4Times(Hash) ^ DefaultSeed.
Prepend a block to change the hash to an arbitrary value.
Insert a block to change the hash to an arbitrary value.
Append a block to change the hash to an arbitrary value.

Actually Inverting _mm_aesdec_si128(Hash, Zero)

The hass function is using AES primitives in a non-standard way, so some work is required to invert the operation.

Let’s look at the operation of four AES-NI intrinsics: _mm_aesdec_si128, _mm_aesdeclast_si128, _mm_aesenc_si128, _mm_aesenclast_si128:

_mm_aesdec_si128
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
a[127:0] := InvMixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]

_mm_aesdeclast_si128
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]

_mm_aesenc_si128
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
a[127:0] := MixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]

_mm_aesenclast_si128
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]

Since RoundKey is always zero in our case, the xor operations may be omitted below.

To invert _mm_aesdec_si128, we need to run it in reverse, with the opposite operations. InvShiftRows -> InvSubBytes -> InvMixColumns is inverted to: MixColumns -> SubBytes -> ShiftRows.

Initially, InvAesDec was implemented using primitives from Tiny AES in C,

__m128i InvAesDec(__m128i data, __m128i key) {
    // Implemented in terms of aes primitives, using tiny-aes
    data = _mm_xor_si128(data, key);
    MixColumns((state_t*)&data);
    SubBytes((state_t*)&data);
    ShiftRows((state_t*)&data);
    return data;
}

Furthermore, Discord user @davee has helpfully pointed out that the same operation can be implemented using AES-NI intrinsics the following way:

__m128i InvAesDec(__m128i data, __m128i key) {
    // Implemented using AES-NI intrinsics, credit to @davee
    data = _mm_xor_si128(data, key);
    __m128i zero = _mm_setzero_si128();
    data = _mm_aesenc_si128(_mm_aesdeclast_si128(data, zero), zero);
    return _mm_aesenclast_si128(data, zero);
}

Why does this work? Let’s write out the operations:

// aesdec(data, key)
// the operation we would like to invert
InvShiftRows, InvSubBytes, InvMixColumns, XOR Key,
// xor key
XOR Key,
// aesdeclast(data, zero)
InvShiftRows, InvSubBytes, XOR zero,
// aesenc(data, zero)
ShiftRows, SubBytes, MixColumns, XOR zero,
// aesenclast(daat)
ShiftRows, SubBytes, XOR zero,

All XOR zero operations are no-ops, and can be omitted. Two XOR Key operations are next to each other, so they cancel each other. So we are left with these ten:

InvShiftRows, InvSubBytes, InvMixColumns,
InvShiftRows, InvSubBytes,
ShiftRows, SubBytes, MixColumns,
ShiftRows, SubBytes,

Now, we need to delve a little bit about the inner workings of AES. We don’t need to know much, but these few things:

InvX operations invert X operations.
ShiftRows and InvShiftRows operations swap bytes around.
SubBytes and InvSubBytes substitute bytes.
For the current purpose, MixColumns is viewed as a black box that doesn’t commute with anything else.

Because of what (Inf)ShiftRows. and (Inv)SubBytes do, they commute. Byte substitution is position-independent, and byte swapping is independent of contents.

Let’s rewrite the operations by swapping consecutive InvShiftRows and InvSubBytes:

InvSubBytes(5), InvShiftRows(4), InvMixColumns(3),
InvSubBytes(2), InvShiftRows(1),
ShiftRows(1), SubBytes(2), MixColumns(3),
ShiftRows(4), SubBytes(5),

Now we can see that the inverse operations are paired up, so they all cancel out. Therefore, InvAesDec(AesDec(data, key), key) is a no-oop. QED

Since we’re interested in inverting four rounds of AesEnc, rather than calling InvAesDec four times, we can optimize this operation. Note: the xor operations can be omitted in this case since the key is zero.

__m128i InvAesDecX4(__m128i data, __m128i key) {
    __m128i zero = _mm_setzero_si128();
    data = _mm_xor_si128(data, key);
    data = _mm_aesdeclast_si128(data, zero);
    data = _mm_aesenc_si128(data, zero);
    data = _mm_xor_si128(data, key);
    data = _mm_aesenc_si128(data, zero);
    data = _mm_xor_si128(data, key);
    data = _mm_aesenc_si128(data, zero);
    data = _mm_xor_si128(data, key);
    data = _mm_aesenc_si128(data, zero);
    return _mm_aesenclast_si128(data, zero);
}

This function allows us to perform all of the attacks mentioned before.

Demonstrating these attacks

Demonstrating invert attack, invert a hash up to 15 bytes
Note: due to padding attack, there are actually more messages
plaintext: [51, 77, 65, 72, 74, 79, 31, 32, 33]
hash:      [8, 6e, c4, db, 1f, d1, 5b, b7, 95, d5, 44, ea, 5, 2c, bd, 4b]
recovered: [51, 77, 65, 72, 74, 79, 31, 32, 33]
hash:      [8, 6e, c4, db, 1f, d1, 5b, b7, 95, d5, 44, ea, 5, 2c, bd, 4b]

Demonstrating prefix attack, prepend a block and avoid affecting the hash
message: [68, 65, 6c, 6c, 6f]
hash:    [c1, 1e, 0, 16, 4a, f5, 67, 27, e, c1, 1c, ea, 39, 74, e6, 4b]
prefix:  [1d, a4, 56, e6, e9, c3, 64, 70, 5c, e2, 8e, 11, ab, 23, a7, 9c]
forgery: [1d, a4, 56, e6, e9, c3, 64, 70, 5c, e2, 8e, 11, ab, 23, a7, 9c, 68, 65, 6c, 6c, 6f]
hash:    [c1, 1e, 0, 16, 4a, f5, 67, 27, e, c1, 1c, ea, 39, 74, e6, 4b]

Demonstrating chosen prefix attack, append a block and produce arbitrary hash
prefix:  [68, 65, 6c, 6c, 6f]
forgery: [68, 65, 6c, 6c, 6f, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, be, c2, 21, f0, 83, 9d, a6, 0, 61, 58, 2c, ac, f7, 33, 74, b2]
hash:    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Demonstrating preimage attack, prepend a block and produce arbitrary hash
suffix:    [68, 65, 6c, 6c, 6f]
goal hash: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
prefix:    [f4, 47, d8, c8, 25, 21, 17, 41, 91, 82, 83, 31, ca, 60, 37, 4b]
message:   [f4, 47, d8, c8, 25, 21, 17, 41, 91, 82, 83, 31, ca, 60, 37, 4b, 68, 65, 6c, 6c, 6f]
hash:      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The code for these attacks is available at github.com/m1el/refterm-hash-break.

Mitigating invertible hash operations

To mitigate these attacks, we take a look at publicly available construction of one-way functions. The hash function in refterm is most similar to Davies-Meyer compression method, given a cryptographically secure symmetrical encryption function Encrypt, produces a secure one-way function. It can be expressed in FP terms the following way:

ComputeHash(Input) = SecurePad(Input, 16).chunks_exact(16)
    .fold(Zero, |Hash, Chunk| Hash ^ Encrypt(Hash, Chunk))

The implementation of this construction for refterm bit mixing operation would look like this:

__m128i PreviousHash = HashValue;
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_xor_si128(HashValue, PreviousHash);

Please note that this is using AesDec4Times as Encrypt, and it is NOT cryptographically secure. This is not a proper use of AES primitives, I suspect there may exist attacks on this hash function which I’m currently not aware of. However, this provides significantly more security than previously existing hash function in refterm.

Conclusions

Multiple weaknesses were found in refterm’s ComputeGlyphHash, with practical demonstration of attacks. This demonstrates that ComputeGlyphHash is not a cryptographically secure function.

Well known mitigations (proper padding, Davies-Meyer construction) were implemented and published. The mitigations produced negligible performance losses.

These attacks and mitigations could have been found and implemented by anyone with cursory understanding of cryptography.

Do not roll your own crypto if you don’t know what you’re doing.

Do not claim that you have built a strong hash function if you didn’t pass it through nine circles of peer review hell.

The attack code is available here: github.com/m1el/refterm-hash-break.

The mitigations are available here: github.com/cmuratori/refterm/pull/40.

http://m1el.github.io/refterm-hash/

What does Oculus Runtime send home?

Apr 23, 2020

I wanted to know what data Oculus Runtime sends home. To do that, I’ve implemented a custom tool to extract TLS keys from a running Oculus Runtime instance.

You can find my findings below.

Methodology

I’ve captured HTTP packets using Wireshark, and decrypted them using SSL KEYLOG generated by oculus-tls-extractor.

Then I’ve manually inspected the captured packets over multiple sessions.

Limitations

I was not able to inspect data from OculusClient, which is an Electron application, and there may be additional data being sent from there.
I manually inspected the packets and it is certain that I’ve missed something.

Findings

The most used endpoints were: https://graph.oculus.com/logging_client_events and https://graph.oculus.com/graphql.

/graphql endpoint was mostly used as a read-only endpoint, with few exceptions.

For example:

Request:
POST /graphql?forced_locale=en_US HTTP/1.1
Host: graph.oculus.com
...

access_token=<removed_access_token>&variables=%7B%7D&doc_id=<removed_doc_id>

---
Response:
HTTP/1.1 200 OK
...
Access-Control-Allow-Origin: https://facebook.com
Content-Type: text/html; charset="utf-8"
...

{"data":{"viewer":{"user":{"current_party":null,"friends":{"count":0,"edges":[]},"id":"<my user id>"}}}}

There were a few exceptions.

POST /graphql

Form data:
method = POST
forced_locale = en_US
doc_id = <removed_doc_id>
variables = {
  "input": {
    "client_mutation_id": "30166",
    "device_id": "<device uuid>",
    "device_name": "UNGNU",
    "device_serial": "<another uuid>",
    "hardware": [
      {
        "battery": "DISCHARGING",
        "battery_percent": 0,
        "connection": "connected",
        "operational": "operable",
        "serial": "<device serial>",
        "type": "HMD_LAGUNA"
      },
      {
        "battery": "DISCHARGING",
        "battery_percent": 0,
        "connection": "disconnected",
        "operational": "inoperable",
        "serial": "<device serial>",
        "type": "INPUT_LLCON"
      },
      {
        "battery": "DISCHARGING",
        "battery_percent": 0,
        "connection": "disconnected",
        "operational": "inoperable",
        "serial": "<device serial>",
        "type": "INPUT_RLCON"
      }
    ],
    "meets_min_spec": "PASS"
  }
}

/logging_client_events this is a more interesting endpoint. It has a stream of data being sent by the runtime.

The requests have the following format:

Request:
POST /logging_client_events HTTP/1.1
Host: graph.oculus.com
Content-Type: application/json
...

{
  "message": {
    "app_id": 490451621120355,
    "app_ver": "16.0.0.118.452",
    "data": [
      {
        "extra": {
          "client_session_id": 1,
          "hmd_firmware_version": "2.2.0",
          "hmd_os_build": "0",
          "hmd_type": "HMD KM",
          "id": "<user_id>",
          "in_staged_rollout": false,
          "is_core2": true,
          "is_dash_up": false,
          "left_lcon_firmware_version": "unavailable",
          "release_channel": "LIVE",
          "remote_firmware_version": "unavailable",
          "right_lcon_firmware_version": "unavailable",
          "timestamp_gaps": "8",
          "tracker_firmware_version": "",
          "valid_imu_sample": "57937"
        },
        "name": "hal_imu",
        "time": 1587645768.38274
      }
    ],
    "device_id": "<device_id>",
    "log_type": "client_event",
    "oculus_access_token": "<access_token>",
    "oculus_userid": "<user_id>",
    "seq": 3,
    "session_id": "1587645706.843546",
    "time": 1587645777.5674851,
    "uid": "0"
  }
}

Response:
HTTP/1.1 200 OK

{"checksum":"","config":"","app_data":"{}"}

This is a remote logging endpoint. I’ve dumped all logs, sorted them by types and inspected a few log entries of each type.

If you want to inspect the data yourself, I’ve uploaded a my (cleaned up) log entries in LJSON format here.

One of the events caught my eye:

{
  "extra": {
    ...
    "level": "Error",
    "message": "c:\\cygwin\\data\\sandcastle\\boxes\\trunk-hg-ovrsource-win
     \\software\\oculussdk\\pc\\oaf\\liboaf\\steamparser\\parsing.cpp(163)
     : Error parsing Steam info (1971039)",
    ...
  },
  "name": "oculus_application_framework_error",
  "time": 1587238393.064477
}

It looks like Oculus Runtime tries to parse Steam information, but fails to do so? I would appreciate if it didn’t.

OK, let’s look at oculus_service_start entry: https://gist.github.com/m1el/373b4fa14a46b22f933b44d96dd26f7b

      "CpuList": [
        {
...
          "MaxFrequencyMhz": 3600,
          "Model": 0,
          "Name": "Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz",
          "NumberOfCores": 8,
          "NumberOfLogicalProcessors": 16,
...
      "UsbMatchedController": {
        "Compatibility": 3,
        "Compatibility_str": "UNKNOWN_USB3",
        "DriverDescription": "USB xHCI Compliant Host Controller",
...
    "monitors": [
      { "horizontal_active_pixels": 3840, "horizontal_image_size_mm": 1872, "vertical_active_pixels": 2160, "vertical_image_size_mm": 1053 },
      { "horizontal_active_pixels": 1920, "horizontal_image_size_mm": 518, "vertical_active_pixels": 1200, "vertical_image_size_mm": 324 },
      { "horizontal_active_pixels": 1920, "horizontal_image_size_mm": 510, "vertical_active_pixels": 1080, "vertical_image_size_mm": 287 } ],
...
      "VideoCardList": [
        {
          "LUID": "<LUID>",
          "Name": "AMD Radeon R9 290/390",
...
    "motherboard_manufacturer": "ASUSTeK COMPUTER INC.",
    "motherboard_product": "PRIME Z390-A",
    "motherboard_serial": "<motherboard_serial>",
    "motherboard_version": "Rev 1.xx",
    "os": "Windows 10 64-bit",
    "os_build": "18363",
...

That’s a lot of info about the system! All GPUs, all monitors, USB controllers, motherboard serial number, etc.

The rest of the logging events were rather boring. Most of them were:

applications being launched in VR
guardian area
debug information about inside-out tracking
lots and lots of debug information metrics
some cryptic errors
no information about room geometry or photos of the room

Conclusions

The findings were rather boring. The most egregious things I’ve found were:

Sending a dump of system information on service startup.
Trying to parse Steam information?

I would like it if Oculus didn’t send this information home, but this probably won’t happen.

My analysis of the logs is incomplete, if you’d like to do inspect the data yourself, feel free to take a look at the logs I’ve captured myself or capture your own logs using the tool I wrote.

http://m1el.github.io/oculus-tls-extract/results.html

Extracting TLS keys from an unwilling application

Apr 11, 2020

I want to be able to inspect the traffic of programs running on my computer. I don’t really trust those programs and ideally I’d like to put nearly all of them into a high security jail.

One of those programs is Oculus software. There are a few reasons why I want to be cautious about Oculus software.

Oculus is owned by Facebook, which means Facebook can dictate what Oculus software does with user’s data.
Oculus servers run on Facebook’s infrastructure, which means that Facebook can have access to any data uploaded to those servers.
Oculus Headset has cameras and software that builds a 3D map of my room, which can be uploaded to the servers.
Oculus Privacy Policy explicitly states that Oculus automatically collects:

Information about your environment, physical movements, and dimensions when you use an XR device. For example, when you set up the Oculus Guardian System to alert you when you approach a boundary, we receive information about the play area that you have defined;

Fortunately, I can use programs such as Wireshark or tcpdump to inspect traffic sent to the servers.
Fortunately, Oculus is using TLS so that a third party cannot snoop on this data in transit.
Unfortunately, I am playing a role of “third party” in this case.
Fortunately, it’s possible to read process memory and extract secret keys (TLS session master key + client random) and inspect TLS data.

Things that didn’t work

Setting SSLKEYLOGFILE variable – Oculus Runtime is using a 1.0.2o version of OpenSSL where this is not supported.
Extracting OpenSSL keys using an automated debugger – Oculus Runtime is using statically linked OpenSSL with no debug symbols.
Substituting OpenSSL library with one that can log secret keys – it’s statically linked.
I didn’t want to add extra root certificates and proxies to inspect all TLS traffic going on the machine.

Doing this the hard way

So the plan forward was to:

Figure out a code location where secret keys for the session are available
Patch or debug the program so that we can inspect and log those keys

A bit of reverse engineering

To figure out the code location some reverse-engineering was necessary. The first step was to figure out which version does Oculus Runtime use. Looking for strings in OculusAppFramework.dll, there was the following string: Stack part of OpenSSL 1.0.2o-fb10 27 Mar 2018, which means I have a specific version that I work off.

After reading Introduction to OpenSSL programming on Linux Journal, I deduced that SSL_connect (and later SSL_set_connect_state) may be a good place to interject OpenSSL for key extraction.

I’ve loaded up Oculus Runtime into Ghidra, opened up source code that contains public interface of OpenSSL ssl_lib.c and attempted to find common ground between those. The things of interest were integer and string constants which could be used as landmarks.

A particularly notable function is SSL_get_version, which contains references to multiple strings. Looking for TLSv1.2 yielded a few locations, particularly this one:

It looks like SSL_get_version got inlined. I suspected that this was not the only place where TLS connections are made, so I had to find a different place to work on. The next notable thing was that near one of the SSL version strings I’ve noticed code paths and assertion strings:

As it turns out, debug information such as source file names and assertion expressions, which can be used as landmarks too. Now we have more landmarks to navigate OpenSSL binary code.

I’ve continued to label function as I inspected nearby references to .\\ssl\ssl_lib.c strings, and I’ve stumbled upon SSL_set_connect_state function. Using the same code pattern, mov dword ptr [$register + 0x48], 0x5000, I’ve found SSL_connect as well. SSL_connect had an inlined call to SSL_set_connect_state. I’ve decided to be cautious and interject both of these functions.

Extracting the data

These functions already have pointers to ssl_st struct, so let’s extract the data from there.

There were a few options to do this:

Use/write a programmable debugger and breakpoint/inspect values on those functions
Binary patch the DLL
Use DLL injection and patching in-memory DLL code

Since I’ve had poor experience with controlling debuggers programmatically, I didn’t want to go use the first option. This is a potential thing that I might want to improve. Since I didn’t want to invalidate a signed DLL, I didn’t binary patch it. Last option it is.

DLL injection

To extract all TLS keys, we need to control the running process from the beginning of its execution. One way to do this is to set gflags to use DLL injector program as a debugger for the process. The program we want to debug is called OVRServer_x64.exe, so let’s create HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\OVRServer_x64.exe key in registry and set Debugger string value to command line of our injector.

The injector doesn’t have to do any debugging, but it needs to start a program with DEBUG_ONLY_THIS_PROCESS or DEBUG_PROCESS flag. Otherwise, our debugger will be spawned recursively.

The CreateRemoteThread DLL injection is itself a simple technique on Windows, it’s consicely described in a WikiLeaks article as well in other articles.

The code for injector.exe is here.

Running from inside the process

After the DLL is injected, it can patch the code in-memory and log secret keys.

The architecture for injectee.dll is pretty simple – patch the code, create a channel, create writer thread with the receiving end of the channel, send keys from different threads using the other end.

Patching was done in assembly. It can be visually explained like this:

There are several ways to extract the keys given a pointer to ssl_st struct.

Implementing a C library
Walking the pointers manually
Porting OpenSSL structs to the used language

Initially I’ve implemented walking the pointers by hand, but that is a very fragile approach. Porting OpenSSL structs to Rust is quite cumbersome, so I’ve implemented a miniature C library to read secret keys given an ssl_st struct pointer.

The rest is plumbing, and we now can inspect TLS traffic in a running application:

The code for injectee.dll is here.

If you’re interested, the entire code for the project is here: github.com/m1el/oculus-tls-extractor

Analysis of the data being sent by Oculus Runtime to the mothership is in the follow-up post.

http://m1el.github.io/oculus-tls-extract/

The smallest lambda interpreter in JavaScript

Jan 25, 2018

This post is heavily inspired by Tom Stuart: Programming with noting, William Byrd on “The Most Beautiful Program Ever Written”, Guowei LV – The Most Beautiful Program Ever Written, John Tromp – Binary lambda calculus.

When I first learned about Lambda calculus, I was amazed by a strong mathematical basis for anonymous functions used in many languages.

Then I wondered: what’s the simplest way to represent lambda functions as data?

Representation

Lambda expressions, such as λa.λb.(a b) have three parts: Variables a, b, Application (a b) and Abstraction (or creating a function) λa.a.

The question of representing code as data would be incomplete without mentioning LISP, where code is data that can be transformed. The Most Beautiful Program Ever Written dives into that topic, and into the question of evaluation.

All of the features in lambda expressions are available in LISP and have a straightforward representation. However, LISP AST has quite complex data structures: symbols, strings, linked lists… we need to go simpler.

First, let’s use De Bruijn indices – instead of variable symbols. Replace variable names with the number of the nested lambda’s argument you want to look up. For example, an implementation of the K combinator λa.λb.a becomes λλ2, where 2 means “the argument of the second lambda, going up from this position”.

Second, note that the lambda notation has a binary operation – function application, and a single argument syntax for creating a lambda, the argument being lambda body.

Let’s use the following representation:

variable lookup is a positive integer number which represents a De Bruijn index.
creating a function is a tuple of zero and function body. λa.a becomes [0, 1] in JS notation.
function application is a tuple of two values. λa.λb.(a b) becomes [0, [0, [2, 1]]].

Note: John Tromp has invented Binary lambda calculus, which is a much more efficient representation than this. For the purpose of making a simple interpreter, a binary tree form is used to represent lambda terms. There’s a 1-to-1 mapping between binary tree form and binary representation.

Environment

Since variable are represented as indicies, the most straightforward way to represent environment as linked lists. Or a JS tuple [head, tail].

Why not a stack or an array? Because multiple scopes could reference the same parent scope, with possibly changing values.

Constructing an environment becomes as simple as constructing a tuple [currentValue, parentEnv].

Evaluation

It’s easy to construct a simple evaluator once you’ve seen one:

function Eval(prog, env) {
    if (typeof prog === 'number') {
        // lookup a variable
        while(--prog) { env = env[1]; }
        return env[0];
    } else if (prog[0] === 0) {
        // constructing a new lambda
        return (arg) => Eval(prog[1], [arg, env]);
    } else {
        // function application
        return Eval(prog[0], env)(Eval(prog[1], env));
    }
}

The code mangled to 140 characters or less, for no particular reason:

Eval = function E(p, e) {
 if (typeof p=='number'){while(--p){e=e[1]}return e[0]}
 return p[0]==0?(a)=>E(p[1],[a,e]):E(p[0],e)(E(p[1],e))
}

And of course, translated to Common LISP for completeness:

(defun Eval (prog env)
 (cond
  ((numberp prog)     (nth (1- prog) env))
  ((zerop (car prog)) (lambda (arg) (Eval (cdr prog) (cons arg env))))
  ('t                 (apply (Eval (car prog)) (Eval (cdr prog))))))

note: this code is using cons pairs instead of linked lists to represent the binary tree

Does it work?

Let’s test this representation and evaluation with simple combinators:

I combinator or identity: λx.x. Trivially translates to [0, 1].

var I = Eval([0, 1]);

console.assert(I("test") === "test");

K combinator or first of two arguments: λx.λy.x. Translates to [0, [0, 2]].

var K = Eval([0, [0, 2]]);

console.assert(K("first")("second") === "first");

S combinator or generalized application: λx.λy.λz.((x z) (y z)). Translates to [0, [0, [0, [[3, 1], [2, 1]]]]].

var K = Eval([0, [0, 2]]);
var S = Eval([0, [0, [0, [[3, 1], [2, 1]]]]]);

console.assert(S(K)(K)("test") === "test");

ι combinator, Iota combinator or universal iota combinator:

λf.((f S) K) => [0, [[1, S], K]]

Or, expanded: [0, [[1, [0, [0, [0, [[3, 1], [2, 1]]]]]], [0, [0, 2]]]]

var iota = Eval([0, [[1, [0, [0, [0, [[3, 1], [2, 1]]]]]], [0, [0, 2]]]]);
var iota_I = iota(iota);
var iota_K = iota(iota(iota(iota)));
var iota_S = iota(iota(iota(iota(iota))));

console.assert(iota_I("test") === "test");
console.assert(iota_K("first")("second") === "first");
console.assert(iota_S(iota_K)(iota_K)("test") === "test");

FizzBuzz

Once the interpreter is working, we can run some more ambitious programs.

Using some trivial transformations, Tom Stuart’s FizzBuzz becomes the following:

var Eval = function E(p, e) {
 if (typeof p=='number'){while(--p){e=e[1]}return e[0]}
 return p[0]==0?(a)=>E(p[1],[a,e]):E(p[0],e)(E(p[1],e))
};

ZERO = [0,[0,1]];
ONE = [0,[0,[2,1]]];
TWO = [0,[0,[2,[2,1]]]];
THREE = [0,[0,[2,[2,[2,1]]]]];
FOUR = [0,[0,[2,[2,[2,[2,1]]]]]];
FIVE = [0,[0,[2,[2,[2,[2,[2,1]]]]]]];

INC = [0,[0,[0,[2,[[3,2],1]]]]];
DEC = [0,[0,[0,[[[3,[0,[0,[1,[2,4]]]]],[0,2]],[0,1]]]]];

PLUS = [0,[0,[0,[0,[[4,2],[[3,2],1]]]]]];
MINUS = [0,[0,[[1,DEC],2]]];

MUL = [0,[0,[0,[2,[3,1]]]]];
POW = [0,[0,[2,1]]];

NINE = [[PLUS,FIVE],FOUR];
TEN = [[MUL,TWO],FIVE];
FIFTEEN = [[MUL,FIVE],THREE];
HUNDRED = [[MUL,[[MUL,FIVE],FIVE]],FOUR];

TRUE  = [0,[0,2]];
FALSE = [0,[0,1]];

IF = [0,[0,[0,[[3,2],1]]]];

IS_ZERO = [0,[[1,[0,FALSE]],TRUE]];
LEQ = [0,[0,[IS_ZERO,[[MINUS,2],1]]]];

PAIR = [0,[0,[0,[[1,3],2]]]];
LEFT  = [0,[1,[0,[0,2]]]];
RIGHT = [0,[1,[0,[0,1]]]];
EMPTY = [[PAIR,TRUE],TRUE];
UNSHIFT = [0,[0,[[PAIR,FALSE],[[PAIR,1],2]]]];
IS_EMPTY = LEFT;
FIRST = [0,[LEFT,[RIGHT,1]]];
REST = [0,[RIGHT,[RIGHT,1]]];

Z = [0,[[0,[2,[0,[[2,2],1]]]],[0,[2,[0,[[2,2],1]]]]]];

MOD = [Z,[0,[0,[0,[[[[LEQ,1],2],[0,[[[4,[[MINUS,3],2]],2],1]]],2]]]]];
DIV = [Z,[0,[0,[0,[[[[LEQ,1],2],[0,[[INC,[[4,[[MINUS,3],2]],2]],1]]],ZERO]]]]];
RANGE = [Z,[0,[0,[0,[[[[LEQ,2],1],[0,[[[UNSHIFT,[[4,[INC,3]],2]],3],1]]],EMPTY]]]]];
FOLD = [Z,[0,[0,[0,[0,[[[IS_EMPTY,3],2],[0,[[[2,[[[5,[REST,4]],3],2]],[FIRST,4]],1]]]]]]]];
MAP = [0,[0,[[[FOLD,2],EMPTY],[0,[0,[[UNSHIFT,2],[3,1]]]]]]];
PUSH = [0,[0,[[[FOLD,2],[[UNSHIFT,EMPTY],1]],UNSHIFT]]];
TO_DIGITS = [Z,[0,[0,[[PUSH,[[[[LEQ,1],NINE],EMPTY],[0,[[3,[[DIV,2],TEN]],1]]]],[[MOD,1],TEN]]]]];

B   = TEN;
F   = [INC,B];
I   = [INC,F];
U   = [INC,I];
ZED = [INC,U];
FIZZ = [[UNSHIFT,[[UNSHIFT,[[UNSHIFT,[[UNSHIFT,EMPTY],ZED]],ZED]],I]],F];
BUZZ = [[UNSHIFT,[[UNSHIFT,[[UNSHIFT,[[UNSHIFT,EMPTY],ZED]],ZED]],U]],B];
FIZZBUZZ = [[UNSHIFT,[[UNSHIFT,[[UNSHIFT,[[UNSHIFT,BUZZ],ZED]],ZED]],I]],F];

RESULT = [[0,[[0,[[MAP,[[RANGE,ONE],HUNDRED]],[0,
  [[[3,[[2,1],FIFTEEN]],
    FIZZBUZZ],
   [[[3,[[2,1],THREE]],
     FIZZ],
    [[[3,[[2,1],FIVE]],
      BUZZ],
     [TO_DIGITS,1]]]]
]]],MOD]],IS_ZERO];

// some helper functions to extract data into a readable form.
// lambda bool to JS bool
var to_bool = (fn) => fn(true)(false);
// church numeral to JS number
var to_int = (fn) => fn(x => x+1)(0);
// linked list to JS array
var to_list = (fn) => {
    var list = [];
    var LEFT  = p => p(x => y => x);
    var RIGHT = p => p(x => y => y);
    var FIRST = l => LEFT(RIGHT(l));
    var REST = l => RIGHT(RIGHT(l));
    while (!to_bool(LEFT(fn)))
    {
        list.push(FIRST(fn));
        fn = REST(fn);
    }
    return list;
};

// transform list of numbers to JS string
var to_string = (fn) => {
  var alpha = '0123456789BFiuz';
  return to_list(fn).map(c=>alpha[to_int(c)]).join('');
};

console.log('full program:', JSON.stringify(RESULT));
to_list(Eval(RESULT)).forEach(row => console.log(to_string(row)));

Utility

N/A

Downsides

The simplicity of this language and evaluator implies a near-impossibility of providing meaningful error messages.

There’s almost no applications for this in the real world except for learning how lambda calculus works.

Future work

Translate more lambda expressions into data using this notation.
Write a parser that can translate lambda expressions into this form.
Write a parser for Binary Lambda Calculus
Write a lazy evaluator.
Implement an evaluator without using lambda functions provided by the language.
Implement an evaluator in a language that doesn’t support GC.
Make a lambda -> x64 compiler.

Summary

Making a simple evaluator was fun. This is probably the simplest evaluator of a kind.

This evaluator shows once again the simplicity and expressiveness of lambda calculus, and how it maps onto programming languages.

http://m1el.github.io/smallest-lambda-eval/

Monogatari frame drops

Apr 6, 2016

There is a beatiful web page: Monogatari drops, and it’s a rare example of an acceptable use of page scroll for artistic purposes.

The page is great, I like the concept, I like the art, I like monogatari series.

But there is a problem: the page is slow. And we’re going to fix it.

An optimized and fixed version of the page is available over here, and more details on the optimization process are available in the rest of this post.

@media screen and (min-width: 1100px) { img[src$="#sshot"] { max-width: none; margin-left: -173px; } } Measuring

When I said “the page is slow”, I mean that the page scrolling seems to drop frames. But there should always be a way to measure how “slow” it is.

Let’s look at Chrome’s timeline report.

35 milliseconds for a frame?!

I’m viewing this page on a high-end PC, capable of drawing millions of textured polygons per second, and you tell me it takes 35 millisecond to scroll?

If we look at the profile, we can easily identify the main culprit:

$(document).on('scrolled', function(){
  if(
    $scroll_top + $window_height < $scene.data('offset_top') + $scene.height() &&
    $scroll_top > $scene.data('offset_top')
  ) {
    var translateY = size * (3 - (($scroll_top - $scene.data('offset_top')) / $window_height));
    $object.css({'transform': 'translateY(' + translateY + 'px) rotateZ(' + rotate + 'deg)'});
  }
});

Layout thrashing

The browser can’t return layout properties without re-calculating the layout, if the layout has changed. This problem is known as Layout thrashing.

The function in question is constantly causing layout re-calculations by requesting $scene.height() and updating $object.css(...).

If we cache $scene’s height in it’s data, Chrome won’t have to update the layout several times:

diff --git a/main-orig.js b/main.js
--- a/main-orig.js
+++ b/main.js
@@ -249,6 +249,7
       $(".scene").each(function(){
         var $scene = $(this);
         $scene.data('offset_top', $scene.offset().top);
+        $scene.data('height', $scene.height());
       });
     };
     fix_size();
@@ -284,7 +285,7
             var _num = parseInt($scene.attr('id').replace(/^[^\d]*(\d+)[^\d]*$/, "$1"), 10);
             if(
               $scroll_top > $scene.data('offset_top') &&
-              $scene.data('offset_top') + $scene.height() > $scroll_top &&
+              $scene.data('offset_top') + $scene.data('height') > $scroll_top &&
               scene_num !== _num
             ) {
               $(document).trigger('change_nav', _num);
@@ -660,7 +661,7

         $(document).on('scrolled', function(){
           if(
-            $scroll_top + $window_height < $scene.data('offset_top') + $scene.height() &&
+            $scroll_top + $window_height < $scene.data('offset_top') + $scene.data('height') &&
             $scroll_top > $scene.data('offset_top')
           ) {
             var translateY = size * (3 - (($scroll_top - $scene.data('offset_top')) / $window_height));

8 ms? Much better, but can we improve this?

The next culprit is:

$(document).on('scrolled', function(){
  if(
    ($scroll_top + $window_height - 70) > $object.offset().top
  ) {
    if (($scroll_top + 300) > $object.offset().top + $object.height()){
      $object.removeClass('show');
    } else {
      $object.addClass('show');
      $object.trigger('show');
    }
  } else {
    $object.removeClass('show');
  }
});

This code is causing some unnecessary layout updates, and has a similar fix:

diff --git a/main-orig.js b/main.js
--- a/main-orig.js
+++ b/main.js
@@ -616,11 +623,13
             img1.src = "/content/topics/nishio/drops/imgs/scene/low/"+num+"_1.jpg";
             img2.src = "/content/topics/nishio/drops/imgs/scene/low/"+num+"_2.jpg";
           });
+          $object.data('offset_top', $object.offset().top);
+          $object.data('height', $object.height());
           $(document).on('scrolled', function(){
             if(
-              ($scroll_top + $window_height - 70) > $object.offset().top
+              ($scroll_top + $window_height - 70) > $object.data('offset_top')
             ) {
-              if (($scroll_top + 300) > $object.offset().top + $object.height()){
+              if (($scroll_top + 300) > $object.data('offset_top') + $object.data('height')){
                 $object.removeClass('show');
               } else {
                 $object.addClass('show');

The next slow function is:

$(document).on('scrolled', function(){
  if( $scroll_top + $window_height > $scene.data('offset_top') && $scroll_top - 7.5 * $window_height < $scene.data('offset_top')  ) {
    $scene.data("objects").show();
  } else {
    $scene.data("objects").hide();
  }
});

This function will call show() or hide() on many DOM nodes on each frame, which is a big performance hit.

The fix is pretty straightforward:

var objectsVisible = null;
$(document).on('scrolled', function(){
  var newVisible = $scroll_top + $window_height > $scene.data('offset_top') &&
                   $scroll_top - 7.5 * $window_height < $scene.data('offset_top');
  if (objectsVisible !== newVisible) {
    if(newVisible) {
      $scene.data("objects").show();
    } else {
      $scene.data("objects").hide();
    }
    objectsVisible = newVisible;
  }
});

And another fix with layout thrashing:

diff --git a/main-fix1.js b/main.js
--- a/main-fix1.js
+++ b/main.js
@@ -251,6 +251,9 @@ function closeOpening () {
         $scene.data('offset_top', $scene.offset().top);
         $scene.data('height', $scene.height());
       });
+
+      var $ending_wrapper = $(".ending_wrapper");
+      $ending_wrapper.data('offset_top', $ending_wrapper.offset().top);
     };
     fix_size();
     $(window).resize(fix_size);
@@ -532,12 +535,12 @@ function closeOpening () {
         return false;
       };
       $(document).on('scrolled', function(){
-        if( $scroll_top > $ending_wrapper.offset().top ) {
+        if( $scroll_top > $ending_wrapper.data('offset_top') ) {
           $ending.css({ 'position': 'fixed' });
           // console.log(ending_player.getPlayerState());
           if( ending_player ) {
             if (ending_player.getPlayerState() === 1 || ending_player.getPlayerState() === -1) {
-              window.scrollTo(0, $ending_wrapper.offset().top);
+              window.scrollTo(0, $ending_wrapper.data('offset_top'));
               $ending
                 .on('mousewheel', stopevent)
                 .on('scroll', stopevent)

What do you think now, Mr. Profiler?

4 ms? Can we do better? Maybe. But I’ll call this good enough and enjoy my day.

What about those frame drops at the 5 second mark?

These frame drops are caused by downloading and decoding additional images in the background, unfortunately we can’t do much about it.

Unwanted 60 fps

Surprisingly, there is a part of this page that needs to have a lower framerate: Hitagi animation.

imgLoad.on('done', function(){
  var flag = true;
  $(document).on('scrolled', function(){
    cnt++;
    $hitagi_img.attr("src", images[cnt % images.length]);
  });
});

By default, this code will change Hitagi’s frames as fast as possible. In my case, it runs at 60 fps, and looks ridiculous.

Anime is usually animated at 12 fps, so we have to limit the frame rate:

var lastChange = 0;
imgLoad.on('done', function(){
  var flag = true;
  $(document).on('scrolled', function(){
    var now = Date.now();
    if (now - lastChange > 1000 / 12) {
      cnt++;
      lastChange = now;
      $hitagi_img.attr("src", images[cnt % images.length]);
    }
  });
});

End

It’s pretty impressive how such little things can make or break JS/HTML performance.

You can look at the optimized Monogatari drops over here.

And full patch with several bug fixes is available here.

Comments on reddit

Finally, I can enjoy this piece of art in silky smooth 60 fps.

http://m1el.github.io/monogatari-frame-drops/index.html

How to draw oscilloscope lines with math and WebGL

Sep 17, 2015

When I saw oscilloscope music demos, they amazed and inspired me.

Observing a clever use of interaction between sound and an electron beam is a unique experience.

Some awesome oscilloscope music videos: Youscope, Oscillofun and Khrậng.

However, I could not find any beautiful oscilloscope emulators. Lines produced by most emulators barely look like real oscilloscope lines at all.

Because of that I made a cool WebGL demo — woscope, which is a XY mode oscilloscope emulator.

In this post you can learn how to draw these lines in WebGL.

window.MathJax = { tex2jax: { inlineMath: [['$', '$']], processEscapes: true } }; Problem formulation

We have a stereo sound file. We want to draw a line that looks like oscilloscope line, when the oscilloscope is connected in XY mode.

Each pair of samples from the sound file interpreted as 2D point.

I decided to draw each line segment by drawing a rectangle that affected by electron beam moving from the start to the end of the segment.

This has two interesting implications:

There is no corner case for segment joints
There will be triangle overdraw, which might cause loss of performance

Beam intensity is collected from different segments using blendFunc(gl.SRC_ALPHA, gl.ONE);.

Generating vertices

For a line segment, for each point of a rect we calculate vertices position using segment start point, segment end point and index of the vertex.

it might be weird to store the same point 4 times in a buffer, but I could not find a better way

The forst two points are close to the starting point, second two points are close to the ending point. Even points are shifted to the “right” of the segment and odd point are shifted to the “left” of the segment.

This transformations is pretty straightforward in glsl:

#define EPS 1E-6
uniform float uInvert;
uniform float uSize;
attribute vec2 aStart, aEnd;
attribute float aIdx;
// uvl.xy is used later in fragment shader
varying vec4 uvl;
varying float vLen;
void main () {
    float tang;
    vec2 current;
    // All points in quad contain the same data:
    // segment start point and segment end point.
    // We determine point position using its index.
    float idx = mod(aIdx,4.0);

    // `dir` vector is storing the normalized difference
    // between end and start
    vec2 dir = aEnd-aStart;
    uvl.z = length(dir);

    if (uvl.z > EPS) {
        dir = dir / uvl.z;
    } else {
    // If the segment is too short, just draw a square
        dir = vec2(1.0, 0.0);
    }
    // norm stores direction normal to the segment difference
    vec2 norm = vec2(-dir.y, dir.x);

    // `tang` corresponds to shift "forward" or "backward"
    if (idx >= 2.0) {
        current = aEnd;
        tang = 1.0;
        uvl.x = -uSize;
    } else {
        current = aStart;
        tang = -1.0;
        uvl.x = uvl.z + uSize;
    }
    // `side` corresponds to shift to the "right" or "left"
    float side = (mod(idx, 2.0)-0.5)*2.0;
    uvl.y = side * uSize;
    uvl.w = floor(aIdx / 4.0 + 0.5);

    gl_Position = vec4((current+(tang*dir+norm*side)*uSize)*uInvert,0.0,1.0);
}

Calculating the intensity with math

Now that we know where to draw a quad, we need to calculate the cumulative intensity at any given point in the quad.

Let’s assume that electron beam has gaussian distribution, which is not that uncommon in physics.

\[\mathrm{Intensity}(distance) = \frac{1}{σ\sqrt{2π}} e^{-\frac{distance^2}{2σ^2}}\]

Where σ is the spread of the beam.

To calculate the cumulative intensity, we integrate the intensity over time as the beam is moving from start to end.

\[\mathrm{Cumulative}=\int_0^1 \mathrm{Intensity}(\mathrm{distance}(t)) dt\]

If we use the frame of reference where start point is at $(0,0)$ and the end point is $(length,0)$, $distance(t)$ can be written as:

\[\mathrm{distance}(t)=\sqrt{(p_x-t\times length)^2+p_y^2}\]

Now,

\[\mathrm{Cumulative}(p)=\int_0^1 \frac{1}{σ\sqrt{2π}} e^{-\frac{(p_x-t\times length)^2+p_y^2}{2σ^2}} dt\]

Since $p_y^2$ is a constant in this integral, $e^{-\frac{p_y^2}{2σ^2}}$ can be taken out of the integration.

\[\mathrm{Cumulative}(p)=e^{-\frac{p_y^2}{2σ^2}} \int_0^1 \frac{1}{σ\sqrt{2π}} e^{-\frac{(p_x-t\times length)^2}{2σ^2}} dt\]

Let’s simplify the integral slightly, by replacing $t$ with $\frac{u}{l}$:

\[\int_0^1 \frac{1}{σ\sqrt{2π}} e^{-\frac{(p_x-t\times length)^2}{2σ^2}} dt = \frac{1}{l} \int_0^l \frac{1}{σ\sqrt{2π}} e^{-\frac{(p_x-u)^2}{2σ^2}} du\]

The integral of normal distribution is half error function. w|a

\[\frac{1}{l} \int_0^l \frac{1}{σ\sqrt{2π}} e^{-\frac{(p_x-u)^2}{2σ^2}} du= \frac{1}{2l} \left(\mathrm{erf}\left(\frac{p_x}{\sqrt2 σ}\right) - \mathrm{erf}\left(\frac{p_x-l}{\sqrt2 σ}\right)\right)\]

Finally,

\[\mathrm{Cumulative}(p)=\frac{1}{2l} e^{-\frac{p_y^2}{2σ^2}}\left(\mathrm{erf}\left(\frac{p_x}{\sqrt2 σ}\right) - \mathrm{erf}\left(\frac{p_x-l}{\sqrt2 σ}\right)\right)\]

One of the benefits of doing math in this case is that segment joints will have mathematically perfect intensity after adding adjacent segments.

The resulting formula is not hard to implement in framgent shader in glsl.

Fragment shader

The uvl varying in the vertex shader contains coordinates in which start point is $(0,0)$ and end point is $(length,0)$.

Since varyings are interpolated linearly for the fragment shader, we get the point coordinates in required frame of reference for free.

The rest is writing the math formula in glsl, and handle a corner case when the length of the segment is too small.

gaussian function and erf approximation for glsl is shamelessly stolen from Evan Wallace’s blog

#define EPS 1E-6
#define TAU 6.283185307179586
#define TAUR 2.5066282746310002
#define SQRT2 1.4142135623730951
uniform float uSize;
uniform float uIntensity;
precision highp float;
varying vec4 uvl;

// A standard gaussian function, used for weighting samples
float gaussian(float x, float sigma) {
  return exp(-(x * x) / (2.0 * sigma * sigma)) / (TAUR * sigma);
}

// This approximates the error function, needed for the gaussian integral
float erf(float x) {
  float s = sign(x), a = abs(x);
  x = 1.0 + (0.278393 + (0.230389 + 0.078108 * (a * a)) * a) * a;
  x *= x;
  return s - s / (x * x);
}

void main (void)
{
    float len = uvl.z;
    vec2 xy = uvl.xy;
    float alpha;

    float sigma = uSize/4.0;
    if (len < EPS) {
    // If the beam segment is too short, just calculate intensity at the position.
        alpha = exp(-pow(length(xy),2.0)/(2.0*sigma*sigma))/2.0/sqrt(uSize);
    } else {
    // Otherwise, use analytical integral for accumulated intensity.
        alpha = erf(xy.x/SQRT2/sigma) - erf((xy.x-len)/SQRT2/sigma);
        alpha *= exp(-xy.y*xy.y/(2.0*sigma*sigma))/2.0/len*uSize;
    }

    float afterglow = smoothstep(0.0, 0.33, uvl.w/2048.0);
    alpha *= afterglow * uIntensity;
    gl_FragColor = vec4(1./32., 1.0, 1./32., alpha);
}

Things that can be improved

The resulting picture looks pretty nice, however there are several problems with current implementation.

The electron beam should not move in straight lines between sample points. There are no sharp corners on a real oscilloscope, but sometimes there are sharp corners in the current implementation.
It would probably be benefitial to implement a sinc interpolation to increase the number of points.
Point intensity is saturated too fast. I’ve added a workaround by giving little red and blue value to the color, so oversaturated pixels look white.
No account for gamma correction.
There is no post-processing bloom, but is it really required?
Currently, woscope is a PoC of this technique, should it be implemented as JavaScript library?
Implement the oscilloscope emulator as a native program?

Summary

Using math is extremely beneficial when emulating physical processes.

I’ve never seen an oscilloscope beam implemented this way, and I really like the resulting picture. This technique can be used to draw other soft lines.

I hope this will help people drawing beautiful lines using WebGL/OpenGL :)

code on this page is in Public Domain, except erf and gaussian functions borrowed from other blog

http://m1el.github.io/woscope-how/index.html

The Most Expensive Anti-Pattern

Jul 22, 2015

In this post, I’d like to talk about the most expensive programming anti-pattern I know:

Manipulating structured data formats using string functions.

I will be referring to it as “printf anti-pattern”.

The cost

When I call it “the most expensive” anti-pattern, it is not an empty claim.

I counted vulnerabilities by type using data from cve.mitre.org, and I got this list of top vulnerability types:

rexec: 19268
DoS: 14849
xss: 9236
memory: 8212
sqlinj: 6230
privilege: 3321
dirtraversal: 2762
arith: 1260
csrf: 1117

You’re free to criticize how I did it and do it better.

If you look at top entries, you will notice that XSS and SQL injections contribute a noticeable chunk to the list.

I claim that most XSS and SQL injections are caused by printf anti-pattern.

Inserting random strings into HTML is a terrible idea. Same goes for SQL.

Examples

Once you know the definition of printf anti-pattern, you will notice that it is ubiquitous.

You will find that this anti-pattern is most common in HTML and SQL. That’s why there are so many SQL injections and XSS vulnerabilities.

Here are a few examples:

Nearly every PHP web-site in existence

<div class="greeting">Hello, <?php echo $username; ?>!</div>

Any example of SQL query generated using string manipulation (see PHP’s docs for mysql_query)

$query = sprintf("SELECT firstname, lastname, address, age FROM friends 
    WHERE firstname='%s' AND lastname='%s'",
    mysql_real_escape_string($firstname),
    mysql_real_escape_string($lastname));

Lots of programmers generating dynamic elements in JavaScript using innerHtml-like techniques, example:

var OriginalContent = $(this).text();
$(this).html("<input type='text' value='" + OriginalContent + "' />");

Some poor programmer writing a dynamic web-site in C:

sprintf(buffer, "<tr><td onclick=putpgtl(\"?j=%d&k=%d&v=%d&h=24\")>%s２４時間</td></tr>", ...)

Generating JSON using string formatting: gist.github.com/varemenos/e95c2e098e657c7688fd

git log --pretty=format:'{ %n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": { %n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": { %n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

# jesus christ why
(curl json) | jq -c '.[] | {value: .value, name: .name}' | sed -e 's/"name":"//g' -e 's/","value"//g' | tr -d '"}' | grep -v ':0' | awk '{FS=":" ; printf "%20s\t\%d\n",$1,$2}' | less

Using sed to process XML:

# http://askubuntu.com/questions/442013/using-sed-to-search-and-replace-text-in-xml-file
sed -i 's#<!--UpdateAccountGUIDs>UpdateAndExit</UpdateAccountGUIDs-->#<UpdateAccountGUIDs>UpdateAndExit</UpdateAccountGUIDs>#' File.XML
# http://askubuntu.com/questions/284983/print-text-between-two-xml-tags
sed -n '/<serverName/,/<\/serverName/p' big_xml_file.xml

Code generation in ExtJS:

  '<tpl if="hasCustomConvert">',
  '    dest["{name}"] = value === undefined ? __field{#}.convert(__field{#}.defaultValue, record) : __field{#}.convert(value, record);\n',
  ...
// exploited using
// Ext.define('m',{extend:'Ext.data.Model',fields:['id']});
// var store = Ext.create('Ext.data.Store',{model:m});
// store.loadRawData({metaData:{fields:['"+alert(1)+"']}});

mal defining load-file as eval on string concatenation

Define a load-file function using mal itself. In your main program call the rep function with this string: "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".

Proper Alternatives

So if I’m generating HTML using string functions, what should I be doing instead?

You will see that all proposed solutions have a common theme: manipulate the underlying data structure and then serialize it.

There is no point where a serialized data structure is modified as a string.

For HTML

hiccup, hiccup-examples

(html [:ul
  (for [x (range 1 4)]
       [:li x])])
(defn index []
  [:div {:id "content"}
   [:h1 {:class "text-success"} "Hello Hiccup"]])

lxml e-factory

html = page = (
  E.html(       # create an Element called "html"
    E.head(
      E.title("This is a sample document")
    ),
    E.body(
      E.h1("Hello!", CLASS("title")),
      E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
      E.p("This is another paragraph, with a", "\n      ",
        E.a("link", href="http://www.python.org"), "."),
      E.p("Here are some reservered characters: <spam&egg>."),
    )
  )
)

Manipulate the DOM, then serialize HTML

A toy example would look something like this:

html = etree.parse('template.html')
name_node = html.xpath('//div[@id="user-name"]')[0]
name_node.text = user.name
print(etree.tostring(html))

Server-side AngularJS or React
Manipulate the DOM directly

e.g. jQuery:

var OriginalContent = $(this).text();
$(this).empty().append($('<input type=text>').val(OriginalContent))
// Bad code. BAD!
// $(this).html("<input type='text' value='" + OriginalContent + "' />");

For JSON

Use object literals or list comprehensions to make a JSON object, then serialize it
Generate a JSON object and then serialize it

For SQL

Use query placeholders
Manipulate SQL Abstract Syntax Trees, then emit SQL
LINQ to SQL

For XML

Use a sane serialization format instead (see JSON)
Manipulate the DOM, then serialize XML
XSLT would qualify if it was not terrible

For meta-programming

Lisp
Manipulate Abstract Syntax Trees for your language

The Reason

So how did we get here?

I suspect that it happened because people are lazy.

— Oh, so we have this markup that we can generate simply using string concatenation? Yeah, let’s just do that.

— But if some string contains <img/src/onerror=alert(1)>, it will cause JavaScript code execution!

— Ugh, let’s write the htmlspecialchars function and every time we put some string into HTML, pass the string through this function…

— But if you write "<img class=" + htmlspecialchars($_GET['cls']) + " src=dot.gif>", you can still inject JavaScript!

— But only an idiot would write that code.

— What should we do if our code emits invalid HTML in the first place, without string substitution?

— Just write it carefully next time.

So here we are, generating HTML using string concatenation.

Of course, if you are being really, really careful, you can emit valid HTML and SQL with no injection problems.

But you don’t use manual memory management and pointer arithmetics when you generate your web-site, do you?

Why would you walk on a razor’s edge if you can program in a safe way?

Browsers

There is, of course, a funnier side to this story: browsers.

Browser vendors had to tune their parsers for broken HTML because many web-sites gave the browsers invalid HTML. People chose browsers that were able to process nearly arbitrary chunks of bytes as HTML because these browsers were able to display their favorite web-sites.

Browser vendors had to implement XSS filters, because web-sites are prone to putting raw user requests straight into HTML. The rationale for XSS filters is simple: if browser vendor can prevent 90% of XSS attacks on the user, the user will be happy.

However, these filters simply can not protect from all XSS attacks.

These two examples are browsers dealing with symptoms of the problem, and not with the problem itself. The problem is in the head of a programmer who thinks that it is reasonable to generate dynamic HTML using string manipulation.

Conclusion

The state of HTML as a structured data format is terrible because people started manipulating it as a string from the very beginning. There are many problems (including, but not limited to: XSS, invalid HTML, browser parsing differences) caused by this mistreating of HTML format.

Maybe, just maybe, if available tools did not encourage people to generate HTML as a string, the web would be a better place.

Maybe, just maybe, if we chose a different serialization format for documents on the web, we would not treat it as a string that can be written using printf.

We would definitely have less vulnerabilities if programmers did not think that constructing structured data formats using string functions is an acceptable idea.

http://m1el.github.io/printf-antipattern/

Speculations on security

Jun 29, 2015

m1el: “You cannot abstract security”

m1el: “Security is not an interface”

aitderceto: “Security cannot be instantiated”

aitderceto: I’d like one security, please

m1el: OOP is not a good tool to describe security

http://m1el.github.io/oop-security/

On white-box crypto

May 14, 2015

Usually, I don’t react strongly to articles that contain bullshit. But when I stumbled on an article that contained three false statements in the very first sentence, I simply could not leave it be.

The first sentence of the article is:

Program obfuscation is a breakthrough and trending field of cryptography.

– valerini

My post touches a lot of topics, and I’m not an expert in any of those.

Strong opinion inside.

What’s white-box crypto?

The challenge that white-box cryptography aims to address is to implement a cryptographic algorithm in software in such a way that cryptographic assets remain secure even when subject to white-box attacks.

This technology allows to perform cryptographic operations without revealing any portion of confidential information such as the cryptographic key.

– whiteboxcrypto.com

As far as I understand, the formulation of the problem is the following:

Alice gives Bob software with some crypto and keys inside
The software can decrypt ciphertexts that Alice will give to Bob
Bob should not be able to extract the decryption key from the software
There Bob is evil for trying to figure out what’s the software doing on his own computer

Personally, I already see several problems within that problem formulation:

Why is Alice trying to hide the key from Bob? Bob gets to see the plaintext anyway!
Bob can simply use the obfuscated code for ciphertext decryption aka “code lifting”, which defeats the purpose of encryption.
Security through obscurity? It does not seem to conform Kerckhoffs’s principle, therefore it is not strong crypto.

Anyway, why would you want to do such a useless thing?

White-box cryptography is the breakthrough technology that enables modern day software DRM systems.

– Schultz12

Ah, now I see. Alice is trying to imlpement a DRM system, so she converted one broken problem to another more narrow (and still broken) problem and called it “white-box crypto”.

It would probably be fair to call White-box crypto “DRM”, I’ve failed to find any other application for it.

What’s algorithm obfuscation?

There is a good paper on the theoretical impossibility of program obfuscation even under very weak formalization Barak01, which removes one way to achieve the goal of WBC. And another paper shows some classes of function that cannot be obfuscated Goldwasser07.

The other paper Goldwasser07 states that the best possibly achievable obfuscation is “indistinguishability obfuscation” which tells us nothing about extracting data from obfuscated programs.

All WBC implementations I know of depend on embedding the key in the algorithm and obfuscation, and have been defeated.

When talking about obfuscation, some people mention that it is possible Wee05 to “hide” point function implementation in the code. Which is basically checking the hash.

myHiddenPointFunction x = sha256(x, "f5cb1e...") == "5b27d1..."
# the entire article reduced to a single line of code

Or that it is possible to obfuscate some other functions: Ran10, Zvika05 … which are way, waaay far from obfuscating decryption, especially AES.

What do the authors of WBC say about WBC security?

all references, excluding Wyseur12, in the following paragraph are taken from Wyseur12 have not been read/verified by me, there are no download links

After reading the file Wyseur12 by a proponent of WBC, I noticed that in that very paper the author mentions that all other previous attempts on WBC implementations have been defeated [Jacob02], [Link05], [Wyseur07], [Billet04]; can be defeated in a similar way [Mich08]; can be defeated with a similar formulation of a problem [Wyseur09]; and modifications have been defeated too [Bri06wbc], [Dem10].

And also quote:

Despite all the new constructions that have been presented, the security of white-box cryptography is very unclear. Almost all of the presented constructions have been shown insecure with academic cryptanalysis papers that have been published. Only a few constructions remain ‘unbroken’, but they are merely a minor tweak to existing constructions – hence the existing attacks will apply with little effort.

So the author is focusing on the obfuscation part, and making obfuscation less prone to code lifting, reverse engineering, etc.

And WBC now becomes a problem of software obfuscation. We’ve completed the circle.

The Many Facades of DRM

After reading the file Schultz12, I’d like to quote

White-box cryptography … enables modern day software DRM systems. Without this technique digital media distribution from iTunes, Amazon, and others using software DRM protection would simply not exist.

And call bullshit twice:

Most of all known WBC is broken as shown in Wyseur12, therefore it “enables” nothing.
iTunes is selling DRM-free tracks. People like it and AAPL gets the profits.

Then the author is referencing [Chow02AES] as WBC implementation, which is defeated by [Jacob02] and [Link05] as been claimed in Wyseur12.

Homomorphic encryption is unrelated

I remember some people claiming that homomorphic encryption might somehow help with white-box crypto implementation.

Here’s why not: homomorphic encryption assumes that computation is carried on the ciphertext, and plaintext (modified or not) is never available to the party performing the computation. Once that party gets the plaintext, that assumption is broken, and within WBC problem formulation plaintext is available to the end-user.

Additional info on WBC

Souchet15AES gives a great example of AES WBC implementation, but he shows how to defeat it (apart from obfuscation) in the same article.

Khovratovich13WBCSurvey (video) is a survey on WBC.

In the question section someone gives a comment 52:59 I’d like to quote

First of all, thank you for the presentation because it helped shed some light on the mysterious topic of white-box cryptography. And it helped me calling out bullshit on the whole idea, I think someone should make this critical comment, because the whole concept breaks with one of the principles of cryptography that the algorithm should be verifiable. If you obfuscate it to hide something… if it’s only hiding the key, it’s no use because I would not go after the implementation at all, I would use other ways to attack and recover the key. Everybody knows how easy that is. And if it obfuscates even more, it changes it, then it breaks with the principle of being verifiable. So I think that the whole thing is snake oil and should not be used by anybody.

Summary

White-box cryptography problem formulation is utterly broken.
All existing white-box cryptography implementations do not achieve their purpose.
The only application for white-box crypto is DRM, which is a broken task on its own.
The only ways to implement WBC seems to be algorithm or software obfuscation.
Neither are shown to be doable securely for any sufficiently interesting tasks.

The world is going to be a better place without “white-box crypto”.

Used literature

[Barak01]: Boaz Barak et al. “On the (Im)possibility of Obfuscating Programs” wisdom.weizmann.ac.il/~oded/PS/obf4.pdf
[Goldwasser05]: Shafi Goldwasser, Yael Tauman Kalai “On the Impossibility of Obfuscation with Auxiliary Input” http://research.microsoft.com/en-us/um/people/yael/publications/2005-impossibility_obfuscation.pdf
[Goldwasser07]: Shafi Goldwasser, Guy N. Rothblum “On Best-Possible Obfuscation” iacr.org/archive/tcc2007/43920194/43920194.pdf
[Wee05]: Hoeteck Wee “On Obfuscating Point Functions” cs.columbia.edu/~hoeteck/pubs/obf-point-stoc05.pdf
[Zvika05]: Zvika Brakerski, Guy N. Rothblum “Black-Box Obfuscation for d-CNFs” eprint.iacr.org/2013/557.pdf
[Ran10]: Ran Canetti et al. “Obfuscation of Hyperplane Membership” iacr.org/archive/tcc2010/59780071/59780071.pdf
[Wyseur12]: Brecht Wyseur “WHITE-BOX CRYPTOGRAPHY: HIDING KEYS IN SOFTWARE” whiteboxcrypto.com/files/2012_misc.pdf
[Schultz12]: Rod Schultz “The Many Facades of DRM” whiteboxcrypto.com/files/2012_MISC_DRM.pdf
[Souchet15AES]: Axel Souchet “Spotlight on an Unprotected AES128 White-box Implementation” doar-e.github.io/blog/2015/02/08/spotlight-on-an-unprotected-aes128-whitebox-implementation/
[Khovratovich13WBCSurvey]: Speaker: Dmitry Khovratovich “White-Box Cryptography Survey” [30c3] youtube.com/watch?v=om5AVTqB5bA

Not actually used literature

[Jacob02]: Matthias Jacob, Dan Boneh, and Edward W. Felten. “Attacking an Obfuscated Cipher by Injecting Faults” 2002
[Link05]: Hamilton E. Link and William D. Neumann. “Clarifying Obfuscation: Improving the Security of White-Box DES” 2005
[Wyseur07]: Brecht Wyseur, Wil Michiels, Paul Gorissen, and Bart Preneel “Cryptanalysis of White-Box DES Implementations with Arbitrary External Encodings” 2007
[Billet04]: Olivier Billet et al. “Cryptanalysis of a White Box AES Implementation” 2004
[Mich08]: Wil Michiels, Paul Gorissen, and Henk D.L. Hollmann “Towards Security Notions for White-Box Cryptography” 2008
[Wyseur09]: Brecht Wyseur, “White-Box Cryptography” 2009
[Bri06wbc]: Julien Bringer, Herve Chabanne, and Emmanuelle Dottax. “White box cryptography: Another attempt”
[Dem10]: Yoni De Mulder, Brecht Wyseur, and Bart Preneel, “Cryptanalysis of a Perturbated White-box AES Implementation”
[Chow02AES]: Stanley Chow, Philip A. Eisen, Harold Johnson, and Paul C. van Oorschot. “White-Box Cryptography and an AES Implementation” 2002

http://m1el.github.io/wbc/

Kare Kano DVDs

May 8, 2015

One day, I’ve stumbled upon Kare Kano anime, but after searching for rips for a while, I did not find a single DVDRip I would consider acceptable.

So I thought: heeey, maybe I should rip the DVD myself?

God, how wrong I was.

In this post I’m going to write about the shit I found while working with Kare Kano DVDs.

I’m writing this post not to be informative, but for myself to be done with this DVD. Forever.

Disclaimer: for all intents and purposes the reader can assume that the author of this post is a crazy uninformed lunatic.

Getting the DVDs

It turns out, there are several different releases of Kare Kano. At some point, I’ve had four different sets of DVDs downloaded: R1, R2J, Spanish, and homemade Russian.

The Spanish one turned out to be a PAL DVD, and anime is not supposed to be 25 or 50 fps.

Homemade russian release turned out to be R1 release without english streams and with additional russian subtitles and sound streams.

Now, downloading the R2J release turned out to be somehow problematic. At the time I tried to do it, in 2010, there were no seeds in any of Kare Kano R2J torrents. I could not even find it on some closed trackers.

When I finally found the torrent I’ve been looking for so long, there was only one seeder on that torrent. The download speed varied from 5 to 19 kilobytes per second. And he was not even online all the time.

When I put his IP into maxmind geoIP tool, it spewed out:

Russia, SibirTelecom, <some-small-and-distant-city>

ಠ_ಠ

So I’ve been downloading 45 gigabytes of Kare Kano in the course of 3 months from some guy in deep Siberia.

How do I DVDRIP?

After reading some guides on Doom9 forums and avisynth manuals, I decided to look for anime ripping communities. I found #raw-providers IRC channel on Rizon, and told them I want to rip Kare Kano.

To paraphrase modestly, they strongly dismissed the idea of ripping that DVD.

They had Yousei, who was a DVDRip wizard. Just to give you a taste of what he did: screenshot1, screenshot2.

That’s a DVDRip. There are BDs that are worse than that. That release is probably the strongest argument in favor of anime upscales.

And he told me to drop the idea of ripping Kare Kano.

What’s telecine?

We might want to know how the DVD we have was made.

Anime used to be produced using cels, which were shot on 24 fps film. Kare Kano was produced this way.

A screenshot from video of Gainax burning Kare Kano cels; episode 19, ending

Then we got the amazing thechnology of digital video distribution — DVD. Existing anime had to be digitized, and telecine does the job.

Telecine is the process of transferring motion picture film into video and is performed in a color suite.

Wikipedia

But there is a problem: film has 24 fps, and DVD supports only 25 or 29.97 fps video, interlaced or progressive.

Wait? Not exactly 30 fps? 30000/1001 fps? What sorcery is this? We’ll look into that later.

Okay, so how does one fit 24 fps video into a 30-ish fps video?
If we look at given framerates, we’ll see that 24/30 = 4/5, so if we somehow convert 4 frames of given 24 fps video to 5 frames, we’ll get 30 fps.
It is possible with 2:3 pulldown.

2:3 pulldown, explained by a picture from Wikipedia

Pulldown is producing interlaced video meaning there will be “combs”.

I would describe the 2:3 pulldown algorithm in python pseudocode:

from itertools import cycle
alternating = cycle([Even, Odd]) # alternate between even and odd output fields
pattern = cycle([2, 3])          # the name comes from this field pattern
for frame in input:
    for _ in range(next(pattern)):           # output N fields
        yield first.field(next(alternating)) # output the corresponding field

And to get 30-ish fps, we also need to slow down the playback by a factor of 1000/1001 at some point.

The 2:3 pulldown pattern can be observed in first frames of Kare Kano:

kare kano, episode 01, first seconds; picture is anamorhpic

Well, obviously we can deal with that. Avisynth has plugins for inverse telecine: tfm, tdecimate.

TFM is matching the fields, and TDecimate is throwing away unmatched and doubled frames.

Let’s see how it works:

mpeg2source("vts_01_1.d2v")
tfm(pp=1) // don't interpolate unmatched frames
tdecimate()
trim(2518,2525)
// avs2yuv ivtc.avs -o - | ffmpeg -r 3 -i - -c:v vp9 -b:v 0 -crf 18 whats-ivtc.webm

kare kano, episode 01, first seconds, IVTC; picture is anamorphic

That looks promising!

However, my hope of easy Kare Kano rip died really fast.

Gainax-style digital video effects circa 1998

For some reason, Gainax decided that adding 60-ish fps interlaced video effects to the telecined video is a good idea.

Miyazawa laughing at Gainax digital video processing; episode 01; picture is anamorphic

In the video clip above you can see that “ふ” hieroglyphs are always interlaced. It is a purely interlaced 60-ish fps effect. The “fufufu” was added in a digital way, after telecine.

This means several extremely unpleasant things.

some parts of the video MUST be motion-interpolated and to produce constant framerate video
some parts of the video MUST be field-interpolated to produce progressive video
detecting digitally edited parts has to be done manually

But wait, there is more!

Temporal Scanline Blending

The term in this headline is stolen from a handy guide Analysing Your DVD Footage.

a picture of Miyazawa partially blending with the door; episode 05; picture is anamorphic

Hm, what’s that? Let’s inspect individual fields.

individual fields, upscaled to double size using nnedi; episode 05; picture is anamorphic

What we see here is film frames changing in the middle of the output field. The next field after that is slightly blended too.

I tried to fix this issue using various de-blending plugins and writing a plugin on my own, but I failed to reverse this problem. Partially, because there is an inevitable information loss involved: after fields are blended in the analog world, the pixels are quantized in 8 bit (probably?) and then lossily encoded with MPEG2. So even if I know perfect blending parameters and frame change timing, MPEG2 is introducing a lossy step, and trying to de-blend will cause artifacts from both frames to surface. The quantization step adds some problem to de-blending too.

I’ve inspected at all individual frames in the first episode and wrote down blending parameters for all required frames. Of course, there was a lot of automation involved. However, I was not happy with the final de-blending result.

Pretty early into this process I discovered why this happens. The blending pattern repeats approximately every 1000 frames. The line of the blending frames moves “up” across the whole field in a course of ~500 frames.

My hypothesis on the cause of this artifact is that during telecine process, film has been displayed at 24.00 frames per second to a telecine machine that took 59.94 fields per second, and sometimes frames were changing during the field scan. The difference in scanning speed of 1000/1001 causes the line of blending to move along the field.

Since editing was partially done in the digital world, each digital “cut” caused blend pattern to “reset”, which invalidated the simple formula for blend position.

Quoting the “Analysing Your DVD Footage” guide on how bad it is:

This is the nastiest of the nasty Telecine methods and can be seen in every Gainax production from Otaku no Video to Kare Kano and in many more late 80s to late 90s anime. It’s the reason Eva Renewal had to be made. […]

This stuff is nigh impossible to recover at the moment. There are no tools available at our level which can deal with field blending like this […]

It’s nasty stuff - especially in pans where you can end up with an IVTC process giving you a scene with horizontal motion with the top half moving in one frame and the bottom half catching it up in the next frame. Evil.

That article too has a similar explanation for how it happens.

Film editing with scissors and office tape

Kare Kano was partially edited in film and partially in the digital world.
The first one is visible as a jagged film edge sometimes getting on the screen.
The second one sometimes appears as video cut in between frame fields.

a film edge visible on the screen; episode 02, picture is upscaled from one field with nnedi

I kind of like the first artifact because it adds vintage analog feel to the video, however it is still a video artifact.

These artifacts are minor, but I decided to include them anyway.

Dot crawl

This artifact is caused by interference between luma signal and chroma signal, due to excellent implementation of color television. It happens in composite video, NTSC and PAL.

I see dots crawling in your eyes, Miyazawa; episode 01, 300% zoom with nearest point interpolation

You can observe the checkerboard pattern on the edge of colorful regions. There are avisynth plugins to deal with this, and you have to tune the “knobs” to do it right.

Summary

The process of producing Kare Kano DVDs is a disaster.
Almost every step after film is a terrible information loss.
Artifacts are clearly visible and cannot be reversed.

Several artifacts are caused by NTSC color implementation and non-integral framerate.

film is edited with visible cuts and gluing.
telecine is done with incorrect framerate, causing field blends and converting video to interlaced.
this step is the worst of all. Literally Hitler.
the video is then transported using composite video signal, causing dot crawl.
video is edited digitally, adding pure interlaced 60-ish fps effects.
video is lossily encoded using MPEG2 and packed into DVD.

I am not going to work with Kare Kano DVDs in my life again.

Fuck. That. Shit.

If you happen to know where to get Kare Kano film/negatives, please drop an email to m1el.2027@gmail.com. I am willing to scan and compile them myself.

And a comics, just for lulz:

if you ever thought I have drawing skills, you might be disappointed

http://m1el.github.io/karekano/