GeistHaus

Sampling v. tracing

danluu.com
6 pages link to this URL
Parse huge XML files quick with Rust + Serde + quick-xml

Recently, I wondered whether songs were increasingly being released with entirely lowercase titles. I ended up using MusicBrainz’ library as a data source, which comes packaged as a Postgres database, but the data source I considered initially was one of Discogs’ monthly data dumps, which are made available for download as a set of gzipped XML files. The file I was interested in, the releases dataset, is 11.62 GB gzipped and 74 GB once decompressed. I wanted to iterate through every record and check (a) entry quality and (b) whether the track title was lowercase. Normally I’d use some kind of serialization framework, gesture at the shape of the data, and then tell the framework to deserialize the whole object, but that’s not an option when the file you’re working with is several times the size of your computer’s RAM.

0 inbound links · article · en · posts · Rust · Software Dev
Profiling in production with function call traces

We introduce a new C++ function tracing profiler and discuss how to use such a profiler, how to make one for native code, and how a simple CPU hardware feature can make tracing very cheap for compiled, interpreted, and JITted languages.

1 inbound link · article · en