Max Inden

28th DistSys Reading Group - Hotstuff

With the 28th session we jumped into the space of byzantine fault tolerant consensus protocols. We covered fault tolerant consensus with various Paxos variants in the past, but this session was the first one looking into how to solve the byzantine generals problem. Instead of using PBFT [1] as a first paper we went with Hotstuff [2] instead. The reasoning behind this choice was (a) Hotstuff presenting a somewhat easy up-to-date consensus algorithm and (b) that it provides a framework enabling one to compare other algorithms (e.g. PBFT) in the space. ...

Know your latencies

I find it helpful to know the orders of magnitude by which certain computer operations differ. Certainly it is not worth the effort to pay attention to every digit or learn these by heart, especially since they differ (slightly) across systems, but having a basic understanding of what a tiny fraction of time a CPU cycle occupies compared to sending a TCP packet is incredibly helpful whenever reasoning about systems performance. ...

26th DistSys Reading Group - Cache coherence

We have long been planning to cover the caching mechanisms in CPUs. As a shared knowledge base for the discussions in this session we chose the following two articles by Martin Thompson among other things known for his work on the LMAX Disruptor: CPU Cache Flushing Fallacy including a good overview over the different caches in modern Intel CPUs. Write Combining exemplifying the advanced mechanisms one can find in today’s CPUs and how one can make use of them. ...

25th DistSys Reading Group - Fair queuing

In the session today we covered Madhavapeddi Shreedhar and George Varghese paper “Efficient fair queuing using deficit round-robin” [1]. While the session was not so much about the relatively simple algorithmic details of deficit-round-robin (still worth checking out) we talked about: Its benefits over basic FIFO queuing and thus its impact for congestion controlled traffic (tcp) compared to not congestion controlled traffic (udp). Its wide deployment still seen today. Its derivatives DRR+ and DRR++ being able to handle both best-effort as well as latency critical flows. ...

24th DistSys Reading Group - BBR Congestion-Based Congestion Control

After a bit of a break due to current pandemic we decided to carry on and continue our meetings as virtual calls. Ignoring the usual initial hiccups and the missing whiteboard the medium worked well for us. Topic and reading of this session was the ACM Queue article BBR: Congestion-Based Congestion Control [1], as well as the Dropbox article Evaluating BBRv2 on the Dropbox Edge Network [2]. We started off with a quick recap of the previous session covering why we need congestion control, how one can view a multi-hop connection as a single hop connection with a single bottleneck and most importantly the fact that the Internet is the largest distributed system that most of the time “just works” due to congestion control. ...

Elimination back-off stack performance

I recently stumbled upon the idea of an Elimination Back-off Stack promising to be a parallel, linearizable and lock-free stack. In case you are not familiar with it, I would suggest either reading my previous post or the corresponding paper [1] itself. Being quite intrigued by the ideas of the above stack I wrote my own implementation in Rust with a little help from crossbeam. ...

Elimination back-off stack

Reading The Art of Multiprocessor Programming [1] I came across the Elimination Back-off Stack [2] datastructure introduced in 2004 by Danny Hendler, Nir Shavit, and Lena Yerushalmi. It promises to be a parallel lock-free stack. How can a stack allow parallel operations without going through a single serialization point, e.g. a Mutex or an Atomic? Let’s dive into it. A lock-free stack A lock-free stack, also often referred to as a Treiber stack [3] due to Kent Treiber, operates on top of a lock-free linked list. The entry point of the stack is an atomic pointer which is either pointing to the node on the very top of the stack or is null in case of an empty stack. ...

Leaving the Prometheus team

In January 2017 I joined the company CoreOS as a test-engineer helping the monitoring team and the rkt container engine team write reliable software. Eventually I joined the CoreOS’ monitoring team full-time as a software engineer and ultimately was invited to be part of the upstream Prometheus team due to my contributions to the Alertmanager sub-project. Over the next 2 years and 4 month I worked a lot on Alertmanager, e.g. writing parts of its Elm-based UI and introducing API v2, a bit on Prometheus itself and a bunch within the overall Prometheus and Kubernetes ecosystem e.g. on kube-state-metrics. In addition being part of the open source project enabled me to speak at many conferences and meetups. ...

23rd Distributed Systems Paper Club

At the end of the previous session one of us suggested to dive into congestion control algorithms. This has found a greater echo, thus the 23rd session covered congestion control algorithms in general and TCP’s Reno as well as TCP’s Tahoe in particular. This weeks reading was: Chapter 13 “TCP Reno and Congestion Management” from the comprehensive online book “An Introduction to Computer Networks” [1] from the Loyola University Chicago. ...

22nd Distributed Systems Paper Club

In the 22nd session we took a look at io_uring - a new Kernel interface for asynchronous I/O. Tyler, who is currently implementing an io_uring library in Rust [4] for his database sled [7] guided us through the concepts as well as a bunch of source code. Tyler started off introducing the status quo of I/O interfaces within the Linux Kernel like read, pread and preadv, jumped over to asynchronous I/O like aio and eventually helped us develop a sense of what the perfect asynchronous I/O interface of the future could look like. For alll of this he used Jens Axboe’s slides from a Kernel Recipes 2019 talk [1]. ...