Know your latencies

I find it helpful to know the orders of magnitude by which certain computer operations differ. It is certainly not worth the effort to pay attention to every digit or to learn these by heart, especially since they differ (slightly) across systems, but having a basic understanding of how tiny a fraction of time a CPU cycle occupies compared to sending a TCP packet is incredibly helpful when reasoning about system performance....

June 19, 2020 · Max Inden

26th DistSys Reading Group - Cache coherence

We have long been planning to cover the caching mechanisms in CPUs. As a shared knowledge base for the discussions in this session we chose the following two articles by Martin Thompson, who is known, among other things, for his work on the LMAX Disruptor: CPU Cache Flushing Fallacy, which includes a good overview of the different caches in modern Intel CPUs, and Write Combining, which exemplifies the advanced mechanisms one can find in today’s CPUs and how one can make use of them....

May 18, 2020 · Max Inden

25th DistSys Reading Group - Fair queuing

In today's session we covered Madhavapeddi Shreedhar and George Varghese's paper “Efficient fair queuing using deficit round-robin” [1]. While the session was not so much about the relatively simple algorithmic details of deficit round-robin (still worth checking out), we talked about its benefits over basic FIFO queuing and thus its impact on congestion-controlled traffic (TCP) compared to traffic without congestion control (UDP); its wide deployment, still seen today; and its derivatives DRR+ and DRR++, which can handle both best-effort and latency-critical flows....

April 27, 2020 · Max Inden

24th DistSys Reading Group - BBR Congestion-Based Congestion Control

After a bit of a break due to the current pandemic we decided to carry on and continue our meetings as virtual calls. Ignoring the usual initial hiccups and the missing whiteboard, the medium worked well for us. The reading for this session was the ACM Queue article BBR: Congestion-Based Congestion Control [1], as well as the Dropbox article Evaluating BBRv2 on the Dropbox Edge Network [2]. We started off with a quick recap of the previous session, covering why we need congestion control, how one can view a multi-hop connection as a single-hop connection with a single bottleneck, and most importantly the fact that the Internet is the largest distributed system that most of the time “just works” due to congestion control....

April 6, 2020 · Max Inden

Elimination back-off stack performance

I recently stumbled upon the idea of an Elimination Back-off Stack, which promises to be a parallel, linearizable and lock-free stack. In case you are not familiar with it, I would suggest reading either my previous post or the corresponding paper [1] itself. Being quite intrigued by the ideas behind this stack, I wrote my own implementation in Rust with a little help from crossbeam. In this post I will compare my implementation to other stack implementations....

April 1, 2020 · Max Inden

Elimination back-off stack

While reading The Art of Multiprocessor Programming [1], I came across the Elimination Back-off Stack [2], a data structure introduced in 2004 by Danny Hendler, Nir Shavit, and Lena Yerushalmi. It promises to be a parallel lock-free stack. How can a stack allow parallel operations without going through a single serialization point, e.g. a Mutex or an Atomic? Let’s dive into it. A lock-free stack, also often referred to as a Treiber stack [3] after Kent Treiber, operates on top of a lock-free linked list....

March 28, 2020 · Max Inden

Leaving the Prometheus team

In January 2017 I joined the company CoreOS as a test engineer, helping the monitoring team and the rkt container engine team write reliable software. Eventually I joined CoreOS’ monitoring team full-time as a software engineer and ultimately was invited to be part of the upstream Prometheus team due to my contributions to the Alertmanager sub-project. Over the next 2 years and 4 months I worked a lot on Alertmanager, e....

March 14, 2020 · Max Inden

23rd Distributed Systems Paper Club

At the end of the previous session one of us suggested diving into congestion control algorithms. The idea resonated with the group, so the 23rd session covered congestion control algorithms in general and TCP’s Reno as well as TCP’s Tahoe in particular. This week’s reading was: Chapter 13 “TCP Reno and Congestion Management” from the comprehensive online book “An Introduction to Computer Networks” [1] from Loyola University Chicago....

February 18, 2020 · Max Inden

22nd Distributed Systems Paper Club

In the 22nd session we took a look at io_uring, a new kernel interface for asynchronous I/O. Tyler, who is currently implementing an io_uring library in Rust [4] for his database sled [7], guided us through the concepts as well as a bunch of source code. Tyler started off by introducing the status quo of I/O interfaces within the Linux kernel such as read, pread and preadv, moved on to asynchronous I/O interfaces such as aio, and eventually helped us develop a sense of what the perfect asynchronous I/O interface of the future could look like....

January 28, 2020 · Max Inden

21st Distributed Systems Paper Club

We started the new year with a session on epidemic / gossip protocols. To decide what to read I compiled the following list of papers that I either enjoyed reading in the past or that were recommended to me. The SWIM (scalable failure detection and membership protocol) paper won the poll. Das, Abhinandan, Indranil Gupta, and Ashish Motivala. “Swim: Scalable weakly-consistent infection-style process group membership protocol.” Proceedings International Conference on Dependable Systems and Networks....

January 24, 2020 · Max Inden