31st DistSys Reading Group - BGP 1

We decided to turn our interest to BGP which we will devote 3 sessions to. In today’s session - the first one - we introduced BGP, looked at the convergence problem, as well as the solution suggested in the paper below. Gao, Lixin, and Jennifer Rexford. “Stable Internet routing without global coordination.” IEEE/ACM Transactions on networking 9.6 (2001): 681-692. To play around with BGP as well as general Internet routing: ...

February 24, 2021 · Max Inden

30th DistSys Reading Group - NTP

What better way to start a new year than with a paper discussing how to change time? In the 30th session we discussed a paper which I think has much up its sleeves - Attacking the Network Time Protocol. First off the paper gives us a good introduction to the inner working of the network time protocol. Next up it examines the broader ecosystem as well as why we need accurate time in the first place. Once we established enough background, the paper dives into how one can attack the protocol, starting off with on-path attacks all the way to some crazy (creative) off-path attacks. Last but not least, the list of references at the end is a small treasure trove. ...

January 26, 2021 · Max Inden

28th DistSys Reading Group - Hotstuff

With the 28th session we jumped into the space of byzantine fault tolerant consensus protocols. We covered fault tolerant consensus with various Paxos variants in the past, but this session was the first one looking into how to solve the byzantine generals problem. Instead of using PBFT [1] as a first paper we went with Hotstuff [2] instead. The reasoning behind this choice was (a) Hotstuff presenting a somewhat easy up-to-date consensus algorithm and (b) that it provides a framework enabling one to compare other algorithms (e.g. PBFT) in the space. ...

September 8, 2020 · Max Inden

Know your latencies

I find it helpful to know the orders of magnitude by which certain computer operations differ. Certainly it is not worth the effort to pay attention to every digit or learn these by heart, especially since they differ (slightly) across systems, but having a basic understanding of what a tiny fraction of time a CPU cycle occupies compared to sending a TCP packet is incredibly helpful whenever reasoning about systems performance. ...

June 19, 2020 · Max Inden

26th DistSys Reading Group - Cache coherence

We have long been planning to cover the caching mechanisms in CPUs. As a shared knowledge base for the discussions in this session we chose the following two articles by Martin Thompson among other things known for his work on the LMAX Disruptor: CPU Cache Flushing Fallacy including a good overview over the different caches in modern Intel CPUs. Write Combining exemplifying the advanced mechanisms one can find in today’s CPUs and how one can make use of them. ...

May 18, 2020 · Max Inden

25th DistSys Reading Group - Fair queuing

In the session today we covered Madhavapeddi Shreedhar and George Varghese paper “Efficient fair queuing using deficit round-robin” [1]. While the session was not so much about the relatively simple algorithmic details of deficit-round-robin (still worth checking out) we talked about: Its benefits over basic FIFO queuing and thus its impact for congestion controlled traffic (tcp) compared to not congestion controlled traffic (udp). Its wide deployment still seen today. Its derivatives DRR+ and DRR++ being able to handle both best-effort as well as latency critical flows. ...

April 27, 2020 · Max Inden

24th DistSys Reading Group - BBR Congestion-Based Congestion Control

After a bit of a break due to current pandemic we decided to carry on and continue our meetings as virtual calls. Ignoring the usual initial hiccups and the missing whiteboard the medium worked well for us. Topic and reading of this session was the ACM Queue article BBR: Congestion-Based Congestion Control [1], as well as the Dropbox article Evaluating BBRv2 on the Dropbox Edge Network [2]. We started off with a quick recap of the previous session covering why we need congestion control, how one can view a multi-hop connection as a single hop connection with a single bottleneck and most importantly the fact that the Internet is the largest distributed system that most of the time “just works” due to congestion control. ...

April 6, 2020 · Max Inden

Elimination back-off stack performance

I recently stumbled upon the idea of an Elimination Back-off Stack promising to be a parallel, linearizable and lock-free stack. In case you are not familiar with it, I would suggest either reading my previous post or the corresponding paper [1] itself. Being quite intrigued by the ideas of the above stack I wrote my own implementation in Rust with a little help from crossbeam. ...

April 1, 2020 · Max Inden

Elimination back-off stack

Reading The Art of Multiprocessor Programming [1] I came across the Elimination Back-off Stack [2] datastructure introduced in 2004 by Danny Hendler, Nir Shavit, and Lena Yerushalmi. It promises to be a parallel lock-free stack. How can a stack allow parallel operations without going through a single serialization point, e.g. a Mutex or an Atomic? Let’s dive into it. A lock-free stack A lock-free stack, also often referred to as a Treiber stack [3] due to Kent Treiber, operates on top of a lock-free linked list. The entry point of the stack is an atomic pointer which is either pointing to the node on the very top of the stack or is null in case of an empty stack. ...

March 28, 2020 · Max Inden

23rd Distributed Systems Paper Club

At the end of the previous session one of us suggested to dive into congestion control algorithms. This has found a greater echo, thus the 23rd session covered congestion control algorithms in general and TCP’s Reno as well as TCP’s Tahoe in particular. This weeks reading was: Chapter 13 “TCP Reno and Congestion Management” from the comprehensive online book “An Introduction to Computer Networks” [1] from the Loyola University Chicago. ...

February 18, 2020 · Max Inden