We are very happy to announce that Hamid Ghasemirahni successfully defended his PhD thesis (second and final one in the ERC ULTRA project) on November 18, 2024! Marco Chiesa has done a superb job as a co-advisor, and we are very grateful to Prof. Gerald Q. Maguire Jr. for his stellar insights (as usual). Gábor Rétvári was the opponent at the defense, while Paris Carbone served as the Chair. Hamid’s thesis is available online:
In short, the thesis includes the work on Reframer, which shows the surprising result that deliberately delaying packets can improve the performance of backend servers (e.g., those used for Network Function Virtualization) by up to about a factor of 2. It also includes FAJITA, which shows that a commodity server running a chain of stateful network functions can process more than 170 M packets per second (the equivalent of 1.4 Tbps if payloads are stored in a disaggregated fashion, as in our earlier Ribosome work [NSDI ’23])!
A few images from the defense and the celebration are below.
Hamid presenting during the defense (image taken by Dejan Kostic).
Paris congratulates Hamid on the successfully defended PhD thesis (image taken by Dejan Kostic).
Dejan hands the traditional gift to Hamid (image taken by Voravit Tanyingyong).
Group image with colleagues and Dejan (image taken by Voravit Tanyingyong).
We are happy to announce that Massimo Girondi successfully defended his licentiate thesis (a licentiate is a degree at KTH, halfway to a PhD)! Marco Chiesa has done an excellent job as a co-advisor and, as is customary, we are very grateful to Prof. Gerald Q. Maguire Jr. for his key insights. Giuseppe Siracusano was a superb opponent at the licentiate seminar, with Amir Payberah as the examiner. Massimo’s thesis (the second licentiate thesis of this project) is available online:
Group shot of Networked Systems Laboratory members; Massimo is beneath the KTH logo (image taken by Voravit Tanyingyong).
Dejan hands the gift to Massimo a few weeks later in the hallway that Massimo chose for the shot. Definitely looks better than the opposite side we used in the past! (image taken by Voravit Tanyingyong)
Can networking applications achieve suitable performance with IOMMU at high rates? Our recent PeerJ CS article answers this question by characterizing the performance implications of IOMMU and its cache (IOTLB) on recent Intel Xeon Scalable and AMD EPYC processors at 200 Gbps. Our study shows that enabling IOMMU at high rates can reduce throughput by up to 20 percent due to excessive IOTLB misses. Moreover, we present potential mitigation techniques that recover the throughput lost to the “IOTLB wall” by using hugepage-backed buffers in the Linux kernel. This is joint work with Alireza Farshin (KTH), Luigi Rizzo (Google), Khaled Elmeleegy (Google), and Dejan Kostic (KTH). Follow the links for PDF and code.
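As a rough illustration of why hugepage-backed buffers help, the sketch below counts how many page translations are needed to cover a buffer pool with 4 KiB pages versus 2 MiB hugepages. Fewer translations means less pressure on the IOTLB. The 64 MiB pool size and the `pages_needed` helper are hypothetical, chosen only for this back-of-the-envelope calculation; they are not taken from the article.

```python
def pages_needed(buffer_bytes: int, page_bytes: int) -> int:
    """Number of pages (and hence distinct address translations) covering a buffer."""
    return -(-buffer_bytes // page_bytes)  # ceiling division


KIB = 1024
MIB = 1024 * KIB

# Hypothetical buffer pool size, assumed only for illustration.
pool = 64 * MIB

small = pages_needed(pool, 4 * KIB)  # regular 4 KiB pages
huge = pages_needed(pool, 2 * MIB)   # 2 MiB hugepages

print(f"4 KiB pages: {small} translations")
print(f"2 MiB pages: {huge} translations")
print(f"reduction:   {small // huge}x")
```

Under these assumptions, the same pool needs 512 times fewer translations with 2 MiB hugepages, which is the intuition behind the mitigation; the actual gains measured in the article depend on the hardware and workload.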
At NSDI ’22, Waleed presented our RedN paper, which shows a surprising result: Remote Direct Memory Access (RDMA), as implemented in widely deployed RDMA Network Interface Cards, is Turing complete. We leverage this finding to reduce the tail latency of services running on busy servers by 35x! The full abstract is below. This is joint work with Waleed Reda, Marco Canini (KAUST), Dejan Kostić, and Simon Peter (UW).
It is becoming increasingly popular for distributed systems to exploit offload to reduce load on the CPU. Remote Direct Memory Access (RDMA) offload, in particular, has become popular. However, RDMA still requires CPU intervention for complex offloads that go beyond simple remote memory access. As such, the offload potential is limited and RDMA-based systems usually have to work around such limitations.
We present RedN, a principled, practical approach to implementing complex RDMA offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with a commodity RDMA NIC. We show how to integrate these RDMA chains into applications, such as the Memcached key-value store, allowing us to offload complex tasks such as key lookups. RedN can reduce the latency of key-value get operations by up to 2.6× compared to state-of-the-art KV designs that use one-sided RDMA primitives (e.g., FaRM-KV), as well as traditional RPC-over-RDMA approaches. Moreover, compared to these baselines, RedN provides performance isolation and, in the presence of contention, can reduce latency by up to 35× while providing applications with failure resiliency to OS and process crashes.
We are hugely honored that our “Packet Order Matters!” paper received the Community Award at NSDI 2022! More details are available in our earlier post.