Our paper titled “Toward Online Testing of Federated and Heterogeneous Distributed Systems” will appear at the 2011 USENIX Annual Technical Conference (USENIX ATC ’11).
In this paper, we argue that the reliability of distributed systems should be improved by proactively identifying potential faults through online testing. We propose an approach called DiCE that continuously and automatically explores the system's behavior to check whether it deviates from its desired behavior. The paper first outlines our vision and the problem we want to tackle, and then describes our experience integrating DiCE with an open-source BGP router. We evaluate how quickly DiCE detects origin misconfiguration (popularly known as ‘prefix hijacking’), a recurring operator mistake that causes Internet-wide outages. Perhaps the most (in)famous instance is the 2008 hijacking of YouTube's address space.
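Origin misconfiguration can be illustrated with a simple check. An announcement is suspect when it covers address space that overlaps a prefix known to belong to a different origin AS. The Python sketch below shows that property in isolation. It is not DiCE's implementation, which explores the router's behavior rather than checking one announcement at a time. The ownership table and the function `origin_conflicts` are hypothetical. The prefixes and AS numbers come from the documentation ranges (RFC 5737 and RFC 5398), so they do not point at real networks.

```python
import ipaddress

# Hypothetical ownership table: prefix -> origin AS expected to announce it.
# Documentation-range prefixes and AS numbers, for illustration only.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def origin_conflicts(prefix, origin_as, expected=EXPECTED_ORIGINS):
    """Return the known prefixes that an announcement of `prefix` by
    `origin_as` would conflict with: prefixes that overlap the announced
    one (equal, more-specific, or less-specific) but belong to another AS."""
    announced = ipaddress.ip_network(prefix)
    conflicts = []
    for owned, owner in expected.items():
        if owned.version != announced.version:
            continue  # IPv4 and IPv6 prefixes never overlap
        if announced.overlaps(owned) and origin_as != owner:
            conflicts.append(owned)
    return conflicts


if __name__ == "__main__":
    # A more-specific /25 announced by an unrelated AS: flagged.
    print(origin_conflicts("192.0.2.128/25", 64511))
    # The legitimate owner announcing its own prefix: no conflict.
    print(origin_conflicts("198.51.100.0/24", 64497))
```

Flagging overlapping, not just identical, prefixes matters because routers prefer the longest matching prefix. A bogus more-specific announcement can therefore attract traffic even while the legitimate, shorter prefix is still being announced.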