This is one of the most interesting papers I have read so far in this class. The problem of reproducing bugs is one that I face on a regular basis at work. The code that I write runs on a card (a “small computer”) embedded in a big server and is responsible for driving I/O operations between the server and storage over a Fibre Channel network. We drive on average 100,000 I/O operations per second. Concurrency and timing bugs are common, and reproducing them is painful and not always possible. We have been working for years to improve our ability to reproduce problems, but external timing conditions (on the server, network, and storage) make it even more challenging. The system that we currently have mainly consists of logging a huge amount of data so that the error can be reproduced. However, this adds significant overhead and is only used during development and test. It is impossible to add so many traces to production code because customers cannot tolerate the performance impact.
The paper presents the idea of recording only partial information (sketches) during production and using an intelligent replayer during debugging. The sketches guide the replayer toward reproducing the bug. The tool takes a different approach from most bug reproduction tools. The main goal here is to reproduce the bug by finding some combination that leads to it, not to reproduce the exact execution path and states that originally led to it. This is a great approach because different execution paths and states can lead to the same bug, so it increases the chances of reproducing the bug with fewer replays.
Many reproductions are usually necessary for the programmer to find the root cause of a complex bug. So, one of the major advantages of this tool is that it guarantees reliable reproduction of the bug after the first successful reproduction.
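
To make the idea concrete for myself, here is a minimal toy sketch in Python. It is not the paper's implementation: the two-thread counter, the `first_thread` sketch field, and the function names `run`, `candidate_schedules`, and `replay_from_sketch` are all assumptions I made up for illustration. Production records only a tiny hint (which thread ran first), the replayer tries interleavings consistent with that hint until the lost-update bug shows up, and the full schedule it finds can then be replayed deterministically as many times as needed.

```python
from itertools import permutations

def run(schedule):
    """Run a toy program under an explicit schedule and return the final counter.

    Two threads (ids 0 and 1) each do a non-atomic increment:
    step 0 reads the shared counter, step 1 writes back the value read plus one.
    `schedule` is a sequence of thread ids, one entry per executed step.
    """
    counter = 0
    local = {}            # value each thread read
    step = {0: 0, 1: 0}   # next step index per thread
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = counter        # read
        else:
            counter = local[tid] + 1    # write
        step[tid] += 1
    return counter

def candidate_schedules(sketch):
    """Yield each distinct interleaving that agrees with the recorded sketch."""
    seen = set()
    for s in permutations([0, 0, 1, 1]):
        if s in seen:
            continue
        seen.add(s)
        # Hypothetical sketch: only the id of the thread that ran first.
        if s[0] == sketch["first_thread"]:
            yield s

def replay_from_sketch(sketch, is_bug):
    """Search schedules consistent with the sketch until the bug reproduces.

    Returns (schedule, attempts), or (None, attempts) if nothing matched.
    """
    attempts = 0
    for s in candidate_schedules(sketch):
        attempts += 1
        if is_bug(run(s)):
            return s, attempts
    return None, attempts

def lost_update(final_counter):
    """The bug: two increments should leave the counter at 2."""
    return final_counter != 2

if __name__ == "__main__":
    sketch = {"first_thread": 0}   # the only thing logged "in production"
    schedule, attempts = replay_from_sketch(sketch, lost_update)
    print("buggy schedule:", schedule, "found after", attempts, "attempt(s)")
    # Once the full schedule is known, every replay hits the bug again.
    print("replays:", [run(schedule) for _ in range(3)])
```

The sketch narrows the search but does not pin down one execution, which is the point: any schedule that triggers the bug is good enough, and once one is found it is kept as the full record for later debugging runs.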