Don't Panic! Better, Fewer, Syntax Errors for LR Parsers
Syntax errors are generally easy for humans to fix, but not for parsers in general, and LR parsers in particular. Traditional ‘panic mode’ error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drowns out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce an algorithm and implementation that address these issues. First, we report the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, we are able to repair 98.38%±0.018% of files within a cut-off of 0.5s. Finally, we use the existence of the complete set of minimum cost repair sequences to reduce one of the most frustrating consequences of error reporting: the cascading error problem. Across our corpus, we report 435,824±480 error locations to the user, while the panic mode algorithm reports 981,628±0 error locations: in other words, we reduce the cascading error problem by well over half.
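To make the phrase "complete set of minimum cost repair sequences" concrete, the following is a minimal sketch on a toy language of balanced parentheses. It is not the algorithm introduced in this paper: the function name `min_cost_repairs`, the `max_cost` parameter, the unit cost for each insert or delete, and the brute-force search with a growing cost budget are all assumptions made purely for illustration.

```python
from typing import List, Tuple

Repair = Tuple[str, ...]  # one repair sequence, e.g. ("Insert (", "Shift )")


def min_cost_repairs(tokens: List[str], max_cost: int = 3) -> List[Repair]:
    """Return every minimum-cost repair sequence for a toy balanced-parens language.

    Hypothetical illustration only: inserts and deletes cost 1, shifts cost 0.
    Returns an empty list if no repair exists within max_cost.
    """
    for tok in tokens:
        if tok not in ("(", ")"):
            raise ValueError(f"unknown token: {tok!r}")

    def search(i: int, depth: int, budget: int, path: Repair, out: List[Repair]) -> None:
        # Accept: all input consumed and every "(" has been closed.
        if i == len(tokens) and depth == 0:
            out.append(path)
            return
        # Shift the next input token if the toy parser can accept it (cost 0).
        if i < len(tokens):
            if tokens[i] == "(":
                search(i + 1, depth + 1, budget, path + ("Shift (",), out)
            elif depth > 0:
                search(i + 1, depth - 1, budget, path + ("Shift )",), out)
        if budget == 0:
            return
        # Delete the next input token (cost 1).
        if i < len(tokens):
            search(i + 1, depth, budget - 1, path + (f"Delete {tokens[i]}",), out)
        # Insert a token the toy parser can accept in this state (cost 1).
        search(i, depth + 1, budget - 1, path + ("Insert (",), out)
        if depth > 0:
            search(i, depth - 1, budget - 1, path + ("Insert )",), out)

    # Grow the budget one step at a time; the first budget that yields any
    # repair is the minimum cost, and every repair found there has that cost.
    for cost in range(max_cost + 1):
        found: List[Repair] = []
        search(0, 0, cost, (), found)
        if found:
            return found
    return []


if __name__ == "__main__":
    for repair in min_cost_repairs([")"]):
        print(", ".join(repair))
```

For the single-token input `)`, the sketch finds two repairs of cost 1: delete the stray `)`, or insert a `(` before it. Neither is obviously the one the programmer intended, which is the motivation for reporting the whole set rather than an arbitrary member of it.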