ANTLR (ANother Tool for Language Recognition) is a parser generator that builds a language recognizer from a grammar you provide. Both ANTLR 3 and ANTLR 4 serve this purpose, but they differ fundamentally in their parsing strategy, in how much manual grammar restructuring they require, and in how they separate grammar logic from application code.

ANTLR 3

  • Parsing algorithm: Uses a static, greedy LL(*) algorithm that analyzes the grammar and fixes its lookahead decisions when the parser is generated. Developers had to restructure grammars by hand, for example by removing left recursion, before the tool would accept them. Auto-backtracking was available as an option for grammars too complex for the static LL(*) analysis.
  • Grammar development: Allowed embedding custom code, known as “actions,” directly inside the grammar rules. This could mix parsing logic with business logic and was useful for building Abstract Syntax Trees (ASTs).
  • Abstract Syntax Tree (AST) construction: Supported building custom ASTs with specific tree rewrite rules (->) and AST operators (^, !) within the grammar. This gave developers direct control over the structure of the output tree.
  • Tree grammars: Supported tree grammars, a separate kind of grammar that matches and processes the custom ASTs built by the parser.
  • IDE support: Was accompanied by ANTLRWorks, a graphical development environment that included an editor, an interpreter for rapid prototyping, and a debugger.
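
To make the manual left-recursion fix concrete: a left-recursive rule such as expr : expr '+' term | term ; had to be rewritten by hand into a loop form such as expr : term ('+' term)* ; before a static LL tool could use it. The sketch below is a hand-written Python recognizer for that loop form, not ANTLR-generated code. The names tokenize, Parser, and parse are hypothetical, and the nested tuples only approximate the operator-rooted trees that the ^ operator would produce.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer for the toy grammar: integers and '+'.
    # Anything else (including whitespace) is silently skipped.
    return re.findall(r"\d+|\+", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr : term ('+' term)* ;
        # The loop replaces the left-recursive alternative expr '+' term.
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            right = self.term()
            node = ("+", node, right)  # the operator becomes the subtree root
        return node

    def term(self):
        # term : INT ;
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected an integer at token index {self.pos}")
        self.pos += 1
        return int(tok)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1+2+3"))
```

Because the loop folds each new operand into the tree built so far, the result stays left-associative even though the grammar no longer contains left recursion.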

ANTLR 4

  • Parsing algorithm: Uses the adaptive LL(*) algorithm, also written ALL(*). It computes the necessary lookahead dynamically at runtime rather than during parser generation, and it accepts most grammars as written. Rules with direct left recursion are rewritten automatically by the tool; indirect left recursion is still not supported. This simplifies grammar development by eliminating many manual adjustments.
  • Automatic parse tree generation: Automatically generates a complete parse tree (also known as a concrete syntax tree) from the input. Unlike ANTLR 3, it does not build ASTs directly.
  • Listeners and visitors: Generates listener and visitor interfaces for walking the parse tree. This encourages a separation of concerns, where grammar rules focus on syntax and external code (the listeners or visitors) processes the parse results. This approach replaces ANTLR 3’s tree grammars and AST rewrite rules; embedded actions are still available but are no longer the preferred mechanism.
  • Faster parser generation: Runtime performance varies by grammar and input, but the tool itself generally generates parsers faster than ANTLR 3 because it skips the static LL(*) grammar analysis that ANTLR 3 performed at generation time.
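
The difference between the two traversal styles can be sketched without the ANTLR runtime. In the Python example below, Node, walk, IntCollector, and Evaluator are hypothetical stand-ins; a real ANTLR 4 project would use the listener and visitor classes generated for its own grammar, whose names and signatures differ. The listener is driven by a walker and only reacts to enter and exit events, while the visitor controls the traversal itself and returns a value.

```python
class Node:
    """Hypothetical parse-tree node: a rule name plus children.

    Children are either Node instances or plain strings (token text).
    """

    def __init__(self, rule, *children):
        self.rule = rule
        self.children = children


def walk(listener, node):
    """Depth-first walk that fires enter_<rule> / exit_<rule> callbacks."""
    if isinstance(node, str):
        return  # tokens carry no callbacks in this sketch
    getattr(listener, "enter_" + node.rule, lambda n: None)(node)
    for child in node.children:
        walk(listener, child)
    getattr(listener, "exit_" + node.rule, lambda n: None)(node)


class IntCollector:
    """Listener style: passive callbacks, state kept on the object."""

    def __init__(self):
        self.ints = []

    def exit_int(self, node):
        self.ints.append(node.children[0])


class Evaluator:
    """Visitor style: each method decides which children to visit and returns a result."""

    def visit(self, node):
        return getattr(self, "visit_" + node.rule)(node)

    def visit_add(self, node):
        left, _plus, right = node.children
        return self.visit(left) + self.visit(right)

    def visit_int(self, node):
        return int(node.children[0])


# Hand-built tree for "1+2+3", grouped to the left.
tree = Node(
    "add",
    Node("add", Node("int", "1"), "+", Node("int", "2")),
    "+",
    Node("int", "3"),
)

collector = IntCollector()
walk(collector, tree)
print(collector.ints)
print(Evaluator().visit(tree))
```

Neither class touches the grammar, which is the separation of concerns the listener and visitor mechanisms are meant to encourage.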

ANTLR 3 vs. ANTLR 4: A comparison

Feature | ANTLR 3 | ANTLR 4
--------|---------|--------
Parsing Strategy | Static LL(*) with optional backtracking | Adaptive LL(*) that dynamically determines lookahead
Left Recursion | Cannot handle left-recursive rules directly; must be factored out | Automatically handles direct left-recursion rules
Output Tree | Builds custom Abstract Syntax Trees (ASTs) via rewrite rules | Builds a generic Parse Tree by default
Grammar Actions | Can embed custom code directly within grammar rules | Discourages embedded actions, favoring listeners and visitors
Tree Traversal | Uses separate tree grammars for processing ASTs | Uses auto-generated listener and visitor interfaces for walking the parse tree
Parser Generation Speed | Slower, requiring static DFA table computation | Much faster, as it avoids static analysis during generation
Complexity | More complex to write robust grammars due to manual factoring and lookahead management | Simpler grammar development, especially for left-recursive and ambiguous rules
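
The "Left Recursion" row deserves a short illustration. ANTLR 4 accepts a rule such as expr : expr '*' expr | expr '+' expr | INT ; and rewrites it internally into a non-left-recursive form that tracks operator precedence. The Python sketch below shows precedence climbing, a loosely analogous hand-written technique; it is not the code ANTLR 4 generates, and PRECEDENCE, tokenize, and parse_expr are hypothetical names chosen for the example.

```python
import re

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "*": 2}


def tokenize(text):
    # Integers and the two operators; anything else is silently skipped.
    return re.findall(r"\d+|[+*]", text)


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens starting at pos and return (tree, next_pos).

    Only operators whose precedence is at least min_prec are consumed
    at this recursion level.
    """
    tree, pos = int(tokens[pos]), pos + 1  # primary: an integer literal
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # For left-associative operators the right operand may only
        # contain operators that bind strictly tighter than op.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        tree = (op, tree, right)
    return tree, pos


tree, _ = parse_expr(tokenize("1+2*3"))
print(tree)
```

The alternatives in the left-recursive rule are listed from highest to lowest precedence, which plays the role that the explicit precedence table plays in this sketch.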

ANTLR 4 is built with ANTLR 3

The ANTLR 4 tool uses ANTLR 3 and its runtime internally: the parser that reads your grammar files inside the tool was itself generated by ANTLR 3. ANTLR 4 is therefore not self-hosting, which would require that internal parser to be generated by ANTLR 4 itself. This approach is a form of bootstrapping, where a previous version of a compiler is used to build the next version.

Here are the key points regarding ANTLR 4’s bootstrapping process:

  • Built by ANTLR 3: The ANTLR 4 tool, which is a Java application, relies on a parser generated by the ANTLR 3 tool, together with the ANTLR 3 runtime, to read .g4 grammar files. There is no publicly developed project to make ANTLR 4 self-host.
  • Grammar differences: The ANTLR 4 grammar syntax differs significantly from ANTLR 3, most notably because ANTLR 4 handles direct left recursion automatically and no longer supports manual AST construction. Even so, the grammar that describes the ANTLR 4 grammar language is itself written for, and processed by, the older version of the tool.
  • Self-hosting is complex: While “eating your own dog food” is a common software development practice, bootstrapping a compiler with itself is a complex task. Building the new ANTLR 4 tool with the stable ANTLR 3 tool avoids the circular dependency and helps keep the build process stable.