Introduction

  • The Syntax Analysis phase is also called Parsing.

Definition

  • Syntax Analysis is the second layer/phase of a compiler, in which the parser analyzes the sequence of tokens produced by the lexical analyzer phase to determine whether the input program (created tokens) follows the grammar rules of a programming language or not. A parser also checks the syntax error (structure) of a program and constructs a parse tree, or syntax tree, or abstract syntax tree (AST) for later compiler semantic phases. 

Characteristics

  • Syntax Analysis is performed by a parser generator tool called a Parser/Syntax Analyzer. For example, the Yacc (Yet Another Compiler Compiler) tool.
  • It is the bridge between raw tokens (of the Lexical Analysis phase) and meaningful program structure(of the Semantic Analysis phase).

Roles or Functions of Parser

  • In detail, the function of Parser can be categorized into –
    • Syntax Checking
      • It receives tokens as input from the lexical analyzer phase and ensures that the program’s structure is syntactically or grammatically correct.
      • For example, it detects errors like missing semicolons, unmatched parentheses, incorrect statement order, etc.
    • Parse Tree Construction
      • It builds a hierarchical structure (called a parse tree) that represents the grammatical structure of the source code.
      • For example, for x = y + z; , the parser builds a tree showing = as the root, with x on the left and y + z on the right.
    • Error Detection and Recovery
      • It reports syntax errors when tokens don’t match grammar rules.
      • It implements strategies to recover and continue parsing (like skipping tokens or inserting missing symbols).
      • For example, if a ) is missing, the parser may insert it to continue analysis.
    • Interface Between Front-End and Middle-End
      • It provides a structured representation (AST) to the semantic analyzer and later phases.
      • It ensures that this compiler phase has a clear, tree-based representation of the program.

Types of Parser

  • A parser is the major component of a compiler(Syntax Analysis phase) that checks whether the sequence of tokens (from the lexical analyzer) follows the grammar rules of the programming language.
  • Parsers are broadly classified into two subcategories-
    • Top-Down Parsers
      • Top-down parsers are parsers that start constructing the parse tree from the root (the start symbol of the grammar) and expand it downwards toward the leaves by repeatedly applying production rules, predicting which production rule to use based on the next input token.
      • Top-Down Parsers are of two Subtypes-
        • Recursive Descent Parser:
          • A Recursive Descent Parser is a type of top-down parser that uses a set of recursive procedures, one for each non-terminal in the grammar, where each procedure attempts to match a portion of the input; however, this method may require backtracking if the wrong production is chosen.
          • It implements grammar rules as recursive procedures.
          • It is simple to code but may require backtracking.
        • Predictive Parser (LL Parser):
          • A Predictive Parser (LL Parser) is a refined form of a recursive descent parser that avoids backtracking by using lookahead tokens to decide which production rule to apply, but it requires the grammar to be left-factored and free of left recursion.
          • LL parsers are top-down parsers that scan input left to right and produce a leftmost derivation of the parse tree.
          • They are simple, efficient, and widely used for grammars that are suitable for predictive parsing, especially LL(1) grammars.
          • It is a form of recursive descent parser without backtracking.
          • It uses lookahead tokens to decide which production to apply in advance.
          • Its grammar must be left-factored and free of left recursion.
          • It is also called the LL parser, where L (first) – stands for Left-to-right scanning of the input, and L (second) – stands for Leftmost derivation of the parse tree. So, an LL parser is a parser that:-
            • Reads the input from left to right.
            • Constructs a leftmost derivation of the grammar.
          • There are several types of LL parsers, such as –
            • The LL(1) Parser is the most common type of LL parser, which uses 1 token to make parsing decisions.
            • The LL(k) Parser means the parser uses k lookahead tokens to make parsing decisions.
    • Bottom-Up Parsers
      • Bottom-Up parsers are parsers that start constructing the parse tree from the input tokens (leaves) and build the parse tree upward to the root by repeatedly applying production rules, predicting which production rule to use based on the next input token.
      • In other words, Bottom-Up Parsers are parsers that start from the input tokens (the leaves of the parse tree) and gradually reduce substrings of input into higher-level non-terminals until the start symbol(reached) is reached, thereby constructing the parse tree from the bottom up.  
      • There are the following types of Bottom-Up Parsers
        • Shift-Reduce Parser
          • A Shift-Reduce Parser is a type of bottom-up parser that uses a stack to hold symbols and performs two main operations: shifting the next input token onto the stack and reducing a sequence of symbols on the stack into a non-terminal according to grammar rules.
          • Here, Shift means push the next input token onto the stack, and Reduce means replace a sequence of stack symbols with a non-terminal using grammar rules.
        • LR Parser 
          • An LR Parser (Left-to-right, Rightmost derivation) is a powerful bottom-up parser that reads the input from left to right and constructs a rightmost derivation in reverse, making it suitable for handling a wide range of programming language grammars.
          • LR Parser has 3 Variants:-
            • SLR (Simple LR)
              • It is a simplified version of LR parsers that are easier to implement but less powerful, as they cannot handle all context-free grammars.
            • CLR (Canonical LR)
              • It is the most powerful form of LR parsers, because it is capable of handling complex grammars, but they are more difficult to construct and require larger parsing tables.
            • LALR (Lookahead LR)
              • This LR parsers balance between power and efficiency.
              • This LR parser is widely used in practice.
              • It is a practical compromise between SLR and Canonical LR parsers, combining efficiency with sufficient power.
              • They are widely used in real-world compiler construction tools such as Yacc.

Example

  • Suppose we have a C program: int x = 10 + 5;
  • The Lexical Analyzer Output would be a Tokens = int, x, =, 10, +, 5, ;
  • Now the Parser Role is:
    • Checks grammar first, i.e., check for valid declaration and assignment, and when found true, then builds AST/Parse Tree:
    Assignment (=)
       ├── Identifier (x)
       └── Addition (+)
            ├── Number (10)
            └── Number (5)

    Yacc 

    • Yacc (Yet Another Compiler Compiler) is a popular parser generator tool that automatically generates parsers for programming languages from a grammar specification written in a special file (usually ending in .y).
    • Yacc does the Syntax Analysis or Parsing (as in the above).
    • It si developed by Stephen C. Johnson at Bell Labs in the 1970s for Unix.
    • It produces efficient LALR(1) parsers, which are bottom-up shift-reduce parsers capable of handling a wide range of context-free grammars used in compilers.
    • It uses tokens generated by the Lex tool of the Lexical Analysis phase.
    • Lex performs lexical analysis, while Yacc performs syntax analysis; they work together but do not perform the same task.
    • This tool checks syntax (grammar rules), builds a parse tree, and detects syntax errors.
    • It outputs a C file (typically y.tab.c) containing the yyparse() function, parsing tables, and supporting code. 

    Loading


    0 Comments

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    This site uses Akismet to reduce spam. Learn how your comment data is processed.