aboutsummaryrefslogtreecommitdiffstats
path: root/gnuwin32/man/cat1p/yacc.1p.txt
diff options
context:
space:
mode:
Diffstat (limited to 'gnuwin32/man/cat1p/yacc.1p.txt')
-rw-r--r--gnuwin32/man/cat1p/yacc.1p.txt1269
1 files changed, 0 insertions, 1269 deletions
diff --git a/gnuwin32/man/cat1p/yacc.1p.txt b/gnuwin32/man/cat1p/yacc.1p.txt
deleted file mode 100644
index cab30089..00000000
--- a/gnuwin32/man/cat1p/yacc.1p.txt
+++ /dev/null
@@ -1,1269 +0,0 @@
-YACC(1P) POSIX Programmer's Manual YACC(1P)
-
-
-
-PROLOG
- This manual page is part of the POSIX Programmer's Man-
- ual. The Linux implementation of this interface may
- differ (consult the corresponding Linux manual page for
- details of Linux behavior), or the interface may not be
- implemented on Linux.
-
-NAME
- yacc - yet another compiler compiler (DEVELOPMENT)
-
-SYNOPSIS
- yacc [-dltv][-b file_prefix][-p sym_prefix] grammar
-
-DESCRIPTION
- The yacc utility shall read a description of a context-
- free grammar in grammar and write C source code, con-
- forming to the ISO C standard, to a code file, and
- optionally header information into a header file, in the
- current directory. The C code shall define a function
- and related routines and macros for an automaton that
- executes a parsing algorithm meeting the requirements in
- Algorithms .
-
- The form and meaning of the grammar are described in the
- EXTENDED DESCRIPTION section.
-
- The C source code and header file shall be produced in a
- form suitable as input for the C compiler (see c99 ).
-
-OPTIONS
- The yacc utility shall conform to the Base Definitions
- volume of IEEE Std 1003.1-2001, Section 12.2, Utility
- Syntax Guidelines.
-
- The following options shall be supported:
-
- -b file_prefix
- Use file_prefix instead of y as the prefix for
- all output filenames. The code file y.tab.c, the
- header file y.tab.h (created when -d is speci-
- fied), and the description file y.output (created
- when -v is specified), shall be changed to
- file_prefix .tab.c, file_prefix .tab.h, and
- file_prefix .output, respectively.
-
- -d Write the header file; by default only the code
- file is written. The #define statements associate
- the token codes assigned by yacc with the user-
- declared token names. This allows source files
- other than y.tab.c to access the token codes.
-
- -l Produce a code file that does not contain any
- #line constructs. If this option is not present,
- it is unspecified whether the code file or header
- file contains #line directives. This should only
- be used after the grammar and the associated
- actions are fully debugged.
-
- -p sym_prefix
-
- Use sym_prefix instead of yy as the prefix for
- all external names produced by yacc. The names
- affected shall include the functions yyparse(),
- yylex(), and yyerror(), and the variables yylval,
- yychar, and yydebug. (In the remainder of this
- section, the six symbols cited are referenced
- using their default names only as a notational
- convenience.) Local names may also be affected by
- the -p option; however, the -p option shall not
- affect #define symbols generated by yacc.
-
- -t Modify conditional compilation directives to per-
- mit compilation of debugging code in the code
- file. Runtime debugging statements shall always
- be contained in the code file, but by default
- conditional compilation directives prevent their
- compilation.
-
- -v Write a file containing a description of the
- parser and a report of conflicts generated by
- ambiguities in the grammar.
-
-
-OPERANDS
- The following operand is required:
-
- grammar
- A pathname of a file containing instructions,
- hereafter called grammar, for which a parser is
- to be created. The format for the grammar is
- described in the EXTENDED DESCRIPTION section.
-
-
-STDIN
- Not used.
-
-INPUT FILES
- The file grammar shall be a text file formatted as spec-
- ified in the EXTENDED DESCRIPTION section.
-
-ENVIRONMENT VARIABLES
- The following environment variables shall affect the
- execution of yacc:
-
- LANG Provide a default value for the internationaliza-
- tion variables that are unset or null. (See the
- Base Definitions volume of IEEE Std 1003.1-2001,
- Section 8.2, Internationalization Variables for
- the precedence of internationalization variables
- used to determine the values of locale cate-
- gories.)
-
- LC_ALL If set to a non-empty string value, override the
- values of all the other internationalization
- variables.
-
- LC_CTYPE
- Determine the locale for the interpretation of
- sequences of bytes of text data as characters
- (for example, single-byte as opposed to multi-
- byte characters in arguments and input files).
-
- LC_MESSAGES
- Determine the locale that should be used to
- affect the format and contents of diagnostic mes-
- sages written to standard error.
-
- NLSPATH
- Determine the location of message catalogs for
- the processing of LC_MESSAGES .
-
-
- The LANG and LC_* variables affect the execution of the
- yacc utility as stated. The main() function defined in
- Yacc Library shall call:
-
-
- setlocale(LC_ALL, "")
-
- and thus the program generated by yacc shall also be
- affected by the contents of these variables at runtime.
-
-ASYNCHRONOUS EVENTS
- Default.
-
-STDOUT
- Not used.
-
-STDERR
- If shift/reduce or reduce/reduce conflicts are detected
- in grammar, yacc shall write a report of those conflicts
- to the standard error in an unspecified format.
-
- Standard error shall also be used for diagnostic mes-
- sages.
-
-OUTPUT FILES
- The code file, the header file, and the description file
- shall be text files. All are described in the following
- sections.
-
- Code File
- This file shall contain the C source code for the
- yyparse() function. It shall contain code for the vari-
- ous semantic actions with macro substitution performed
- on them as described in the EXTENDED DESCRIPTION sec-
- tion. It also shall contain a copy of the #define state-
- ments in the header file. If a %union declaration is
- used, the declaration for YYSTYPE shall also be included
- in this file.
-
- Header File
- The header file shall contain #define statements that
- associate the token numbers with the token names. This
- allows source files other than the code file to access
- the token codes. If a %union declaration is used, the
- declaration for YYSTYPE and an extern YYSTYPE yylval
- declaration shall also be included in this file.
-
- Description File
- The description file shall be a text file containing a
- description of the state machine corresponding to the
- parser, using an unspecified format. Limits for internal
- tables (see Limits ) shall also be reported, in an
- implementation-defined manner. (Some implementations may
- use dynamic allocation techniques and have no specific
- limit values to report.)
-
-EXTENDED DESCRIPTION
- The yacc command accepts a language that is used to
- define a grammar for a target language to be parsed by
- the tables and code generated by yacc. The language
- accepted by yacc as a grammar for the target language is
- described below using the yacc input language itself.
-
- The input grammar includes rules describing the input
- structure of the target language and code to be invoked
- when these rules are recognized to provide the associ-
- ated semantic action. The code to be executed shall
- appear as bodies of text that are intended to be C-lan-
- guage code. The C-language inclusions are presumed to
- form a correct function when processed by yacc into its
- output files. The code included in this way shall be
- executed during the recognition of the target language.
-
- Given a grammar, the yacc utility generates the files
- described in the OUTPUT FILES section. The code file can
- be compiled and linked using c99. If the declaration and
- programs sections of the grammar file did not include
- definitions of main(), yylex(), and yyerror(), the com-
- piled output requires linking with externally supplied
- versions of those functions. Default versions of main()
- and yyerror() are supplied in the yacc library and can
- be linked in by using the -l y operand to c99. The yacc
- library interfaces need not support interfaces with
- other than the default yy symbol prefix. The application
- provides the lexical analyzer function, yylex(); the lex
- utility is specifically designed to generate such a rou-
- tine.
-
- Input Language
- The application shall ensure that every specification
- file consists of three sections in order: declarations,
- grammar rules, and programs, separated by double percent
- signs ( "%%" ). The declarations and programs sections
- can be empty. If the latter is empty, the preceding "%%"
- mark separating it from the rules section can be omit-
- ted.
-
- The input is free form text following the structure of
- the grammar defined below.
-
- Lexical Structure of the Grammar
- The <blank>s, <newline>s, and <form-feed>s shall be
- ignored, except that the application shall ensure that
- they do not appear in names or multi-character reserved
- symbols. Comments shall be enclosed in "/* ... */", and
- can appear wherever a name is valid.
-
- Names are of arbitrary length, made up of letters, peri-
- ods ( '.' ), underscores ( '_' ), and non-initial dig-
- its. Uppercase and lowercase letters are distinct. Con-
- forming applications shall not use names beginning in yy
- or YY since the yacc parser uses such names. Many of the
- names appear in the final output of yacc, and thus they
- should be chosen to conform with any additional rules
- created by the C compiler to be used. In particular they
- appear in #define statements.
-
- A literal shall consist of a single character enclosed
- in single-quotes ( '" ). All of the escape sequences
- supported for character constants by the ISO C standard
- shall be supported by yacc.
-
- The relationship with the lexical analyzer is discussed
- in detail below.
-
- The application shall ensure that the NUL character is
- not used in grammar rules or literals.
-
- Declarations Section
- The declarations section is used to define the symbols
- used to define the target language and their relation-
- ship with each other. In particular, much of the addi-
- tional information required to resolve ambiguities in
- the context-free grammar for the target language is pro-
- vided here.
-
- Usually yacc assigns the relationship between the sym-
- bolic names it generates and their underlying numeric
- value. The declarations section makes it possible to
- control the assignment of these values.
-
- It is also possible to keep semantic information associ-
- ated with the tokens currently on the parse stack in a
- user-defined C-language union, if the members of the
- union are associated with the various names in the gram-
- mar. The declarations section provides for this as well.
-
- The first group of declarators below all take a list of
- names as arguments. That list can optionally be pre-
- ceded by the name of a C union member (called a tag
- below) appearing within '<' and '>' . (As an exception
- to the typographical conventions of the rest of this
- volume of IEEE Std 1003.1-2001, in this case <tag> does
- not represent a metavariable, but the literal angle
- bracket characters surrounding a symbol.) The use of tag
- specifies that the tokens named on this line shall be of
- the same C type as the union member referenced by tag.
- This is discussed in more detail below.
-
- For lists used to define tokens, the first appearance of
- a given token can be followed by a positive integer (as
- a string of decimal digits). If this is done, the under-
- lying value assigned to it for lexical purposes shall be
- taken to be that number.
-
- The following declares name to be a token:
-
-
- %token [<tag>] name [number][name [number]]...
-
- If tag is present, the C type for all tokens on this
- line shall be declared to be the type referenced by tag.
- If a positive integer, number, follows a name, that
- value shall be assigned to the token.
-
- The following declares name to be a token, and assigns
- precedence to it:
-
-
- %left [<tag>] name [number][name [number]]...
- %right [<tag>] name [number][name [number]]...
-
- One or more lines, each beginning with one of these sym-
- bols, can appear in this section. All tokens on the same
- line have the same precedence level and associativity;
- the lines are in order of increasing precedence or bind-
- ing strength. %left denotes that the operators on that
- line are left associative, and %right similarly denotes
- right associative operators. If tag is present, it shall
- declare a C type for names as described for %token.
-
- The following declares name to be a token, and indicates
- that this cannot be used associatively:
-
-
- %nonassoc [<tag>] name [number][name [number]]...
-
- If the parser encounters associative use of this token
- it reports an error. If tag is present, it shall declare
- a C type for names as described for %token.
-
- The following declares that union member names are non-
- terminals, and thus it is required to have a tag field
- at its beginning:
-
-
- %type <tag> name...
-
- Because it deals with non-terminals only, assigning a
- token number or using a literal is also prohibited. If
- this construct is present, yacc shall perform type
- checking; if this construct is not present, the parse
- stack shall hold only the int type.
-
- Every name used in grammar not defined by a %token,
- %left, %right, or %nonassoc declaration is assumed to
- represent a non-terminal symbol. The yacc utility shall
- report an error for any non-terminal symbol that does
- not appear on the left side of at least one grammar
- rule.
-
- Once the type, precedence, or token number of a name is
- specified, it shall not be changed. If the first decla-
- ration of a token does not assign a token number, yacc
- shall assign a token number. Once this assignment is
- made, the token number shall not be changed by explicit
- assignment.
-
- The following declarators do not follow the previous
- pattern.
-
- The following declares the non-terminal name to be the
- start symbol, which represents the largest, most general
- structure described by the grammar rules:
-
-
- %start name
-
- By default, it is the left-hand side of the first gram-
- mar rule; this default can be overridden with this dec-
- laration.
-
- The following declares the yacc value stack to be a
- union of the various types of values desired:
-
-
- %union { body of union (in C) }
-
- By default, the values returned by actions (see below)
- and the lexical analyzer shall be of type int. The yacc
- utility keeps track of types, and it shall insert corre-
- sponding union member names in order to perform strict
- type checking of the resulting parser.
-
- Alternatively, given that at least one <tag> construct
- is used, the union can be declared in a header file
- (which shall be included in the declarations section by
- using a #include construct within %{ and %}), and a
- typedef used to define the symbol YYSTYPE to represent
- this union. The effect of %union is to provide the dec-
- laration of YYSTYPE directly from the yacc input.
-
- C-language declarations and definitions can appear in
- the declarations section, enclosed by the following
- marks:
-
-
- %{ ... %}
-
- These statements shall be copied into the code file, and
- have global scope within it so that they can be used in
- the rules and program sections.
-
- The application shall ensure that the declarations sec-
- tion is terminated by the token %%.
-
- Grammar Rules in yacc
- The rules section defines the context-free grammar to be
- accepted by the function yacc generates, and associates
- with those rules C-language actions and additional
- precedence information. The grammar is described below,
- and a formal definition follows.
-
- The rules section is comprised of one or more grammar
- rules. A grammar rule has the form:
-
-
- A : BODY ;
-
- The symbol A represents a non-terminal name, and BODY
- represents a sequence of zero or more names, literals,
- and semantic actions that can then be followed by
- optional precedence rules. Only the names and literals
- participate in the formation of the grammar; the seman-
- tic actions and precedence rules are used in other ways.
- The colon and the semicolon are yacc punctuation. If
- there are several successive grammar rules with the same
- left-hand side, the vertical bar '|' can be used to
- avoid rewriting the left-hand side; in this case the
- semicolon appears only after the last rule. The BODY
- part can be empty (or empty of names and literals) to
- indicate that the non-terminal symbol matches the empty
- string.
-
- The yacc utility assigns a unique number to each rule.
- Rules using the vertical bar notation are distinct
- rules. The number assigned to the rule appears in the
- description file.
-
- The elements comprising a BODY are:
-
- name, literal
- These form the rules of the grammar: name is
- either a token or a non-terminal; literal stands
- for itself (less the lexically required quotation
- marks).
-
- semantic action
-
- With each grammar rule, the user can associate
- actions to be performed each time the rule is
- recognized in the input process. (Note that the
- word "action" can also refer to the actions of
- the parser-shift, reduce, and so on.)
-
- These actions can return values and can obtain the val-
- ues returned by previous actions. These values are kept
- in objects of type YYSTYPE (see %union). The result
- value of the action shall be kept on the parse stack
- with the left-hand side of the rule, to be accessed by
- other reductions as part of their right-hand side. By
- using the <tag> information provided in the declarations
- section, the code generated by yacc can be strictly type
- checked and contain arbitrary information. In addition,
- the lexical analyzer can provide the same kinds of val-
- ues for tokens, if desired.
-
- An action is an arbitrary C statement and as such can do
- input or output, call subprograms, and alter external
- variables. An action is one or more C statements
- enclosed in curly braces '{' and '}' .
-
- Certain pseudo-variables can be used in the action.
- These are macros for access to data structures known
- internally to yacc.
-
- $$
- The value of the action can be set by assigning
- it to $$. If type checking is enabled and the
- type of the value to be assigned cannot be deter-
- mined, a diagnostic message may be generated.
-
- $number
- This refers to the value returned by the compo-
- nent specified by the token number in the right
- side of a rule, reading from left to right; num-
- ber can be zero or negative. If number is zero or
- negative, it refers to the data associated with
- the name on the parser's stack preceding the
- leftmost symbol of the current rule. (That is,
- "$0" refers to the name immediately preceding the
- leftmost name in the current rule to be found on
- the parser's stack and "$-1" refers to the symbol
- to its left.) If number refers to an element past
- the current point in the rule, or beyond the bot-
- tom of the stack, the result is undefined. If
- type checking is enabled and the type of the
- value to be assigned cannot be determined, a
- diagnostic message may be generated.
-
- $<tag>number
-
- These correspond exactly to the corresponding
- symbols without the tag inclusion, but allow for
- strict type checking (and preclude unwanted type
- conversions). The effect is that the macro is
- expanded to use tag to select an element from the
- YYSTYPE union (using dataname.tag). This is par-
- ticularly useful if number is not positive.
-
- $<tag>$
- This imposes on the reference the type of the
- union member referenced by tag. This construction
- is applicable when a reference to a left context
- value occurs in the grammar, and provides yacc
- with a means for selecting a type.
-
-
- Actions can occur anywhere in a rule (not just at the
- end); an action can access values returned by actions to
- its left, and in turn the value it returns can be
- accessed by actions to its right. An action appearing
- in the middle of a rule shall be equivalent to replacing
- the action with a new non-terminal symbol and adding an
- empty rule with that non-terminal symbol on the left-
- hand side. The semantic action associated with the new
- rule shall be equivalent to the original action. The use
- of actions within rules might introduce conflicts that
- would not otherwise exist.
-
- By default, the value of a rule shall be the value of
- the first element in it. If the first element does not
- have a type (particularly in the case of a literal) and
- type checking is turned on by %type, an error message
- shall result.
-
- precedence
- The keyword %prec can be used to change the
- precedence level associated with a particular
- grammar rule. Examples of this are in cases where
- a unary and binary operator have the same sym-
- bolic representation, but need to be given dif-
- ferent precedences, or where the handling of an
- ambiguous if-else construction is necessary. The
- reserved symbol %prec can appear immediately
- after the body of the grammar rule and can be
- followed by a token name or a literal. It shall
- cause the precedence of the grammar rule to
- become that of the following token name or lit-
- eral. The action for the rule as a whole can fol-
- low %prec.
-
-
- If a program section follows, the application shall
- ensure that the grammar rules are terminated by %%.
-
- Programs Section
- The programs section can include the definition of the
- lexical analyzer yylex(), and any other functions; for
- example, those used in the actions specified in the
- grammar rules. It is unspecified whether the programs
- section precedes or follows the semantic actions in the
- output file; therefore, if the application contains any
- macro definitions and declarations intended to apply to
- the code in the semantic actions, it shall place them
- within "%{ ... %}" in the declarations section.
-
- Input Grammar
- The following input to yacc yields a parser for the
- input to yacc. This formal syntax takes precedence over
- the preceding text syntax description.
-
- The lexical structure is defined less precisely; Lexical
- Structure of the Grammar defines most terms. The corre-
- spondence between the previous terms and the tokens
- below is as follows.
-
- IDENTIFIER
- This corresponds to the concept of name, given
- previously. It also includes literals as defined
- previously.
-
- C_IDENTIFIER
- This is a name, and additionally it is known to
- be followed by a colon. A literal cannot yield
- this token.
-
- NUMBER A string of digits (a non-negative decimal inte-
- ger).
-
- TYPE, LEFT, MARK, LCURL, RCURL
-
- These correspond directly to %type, %left, %%,
- %{, and %}.
-
- { ... }
- This indicates C-language source code, with the
- possible inclusion of '$' macros as discussed
- previously.
-
-
-
- /* Grammar for the input to yacc. */
- /* Basic entries. */
- /* The following are recognized by the lexical analyzer. */
-
-
- %token IDENTIFIER /* Includes identifiers and literals */
- %token C_IDENTIFIER /* identifier (but not literal)
- followed by a :. */
- %token NUMBER /* [0-9][0-9]* */
-
-
- /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
-
-
- %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
-
-
- %token MARK /* The %% mark. */
- %token LCURL /* The %{ mark. */
- %token RCURL /* The %} mark. */
-
-
- /* 8-bit character literals stand for themselves; */
- /* tokens have to be defined for multi-byte characters. */
-
-
- %start spec
-
-
- %%
-
-
- spec : defs MARK rules tail
- ;
- tail : MARK
- {
- /* In this action, set up the rest of the file. */
- }
- | /* Empty; the second MARK is optional. */
- ;
- defs : /* Empty. */
- | defs def
- ;
- def : START IDENTIFIER
- | UNION
- {
- /* Copy union definition to output. */
- }
- | LCURL
- {
- /* Copy C code to output file. */
- }
- RCURL
- | rword tag nlist
- ;
- rword : TOKEN
- | LEFT
- | RIGHT
- | NONASSOC
- | TYPE
- ;
- tag : /* Empty: union tag ID optional. */
- | '<' IDENTIFIER '>'
- ;
- nlist : nmno
- | nlist nmno
- ;
- nmno : IDENTIFIER /* Note: literal invalid with % type. */
- | IDENTIFIER NUMBER /* Note: invalid with % type. */
- ;
-
-
- /* Rule section */
-
-
- rules : C_IDENTIFIER rbody prec
- | rules rule
- ;
- rule : C_IDENTIFIER rbody prec
- | '|' rbody prec
- ;
- rbody : /* empty */
- | rbody IDENTIFIER
- | rbody act
- ;
- act : '{'
- {
- /* Copy action, translate $$, and so on. */
- }
- '}'
- ;
- prec : /* Empty */
- | PREC IDENTIFIER
- | PREC IDENTIFIER act
- | prec ';'
- ;
-
- Conflicts
- The parser produced for an input grammar may contain
- states in which conflicts occur. The conflicts occur
- because the grammar is not LALR(1). An ambiguous grammar
- always contains at least one LALR(1) conflict. The yacc
- utility shall resolve all conflicts, using either
- default rules or user-specified precedence rules.
-
- Conflicts are either shift/reduce conflicts or
- reduce/reduce conflicts. A shift/reduce conflict is
- where, for a given state and lookahead symbol, both a
- shift action and a reduce action are possible. A
- reduce/reduce conflict is where, for a given state and
- lookahead symbol, reductions by two different rules are
- possible.
-
- The rules below describe how to specify what actions to
- take when a conflict occurs. Not all shift/reduce con-
- flicts can be successfully resolved this way because the
- conflict may be due to something other than ambiguity,
- so incautious use of these facilities can cause the lan-
- guage accepted by the parser to be much different from
- that which was intended. The description file shall con-
- tain sufficient information to understand the cause of
- the conflict. Where ambiguity is the reason either the
- default or explicit rules should be adequate to produce
- a working parser.
-
- The declared precedences and associativities (see Decla-
- rations Section ) are used to resolve parsing conflicts
- as follows:
-
- 1. A precedence and associativity is associated with
- each grammar rule; it is the precedence and associa-
- tivity of the last token or literal in the body of
- the rule. If the %prec keyword is used, it overrides
- this default. Some grammar rules might not have both
- precedence and associativity.
-
-
- 2. If there is a shift/reduce conflict, and both the
- grammar rule and the input symbol have precedence
- and associativity associated with them, then the
- conflict is resolved in favor of the action (shift
- or reduce) associated with the higher precedence. If
- the precedences are the same, then the associativity
- is used; left associative implies reduce, right
- associative implies shift, and non-associative
- implies an error in the string being parsed.
-
-
- 3. When there is a shift/reduce conflict that cannot be
- resolved by rule 2, the shift is done. Conflicts
- resolved this way are counted in the diagnostic out-
- put described in Error Handling .
-
-
- 4. When there is a reduce/reduce conflict, a reduction
- is done by the grammar rule that occurs earlier in
- the input sequence. Conflicts resolved this way are
- counted in the diagnostic output described in Error
- Handling .
-
-
- Conflicts resolved by precedence or associativity shall
- not be counted in the shift/reduce and reduce/reduce
- conflicts reported by yacc on either standard error or
- in the description file.
-
- Error Handling
- The token error shall be reserved for error handling.
- The name error can be used in grammar rules. It indi-
- cates places where the parser can recover from a syntax
- error. The default value of error shall be 256. Its
- value can be changed using a %token declaration. The
- lexical analyzer should not return the value of error.
-
- The parser shall detect a syntax error when it is in a
- state where the action associated with the lookahead
- symbol is error. A semantic action can cause the parser
- to initiate error handling by executing the macro YYER-
- ROR. When YYERROR is executed, the semantic action
- passes control back to the parser. YYERROR cannot be
- used outside of semantic actions.
-
- When the parser detects a syntax error, it normally
- calls yyerror() with the character string "syntax error"
- as its argument. The call shall not be made if the
- parser is still recovering from a previous error when
- the error is detected. The parser is considered to be
- recovering from a previous error until the parser has
- shifted over at least three normal input symbols since
- the last error was detected or a semantic action has
- executed the macro yyerrok. The parser shall not call
- yyerror() when YYERROR is executed.
-
- The macro function YYRECOVERING shall return 1 if a syn-
- tax error has been detected and the parser has not yet
- fully recovered from it. Otherwise, zero shall be
- returned.
-
- When a syntax error is detected by the parser, the
- parser shall check if a previous syntax error has been
- detected. If a previous error was detected, and if no
- normal input symbols have been shifted since the preced-
- ing error was detected, the parser checks if the looka-
- head symbol is an endmarker (see Interface to the Lexi-
- cal Analyzer ). If it is, the parser shall return with a
- non-zero value. Otherwise, the lookahead symbol shall be
- discarded and normal parsing shall resume.
-
- When YYERROR is executed or when the parser detects a
- syntax error and no previous error has been detected, or
- at least one normal input symbol has been shifted since
- the previous error was detected, the parser shall pop
- back one state at a time until the parse stack is empty
- or the current state allows a shift over error. If the
- parser empties the parse stack, it shall return with a
- non-zero value. Otherwise, it shall shift over error and
- then resume normal parsing. If the parser reads a looka-
- head symbol before the error was detected, that symbol
- shall still be the lookahead symbol when parsing is
- resumed.
-
- The macro yyerrok in a semantic action shall cause the
- parser to act as if it has fully recovered from any pre-
- vious errors. The macro yyclearin shall cause the parser
- to discard the current lookahead token. If the current
- lookahead token has not yet been read, yyclearin shall
- have no effect.
-
- The macro YYACCEPT shall cause the parser to return with
- the value zero. The macro YYABORT shall cause the parser
- to return with a non-zero value.
-
- Interface to the Lexical Analyzer
- The yylex() function is an integer-valued function that
- returns a token number representing the kind of token
- read. If there is a value associated with the token
- returned by yylex() (see the discussion of tag above),
- it shall be assigned to the external variable yylval.
-
- If the parser and yylex() do not agree on these token
- numbers, reliable communication between them cannot
- occur. For (single-byte character) literals, the token
- is simply the numeric value of the character in the cur-
- rent character set. The numbers for other tokens can
- either be chosen by yacc, or chosen by the user. In
- either case, the #define construct of C is used to allow
- yylex() to return these numbers symbolically. The
- #define statements are put into the code file, and the
- header file if that file is requested. The set of char-
- acters permitted by yacc in an identifier is larger than
- that permitted by C. Token names found to contain such
- characters shall not be included in the #define declara-
- tions.
-
- If the token numbers are chosen by yacc, the tokens
- other than literals shall be assigned numbers greater
- than 256, although no order is implied. A token can be
- explicitly assigned a number by following its first
- appearance in the declarations section with a number.
- Names and literals not defined this way retain their
- default definition. All token numbers assigned by yacc
- shall be unique and distinct from the token numbers used
- for literals and user-assigned tokens. If duplicate
- token numbers cause conflicts in parser generation, yacc
- shall report an error; otherwise, it is unspecified
- whether the token assignment is accepted or an error is
- reported.
-
- The end of the input is marked by a special token called
- the endmarker, which has a token number that is zero or
- negative. (These values are invalid for any other
- token.) All lexical analyzers shall return zero or nega-
- tive as a token number upon reaching the end of their
- input. If the tokens up to, but excluding, the endmarker
- form a structure that matches the start symbol, the
- parser shall accept the input. If the endmarker is seen
- in any other context, it shall be considered an error.
-
- Completing the Program
- In addition to yyparse() and yylex(), the functions
- yyerror() and main() are required to make a complete
- program. The application can supply main() and yyer-
- ror(), or those routines can be obtained from the yacc
- library.
-
- Yacc Library
- The following functions shall appear only in the yacc
- library accessible through the -l y operand to c99; they
- can therefore be redefined by a conforming application:
-
- int main(void)
-
- This function shall call yyparse() and exit with
- an unspecified value. Other actions within this
- function are unspecified.
-
- int yyerror(const char *s)
-
- This function shall write the NUL-terminated
- argument to standard error, followed by a <new-
- line>.
-
-
- The order of the -l y and -l l operands given to c99 is
- significant; the application shall either provide its
- own main() function or ensure that -l y precedes -l l.
-
- Debugging the Parser
- The parser generated by yacc shall have diagnostic
- facilities in it that can be optionally enabled at
- either compile time or at runtime (if enabled at compile
- time). The compilation of the runtime debugging code is
- under the control of YYDEBUG, a preprocessor symbol. If
- YYDEBUG has a non-zero value, the debugging code shall
- be included. If its value is zero, the code shall not be
- included.
-
- In parsers where the debugging code has been included,
- the external int yydebug can be used to turn debugging
- on (with a non-zero value) and off (zero value) at run-
- time. The initial value of yydebug shall be zero.
-
- When -t is specified, the code file shall be built such
- that, if YYDEBUG is not already defined at compilation
- time (using the c99 -D YYDEBUG option, for example),
- YYDEBUG shall be set explicitly to 1. When -t is not
- specified, the code file shall be built such that, if
- YYDEBUG is not already defined, it shall be set explic-
- itly to zero.
-
- The format of the debugging output is unspecified but
- includes at least enough information to determine the
- shift and reduce actions, and the input symbols. It also
- provides information about error recovery.
-
- Algorithms
- The parser constructed by yacc implements an LALR(1)
- parsing algorithm as documented in the literature. It is
- unspecified whether the parser is table-driven or
- direct-coded.
-
- A parser generated by yacc shall never request an input
- symbol from yylex() while in a state where the only
- actions other than the error action are reductions by a
- single rule.
-
- The literature of parsing theory defines these concepts.
-
- Limits
- The yacc utility may have several internal tables. The
- minimum maximums for these tables are shown in the fol-
- lowing table. The exact meaning of these values is
- implementation-defined. The implementation shall define
- the relationship between these values and between them
- and any error messages that the implementation may gen-
- erate should it run out of space for any internal struc-
- ture. An implementation may combine groups of these
- resources into a single pool as long as the total avail-
- able to the user does not fall below the sum of the
- sizes specified by this section.
-
- Table: Internal Limits in yacc
-
- Minimum
- Limit Maximum Description
- {NTERMS} 126 Number of tokens.
- {NNONTERM} 200 Number of non-terminals.
- {NPROD} 300 Number of rules.
- {NSTATES} 600 Number of states.
- {MEMSIZE} 5200 Length of rules. The total length, in
- names (tokens and non-terminals), of all
- the rules of the grammar. The left-hand
- side is counted for each rule, even if
- it is not explicitly repeated, as speci-
- fied in Grammar Rules in yacc .
- {ACTSIZE} 4000 Number of actions. "Actions" here (and
- in the description file) refer to parser
- actions (shift, reduce, and so on) not
- to semantic actions defined in Grammar
- Rules in yacc .
-
-EXIT STATUS
- The following exit values shall be returned:
-
- 0 Successful completion.
-
- >0 An error occurred.
-
-
-CONSEQUENCES OF ERRORS
- If any errors are encountered, the run is aborted and
- yacc exits with a non-zero status. Partial code files
- and header files may be produced. The summary informa-
- tion in the description file shall always be produced if
- the -v flag is present.
-
- The following sections are informative.
-
-APPLICATION USAGE
- Historical implementations experience name conflicts on
- the names yacc.tmp, yacc.acts, yacc.debug, y.tab.c,
- y.tab.h, and y.output if more than one copy of yacc is
- running in a single directory at one time. The -b option
- was added to overcome this problem. The related problem
- of allowing multiple yacc parsers to be placed in the
- same file was addressed by adding a -p option to over-
- ride the previously hard-coded yy variable prefix.
-
- The description of the -p option specifies the minimal
- set of function and variable names that cause conflict
- when multiple parsers are linked together. YYSTYPE does
- not need to be changed. Instead, the programmer can use
- -b to give the header files for different parsers dif-
- ferent names, and then the file with the yylex() for a
- given parser can include the header for that parser.
- Names such as yyclearerr do not need to be changed
- because they are used only in the actions; they do not
- have linkage. It is possible that an implementation has
- other names, either internal ones for implementing
- things such as yyclearerr, or providing non-standard
- features that it wants to change with -p.
-
- Unary operators that are the same token as a binary
- operator in general need their precedence adjusted. This
- is handled by the %prec advisory symbol associated with
- the particular grammar rule defining that unary opera-
- tor. (See Grammar Rules in yacc .) Applications are not
- required to use this operator for unary operators, but
- the grammars that do not require it are rare.
-
-EXAMPLES
- Access to the yacc library is obtained with library
- search operands to c99. To use the yacc library main():
-
-
- c99 y.tab.c -l y
-
- Both the lex library and the yacc library contain
- main(). To access the yacc main():
-
-
- c99 y.tab.c lex.yy.c -l y -l l
-
- This ensures that the yacc library is searched first, so
- that its main() is used.
-
- The historical yacc libraries have contained two simple
- functions that are normally coded by the application
- programmer. These functions are similar to the follow-
- ing code:
-
-
- #include <locale.h>
- int main(void)
- {
- extern int yyparse();
-
-
- setlocale(LC_ALL, "");
-
-
- /* If the following parser is one created by lex, the
- application must be careful to ensure that LC_CTYPE
- and LC_COLLATE are set to the POSIX locale. */
- (void) yyparse();
- return (0);
- }
-
-
- #include <stdio.h>
-
-
- int yyerror(const char *msg)
- {
- (void) fprintf(stderr, "%s\n", msg);
- return (0);
- }
-
-RATIONALE
- The references in may be helpful in constructing the
- parser generator. The referenced DeRemer and Pennello
- article (along with the works it references) describes a
- technique to generate parsers that conform to this vol-
- ume of IEEE Std 1003.1-2001. Work in this area contin-
- ues to be done, so implementors should consult current
- literature before doing any new implementations. The
- original Knuth article is the theoretical basis for this
- kind of parser, but the tables it generates are imprac-
- tically large for reasonable grammars and should not be
- used. The "equivalent to" wording is intentional to
- assure that the best tables that are LALR(1) can be gen-
- erated.
-
- There has been confusion between the class of grammars,
- the algorithms needed to generate parsers, and the algo-
- rithms needed to parse the languages. They are all rea-
- sonably orthogonal. In particular, a parser generator
- that accepts the full range of LR(1) grammars need not
- generate a table any more complex than one that accepts
- SLR(1) (a relatively weak class of LR grammars) for a
- grammar that happens to be SLR(1). Such an implementa-
- tion need not recognize the case, either; table compres-
- sion can yield the SLR(1) table (or one even smaller
- than that) without recognizing that the grammar is
- SLR(1). The speed of an LR(1) parser for any class is
- dependent more upon the table representation and com-
- pression (or the code generation if a direct parser is
- generated) than upon the class of grammar that the table
- generator handles.
-
- The speed of the parser generator is somewhat dependent
- upon the class of grammar it handles. However, the orig-
- inal Knuth article algorithms for constructing LR
- parsers were judged by its author to be impractically
- slow at that time. Although full LR is more complex than
- LALR(1), as computer speeds and algorithms improve, the
- difference (in terms of acceptable wall-clock execution
- time) is becoming less significant.
-
- Potential authors are cautioned that the referenced
- DeRemer and Pennello article previously cited identifies
- a bug (an over-simplification of the computation of
- LALR(1) lookahead sets) in some of the LALR(1) algorithm
- statements that preceded it to publication. They should
- take the time to seek out that paper, as well as current
- relevant work, particularly Aho's.
-
- The -b option was added to provide a portable method for
- permitting yacc to work on multiple separate parsers in
- the same directory. If a directory contains more than
- one yacc grammar, and both grammars are constructed at
- the same time (by, for example, a parallel make pro-
- gram), conflict results. While the solution is not his-
- torical practice, it corrects a known deficiency in his-
- torical implementations. Corresponding changes were made
- to all sections that referenced the filenames y.tab.c
- (now "the code file"), y.tab.h (now "the header file"),
- and y.output (now "the description file").
-
- The grammar for yacc input is based on System V documen-
- tation. The textual description shows there that the
- ';' is required at the end of the rule. The grammar and
- the implementation do not require this. (The use of
- C_IDENTIFIER causes a reduce to occur in the right
- place.)
-
- Also, in that implementation, the constructs such as
- %token can be terminated by a semicolon, but this is not
- permitted by the grammar. The keywords such as %token
- can also appear in uppercase, which is again not dis-
- cussed. In most places where '%' is used, '\' can be
- substituted, and there are alternate spellings for some
- of the symbols (for example, %LEFT can be "%<" or even
- "\<" ).
-
- Historically, <tag> can contain any characters except
- '>', including white space, in the implementation. How-
- ever, since the tag must reference an ISO C standard
- union member, in practice conforming implementations
- need to support only the set of characters for ISO C
- standard identifiers in this context.
-
- Some historical implementations are known to accept
- actions that are terminated by a period. Historical
- implementations often allow '$' in names. A conforming
- implementation does not need to support either of these
- behaviors.
-
- Deciding when to use %prec illustrates the difficulty in
- specifying the behavior of yacc. There may be situations
- in which the grammar is not, strictly speaking, in
- error, and yet yacc cannot interpret it unambiguously.
- The resolution of ambiguities in the grammar can in many
- instances be resolved by providing additional informa-
- tion, such as using %type or %union declarations. It is
- often easier and it usually yields a smaller parser to
- take this alternative when it is appropriate.
-
- The size and execution time of a program produced with-
- out the runtime debugging code is usually smaller and
- slightly faster in historical implementations.
-
- Statistics messages from several historical implementa-
- tions include the following types of information:
-
-
- n/512 terminals, n/300 non-terminals
- n/600 grammar rules, n/1500 states
- n shift/reduce, n reduce/reduce conflicts reported
- n/350 working sets used
- Memory: states, etc. n/15000, parser n/15000
- n/600 distinct lookahead sets
- n extra closures
- n shift entries, n exceptions
- n goto entries
- n entries saved by goto default
- Optimizer space used: input n/15000, output n/15000
- n table entries, n zero
- Maximum spread: n, Maximum offset: n
-
- The report of internal tables in the description file is
- left implementation-defined because all aspects of these
- limits are also implementation-defined. Some implementa-
- tions may use dynamic allocation techniques and have no
- specific limit values to report.
-
- The format of the y.output file is not given because
- specification of the format was not seen to enhance
- applications portability. The listing is primarily
- intended to help human users understand and debug the
- parser; use of y.output by a conforming application
- script would be unusual. Furthermore, implementations
- have not produced consistent output and no popular for-
- mat was apparent. The format selected by the implementa-
- tion should be human-readable, in addition to the
- requirement that it be a text file.
-
- Standard error reports are not specifically described
- because they are seldom of use to conforming applica-
- tions and there was no reason to restrict implementa-
- tions.
-
- Some implementations recognize "={" as equivalent to '{'
- because it appears in historical documentation. This
- construction was recognized and documented as obsolete
- as long ago as 1978, in the referenced Yacc: Yet Another
- Compiler-Compiler. This volume of IEEE Std 1003.1-2001
- chose to leave it as obsolete and omit it.
-
- Multi-byte characters should be recognized by the lexi-
- cal analyzer and returned as tokens. They should not be
- returned as multi-byte character literals. The token
- error that is used for error recovery is normally
- assigned the value 256 in the historical implementation.
- Thus, the token value 256, which is used in many multi-
- byte character sets, is not available for use as the
- value of a user-defined token.
-
-FUTURE DIRECTIONS
- None.
-
-SEE ALSO
- c99, lex
-
-COPYRIGHT
- Portions of this text are reprinted and reproduced in
- electronic form from IEEE Std 1003.1, 2003 Edition,
- Standard for Information Technology -- Portable Operat-
- ing System Interface (POSIX), The Open Group Base Speci-
- fications Issue 6, Copyright (C) 2001-2003 by the Insti-
- tute of Electrical and Electronics Engineers, Inc and
- The Open Group. In the event of any discrepancy between
- this version and the original IEEE and The Open Group
- Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be
- obtained online at http://www.open-
- group.org/unix/online.html .
-
-
-
-IEEE/The Open Group 2003 YACC(1P)