path: root/gnuwin32/man/cat1p/yacc.1p.txt
diff options
Diffstat (limited to 'gnuwin32/man/cat1p/yacc.1p.txt')
1 files changed, 1269 insertions, 0 deletions
diff --git a/gnuwin32/man/cat1p/yacc.1p.txt b/gnuwin32/man/cat1p/yacc.1p.txt
new file mode 100644
index 00000000..cab30089
--- /dev/null
+++ b/gnuwin32/man/cat1p/yacc.1p.txt
@@ -0,0 +1,1269 @@
+YACC(1P) POSIX Programmer's Manual YACC(1P)
+ This manual page is part of the POSIX Programmer's Man-
+ ual. The Linux implementation of this interface may
+ differ (consult the corresponding Linux manual page for
+ details of Linux behavior), or the interface may not be
+ implemented on Linux.
+ yacc - yet another compiler compiler (DEVELOPMENT)
+ yacc [-dltv][-b file_prefix][-p sym_prefix] grammar
+ The yacc utility shall read a description of a context-
+ free grammar in grammar and write C source code, con-
+ forming to the ISO C standard, to a code file, and
+ optionally header information into a header file, in the
+ current directory. The C code shall define a function
+ and related routines and macros for an automaton that
+ executes a parsing algorithm meeting the requirements in
+ Algorithms .
+ The form and meaning of the grammar are described in the
+ The C source code and header file shall be produced in a
+ form suitable as input for the C compiler (see c99 ).
+ The yacc utility shall conform to the Base Definitions
+ volume of IEEE Std 1003.1-2001, Section 12.2, Utility
+ Syntax Guidelines.
+ The following options shall be supported:
+ -b file_prefix
+ Use file_prefix instead of y as the prefix for
+ all output filenames. The code file, the
+ header file (created when -d is speci-
+ fied), and the description file y.output (created
+ when -v is specified), shall be changed to
+ file_prefix .tab.c, file_prefix .tab.h, and
+ file_prefix .output, respectively.
+ -d Write the header file; by default only the code
+ file is written. The #define statements associate
+ the token codes assigned by yacc with the user-
+ declared token names. This allows source files
+ other than to access the token codes.
+ -l Produce a code file that does not contain any
+ #line constructs. If this option is not present,
+ it is unspecified whether the code file or header
+ file contains #line directives. This should only
+ be used after the grammar and the associated
+ actions are fully debugged.
+ -p sym_prefix
+ Use sym_prefix instead of yy as the prefix for
+ all external names produced by yacc. The names
+ affected shall include the functions yyparse(),
+ yylex(), and yyerror(), and the variables yylval,
+ yychar, and yydebug. (In the remainder of this
+ section, the six symbols cited are referenced
+ using their default names only as a notational
+ convenience.) Local names may also be affected by
+ the -p option; however, the -p option shall not
+ affect #define symbols generated by yacc.
+ -t Modify conditional compilation directives to per-
+ mit compilation of debugging code in the code
+ file. Runtime debugging statements shall always
+ be contained in the code file, but by default
+ conditional compilation directives prevent their
+ compilation.
+ -v Write a file containing a description of the
+ parser and a report of conflicts generated by
+ ambiguities in the grammar.
+ The following operand is required:
+ grammar
+ A pathname of a file containing instructions,
+ hereafter called grammar, for which a parser is
+ to be created. The format for the grammar is
+ described in the EXTENDED DESCRIPTION section.
+ Not used.
+ The file grammar shall be a text file formatted as spec-
+ ified in the EXTENDED DESCRIPTION section.
+ The following environment variables shall affect the
+ execution of yacc:
+ LANG Provide a default value for the internationaliza-
+ tion variables that are unset or null. (See the
+ Base Definitions volume of IEEE Std 1003.1-2001,
+ Section 8.2, Internationalization Variables for
+ the precedence of internationalization variables
+ used to determine the values of locale cate-
+ gories.)
+ LC_ALL If set to a non-empty string value, override the
+ values of all the other internationalization
+ variables.
+ Determine the locale for the interpretation of
+ sequences of bytes of text data as characters
+ (for example, single-byte as opposed to multi-
+ byte characters in arguments and input files).
+ Determine the locale that should be used to
+ affect the format and contents of diagnostic mes-
+ sages written to standard error.
+ Determine the location of message catalogs for
+ the processing of LC_MESSAGES .
+ The LANG and LC_* variables affect the execution of the
+ yacc utility as stated. The main() function defined in
+ Yacc Library shall call:
+ setlocale(LC_ALL, "")
+ and thus the program generated by yacc shall also be
+ affected by the contents of these variables at runtime.
+ Default.
+ Not used.
+ If shift/reduce or reduce/reduce conflicts are detected
+ in grammar, yacc shall write a report of those conflicts
+ to the standard error in an unspecified format.
+ Standard error shall also be used for diagnostic mes-
+ sages.
+ The code file, the header file, and the description file
+ shall be text files. All are described in the following
+ sections.
+ Code File
+ This file shall contain the C source code for the
+ yyparse() function. It shall contain code for the vari-
+ ous semantic actions with macro substitution performed
+ on them as described in the EXTENDED DESCRIPTION sec-
+ tion. It also shall contain a copy of the #define state-
+ ments in the header file. If a %union declaration is
+ used, the declaration for YYSTYPE shall also be included
+ in this file.
+ Header File
+ The header file shall contain #define statements that
+ associate the token numbers with the token names. This
+ allows source files other than the code file to access
+ the token codes. If a %union declaration is used, the
+ declaration for YYSTYPE and an extern YYSTYPE yylval
+ declaration shall also be included in this file.
+ Description File
+ The description file shall be a text file containing a
+ description of the state machine corresponding to the
+ parser, using an unspecified format. Limits for internal
+ tables (see Limits ) shall also be reported, in an
+ implementation-defined manner. (Some implementations may
+ use dynamic allocation techniques and have no specific
+ limit values to report.)
+ The yacc command accepts a language that is used to
+ define a grammar for a target language to be parsed by
+ the tables and code generated by yacc. The language
+ accepted by yacc as a grammar for the target language is
+ described below using the yacc input language itself.
+ The input grammar includes rules describing the input
+ structure of the target language and code to be invoked
+ when these rules are recognized to provide the associ-
+ ated semantic action. The code to be executed shall
+ appear as bodies of text that are intended to be C-lan-
+ guage code. The C-language inclusions are presumed to
+ form a correct function when processed by yacc into its
+ output files. The code included in this way shall be
+ executed during the recognition of the target language.
+ Given a grammar, the yacc utility generates the files
+ described in the OUTPUT FILES section. The code file can
+ be compiled and linked using c99. If the declaration and
+ programs sections of the grammar file did not include
+ definitions of main(), yylex(), and yyerror(), the com-
+ piled output requires linking with externally supplied
+ versions of those functions. Default versions of main()
+ and yyerror() are supplied in the yacc library and can
+ be linked in by using the -l y operand to c99. The yacc
+ library interfaces need not support interfaces with
+ other than the default yy symbol prefix. The application
+ provides the lexical analyzer function, yylex(); the lex
+ utility is specifically designed to generate such a rou-
+ tine.
+ Input Language
+ The application shall ensure that every specification
+ file consists of three sections in order: declarations,
+ grammar rules, and programs, separated by double percent
+ signs ( "%%" ). The declarations and programs sections
+ can be empty. If the latter is empty, the preceding "%%"
+ mark separating it from the rules section can be omit-
+ ted.
+ The input is free form text following the structure of
+ the grammar defined below.
+ Lexical Structure of the Grammar
+ The <blank>s, <newline>s, and <form-feed>s shall be
+ ignored, except that the application shall ensure that
+ they do not appear in names or multi-character reserved
+ symbols. Comments shall be enclosed in "/* ... */", and
+ can appear wherever a name is valid.
+ Names are of arbitrary length, made up of letters, peri-
+ ods ( '.' ), underscores ( '_' ), and non-initial dig-
+ its. Uppercase and lowercase letters are distinct. Con-
+ forming applications shall not use names beginning in yy
+ or YY since the yacc parser uses such names. Many of the
+ names appear in the final output of yacc, and thus they
+ should be chosen to conform with any additional rules
+ created by the C compiler to be used. In particular they
+ appear in #define statements.
+ A literal shall consist of a single character enclosed
+ in single-quotes ( '" ). All of the escape sequences
+ supported for character constants by the ISO C standard
+ shall be supported by yacc.
+ The relationship with the lexical analyzer is discussed
+ in detail below.
+ The application shall ensure that the NUL character is
+ not used in grammar rules or literals.
+ Declarations Section
+ The declarations section is used to define the symbols
+ used to define the target language and their relation-
+ ship with each other. In particular, much of the addi-
+ tional information required to resolve ambiguities in
+ the context-free grammar for the target language is pro-
+ vided here.
+ Usually yacc assigns the relationship between the sym-
+ bolic names it generates and their underlying numeric
+ value. The declarations section makes it possible to
+ control the assignment of these values.
+ It is also possible to keep semantic information associ-
+ ated with the tokens currently on the parse stack in a
+ user-defined C-language union, if the members of the
+ union are associated with the various names in the gram-
+ mar. The declarations section provides for this as well.
+ The first group of declarators below all take a list of
+ names as arguments. That list can optionally be pre-
+ ceded by the name of a C union member (called a tag
+ below) appearing within '<' and '>' . (As an exception
+ to the typographical conventions of the rest of this
+ volume of IEEE Std 1003.1-2001, in this case <tag> does
+ not represent a metavariable, but the literal angle
+ bracket characters surrounding a symbol.) The use of tag
+ specifies that the tokens named on this line shall be of
+ the same C type as the union member referenced by tag.
+ This is discussed in more detail below.
+ For lists used to define tokens, the first appearance of
+ a given token can be followed by a positive integer (as
+ a string of decimal digits). If this is done, the under-
+ lying value assigned to it for lexical purposes shall be
+ taken to be that number.
+ The following declares name to be a token:
+ %token [<tag>] name [number][name [number]]...
+ If tag is present, the C type for all tokens on this
+ line shall be declared to be the type referenced by tag.
+ If a positive integer, number, follows a name, that
+ value shall be assigned to the token.
+ The following declares name to be a token, and assigns
+ precedence to it:
+ %left [<tag>] name [number][name [number]]...
+ %right [<tag>] name [number][name [number]]...
+ One or more lines, each beginning with one of these sym-
+ bols, can appear in this section. All tokens on the same
+ line have the same precedence level and associativity;
+ the lines are in order of increasing precedence or bind-
+ ing strength. %left denotes that the operators on that
+ line are left associative, and %right similarly denotes
+ right associative operators. If tag is present, it shall
+ declare a C type for names as described for %token.
+ The following declares name to be a token, and indicates
+ that this cannot be used associatively:
+ %nonassoc [<tag>] name [number][name [number]]...
+ If the parser encounters associative use of this token
+ it reports an error. If tag is present, it shall declare
+ a C type for names as described for %token.
+ The following declares that union member names are non-
+ terminals, and thus it is required to have a tag field
+ at its beginning:
+ %type <tag> name...
+ Because it deals with non-terminals only, assigning a
+ token number or using a literal is also prohibited. If
+ this construct is present, yacc shall perform type
+ checking; if this construct is not present, the parse
+ stack shall hold only the int type.
+ Every name used in grammar not defined by a %token,
+ %left, %right, or %nonassoc declaration is assumed to
+ represent a non-terminal symbol. The yacc utility shall
+ report an error for any non-terminal symbol that does
+ not appear on the left side of at least one grammar
+ rule.
+ Once the type, precedence, or token number of a name is
+ specified, it shall not be changed. If the first decla-
+ ration of a token does not assign a token number, yacc
+ shall assign a token number. Once this assignment is
+ made, the token number shall not be changed by explicit
+ assignment.
+ The following declarators do not follow the previous
+ pattern.
+ The following declares the non-terminal name to be the
+ start symbol, which represents the largest, most general
+ structure described by the grammar rules:
+ %start name
+ By default, it is the left-hand side of the first gram-
+ mar rule; this default can be overridden with this dec-
+ laration.
+ The following declares the yacc value stack to be a
+ union of the various types of values desired:
+ %union { body of union (in C) }
+ By default, the values returned by actions (see below)
+ and the lexical analyzer shall be of type int. The yacc
+ utility keeps track of types, and it shall insert corre-
+ sponding union member names in order to perform strict
+ type checking of the resulting parser.
+ Alternatively, given that at least one <tag> construct
+ is used, the union can be declared in a header file
+ (which shall be included in the declarations section by
+ using a #include construct within %{ and %}), and a
+ typedef used to define the symbol YYSTYPE to represent
+ this union. The effect of %union is to provide the dec-
+ laration of YYSTYPE directly from the yacc input.
+ C-language declarations and definitions can appear in
+ the declarations section, enclosed by the following
+ marks:
+ %{ ... %}
+ These statements shall be copied into the code file, and
+ have global scope within it so that they can be used in
+ the rules and program sections.
+ The application shall ensure that the declarations sec-
+ tion is terminated by the token %%.
+ Grammar Rules in yacc
+ The rules section defines the context-free grammar to be
+ accepted by the function yacc generates, and associates
+ with those rules C-language actions and additional
+ precedence information. The grammar is described below,
+ and a formal definition follows.
+ The rules section is comprised of one or more grammar
+ rules. A grammar rule has the form:
+ A : BODY ;
+ The symbol A represents a non-terminal name, and BODY
+ represents a sequence of zero or more names, literals,
+ and semantic actions that can then be followed by
+ optional precedence rules. Only the names and literals
+ participate in the formation of the grammar; the seman-
+ tic actions and precedence rules are used in other ways.
+ The colon and the semicolon are yacc punctuation. If
+ there are several successive grammar rules with the same
+ left-hand side, the vertical bar '|' can be used to
+ avoid rewriting the left-hand side; in this case the
+ semicolon appears only after the last rule. The BODY
+ part can be empty (or empty of names and literals) to
+ indicate that the non-terminal symbol matches the empty
+ string.
+ The yacc utility assigns a unique number to each rule.
+ Rules using the vertical bar notation are distinct
+ rules. The number assigned to the rule appears in the
+ description file.
+ The elements comprising a BODY are:
+ name, literal
+ These form the rules of the grammar: name is
+ either a token or a non-terminal; literal stands
+ for itself (less the lexically required quotation
+ marks).
+ semantic action
+ With each grammar rule, the user can associate
+ actions to be performed each time the rule is
+ recognized in the input process. (Note that the
+ word "action" can also refer to the actions of
+ the parser-shift, reduce, and so on.)
+ These actions can return values and can obtain the val-
+ ues returned by previous actions. These values are kept
+ in objects of type YYSTYPE (see %union). The result
+ value of the action shall be kept on the parse stack
+ with the left-hand side of the rule, to be accessed by
+ other reductions as part of their right-hand side. By
+ using the <tag> information provided in the declarations
+ section, the code generated by yacc can be strictly type
+ checked and contain arbitrary information. In addition,
+ the lexical analyzer can provide the same kinds of val-
+ ues for tokens, if desired.
+ An action is an arbitrary C statement and as such can do
+ input or output, call subprograms, and alter external
+ variables. An action is one or more C statements
+ enclosed in curly braces '{' and '}' .
+ Certain pseudo-variables can be used in the action.
+ These are macros for access to data structures known
+ internally to yacc.
+ $$
+ The value of the action can be set by assigning
+ it to $$. If type checking is enabled and the
+ type of the value to be assigned cannot be deter-
+ mined, a diagnostic message may be generated.
+ $number
+ This refers to the value returned by the compo-
+ nent specified by the token number in the right
+ side of a rule, reading from left to right; num-
+ ber can be zero or negative. If number is zero or
+ negative, it refers to the data associated with
+ the name on the parser's stack preceding the
+ leftmost symbol of the current rule. (That is,
+ "$0" refers to the name immediately preceding the
+ leftmost name in the current rule to be found on
+ the parser's stack and "$-1" refers to the symbol
+ to its left.) If number refers to an element past
+ the current point in the rule, or beyond the bot-
+ tom of the stack, the result is undefined. If
+ type checking is enabled and the type of the
+ value to be assigned cannot be determined, a
+ diagnostic message may be generated.
+ $<tag>number
+ These correspond exactly to the corresponding
+ symbols without the tag inclusion, but allow for
+ strict type checking (and preclude unwanted type
+ conversions). The effect is that the macro is
+ expanded to use tag to select an element from the
+ YYSTYPE union (using dataname.tag). This is par-
+ ticularly useful if number is not positive.
+ $<tag>$
+ This imposes on the reference the type of the
+ union member referenced by tag. This construction
+ is applicable when a reference to a left context
+ value occurs in the grammar, and provides yacc
+ with a means for selecting a type.
+ Actions can occur anywhere in a rule (not just at the
+ end); an action can access values returned by actions to
+ its left, and in turn the value it returns can be
+ accessed by actions to its right. An action appearing
+ in the middle of a rule shall be equivalent to replacing
+ the action with a new non-terminal symbol and adding an
+ empty rule with that non-terminal symbol on the left-
+ hand side. The semantic action associated with the new
+ rule shall be equivalent to the original action. The use
+ of actions within rules might introduce conflicts that
+ would not otherwise exist.
+ By default, the value of a rule shall be the value of
+ the first element in it. If the first element does not
+ have a type (particularly in the case of a literal) and
+ type checking is turned on by %type, an error message
+ shall result.
+ precedence
+ The keyword %prec can be used to change the
+ precedence level associated with a particular
+ grammar rule. Examples of this are in cases where
+ a unary and binary operator have the same sym-
+ bolic representation, but need to be given dif-
+ ferent precedences, or where the handling of an
+ ambiguous if-else construction is necessary. The
+ reserved symbol %prec can appear immediately
+ after the body of the grammar rule and can be
+ followed by a token name or a literal. It shall
+ cause the precedence of the grammar rule to
+ become that of the following token name or lit-
+ eral. The action for the rule as a whole can fol-
+ low %prec.
+ If a program section follows, the application shall
+ ensure that the grammar rules are terminated by %%.
+ Programs Section
+ The programs section can include the definition of the
+ lexical analyzer yylex(), and any other functions; for
+ example, those used in the actions specified in the
+ grammar rules. It is unspecified whether the programs
+ section precedes or follows the semantic actions in the
+ output file; therefore, if the application contains any
+ macro definitions and declarations intended to apply to
+ the code in the semantic actions, it shall place them
+ within "%{ ... %}" in the declarations section.
+ Input Grammar
+ The following input to yacc yields a parser for the
+ input to yacc. This formal syntax takes precedence over
+ the preceding text syntax description.
+ The lexical structure is defined less precisely; Lexical
+ Structure of the Grammar defines most terms. The corre-
+ spondence between the previous terms and the tokens
+ below is as follows.
+ This corresponds to the concept of name, given
+ previously. It also includes literals as defined
+ previously.
+ This is a name, and additionally it is known to
+ be followed by a colon. A literal cannot yield
+ this token.
+ NUMBER A string of digits (a non-negative decimal inte-
+ ger).
+ These correspond directly to %type, %left, %%,
+ %{, and %}.
+ { ... }
+ This indicates C-language source code, with the
+ possible inclusion of '$' macros as discussed
+ previously.
+ /* Grammar for the input to yacc. */
+ /* Basic entries. */
+ /* The following are recognized by the lexical analyzer. */
+ %token IDENTIFIER /* Includes identifiers and literals */
+ %token C_IDENTIFIER /* identifier (but not literal)
+ followed by a :. */
+ %token NUMBER /* [0-9][0-9]* */
+ /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
+ %token MARK /* The %% mark. */
+ %token LCURL /* The %{ mark. */
+ %token RCURL /* The %} mark. */
+ /* 8-bit character literals stand for themselves; */
+ /* tokens have to be defined for multi-byte characters. */
+ %start spec
+ %%
+ spec : defs MARK rules tail
+ ;
+ tail : MARK
+ {
+ /* In this action, set up the rest of the file. */
+ }
+ | /* Empty; the second MARK is optional. */
+ ;
+ defs : /* Empty. */
+ | defs def
+ ;
+ {
+ /* Copy union definition to output. */
+ }
+ {
+ /* Copy C code to output file. */
+ }
+ | rword tag nlist
+ ;
+ rword : TOKEN
+ | LEFT
+ | TYPE
+ ;
+ tag : /* Empty: union tag ID optional. */
+ | '<' IDENTIFIER '>'
+ ;
+ nlist : nmno
+ | nlist nmno
+ ;
+ nmno : IDENTIFIER /* Note: literal invalid with % type. */
+ | IDENTIFIER NUMBER /* Note: invalid with % type. */
+ ;
+ /* Rule section */
+ rules : C_IDENTIFIER rbody prec
+ | rules rule
+ ;
+ rule : C_IDENTIFIER rbody prec
+ | '|' rbody prec
+ ;
+ rbody : /* empty */
+ | rbody IDENTIFIER
+ | rbody act
+ ;
+ act : '{'
+ {
+ /* Copy action, translate $$, and so on. */
+ }
+ '}'
+ ;
+ prec : /* Empty */
+ | prec ';'
+ ;
+ Conflicts
+ The parser produced for an input grammar may contain
+ states in which conflicts occur. The conflicts occur
+ because the grammar is not LALR(1). An ambiguous grammar
+ always contains at least one LALR(1) conflict. The yacc
+ utility shall resolve all conflicts, using either
+ default rules or user-specified precedence rules.
+ Conflicts are either shift/reduce conflicts or
+ reduce/reduce conflicts. A shift/reduce conflict is
+ where, for a given state and lookahead symbol, both a
+ shift action and a reduce action are possible. A
+ reduce/reduce conflict is where, for a given state and
+ lookahead symbol, reductions by two different rules are
+ possible.
+ The rules below describe how to specify what actions to
+ take when a conflict occurs. Not all shift/reduce con-
+ flicts can be successfully resolved this way because the
+ conflict may be due to something other than ambiguity,
+ so incautious use of these facilities can cause the lan-
+ guage accepted by the parser to be much different from
+ that which was intended. The description file shall con-
+ tain sufficient information to understand the cause of
+ the conflict. Where ambiguity is the reason either the
+ default or explicit rules should be adequate to produce
+ a working parser.
+ The declared precedences and associativities (see Decla-
+ rations Section ) are used to resolve parsing conflicts
+ as follows:
+ 1. A precedence and associativity is associated with
+ each grammar rule; it is the precedence and associa-
+ tivity of the last token or literal in the body of
+ the rule. If the %prec keyword is used, it overrides
+ this default. Some grammar rules might not have both
+ precedence and associativity.
+ 2. If there is a shift/reduce conflict, and both the
+ grammar rule and the input symbol have precedence
+ and associativity associated with them, then the
+ conflict is resolved in favor of the action (shift
+ or reduce) associated with the higher precedence. If
+ the precedences are the same, then the associativity
+ is used; left associative implies reduce, right
+ associative implies shift, and non-associative
+ implies an error in the string being parsed.
+ 3. When there is a shift/reduce conflict that cannot be
+ resolved by rule 2, the shift is done. Conflicts
+ resolved this way are counted in the diagnostic out-
+ put described in Error Handling .
+ 4. When there is a reduce/reduce conflict, a reduction
+ is done by the grammar rule that occurs earlier in
+ the input sequence. Conflicts resolved this way are
+ counted in the diagnostic output described in Error
+ Handling .
+ Conflicts resolved by precedence or associativity shall
+ not be counted in the shift/reduce and reduce/reduce
+ conflicts reported by yacc on either standard error or
+ in the description file.
+ Error Handling
+ The token error shall be reserved for error handling.
+ The name error can be used in grammar rules. It indi-
+ cates places where the parser can recover from a syntax
+ error. The default value of error shall be 256. Its
+ value can be changed using a %token declaration. The
+ lexical analyzer should not return the value of error.
+ The parser shall detect a syntax error when it is in a
+ state where the action associated with the lookahead
+ symbol is error. A semantic action can cause the parser
+ to initiate error handling by executing the macro YYER-
+ ROR. When YYERROR is executed, the semantic action
+ passes control back to the parser. YYERROR cannot be
+ used outside of semantic actions.
+ When the parser detects a syntax error, it normally
+ calls yyerror() with the character string "syntax error"
+ as its argument. The call shall not be made if the
+ parser is still recovering from a previous error when
+ the error is detected. The parser is considered to be
+ recovering from a previous error until the parser has
+ shifted over at least three normal input symbols since
+ the last error was detected or a semantic action has
+ executed the macro yyerrok. The parser shall not call
+ yyerror() when YYERROR is executed.
+ The macro function YYRECOVERING shall return 1 if a syn-
+ tax error has been detected and the parser has not yet
+ fully recovered from it. Otherwise, zero shall be
+ returned.
+ When a syntax error is detected by the parser, the
+ parser shall check if a previous syntax error has been
+ detected. If a previous error was detected, and if no
+ normal input symbols have been shifted since the preced-
+ ing error was detected, the parser checks if the looka-
+ head symbol is an endmarker (see Interface to the Lexi-
+ cal Analyzer ). If it is, the parser shall return with a
+ non-zero value. Otherwise, the lookahead symbol shall be
+ discarded and normal parsing shall resume.
+ When YYERROR is executed or when the parser detects a
+ syntax error and no previous error has been detected, or
+ at least one normal input symbol has been shifted since
+ the previous error was detected, the parser shall pop
+ back one state at a time until the parse stack is empty
+ or the current state allows a shift over error. If the
+ parser empties the parse stack, it shall return with a
+ non-zero value. Otherwise, it shall shift over error and
+ then resume normal parsing. If the parser reads a looka-
+ head symbol before the error was detected, that symbol
+ shall still be the lookahead symbol when parsing is
+ resumed.
+ The macro yyerrok in a semantic action shall cause the
+ parser to act as if it has fully recovered from any pre-
+ vious errors. The macro yyclearin shall cause the parser
+ to discard the current lookahead token. If the current
+ lookahead token has not yet been read, yyclearin shall
+ have no effect.
+ The macro YYACCEPT shall cause the parser to return with
+ the value zero. The macro YYABORT shall cause the parser
+ to return with a non-zero value.
+ Interface to the Lexical Analyzer
+ The yylex() function is an integer-valued function that
+ returns a token number representing the kind of token
+ read. If there is a value associated with the token
+ returned by yylex() (see the discussion of tag above),
+ it shall be assigned to the external variable yylval.
+ If the parser and yylex() do not agree on these token
+ numbers, reliable communication between them cannot
+ occur. For (single-byte character) literals, the token
+ is simply the numeric value of the character in the cur-
+ rent character set. The numbers for other tokens can
+ either be chosen by yacc, or chosen by the user. In
+ either case, the #define construct of C is used to allow
+ yylex() to return these numbers symbolically. The
+ #define statements are put into the code file, and the
+ header file if that file is requested. The set of char-
+ acters permitted by yacc in an identifier is larger than
+ that permitted by C. Token names found to contain such
+ characters shall not be included in the #define declara-
+ tions.
+ If the token numbers are chosen by yacc, the tokens
+ other than literals shall be assigned numbers greater
+ than 256, although no order is implied. A token can be
+ explicitly assigned a number by following its first
+ appearance in the declarations section with a number.
+ Names and literals not defined this way retain their
+ default definition. All token numbers assigned by yacc
+ shall be unique and distinct from the token numbers used
+ for literals and user-assigned tokens. If duplicate
+ token numbers cause conflicts in parser generation, yacc
+ shall report an error; otherwise, it is unspecified
+ whether the token assignment is accepted or an error is
+ reported.
+ The end of the input is marked by a special token called
+ the endmarker, which has a token number that is zero or
+ negative. (These values are invalid for any other
+ token.) All lexical analyzers shall return zero or nega-
+ tive as a token number upon reaching the end of their
+ input. If the tokens up to, but excluding, the endmarker
+ form a structure that matches the start symbol, the
+ parser shall accept the input. If the endmarker is seen
+ in any other context, it shall be considered an error.
+ Completing the Program
+ In addition to yyparse() and yylex(), the functions
+ yyerror() and main() are required to make a complete
+ program. The application can supply main() and yyer-
+ ror(), or those routines can be obtained from the yacc
+ library.
+ Yacc Library
+ The following functions shall appear only in the yacc
+ library accessible through the -l y operand to c99; they
+ can therefore be redefined by a conforming application:
+ int main(void)
+ This function shall call yyparse() and exit with
+ an unspecified value. Other actions within this
+ function are unspecified.
+ int yyerror(const char *s)
+ This function shall write the NUL-terminated
+ argument to standard error, followed by a <new-
+ line>.
+ The order of the -l y and -l l operands given to c99 is
+ significant; the application shall either provide its
+ own main() function or ensure that -l y precedes -l l.
+ Debugging the Parser
+ The parser generated by yacc shall have diagnostic
+ facilities in it that can be optionally enabled at
+ either compile time or at runtime (if enabled at compile
+ time). The compilation of the runtime debugging code is
+ under the control of YYDEBUG, a preprocessor symbol. If
+ YYDEBUG has a non-zero value, the debugging code shall
+ be included. If its value is zero, the code shall not be
+ included.
+ In parsers where the debugging code has been included,
+ the external int yydebug can be used to turn debugging
+ on (with a non-zero value) and off (zero value) at run-
+ time. The initial value of yydebug shall be zero.
+ When -t is specified, the code file shall be built such
+ that, if YYDEBUG is not already defined at compilation
+ time (using the c99 -D YYDEBUG option, for example),
+ YYDEBUG shall be set explicitly to 1. When -t is not
+ specified, the code file shall be built such that, if
+ YYDEBUG is not already defined, it shall be set explic-
+ itly to zero.
+ The format of the debugging output is unspecified but
+ includes at least enough information to determine the
+ shift and reduce actions, and the input symbols. It also
+ provides information about error recovery.
+ Algorithms
+ The parser constructed by yacc implements an LALR(1)
+ parsing algorithm as documented in the literature. It is
+ unspecified whether the parser is table-driven or
+ direct-coded.
+ A parser generated by yacc shall never request an input
+ symbol from yylex() while in a state where the only
+ actions other than the error action are reductions by a
+ single rule.
+ The literature of parsing theory defines these concepts.
+ Limits
+ The yacc utility may have several internal tables. The
+ minimum maximums for these tables are shown in the fol-
+ lowing table. The exact meaning of these values is
+ implementation-defined. The implementation shall define
+ the relationship between these values and between them
+ and any error messages that the implementation may gen-
+ erate should it run out of space for any internal struc-
+ ture. An implementation may combine groups of these
+ resources into a single pool as long as the total avail-
+ able to the user does not fall below the sum of the
+ sizes specified by this section.
+ Table: Internal Limits in yacc
+ Minimum
+ Limit Maximum Description
+ {NTERMS} 126 Number of tokens.
+ {NNONTERM} 200 Number of non-terminals.
+ {NPROD} 300 Number of rules.
+ {NSTATES} 600 Number of states.
+ {MEMSIZE} 5200 Length of rules. The total length, in
+ names (tokens and non-terminals), of all
+ the rules of the grammar. The left-hand
+ side is counted for each rule, even if
+ it is not explicitly repeated, as speci-
+ fied in Grammar Rules in yacc .
+ {ACTSIZE} 4000 Number of actions. "Actions" here (and
+ in the description file) refer to parser
+ actions (shift, reduce, and so on) not
+ to semantic actions defined in Grammar
+ Rules in yacc .
+ The following exit values shall be returned:
+ 0 Successful completion.
+ >0 An error occurred.
+ If any errors are encountered, the run is aborted and
+ yacc exits with a non-zero status. Partial code files
+ and header files may be produced. The summary informa-
+ tion in the description file shall always be produced if
+ the -v flag is present.
+ The following sections are informative.
+ Historical implementations experience name conflicts on
+ the names yacc.tmp, yacc.acts, yacc.debug,,
+, and y.output if more than one copy of yacc is
+ running in a single directory at one time. The -b option
+ was added to overcome this problem. The related problem
+ of allowing multiple yacc parsers to be placed in the
+ same file was addressed by adding a -p option to over-
+ ride the previously hard-coded yy variable prefix.
+ The description of the -p option specifies the minimal
+ set of function and variable names that cause conflict
+ when multiple parsers are linked together. YYSTYPE does
+ not need to be changed. Instead, the programmer can use
+ -b to give the header files for different parsers dif-
+ ferent names, and then the file with the yylex() for a
+ given parser can include the header for that parser.
+ Names such as yyclearerr do not need to be changed
+ because they are used only in the actions; they do not
+ have linkage. It is possible that an implementation has
+ other names, either internal ones for implementing
+ things such as yyclearerr, or providing non-standard
+ features that it wants to change with -p.
+ Unary operators that are the same token as a binary
+ operator in general need their precedence adjusted. This
+ is handled by the %prec advisory symbol associated with
+ the particular grammar rule defining that unary opera-
+ tor. (See Grammar Rules in yacc .) Applications are not
+ required to use this operator for unary operators, but
+ the grammars that do not require it are rare.
+ Access to the yacc library is obtained with library
+ search operands to c99. To use the yacc library main():
+ c99 -l y
+ Both the lex library and the yacc library contain
+ main(). To access the yacc main():
+ c99 lex.yy.c -l y -l l
+ This ensures that the yacc library is searched first, so
+ that its main() is used.
+ The historical yacc libraries have contained two simple
+ functions that are normally coded by the application
+ programmer. These functions are similar to the follow-
+ ing code:
+ #include <locale.h>
+ int main(void)
+ {
+ extern int yyparse();
+ setlocale(LC_ALL, "");
+ /* If the following parser is one created by lex, the
+ application must be careful to ensure that LC_CTYPE
+ and LC_COLLATE are set to the POSIX locale. */
+ (void) yyparse();
+ return (0);
+ }
+ #include <stdio.h>
+ int yyerror(const char *msg)
+ {
+ (void) fprintf(stderr, "%s\n", msg);
+ return (0);
+ }
+ The references in may be helpful in constructing the
+ parser generator. The referenced DeRemer and Pennello
+ article (along with the works it references) describes a
+ technique to generate parsers that conform to this vol-
+ ume of IEEE Std 1003.1-2001. Work in this area contin-
+ ues to be done, so implementors should consult current
+ literature before doing any new implementations. The
+ original Knuth article is the theoretical basis for this
+ kind of parser, but the tables it generates are imprac-
+ tically large for reasonable grammars and should not be
+ used. The "equivalent to" wording is intentional to
+ assure that the best tables that are LALR(1) can be gen-
+ erated.
+ There has been confusion between the class of grammars,
+ the algorithms needed to generate parsers, and the algo-
+ rithms needed to parse the languages. They are all rea-
+ sonably orthogonal. In particular, a parser generator
+ that accepts the full range of LR(1) grammars need not
+ generate a table any more complex than one that accepts
+ SLR(1) (a relatively weak class of LR grammars) for a
+ grammar that happens to be SLR(1). Such an implementa-
+ tion need not recognize the case, either; table compres-
+ sion can yield the SLR(1) table (or one even smaller
+ than that) without recognizing that the grammar is
+ SLR(1). The speed of an LR(1) parser for any class is
+ dependent more upon the table representation and com-
+ pression (or the code generation if a direct parser is
+ generated) than upon the class of grammar that the table
+ generator handles.
+ The speed of the parser generator is somewhat dependent
+ upon the class of grammar it handles. However, the orig-
+ inal Knuth article algorithms for constructing LR
+ parsers were judged by its author to be impractically
+ slow at that time. Although full LR is more complex than
+ LALR(1), as computer speeds and algorithms improve, the
+ difference (in terms of acceptable wall-clock execution
+ time) is becoming less significant.
+ Potential authors are cautioned that the referenced
+ DeRemer and Pennello article previously cited identifies
+ a bug (an over-simplification of the computation of
+ LALR(1) lookahead sets) in some of the LALR(1) algorithm
+ statements that preceded it to publication. They should
+ take the time to seek out that paper, as well as current
+ relevant work, particularly Aho's.
+ The -b option was added to provide a portable method for
+ permitting yacc to work on multiple separate parsers in
+ the same directory. If a directory contains more than
+ one yacc grammar, and both grammars are constructed at
+ the same time (by, for example, a parallel make pro-
+ gram), conflict results. While the solution is not his-
+ torical practice, it corrects a known deficiency in his-
+ torical implementations. Corresponding changes were made
+ to all sections that referenced the filenames
+ (now "the code file"), (now "the header file"),
+ and y.output (now "the description file").
+ The grammar for yacc input is based on System V documen-
+ tation. The textual description shows there that the
+ ';' is required at the end of the rule. The grammar and
+ the implementation do not require this. (The use of
+ C_IDENTIFIER causes a reduce to occur in the right
+ place.)
+ Also, in that implementation, the constructs such as
+ %token can be terminated by a semicolon, but this is not
+ permitted by the grammar. The keywords such as %token
+ can also appear in uppercase, which is again not dis-
+ cussed. In most places where '%' is used, '\' can be
+ substituted, and there are alternate spellings for some
+ of the symbols (for example, %LEFT can be "%<" or even
+ "\<" ).
+ Historically, <tag> can contain any characters except
+ '>', including white space, in the implementation. How-
+ ever, since the tag must reference an ISO C standard
+ union member, in practice conforming implementations
+ need to support only the set of characters for ISO C
+ standard identifiers in this context.
+ Some historical implementations are known to accept
+ actions that are terminated by a period. Historical
+ implementations often allow '$' in names. A conforming
+ implementation does not need to support either of these
+ behaviors.
+ Deciding when to use %prec illustrates the difficulty in
+ specifying the behavior of yacc. There may be situations
+ in which the grammar is not, strictly speaking, in
+ error, and yet yacc cannot interpret it unambiguously.
+ The resolution of ambiguities in the grammar can in many
+ instances be resolved by providing additional informa-
+ tion, such as using %type or %union declarations. It is
+ often easier and it usually yields a smaller parser to
+ take this alternative when it is appropriate.
+ The size and execution time of a program produced with-
+ out the runtime debugging code is usually smaller and
+ slightly faster in historical implementations.
+ Statistics messages from several historical implementa-
+ tions include the following types of information:
+ n/512 terminals, n/300 non-terminals
+ n/600 grammar rules, n/1500 states
+ n shift/reduce, n reduce/reduce conflicts reported
+ n/350 working sets used
+ Memory: states, etc. n/15000, parser n/15000
+ n/600 distinct lookahead sets
+ n extra closures
+ n shift entries, n exceptions
+ n goto entries
+ n entries saved by goto default
+ Optimizer space used: input n/15000, output n/15000
+ n table entries, n zero
+ Maximum spread: n, Maximum offset: n
+ The report of internal tables in the description file is
+ left implementation-defined because all aspects of these
+ limits are also implementation-defined. Some implementa-
+ tions may use dynamic allocation techniques and have no
+ specific limit values to report.
+ The format of the y.output file is not given because
+ specification of the format was not seen to enhance
+ applications portability. The listing is primarily
+ intended to help human users understand and debug the
+ parser; use of y.output by a conforming application
+ script would be unusual. Furthermore, implementations
+ have not produced consistent output and no popular for-
+ mat was apparent. The format selected by the implementa-
+ tion should be human-readable, in addition to the
+ requirement that it be a text file.
+ Standard error reports are not specifically described
+ because they are seldom of use to conforming applica-
+ tions and there was no reason to restrict implementa-
+ tions.
+ Some implementations recognize "={" as equivalent to '{'
+ because it appears in historical documentation. This
+ construction was recognized and documented as obsolete
+ as long ago as 1978, in the referenced Yacc: Yet Another
+ Compiler-Compiler. This volume of IEEE Std 1003.1-2001
+ chose to leave it as obsolete and omit it.
+ Multi-byte characters should be recognized by the lexi-
+ cal analyzer and returned as tokens. They should not be
+ returned as multi-byte character literals. The token
+ error that is used for error recovery is normally
+ assigned the value 256 in the historical implementation.
+ Thus, the token value 256, which is used in many multi-
+ byte character sets, is not available for use as the
+ value of a user-defined token.
+ None.
+ c99, lex
+ Portions of this text are reprinted and reproduced in
+ electronic form from IEEE Std 1003.1, 2003 Edition,
+ Standard for Information Technology -- Portable Operat-
+ ing System Interface (POSIX), The Open Group Base Speci-
+ fications Issue 6, Copyright (C) 2001-2003 by the Insti-
+ tute of Electrical and Electronics Engineers, Inc and
+ The Open Group. In the event of any discrepancy between
+ this version and the original IEEE and The Open Group
+ Standard, the original IEEE and The Open Group Standard
+ is the referee document. The original Standard can be
+ obtained online at
+ .
+IEEE/The Open Group 2003 YACC(1P)