


FLEX(1)             UNIX Programmer's Manual              FLEX(1)



NAME
     flex - fast lexical analyzer generator

SYNOPSIS
     flex [ -dfirstvFILT -c[efmF] -Sskeleton_file ] [ filename ]

DESCRIPTION
     flex is a rewrite of lex intended to right some of that
     tool's deficiencies: in particular, flex generates lexical
     analyzers much faster, and the analyzers use smaller tables
     and run faster.

OPTIONS
     In addition to lex's -t flag, flex has the following
     options:

     -d   makes the generated scanner run in debug mode.  When-
          ever a pattern is recognized the scanner will write to
          stderr a line of the form:

              --accepting rule #n

          Rules are numbered sequentially with the first one
          being 1.

     -f   has the same effect as lex's -f flag (do not compress
          the scanner tables); the mnemonic changes from fast
          compilation to (take your pick) full table or fast
          scanner. The actual compilation takes longer, since
          flex is I/O bound writing out the big table.

          This option is equivalent to -cf (see below).

     -i   instructs flex to generate a case-insensitive scanner.
          The case of letters given in the flex input patterns
          will be ignored, and the rules will be matched regard-
          less of case.  The matched text given in yytext will
          have its original case preserved (i.e., it will not be
          folded).

     -r   specifies that the scanner uses the REJECT action.

     -s   causes the default rule (that unmatched scanner input
          is echoed to stdout) to be suppressed.  If the scanner
          encounters input that does not match any of its rules,
          it aborts with an error.  This option is useful for
          finding holes in a scanner's rule set.

     -v   has the same meaning as for lex (print to stderr a sum-
          mary of statistics of the generated scanner).  Many
          more statistics are printed, though, and the summary
          spans several lines.  Most of the statistics are mean-
          ingless to the casual flex user.



Printed 12/28/88           13 May 1987                          1

     -F   specifies that the fast scanner table representation
          should be used.  This representation is about as fast
          as the full table representation (-f), and for some
          sets of patterns will be considerably smaller (and for
          others, larger).  In general, if the pattern set con-
          tains both "keywords" and a catch-all, "identifier"
          rule, such as in the set:

               "case"    return ( TOK_CASE );
               "switch"  return ( TOK_SWITCH );
               ...
               "default" return ( TOK_DEFAULT );
               [a-z]+    return ( TOK_ID );

          then you're better off using the full table representa-
          tion.  If only the "identifier" rule is present and you
          then use a hash table or some such to detect the key-
          words, you're better off using -F.

          This option is equivalent to -cF (see below).
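
          As a minimal sketch of the second approach, the helper
          below looks up text matched by the "identifier" rule in
          a sorted keyword table, using bsearch() rather than a
          hash table.  The function name classify_identifier()
          and the token values are assumptions made for illustra-
          tion; only the TOK_ names come from the example above.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes: the names follow the example above, the values are
 * arbitrary placeholders. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, since bsearch() requires it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)entry)->name);
}

/* Hypothetical helper: map text matched by the identifier rule to
 * a keyword token, or to TOK_ID when it is not a keyword. */
int classify_identifier(const char *text)
{
    const struct keyword *hit = bsearch(text, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0],
        compare_keyword);

    return hit ? hit->token : TOK_ID;
}
```

          The single remaining rule would then read something
          like:

               [a-z]+    return ( classify_identifier( yytext ) );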

     -I   instructs flex to generate an interactive scanner.
          Normally, scanners generated by flex always look ahead
          one character before deciding that a rule has been
          matched.  At the possible cost of some scanning over-
          head (it's not clear that more overhead is involved),
          flex will generate a scanner which only looks ahead
          when needed.  Such scanners are called interactive
          because if you want to write a scanner for an interac-
          tive system such as a command shell, you will probably
          want the user's input to be terminated with a newline,
          and without -I the user will have to type a character
          in addition to the newline in order to have the newline
          recognized.  This leads to dreadful interactive perfor-
          mance.

          If all this seems too confusing, here's the general
          rule: if a human will be typing in input to your
          scanner, use -I, otherwise don't; if you don't care
          about how fast your scanners run and don't want to make
          any assumptions about the input to your scanner, always
          use -I.

          Note, -I cannot be used in conjunction with full or
          fast tables, i.e., the -f, -F, -cf, or -cF flags.

     -L   instructs flex to not generate #line directives (see
          below).

     -T   makes flex run in trace mode.  It will generate a lot
          of messages to standard out concerning the form of the
          input and the resultant non-deterministic and
          deterministic finite automatons.  This option is mostly
          for use in maintaining flex.

     -c[efmF]
          controls the degree of table compression.  -ce directs
          flex to construct equivalence classes, i.e., sets of
          characters which have identical lexical properties (for
          example, if the only appearance of digits in the flex
          input is in the character class "[0-9]" then the digits
          '0', '1', ..., '9' will all be put in the same
          equivalence class).  -cf specifies that the full
          scanner tables should be generated - flex should not
          compress the tables by taking advantage of similar
          transition functions for different states.  -cF speci-
          fies that the alternate fast scanner representation
          (described above under the -F flag) should be used.
          -cm directs flex to construct meta-equivalence classes,
          which are sets of equivalence classes (or characters,
          if equivalence classes are not being used) that are
          commonly used together.  A lone -c specifies that the
          scanner tables should be compressed but neither
          equivalence classes nor meta-equivalence classes should
          be used.

          The options -cf or -cF and -cm do not make sense
          together - there is no opportunity for meta-equivalence
          classes if the table is not being compressed.  Other-
          wise the options may be freely mixed.

          The default setting is -cem which specifies that flex
          should generate equivalence classes and meta-
          equivalence classes.  This setting provides the highest
          degree of table compression.  You can trade larger
          tables for faster-executing scanners, with the
          following generally being true:

              slowest            smallest
                         -cem
                         -ce
                         -cm
                         -c
                         -c{f,F}e
                         -c{f,F}
              fastest            largest
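
          The idea behind equivalence classes can be sketched in
          C as follows.  This is an illustration only and is not
          how flex itself builds its classes: it assumes 7-bit
          ASCII and takes each character class as a string of its
          members, and the name build_equivalence_classes() is
          hypothetical.  Two characters land in the same class
          exactly when they appear in the same character sets.

```c
#include <string.h>

#define NCHARS 128   /* 7-bit ASCII, an assumption of this sketch */

/* Assign each character an equivalence class number so that two
 * characters share a class exactly when they appear in the same
 * subset of the given character sets.  Each set is a string listing
 * its members; at most 32 sets are supported.  Returns the number
 * of classes and fills in class_of[]. */
int build_equivalence_classes(const char *const sets[], int nsets,
                              int class_of[NCHARS])
{
    unsigned long signature[NCHARS];
    int nclasses = 0;
    int c, prev, s;

    /* Bit s of signature[c] is set when character c is in set s.
     * Character 0 is skipped because strchr() would match the
     * string terminator. */
    for (c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (s = 0; s < nsets; s++)
            if (c != 0 && strchr(sets[s], c) != NULL)
                signature[c] |= 1UL << s;
    }

    /* Reuse the class of an earlier character with the same
     * signature, otherwise open a new class. */
    for (c = 0; c < NCHARS; c++) {
        class_of[c] = -1;
        for (prev = 0; prev < c; prev++)
            if (signature[prev] == signature[c]) {
                class_of[c] = class_of[prev];
                break;
            }
        if (class_of[c] < 0)
            class_of[c] = nclasses++;
    }
    return nclasses;
}
```

          With the two sets "[0-9]" and "[a-z]" spelled out as
          member strings, all digits share one class, all lower-
          case letters share another, and every remaining charac-
          ter falls into a third, so the scanner tables need
          three columns instead of 128.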


     -Sskeleton_file
          overrides the default skeleton file from which flex
          constructs its scanners.  You'll never need this option
          unless you are doing flex maintenance or development.

INCOMPATIBILITIES WITH LEX
     flex is fully compatible with lex with the following excep-
     tions:

     -    There is no run-time library to link with.  You needn't
          specify -ll when linking, and you must supply a main
          program.  (Hacker's note: since the lex library con-
          tains a main() which simply calls yylex(), you actually
          can be lazy and not supply your own main program and
          link with -ll.)

     -    lex's %r (Ratfor scanners) and %t (translation table)
          options are not supported.

     -    The do-nothing -n flag is not supported.

     -    When definitions are expanded, flex encloses them in
          parentheses.  With lex, the following

              NAME    [A-Z][A-Z0-9]*
              %%
              foo{NAME}?      printf( "Found it\n" );
              %%

          will not match the string "foo" because when the macro
          is expanded the rule is equivalent to
          "foo[A-Z][A-Z0-9]*?" and the precedence is such that
          the '?' is associated with "[A-Z0-9]*".  With flex, the
          rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so
          the string "foo" will match.

     -    yymore() is not supported.

     -    The undocumented lex-scanner internal variable yylineno
          is not supported.

     -    If your input uses REJECT, you must run flex with the
          -r flag.  If you leave out the flag, the scanner will
          abort at run-time with a message that the scanner was
          compiled without the flag being specified.

     -    The input() routine is not redefinable, though it may
          be called to read characters following whatever has
          been matched by a rule.  If input() encounters an
          end-of-file, the normal yywrap() processing is done.  A
          ``real'' end-of-file is returned as EOF.

          Input can be controlled by redefining the YY_INPUT
          macro.  YY_INPUT's calling sequence is
          "YY_INPUT(buf,result,max_size)".  Its action is to
          place up to max_size characters in the character buffer
          "buf" and return in the integer variable "result"
          either the number of characters read or the constant
          YY_NULL (0 on Unix systems) to indicate EOF.
          The default YY_INPUT reads from the file-pointer "yyin"
          (which is by default stdin), so if you just want to
          change the input file, you needn't redefine YY_INPUT -
          just point yyin at the input file.

          A sample redefinition of YY_INPUT (in the first section
          of the input file):

              %{
              #undef YY_INPUT
              /* read into an int so EOF is distinct from any char */
              #define YY_INPUT(buf,result,max_size) \
                  { \
                  int c = getchar(); \
                  result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
                  }
              %}

          You can also add things like keeping track of the
          input line number this way; but don't expect
          your scanner to go very fast.
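
          A minimal sketch of such line counting follows.  The
          counter input_line and the driver drain_input() are
          hypothetical names, and yyin and YY_NULL are declared
          here only so that the fragment compiles on its own; in
          a real flex input only the macro would go in the first
          section.  Note that the count reflects characters read,
          which may run slightly ahead of the text being matched.

```c
#include <stdio.h>

#define YY_NULL 0            /* as stated above for Unix systems */

static FILE *yyin;           /* stands in for the scanner's yyin */
static int input_line = 1;   /* hypothetical line counter */

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    { \
    int c = getc( yyin ); \
    if ( c == '\n' ) \
        ++input_line; \
    result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }

/* Hypothetical driver: drain a stream through YY_INPUT, one
 * character at a time, and return the resulting line count. */
int drain_input(FILE *fp)
{
    char buf[1];
    int result;

    yyin = fp;
    input_line = 1;
    do {
        YY_INPUT( buf, result, 1 )
    } while ( result != YY_NULL );

    return input_line;
}
```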

     -    output() is not supported.  Output from the ECHO macro
          is done to the file-pointer "yyout" (default stdout).

     -    Trailing context is restricted to patterns which have
          either a fixed-sized leading part or a fixed-sized
          trailing part.  For example, "a*/b" and "a/b*" are
          okay, but not "a*/b*".  This restriction is due to a
          bug in the trailing context algorithm given in Princi-
          ples of Compiler Design (and Compilers - Principles,
          Techniques, and Tools) which can result in mismatches.
          Try the following lex program

              %%
              x+/xy           printf( "I found \"%s\"\n", yytext );

          on the input "xxy".  (If anyone knows of a fast algo-
          rithm for finding the beginning of trailing context for
          an arbitrary pair of regular expressions, please let me
          know!) If you must have arbitrary trailing context, you
          can use yyless() to effect it.

     -    flex reads only one input file, while lex's input is
          made up of the concatenation of its input files.

ENHANCEMENTS
     -    Exclusive start-conditions can be declared by using %x
          instead of %s. These start-conditions have the property
          that when they are active, no other rules are active.
          Thus a set of rules governed by the same exclusive
          start condition describe a scanner which is independent
          of any of the other rules in the flex input.  This
          feature makes it easy to specify "mini-scanners" which
          scan portions of the input that are syntactically
          different from the rest (e.g., comments).

     -    flex dynamically resizes its internal tables, so direc-
          tives like "%a 3000" are not needed when specifying
          large scanners.

     -    The scanning routine generated by flex is declared
          using the macro YY_DECL. By redefining this macro you
          can change the routine's name and its calling sequence.
          For example, you could use:

              #undef YY_DECL
              #define YY_DECL float lexscan( a, b ) float a, b;

          to give it the name lexscan, returning a float, and
          taking two floats as arguments.
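
          The same redefinition can be sketched with an ANSI C
          prototype, for compilers that accept one.  The body
          shown is only a placeholder standing in for the gen-
          erated scanning routine, which is an assumption of
          this example and not flex output.

```c
/* Stand-in for the redefinition placed in the first section of
 * the flex input, written with an ANSI C prototype. */
#undef YY_DECL
#define YY_DECL float lexscan( float a, float b )

/* The scanning routine is declared using YY_DECL, so its body
 * follows the macro.  This body is a placeholder. */
YY_DECL
{
    return a + b;
}
```

          Callers then invoke the routine by its new name, for
          example lexscan( 1.0, 2.0 ).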

     -    flex generates #line directives mapping lines in the
          output to their origin in the input file.

     -    You can put multiple actions on the same line,
          separated with semi-colons.  With lex, the following

              foo    handle_foo(); return 1;

          is truncated to

              foo    handle_foo();

          flex does not truncate the action.  Actions that are
          not enclosed in braces are terminated at the end of the
          line.

     -    Actions can be begun with %{ and terminated with %}. In
          this case, flex does not count braces to figure out
          where the action ends - actions are terminated by the
          closing %}. This feature is useful when the enclosed
          action has extraneous braces in it (usually in comments
          or inside inactive #ifdef's) that throw off the brace-
          count.

     -    All of the scanner actions (e.g., ECHO, yywrap ...)
          except the unput() and input() routines, are written as
          macros, so they can be redefined if necessary without
          requiring a separate library to link to.

FILES
     flex.skel
          skeleton scanner

     flex.fastskel
          skeleton scanner for -f and -F

     flexskelcom.h
          common definitions for skeleton files

     flexskeldef.h
          definitions for compressed skeleton file

     fastskeldef.h
          definitions for -f, -F skeleton file

SEE ALSO
     lex(1)

     M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator

AUTHOR
     Vern Paxson, with the help of many ideas and much inspira-
     tion from Van Jacobson.  Original version by Jef Poskanzer.
     Fast table representation is a partial implementation of a
     design done by Van Jacobson.  The implementation was done by
     Kevin Gong and Vern Paxson.

     Thanks to the many flex beta-testers, especially Casey Lee-
     dom, Nick Christopher, Chris Faylor, Eric Goldman, Craig
     Leres, Mohamed el Lozy, Esmond Pitt, Jef Poskanzer, and Dave
     Tallman.  Thanks to John Gilmore, Bob Mulcahy, Rich Salz,
     and Richard Stallman for help with various distribution
     headaches.

     Send comments to:

          Vern Paxson
          Real Time Systems
          Bldg. 46A
          Lawrence Berkeley Laboratory
          1 Cyclotron Rd.
          Berkeley, CA 94720

          (415) 486-6411

          vern@lbl-{csam,rtsg}.arpa
          ucbvax!lbl-csam.arpa!vern


DIAGNOSTICS
     flex scanner jammed - a scanner compiled with -s has encoun-
     tered an input string which wasn't matched by any of its
     rules.

     flex input buffer overflowed - a scanner rule matched a
     string long enough to overflow the scanner's internal input
     buffer (as large as BUFSIZ in "/usr/include/stdio.h").  You
     can edit flexskelcom.h and increase YY_BUF_SIZE and
     YY_MAX_LINE to increase this limit.

     REJECT used and scanner was not generated using -r - just
     like it sounds.  Your scanner uses REJECT. You must run flex
     on the scanner description using the -r flag.

     old-style lex command ignored - the flex input contains a
     lex command (e.g., "%n 1000") which is being ignored.

BUGS
     Use of unput() or input() trashes the current yytext and
     yyleng.

     Use of unput() to push back more text than was matched can
     result in the pushed-back text matching a beginning-of-line
     ('^') rule even though it didn't come at the beginning of
     the line.

     Nulls are not allowed in flex inputs or in the inputs to
     scanners generated by flex.  Their presence generates fatal
     errors.

     Do not mix trailing context with the '|' operator used to
     specify that multiple rules use the same action.  That is,
     avoid constructs like:

             foo/bar      |
             bletch       |
             bugprone     { ... }

     They can result in subtle mismatches.  This is actually not
     a problem if there is only one rule using trailing context
     and it is the first in the list (so the above example will
     actually work okay).  The problem is due to fall-through in
     the action switch statement, causing non-trailing-context
     rules to execute the trailing-context code of their fellow
     rules.  This should be fixed, as it's a nasty bug and not
     obvious.  The proper fix is for flex to spit out a
     FLEX_TRAILING_CONTEXT_USED #define and then have the backup
     logic in a separate table which is consulted for each rule-
     match, rather than as part of the rule action.  The place to
     do the tweaking is in add_accept() - any kind soul want to
     be a hero?

     The pattern:

          x{3}

     is considered to be variable-length for the purposes of
     trailing context, even though it has a clear fixed length.

     Due to both buffering of input and read-ahead, you cannot
     intermix calls to, for example, getchar() with flex rules
     and expect it to work.  Call input() instead.

     The total number of table entries listed by the -v flag
     excludes the table entries needed to determine what rule
     has been matched.  The number of such entries is equal to
     the number of DFA states if the scanner was not compiled
     with -r, and greater than the number of states if it was.

     The scanner run-time speeds have not been optimized as much
     as they deserve.  Van Jacobson's work shows that they can go
     quite a bit faster still.