OCaml - Faithfully handle white-spacing in a pretty-printer
I am writing a front-end for a language (using ocamllex and ocamlyacc).
So the front-end can build an abstract syntax tree (AST) from a program. Then we often write a pretty printer, which takes an AST and prints a program. If we later just want to compile or analyse the AST, most of the time we don't need the printed program to be exactly the same as the original program in terms of white-spacing. However, this time I want to write a pretty printer that prints exactly the same program as the original one, in terms of white-spacing.
Therefore, my question is what the best practices are for handling white-spacing while trying not to modify the types of the AST too much. I really don't want to add a number (of white-spaces) to each type in the AST.
For example, this is how I currently deal with (i.e., skip) white-spacing in lexer.mll:

    rule token = parse
      ...
      | [' ' '\t'] { token lexbuf } (* skip blanks *)
      | eof        { EOF }
Does anyone know how to change the other parts of the front-end so that they correctly take white-spacing into account for later printing?
It's quite common to keep source-file location information for each token. This information allows for more accurate error messages, for example.
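The most general way to do this is to keep the beginning and ending line number and column position for each token, which is a total of 4 numbers. If it is easy to compute the end position of a token from its value and its start position, that can be reduced to 2 numbers, at the price of some additional code complexity.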
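A minimal sketch of both representations, assuming 1-based lines and 0-based columns. The names `loc`, `end_of_token` and `loc_of_token` are made up for illustration and are not part of ocamllex or ocamlyacc:

```ocaml
(* A source location: start and end line/column, four numbers in total. *)
type loc = {
  start_line : int;  (* 1-based line of the first character *)
  start_col : int;   (* 0-based column of the first character *)
  end_line : int;    (* line of the position just past the token *)
  end_col : int;     (* column just past the last character *)
}

(* Derive the end position from the start position and the token text,
   so that only two numbers have to be stored per token. *)
let end_of_token ~start_line ~start_col (text : string) : int * int =
  let line = ref start_line and col = ref start_col in
  String.iter
    (fun c ->
      if c = '\n' then begin
        incr line;
        col := 0
      end
      else incr col)
    text;
  (!line, !col)

(* Build the full four-number location from the two stored numbers. *)
let loc_of_token ~start_line ~start_col text =
  let end_line, end_col = end_of_token ~start_line ~start_col text in
  { start_line; start_col; end_line; end_col }
```

The two-number variant only works when the stored token text is exactly what appeared in the source; if the lexer normalises a literal (for example by unescaping a string), the end position can no longer be derived this way.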
Bison has some features which simplify the bookkeeping work of remembering location objects; it's possible that ocamlyacc includes similar features, but I didn't see any in the documentation. In any case, it is straightforward to maintain a location object associated with each input token.
With that information, it is easy to recreate the whitespace between two adjacent tokens, as long as what separated the tokens was only whitespace. Comments are another issue.
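As a rough sketch of that reconstruction, the code below rebuilds the text from a list of tokens that each carry their start line and column. The type `located_token` and the functions `advance`, `gap` and `print_tokens` are hypothetical names. It assumes the gaps contained only spaces and newlines: line/column pairs alone cannot tell a tab from a space, and trailing blanks at the end of a line are not recovered.

```ocaml
(* A token with its text and start position (1-based line, 0-based column). *)
type located_token = { text : string; line : int; col : int }

(* Move a (line, col) cursor past the characters of [s]. *)
let advance (line, col) s =
  let l = ref line and c = ref col in
  String.iter (fun ch -> if ch = '\n' then (incr l; c := 0) else incr c) s;
  (!l, !c)

(* Whitespace that moves the cursor from (l0, c0) to (l1, c1).
   Assumes the target is not before the cursor. *)
let gap (l0, c0) (l1, c1) =
  if l1 = l0 then String.make (c1 - c0) ' '
  else String.make (l1 - l0) '\n' ^ String.make c1 ' '

(* Emit every token at its recorded position, filling gaps with blanks. *)
let print_tokens (toks : located_token list) : string =
  let buf = Buffer.create 64 in
  let _ =
    List.fold_left
      (fun cur t ->
        Buffer.add_string buf (gap cur (t.line, t.col));
        Buffer.add_string buf t.text;
        advance (t.line, t.col) t.text)
      (1, 0) toks
  in
  Buffer.contents buf
```

For instance, the tokens `let` at (1, 0), `x` at (1, 4), `=` at (1, 7) and `1` at (2, 2) are printed with one space, two spaces, and a newline plus two spaces between them respectively.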
It's a judgement call whether or not that would be simpler than just attaching the preceding whitespace (and comments) to each token as it is lexed.
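The alternative of attaching preceding whitespace to each token could look roughly like the following. This is a hand-written toy lexer rather than ocamllex output: tokens are simply maximal runs of non-blank characters, and `spaced_token`, `lex` and `print` are illustrative names. In a real ocamllex lexer the same idea would mean accumulating the skipped blanks and comments instead of discarding them.

```ocaml
(* A token together with the whitespace that preceded it in the source. *)
type spaced_token = { leading : string; text : string }

let is_blank c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Toy lexer: returns the tokens plus any whitespace left at end of input. *)
let lex (src : string) : spaced_token list * string =
  let n = String.length src in
  (* Index of the first character at or after [i] that fails predicate [p]. *)
  let rec scan_while p i =
    if i < n && p src.[i] then scan_while p (i + 1) else i
  in
  let rec go i acc =
    let ws_end = scan_while is_blank i in
    let leading = String.sub src i (ws_end - i) in
    if ws_end = n then (List.rev acc, leading)
    else
      let tok_end = scan_while (fun c -> not (is_blank c)) ws_end in
      let text = String.sub src ws_end (tok_end - ws_end) in
      go tok_end ({ leading; text } :: acc)
  in
  go 0 []

(* Printing is plain concatenation, so the original text is reproduced. *)
let print (toks, trailing) =
  String.concat "" (List.map (fun t -> t.leading ^ t.text) toks) ^ trailing
```

Because the whitespace is stored verbatim, tabs and trailing blanks survive the round trip, which positions alone do not guarantee. The cost is that every token (or the AST node built from it) has to carry an extra string.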