Multiple Lexicalisation - A Java Based Study. / Johnstone, Adrian; Scott, Elizabeth.

ACM Digital Library: Proceedings of Software Language Engineering 2019. ACM, 2019.

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)



Accepted author manuscript (PDF document, 770 KB)


We consider the possibility of making the lexicalisation phase of compilation more powerful by avoiding the need for the lexer to return a single token string from the input character string. This has the potential to empower language design by softening the boundaries between lexical and phrase-level specification. The large number of lexicalisations makes it impractical to parse each one individually, but it is possible to share the parsing of common subparts, reducing the number of tokens parsed from the product of the token numbers associated with the components to their sum. We report total numbers of lexicalisations of example Java strings and the impact on these numbers of various lexical disambiguation strategies, and we introduce a new generalised parsing technique that can efficiently parse multiple lexicalisations of a character string simultaneously. We then use this technique on Java, reporting on the number of lexicalisations that correspond to syntactically correct Java strings and the degree to which the standard Java lexer is safe, in the sense that it does not remove all the syntactically correct lexicalisations of an input character string. Our multi-lexer parser is an alternative to scannerless parsing of a character-level grammar, retaining the separation between grammar terminals and the corresponding lexical tokens. This has the advantages of allowing the parser to use terminal-level lookahead and of keeping lexical-level disambiguation separate from the context-free grammar.
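The ideas in the abstract (many lexicalisations of one character string, shared parsing of common subparts, and whether a longest-match lexer is "safe") can be illustrated with a small sketch. It rests on assumptions that are ours, not the paper's: a toy token set of identifiers, integer literals and the operators `+`, `-`, `++`, `--`; no whitespace; and a hypothetical three-rule expression grammar. All function names (`candidates`, `lattice`, `count_lexicalisations`, `longest_match`, `recognise`, and so on) are hypothetical. The sketch builds a token lattice holding every lexicalisation, counts lexicalisations without enumerating them, and recognises the grammar over the whole lattice using memoised sets of end positions. That recogniser is a simple memoised recursive-descent scheme, not the generalised parsing technique introduced in the paper. The test string `a--b` is the example the Java Language Specification (§3.2) uses for longest-match tokenisation: it is tokenised as `a`, `--`, `b`, which is not part of any grammatically correct program, even though `a`, `-`, `-`, `b` could be.

```python
import re
from functools import lru_cache

# Hypothetical toy token set, NOT the paper's Java lexer: identifiers, integer
# literals and four operators; whitespace is not handled.
OPERATORS = ("++", "--", "+", "-")
PREFIX_OPS, POSTFIX_OPS, BINARY_OPS = {"+", "-", "++", "--"}, {"++", "--"}, {"+", "-"}
ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
INT_RE = re.compile(r"[0-9]+")


def candidates(s, i):
    """Every token (kind, lexeme) matching a prefix of s[i:], of any length."""
    toks = [("OP", op) for op in OPERATORS if s.startswith(op, i)]
    for kind, pattern in (("ID", ID_RE), ("INT", INT_RE)):
        m = pattern.match(s, i)
        if m:  # each non-empty prefix of an ID or INT run is itself a token
            toks += [(kind, m.group()[:k]) for k in range(1, len(m.group()) + 1)]
    return toks


def lattice(s):
    """Token lattice: start position -> [(token, end)], shared by all lexicalisations."""
    return {i: [(t, i + len(t[1])) for t in candidates(s, i)] for i in range(len(s))}


def chain(tokens):
    """One lexicalisation viewed as a lattice with a single path."""
    return {i: [(t, i + 1)] for i, t in enumerate(tokens)}


def count_lexicalisations(lat, end):
    """Number of paths 0 -> end, computed right to left without enumerating them."""
    count = {end: 1}
    for i in range(end - 1, -1, -1):
        count[i] = sum(count[j] for _, j in lat.get(i, []))
    return count[0]


def lexicalisations(lat, end, i=0):
    """Naive enumeration of every path; exponential in general."""
    if i == end:
        yield []
        return
    for tok, j in lat.get(i, []):
        for rest in lexicalisations(lat, end, j):
            yield [tok] + rest


def longest_match(s):
    """Maximal-munch lexer: longest candidate at each step, None if stuck."""
    out, i = [], 0
    while i < len(s):
        toks = candidates(s, i)
        if not toks:
            return None
        tok = max(toks, key=lambda t: len(t[1]))
        out.append(tok)
        i += len(tok[1])
    return out


def recognise(lat, end):
    """Recognise a hypothetical toy grammar over a whole lattice:
         Expr    ::= Unary (('+' | '-') Unary)*
         Unary   ::= ('+' | '-' | '++' | '--') Unary | Primary ('++' | '--')*
         Primary ::= ID | INT
    unary(i) is the memoised set of positions where a Unary starting at i can
    end, so sub-parses are shared by every lexicalisation passing through i."""
    def ops_from(i, allowed):
        return [j for (kind, lex), j in lat.get(i, []) if kind == "OP" and lex in allowed]

    @lru_cache(maxsize=None)
    def unary(i):
        ends = set()
        for j in ops_from(i, PREFIX_OPS):  # prefix operator, then a nested Unary
            ends |= unary(j)
        seen, todo = set(), [j for (kind, _), j in lat.get(i, []) if kind in ("ID", "INT")]
        while todo:  # a Primary followed by any number of postfix operators
            k = todo.pop()
            if k not in seen:
                seen.add(k)
                todo += ops_from(k, POSTFIX_OPS)
        return frozenset(ends | seen)

    ends, todo = set(), list(unary(0))
    while todo:  # Unary (BINOP Unary)*
        k = todo.pop()
        if k not in ends:
            ends.add(k)
            for j in ops_from(k, BINARY_OPS):
                todo += unary(j)
    return end in ends
```

In this toy setting, `longest_match("a--b")` yields `a`, `--`, `b`, which `recognise` rejects, while `recognise(lattice("a--b"), 4)` succeeds through the path `a`, `-`, `-`, `b`; the longest-match lexer is therefore unsafe on this input. The lattice also grows far more slowly than the set of lexicalisations: for `abcdefghij` it has 55 edges, while `count_lexicalisations` reports 512 lexicalisations, because every way of splitting an identifier run is another identifier sequence. Counting and recognition work roughly in proportion to the lattice, not to the number of token strings.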
Original language: English
Title of host publication: ACM Digital Library
Subtitle of host publication: Proceedings of Software Language Engineering 2019
Number of pages: 12
Publication status: Accepted/In press - 10 Aug 2019

ID: 34483809