dorm/vendor/modernc.org/golex/lex/doc.go

// Copyright (c) 2015 The golex Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Package lex is a Unicode-friendly run time library for golex[0] generated
// lexical analyzers[1].
//
// Changelog
//
// 2015-04-08: Initial release.
//
// Character classes
//
// Golex internally handles only 8 bit "characters". Many Unicode-aware
// tokenizers do not actually need to recognize every Unicode rune, but only
// some particular partitions/subsets. Like, for example, a particular Unicode
// category, say upper case letters: Lu.
//
// The idea is to convert all runes in a particular set as a single 8 bit
// character allocated outside the ASCII range of codes. The token value, a
// string of runes and their exact positions is collected as usual (see the
// Token and TokenBytes method), but the tokenizer DFA is simpler (and thus
// smaller and perhaps also faster) when this technique is used. In the example
// program (see below), recognizing (and skipping) white space, integer
// literals, one keyword and Go identifiers requires only an 8 state DFA[5].
//
// To provide the conversion from runes to character classes, "install" your
// converting function using the RuneClass option.
//
// References
//
// -
//
//  [0]: http://godoc.org/modernc.org/golex
//  [1]: http://en.wikipedia.org/wiki/Lexical_analysis
//  [2]: http://golang.org/cmd/yacc/
//  [3]: https://modernc.org/golex/blob/master/lex/example.l
//  [4]: http://golang.org/pkg/io/#RuneReader
//  [5]: https://modernc.org/golex/blob/master/lex/dfa
package lex // import "modernc.org/golex/lex"