// Copyright (c) 2015 The golex Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Package lex is a Unicode-friendly run time library for golex[0] generated
// lexical analyzers[1].
//
// Changelog
//
// 2015-04-08: Initial release.
//
// Character classes
//
// Golex internally handles only 8 bit "characters". Many Unicode-aware
// tokenizers do not actually need to recognize every Unicode rune, but only
// some particular partitions/subsets of them, for example a particular
// Unicode category such as upper case letters (Lu).
//
// The idea is to convert all runes in a particular set to a single 8 bit
// character allocated outside the ASCII range of codes. The token value, a
// string of runes and their exact positions, is collected as usual (see the
// Token and TokenBytes methods), but the tokenizer DFA is simpler (and thus
// smaller and perhaps also faster) when this technique is used. In the example
// program (see below), recognizing (and skipping) white space, integer
// literals, one keyword and Go identifiers requires only an 8 state DFA[5].
//
// To provide the conversion from runes to character classes, "install" your
// converting function using the RuneClass option.
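//
// A minimal sketch of such a converting function follows. It assumes that the
// function takes a rune and returns an int class; that signature is an
// assumption, not something stated in this file. The names runeClass,
// classLetter, classDigit and classOther are hypothetical and the class values
// are an arbitrary choice above the ASCII range. The sketch uses the standard
// unicode package.
//
//	const (
//		classLetter = 0x80 + iota // any non-ASCII Unicode letter
//		classDigit                // any non-ASCII Unicode digit
//		classOther                // every other non-ASCII rune
//	)
//
//	func runeClass(r rune) int {
//		if r >= 0 && r < 0x80 {
//			return int(r) // ASCII is passed through unchanged
//		}
//		switch {
//		case unicode.IsLetter(r):
//			return classLetter
//		case unicode.IsDigit(r):
//			return classDigit
//		}
//		return classOther
//	}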
//
// References
//
// -
//
// [0]: http://godoc.org/modernc.org/golex
// [1]: http://en.wikipedia.org/wiki/Lexical_analysis
// [2]: http://golang.org/cmd/yacc/
// [3]: https://modernc.org/golex/blob/master/lex/example.l
// [4]: http://golang.org/pkg/io/#RuneReader
// [5]: https://modernc.org/golex/blob/master/lex/dfa
package lex // import "modernc.org/golex/lex"