Lazer (Language Tokenizer) allows an application to break the text of a programming language into tokens representing the keywords, operators, etc. of that language.
Strategies for handling Java and XML files are supplied. Users of this library can supply their own strategies for other languages.
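To give a feel for what a user-supplied strategy might look like, here is a minimal sketch for a toy "key = value" language. It does not use Lazer's real API: the names `LanguageStrategy`, `Token`, `KeyValueStrategy` and the token type strings are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StrategySketch {
    // Hypothetical token: a type label plus the matched text (not Lazer's actual class).
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical strategy contract: turn source text into a list of tokens.
    interface LanguageStrategy {
        List<Token> tokenize(String source);
    }

    // Toy strategy that recognises '=', runs of whitespace, and everything else as words.
    static final class KeyValueStrategy implements LanguageStrategy {
        @Override
        public List<Token> tokenize(String source) {
            List<Token> tokens = new ArrayList<>();
            int i = 0;
            while (i < source.length()) {
                char c = source.charAt(i);
                int start = i;
                if (c == '=') {
                    i++;
                    tokens.add(new Token("OPERATOR", "="));
                } else if (Character.isWhitespace(c)) {
                    // Consume the whole run of whitespace as one token.
                    while (i < source.length() && Character.isWhitespace(source.charAt(i))) i++;
                    tokens.add(new Token("WHITESPACE", source.substring(start, i)));
                } else {
                    // Consume until the next '=' or whitespace character.
                    while (i < source.length() && source.charAt(i) != '='
                            && !Character.isWhitespace(source.charAt(i))) i++;
                    tokens.add(new Token("WORD", source.substring(start, i)));
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        for (Token t : new KeyValueStrategy().tokenize("name = lazer")) {
            System.out.println(t.type + " [" + t.text + "]");
        }
    }
}
```

The strategies shipped with Lazer for Java and XML are necessarily more involved; the sketch only shows the general shape of mapping text to typed tokens.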
Lazer is used by the Syntalight project.
Release 0.1 is out and can be downloaded here, but be warned that the API cannot yet be considered stable.
In particular, some fine-tuning of the language strategies still needs to be finalized. Different products may want the same language strategy to return tokens with slight variations on the strategy rules, e.g. ignore whitespace, append whitespace to the previous token, or merge adjacent tokens of the same type. The API for setting these rules is currently in flux.
This may result in the language strategies becoming non-singleton in the next release.
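The rule variations mentioned above can be sketched as a post-processing pass over a token list. This is only an illustration under stated assumptions: `Token`, `Rules`, `apply` and the flag names are hypothetical and are not part of the Lazer API, which is still in flux.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenRulesSketch {
    // Hypothetical token: a type label plus the matched text (not Lazer's actual class).
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical per-product rule settings; all flags default to false.
    static final class Rules {
        boolean ignoreWhitespace;           // drop whitespace tokens entirely
        boolean appendWhitespaceToPrevious; // fold whitespace into the preceding token
        boolean mergeSameType;              // join adjacent tokens of the same type
    }

    static List<Token> apply(List<Token> input, Rules rules) {
        List<Token> out = new ArrayList<>();
        for (Token t : input) {
            boolean isWhitespace = t.type.equals("WHITESPACE");
            if (isWhitespace && rules.ignoreWhitespace) {
                continue;
            }
            if (!out.isEmpty()) {
                Token prev = out.get(out.size() - 1);
                boolean attach = isWhitespace && rules.appendWhitespaceToPrevious;
                boolean merge = rules.mergeSameType && prev.type.equals(t.type);
                if (attach || merge) {
                    // Extend the previous token's text; it keeps its own type.
                    out.set(out.size() - 1, new Token(prev.type, prev.text + t.text));
                    continue;
                }
            }
            // Leading whitespace has no previous token, so it stays a token of its own.
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> raw = List.of(
                new Token("KEYWORD", "int"),
                new Token("WHITESPACE", " "),
                new Token("IDENTIFIER", "x"));
        Rules rules = new Rules();
        rules.appendWhitespaceToPrevious = true;
        for (Token t : apply(raw, rules)) {
            System.out.println(t.type + " [" + t.text + "]");
        }
    }
}
```

If settings like these were stored on the strategy itself rather than passed in per call, two products sharing one singleton strategy could not use different rules, which is one plausible reason the strategies might stop being singletons.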