Currently, the translator fails on benchmark tasks containing non-ASCII
characters (e.g. accented letters) in PDDL files when run under Python 3. The
same files are accepted when running the planner under Python 2.
Suggestion:
- Make the translator accept non-ASCII characters *in comments* as long as an
  ASCII-compatible encoding is used (specifically, one where all non-ASCII
  characters are encoded exclusively with bytes outside the ASCII range).
- Implementation strategy: decode the input as Latin-1, then discard comments,
then test that the remaining text is ASCII-only. (One way to test this is to
re-encode to ASCII and catch errors.)
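The strategy above could be sketched roughly as follows. This is an
illustrative sketch, not the translator's actual code: the function name is
made up, and it assumes PDDL comments run from `;` to the end of the line.

```python
import re

def strip_comments_and_check_ascii(raw_bytes):
    """Decode as Latin-1, discard ;-comments, verify the rest is ASCII.

    Latin-1 decoding never fails because every byte maps to a code
    point, so non-ASCII bytes survive decoding and can be checked later.
    """
    text = raw_bytes.decode("latin-1")
    # PDDL comments start with ';' and extend to the end of the line.
    without_comments = re.sub(r";[^\n]*", "", text)
    try:
        # Re-encoding to ASCII raises iff non-ASCII characters remain.
        without_comments.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII characters outside comments")
    return without_comments
```

With this approach, a file whose only accented letters sit in comments is
accepted regardless of whether it was saved as Latin-1, Latin-9 or UTF-8,
since all of these keep the ASCII range intact.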
For those not familiar with these encodings: Latin-1 is a good choice here
because the set of inputs it accepts is a superset of those accepted by ASCII,
by the other Latin-* encodings and by UTF-8. (In fact, *every* byte sequence
is valid Latin-1.) Perhaps we should add a comment to the code explaining
this, because the Latin encodings otherwise look a bit old-fashioned in a
Unicode world -- but here this choice leads to strictly fewer errors than
using UTF-8.