Issue614

Title translator: don't choke on non-ASCII characters in comments
Priority: feature
Status: resolved
Superseder:
Nosy List: gabi, jendrik, malte
Assigned To: gabi
Keywords:
Optional summary:

Created on 2015-12-11.12:04:39 by malte, last changed by gabi.

Messages
msg4962 Author: gabi Date: 2015-12-11.14:40:45
Resolved (following Malte's suggested solution).
msg4958 Author: malte Date: 2015-12-11.12:04:39
Currently, the translator fails on benchmark tasks containing non-ASCII
characters (e.g. accented letters) in PDDL files when run under Python 3. They
are accepted when running the planner under Python 2.

Suggestion:

- Make the translator accept non-ASCII characters *in comments* as long as an
ASCII-compatible encoding is used (specifically, one where all non-ASCII
characters exclusively use bytes beyond the ASCII range).

- Implementation strategy: decode the input as Latin-1, then discard comments,
then test that the remaining text is ASCII-only. (One way to test this is to
re-encode to ASCII and catch errors.)
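
The strategy above could be sketched roughly as follows. This is only an illustration, not the code that was committed: the function name `read_pddl_text` and the `source` parameter are hypothetical, and it assumes that PDDL comments run from `;` to the end of the line.

```python
def read_pddl_text(raw_bytes, source="<input>"):
    """Decode PDDL bytes, drop ';' comments, require ASCII in the rest.

    Hypothetical helper for illustration; not the translator's actual API.
    """
    # Latin-1 maps every byte to a code point, so this never raises.
    text = raw_bytes.decode("latin-1")

    # Split on "\n" only. str.splitlines() would also split on U+0085,
    # which appears after Latin-1 decoding of some UTF-8 sequences
    # (e.g. the second byte of U+00C5), cutting a comment in half.
    stripped_lines = []
    for line in text.split("\n"):
        # Assumption: a comment runs from ';' to the end of the line.
        stripped_lines.append(line.split(";", 1)[0])
    stripped = "\n".join(stripped_lines)

    try:
        # Fails exactly when a non-ASCII character survives outside comments.
        stripped.encode("ascii")
    except UnicodeEncodeError as err:
        raise ValueError(
            "%s: non-ASCII character outside a comment at offset %d"
            % (source, err.start))
    return stripped
```

With this sketch, a UTF-8 file with an accented letter in a comment passes, while the same letter inside a domain name is rejected with a `ValueError`.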

For those not familiar with these encodings, Latin-1 is a good choice for this
because the set of inputs it accepts is a superset of ASCII, of the Latin-*
encodings and of UTF-8. (*Every* byte sequence is valid Latin-1.) Perhaps we
should add a comment to the code explaining this because otherwise the Latin
encodings are a bit old-fashioned in a Unicode world -- but here this will lead
to strictly fewer errors than using UTF-8.
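
To make the encoding argument concrete, here is a small check, given as a sketch for illustration only (the helper name `decodes_as` is made up). It shows that any byte sequence decodes as Latin-1 and round-trips unchanged, whereas a Latin-1-encoded accented letter is not valid UTF-8.

```python
def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


all_bytes = bytes(range(256))
# Every possible byte value is accepted by Latin-1 ...
latin1_accepts_everything = decodes_as(all_bytes, "latin-1")
# ... and decoding followed by encoding gives back the same bytes.
round_trips = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# A comment with an accented letter, saved in Latin-1: b"; caf\xe9"
latin1_comment = "; caf\u00e9".encode("latin-1")
# The lone byte 0xE9 starts a multi-byte sequence in UTF-8 that never
# completes, so a UTF-8 decoder rejects this input.
utf8_accepts_latin1_file = decodes_as(latin1_comment, "utf-8")

# The same comment saved in UTF-8 still decodes as Latin-1; the text looks
# garbled, but that is harmless because comments are discarded anyway.
latin1_accepts_utf8_file = decodes_as("; caf\u00e9".encode("utf-8"), "latin-1")
```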
History
Date User Action Args
2015-12-11 14:40:45 gabi set status: chatting -> resolved
messages: + msg4962
2015-12-11 12:10:45 jendrik set nosy: + jendrik
2015-12-11 12:04:39 malte create