Issue 614: translator: don't choke on non-ASCII characters in comments - Fast Downward issue tracker

Title	translator: don't choke on non-ASCII characters in comments
Priority	feature	Status	resolved
Superseder		Nosy List	gabi, jendrik, malte
Assigned To	gabi	Keywords

Created on 2015-12-11.12:04:39 by malte, last changed by gabi.

Messages
msg4962	Author: gabi	Date: 2015-12-11.14:40:45
Resolved (following Malte's suggested solution).
msg4958	Author: malte	Date: 2015-12-11.12:04:39
Currently, the translator fails under Python 3 on benchmark tasks whose PDDL files contain non-ASCII characters (e.g. accented letters). The same files are accepted when running the planner under Python 2.

Suggestion:
- Make the translator accept non-ASCII characters in comments as long as an ASCII-compatible encoding is used (specifically, one where all non-ASCII characters exclusively use bytes beyond the ASCII range).
- Implementation strategy: decode the input as Latin-1, then discard comments, then test that the remaining text is ASCII-only. (One way to test this is to re-encode to ASCII and catch errors.)

For those not familiar with these encodings, Latin-1 is a good choice for this because the set of inputs it accepts is a superset of ASCII, of the Latin-* encodings and of UTF-8. (Every byte sequence is valid Latin-1.) Perhaps we should add a comment to the code explaining this, because the Latin encodings otherwise look a bit old-fashioned in a Unicode world -- but here this will lead to strictly fewer errors than using UTF-8.
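
For illustration, the strategy suggested in msg4958 could look roughly like the Python sketch below. This is not the code that was committed to the translator: the function name `strip_comments_and_check` is hypothetical, and the sketch assumes that PDDL comments start at a `;` and run to the end of the line.

```python
def strip_comments_and_check(raw: bytes) -> str:
    """Hypothetical helper: return the comment-free PDDL text,
    or raise ValueError if non-ASCII text remains outside comments."""
    # Every byte sequence is valid Latin-1, so this decode cannot fail,
    # whatever ASCII-compatible encoding the file actually uses.
    text = raw.decode("latin-1")

    # Split on "\n" only. str.splitlines() would also split on characters
    # such as U+0085, which can show up inside a decoded comment and would
    # cut the comment in two.
    lines = text.split("\n")

    # Assumption: a PDDL comment runs from ";" to the end of the line.
    stripped = "\n".join(line.split(";", 1)[0] for line in lines)

    # Re-encode to ASCII and catch errors to detect leftover non-ASCII text.
    try:
        stripped.encode("ascii")
    except UnicodeEncodeError as err:
        raise ValueError(
            "non-ASCII character outside a comment at offset %d" % err.start
        ) from None
    return stripped
```

With this sketch, a UTF-8 file with an accented letter in a comment is accepted, while the same letter in a domain name is rejected with a `ValueError`.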

History
Date	User	Action	Args
2015-12-11 14:40:45	gabi	set	status: chatting -> resolved messages: + msg4962
2015-12-11 12:10:45	jendrik	set	nosy: + jendrik
2015-12-11 12:04:39	malte	create