Created on 2026-03-23.14:23:43 by masataro, last changed by malte.
The SAS file format has worked hard long enough. I propose replacing the current SAS format with JSON, governed by a proper JSON schema. There are multiple benefits.
1. Better interoperability with numerous other applications.
While much work on action model learning, including my latplan but also recent LLM-based approaches, produces PDDL so that its outputs are compatible with classical planners, it sometimes makes much more sense to model the learned tasks as SAS with categorical random variables rather than propositional/Boolean/Bernoulli variables.
Allowing multi-valued variables may also improve the performance of the current not-so-great neural planning solvers written in GPU frameworks. Let's admit it: not being able to use modern GPU frameworks is the biggest hurdle for implementing a GPU-based planner. Python/PyTorch is slow, but there are better JIT frameworks with minimal performance drawbacks, such as JAX (Python) and Julia-based ones. Anyone with sane engineering principles would first try them to see whether a proof of concept works before writing a custom CUDA kernel. But then the current SAS format makes the tasks difficult for them to load. Removing such hurdles one by one makes classical planning available to many who don't want to write C++ (although some people, including myself, have mixed feelings about that).
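To make the GPU argument concrete, here is a minimal NumPy sketch (the same array code ports directly to JAX) of how multi-valued states support vectorized goal checks; the arrays and the -1 "don't care" convention are illustrative assumptions, not an existing API:

```python
import numpy as np

# Hypothetical multi-valued SAS states: variable i holds a value
# in range(domain_size[i]).  A batch of 2 states over 3 variables:
states = np.array([[0, 2, 1],
                   [1, 2, 0]])

# A partial assignment (precondition/goal); -1 means "don't care".
goal = np.array([1, -1, 0])

# Vectorized satisfaction check over the whole batch in one call.
satisfied = ((states == goal) | (goal == -1)).all(axis=1)
print(satisfied.tolist())  # [False, True]
```

With Boolean-only encodings the same check needs either one-hot blowup or per-fact indexing; integer-coded categorical variables keep states dense and the check a single fused array operation.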
Using a JSON-based format also improves interoperability with constrained decoding for LLMs, a mechanism for restricting the output of an LLM with a user-specified context-free grammar so that a specific output format is guaranteed. There are multiple increasingly sophisticated, fast decoders written in Rust (llguidance, xgrammar, outlines) and supporting libraries that bridge Python dataclasses and the corresponding JSON grammar decoders (pydantic).
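For example, with pydantic (v2), a dataclass-style model of a hypothetical task fragment yields a JSON schema that such decoders can consume; the field names below are illustrative assumptions, not a proposed schema:

```python
from pydantic import BaseModel

class Variable(BaseModel):
    name: str
    domain: list[str]        # value names of a categorical variable

class Task(BaseModel):
    variables: list[Variable]
    initial_state: list[int]  # one value index per variable

# The generated JSON schema can be handed to llguidance/xgrammar/outlines
# to constrain an LLM to emit only syntactically valid task descriptions.
schema = Task.model_json_schema()
print(sorted(schema["properties"]))  # ['initial_state', 'variables']
```

The same model class also validates incoming JSON, so a planner-side loader and an LLM-side grammar can share one source of truth.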
It also lowers the hurdle for implementing other grounders, such as the recent Tarski.
2. Better maintenance and faster processing, by using an already available open-source header-only parser on the C++ side and the standard Python json module for the writer.
Existing JSON parsers have received rigorous performance tuning, including SIMD processing for faster parsing.
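On the writer side, the standard library suffices; a minimal sketch, assuming an illustrative (not settled) structure:

```python
import json

# Hypothetical JSON rendering of a tiny task; the schema is illustrative.
task = {
    "version": 3,
    "variables": [
        {"name": "var0", "axiom_layer": -1,
         "values": ["Atom at(a)", "Atom at(b)"]},
    ],
    "initial_state": [0],
    "goal": [{"var": 0, "value": 1}],
}

with open("output.json", "w") as f:
    json.dump(task, f, separators=(",", ":"))  # compact separators save space
```

No custom serialization code to maintain; the C++ side then needs only a generic JSON parser plus schema validation.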
3. Potential to accept multiple input formats in the future, including compact binary formats like protobuf.
Google's Protocol Buffers is a compact binary format offering the same basic features as JSON. Currently, some benchmark instances produce SAS files of hundreds of MB, which pressures cluster storage space when I cache them to reduce experiment runtime. Standardizing the output to JSON may make this next step more approachable.
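Even before protobuf, compact separators plus gzip (both in the Python standard library) already shrink cached files noticeably; a rough sketch with synthetic data standing in for a large translator output:

```python
import gzip
import json

# Synthetic blob imitating a large, repetitive translator output.
task = {"variables": [{"name": f"var{i}", "values": ["false", "true"]}
                      for i in range(10_000)]}

pretty = json.dumps(task, indent=2).encode()
compact = json.dumps(task, separators=(",", ":")).encode()
gzipped = gzip.compress(compact)

# Each step strictly reduces size on this kind of repetitive data.
print(len(pretty) > len(compact) > len(gzipped))  # True
```

Since gzip is transparent to most tooling (zcat, Python's gzip.open), this could serve as an interim measure while a binary format is evaluated.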
4. Implementation cost is low.
The current SAS format is a simple intermediate file format that translates easily to an equivalent JSON representation.
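As a sketch of how mechanical the translation is, here is a minimal parser for one variable section of the current format, emitting the equivalent JSON; the output field names are my assumption, not a settled schema:

```python
import json

# A minimal fragment in the current SAS format (one variable section):
# name, axiom layer, domain size, then one line per value.
sas_fragment = """\
begin_variable
var0
-1
2
Atom at(a)
Atom at(b)
end_variable
"""

def parse_variable(lines):
    """Translate one begin_variable...end_variable block to a dict."""
    assert lines[0] == "begin_variable" and lines[-1] == "end_variable"
    name = lines[1]
    axiom_layer = int(lines[2])
    domain_size = int(lines[3])
    values = lines[4:4 + domain_size]
    return {"name": name, "axiom_layer": axiom_layer, "values": values}

var = parse_variable(sas_fragment.strip().split("\n"))
print(json.dumps(var))
```

The remaining sections (operators, axioms, mutexes, goal) are similarly line-oriented, so the whole translation is a thin loop over such block parsers.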
|
| msg12051 (view) |
Author: malte |
Date: 2026-03-23.14:59:49 |
|
I like this idea. Do you have good recommendations for JSON libraries for C++?
|
|