We started doing some simple benchmarking of the planner on OS X and ran into a
few issues. The idea was to report on the website what performance to expect on
which platform, and to get a clearer idea of whether we should recommend some
platform over another. (In the past, we've occasionally warned people against
using OS X for benchmarking since the planner is essentially untested on that
platform.)
Here's what I wrote about this in an email:
===========================================================================
[I wrote] a script for some very informal and anecdotal
benchmarking. To run it, just get the most recent code, ideally using a
clean clone, go to new-scripts, and run
$ ./run-calibration-test
This will probably take an hour or so. It'd be good to keep the machine
quiet while you're running this.
Alternatively, or rather additionally, it'd also be interesting to see
what happens if you run multiple instances of this in parallel, but I
fear that currently each instance would need its own checkout since they
would clobber each other's temporary files. If you have k GB of RAM,
don't run more than k/2 instances in parallel.
The script will generate up to three log files (the third one only if
something goes wrong) named "calibration-test.*". I'd need those, as
well as the output of
$ hg identify
and
$ cat /proc/cpuinfo
(or if that doesn't exist on a Mac, whatever other info identifies your
CPU).
Once we're happy that everything went fine, it'd be good to put this
info up on the wiki. I'll contribute data for a bunch of Linux machines.
===========================================================================
Emil reported the following problems/bits of feedback (this is now a bit older,
so might no longer be current):
- The VAL Makefile didn't work out of the box because it linked statically.
This can be fixed the same way as with our own code.
- To get CPU info on the Mac, run /usr/sbin/system_profiler.
- "date" on OS X doesn't print nanoseconds, so the script's
'date +%s.%N' doesn't work. 'date +%s' does, but only has second resolution.
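One portable workaround, assuming Python is available (which the planner
requires anyway), is to take the timestamp via Python's time module rather
than GNU date; this is only a sketch, not what the script actually does:

```python
import time

# time.time() returns a float with sub-second resolution on both
# Linux and OS X, so it can stand in for 'date +%s.%N', which only
# works with GNU date.
print("%.9f" % time.time())
```

From a shell script, this could be invoked as
`python -c 'import time; print("%.9f" % time.time())'`.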
It further transpired that there were additional problems:
=============================================================================
some of the output looks weird, e.g. this:
> Testing lazy search with h^cea...
> Solving satellite/p30-HC-pfile10.pddl...
> Translator: [145.500s CPU, 90.416s wall-clock]
> Plan cost: 280
> Evaluated 4048 state(s).
> Search: 16.22s
> Peak memory: 778332 KB
> Elapsed wall-clock time: 125.000 seconds
> Plan valid
The line relating to the translator is strange because CPU time should
not exceed wall-clock time. (What that line suggests is that the
translator ran for a total of 90.416 seconds real-time and that during
that time it used the CPU for 145.5 seconds.)
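The invariant that is violated here can be checked directly. The following
sanity check is hypothetical (it is not part of the calibration script) and
assumes a single-threaded workload, for which CPU time can never legitimately
exceed wall-clock time:

```python
import os
import time

start_wall = time.time()
start_cpu = sum(os.times()[:2])  # user + system CPU time of this process

# Burn some CPU so the measurement is non-trivial.
x = 0
for i in range(10 ** 6):
    x += i

cpu = sum(os.times()[:2]) - start_cpu
wall = time.time() - start_wall

# For a single-threaded process, CPU time should not exceed wall-clock
# time (modulo timer granularity). The broken os.times() multiplier on
# OS X violated exactly this invariant.
print("CPU: %.3fs, wall-clock: %.3fs" % (cpu, wall))
assert cpu <= wall + 0.1, "CPU time exceeds wall-clock time"
```

On an affected Python build, the assertion would fail because os.times()
reports inflated CPU times, matching the 145.5s-CPU/90.4s-wall output above.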
=============================================================================
[...]
=============================================================================
There was a bug in Python a while ago where on some OSes os.times() was
off due to using a wrong multiplier internally. I remember that one well
because I was the one who reported it. Let me see if I can dig out
something relevant...
Here is the issue: http://bugs.python.org/issue1040026
=============================================================================
The Python bug was apparently present in Python 2.6.1 but fixed in 2.6.2 and
2.7. So requiring a sufficiently recent Python version should be enough.
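A sketch of such a version guard (hypothetical, not part of the actual
scripts; the version numbers come from the discussion above):

```python
import sys

# The os.times() multiplier bug (http://bugs.python.org/issue1040026)
# was reportedly fixed in Python 2.6.2 and 2.7, so refuse to run on
# anything older.
if sys.version_info < (2, 6, 2):
    sys.exit("Error: Python >= 2.6.2 required; os.times() is unreliable "
             "in older versions on some platforms.")
print("Python version OK: %s" % sys.version.split()[0])
```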