Issue349

Title: get performance testing on OS X to work and report results on website
Priority: wish    Status: resolved
Superseder:       Nosy List: malte, rpgoldman
Assigned To: malte    Keywords:
Optional summary:

Created on 2012-09-11.19:52:03 by malte, last changed by malte.

Messages
msg4354 (view) Author: malte Date: 2015-07-12.18:45:00
No one spoke up, so I've removed the script.
msg4307 (view) Author: malte Date: 2015-07-02.03:15:53
I think we have given up on the original idea here, and I think
run-calibration-test is not maintained any more. It does not work with the
current planner code, and I would like to remove it altogether, as we now have
better ways of running such experiments.

Do you have any objections to removing the script and marking this issue as
resolved?
msg3629 (view) Author: malte Date: 2014-10-04.18:34:27
OK, I had a bit more time to look at the current script now, and I think it
would be good to defer working on this until issue414 is merged (which I hope
will be soon). Some of the functionality that is currently in the performance
testing script might be useful to integrate into the new driver script directly,
and also the new script will influence the way we call the planner.

In other words, I'm deferring this one for a bit, but hopefully not for very long.
msg3535 (view) Author: rpgoldman Date: 2014-09-24.18:25:13
OK. I am also busy right now, but might be able to run some tests for you later on Mac 
OS X.
msg3534 (view) Author: malte Date: 2014-09-24.18:17:22
Thanks, Robert! We've had many incompatibilities due to Unix utilities being
named differently or behaving differently on Linux vs. Mac OS, and I'm coming
round to the idea that avoiding bash + unix tools and instead implementing
things like this script in Python makes it easier to maintain cross-platform
compatibility.

I'm a bit strapped for time right now, but I'll try to get back to this issue in
early October with your comments and the Python option in mind.
msg3529 (view) Author: rpgoldman Date: 2014-09-24.15:13:54
With respect to date on Mac OS X, the easiest thing may be to check for the presence of 
gnu date which, if installed from MacPorts, will show up as "gdate".  Some people may 
also choose to use the gnu versions in place of the Mac/BSD ones, in which case the 
"date" in their path WILL work.

One could check that "date --version" works (it doesn't on Apple's date) and
that its output contains the substring "GNU". Failing that, check for "gdate";
failing that, fall back to using date without nanoseconds.
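
For illustration, the fallback order described above could be sketched in Python roughly as follows. This is only a sketch: the function names and the injectable "probe" parameter are hypothetical and are not part of run-calibration-test.

```python
import shutil
import subprocess


def is_gnu_date(command):
    """Return True if `command --version` runs and its output mentions GNU."""
    path = shutil.which(command)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GNU" in result.stdout


def choose_date_command(candidates=("date", "gdate"), probe=is_gnu_date):
    """Pick a date invocation: GNU date with nanoseconds if available,
    otherwise plain `date +%s` with one-second resolution."""
    for command in candidates:
        if probe(command):
            return [command, "+%s.%N"]
    return ["date", "+%s"]
```

The probe is a parameter only so that the selection logic can be exercised without depending on which date binaries happen to be installed.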
msg3457 (view) Author: malte Date: 2014-09-19.15:41:24
Emil replied that he is too busy with his other duties, so we'll need another OS
X user to help us out with this one.
msg3456 (view) Author: malte Date: 2014-09-19.15:30:24
Hi Emil, I'd like to close some old issues. Do you still have time and interest
for this sort of thing, and if yes, can you let me know if the things in this
issue work correctly on OS X these days? If not, no worries, I'll find another
Mac owner.
msg2332 (view) Author: malte Date: 2012-09-11.19:52:03
We started doing some simple benchmarking of the planner on OS X and ran into a
few issues. The idea was to report on the website which performance to expect on
which platform, and to get a clearer idea if we should recommend some platform
over another. (In the past, we've occasionally warned people against using OS X
for benchmarking since the planner is essentially untested on that platform.)

Here's what I wrote about this in an email:

===========================================================================
[I wrote] a script for some very informal and anecdotal
benchmarking. To run it, just get the most recent code, ideally using a
clean clone, go to new-scripts, and run

$ ./run-calibration-test

This will probably take an hour or so. It'd be good to keep the machine
quiet while you're running this.

Alternatively, or rather additionally, it'd also be interesting to see
what happens if you run multiple instances of this in parallel, but I
fear currently they'd all need their own checkout since they will
clobber each other's temporary files. If you have k GB of RAM, don't run
more than k/2 instances in parallel.

The script will generate up to three log files (the third one only if
something goes wrong) named "calibration-test.*". I'd need those, as
well as the output of

$ hg identify

and

$ cat /proc/cpuinfo

(or if that doesn't exist on a Mac, whatever other info identifies your
CPU).

Once we're happy that everything went fine, it'd be good to put this
info up on the wiki. I'll contribute data for a bunch of Linux machines.
===========================================================================

Emil reported the following problems/bits of feedback (this is now a bit older,
so might no longer be current):

- The VAL Makefile didn't work out of the box because it linked statically.
  This can be fixed the same way as with our own code.
- To get CPU info on the Mac, run /usr/sbin/system_profiler.
- "date" on OS X doesn't print nanoseconds, so the script's
  'date +%s.%N' doesn't work. 'date +%s' does, but only has second resolution.
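
A possible way to pick the CPU info command per platform, sketched in Python. The function name is hypothetical, and the "SPHardwareDataType" argument is my assumption for limiting system_profiler to the hardware overview; the report above only says to run /usr/sbin/system_profiler.

```python
import os
import platform


def cpu_info_command(system=None, have_proc=None):
    """Return an argv list that prints CPU details, or None if unknown.

    Both arguments default to probing the current machine; they are
    parameters only so the choice can be checked on any platform.
    """
    if system is None:
        system = platform.system()
    if have_proc is None:
        have_proc = os.path.exists("/proc/cpuinfo")
    if have_proc:
        # Linux and similar: same information as `cat /proc/cpuinfo`.
        return ["cat", "/proc/cpuinfo"]
    if system == "Darwin":
        # Mac OS X: assumption, restrict output to the hardware overview.
        return ["/usr/sbin/system_profiler", "SPHardwareDataType"]
    return None
```

The returned list could then be handed to subprocess to capture the output next to the calibration-test.* logs.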

It further transpired that there were additional problems:

=============================================================================
some of the output looks weird, e.g. this:

> Testing lazy search with h^cea...
> Solving satellite/p30-HC-pfile10.pddl...
> Translator: [145.500s CPU, 90.416s wall-clock]
> Plan cost: 280
> Evaluated 4048 state(s).
> Search: 16.22s
> Peak memory: 778332 KB
> Elapsed wall-clock time: 125.000 seconds
> Plan valid

The line relating to the translator is strange because CPU time should
not exceed wall-clock time. (What that line suggests is that the
translator ran for a total of 90.416 seconds real-time and that during
that time it used the CPU for 145.5 seconds.)
=============================================================================
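
A small sanity check for this kind of log line could look like the sketch below. It assumes the "[...s CPU, ...s wall-clock]" format shown in the quoted output and a single-threaded process; the function name and the tolerance value are made up for illustration.

```python
import re

# Matches timing reports such as "[145.500s CPU, 90.416s wall-clock]".
TIMING_RE = re.compile(
    r"\[(?P<cpu>\d+(?:\.\d+)?)s CPU, (?P<wall>\d+(?:\.\d+)?)s wall-clock\]"
)


def cpu_exceeds_wall_clock(line, tolerance=0.05):
    """Return True if the line reports noticeably more CPU than wall-clock time.

    Lines without a timing report are never flagged. The tolerance allows
    for small measurement noise between the two clocks.
    """
    match = TIMING_RE.search(line)
    if match is None:
        return False
    cpu = float(match.group("cpu"))
    wall = float(match.group("wall"))
    return cpu > wall * (1 + tolerance)
```

Applied to the translator line quoted above, the check flags it, since 145.5 seconds of CPU time cannot fit into 90.416 seconds of wall-clock time on one thread.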
[...]
=============================================================================
There was a bug in Python a while ago where on some OSes os.times() was
off due to using a wrong multiplier internally. I remember that one well
because I was the one who reported it.  Let me see if I can dig out
something relevant...

Here is the issue: http://bugs.python.org/issue1040026
=============================================================================

The Python bug was apparently present in Python 2.6.1, but fixed in 2.6.2 and
2.7. So requiring a sufficiently recent Python version should be enough.
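
Such a requirement could be checked at script start-up along these lines. This is a sketch only: the function name is hypothetical, and the 2.6.2 threshold is taken from the observation above rather than verified independently.

```python
import sys


def has_fixed_os_times(version_info=None):
    """Return True if the interpreter is at least Python 2.6.2, the first
    release that (per the notes above) reports os.times() correctly."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element-wise.
    return tuple(version_info[:3]) >= (2, 6, 2)


if not has_fixed_os_times():
    sys.exit("Python 2.6.2 or newer is needed for reliable os.times() values.")
```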

History
Date                 User       Action  Args
2015-07-12 18:45:00  malte      set     status: chatting -> resolved; messages: + msg4354
2015-07-02 03:15:53  malte      set     messages: + msg4307
2014-10-04 18:34:27  malte      set     messages: + msg3629
2014-09-24 18:25:13  rpgoldman  set     messages: + msg3535
2014-09-24 18:17:22  malte      set     messages: + msg3534
2014-09-24 15:13:54  rpgoldman  set     nosy: + rpgoldman; messages: + msg3529
2014-09-19 15:41:24  malte      set     nosy: - emilkeyder; messages: + msg3457
2014-09-19 15:30:25  malte      set     messages: + msg3456
2012-09-11 19:52:03  malte      create