Issue 404: iPDB random walk evaluates every state

Title	iPDB random walk evaluates every state
Priority	wish	Status	resolved
Superseder		Nosy List	florian, jendrik, malte, silvan
Assigned To	silvan	Keywords
Optional summary

Created on 2013-12-17.11:06:58 by florian, last changed by silvan.

Messages
msg2879 (view)	Author: silvan	Date: 2014-01-05.15:47:18
Marking this one as resolved.
msg2878 (view)	Author: silvan	Date: 2014-01-05.15:47:02
I merged and pushed. Thank you for your feedback.
msg2876 (view)	Author: malte	Date: 2014-01-05.15:41:35
Looks good. If you've done a spot-check on 1-2 instances that the behaviour is as before, I don't think we need another experiment.
msg2874 (view)	Author: silvan	Date: 2014-01-05.15:39:05
I integrated your suggested changes. Do you also want another experiment here?
msg2868 (view)	Author: silvan	Date: 2013-12-30.19:51:54
I created an rietveld issue; for the moment, only Malte has access. https://codereview.appspot.com/41330044/ I don't think the code got much complicated, but on the other side, there is no real visible improvement. I think we could merge it in, though, but I am fine with both choices.
msg2865 (view)	Author: malte	Date: 2013-12-30.19:35:50
Thanks! I'd like to have a brief look at the patch. I assume it complicates the code, so we should probably only merge it if we think the tradeoff between code effort and time saved is worth it. To see if there is a statistically significant difference, I looked at the per-domain total time scores, comparing the domains where they differ and checking if the difference can be interpreted as randomness. Assuming I counted correctly, the improvement seems to be highly statistically significant, with a p level of less than 0.000013. However, even if the effect is significant, it doesn't seem to be strong. (In other words, it often leads to an improvement, but not by much.) What do the others think?
msg2862 (view)	Author: silvan	Date: 2013-12-30.18:21:14
Here are the results: http://ai.cs.unibas.ch/_tmp_files/sieverss/2013-12-29-issue404-abs-d.html http://ai.cs.unibas.ch/_tmp_files/sieverss/2013-12-29-issue404-abs-p.html There is no visible difference. If anyone wants to see the patch, let me know, otherwise I'll merge and push soon.
msg2858 (view)	Author: silvan	Date: 2013-12-29.14:52:16
Okay. I will run some experiments.
msg2857 (view)	Author: malte	Date: 2013-12-29.11:22:16
Let's say we have N PDBs and M maximal additive subsets, which are of size K on average. To detect a dead end, we need to check all PDBs once: O(N). To compute the heuristic value, we need to look up the heuristic values and then maximize over all maximal additive subsets: O(N + MK). If we only care about dead ends and not heuristic values, it would be nicer to only pay the O(N) cost rather that O(N + MK). How much difference does it make? Florian has looked at the relationship between N and M more thoroughly than me, but if I recall correctly, there are many cases, where M is much larger than N. In the scatter plots on http://ai.cs.unibas.ch/papers/pommerening-et-al-ijcai2013-poster.pdf (where \|MAS(C)\| denotes M and \|C\| denotes N), there is a significant number of cases where M is 100x or 1000x larger than N. (Although this is with the Sys^2 PDB generation method and not iPDB; I would expect iPDB to have lower values of both M and N, but still the number of maximal additive subsets is low.)
msg2856 (view)	Author: silvan	Date: 2013-12-29.11:06:52
Okay. But then again, all we can do is have a method in the canonical heuristic function that iteratively evaluates all of its PDBs and stops as soon as one of them is a dead end for the given state. There is no way to avoid retrieving the heuristic value from the PDB, right? And this is cheap anyway. Or do I still miss something? If this is correct, everything we can save is just the summation over all PDB heuristic values when computing the heuristic value of the canonical heuristic function.
msg2855 (view)	Author: malte	Date: 2013-12-28.22:40:17
The problem (or rather, wasted effort) occurs when the sampled state is not a dead-end.Then compute_heuristic evaluates all current maximal cliques to compute the heuristic value, which can be quite expensive. But the sampling process doesn't care about the heuristic value, only whether or not it's a dead end. So it would be useful to have a way to do the dead-end test without computing the exact heuristic value in case of a non-dead-end.
msg2852 (view)	Author: silvan	Date: 2013-12-28.18:15:25
I am not sure if I miss something, but I do not see where there is any overhead in the way it is currently implemented: random_walk indeed calls evaluate(), but having a look at evaluate(), the first call there is compute_heuristic(), which in turn does the following: for (size_t i = 0; i < pattern_databases.size(); ++i) { pattern_databases[i]->evaluate(state); if (pattern_databases[i]->is_dead_end()) return -1; } (the last return should be return DEAD_END; I just fixed this) For me, this looks exactly like the way you proposed it should be implemented. The computation of the maximal additive subsets is only triggered when a new pattern is added to the canonical heuristic function, something which does not happen in the process of sampling.
msg2809 (view)	Author: florian	Date: 2013-12-17.11:06:58
The iPDB random walk that is used to create sample states calls evaluate() on every intermediate state to determine if it is a recognized dead end. This can be checked cheaper by adding a specialized method to canonical heuristic that checks whether one of its component PDBs recognizes the state as a dead end. This would avoid the overhead of evaluating the maximal additive subsets for these states.

History
Date	User	Action	Args
2014-01-05 15:47:18	silvan	set	status: testing -> resolved messages: + msg2879
2014-01-05 15:47:02	silvan	set	messages: + msg2878
2014-01-05 15:41:35	malte	set	messages: + msg2876
2014-01-05 15:39:05	silvan	set	messages: + msg2874
2013-12-30 19:51:54	silvan	set	messages: + msg2868
2013-12-30 19:35:50	malte	set	messages: + msg2865
2013-12-30 18:21:14	silvan	set	messages: + msg2862
2013-12-29 14:52:16	silvan	set	status: chatting -> testing assignedto: silvan messages: + msg2858
2013-12-29 11:22:16	malte	set	messages: + msg2857
2013-12-29 11:06:52	silvan	set	messages: + msg2856
2013-12-28 22:40:17	malte	set	messages: + msg2855
2013-12-28 18:15:26	silvan	set	messages: + msg2852
2013-12-17 12:21:29	jendrik	set	nosy: + jendrik
2013-12-17 11:06:58	florian	create

Issue404