Create a subclass of field_set without continuous representation as variable-length sequences of bits. #10
Hi Arley, ... and Nil, I think we need to fix contin_t for "ordinary" MOSES, not just particle-swarm MOSES. How about this idea: we store a full-length float or double in the deme, AS WELL AS the contin_t. The value of the double is used in the actual combo tree, whereas the contin_t is used only during knob-turning, to store the current variance. Thus, during knob-turning, the contin_t is used to generate a random double-precision number, with a PDF centered at zero and a variance equal to the contin_t value. This random offset is then added to the double-precision number in the deme. (The random offset is stored in the instance, or maybe the sum of the value and the offset is stored ...) Thus, knob-turning generates a random walk. (During knob-turning, the variance is also either made smaller, kept as-is, or increased.) The above is how we should do floats in "ordinary", non-particle-swarm MOSES. For particle-swarm MOSES, you may want to do knob-turning differently (I don't really understand how), but the above mechanism will at least allow you to store the full-length double-precision number in both the instance and in the deme.
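(A minimal sketch of the random-walk scheme described above, assuming a Gaussian PDF; the names `contin_knob` and `turn` are hypothetical, not the actual MOSES API.)

```cpp
#include <cmath>
#include <random>

// Hypothetical knob state: the full-precision value lives alongside a
// variance that plays the role the contin_t used to play.
struct contin_knob {
    double value;     // full-length double stored in the deme/instance
    double variance;  // used only during knob-turning
};

// One knob-turning step: add a zero-centered random offset whose spread
// is the current variance, then randomly shrink/keep/grow the variance,
// so repeated turns perform a random walk over the reals.
void turn(contin_knob& k, std::mt19937& rng) {
    std::normal_distribution<double> offset(0.0, std::sqrt(k.variance));
    k.value += offset(rng);
    switch (std::uniform_int_distribution<int>(0, 2)(rng)) {
        case 0: k.variance *= 0.5; break;  // shrink
        case 2: k.variance *= 2.0; break;  // grow
        default: break;                    // keep as-is
    }
}
```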
Yah -- this seems like a reasonable approach for now... any thoughts? -- Ben
@linas the current state of the code already nearly does that: a contin_spec contains a mean, and any knob-turning is centered around that mean. However, the variance is replaced by the expansion combined with the step, so if you tweak the contin trits (Left, Right, Stop) you get some distribution; let's call it a CTD (for Contin Trits Distribution). It's not a Gaussian, but maybe it approaches one. Anyway, I agree that the distribution should not be limited to a CTD. My advice would be to add or upgrade the neighborhood sampling methods to take a distribution as an argument, which could be a Gaussian or otherwise. I don't think you need to change anything else. And here's why: candidates in the deme are represented as instances (`typedef std::vector<packed_t> instance;`, see instance.h). If you were to represent the contin_spec directly as a float you'd still need to hack the code to en/decode a contin into an instance (see field_set::pack in field_set.h). It's not impossible, but it adds a coding burden to our student. And if you run out of resolution you can always increase the contin_spec depth, so why bother. So to sum up: yes, PS doesn't need this contin-trits representation, but MOSES does. Ultimately I don't think this en/decoding of candidates into instances is good. It has the advantage of saving a bit of RAM at the expense of making the code more rigid and complex. But getting rid of it is probably too much work for this project. I'll get back with more details about what should be hacked to enable neighborhood sampling with any distribution, not just CTDs.
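(To illustrate the suggestion above: a sketch of a neighborhood sampler that takes the distribution as a parameter. Names and signatures are illustrative, not the actual functions in the MOSES sources.)

```cpp
#include <random>
#include <vector>

// Sketch: perturb each contin value of a candidate by an offset drawn
// from a caller-supplied distribution (Gaussian, Cauchy, ...), instead
// of the hard-coded Left/Right/Stop "contin trits distribution" (CTD).
template <typename Distribution>
std::vector<double> sample_contin_neighbor(std::vector<double> center,
                                           Distribution& dist,
                                           std::mt19937& rng) {
    for (double& v : center)
        v += dist(rng);
    return center;
}

// Usage sketch:
//   std::normal_distribution<double> gauss(0.0, 0.1);
//   auto neighbor = sample_contin_neighbor(values, gauss, rng);
```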
Hmm, on the other hand, converting a float into bytes is trivial, and it would save some CPU time. Anyway, you'd still need to upgrade the neighborhood sampling to support any distribution; those two issues are really independent. So I suggest: ...
@ArleyRistar The code to hack to support 1. is definitely in ... — you may start to study it, till I come back with more information.
At that point I really need to understand exactly what you want to do, and how. There are several ways to go about that problem: the distribution could be passed to the sampling function (sample_from_neighborhood, then generate_contin_neighbor, or another new one), or it could live inside the contin_spec. Also, there is a problem I had not thought about: the mean in the contin_spec remains the same during the deme optimization. It cannot be changed, because otherwise the contin values corresponding to the instances will not be correctly reconstructed. So if you really want the mean to change during deme optimization, we need to address that.
OK, I presume the code so far is here: https://github.com/arleyristar/moses/commits/master. Please point me to any relevant docs as well.
Nil, I'm looking at the neighborhood sampling now; I'll try to solve this first. Your penultimate email — I'm assuming that it was for me. I'll see how ... Yes, the code for PS is in there.
From the things I thought, this seems an acceptable idea for 1; forgive me if ... Nil, we could maintain neighborhood sampling as it is now and change ... Let me explain in detail: my idea is to modify contin_stepper left() and right() to have a behavior ... Applying this contin_stepper to the actual LRS representation would give us ... I could write some distributions that would be chosen as a parameter ... Does that make sense, or should I think of another solution?
The contin_t, as currently designed, attempts to use the center-points of ... I am kind-of guessing that storing full 64-bit floats, and using a random ...
P.S. What you call a "contin trit" is called a "Cantor tree" in the general world: http://blogs.scientificamerican.com/critical-opalescence/files/2012/08/f5.jpg — the Cantor tree unfolds into the "modular group", and there is all sorts of neat stuff there.
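(For readers following along: a simplified model of the Left/Right/Stop walk being discussed. It ignores the expansion factor that the real contin_spec in field_set.h applies, so treat it as illustrative only.)

```cpp
#include <vector>

enum class trit { Left, Right, Stop };

// Simplified decoding of a contin trit string: walk down a Cantor tree
// starting at the mean, halving the step at each level. This is why a
// value like 1.333 needs many trits -- 1.0, 1.5, 1.25, 1.375, ... each
// trit only refines the previous approximation.
double decode_trits(const std::vector<trit>& trits, double mean, double step) {
    double value = mean;
    for (trit t : trits) {
        if (t == trit::Stop) break;
        value += (t == trit::Right ? step : -step);
        step /= 2.0;
    }
    return value;
}
```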
So, from Linas' idea and issue 2 from Nil: ... While this isn't decided, I'll be fixing other things in PS.
@linas, if you update the contin neighborhood sampling you don't have to iterate 1.0, 1.5, 1.25, 1.375, ... to get to 1.333. You just set the contin_spec with enough depth and you can set the value 1.333 right away. @ArleyRistar contin_spec is certainly useful, but could indeed be greatly simplified, both in code and performance, if the Cantor tree representation were replaced by a float representation. Although there is still one area where the Cantor tree representation wins: if the resolution of the contin values turns out to be low, it takes much less memory, a few bits instead of a systematic 32 bits. It's probably not worth it, but useful to keep in mind. And again, you don't need to change the contin_spec code to allow sampling with other distributions (unless you put the distribution in the contin_spec, of course), as I explained above.
@ArleyRistar you cannot change the mean, the expansion, or the step of a contin_spec once constructed, or at least it wouldn't make much sense, because you need those in order to reconstruct the contin value from the packed bit string. If they change while optimizing the deme, the contin values on the candidates will also change, so the older candidates won't have their contin values correctly reconstituted. That's why I want to understand exactly how your algorithm is going to work. I'm currently reading https://en.wikipedia.org/wiki/Particle_swarm_optimization — let me know if you follow that, or some alteration of it (this will help me understand your code when I start reviewing it).
@ArleyRistar I'm only a third of the way through reviewing your code, but I think you don't need to change the neighborhood sampling functions: all sampling code takes place in the particle_swarm struct, and that makes sense given its peculiarities (you don't sample the neighborhood directly, you sample the coefficients to construct the velocities). So I don't think you need to do 1. or 2. All you need is to increase the depth of the contin_spec. If that becomes a performance bottleneck, then you can work toward improving contin_spec, etc. But if the performance overhead is negligible, you don't need to bother (unless you really want to). Anyway, I'll keep reviewing; I may change my mind once I see the whole thing.
@ArleyRistar I'm re-reading your first comment, from when you created this issue. You got it all. Forget my comment about modifying the neighborhood distribution; I got there because of Linas' initial comment and my lack of understanding of PS. So I stand by my comment right above, which can be synthesized into these two questions: ...
If the answers are yes, then I think what you suggest is the correct hack: replace the Cantor tree by a float binary rep. Of course, before we go there, we want to do things well: record the performance of a bunch of runs beforehand, to compare with after the code optimization, etc. And I'll probably want to review the contin_spec once more, just in case I see something simple and bright to do. But first things first: answer those two questions.
For instance, if you could plot ... w.r.t. depth (varying from, say, 5 to 32), for one or more optimization problems that you know PS is supposed to perform OK on, that would definitely help a lot to assess what should be done next. Ideally you would even repeat that over several random seeds, say 10, to cancel out noise and get smoother charts. Once you know the kind of depth you need, you may run that under a profiler like Valgrind, to get an idea of how much performance is wasted by the Cantor tree representation.
I'll change the code to increase the depth and test with it, as Nil said.
It may take you a few days even, but it's worth it, not just for PS. Thanks!
Hi Nil, Arley, I have to retract/modify my earlier comments as well; they were based on a misunderstanding of sorts. So, what I meant to say was this:

- I am very excited by the PSO work; I think it will be an excellent way of handling floating-point numbers in MOSES. My grumbling was less about the contin representation itself than about how the hill-climbing algo treated it: hill-climbing by twiddling one binary bit of a contin at a time was just terribly slow, inefficient, awful.
- Yes, contins offer a small, compact way to store floats or doubles. I guess it's harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert a double to the contin rep, and back. As long as you save enough digits, it should not be a problem.
- If contins are replaced in the combo trees, then I strongly recommend using 64-bit floats (i.e. doubles); do NOT mess with 32-bit floats. Messing about with 32-bit floats is not worth the effort. It's a bad idea.
- Nil, the way I currently understand it is as follows: every combo tree would have one to hundreds of doubles in it. If the metapop has 10K trees in it, then there would be 10K trees/metapop * 200 doubles/tree * 8 bytes/double = 16 MBytes of doubles in the metapop: hardly a big amount.
- During a single PSO step, we pick just one tree, and so PSO only needs to optimize 1 to a few hundred doubles. If you have 1K particles per swarm, that is about 200K doubles per PSO step, which again is not very big, and might even fit in modern-day CPU caches.

So I see no problem at all with storing doubles everywhere. Done right, PSO will be awesome for MOSES. I'm actually excited, now that I think I understand it.
Linas, you wrote: "Yes, contins offer a small, compact way to store floats or doubles. I guess it's harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert a double to the contin rep, and back. As long as you save enough digits, it should not be a problem." Yes, they exist: field_set::get_contin and field_set::set_contin, but in fact you can directly en/decode a contin by dereferencing the contin_iterator. Yeah, you're right, the extra RAM will probably be negligible. I'd still be interested in seeing how increasing the depth would help the quality of the results, and whether this en/decoding is really hurting run-time performance, though. I mean, it's nice to know the actual gain of changing a piece of code (although the code simplification is a gain by itself).
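(A sketch of that round trip; the contin_spec constructor arguments shown are an assumption from memory, so check field_set.h for the real signatures.)

```cpp
// Illustrative round trip through the accessors mentioned above
// (see field_set.h for the real signatures; the argument order of the
// contin_spec constructor here is an assumption):
//
//   field_set fs(field_set::contin_spec(0.0,   // mean
//                                       0.5,   // step size
//                                       2.0,   // expansion
//                                       16),   // depth
//                1);
//   instance inst(fs.packed_width());
//   fs.set_contin(inst, 0, 1.333);        // encode: double -> trits
//   contin_t v = fs.get_contin(inst, 0);  // decode: trits -> double
//
// v only approximates 1.333 as closely as the depth allows; this lossy
// round trip is exactly what the issue is about.
```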
OK, sure. My knee-jerk reaction is "simplify the code" and remove contin entirely, but I'm tired right now and don't really care. :-) Do the right thing.
TL;DR update: ... Detailed update: The implementation in the last commit has a "mistake"; it almost always ... It's clearly wrong, but this way it runs very quickly, like 1000 evals in 3 ... The thing is that it's taking a much longer time now (45s), and when I apply ... The worst thing? It doesn't happen because of the particle-swarm update of ... Changing the problem didn't help either (iris m2000 has very little contin ...). Note: when I change set_contin to do nothing, it gets 3s in predicates ... Nil, don't expect very good results from what you asked. This weekend I'll ...
What is the test problem that you are running? --linas
@ArleyRistar I'm wondering if the super fast run-times aren't due to the cache. If the candidates are idiotically identical that could turn most evaluations into cache hits. Can you run your experiment with the debug logging level and send it to us? |
I was running with `predicates.csv -W1 -u pred -m1000`.
I forgot to post here too. For the summary version, jump to 3.4 (General).
First, the experiments were divided into two groups: without reduction (-B 0 ...
2.2. Processing:
2.3. Time:
2.4. Others:
I don't think that memory is a problem; even very complex problems with ... With the variable-length representation, processing is just a problem with PS ... I don't know how the cache of combo tree reduction works, but it seems to ... The variable-length representation doesn't have that much impact on memory or ... and we'd have less complex code at the cost of a little more RAM and my work. Detailed results data:
1.2. Callgrind:
1.3. Time (s):
2.2. Callgrind:
2.3. Time (s):
3.2. Callgrind:
3.3. Time (s):
4.2. Callgrind:
4.3. Time (s):
For even more detailed results (the Callgrind and Massif outputs, or time ...) [1] See the Valgrind site: http://valgrind.org/info/tools.html
Update from what we talked about in the Moses channel.
From now on I'll make my updates here to avoid any confusion.
About std::bitset: it isn't something for PS alone, and it isn't something ... Basically you have to change too many things to gain almost nothing, so ...
Hi Arley, could you cut-n-paste this into the directory ...
What do you mean, "affects too much"? Do you mean "uses too much CPU ..."? You may want to call (maybe) reduct once, before the PSO step; clearly, ...
predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to ... Something that had to learn e.g. x > 1.618 would be hard for the current ... Much better test cases would be any dataset that people typically use ... A demo of either a "linear programming" (linear optimization) or a "linear ...
I don't quite understand these results. MOSES does reduction in several ... Can you sketch the pseudocode for me?
> "Could you cut-n-paste this into the directory ..."

I'll make a diary from that post and commit it.

> "What do you mean, 'affects too much'? Do you mean 'uses too much CPU ...' You may want to call (maybe) reduct once, before the PSO step; clearly, ..."

...

> "predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to ... Much better test cases would be any dataset that people typically use ... A demo of either a 'linear programming' (linear optimization) or a 'linear ..."

Yes, predicates is the simplest one to test; I was aware of that.

> "I don't quite understand these results. MOSES does reduction in several ... Can you sketch the pseudocode for me?"

...
OK, so now I am completely confused. I was assuming the algo went like this: ...
In other words, the new PSO-enabled version is just like the old hill-climbing MOSES, and it even uses the same old hill-climbing just like before, with NO modifications AT ALL to hill-climbing... the only modification for PSO would be to replace all contins by particles, and to replace the computation of a single score by a PSO loop. That is, in the old code, scoring would be done exactly once for each knob setting; in the new code, this single score would be replaced by a PSO loop. But otherwise, everything else would be unchanged. I guess it's not being designed like this?
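(To make the "replace the single score by a PSO loop" idea concrete, a self-contained sketch — hypothetical names and standard PSO constants, not the actual MOSES code.)

```cpp
#include <random>
#include <vector>

using position_t = std::vector<double>;  // the contin fields of one instance

// Where the old code computed score(start) once, the PSO-enabled code
// would run a small swarm over the contin fields and report the best
// score found. Everything else (hill-climbing, knobs) stays unchanged.
double pso_score(const position_t& start, double (*score)(const position_t&),
                 int n_particles = 20, int n_iters = 50) {
    std::mt19937 rng(1);
    std::normal_distribution<double> jitter(0.0, 1.0);
    std::uniform_real_distribution<double> c(0.0, 1.0);

    struct particle { position_t x, v, best_x; double best_s; };
    std::vector<particle> swarm(n_particles);
    position_t gbest = start;
    double gbest_s = score(start);

    for (auto& p : swarm) {               // scatter particles around start
        p.x = start;
        for (double& xi : p.x) xi += jitter(rng);
        p.v.assign(start.size(), 0.0);
        p.best_x = p.x;
        p.best_s = score(p.x);
        if (p.best_s > gbest_s) { gbest_s = p.best_s; gbest = p.x; }
    }
    for (int it = 0; it < n_iters; ++it)
        for (auto& p : swarm) {
            for (std::size_t i = 0; i < p.x.size(); ++i) {
                p.v[i] = 0.7 * p.v[i]                          // inertia
                       + 1.5 * c(rng) * (p.best_x[i] - p.x[i]) // cognitive
                       + 1.5 * c(rng) * (gbest[i] - p.x[i]);   // social
                p.x[i] += p.v[i];
            }
            double s = score(p.x);
            if (s > p.best_s) { p.best_s = s; p.best_x = p.x; }
            if (s > gbest_s)  { gbest_s = s;  gbest = p.x; }
        }
    return gbest_s;  // this one number replaces the old single score
}
```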
Agreed about std::bitset.
> That is, in the old code, scoring would be done exactly once for each knob ...

Some comments: outer loop: ...
OK, well, I don't understand what "pure PSO" is. How can PSO possibly ...
Again, I don't understand what a "discrete algo for PS" could be, or ... Look, there is nothing "wrong" with hill-climbing; it works well for ...
Yes, and that is what I find very exciting.
OK. I'd like some insight into what this might be like.
Yeah, this would be great!
The formatting is mangled...
The outer loop is independent of HC. There is a population of trees; ...
OK. As far as I am concerned, contins could be completely discarded ...
yes.
Well, in this "hybrid" approach, this would be a step you'd have to ...
Yeah, I guess so ... or maybe you would want to do it for the best 5 or 10 ...
Yes; each different tree would have the contin variables in different ...
Well, normally, a single instance is converted to a tree, then the ... There seem to be two different ways of doing hybrid PSO. One is "per-instance PSO": ... What is the "some set of instances"? Well, it could be a set of one: ...
I suppose that is another way of doing it. In this case, you let HC ... Clearly, per-hill PSO will use a LOT less CPU than per-instance PSO. --linas
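(The structural difference between the two hybrids, as illustrative pseudocode in comments — hypothetical helper names, not MOSES functions.)

```cpp
// Per-instance PSO: every candidate that hill-climbing scores gets its
// own PSO refinement of its contin fields -- expensive but thorough.
//
//   for (instance& inst : neighborhood(center))
//       inst.score = pso_score(contins_of(inst), score_fn);
//
// Per-hill PSO: run hill-climbing to completion on the discrete knobs
// first, then polish the contins of the winner (or the best 5-10) once.
//
//   instance best = hill_climb(center);
//   best.score = pso_score(contins_of(best), score_fn);
```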
What if PSO is moved into the contin neighborhood sampling? So there is no PSO per se; it's still hill-climbing optimization, but the neighborhood sampling for contin (now double) values relies on PS. The code could be done so that PSO-based sampling is just one possible sampling amongst others (just that one keeps track of positions and velocities to determine the sampling distribution, as opposed to, say, a Gaussian). If there are only contin knobs, I guess it might look and behave just like the current PSO code, but if there are discrete values, some discrete knob-turning will be mashed up with contin knobs turned according to PSO. I don't have a good intuition as to whether it's a good idea or not to mash up discrete HC and contin PS; ultimately it depends on how dependent the discrete knobs are on the contin knobs. Assuming they are independent, then it should work great, faster than doing HC then PS, because we'll be able to optimize both discrete and contin knobs at the same time. If they are dependent (they surely are to some degree), then I don't know.
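(A sketch of what "sampling as a pluggable policy" might look like: one stateless Gaussian policy, one stateful PSO-like policy. Names are hypothetical, and for brevity the PSO variant omits the swarm's global-best term.)

```cpp
#include <random>

// Plain Gaussian sampling: no state, just an offset around the current value.
struct gaussian_sampler {
    double sigma;
    double operator()(double current, std::mt19937& rng) {
        return current + std::normal_distribution<double>(0.0, sigma)(rng);
    }
};

// PSO-flavored sampling: the "distribution" is implied by a velocity that
// is pulled toward the best value seen so far for this knob.
struct pso_sampler {
    double velocity = 0.0;
    double best = 0.0;  // best-scoring value seen for this knob
    double operator()(double current, std::mt19937& rng) {
        std::uniform_real_distribution<double> c(0.0, 1.0);
        velocity = 0.7 * velocity + 1.5 * c(rng) * (best - current);
        return current + velocity;  // position update doubles as the sample
    }
};
```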
That seems a good, simple kind of hybrid to explore... -- Ben
Well, during hill-climbing, hundreds or thousands of different trees are explored. Depending on the knob setting, any given tree may or may not contain some contin variables -- for example:
... so if the boolean knob is set to True, then the tree is ...
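(The example trees were lost from the thread; here is a hypothetical pair in combo syntax that fits the point being made.)

```
; hypothetical illustration: a knob-decorated tree where contins live
; only under one boolean knob
and($1 or($2 0<(+(*(0.37 $3) -1.2))))
; with that boolean knob turned off, the candidate reduces to
and($1 $2)
; ...and the contins 0.37 and -1.2 vanish from the tree entirely
```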
Nil, I don't know if leaving PSO inside the neighborhood sampling is a good ... Anyway, pick one of these two (separate, or inside neighborhood sampling) to do ...
FYI, I don't think I have any objections to the variable-length encoding that contin uses. I guess it's OK for a suitable form of ... OK. What I did object to was treating contins as just another knob, of which a single bit could be twiddled at a time. Experience showed that this was an ineffective way of searching the space of possible floating-point values.
I have thought about this a little more and poked around on the Internet a ... There is something called PSO-LS, given in pseudocode in this paper, which basically does PSO, but does hill-climbing on each particle to ... From what I understand, this would seem a natural extension of the current ... In this approach, MOSES search would break down into three levels:

- Highest level, which is the deme level: choosing which exemplars to put in ...
- Mid-level, which is choosing the starting points for hill-climbing. In a ...
- Low-level... this would use the current hill-climbing, which is ...

In this case, the motivation for incorporating LS (hill-climbing) into PSO ... Still, this approach would be expected to add value only in cases where ... On reflection, I find this more promising than doing PSO within the local ... With this approach, I think we could get away with demes containing ... -- Ben
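(A self-contained sketch of the PSO-LS idea referenced above — a short greedy local search applied per particle. The constants and the `score` signature are assumptions; the paper's pseudocode is authoritative.)

```cpp
#include <random>
#include <vector>

using pos_t = std::vector<double>;

// Short greedy local search: the "LS" in PSO-LS.
pos_t hill_climb(pos_t x, double (*score)(const pos_t&), std::mt19937& rng,
                 int n_steps = 20) {
    std::normal_distribution<double> step(0.0, 0.1);
    for (int i = 0; i < n_steps; ++i) {
        pos_t y = x;
        for (double& yi : y) yi += step(rng);
        if (score(y) > score(x)) x = y;  // keep the move only if it helps
    }
    return x;
}

// In PSO-LS, the swarm loop applies hill_climb() to each particle's
// position before (or after) the usual velocity/position update, so the
// swarm explores globally while each particle exploits locally.
```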
Hmm, what about keeping PSO as the neighborhood sampling technique, but just limiting the number of contin/float knobs to turn at once? That way there would be more ones or zeros. Or would what you suggest bring a greater benefit? (I may not have understood very well, honestly.)
I'm finally finishing the variable-length change; it gave me more work than ...
Why it'll work (probably): ... *Nil, you can do that, but you lose the PS purpose of adjusting lots of ...
Yes, I like your "new idea". It makes sense. Maybe there could be a flag or ... It's very hard to intuit what will actually happen, since typically, there ... --linas
I'm back. The variable-length change changed too many things outside the ... About the hybrid: it depends on the variable-length work, and it's not giving ...
Thanks for your comeback, Arley. I expected the difficulty of your last task to be too high for the GSoC timing. See what you can do. We've learned things; that already matters.
There are no open pull requests ... How hard would it be to split up the code into different mergeable pull requests?
There's no conflict; it's able to merge.
Oops, I did something wrong: ...
You can do multiple pull requests; that might make things easier to review and to merge. Ignore CircleCI for now; it's temperamental.
How often do things like this happen: ... But: ... Seriously, that a 0 score is inside a range that is less than 0.018 (and can ...) And in: ...
Are these rhetorical questions, or do you want someone to answer them?
Yes, these are rhetorical questions, because I had nothing better to do in ... "How could the variable-length representation have found these values?" "Is there a seed value ...?" I want to know, to compare with what I found. And the last one: I do not exactly know how the reduction works (yet). So yes, you can consider my obvious questions as real ones.
Hi @ArleyRistar, I've only experimented with learning contin-boolean expressions a few times, long ago. It obviously isn't extremely mature. Regarding ... There is no ...
Oh, note also: there is no rule to reduce a contin expression by dividing out ... If we switch to float-pt reps, then having a reduction rule that divides ... --linas
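(A hypothetical example of the kind of reduction rule being pointed at — dividing a 0< predicate through by a positive constant; not currently in the reduct library, as far as this thread says.)

```
; hypothetical reduction: divide a 0< predicate through by a positive
; constant factor, shrinking the contin literals
0<(+(*(3 $1) 6))   ==>   0<(+($1 2))
```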
Sorry, but these days I couldn't do anything for the project; at night (for ...
On Nil's recommendation, I'm opening an issue that was initially discussed in the Slack moses channel.
Summary of the situation:
I'm adding particle swarm to the optimization part of MOSES, and representing continuous knobs as variable-length sequences of bits causes loss of information; PS doesn't need this discrete representation.
As discussed in the channel, I was thinking of changing this representation just inside particle swarm, but then there's the problem that returning the deme to the metapopulation would lose information in the same way.
So Ben said that this design "doesn't really need to persist".
I was thinking that I could "create a modified version of field_set overriding the contin parts to use 1 packed_t as float, and use this modified version only with particle swarm".
I still don't know how it can affect other parts like the metapopulation, or if it can be done easily.
Because of that, I'm opening this issue looking for opinions.
Keep in mind that it will use a little more RAM for each instance with contin parts, but with better precision and less processing.
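(For concreteness, a sketch of the packing the quote proposes; packed_t is assumed to be 32 bits wide here — check instance.h for the real typedef before relying on this.)

```cpp
#include <cstdint>
#include <cstring>

using packed_t = std::uint32_t;  // assumption: 32-bit slots, per the quote

// Store the float's bit pattern directly in one packed_t slot.
inline packed_t pack_float(float f) {
    packed_t p;
    std::memcpy(&p, &f, sizeof p);  // type-pun safely via memcpy
    return p;
}

inline float unpack_float(packed_t p) {
    float f;
    std::memcpy(&f, &p, sizeof f);
    return f;
}

// Unlike the trit encoding, this round trip is exact:
// unpack_float(pack_float(x)) == x for every float x (bit patterns kept).
```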