
Performing parameter learning #19
Open

rodrigoazs opened this issue Jun 8, 2018 · 9 comments

@rodrigoazs

rodrigoazs commented Jun 8, 2018

Hello,

I'd like to learn the regression values for a given tree. To do that, I'm trying to force the code to select the node I want as the best one for the split when it searches through candidates. However, I don't know how the candidate nodes passed as the List children parameter to the addChildrenToOpenList method (BestFirstSearch.java, line 25) are chosen. The nodes in the children list change when the code is run again. What is random in this process of selecting nodes to split the tree?

In addition, let's say I want the first split to be professor(B), student(A), publication(C,A). How can I create a SingleClauseNode object to represent it, and where am I supposed to define it as the bestNode?

The learned tree changes between runs.

UW-CSE dataset:
First run

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( taughtby(C, B), tempadvisedby(D, B), publication(E, D) )
%   | then if ( publication(F, A), publication(F, B) )
%   | | then return 0.8581489350995122;  // std dev = 0.000, 10.000 (wgt'ed) examples reached here.  /* #pos=10 */
%   | | else return 0.02481560176617886;  // std dev = 0.373, 12.000 (wgt'ed) examples reached here.  /* #neg=10 #pos=2 */
%   | else if ( publication(G, B), publication(G, A) )
%   | | then return 0.8268989350995116;  // std dev = 0.174, 32.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=31 */
%   | | else if ( publication(H, A), publication(I, B) )
%   | | | then if ( ta(J, A), publication(I, K), ta(L, K) )
%   | | | | then return 0.8581489350995122;  // std dev = 0.000, 5.000 (wgt'ed) examples reached here.  /* #pos=5 */
%   | | | | else if ( tempadvisedby(M, B), publication(H, N), professor(N) )
%   | | | | | then return -0.14185106490048777;  // std dev = 0.000, 3.000 (wgt'ed) examples reached here.  /* #neg=3 */
%   | | | | | else if ( tempadvisedby(P, B) )
%   | | | | | | then return 0.6081489350995122;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=3 */
%   | | | | | | else return 0.15814893509951225;  // std dev = 0.458, 10.000 (wgt'ed) examples reached here.  /* #neg=7 #pos=3 */
%   | | | else return 0.6663681131817044;  // std dev = 0.394, 73.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=59 */
%   else return -0.1418510649004883;  // std dev = 0.000, 220.000 (wgt'ed) examples reached here.  /* #neg=220 */

Second run

% FOR advisedby(A, B):
%   if ( hasposition(B, C), student(A) )
%   then if ( publication(D, A), publication(D, B) )
%   | then return 0.8116373071925351;  // std dev = 0.211, 43.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=41 */
%   | else if ( publication(E, A), publication(E, F), professor(F) )
%   | | then if ( publication(G, B), tempadvisedby(H, B) )
%   | | | then return -0.05851773156715445;  // std dev = 0.276, 12.000 (wgt'ed) examples reached here.  /* #neg=11 #pos=1 */
%   | | | else if ( taughtby(I, B), taughtby(I, F) )
%   | | | | then return 0.10814893509951219;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=1 */
%   | | | | else if ( ta(J, A) )
%   | | | | | then return 0.6581489350995122;  // std dev = 0.894, 5.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=4 */
%   | | | | | else if ( tempadvisedby(K, B), tempadvisedby(L, F), publication(M, L) )
%   | | | | | | then return 0.5248156017661788;  // std dev = 0.816, 3.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=2 */
%   | | | | | | else return 0.2581489350995122;  // std dev = 1.095, 5.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=2 */
%   | | else if ( taughtby(N, B), ta(N, A) )
%   | | | then return 0.8581489350995123;  // std dev = 0.000, 16.000 (wgt'ed) examples reached here.  /* #pos=16 */
%   | | | else return 0.6208607995062918;  // std dev = 0.425, 59.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=45 */
%   else return -0.13653191596431785;  // std dev = 0.073, 188.000 (wgt'ed) examples reached here.  /* #neg=187 #pos=1 */

Thank you,
Best regards.

@mayukhdas
Contributor

Hi Rodrigo,

Allow us some time to look into this. We will get back to you with an explanation as soon as possible.

Thanks.

@rodrigoazs
Author

Hello,

I have made some modifications to ILPouterLoop.java in order to force nodes and leaves to be created in specific places, reproducing the structure of a given tree. I did this by creating SingleClauseNodes and controlling when interior nodes and leaves are created.

It seems that the code is working; however, I am getting very different standard deviations in the WILL regression-tree file produced, even though the numbers of reached examples and the regression values are very similar.

Any idea what it could be?

Learning a single tree

%%%%%  WILL-Produced Tree #1 @ 16:26:35 7/9/18.  [Using 3,379,008 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8581489350995117;  // std dev = 1.79e-07, 29.000 (wgt'ed) examples reached here.  /* #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.8581489350995123;  // std dev = 0.000, 19.000 (wgt'ed) examples reached here.  /* #pos=19 */
%   | | else return 0.46000078695136487;  // std dev = 0.490, 108.000 (wgt'ed) examples reached here.  /* #neg=43 #pos=65 */
%   else return -0.14185106490048802;  // std dev = 0.000, 167.000 (wgt'ed) examples reached here.  /* #neg=167 */

Learning parameters for the previous tree:
First run

%%%%%  WILL-Produced Tree #1 @ 16:49:03 7/9/18.  [Using 3,310,648 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5214142412219619;  // std dev = 4.678, 98.000 (wgt'ed) examples reached here.  /* #neg=33 #pos=65 */
%   else return -0.14185106490048813;  // std dev = 0.000, 194.000 (wgt'ed) examples reached here.  /* #neg=194 */

Second run

%%%%%  WILL-Produced Tree #1 @ 11:15:20 7/10/18.  [Using 3,102,320 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5081489350995129;  // std dev = 4.770, 100.000 (wgt'ed) examples reached here.  /* #neg=35 #pos=65 */
%   else return -0.1418510649004882;  // std dev = 0.000, 211.000 (wgt'ed) examples reached here.  /* #neg=211 */

Thanks for helping,
Best regards.

@mayukhdas
Contributor

Hi Rodrigo,

Sorry for the delay, and apologies for misunderstanding the question. This code samples the data on every run; that is why the SDs are different.

However, I may have misunderstood your question/concern. If so, please let me know and I will try to get to the bottom of this.

Thanks
Mayukh

@rodrigoazs
Author

rodrigoazs commented Jul 13, 2018

Hi Mayukh,

I'm very grateful for your help. Actually, I am concerned about why the SD values are so different compared to the regular learning code. As I said previously, I intend to implement parameter learning. To do that, I force the code to use the clause I want as the best node found. I also force the code to branch into leaves or interior nodes following the same structure as a given tree, so that I can learn new regression values.

The way I'm doing that follows the steps below:

  • In ILPouterLoop.java, I skip the search (because I already know the nodes I want), i.e. this call:
    SearchResult sr = innerLoopTask.performSearch(learningTreeStructuredTheory ? savedBestNode : null);
  • I create a bestNode by appending literals to a SingleClauseNode I have constructed.
  • I then force the creation of a leaf or an interior node on each branch, instead of checking booleans such as goodEnoughFitTrueBranch.
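To make the control flow concrete, here is a minimal, self-contained sketch of what I mean. The types below are simplified placeholders of my own, not the real WILL SingleClauseNode/ILPouterLoop classes, whose constructors and APIs differ:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, self-contained sketch of the control flow described above:
// skip the best-first search and hand back a preset "best" node instead.
// These types are simplified placeholders, NOT the real WILL classes.
public class ForcedNodeSketch {

    // Hypothetical stand-in for WILL's SingleClauseNode, for illustration only.
    static class ClauseNode {
        final String head;
        final List<String> body = new ArrayList<>();

        ClauseNode(String head) { this.head = head; }

        // Append one literal to the clause body (step 2 above).
        ClauseNode addLiteral(String literal) {
            body.add(literal);
            return this;
        }

        @Override
        public String toString() {
            return head + " :- " + String.join(", ", body) + ".";
        }
    }

    // In the real code, performSearch(...) explores candidate nodes;
    // here the search is skipped entirely (step 1 above).
    static ClauseNode performSearch(ClauseNode forcedBestNode) {
        return forcedBestNode;
    }

    public static void main(String[] args) {
        // Build the desired first split: professor(B), student(A), publication(C, A).
        ClauseNode forced = new ClauseNode("advisedby(A, B)")
                .addLiteral("professor(B)")
                .addLiteral("student(A)")
                .addLiteral("publication(C, A)");

        ClauseNode best = performSearch(forced);
        // Step 3 would then force a leaf or an interior node for this split.
        System.out.println(best);
        // prints: advisedby(A, B) :- professor(B), student(A), publication(C, A).
    }
}
```

In my actual changes the same idea is applied inside ILPouterLoop.java against the real WILL classes.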

The first block of code above is a WILL-produced tree learned from scratch (using the original code). After learning this tree, I forced my code to generate the same one (generating the same nodes at each level and branch). When I run this, I obtain similar regression values, as you can see by comparing block 1 with block 2 and block 1 with block 3. Every run samples new data (I provided the same train_neg and train_pos files), which is why the regression values, SDs, and numbers of reached examples differ slightly between runs. However, the SDs from my code are very different from those of the original learning code: about ten times greater.

The clause advisedby(A, B) :- professor(B), student(A), ! has an original SD of 0.490 with 43 negative and 65 positive examples; with my code it has an SD of 4.678 with 33 negative and 65 positive examples.

Also, the clause advisedby(A, B) :- professor(B), student(A), tempadvisedby(C, B), publication(D, A), publication(D, B), ! has an original SD of 1.79e-07 with only 29 positive examples reached. My code produced an SD of 0.983 with only one additional negative example reached.

The regression values and numbers of reached examples seem OK, but the standard deviations are very different.

This does not seem to affect testing my model, since the model file contains only the clauses and regression values, but I think there is something wrong with the way I'm creating these SingleClauseNodes.

If anything is still confusing, please let me know.

Thank you for your patience :)
Rodrigo

@mayukhdas
Contributor

Hi Rodrigo,

I understand your concern now. Is it possible to send the Java file(s) with your changes so that we can take a look at them and try to figure out the impact of those changes? If you have a forked branch of the repository, let us know; we can look at it directly instead of you attaching Java code.

Thanks
Mayukh

@rodrigoazs
Author

Hi Mayukh,

I appreciate your help. My code is a little messy and is not on a forked branch yet. I will add good comments and push it to a forked branch so that you can take a look. Please give me a couple of days to do that.

In addition, I'm a little confused about how this weighted variance is calculated. Do you have anything that could help me understand it? I thought branches with standard deviations closer to 0 were more likely to be good branches, but it seems that assumption is not true.
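For reference, what I had assumed is the textbook definition of a weighted mean and weighted standard deviation, sketched below. This is my own assumption about the formula, not the actual WILL implementation, which may accumulate these quantities differently:

```java
// Textbook weighted mean and weighted standard deviation.
// NOTE: this is my own sketch of the standard formulas, not the actual
// WILL/BoostSRL implementation, which may compute its "std dev" differently.
public class WeightedStats {

    // Weighted mean: sum(w_i * x_i) / sum(w_i)
    public static double weightedMean(double[] values, double[] weights) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < values.length; i++) {
            num += weights[i] * values[i];
            den += weights[i];
        }
        return num / den;
    }

    // Weighted std dev: sqrt( sum(w_i * (x_i - mean)^2) / sum(w_i) )
    public static double weightedStdDev(double[] values, double[] weights) {
        double mean = weightedMean(values, weights);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < values.length; i++) {
            double d = values[i] - mean;
            num += weights[i] * d * d;
            den += weights[i];
        }
        return Math.sqrt(num / den);
    }

    public static void main(String[] args) {
        // A leaf where every example carries the same regression value has
        // std dev 0, like the "std dev = 0.000 ... /* #pos=29 */" leaves above.
        double[] same = {0.5, 0.5, 0.5};
        double[] unit = {1.0, 1.0, 1.0};
        System.out.println(weightedStdDev(same, unit)); // prints 0.0
    }
}
```

If WILL's notion of variance differs from this (for example, a variance over boosted gradients rather than over these values), that might explain the numbers I am seeing.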

I learned a single tree (with the original learning code) on the Yago2s database for the playsfor target and got this result:

%%%%%  WILL-Produced Tree #1 @ 18:28:26 7/16/18.  [Using 1,015,432,728 memory cells.]  %%%%%

% FOR playsfor(A, B):
%   if ( isaffiliatedto(A, B) )
%   then return 0.8578097747059199;  // std dev = 9.694, 277155.000 (wgt'ed) examples reached here.  /* #neg=94 #pos=277061 */
%   else return -0.14184745438877816;  // std dev = 1.000, 276969.000 (wgt'ed) examples reached here.  /* #neg=276968 #pos=1 */

For about 277,000 examples in each branch, the majority is either positive or negative. The AUC ROC is 0.999704.

Thank you,
Rodrigo.

@mayukhdas
Contributor

mayukhdas commented Jul 27, 2018

Hi Rodrigo @rodrigoazs,

If we could get a code snippet of your customization, it would be great. We tried but somehow are unable to replicate your scenario. Even if you do not have a forked repository, send us the snippet of your customized Java class(es). We can try to integrate it into the current code and see why the standard deviations are different.

I understand it might be awkward to paste entire Java files in a comment, so just send me an email if that is easier for you.

Thanks
-- Mayukh

@rodrigoazs
Author

Hi @mayukhdas,

I just sent you an email with the customized code. Regarding my last question (Yago2s): the tree in that scenario was obtained with the original BoostSRL code, so I do not know why this happens both in the original code and in my customized one. I can also provide the exact Yago2s train and test sets I used if you like.

Thanks.

@mayukhdas
Contributor

Hey Rodrigo,

Thanks a lot, I will look into that code and get back to you as soon as possible.

Thanks
Mayukh
