
Performing parameter learning #19
Open

rodrigoazs opened this issue Jun 8, 2018 · 9 comments

@rodrigoazs

rodrigoazs commented Jun 8, 2018

Hello,

I'd like to learn the regression values for a given tree. To do that, I'm trying to force the code to select the node I want as the best one for the split when it searches through candidates. However, I don't know how the candidate nodes passed as the List children parameter to the addChildrenToOpenList method (BestFirstSearch.java, line 25) are chosen. The nodes in the children list change when the code is run again. What is random in this process of selecting nodes to split the tree?

In addition, let's say I want the first split to be professor(B), student(A), publication(C,A). How can I create a SingleClauseNode object to represent it, and where am I supposed to define it as the bestNode?

The learned tree changes between runs.

UW-CSE dataset:
First run

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( taughtby(C, B), tempadvisedby(D, B), publication(E, D) )
%   | then if ( publication(F, A), publication(F, B) )
%   | | then return 0.8581489350995122;  // std dev = 0.000, 10.000 (wgt'ed) examples reached here.  /* #pos=10 */
%   | | else return 0.02481560176617886;  // std dev = 0.373, 12.000 (wgt'ed) examples reached here.  /* #neg=10 #pos=2 */
%   | else if ( publication(G, B), publication(G, A) )
%   | | then return 0.8268989350995116;  // std dev = 0.174, 32.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=31 */
%   | | else if ( publication(H, A), publication(I, B) )
%   | | | then if ( ta(J, A), publication(I, K), ta(L, K) )
%   | | | | then return 0.8581489350995122;  // std dev = 0.000, 5.000 (wgt'ed) examples reached here.  /* #pos=5 */
%   | | | | else if ( tempadvisedby(M, B), publication(H, N), professor(N) )
%   | | | | | then return -0.14185106490048777;  // std dev = 0.000, 3.000 (wgt'ed) examples reached here.  /* #neg=3 */
%   | | | | | else if ( tempadvisedby(P, B) )
%   | | | | | | then return 0.6081489350995122;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=3 */
%   | | | | | | else return 0.15814893509951225;  // std dev = 0.458, 10.000 (wgt'ed) examples reached here.  /* #neg=7 #pos=3 */
%   | | | else return 0.6663681131817044;  // std dev = 0.394, 73.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=59 */
%   else return -0.1418510649004883;  // std dev = 0.000, 220.000 (wgt'ed) examples reached here.  /* #neg=220 */

Second run

% FOR advisedby(A, B):
%   if ( hasposition(B, C), student(A) )
%   then if ( publication(D, A), publication(D, B) )
%   | then return 0.8116373071925351;  // std dev = 0.211, 43.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=41 */
%   | else if ( publication(E, A), publication(E, F), professor(F) )
%   | | then if ( publication(G, B), tempadvisedby(H, B) )
%   | | | then return -0.05851773156715445;  // std dev = 0.276, 12.000 (wgt'ed) examples reached here.  /* #neg=11 #pos=1 */
%   | | | else if ( taughtby(I, B), taughtby(I, F) )
%   | | | | then return 0.10814893509951219;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=1 */
%   | | | | else if ( ta(J, A) )
%   | | | | | then return 0.6581489350995122;  // std dev = 0.894, 5.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=4 */
%   | | | | | else if ( tempadvisedby(K, B), tempadvisedby(L, F), publication(M, L) )
%   | | | | | | then return 0.5248156017661788;  // std dev = 0.816, 3.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=2 */
%   | | | | | | else return 0.2581489350995122;  // std dev = 1.095, 5.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=2 */
%   | | else if ( taughtby(N, B), ta(N, A) )
%   | | | then return 0.8581489350995123;  // std dev = 0.000, 16.000 (wgt'ed) examples reached here.  /* #pos=16 */
%   | | | else return 0.6208607995062918;  // std dev = 0.425, 59.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=45 */
%   else return -0.13653191596431785;  // std dev = 0.073, 188.000 (wgt'ed) examples reached here.  /* #neg=187 #pos=1 */

Thank you,
Best regards.

@mayukhdas
Contributor

Hi Rodrigo,

Allow us some time to look into this. We will get back to you with an explanation as soon as possible.

Thanks.

@rodrigoazs
Author

Hello,

I have made some modifications to ILPouterLoop.java in order to force nodes and leaves to be created in specific places, reproducing the structure of a given tree. I did this by creating SingleClauseNodes and controlling when interior nodes and leaves are created.

It seems that the code is working; however, I am getting very different standard deviations in the WILL regression-tree file produced, even though the numbers of reached examples and the regression values are very similar.

Any idea what it could be?

Learning a single tree

%%%%%  WILL-Produced Tree #1 @ 16:26:35 7/9/18.  [Using 3,379,008 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8581489350995117;  // std dev = 1.79e-07, 29.000 (wgt'ed) examples reached here.  /* #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.8581489350995123;  // std dev = 0.000, 19.000 (wgt'ed) examples reached here.  /* #pos=19 */
%   | | else return 0.46000078695136487;  // std dev = 0.490, 108.000 (wgt'ed) examples reached here.  /* #neg=43 #pos=65 */
%   else return -0.14185106490048802;  // std dev = 0.000, 167.000 (wgt'ed) examples reached here.  /* #neg=167 */

Learning parameters for the previous tree:
First run

%%%%%  WILL-Produced Tree #1 @ 16:49:03 7/9/18.  [Using 3,310,648 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5214142412219619;  // std dev = 4.678, 98.000 (wgt'ed) examples reached here.  /* #neg=33 #pos=65 */
%   else return -0.14185106490048813;  // std dev = 0.000, 194.000 (wgt'ed) examples reached here.  /* #neg=194 */

Second run

%%%%%  WILL-Produced Tree #1 @ 11:15:20 7/10/18.  [Using 3,102,320 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5081489350995129;  // std dev = 4.770, 100.000 (wgt'ed) examples reached here.  /* #neg=35 #pos=65 */
%   else return -0.1418510649004882;  // std dev = 0.000, 211.000 (wgt'ed) examples reached here.  /* #neg=211 */

Thanks for helping,
Best regards.

@mayukhdas
Contributor

Hi Rodrigo,

Sorry for the delay, and apologies for misunderstanding the question. This code samples the data on every run; that is why the SDs are different.

However, I may have misunderstood your question/concern. If so, please let me know and I will try to get to the bottom of this.

Thanks
Mayukh

@rodrigoazs
Author

rodrigoazs commented Jul 13, 2018

Hi Mayukh,

I'm very grateful for your help. Actually, I am concerned about why the SD values are so different compared to the regular learning code. As I said previously, I intend to implement parameter learning. To do that, I force the code to use the clause I want as the best node found. I also force the code to branch into leaves or interior nodes following the same structure as a given tree, so that I can learn new regression values.

The way I'm doing that follows the steps below:

  • In ILPouterLoop.java, I skip the search (because I already know the nodes I want), i.e. this call:
    SearchResult sr = innerLoopTask.performSearch(learningTreeStructuredTheory ? savedBestNode : null);
  • I create a bestNode by appending literals to a SingleClauseNode I have constructed.
  • I then force the creation of a leaf or an interior node on each branch, instead of checking booleans such as goodEnoughFitTrueBranch.
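To make the control flow concrete, here is a minimal, self-contained sketch of what I mean. The types below are simplified placeholders of my own, not the real WILL SingleClauseNode/ILPouterLoop classes, whose constructors and APIs differ:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, self-contained sketch of the control flow described above:
// skip the best-first search and hand back a preset "best" node instead.
// These types are simplified placeholders, NOT the real WILL classes.
public class ForcedNodeSketch {

    // Hypothetical stand-in for WILL's SingleClauseNode, for illustration only.
    static class ClauseNode {
        final String head;
        final List<String> body = new ArrayList<>();

        ClauseNode(String head) { this.head = head; }

        // Append one literal to the clause body (step 2 above).
        ClauseNode addLiteral(String literal) {
            body.add(literal);
            return this;
        }

        @Override
        public String toString() {
            return head + " :- " + String.join(", ", body) + ".";
        }
    }

    // In the real code, performSearch(...) explores candidate nodes;
    // here the search is skipped entirely (step 1 above).
    static ClauseNode performSearch(ClauseNode forcedBestNode) {
        return forcedBestNode;
    }

    public static void main(String[] args) {
        // Build the desired first split: professor(B), student(A), publication(C, A).
        ClauseNode forced = new ClauseNode("advisedby(A, B)")
                .addLiteral("professor(B)")
                .addLiteral("student(A)")
                .addLiteral("publication(C, A)");

        ClauseNode best = performSearch(forced);
        // Step 3 would then force a leaf or an interior node for this split.
        System.out.println(best);
        // prints: advisedby(A, B) :- professor(B), student(A), publication(C, A).
    }
}
```

In my actual changes the same idea is applied inside ILPouterLoop.java against the real WILL classes.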

The first block of code above is a WILL-produced tree learned from scratch (using the original code). After learning this tree, I forced my code to generate the same one (generating the same nodes at each level and branch). When I run this, I obtain similar regression values, as you can see by comparing block 1 with block 2 and block 1 with block 3. Every run samples new data (I provided the same train_neg and train_pos files), which is why the regression values, SDs, and numbers of reached examples differ slightly between runs. However, the SDs from my code are very different from those of the original learning code: about ten times greater.

The clause advisedby(A, B) :- professor(B), student(A), ! has an original SD of 0.490 with 43 negative and 65 positive examples; with my code it has an SD of 4.678 with 33 negative and 65 positive examples.

Also, the clause advisedby(A, B) :- professor(B), student(A), tempadvisedby(C, B), publication(D, A), publication(D, B), ! has an original SD of 1.79e-07 with only 29 positive examples reached. My code produced an SD of 0.983 with only one additional negative example reached.

The regression values and numbers of reached examples seem OK, but the standard deviations are very different.

This does not seem to affect testing my model, since the model file contains only the clauses and regression values, but I think there is something wrong with the way I'm creating these SingleClauseNodes.

If anything is still confusing, please let me know.

Thank you for your patience :)
Rodrigo

@mayukhdas
Contributor

Hi Rodrigo,

I understand your concern now. Is it possible to send the Java file(s) with your changes so that we can take a look at them and try to figure out the impact of those changes? If you have a forked branch of the repository, let us know; we can look at it directly instead of you attaching Java code.

Thanks
Mayukh

@rodrigoazs
Author

Hi Mayukh,

I appreciate your help. My code is a little messy and is not on a forked branch yet. I will add good comments and push it to a forked branch so that you can take a look. Please give me a couple of days to do that.

In addition, I'm a little confused about how this weighted variance is calculated. Do you have anything that could help me understand it? I thought branches with standard deviations closer to 0 were more likely to be good branches, but it seems that assumption is not true.
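For reference, what I had assumed is the textbook definition of a weighted mean and weighted standard deviation, sketched below. This is my own assumption about the formula, not the actual WILL implementation, which may accumulate these quantities differently:

```java
// Textbook weighted mean and weighted standard deviation.
// NOTE: this is my own sketch of the standard formulas, not the actual
// WILL/BoostSRL implementation, which may compute its "std dev" differently.
public class WeightedStats {

    // Weighted mean: sum(w_i * x_i) / sum(w_i)
    public static double weightedMean(double[] values, double[] weights) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < values.length; i++) {
            num += weights[i] * values[i];
            den += weights[i];
        }
        return num / den;
    }

    // Weighted std dev: sqrt( sum(w_i * (x_i - mean)^2) / sum(w_i) )
    public static double weightedStdDev(double[] values, double[] weights) {
        double mean = weightedMean(values, weights);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < values.length; i++) {
            double d = values[i] - mean;
            num += weights[i] * d * d;
            den += weights[i];
        }
        return Math.sqrt(num / den);
    }

    public static void main(String[] args) {
        // A leaf where every example carries the same regression value has
        // std dev 0, like the "std dev = 0.000 ... /* #pos=29 */" leaves above.
        double[] same = {0.5, 0.5, 0.5};
        double[] unit = {1.0, 1.0, 1.0};
        System.out.println(weightedStdDev(same, unit)); // prints 0.0
    }
}
```

If WILL's notion of variance differs from this (for example, a variance over boosted gradients rather than over these values), that might explain the numbers I am seeing.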

I learned a single tree (with the original learning code) on the Yago2s database for the playsfor target and got this result:

%%%%%  WILL-Produced Tree #1 @ 18:28:26 7/16/18.  [Using 1,015,432,728 memory cells.]  %%%%%

% FOR playsfor(A, B):
%   if ( isaffiliatedto(A, B) )
%   then return 0.8578097747059199;  // std dev = 9.694, 277155.000 (wgt'ed) examples reached here.  /* #neg=94 #pos=277061 */
%   else return -0.14184745438877816;  // std dev = 1.000, 276969.000 (wgt'ed) examples reached here.  /* #neg=276968 #pos=1 */

For about 277,000 examples in each branch, the majority is either positive or negative. The AUC ROC is 0.999704.

Thank you,
Rodrigo.

@mayukhdas
Contributor

mayukhdas commented Jul 27, 2018

Hi Rodrigo @rodrigoazs,

If we could get a code snippet of your customization, it would be great. We tried but somehow are unable to replicate your scenario. Even if you do not have a forked repository, send us the snippet of your customized Java class(es). We can try to integrate it into the current code and see why the standard deviations are different.

I understand it might be awkward to paste entire Java files in a comment, so just send me an email if that is easier for you.

Thanks
-- Mayukh

@rodrigoazs
Author

Hi @mayukhdas,

I just sent you an email with the customized code. Regarding my last question (Yago2s): the tree in that scenario was obtained with the original BoostSRL code, so I do not know why this happens both in the original code and in my customized one. I can also provide the exact Yago2s train and test sets I used if you like.

Thanks.

@mayukhdas
Contributor

Hey Rodrigo,

Thanks a lot, I will look into that code and get back to you as soon as possible.

Thanks
Mayukh
