Precision, Recall and F1 score of 1 or 2 regression trees #17
Thanks for the note. A couple of questions:
1. What is the class imbalance in the data set? 50-50? It appears that all the examples are classified as one class.
2. What is the depth of the trees?
A couple of favors.
1. Can you please open the regressionTrees.txt file in the models directory?
2. Can you please send us the .bk file?
Thanks
Sriraam
On May 3, 2018, at 8:12 PM, Rodrigo Azevedo <[email protected]> wrote:
When I set the trees parameter to learn 1 or 2 regression trees I always get those metrics as NaN or 0.
% Precision = NaN at threshold = 0,500
% Recall = 0,000000
% F1 = NaN
I've tried it several times with same dataset, settings and different numbers of learning trees.
Thanks for helping! I tried to learn a single regression tree with the following datasets:
1. IMDB dataset provided in the BoostSRL Wiki: 95 positive examples and 146 negative examples.
2. NELL sports dataset: 174 positive examples and 174 negative examples.
Thanks a lot for providing the information. I assume you are getting proper AUC ROC and AUC PR. With probabilistic classifiers such as ours, precision and recall are not straightforward measures. In our code, the threshold on the prediction probabilities used to decide predicted positives and negatives has been hard-coded to 0.5 since the first development cycle. So when the predicted probabilities are all lower than 0.5 (especially with 1 tree), precision comes out as NaN. We will take this up as an open issue and make the threshold dynamic/customizable in our next full release cycle. However, as a quick fix on your side, if you are using the source code directly, you can change the threshold for different data sets and see what works: go to the class "edu.wisc.cs.will.boosting.RDN.RunBoostedRDN.java" (lines 455 - 458), change the threshold from 0.5 to your preferred value in the infer() method, and recompile. Or, you may just use the AUCs as your performance metric if that is suitable for you. Thanks
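To make the NaN behavior concrete, here is a minimal Python sketch (not BoostSRL's actual Java code; the function name and the example numbers are made up for illustration) of how thresholded precision/recall behaves: when no example's probability clears the cutoff, there are no predicted positives, and precision is 0/0, i.e. NaN.

```python
def precision_recall_f1(probs, labels, threshold=0.5):
    """Compute precision/recall/F1 for predicted probabilities and 0/1 labels."""
    preds = [p >= threshold for p in probs]
    tp = sum(1 for pr, y in zip(preds, labels) if pr and y == 1)
    fp = sum(1 for pr, y in zip(preds, labels) if pr and y == 0)
    fn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 1)
    # No predicted positives -> tp + fp == 0 -> precision is 0/0 (NaN).
    precision = tp / (tp + fp) if (tp + fp) > 0 else float('nan')
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (tp + fp) > 0 and (precision + recall) > 0 else float('nan'))
    return precision, recall, f1

# All probabilities below 0.5, as can happen with a single regression tree:
probs = [0.28, 0.25, 0.31, 0.12]
labels = [1, 1, 0, 0]
print(precision_recall_f1(probs, labels, threshold=0.5))  # → (nan, 0.0, nan)
print(precision_recall_f1(probs, labels, threshold=0.2))  # lower cutoff gives finite scores
```

This is why lowering the hard-coded cutoff (or using the threshold-free AUCs) recovers meaningful scores.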
I'm getting proper AUC ROC and AUC PR; however, all the examples are being classified as False. Positive examples are getting probabilities under 0.3, while negative examples are getting probabilities above 0.7 (1-prob). I've changed some parameters such as treeDepth and numOfClauses, but I didn't get better results.
The best threshold is 0.19016607954333642.
In addition, is there a way to infer using the combined regression tree? I'm currently removing the boosted trees, renaming the combined tree to regressionTree0, and modifying the model file. Thanks.
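A "best threshold" like the 0.19 quoted above can be found by scanning candidate cutoffs and keeping the one that maximizes F1. Here is a hypothetical Python sketch of that idea (the function name and the probabilities are illustrative, not values from the IMDB run):

```python
def best_threshold(probs, labels):
    """Scan each predicted probability as a cutoff; return (threshold, F1) maximizing F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        preds = [p >= t for p in probs]
        tp = sum(1 for pr, y in zip(preds, labels) if pr and y == 1)
        fp = sum(1 for pr, y in zip(preds, labels) if pr and y == 0)
        fn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 1)
        if tp == 0:  # skip cutoffs with no true positives (F1 undefined or 0)
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Illustrative probabilities: positives cluster low, as described in the thread.
probs = [0.29, 0.22, 0.19, 0.31, 0.10, 0.15]
labels = [1, 1, 1, 0, 0, 0]
print(best_threshold(probs, labels))  # picks 0.19 on this toy data
```

With a dynamic or user-supplied threshold, this kind of scan (on a validation set) replaces the hard-coded 0.5 cutoff.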
Hi Rodrigo, Were you able to change the hard-coded threshold in the source code to a different (lower) value, as I outlined in my previous message, and then try running it? Thanks
As you can see, Precision = NaN occurs at threshold 0.5 ... We will take care of this issue by making the threshold dynamic in our next release.
Hi Mayukh, Yes, I changed it and I'm able to see the scores now.
Thank you.