
can't get proper prediction score from python when --boosting N option used at training time #4597

Open
jprobichaud opened this issue May 25, 2023 · 2 comments
@jprobichaud

Describe the bug

I've trained a small model using the --boosting N option, for example --loss_function logistic -b 18 --l1 0.1 --l2 0.0001 --nn 50 --boosting 5, from Python or from the command line.

Then, when I try to get prediction scores out of it, I get "1.0" all the time, whether from Python or with -p on the command line. When I use the -r command-line parameter with the vw CLI executable, I get the list of all features with their scores and a final meaningful score at the end.

It seems impossible to get these values from within the Python classes, no matter what I try (including playing with the various PredictionType values).

This seems to be either a documentation gap or, perhaps, a more serious issue in the boosting reduction.

How to reproduce

  1. train with --loss_function logistic -b 18 --l1 0.1 --l2 0.0001 --nn 50 --boosting 5
  2. try to get predictions within Python that match the scores you get from using vw -i model.vw -t test.txt -r /dev/stdout

Version

9.2.0

OS

Linux

Language

Python

Additional context

No response

@jprobichaud jprobichaud added the Bug Bug in learning semantics, critical by default label May 25, 2023
@ataymano (Member) commented Jun 7, 2023

Seems like a bug with boosting (nn and the loss function are not relevant here).
Simpler repro (it also reproduces with the binary and -p, so the problem is not Python-specific):

from vowpalwabbit import pyvw
import numpy as np

vw = pyvw.Workspace('--boosting 1 -b 1')

# Target: a simple linear function of one feature.
yhat = lambda x: 2 * x + 3

for i in range(10000):
    x = np.random.rand()
    vw.learn(f'{yhat(x)} | 1:{x}')

print(f'w_x = {vw.get_weight(1, 0)}')
print(f'constant = {vw.get_weight(0, 0)}')

x = 2
print(f'y({x}) = {vw.predict(f"| 1:{x}")}')

Output:

w_x = 1.9999972581863403
constant = 3.000001907348633
y(2) = 1.0

@ataymano (Member) commented Jun 7, 2023

Yeah, it seems like some binary classification logic is hardcoded there:

ec.pred.scalar = VW::math::sign(final_prediction);
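The line above collapses the raw score to its sign before it is stored as the scalar prediction, which would explain why -p and Python's predict() always return 1.0 while -r still shows the raw per-example score. A minimal Python sketch of the effect (the sign function here is a stand-in for how VW::math::sign is used, not VW's actual code path; the weights are the ones printed in the repro above):

```python
def sign(x: float) -> float:
    """Stand-in for VW::math::sign: collapse a raw score to +/-1."""
    return 1.0 if x >= 0 else -1.0

# Raw score the learned linear model would produce for x = 2:
# w_x * x + constant, using the weights from the repro output.
raw = 1.9999972581863403 * 2 + 3.000001907348633  # ~7.0

print(f'raw score (what -r reports):      {raw}')
print(f'stored scalar (what -p returns): {sign(raw)}')  # 1.0
```

Any positive raw score is flattened to 1.0 (and any negative one to -1.0), so regression-style scores are unrecoverable through the scalar prediction once boosting is on the stack.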
