[Model dump: JSON] model.save_model does not warn when required "pool" argument is skipped #2659

Open
NickVeld opened this issue May 13, 2024 · 0 comments · May be fixed by #2663

Problem:
By mistake, I no longer know whether I used UUID4 strings or integers (marked as categorical) to train my now-historical models. I wanted to work this out from my JSON model dumps ( model.save_model(fname=path, format="json") ) but found no information about the feature values (only splits, definitions through combinations, and so on).
I know there should be a "cat_features_hash" entry that holds this information, but it does not exist in my dumps.
With some help, I found in the documentation that I should have passed the "pool" argument.
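For anyone checking their own dumps, here is a minimal sketch of how the missing entry can be detected. It uses only the standard library and searches the whole document for the key, because this issue does not say where in the dump "cat_features_hash" is expected to sit. The helper names `find_key` and `dump_has_cat_hashes` are hypothetical and are not part of CatBoost.

```python
import json


def find_key(node, key):
    """Return True if `key` appears anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        return key in node or any(find_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(find_key(v, key) for v in node)
    return False


def dump_has_cat_hashes(path):
    """Load a JSON model dump and report whether it has "cat_features_hash"."""
    with open(path, encoding="utf-8") as fh:
        return find_key(json.load(fh), "cat_features_hash")
```

Calling `dump_has_cat_hashes(path)` on a dump shaped like the one in the appendix would return False, which matches what I see.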

The documentation says the following, but the code raises no warning (or error) when this argument is omitted:

This parameter is required if the model contains categorical features and the output format is cpp, python, or JSON.
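Until the library warns by itself, a user-side guard along these lines could catch the mistake. This is only a sketch of the behaviour I expected, not the fix proposed in the linked pull request. `save_model_checked` is a hypothetical wrapper, and it assumes the model object exposes `get_cat_feature_indices()` and `save_model(fname, format=..., pool=...)`.

```python
import warnings

# Formats for which the documentation says `pool` is required
# when the model contains categorical features.
FORMATS_NEEDING_POOL = {"json", "cpp", "python"}


def save_model_checked(model, fname, fmt="json", pool=None):
    """Warn when `pool` is omitted for a model with categorical features."""
    # Assumption: the model reports its categorical feature indices.
    has_cat_features = len(model.get_cat_feature_indices()) > 0
    if fmt in FORMATS_NEEDING_POOL and has_cat_features and pool is None:
        warnings.warn(
            f"Saving a model with categorical features to '{fmt}' without "
            "'pool': categorical feature values will be missing from the dump.",
            UserWarning,
        )
    model.save_model(fname, format=fmt, pool=pool)
```

Usage would be `save_model_checked(model, "model.json", fmt="json", pool=train_pool)`, where `train_pool` stands for the Pool used for training (a placeholder name).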

Bonus comment:

The documentation also says the following, but I was able to run predictions with the imported model, and the results looked okay:

The model can be saved to the JSON format without a pool. In this case it is available for review but it is not applicable.

[Appendix]
My dump structure:

root
- ctr_data
-- {"identifier":...
-- ...
- features_info
-- categorical_features
--- *feature_index*
---- feature_id: "..."
---- feature_index: *feature_index*
---- flat_feature_index: *flat_feature_index*
-- ctrs
--- *list index*
---- borders
----- 0: 0.9999989867210388
----- ...
----- 9: 12.999999046325684
---- ctr_type: "Borders"
---- elements
----- *nested list index*
------ cat_feature_index: 0
------ combination_element: "cat_feature_value"
---- identifier: "{"identifier":[{"cat_feature_index":0,"combination_element":"cat_feature_value"}],"type":"Borders"}"
---- prior_denomerator: 1
---- prior_numerator: 0.5
---- scale: 15
---- shift: 0
---- target_border_idx: 0
-- float_features
--- ...
- model_info
-- catboost_version_info: "Git info: Commit: ff8f073eea240c11e49e51cb774c2df5249c157c Branch: heads/master Author: akhropov <akhropov@yandex-team.com> Summary: Proper ranges in parameters description git-svn info: Last Changed Rev: 9293085 Other info: Build by: Unknown user Top src dir: /src/catboost "
-- class_params
--- ...
-- model_guid: "..."
-- output_options: "{...}"
-- params
--- ...
- oblivious_trees
-- ...
- scale_and_bias
-- ...

catboost version: 1.2.2
Operating System: Google Cloud Vertex AI Workbench User-Managed Notebook
CPU: ?
GPU: No GPU

@NickVeld NickVeld changed the title [Model dump: JSON] cat_features_hash does not exist in JSON dump of model [Model dump: JSON] model.save_model does not warn when required "pool" argument is skipped May 13, 2024
NickVeld added a commit to NickVeld/catboost that referenced this issue May 15, 2024
@NickVeld NickVeld linked a pull request May 15, 2024 that will close this issue