[Model dump: JSON] model.save_model does not warn when required "pool" argument is skipped #2659

NickVeld · 2024-05-13T15:34:19Z

Problem:
Through my own oversight, I no longer know whether I used UUID4 strings or integers (declared as a categorical feature) to train models that are now historical. I hoped to find out from their JSON dumps (model.save_model(fname=path, format="json")), but the dumps contain no information about the feature values (only splits, CTR definitions built from feature combinations, and so on).
I know there should be a "cat_features_hash" entry that holds this information, but it is absent from my dumps.
Looking through the documentation, I found that I should have passed the "pool" argument.

The documentation says the following, but the code raises no warning (or error) when this argument is skipped:

This parameter is required if the model contains categorical features and the output format is cpp, python, or JSON.
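The check being requested can be sketched as a thin wrapper around save_model. This is a hypothetical helper (save_model_checked is not part of CatBoost and is not the code from the linked pull request); it assumes only that the model object provides get_cat_feature_indices() and save_model(fname, format=..., pool=...), as CatBoost's Python models do.

```python
import warnings
from typing import Any, Optional

# Formats for which the documentation quoted above says `pool` is required
# when the model contains categorical features.
_FORMATS_NEEDING_POOL = {"cpp", "python", "json"}


def save_model_checked(model: Any, fname: str, format: str = "cbm",
                       pool: Optional[Any] = None, **kwargs: Any) -> bool:
    """Hypothetical wrapper: warn if `pool` is missing where docs require it.

    Returns True if a warning was emitted, then saves the model either way.
    """
    has_cat_features = len(model.get_cat_feature_indices()) > 0
    missing_pool = (
        format.lower() in _FORMATS_NEEDING_POOL
        and has_cat_features
        and pool is None
    )
    if missing_pool:
        warnings.warn(
            f"Saving a model with categorical features to format={format!r} "
            "without `pool`: the dump will not contain cat_features_hash, "
            "so the original categorical values cannot be recovered.",
            UserWarning,
            stacklevel=2,
        )
    # Pass everything through unchanged to the real save_model.
    model.save_model(fname, format=format, pool=pool, **kwargs)
    return missing_pool
```

Emitting a warning rather than raising keeps existing scripts working while still flagging dumps that will lack the categorical value mapping.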

Bonus comment:

The documentation also says the following, but I was able to make predictions with the imported model and the results looked okay:

The model can be saved to the JSON format without a pool. In this case it is available for review but it is not applicable.

[Appendix]
My dump structure:

root
- ctr_data
-- {"identifier":...
-- ...
- features_info
-- categorical_features
--- *feature_index*
---- feature_id: "..."
---- feature_index: *feature_index*
---- flat_feature_index: *flat_feature_index*
-- ctrs
--- *list index*
---- borders
----- 0: 0.9999989867210388
----- ...
----- 9: 12.999999046325684
---- ctr_type: "Borders"
---- elements
----- *nested list index*
------    cat_feature_index: 0
------    combination_element: "cat_feature_value"
---- identifier: "{"identifier":[{"cat_feature_index":0,"combination_element":"cat_feature_value"}],"type":"Borders"}"
---- prior_denomerator: 1
---- prior_numerator: 0.5
---- scale: 15
---- shift: 0
---- target_border_idx: 0
-- float_features
--- ...
- model_info
-- catboost_version_info: "Git info: Commit: ff8f073eea240c11e49e51cb774c2df5249c157c Branch: heads/master Author: akhropov <akhropov@yandex-team.com> Summary: Proper ranges in parameters description git-svn info: Last Changed Rev: 9293085 Other info: Build by: Unknown user Top src dir: /src/catboost "
-- class_params
--- ...
-- model_guid: "..."
-- output_options: "{...}"
-- params
--- ...
- oblivious_trees
-- ...
- scale_and_bias
-- ...
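To check an existing dump for the missing mapping, a small stdlib-only sketch can walk the parsed JSON. It assumes the layout shown in the tree above (features_info, then categorical_features); the exact location of "cat_features_hash" is not confirmed here, so the hypothetical helper inspect_cat_features checks both the root and features_info.

```python
from typing import Any, Dict, List


def inspect_cat_features(dump: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize categorical-feature info in a parsed CatBoost JSON dump.

    Assumption: features_info -> categorical_features as in the tree above;
    "cat_features_hash" is looked up at the root and inside features_info.
    """
    info = dump.get("features_info", {})
    cat_features: List[Dict[str, Any]] = info.get("categorical_features", [])
    has_hash = "cat_features_hash" in dump or "cat_features_hash" in info
    return {
        "cat_feature_ids": [f.get("feature_id") for f in cat_features],
        "has_cat_features_hash": has_hash,
        # Without cat_features_hash the dump has no mapping back to the
        # original values, so their type (e.g. UUID4 vs. integer) is unknown.
        "values_recoverable": has_hash or not cat_features,
    }
```

Usage: parse the file with json.load and pass the resulting dict; a result with has_cat_features_hash False and at least one categorical feature corresponds to the situation described in this issue.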

catboost version: 1.2.2
Operating System: Google Cloud Vertex AI Workbench User-Managed Notebook
CPU: ?
GPU: No GPU


NickVeld changed the title from "[Model dump: JSON] cat_features_hash does not exist in JSON dump of model" to "[Model dump: JSON] model.save_model does not warn when required "pool" argument is skipped" on May 13, 2024

NickVeld added a commit to NickVeld/catboost that referenced this issue May 15, 2024

catboost#2659 Add warning over missing pool in save_model

8860b4c

Fix catboost#2659

NickVeld linked a pull request May 15, 2024 that will close this issue

#2659 Add warning over missing pool in save_model #2663

Open
