Catboost 1.2.5 broke text features on GPU "Can't find borders for feature #4138" #2657

Closed
ryanshrott opened this issue May 12, 2024 · 11 comments

Comments

@ryanshrott

ryanshrott commented May 12, 2024

Problem:
After upgrading to 1.2.5, training with text features on GPU throws the exception below. Oddly, the release notes say this was only added in 1.2.5, but I was using it in 1.2.3 with no issues.

CatBoostError                             Traceback (most recent call last)
Cell In[7], line 5
      3 model = CatBoostRegressor( **cat_params, cat_features=cat_cols, text_features=text_cols)
      4 train_pool = Pool(data=X_train, label=y_train, cat_features = cat_cols, text_features = text_cols)
----> 5 model.fit(train_pool)

File /anaconda/envs/new_catboost/lib/python3.11/site-packages/catboost/core.py:5827, in CatBoostRegressor.fit(self, X, y, cat_features, text_features, embedding_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   5824 if 'loss_function' in params:
   5825     CatBoostRegressor._check_is_compatible_loss(params['loss_function'])
-> 5827 return self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline,
   5828                  use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description,
   5829                  verbose_eval, metric_period, silent, early_stopping_rounds,
   5830                  save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)

File /anaconda/envs/new_catboost/lib/python3.11/site-packages/catboost/core.py:2400, in CatBoost._fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   2397 allow_clear_pool = train_params["allow_clear_pool"]
   2399 with plot_wrapper(plot, plot_file, 'Training plots', [_get_train_dir(self.get_params())]):
-> 2400     self._train(
   2401         train_pool,
   2402         train_params["eval_sets"],
   2403         params,
   2404         allow_clear_pool,
   2405         train_params["init_model"]
   2406     )
   2408 # Have property feature_importance possibly set
   2409 loss = self._object._get_loss_function_name()

File /anaconda/envs/new_catboost/lib/python3.11/site-packages/catboost/core.py:1780, in _CatBoostBase._train(self, train_pool, test_pool, params, allow_clear_pool, init_model)
   1779 def _train(self, train_pool, test_pool, params, allow_clear_pool, init_model):
-> 1780     self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
   1781     self._set_trained_model_attributes()

File _catboost.pyx:4833, in _catboost._CatBoost._train()

File _catboost.pyx:4882, in _catboost._CatBoost._train()

CatBoostError: /src/catboost/catboost/cuda/data/binarizations_manager.cpp:169: Can't find borders for feature #4138
text features on GPU

catboost version: 1.2.5
Operating System: Linux
CPU: Not sure, some Intel CPU I think
GPU: A100

@ryanshrott ryanshrott changed the title Catboost 1.2.5 broke text features on GPU Catboost 1.2.5 broke text features on GPU "Can't find borders for feature #4138" May 12, 2024
@ryanshrott
Author

FYI: Downgrading to 1.2.3 fixes the issue.
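
For anyone pinning versions in the meantime, here is a minimal sketch of a guard that checks the installed catboost version before GPU training with text features. The set of affected versions is an assumption drawn only from this thread (1.2.5 fails, 1.2.3 works); other versions are untested here. `AFFECTED_VERSIONS`, `is_affected`, and `installed_catboost_version` are hypothetical names, not part of catboost.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption based on this thread only: 1.2.5 fails, 1.2.3 works.
# No other version has been checked.
AFFECTED_VERSIONS = {"1.2.5"}


def is_affected(catboost_version: str) -> bool:
    """Return True if this version is reported as failing in this issue."""
    return catboost_version.strip() in AFFECTED_VERSIONS


def installed_catboost_version():
    """Return the installed catboost version string, or None if not installed."""
    try:
        return version("catboost")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_catboost_version()
    if found is None:
        print("catboost is not installed")
    elif is_affected(found):
        print(f"catboost {found}: GPU training with text features may fail; "
              "consider pinning catboost==1.2.3")
    else:
        print(f"catboost {found}: not reported as affected in this thread")
```

The check reads package metadata instead of importing catboost, so it can run in a setup script before any training code is loaded.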

@Evgueni-Petrov-aka-espetrov
Contributor

Hi @ryanshrott,
could you please share an end-to-end reproducer?
For me, catboost 1.2.5 builds a regression model for the rotten tomatoes dataset without problems; that dataset has categorical and text features, as in your setting.

import catboost
from catboost.datasets import rotten_tomatoes

train_df, test_df = rotten_tomatoes()

auxiliary_columns = ['id', 'theater_date', 'dvd_date', 'rating', 'date']
cat_features = ['rating_MPAA', 'studio', 'fresh', 'critic', 'top_critic', 'publisher']
text_features = ['synopsis', 'genre', 'director', 'writer', 'review']

def fill_na(df, features):
    for feature in features:
        df[feature] = df[feature].fillna('')

def preprocess_data_part(data_part):
    data_part = data_part.drop(auxiliary_columns, axis=1)

    fill_na(data_part, cat_features)
    fill_na(data_part, text_features)

    X = data_part.drop(['rating_10'], axis=1)
    y = data_part['rating_10']
    return X, y

X_train, y_train = preprocess_data_part(train_df)
X_test, y_test = preprocess_data_part(test_df)

from catboost import Pool

train_pool = Pool(
    X_train, y_train, 
    cat_features=cat_features, 
    text_features=text_features,
)

validation_pool = Pool(
    X_test, y_test, 
    cat_features=cat_features, 
    text_features=text_features,
)

from catboost import CatBoostRegressor

def fit_model(train_pool, validation_pool, **kwargs):
    model = CatBoostRegressor(
        iterations=1000,
        learning_rate=0.05,
        # eval_metric='Accuracy',
        task_type='GPU',
        **kwargs,
    )

    return model.fit(
        train_pool,
        eval_set=validation_pool,
        verbose=100,
    )

model = fit_model(train_pool, validation_pool)

@Evgueni-Petrov-aka-espetrov
Contributor

I get the following output on a machine with a single V100:

espetrov@test-gpu-catboost:~/arcadia$ python3.10 rep_gh_2657.py
0:      learn: 2.1048844        test: 2.1288423 best: 2.1288423 (0)     total: 37.1ms   remaining: 37.1s
100:    learn: 1.2821217        test: 1.2959100 best: 1.2959100 (100)   total: 3.11s    remaining: 27.7s
200:    learn: 1.2584938        test: 1.2768239 best: 1.2768239 (200)   total: 5.59s    remaining: 22.2s
300:    learn: 1.2432841        test: 1.2663729 best: 1.2663729 (300)   total: 7.83s    remaining: 18.2s
400:    learn: 1.2325383        test: 1.2599630 best: 1.2599630 (400)   total: 10s      remaining: 14.9s
500:    learn: 1.2241011        test: 1.2557045 best: 1.2557045 (500)   total: 12.2s    remaining: 12.1s
600:    learn: 1.2169718        test: 1.2523941 best: 1.2523941 (600)   total: 14.3s    remaining: 9.51s
700:    learn: 1.2111290        test: 1.2501000 best: 1.2500922 (699)   total: 16.4s    remaining: 7.01s
800:    learn: 1.2044072        test: 1.2472681 best: 1.2472681 (800)   total: 18.6s    remaining: 4.62s
900:    learn: 1.1998351        test: 1.2457889 best: 1.2457412 (897)   total: 20.7s    remaining: 2.28s
999:    learn: 1.1952935        test: 1.2448178 best: 1.2447994 (998)   total: 22.8s    remaining: 0us
bestTest = 1.244799443
bestIteration = 998
Shrink model to first 999 iterations.

@Evgueni-Petrov-aka-espetrov
Contributor

@ryanshrott please pay attention -- without a reproducer, we are going to close this issue May 24

@ryanshrott
Author

ryanshrott commented May 21, 2024

@Evgueni-Petrov-aka-espetrov

Here is a very simple reproducible example of the bug, with the dataset inline. It reproduces on two different Linux machines: my local machine with an RTX 3080 and an Azure cloud instance with an A100 80GB.

import pandas as pd
from io import StringIO
from catboost import CatBoostRegressor, Pool

csv_content = """,ad_text,latitude,log_closed_price
0,"Amazing Freehold With No Condo Fees! This Sought After 2 Story Freehold Unit Town Offers 3 Generous Sized Bedrooms And 3.5 Baths In A Family Friendly Neighborhood That Is Close To Parks, Schools, Shopping Amenities, And Highway Access. Enjoy The Bright Open Concept Kitchen, Living, And Dining Room Provides Plenty Of Room For Relaxing Or Entertaining, Where The Kitchen Includes S.S Appliances, Quartz Countertops And A Great Island. Main Level Includes A 2-piece Bath and Sliding Glass Doors That Open To A Spacious Back Deck With Great Views Of The Park. The Second Level Features A Primary Bedroom With Walk-In Closet And 4-Piece Ensuite, Other 2 Spacious Bedrooms, A 4-Piece Bath And Upper Level Laundry. Downstairs You Will fully Appreciate An Amazing Family Room With 3-Pc Washroom And An Abundance Of Light From The Large Window And Sliding Glass Doors Walk-Out To The Oversize Private Backyard.<br/><br/><b>EXTRAS:</b> ",43.213356300000,13.652991628466498
1,"Charming Big Cedar Lake Cottage! Not much comes for sale on this pristine lake. 100ft of clean waterfront and west sunsets! Year round cottage/home with 3 bedrooms, 2 bathrooms plus bunkie. Open concept kitchen and dining with woodstove and walkout to private lakeside deck. Bright lakeview living room with cozy propane fireplace. Beautiful primary bedroom with 2pc ensuite and walk-in closet. Lower level laundry and workshop space with walkout to lakeside yard. Mature trees offering great privacy, lakeside firepit, large dock, brand new septic 2020 and spacious parking off year round road with garbage & recycling pickup. Around 20 min to the amenities of Lakefield or Apsley. Stunning views, clean waterfront, west sunsets and year round living on sought after Big Cedar Lake!!<br/><br/><b>EXTRAS:</b> ",44.600458100000,13.455257792677658
2,"Prepare to be amazed! Absolutely stunning 1 Bedroom + 1 Den corner unit with extremely rare 10 ft ceilings - only offered on the ultra exclusive top 3 floors at The Bond condos! Perfectly designed and elegantly appointed open concept living space highlighted by incredible floor to ceiling wrap around windows which allow natural light to cascade throughout the entire unit all day long! Ideal layout maximizing every sq ft with beautiful floors and soaring 10 ft ceilings! The open concept kitchen finished with quartz counters and integrated appliances sits overlooking the spacious living/dining room with walk out to private balcony. Enjoy unobstructed, and truly jaw dropping, south/west views of Toronto's skyline, CN Tower and even Lake Ontario! Large primary bedroom with two, yes two, fully organized double closets offers incredible storage space.Impressive separate Den with sliding glass door is the perfect home office that can easily function as a second bedroom. Spa like washroom with over sized shower and en suite laundry nicely tucked away from the main living space. 5 star building amenities - roof top pool, exterior and interior party room, impressive fitness centre, 24 Hour Concierge, Visitor/Public Parking, Billiards Room, Bbq Area, Guest Suites and more! Perfect Location (100 Walk Score & Transit Score) surrounded by delicious restaurants, excellent shopping, TTC, P.A.T.H. system, Tiff, U of T, and all Toronto has to offer! This is the Toronto condo you have been waiting for, don't miss out!<br/><br/><b>EXTRAS:</b> Sophisticated luxury in the heart of the entertainment district surrounded by Toronto's most iconic landmarks! Amazing location, perfect corner unit layout, 10ft ceilings & private balcony w/ stunning south west views! This one has it all!",43.647939900000,13.53843866462451
3,"Welcome to Westbeach Boutique Condos in the Beaches! Enjoy this 1 bedroom 524 sqft Penthouse Unit With 9 Ft Ceilings, Floor to Ceiling Windows With Walk Out to Private 260 Sqft Terrace Oasis with Gas line and Lush Views. Open Concept with White Modern Kitchen With Quartz Countertops and Luxury Finishes. Enjoy Vibrant Living Steps to the Beach, Bars, Restaurants, Movie Theatres, LCBO, Waterfront parks, Leslieville. Amenities include: Fitness center, Party room, Pet Washing Station, Outdoor Rooftop Terrace, BBQ & more!<br/><br/><b>EXTRAS:</b> All Window Coverings, All Electrical Light Fixtures, Stainless Steel Appliances; Fridge, Stove, Microwave Range. Built-In Dishwasher, Washer And Dryer.",43.666660000000,13.199324418540456
4,"Welcome To This Exquisite Home Nestled In The Heart Of North Oshawa Offering Approximately 2500 Sq Ft Of Finished Livable Space! Step Inside To Discover An Inviting Open-Concept Main Floor Adorned With Soaring Cathedral Ceilings In The Foyer, Setting A Grand Tone. The Spacious Recently Updated Eat-In Kitchen Overlooks The Cozy Living Room Featuring A Built-In Media Wall And A Gas Fireplace, Perfect For Gatherings And Relaxation. Entertain Outdoor With Ease On The Private Deck In The Backyard. Discover Three Generously Sized Bedrooms On The Second Floor, With The Primary Room Boasting A Luxurious Walk-In Closet And A Lavish 5pc Ensuite, Providing Comfort And Convenience. Finished Recreational Space In The Basement Offering Additional Entertainment Space. Close To Multiple Amenities, Schools, Parks & Trails. Don't Miss The Opportunity To Make This Stunning Residence Your Own!<br/><br/><b>EXTRAS:</b> ",43.936883800000,13.815509557963773"""


# Load the CSV string into a DataFrame
df = pd.read_csv(StringIO(csv_content))

# Display the DataFrame
print(df)

params = {
    'task_type': "GPU",
    'iterations': 10,
}

X_train, y_train = df[['latitude', 'ad_text']], df[['log_closed_price']]

model = CatBoostRegressor(cat_features=[], text_features=['ad_text'], **params)
train_pool = Pool(data=X_train, label=y_train, cat_features=[], text_features=['ad_text'])
model.fit(train_pool)

ERROR MESSAGE:


CatBoostError Traceback (most recent call last)
Cell In[16], line 28
26 model = CatBoostRegressor(cat_features=[], text_features=['ad_text'], **params)
27 train_pool = Pool(data=X_train, label=y_train, cat_features=[], text_features=['ad_text'])
---> 28 model.fit(train_pool)

File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:5827, in CatBoostRegressor.fit(self, X, y, cat_features, text_features, embedding_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
5824 if 'loss_function' in params:
5825 CatBoostRegressor._check_is_compatible_loss(params['loss_function'])
-> 5827 return self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline,
5828 use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description,
5829 verbose_eval, metric_period, silent, early_stopping_rounds,
5830 save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)

File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:2400, in CatBoost._fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
2397 allow_clear_pool = train_params["allow_clear_pool"]
2399 with plot_wrapper(plot, plot_file, 'Training plots', [_get_train_dir(self.get_params())]):
-> 2400 self._train(
2401 train_pool,
2402 train_params["eval_sets"],
2403 params,
2404 allow_clear_pool,
2405 train_params["init_model"]
2406 )
2408 # Have property feature_importance possibly set
2409 loss = self._object._get_loss_function_name()

File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:1780, in _CatBoostBase._train(self, train_pool, test_pool, params, allow_clear_pool, init_model)
1779 def _train(self, train_pool, test_pool, params, allow_clear_pool, init_model):
-> 1780 self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
1781 self._set_trained_model_attributes()

File _catboost.pyx:4833, in _catboost._CatBoost._train()

File _catboost.pyx:4882, in _catboost._CatBoost._train()

CatBoostError: /src/catboost/catboost/cuda/data/binarizations_manager.cpp:169: Can't find borders for feature #13

@ryanshrott
Author

Again, to emphasize: the issue occurs only on GPU with catboost 1.2.5. Downgrading resolves it.
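
Since the failure is reported only on GPU, one possible stopgap for anyone who cannot downgrade is to retry on CPU when this specific error appears. The following is a minimal sketch under stated assumptions: `fit_with_cpu_fallback`, `train`, and `fake_train` are hypothetical names, and matching on the message text is a heuristic taken from the traceback above, not a documented catboost interface.

```python
BORDERS_ERROR = "Can't find borders for feature"


def fit_with_cpu_fallback(train, error_type=Exception):
    """Call train("GPU"); on the borders error, retry with train("CPU").

    `train` is a caller-supplied function that builds and fits a model for
    the given task_type and returns it. Returns (model, task_type_used).
    """
    try:
        return train("GPU"), "GPU"
    except error_type as exc:
        if BORDERS_ERROR not in str(exc):
            raise  # unrelated failure: do not hide it
        return train("CPU"), "CPU"


# Stand-in trainer so the sketch runs without a GPU or catboost installed.
def fake_train(task_type):
    if task_type == "GPU":
        raise RuntimeError("Can't find borders for feature #13")
    return f"model trained on {task_type}"


print(fit_with_cpu_fallback(fake_train, error_type=RuntimeError))
```

In real use, `train` could wrap the `CatBoostRegressor(...).fit(train_pool)` call from the reproducer with `task_type` passed through, and `error_type` would be the `CatBoostError` class shown in the traceback. CPU training will be slower, so this is only a way to keep a pipeline running until a fixed release is available.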

@ryanshrott
Author

> @ryanshrott please pay attention -- without a reproducer, we are going to close this issue May 24

Please see my example above.

@ryanshrott
Author

Very strangely, swapping the order of 'latitude' and 'ad_text' in the dataframe above makes the error go away.

Seems very odd...
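
The reordering described above can be sketched as follows, assuming the swap refers to the column selection used for `X_train` in the reproducer. This reflects only the single observation in this thread and is not confirmed as a general fix; `text_columns_first` is a hypothetical helper, not a catboost or pandas function.

```python
def text_columns_first(columns, text_cols):
    """Return the column names with text columns moved to the front.

    The relative order within each group is preserved.
    """
    missing = [c for c in text_cols if c not in columns]
    if missing:
        raise KeyError(f"text columns not found: {missing}")
    text_set = set(text_cols)
    text_part = [c for c in columns if c in text_set]
    other_part = [c for c in columns if c not in text_set]
    return text_part + other_part


# The reproducer selected ['latitude', 'ad_text'], which failed on GPU;
# the swapped order below is the one reported to work.
new_order = text_columns_first(['latitude', 'ad_text'], ['ad_text'])
print(new_order)  # ['ad_text', 'latitude']
# X_train = df[new_order]  # then build the Pool from the reordered frame
```

The helper works on plain lists of column names, so it can be applied to `list(df.columns)` before the Pool is built, without changing the data itself.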

@Evgueni-Petrov-aka-espetrov
Contributor

This was reproduced

Thank you for the repro.
I see a CUDA error 999 on a V100.

robot-piglet pushed a commit that referenced this issue May 27, 2024
92f9d90b7ffae394d07b22b96512fcb67c97df2d
@Evgueni-Petrov-aka-espetrov
Contributor

fixed in main branch -- closing

@ryanshrott
Author

Thanks. So this fix will be in the next release? It's not in 1.2.5?
