Catboost 1.2.5 broke text features on GPU "Can't find borders for feature #4138" #2657
FYI: Downgrading to 1.2.3 fixes the issue. |
hi @ryanshrott
import catboost
from catboost.datasets import rotten_tomatoes
train_df, test_df = rotten_tomatoes()
auxiliary_columns = ['id', 'theater_date', 'dvd_date', 'rating', 'date']
cat_features = ['rating_MPAA', 'studio', 'fresh', 'critic', 'top_critic', 'publisher']
text_features = ['synopsis', 'genre', 'director', 'writer', 'review']
def fill_na(df, features):
    for feature in features:
        df[feature] = df[feature].fillna('')
def preprocess_data_part(data_part):
    data_part = data_part.drop(auxiliary_columns, axis=1)
    fill_na(data_part, cat_features)
    fill_na(data_part, text_features)
    X = data_part.drop(['rating_10'], axis=1)
    y = data_part['rating_10']
    return X, y
X_train, y_train = preprocess_data_part(train_df)
X_test, y_test = preprocess_data_part(test_df)
from catboost import Pool
train_pool = Pool(
    X_train, y_train,
    cat_features=cat_features,
    text_features=text_features,
)
validation_pool = Pool(
    X_test, y_test,
    cat_features=cat_features,
    text_features=text_features,
)
from catboost import CatBoostClassifier
from catboost import CatBoostRegressor
def fit_model(train_pool, validation_pool, **kwargs):
    model = CatBoostRegressor(
        iterations=1000,
        learning_rate=0.05,
        # eval_metric='Accuracy',
        task_type='GPU',
        **kwargs,
    )
    return model.fit(
        train_pool,
        eval_set=validation_pool,
        verbose=100,
    )
model = fit_model(train_pool, validation_pool) |
I get the following output on a machine with a single V100
|
@ryanshrott please pay attention -- without a reproducer, we are going to close this issue May 24 |
Here is a very simple reproducible example of the bug. The dataset is inline. This was reproduced on two different Linux machines: my local machine with an RTX 3080 and an Azure Cloud instance with an A100 80GB.
import pandas as pd
from io import StringIO
from catboost import CatBoostRegressor, Pool
csv_content = """,ad_text,latitude,log_closed_price
0,"Amazing Freehold With No Condo Fees! This Sought After 2 Story Freehold Unit Town Offers 3 Generous Sized Bedrooms And 3.5 Baths In A Family Friendly Neighborhood That Is Close To Parks, Schools, Shopping Amenities, And Highway Access. Enjoy The Bright Open Concept Kitchen, Living, And Dining Room Provides Plenty Of Room For Relaxing Or Entertaining, Where The Kitchen Includes S.S Appliances, Quartz Countertops And A Great Island. Main Level Includes A 2-piece Bath and Sliding Glass Doors That Open To A Spacious Back Deck With Great Views Of The Park. The Second Level Features A Primary Bedroom With Walk-In Closet And 4-Piece Ensuite, Other 2 Spacious Bedrooms, A 4-Piece Bath And Upper Level Laundry. Downstairs You Will fully Appreciate An Amazing Family Room With 3-Pc Washroom And An Abundance Of Light From The Large Window And Sliding Glass Doors Walk-Out To The Oversize Private Backyard.<br/><br/><b>EXTRAS:</b> ",43.213356300000,13.652991628466498
1,"Charming Big Cedar Lake Cottage! Not much comes for sale on this pristine lake. 100ft of clean waterfront and west sunsets! Year round cottage/home with 3 bedrooms, 2 bathrooms plus bunkie. Open concept kitchen and dining with woodstove and walkout to private lakeside deck. Bright lakeview living room with cozy propane fireplace. Beautiful primary bedroom with 2pc ensuite and walk-in closet. Lower level laundry and workshop space with walkout to lakeside yard. Mature trees offering great privacy, lakeside firepit, large dock, brand new septic 2020 and spacious parking off year round road with garbage & recycling pickup. Around 20 min to the amenities of Lakefield or Apsley. Stunning views, clean waterfront, west sunsets and year round living on sought after Big Cedar Lake!!<br/><br/><b>EXTRAS:</b> ",44.600458100000,13.455257792677658
2,"Prepare to be amazed! Absolutely stunning 1 Bedroom + 1 Den corner unit with extremely rare 10 ft ceilings - only offered on the ultra exclusive top 3 floors at The Bond condos! Perfectly designed and elegantly appointed open concept living space highlighted by incredible floor to ceiling wrap around windows which allow natural light to cascade throughout the entire unit all day long! Ideal layout maximizing every sq ft with beautiful floors and soaring 10 ft ceilings! The open concept kitchen finished with quartz counters and integrated appliances sits overlooking the spacious living/dining room with walk out to private balcony. Enjoy unobstructed, and truly jaw dropping, south/west views of Toronto's skyline, CN Tower and even Lake Ontario! Large primary bedroom with two, yes two, fully organized double closets offers incredible storage space.Impressive separate Den with sliding glass door is the perfect home office that can easily function as a second bedroom. Spa like washroom with over sized shower and en suite laundry nicely tucked away from the main living space. 5 star building amenities - roof top pool, exterior and interior party room, impressive fitness centre, 24 Hour Concierge, Visitor/Public Parking, Billiards Room, Bbq Area, Guest Suites and more! Perfect Location (100 Walk Score & Transit Score) surrounded by delicious restaurants, excellent shopping, TTC, P.A.T.H. system, Tiff, U of T, and all Toronto has to offer! This is the Toronto condo you have been waiting for, don't miss out!<br/><br/><b>EXTRAS:</b> Sophisticated luxury in the heart of the entertainment district surrounded by Toronto's most iconic landmarks! Amazing location, perfect corner unit layout, 10ft ceilings & private balcony w/ stunning south west views! This one has it all!",43.647939900000,13.53843866462451
3,"Welcome to Westbeach Boutique Condos in the Beaches! Enjoy this 1 bedroom 524 sqft Penthouse Unit With 9 Ft Ceilings, Floor to Ceiling Windows With Walk Out to Private 260 Sqft Terrace Oasis with Gas line and Lush Views. Open Concept with White Modern Kitchen With Quartz Countertops and Luxury Finishes. Enjoy Vibrant Living Steps to the Beach, Bars, Restaurants, Movie Theatres, LCBO, Waterfront parks, Leslieville. Amenities include: Fitness center, Party room, Pet Washing Station, Outdoor Rooftop Terrace, BBQ & more!<br/><br/><b>EXTRAS:</b> All Window Coverings, All Electrical Light Fixtures, Stainless Steel Appliances; Fridge, Stove, Microwave Range. Built-In Dishwasher, Washer And Dryer.",43.666660000000,13.199324418540456
4,"Welcome To This Exquisite Home Nestled In The Heart Of North Oshawa Offering Approximately 2500 Sq Ft Of Finished Livable Space! Step Inside To Discover An Inviting Open-Concept Main Floor Adorned With Soaring Cathedral Ceilings In The Foyer, Setting A Grand Tone. The Spacious Recently Updated Eat-In Kitchen Overlooks The Cozy Living Room Featuring A Built-In Media Wall And A Gas Fireplace, Perfect For Gatherings And Relaxation. Entertain Outdoor With Ease On The Private Deck In The Backyard. Discover Three Generously Sized Bedrooms On The Second Floor, With The Primary Room Boasting A Luxurious Walk-In Closet And A Lavish 5pc Ensuite, Providing Comfort And Convenience. Finished Recreational Space In The Basement Offering Additional Entertainment Space. Close To Multiple Amenities, Schools, Parks & Trails. Don't Miss The Opportunity To Make This Stunning Residence Your Own!<br/><br/><b>EXTRAS:</b> ",43.936883800000,13.815509557963773"""
# Load the CSV string into a DataFrame
df = pd.read_csv(StringIO(csv_content))
# Display the DataFrame
print(df)
params = {
    'task_type': "GPU",
    'iterations': 10,
}
X_train, y_train = df[['latitude', 'ad_text']], df[['log_closed_price']]
model = CatBoostRegressor(cat_features=[], text_features=['ad_text'], **params)
train_pool = Pool(data=X_train, label=y_train, cat_features=[], text_features=['ad_text'])
model.fit(train_pool)
ERROR MESSAGE:
CatBoostError Traceback (most recent call last)
File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:5827, in CatBoostRegressor.fit(self, X, y, cat_features, text_features, embedding_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:2400, in CatBoost._fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
File ~/miniconda3/envs/rapids-24.04/lib/python3.11/site-packages/catboost/core.py:1780, in _CatBoostBase._train(self, train_pool, test_pool, params, allow_clear_pool, init_model)
File _catboost.pyx:4833, in _catboost._CatBoost._train()
File _catboost.pyx:4882, in _catboost._CatBoost._train()
CatBoostError: /src/catboost/catboost/cuda/data/binarizations_manager.cpp:169: Can't find borders for feature #13 |
Again, to emphasize: The issue only exists on GPU and catboost 1.2.5. Downgrading resolves the issue. |
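For anyone who wants to fail early instead of hitting the exception mid-training, the two conditions reported above (GPU training and CatBoost 1.2.5) can be checked before calling `fit`. This is only a sketch: `AFFECTED_VERSIONS`, `is_affected` and `installed_catboost_version` are hypothetical helper names, and the set contains only 1.2.5 because that is the only release reported as broken in this thread (1.2.3 is reported to work; other releases are not covered here).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: 1.2.5 is the only release reported as affected in this thread.
AFFECTED_VERSIONS = {"1.2.5"}


def is_affected(installed_version: str, task_type: str) -> bool:
    """Return True when both reported conditions hold: GPU training and an affected release."""
    return task_type.upper() == "GPU" and installed_version in AFFECTED_VERSIONS


def installed_catboost_version() -> Optional[str]:
    """Return the installed catboost version string, or None if catboost is not installed."""
    try:
        return version("catboost")
    except PackageNotFoundError:
        return None
```

A caller could run `is_affected(installed_catboost_version() or "", params["task_type"])` before training with text features and then either pin `catboost==1.2.3` or switch to CPU.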
Please see my example |
Very strangely, switching the order of 'latitude' and 'ad_text' in the dataframe above solves the issue. Seems very odd... |
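The column-order workaround can be sketched as a small helper that moves text columns to the front before building the `Pool`. This assumes that "switching the order" means placing `ad_text` before `latitude` in `X_train`; the thread does not confirm the direction or explain why it helps, and `text_columns_first` is a hypothetical name, not part of CatBoost.

```python
from typing import Iterable, List


def text_columns_first(columns: Iterable[str], text_features: Iterable[str]) -> List[str]:
    """Return the column names with text features first, keeping relative order otherwise."""
    text = set(text_features)
    cols = list(columns)
    # Assumption: text columns before all other columns is the ordering that avoids the error.
    return [c for c in cols if c in text] + [c for c in cols if c not in text]
```

With the reproducer above, `X_train = X_train[text_columns_first(X_train.columns, ['ad_text'])]` would reorder the frame to `['ad_text', 'latitude']` before it is passed to `Pool`.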
thank you for the repro |
92f9d90b7ffae394d07b22b96512fcb67c97df2d
fixed in main branch -- closing |
Thanks. So this fix will be in the next release? It's not in 1.2.5? |
Problem:
After upgrading to 1.2.5, training with text features on GPU throws a strange exception. Oddly, the release notes state that this was just added in 1.2.5, but I was using it in 1.2.3 with no issues.
CatBoost version: 1.2.5
Operating System: Linux
CPU: Not sure, something Intel I think
GPU: A100