
Replicating Results #1336

Open
MHDBST opened this issue Jun 7, 2023 · 3 comments

Comments

@MHDBST

MHDBST commented Jun 7, 2023

I'm working on a classification task with the fastText library, and I'm trying to replicate the same results over different runs. I have set the following parameters, with the seed set to 40, but different runs produce different accuracies on the dev set. The difference is significant: in one run the accuracy is 90%, while in another it is 75%. I'm not sure whether this is because I'm running on CPU with the multi-thread functionality, or whether there is some other way to replicate the results. Any guidance on this?

fasttext.train_supervised(input=train_path, minCount=3, wordNgrams=4, minn=1, maxn=6, lr=0.001, dim=300, epoch=50, seed=40)
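For reference, a likely cause is fastText's multi-threaded, Hogwild-style asynchronous SGD: with more than one thread, updates race and runs diverge even with a fixed seed. A minimal sketch of a repeatable run (assumes fastText >= 0.9.2, where both seed and thread are accepted; train_path is the same file as above):

```python
import fasttext

# With thread > 1, fastText trains with asynchronous (Hogwild-style) SGD,
# so runs differ even with a fixed seed. Restricting training to a single
# thread should make supervised training repeatable.
model = fasttext.train_supervised(
    input=train_path,
    minCount=3, wordNgrams=4, minn=1, maxn=6,
    lr=0.001, dim=300, epoch=50,
    seed=40,
    thread=1,  # deterministic, but slower than the default multi-threaded run
)
```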

@SDAravind

Yes, I have the same issue. Are you using autotune with the validation file parameter?

FYI - there is no seed parameter in fastText.

@MHDBST
Author

MHDBST commented Jul 18, 2023

@SDAravind maybe it's not mentioned on the wiki page for some reason, but this parameter is defined:

'seed', 'autotuneValidationFile', 'autotuneMetric',

@SDAravind

SDAravind commented Aug 2, 2023

@MHDBST - Setting the seed parameter resulted in an error for me.

As an alternative approach, I would use fastText's get_sentence_vector method for text vectorisation, together with scikit-learn's MLPClassifier or any other estimator, for consistent results (set random_state to a value of your choice).
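A minimal sketch of this approach (the file name and the texts/labels variables are hypothetical; assumes the fasttext and scikit-learn Python packages):

```python
import fasttext
import numpy as np
from sklearn.neural_network import MLPClassifier

# Train (or load) a fastText model to use as a sentence encoder.
# "unlabeled.txt" is a hypothetical corpus; a supervised model works too.
ft = fasttext.train_unsupervised("unlabeled.txt", dim=300)

# texts / labels are hypothetical lists of documents and class labels.
# get_sentence_vector rejects strings containing "\n", so strip newlines.
X = np.array([ft.get_sentence_vector(t.replace("\n", " ")) for t in texts])

# Fixing random_state makes the classifier step reproducible; note the
# fastText embeddings above are still stochastic unless the embedding
# model is trained with thread=1 or loaded from a fixed pretrained file.
clf = MLPClassifier(random_state=40, max_iter=500)
clf.fit(X, labels)
```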
