
[Question] The project is failing 90% of the time because of new challenges #965

Open · tarekwiz opened this issue Dec 2, 2023 · 15 comments
Labels: bug (Something isn't working)

tarekwiz (Collaborator) commented Dec 2, 2023

Can you explain how to train new models? It isn't documented anywhere. A small tutorial with examples would help a lot, and many more developers would be able to contribute to your project.

tarekwiz (Collaborator, Author) commented Dec 2, 2023

Never mind, I see the workflow collecting captcha files, and I also found this project for automatic model creation: https://github.com/beiyuouo/hcaptcha-model-factory. Why is this not part of the workflow? Maybe we can use GPT to classify which classes are "yes" or "bad" @beiyuouo. What do you guys think?

QIN2DIM (Owner) commented Dec 2, 2023

You can perform zero-shot binary image classification by modifying the modelhub instance variable; you don't need to train the model yourself. CLIP can already handle all of these image classification tasks.
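
For example, a minimal sketch (assuming the ModelHub API as of this thread; from_github_repo, parse_objects, and the clip_candidates attribute are taken from this project's examples and may differ between releases):

from hcaptcha_challenger import ModelHub

# Load the model registry shipped with the project.
modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()

# Register CLIP text candidates for a challenge prompt.
modelhub.clip_candidates["the animals that don't walk"] = [
    "an animal that cannot walk",
    "an animal that can walk",
]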

tarekwiz (Collaborator, Author) commented Dec 2, 2023

I've tried it, but it doesn't work well. Can you show me an example using it with "Click on the animals that don't walk"?
Edit: maybe I'm using it incorrectly, hence why I asked for an example.

tarekwiz (Collaborator, Author) commented Dec 2, 2023

I found out why. There's an error in your code at components/common.py L55: it should be "key in label", not "label in key".
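
(Hypothetical reconstruction of the surrounding code, just to show the direction of the containment test; the variable names are assumed:)

# components/common.py, around L55
# buggy: checks whether the key string contains the label
if label in key:
    ...
# fixed: checks whether the label string contains the key
if key in label:
    ...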

tarekwiz (Collaborator, Author) commented Dec 2, 2023

@QIN2DIM I opened a pull request. Please merge

QIN2DIM (Owner) commented Dec 3, 2023

For CLIP, I have not yet provided a good reference case.

For the CLIP prompt, I offer two trigger options: clip_candidates and datalake. At the moment, I prefer datalake.

I plan to use clip_candidates to handle nested types of binary image classification tasks, i.e., challenges where the prompt is invariant but the positive samples are constantly updated.
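
For example, a sketch of the datalake style (DataLake and its field names are inferred from this thread and the repo's examples; treat them as assumptions):

from hcaptcha_challenger import DataLake, ModelHub

modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()

# Orchestrate Positive/Negative terms for one specific challenge prompt.
modelhub.datalake["the animals that don't walk"] = DataLake(
    positive_labels=["an animal that cannot walk"],
    negative_labels=["an animal that can walk"],
)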

QIN2DIM (Owner) commented Dec 3, 2023

There are still some issues with clip_candidates, and I will change its data structure later; it currently struggles to cope with the complex demands of prompt orchestration. I'd like it to be able to divide and conquer, and to further reduce inference pressure.

QIN2DIM (Owner) commented Dec 3, 2023

The default is the ResNet-50 model, which lags behind current models in inference performance.

I will write a script later to determine if current hardware resources can run a better performing CLIP model.

QIN2DIM (Owner) commented Dec 3, 2023

DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
"""
Available Model
--- 1180+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_ViT-B-32.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_ViT-B-32.openai.onnx"
--- 658.3 MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
--- 3300+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
"""

tarekwiz (Collaborator, Author) commented Dec 3, 2023

I have managed to get clip_candidates to work, but I dislike how the first element of the array is the correct one. Nothing indicates that, and I had to read the code to figure it out.
I think we should work on a CI workflow that converts collected information into CLIP candidates automatically, so we don't have to keep track of them manually.
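
Concretely, the undocumented convention is (illustrative values):

clip_candidates = {
    "the animals that don't walk": [
        "an animal that cannot walk",  # index 0 is silently treated as the positive class
        "an animal that can walk",     # the rest are negatives
    ]
}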

QIN2DIM (Owner) commented Dec 3, 2023

> I think we should work on a CI workflow that converts collected information into CLIP candidates automatically, so we don't have to keep track of them manually.

You're right. That's what I want to do.

> I have managed to get clip_candidates to work, but I dislike how the first element of the array is the correct one. Nothing indicates that, and I had to read the code to figure it out.

I have provided a datalake orchestration method for specifying the Positive and Negative labels for a specific challenge prompt.

The current data structure for clip_candidates is failing miserably, and I'm already planning to replace it.

QIN2DIM (Owner) commented Dec 3, 2023

clip_candidates:

QIN2DIM added the bug label Dec 3, 2023
tarekwiz (Collaborator, Author) commented Dec 7, 2023

@QIN2DIM The PR has been approved, but the package still hasn't been updated on PyPI.

QIN2DIM (Owner) commented Dec 7, 2023

Updated.

Actually, I rewrote clip_candidates about two weeks ago, which also solves the current issue. However, that feature needs more testing, and I have been too busy recently.

harusurv commented

It's failing in every challenge at the moment :(
