
[Question] The project is failing 90% of the time because of new challenges #965

Open · tarekwiz opened this issue Dec 2, 2023 · 15 comments
Labels: bug (Something isn't working)

tarekwiz (Collaborator) commented Dec 2, 2023

Can you explain how to train new models? It isn't documented anywhere. A small tutorial with examples would help a lot, and many more developers would be able to contribute to your project.

tarekwiz (Collaborator, Author) commented Dec 2, 2023

Never mind, I see the workflow collecting captcha files, and I also found this project for automatic model creation: https://github.com/beiyuouo/hcaptcha-model-factory. Why is this not part of the workflow? Maybe we can use GPT to classify which classes are "yes" or "bad" @beiyuouo. What do you guys think?

QIN2DIM (Owner) commented Dec 2, 2023

You can perform zero-shot binary image classification by modifying the modelhub instance variable; you don't need to train the model yourself. CLIP can already handle all of these image classification tasks.
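
For example, a minimal sketch (assuming the ModelHub API as of this thread; from_github_repo, parse_objects, and the clip_candidates attribute are taken from this project's examples and may differ between releases):

from hcaptcha_challenger import ModelHub

# Load the model registry shipped with the project.
modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()

# Register CLIP text candidates for a challenge prompt.
modelhub.clip_candidates["the animals that don't walk"] = [
    "an animal that cannot walk",
    "an animal that can walk",
]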

tarekwiz (Collaborator, Author) commented Dec 2, 2023

I've tried it, but it doesn't work well. Can you show me an example using it with "Click on the animals that don't walk"?
Edit: maybe I'm using it incorrectly, hence why I asked for an example.

tarekwiz (Collaborator, Author) commented Dec 2, 2023

I found out why. There's an error in your code at components/common.py L55: it should be "key in label", not "label in key".
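
(Hypothetical reconstruction of the surrounding code, just to show the direction of the containment test; the variable names are assumed:)

# components/common.py, around L55
# buggy: checks whether the key string contains the label
if label in key:
    ...
# fixed: checks whether the label string contains the key
if key in label:
    ...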

tarekwiz (Collaborator, Author) commented Dec 2, 2023

@QIN2DIM I opened a pull request. Please merge

QIN2DIM (Owner) commented Dec 3, 2023

For CLIP, I have not yet provided a good reference case.

For the CLIP prompt, I offer two trigger options: clip_candidates and datalake. At the moment, I prefer datalake.

I plan to use clip_candidates to handle nested types of binary image classification tasks, i.e., challenges where the prompt is invariant but the positive samples are constantly updated.
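
For example, a sketch of the datalake style (DataLake and its field names are inferred from this thread and the repo's examples; treat them as assumptions):

from hcaptcha_challenger import DataLake, ModelHub

modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()

# Orchestrate Positive/Negative terms for one specific challenge prompt.
modelhub.datalake["the animals that don't walk"] = DataLake(
    positive_labels=["an animal that cannot walk"],
    negative_labels=["an animal that can walk"],
)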

QIN2DIM (Owner) commented Dec 3, 2023

There are still some issues with clip_candidates, and I will change its data structure later; it currently struggles to cope with the complex demands of prompt orchestration. I'd like it to be able to divide and conquer, and to further reduce inference pressure.

QIN2DIM (Owner) commented Dec 3, 2023

The default is the ResNet-50 model, which lags behind current models in inference performance.

I will write a script later to determine if current hardware resources can run a better performing CLIP model.

QIN2DIM (Owner) commented Dec 3, 2023

DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
"""
Available Model
--- 1180+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_ViT-B-32.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_ViT-B-32.openai.onnx"
--- 658.3 MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
--- 3300+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
"""

tarekwiz (Collaborator, Author) commented Dec 3, 2023

I have managed to get clip_candidates to work, but I dislike how the first element of the array is the correct one. Nothing indicates that, and I had to read the code to figure it out.
I think we should work on a CI workflow that converts collected information into CLIP candidates automatically, so we don't have to keep track of them manually.
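
Concretely, the undocumented convention is (illustrative values):

clip_candidates = {
    "the animals that don't walk": [
        "an animal that cannot walk",  # index 0 is silently treated as the positive class
        "an animal that can walk",     # the rest are negatives
    ]
}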

QIN2DIM (Owner) commented Dec 3, 2023

> I think we should work on a CI workflow that converts collected information into CLIP candidates automatically, so we don't have to keep track of them manually.

You're right. That's what I want to do.

> I have managed to get clip_candidates to work, but I dislike how the first element of the array is the correct one. Nothing indicates that, and I had to read the code to figure it out.

I have provided a datalake orchestration method for specifying the Positive and Negative labels for a specific challenge prompt.

The current data structure for clip_candidates is failing miserably, and I'm already planning to replace it.

QIN2DIM (Owner) commented Dec 3, 2023

clip_candidates:

QIN2DIM added the bug label Dec 3, 2023
tarekwiz (Collaborator, Author) commented Dec 7, 2023

@QIN2DIM The PR has been approved, but the package still hasn't been updated on PyPI.

QIN2DIM (Owner) commented Dec 7, 2023

Updated.

Actually, I rewrote clip_candidates about two weeks ago, which also solves the current issue. However, that feature needs more testing, and I have been too busy recently.

harusurv commented

It's failing in every challenge at the moment :(
