[SIGIR 2022] ORCAS-I: Queries Annotated with Intent using Weak Supervision

ProjectDossier/intents_labelling

Intents Labelling project

This package serves as the basis for the paper "ORCAS-I: Queries Annotated with Intent using Weak Supervision".

Link to the paper: arXiv

DOI of the paper: https://doi.org/10.1145/3477495.3531737

DOI of the dataset: DOI

Installation

Create a conda environment:

$ conda create --name intents_labelling python==3.8.12

Activate the environment:

$ conda activate intents_labelling

Use pip to install the requirements:

(intents_labelling) $ pip install -r requirements.txt

Install the intents_labelling package in development mode:

(intents_labelling) $ pip install -e .

Install the spaCy language model:

(intents_labelling) $ python -m spacy download en_core_web_lg

The list of movie titles can be found here.

Put all data files in the data/input/ directory.
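
Before running the scripts, it can help to confirm that the data files are where they are expected. The snippet below is only a minimal sketch: the helper name `list_input_files` is hypothetical, and it assumes you run it from the repository root so that the relative path data/input/ resolves. It does not know which file names the scripts require, so it only lists what is present.

```python
from pathlib import Path


def list_input_files(input_dir="data/input"):
    """Return the sorted names of regular files in input_dir.

    Returns an empty list if the directory does not exist.
    The default path is an assumption based on the README text.
    """
    directory = Path(input_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_input_files()
    if files:
        print("Found input files:", ", ".join(files))
    else:
        print("No files found in data/input/ (or the directory is missing).")
```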

Usage

Create a training set by sampling the ORCAS dataset and filtering out the test set examples:

(intents_labelling) $ python intents_labelling/create_train_file.py
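
The idea behind this step (sample queries, then exclude anything that appears in the test set) can be sketched in a few lines. This is an illustration under assumptions, not the contents of create_train_file.py: the function name `build_training_sample`, the case-insensitive matching, the fixed seed, and the toy queries are all hypothetical.

```python
import random


def build_training_sample(queries, test_queries, sample_size, seed=42):
    """Sample training queries, excluding any query found in the test set.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption made for this sketch).
    """
    held_out = {q.strip().lower() for q in test_queries}
    candidates = [q for q in queries if q.strip().lower() not in held_out]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    k = min(sample_size, len(candidates))
    return rng.sample(candidates, k)


# Toy data, invented for illustration only.
all_queries = ["python tutorial", "facebook login", "weather boston", "what is bert"]
test_set = ["Facebook Login"]
train = build_training_sample(all_queries, test_set, sample_size=2)
print(train)
```

Filtering before sampling, rather than after, keeps the sample size stable regardless of how many test queries happen to be drawn.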

Create Snorkel annotations:

(intents_labelling) $ python intents_labelling/main.py
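
For readers new to weak supervision, the sketch below shows the general pattern: several small rule functions each vote for a label or abstain, and the votes are combined into one label per query. It deliberately uses only the standard library and a plain majority vote, whereas Snorkel provides its own labeling-function decorator and a learned label model. The rules, the label names, and the function names here are hypothetical and are not taken from main.py.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


def lf_question_word(query):
    """Hypothetical rule: queries starting with a question word are informational."""
    words = query.split()
    return "informational" if words and words[0] in {"what", "how", "why", "who"} else ABSTAIN


def lf_login(query):
    """Hypothetical rule: queries mentioning 'login' are navigational."""
    return "navigational" if "login" in query.split() else ABSTAIN


def lf_download(query):
    """Hypothetical rule: queries mentioning 'download' are transactional."""
    return "transactional" if "download" in query.split() else ABSTAIN


LABELING_FUNCTIONS = [lf_question_word, lf_login, lf_download]


def weak_label(query, labeling_functions=LABELING_FUNCTIONS):
    """Combine labeling-function votes by simple majority; ties go to the earliest vote."""
    votes = [lf(query.lower()) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


for q in ["how to install conda", "Facebook login", "weather boston"]:
    print(q, "->", weak_label(q))
```

Queries for which every rule abstains stay unlabeled, which is why coverage of the rule set matters as much as the accuracy of each rule.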

Train the BERT model:

(intents_labelling) $ python intents_labelling/models/train_bert_classifier.py