Language Hack Day #236

benfoxall · 2016-11-22T16:17:11Z

The Language Hack Day is primarily geared towards making machines do things with human language, in either text or audio form. There are also potential applications with wearables/mobile devices and capturing physical vectors (think sign language), but that might be a bit ambitious for a one-day event.

What is a hack day?

A hack day is an event where people come together to collaborate on interesting ideas and learn new technologies. People usually organise themselves into teams, work on a project and then demo it at the end of the day. Hack days work best when there's a relaxed, open and friendly atmosphere.

Who is this event for?

Anyone interested in using computers to analyse, create, transform, or otherwise play with any of the 6,000 human languages (though the major world languages will be easier to find data on).

What technologies are we using? (Feel free to use other things if you want.)

responsivevoice.js
HTML5 speech synthesis API
general natural language processing

TO BE DECIDED
To what extent are we interested in incorporating non-JS technologies? For example, I'm planning to have an API set up to serve language resources (lists of words/sentences/etc), but written in Python, which I'm assuming everyone is fine with. However, a lot of the more powerful (and well documented) language processing libraries are in Python. I'll try to find some good libraries in Node in the meantime...

Examples of text input/output we could do stuff with:

tweets (has API)
reddit posts/comments (has API)
HTML pages (eg, Wikipedia) (usually no API, have to scrape)
documents on the local file system
SMS (eg, sent through Twilio)
chats (eg, sent through Telegram)
I'll also have a collection of all the README documentation from GitHub to play with (probably just in one big text file, but I'm hoping to have something more like a CSV with metadata such as the language used).

Examples of audio input/output we could do stuff with:

audio from the internet (eg, SoundCloud, YouTube, TED talks)
audio recorded on a mobile device or laptop
hardware with audio enabled (eg, Amazon Echo)

Examples of problems/topics we could work on (from easier-ish to slightly more difficult-ish):

Sentiment analysis (analysing how positive or negative a certain text is)
Text-to-speech
Part of Speech tagging (eg, singular noun, auxiliary verb, possessive pronoun, preposition)
Predicting sentence stress based on a given sentence (BONUS: evaluating the similarity to somebody reading the same sentence)
Generating simple sentences/poetry/song lyrics (BONUS: generating a backing audio track to be played while the computer reads the generated text...think "creating both the lyrics and melody of a rap song")
Scoring responses to happy/sad stories ("I just told you my cat died...why did you reply, 'AWESOME!!!1'...???") based on sentiment analysis
Deciding whether a song is more positive or negative (sentiment analysis, BONUS: base this on the prosodic features of the song instead of the lyrics)
Create a visualisation based on the sentiment analysis of a user's tweets/comments/voice recordings (BONUS: transform this into a non-linguistic medium like colours)
Summarising text samples (BONUS: scoring summaries)
Artificially stretch/compress the octave range of a recording (eg, regular speech sounds very excited, or vice-versa)
Generating/scoring paraphrases (eg, "Would this be an acceptable paraphrase of cited evidence in a research paper?")
Generate lists of words with spoken forms that could cause confusion between two languages (eg, a word that means 'to like' in Indonesian, but is one of the most vulgar expletives in Russian)
Create a swipable timeline for a given verb, then change the form based on the touch actions from the user (eg, swipes left - change to past tense form, swipes right - change to future tense form, draws small circles around the centre of the line - change to present progressive)
Have users interact with a mobile device API (accelerometers and gyroscopes) to appropriately simulate different verb meanings/functions (ie, "stop turning" = turning the device for a while and then stopping, VS. "stop to turn" = shaking for a while, then stopping and turning the device)
Generate simple texts based on templates, introduce minor errors and time how quickly users can fix them (eg, generate a simple business email with one spelling error and one improper collocation, like "mourn the profits")
Generate a list of most likely collocations (words that appear after/before a given word) for a word (BONUS: create a corpus for this and visualise differences in collocation probabilities by subject area/register)
Predicting word stress in made up words (eg, flexomagication, brabrahsticklensboratory, yuzzis)
Create word vectors (BONUS: create a visualisation for this)
Create document vectors (BONUS: create a visualisation for this)
Use Term Frequency / Inverse Document Frequency to decide what subject a particular text should be categorised as
Pass the Turing Test
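
To make the TF-IDF idea from the list a bit more concrete, here is a minimal sketch in plain Python (standard library only). It assumes you have one sample text per subject, weights terms by term frequency times a smoothed inverse document frequency, and picks the subject whose vector is closest to the new text by cosine similarity. The function names (`tokenize`, `categorise`, etc) and the tiny sample texts are made up for illustration; a real project would want far more data and probably an existing library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the samples."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # The "+ 1" smoothing keeps terms that occur in every document at a positive weight.
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Map each known term to (relative term frequency) * (idf weight)."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    # Terms that never occurred in the samples are ignored.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorise(text, subject_samples):
    """Return the subject whose sample text is most similar to `text`."""
    subjects = list(subject_samples)
    token_lists = [tokenize(subject_samples[s]) for s in subjects]
    idf = idf_table(token_lists)
    subject_vectors = {s: tf_idf(toks, idf) for s, toks in zip(subjects, token_lists)}
    query = tf_idf(tokenize(text), idf)
    return max(subjects, key=lambda s: cosine(query, subject_vectors[s]))


# Hypothetical sample data, just to show the call.
samples = {
    "sport": "the team won the football match with a late goal",
    "cooking": "stir the sauce and bake the bread in the oven",
}
best_subject = categorise("a late goal decided the match", samples)
```

With samples this small the result is only as good as the word overlap, so treat it as a starting point for experimenting rather than a finished classifier.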

Feel free to try out your own ideas or remix any of these; we will help wherever we can! If there are any that are particularly interesting and you'd like more information, let Adam know and he can find the latest-ish applied linguistics research behind them.
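
As one possible starting point for remixing, the collocation idea from the list (words that appear directly before/after a given word) could be sketched like this in plain Python. The `collocations` function name and the sample sentence are invented for the example, and the sketch naively counts neighbours across sentence boundaries, which a real version would probably want to avoid.

```python
import re
from collections import Counter


def collocations(text, target, top_n=5):
    """Count the words found directly before and directly after `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = target.lower()
    before, after = Counter(), Counter()
    # Walk over every adjacent pair of tokens in the text.
    for prev_word, next_word in zip(tokens, tokens[1:]):
        if next_word == target:
            before[prev_word] += 1
        if prev_word == target:
            after[next_word] += 1
    return before.most_common(top_n), after.most_common(top_n)


# Hypothetical sample text, just to show the call.
sample_text = "Strong tea and strong coffee. He likes strong tea."
words_before, words_after = collocations(sample_text, "strong")
```

Dividing each count by the total number of occurrences of the target word would turn these raw counts into rough collocation probabilities, which is what the BONUS visualisation would need.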

What does the day look like?

TBC
Time Activity
09:30 Arrive, drink coffee and eat pastries
10:00 Intro to the tech
10:15 Discuss ideas and form teams
10:45 Start hacking
13:00 Lunch slash dance break!
17:00 Demos
17:30 Finish up and head to bar/cafe for post-event celebrations
When we get set up you'll have the chance to pair up with another attendee or embark on a workshop on your own. We'll have mentors on-hand to help if you get stuck or give you suggestions on workshops you might like.

Food

TBC
We'll be providing pastries from a local patisserie, and a burrito lunch. You'll get to build your own burrito/box. We'll also have a selection of fruits and snacks, teas and coffees.

If you have any concerns about dietary requirements (or any questions whatsoever) get in touch.

I have a question

Great! You can reach out to any of the JSOxford organisers:
-Adam Leskis
-other?

What will I need?

You'll need a laptop and charger. If you don't have access to a laptop let us know and we'll do our best to sort you out.

Date

TBC
Sometime in June 2017, 09:30 to 17:00

Organisation

Note that some of the applications follow along nicely from the Chatbot Hack Day, so that might be a good place to get ideas from.

Feel free to add additional ideas in the comments and they'll be added to this post. 💥

confirm details: title, description, dates

lpmi-13 · 2016-12-14T23:10:34Z

+1 for doing any of the above, but via chatbots

AverageMarcus · 2016-12-15T08:45:11Z

chatbot all the things!!!

benfoxall added the hackday label Nov 22, 2016

benfoxall mentioned this issue Nov 22, 2016

2017 Events #233

Closed

benfoxall assigned lpmi-13 Nov 23, 2016
