Language Hack Day #236

Open
11 tasks
benfoxall opened this issue Nov 22, 2016 · 2 comments

benfoxall commented Nov 22, 2016

see discussion

The Language Hack Day is primarily geared towards making machines do things with human language, in either text or audio form. There are also potential applications with wearables/mobile devices and capturing physical vectors (think sign language), but that might be a bit ambitious for a one-day event.

What is a hack day?

A hack day is an event where people come together to collaborate on interesting ideas and learn new technologies. People usually organise themselves into teams, work on a project and then demo it at the end of the day. Hack days work best when there's a relaxed, open and friendly atmosphere.

Who is this event for?

Anyone interested in using computers to analyse, create, transform, or otherwise play with any of the roughly 6,000 human languages (though the major world languages will be easier to find data on).

What technologies are we using? (Feel free to use other things if you want.)

  • responsivevoice.js
  • HTML5 speech synthesis API
  • general natural language processing

TO BE DECIDED
To what extent are we interested in incorporating non-JS technologies? For example, I'm planning to set up an API to serve language resources (lists of words, sentences, etc), written in Python, which I'm assuming everyone is fine with. However, a lot of the more powerful (and well-documented) language processing libraries are in Python. I'll try to find some good Node libraries in the meantime...
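
For the resource API mentioned above, here's a rough sketch of what a tiny Python service could look like using only the standard library. Everything in it is an assumption for illustration: the `RESOURCES` sample data, the `/words` and `/sentences` paths, the `lookup` and `serve` names and port 8000 are all made up, and the real API may end up looking quite different.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data; the real resource lists are not specified in this issue.
RESOURCES = {
    "words": ["language", "hack", "day"],
    "sentences": ["Machines can play with human language."],
}


def lookup(path):
    """Map a request path such as '/words' to (status code, JSON-serialisable body)."""
    name = path.split("?", 1)[0].strip("/")
    if name in RESOURCES:
        return 200, {name: RESOURCES[name]}
    return 404, {"error": "unknown resource", "available": sorted(RESOURCES)}


class ResourceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # Let browser-based JS clients served from other origins call the API.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the server until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("localhost", port), ResourceHandler).serve_forever()
```

Calling `serve()` starts the server, after which a JS client could request `http://localhost:8000/words`. Keeping `lookup` separate from the HTTP handler means the routing can be checked without a running server.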

Examples of text input/output we could do stuff with:

  • tweets (has API)
  • reddit posts/comments (has API)
  • HTML pages (eg, Wikipedia) (no API, have to scrape)
  • documents on the local file system
  • SMS (eg, sent through Twilio)
  • chats (eg, sent through Telegram)
  • I'll have a collection of all the README documentation from GitHub to play with (probably in just a big text file, but hoping to have something more like a CSV with metadata like language used).
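
For the "have to scrape" case in the list above, a minimal sketch of pulling the visible text out of an HTML page with Python's built-in `html.parser` might look like this. The `TextExtractor` and `html_to_text` names and the sample page are made up for illustration, and a real scraper would likely want a more robust library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text chunks of an HTML string joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample page standing in for a downloaded document.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Oxford</h1><p>A city in England.</p></body></html>"
)
print(html_to_text(sample))
```

The HTML itself could come from `urllib.request.urlopen(url).read().decode("utf-8")`; that part is left out here so the sketch runs without a network connection.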

Examples of audio input/output we could do stuff with:

  • audio from the internet (eg, SoundCloud, YouTube, TED talks)
  • audio recorded on a mobile device or laptop
  • hardware with audio enabled (eg, Amazon Echo)

Examples of problems/topics we could work on (from easier-ish to slightly more difficult-ish):

  • Sentiment analysis (analysing how positive or negative a certain text is)
  • Text-to-speech
  • Part-of-speech tagging (eg, singular noun, auxiliary verb, possessive pronoun, preposition)
  • Predicting sentence stress based on a given sentence (BONUS: evaluating the similarity to somebody reading the same sentence)
  • Generating simple sentences/poetry/song lyrics (BONUS: generating a backing audio track to be played while the computer reads the generated text...think "creating both the lyrics and melody of a rap song")
  • Scoring responses to happy/sad stories ("I just told you my cat died...why did you reply, 'AWESOME!!!1'...???") based on sentiment analysis
  • Deciding whether a song is more positive or negative (sentiment analysis, BONUS: base this on the prosodic features of the song instead of the lyrics)
  • Create a visualisation based on the sentiment analysis of a user's tweets/comments/voice recordings (BONUS: transform this into a non-linguistic medium like colours)
  • Summarising text samples (BONUS: scoring summaries)
  • Artificially stretch/compress the octave range of a recording (eg, regular speech sounds very excited, or vice-versa)
  • Generating/scoring paraphrases (eg, "Would this be an acceptable paraphrase of cited evidence in a research paper?")
  • Generate lists of words with spoken forms that could cause confusion between two languages (eg, a word that means 'to like' in Indonesian, though is one of the most vulgar expletives in Russian)
  • Create a swipeable timeline for a given verb, then change the form based on the touch actions from the user (eg, swipes left - change to past tense form, swipes right - change to future tense form, draws small circles around the centre of the line - change to present progressive)
  • Have users interact with a mobile device API (accelerometers and gyroscopes) to appropriately simulate different verb meanings/functions (ie, "stop turning" = turning the device for a while and then stopping, VS. "stop to turn" = shaking for a while, then stopping and turning the device)
  • Generate simple texts based on templates, introduce minor errors and time the users in how quickly they can fix them (eg, generate a simple business email with one spelling error and one improper collocation, like "mourn the profits")
  • Generate a list of most likely collocations (words that appear after/before a given word) for a word (BONUS: create a corpus for this and visualise differences in collocation probabilities by subject area/register)
  • Predicting word stress in made-up words (eg, flexomagication, brabrahsticklensboratory, yuzzis)
  • Create word vectors (BONUS: create a visualisation for this)
  • Create document vectors (BONUS: create a visualisation for this)
  • Use Term Frequency / Inverse Document Frequency to decide what subject a particular text should be categorised as
  • Pass the Turing Test
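
As a starting point for the TF-IDF categorisation idea in the list above, here's a small sketch in plain Python. It assumes one reference text per subject, uses the unsmoothed weight tf × log(N / df), and picks the subject with the highest cosine similarity. The `SUBJECTS` texts and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Made-up reference texts, one per subject; a real project would use far more data.
SUBJECTS = {
    "sport": "the team won the match after the striker scored a late goal",
    "cooking": "chop the onion and fry it in the pan before adding the sauce",
    "music": "the band played a loud song and the singer hit a high note",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))
    return doc_freq


def tf_idf(text, doc_freq, n_docs):
    """Weight each known term by relative frequency * log(n_docs / document frequency)."""
    words = tokenize(text)
    counts = Counter(w for w in words if w in doc_freq)
    return {
        term: (count / len(words)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorise(text, subjects=SUBJECTS):
    """Return the subject whose reference text is most similar to `text`."""
    doc_freq = document_frequencies(subjects)
    n_docs = len(subjects)
    query = tf_idf(text, doc_freq, n_docs)
    return max(
        subjects,
        key=lambda name: cosine(query, tf_idf(subjects[name], doc_freq, n_docs)),
    )


print(categorise("the striker scored another goal for the team"))
print(categorise("fry the onion in a pan"))
```

Words that appear in every subject (like "the") get a weight of zero, so only the distinctive terms influence the result. With these toy texts the two calls print `sport` and `cooking`.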

Feel free to try out your own ideas or remix any of these; we will help wherever we can! If any are particularly interesting and you'd like more information, let Adam know and he can find the latest-ish applied linguistics research behind them.

What does the day look like?

TBC
Time Activity
09:30 Arrive, drink coffee and eat pastries
10:00 Intro to the tech
10:15 Discuss ideas and form teams
10:45 Start hacking
13:00 Lunch slash dance break!
17:00 Demos
17:30 Finish up and head to bar/cafe for post-event celebrations

Once we're set up, you'll have the chance to pair up with another attendee or embark on a workshop on your own. We'll have mentors on hand to help if you get stuck or to suggest workshops you might like.

Food

TBC
We'll be providing pastries from a local patisserie, and a burrito lunch. You'll get to build your own burrito/box. We'll also have a selection of fruits and snacks, teas and coffees.

If you have any concerns about dietary requirements (or any questions whatsoever), get in touch.

I have a question

Great! You can reach out to any of the JSOxford organisers:

  • Adam Leskis
  • other?

What will I need?

You'll need a laptop and charger. If you don't have access to a laptop let us know and we'll do our best to sort you out.

Date

TBC
sometime in June 2017, 09:30 to 17:00

Organisation

  • Book venue (ensure wifi)

  • Sort out projector

  • Flesh out description for the day

  • Invoice sponsors

  • Add to Meetup.com

  • Book pastries

  • Purchase snacks/drinks

  • Twitter hype

On the day

  • Take photos

  • Tweet all the things

Note that some of the applications follow on nicely from the Chatbot Hack Day, so that might be a good place to get ideas from.

Feel free to add additional ideas in the comments and they'll be added to this post. 💥


  • confirm details: title, description, dates

lpmi-13 commented Dec 14, 2016

+1 for doing any of the above, but via chatbots

@AverageMarcus

chatbot all the things!!!
