JoeBouchard/CapstoneRWFGCorpus

Reddit-Wikipedia-Fanfiction-GroupMe (RWFG) Corpus

This repository contains the data used for my undergraduate research project, "A Comparative Analysis of Written Register on the Internet". It contains between 200,000 and 400,000 sentences from each of 16 sources, with each source in its own named .txt file. The sources are:

  • r/Relationships
  • r/Confessions
  • r/TIFU (short for “Today I F*d Up”)
  • r/AmITheAsshole
  • r/AskHistorians
  • r/AskScience
  • r/ExplainLikeImFive
  • r/LifeProTips
  • r/NoSleep
  • r/Jokes
  • r/UnpopularOpinion
  • r/Sh*ttyAskScience
  • English Wikipedia
  • Standard Wikipedia
  • Fanfiction.net
  • GroupMe messages (anonymous)
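
Since each source lives in its own .txt file, a quick way to get an overview of the corpus is to count the lines in every file. The following is a minimal sketch, not part of the original project: the function name `count_lines_per_source` is hypothetical, and it assumes the files are UTF-8 text with one sentence per line, which this README does not state. If the files are laid out differently, the counts will not correspond to sentences.

```python
from pathlib import Path


def count_lines_per_source(corpus_dir):
    """Map each .txt file's name (without extension) to its non-empty line count.

    Assumption: one sentence per line. The README does not confirm this layout.
    """
    counts = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="replace" keeps the loop going if a file is not clean UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            counts[path.stem] = sum(1 for line in handle if line.strip())
    return counts
```

Calling `count_lines_per_source(".")` from the root of a local clone would return a dictionary keyed by file name, which can be compared against the 200,000 to 400,000 range given above.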

The final paper, which describes the data acquisition methodology in more detail, can be found on ResearchGate.
