
IRP Tutorial (3,7): Slides

IRR Tutorial 1: Slides

IRR Tutorial 2: Slides

IRR Tutorial 3: Slides

IRR Tutorial 5: Slides

IRR Tutorial 6: Slides

Visual Question Answering Dataset. Image-specific question/answer pairs generated from gold-standard human captions.

Telugu transliteration parallel corpus. Generously contributed by a public Facebook group called Telugu Inspiration.

Twitter word usage dataset (described in Gella et al., 2013).

Twitter and Web lexical sample sense annotations (described in Gella et al., 2014).


Location Recognizer: A recent tool for recognizing location names using string-matching algorithms. It can also handle misspelled location names.
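As a rough illustration of the general idea (not the Location Recognizer's actual implementation, which may work quite differently), here is a minimal sketch of gazetteer lookup with fuzzy string matching. The gazetteer, `find_locations`, and its `max_ngram`/`cutoff` parameters are hypothetical; similarity comes from the standard library's `difflib.get_close_matches`, so small misspellings such as "Hyderbad" still match.

```python
import difflib
import re

# Hypothetical gazetteer for illustration; a real tool would load a large list of place names.
GAZETTEER = ["hyderabad", "new delhi", "mumbai", "bengaluru", "chennai"]


def find_locations(text, gazetteer=GAZETTEER, max_ngram=2, cutoff=0.85):
    """Return (span_in_text, canonical_location) pairs found in text.

    Every word n-gram (up to max_ngram words) is compared against the gazetteer
    using difflib's similarity ratio; spans scoring at least `cutoff` are accepted.
    """
    tokens = re.findall(r"\w+", text.lower())
    results = []
    i = 0
    while i < len(tokens):
        match = None
        # Try longer n-grams first so "new delhi" is preferred over "new".
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            close = difflib.get_close_matches(span, gazetteer, n=1, cutoff=cutoff)
            if close:
                match = (span, close[0], n)
                break
        if match:
            results.append((match[0], match[1]))
            i += match[2]  # skip the tokens consumed by the match
        else:
            i += 1
    return results
```

For example, `find_locations("Flying from Hyderbad to new delhi tomorrow")` returns `[("hyderbad", "hyderabad"), ("new delhi", "new delhi")]`. The cutoff is a trade-off: set it too low and neighbouring words get absorbed into a match; set it too high and genuine misspellings are missed.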

Indic Unicode Equivalence: A simple script that handles Unicode and transliteration equivalence issues in Indian languages (Hindi, Gujarati and Bengali). Thanks to Jatin Sharma for help in understanding the cases to be handled.
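One part of the problem, canonical Unicode equivalence, can be sketched with standard NFC normalization. This is not the script itself and does not cover transliteration equivalence. In Devanagari, QA can be stored either as the precomposed U+0958 or as KA (U+0915) followed by NUKTA (U+093C). In Bengali, vowel sign O (U+09CB) has the same canonical meaning as E (U+09C7) followed by AA (U+09BE). The helper names `canonical` and `equivalent` below are hypothetical.

```python
import unicodedata


def canonical(text):
    """Map canonically equivalent Unicode sequences to a single form (NFC)."""
    return unicodedata.normalize("NFC", text)


def equivalent(a, b):
    """True if a and b are the same text up to Unicode canonical equivalence."""
    return canonical(a) == canonical(b)


# Devanagari QA: precomposed U+0958 vs. KA + NUKTA.
print(equivalent("\u0958", "\u0915\u093c"))  # True
# Bengali KA + vowel sign O: precomposed U+09CB vs. E + AA.
print(equivalent("\u0995\u09cb", "\u0995\u09c7\u09be"))  # True
```

NFC maps U+0958 to the two-character sequence rather than the reverse, because that precomposed letter is excluded from Unicode recomposition. Comparing normalized strings therefore works in both directions. Transliteration variants, such as different Roman spellings of the same word, need separate, language-specific rules.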

Other Useful Datasets