Taraka Rama

Multilingual NLP.

Publications & Research Overview

Research Themes


Selected Publications

Tübingen-Oslo at SemEval-2018 Task 2: SVMs Perform Better Than RNNs in Emoji Prediction

Demonstrates that SVMs outperform RNN models on emoji prediction tasks.

Are Sounds Sound for Phylogenetic Reconstruction?

Finds that cognate-based phylogenies outperform sound-based phylogenies for reconstructing language histories.

Probing Multilingual BERT for Genetic and Typological Signals

Shows that mBERT encodes genealogical and typological information; produces language distance matrices and phylogenetic trees.

Disentangling Dialects: A Neural Approach to Indo-Aryan Historical Phonology and Subgrouping

Uses LSTM encoder–decoders to model sound change and subgroup Indo-Aryan languages.

An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference

Proposes a scalable pipeline combining automated cognate detection with Bayesian phylogenetics.

Similarity-Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists

Proposes a threshold-free clustering method for cognate detection across multilingual datasets.

Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction?

Evaluates automatic vs. expert cognate sets for phylogenetic tree building.

Experiments with Universal CEFR Classification

Explores cross-lingual classification of CEFR proficiency levels across learner languages.

Computational Analysis of Gondi Dialects

Combines dialectometry and neural modeling to analyze dialect variation.

A Telugu Treebank Based on a Grammar Book

Develops a Telugu treebank using grammar-based annotation.


Research Impact