Multilingual NLP.
Demonstrates that SVMs outperform RNN models on emoji prediction tasks.
Finds that cognate-based phylogenies outperform sound-based methods for reconstructing language histories.
Shows that mBERT encodes genealogical and typological information; produces language distance matrices and phylogenetic trees.
Uses LSTM encoder–decoders to model sound change and subgroup Indo-Aryan languages.
Proposes a scalable pipeline combining automated cognate detection with Bayesian phylogenetics.
A threshold-free clustering method for cognate detection across multilingual datasets.
Evaluates automatic vs. expert cognate sets for phylogenetic tree building.
Cross-lingual CEFR proficiency classification across learner languages.
Combines dialectometry and neural modeling to analyze dialect variation.
Develops a Telugu treebank using grammar-based annotation.