MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

Shashank Sonkar, Zichao Wang, Richard G. Baraniuk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates Named Entity Recognition (NER) for extreme low-resource languages that have only a few hundred tagged data samples. Much of the progress in NER has been enabled by readily available, large-scale training data for languages such as English and French. NER for low-resource languages, however, remains relatively underexplored, leaving much room for improvement. We propose Mask Augmented Named Entity Recognition (MANER), a simple yet effective method that leverages the distributional hypothesis of pre-trained masked language models (MLMs) to significantly improve NER performance for low-resource languages. MANER repurposes the [mask] token in MLMs, which encodes valuable semantic contextual information, for NER prediction. Specifically, we prepend a [mask] token to every word in a sentence and predict the named entity for each word from its preceding [mask] token. We demonstrate that MANER is well-suited for NER in low-resource languages: our experiments on 100 languages with as few as 100 training examples show that it improves F1 score over the state of the art by up to 48%, and by 12% on average. We also perform detailed analyses and ablation studies to understand the scenarios best suited to MANER.
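The input construction described in the abstract can be illustrated with a minimal word-level sketch. It assumes whitespace-split words, BIO-style tags, and the literal string "[MASK]" as the mask token; it ignores subword tokenization and the MLM itself, and the function names `mask_augment` and `read_predictions` are hypothetical, not the authors' code. Each word's entity tag is moved onto the [mask] token inserted before it, and at prediction time only the outputs at those mask positions are read back, giving one tag per original word.

```python
from typing import List, Optional, Tuple

MASK = "[MASK]"  # assumed mask-token string; each real MLM tokenizer defines its own


def mask_augment(
    words: List[str], tags: List[str]
) -> Tuple[List[str], List[Optional[str]], List[int]]:
    """Insert a mask token before every word (hypothetical helper).

    The entity tag of each word is attached to its preceding mask token; the
    word positions themselves get no label (None). Also returns the indices
    of the inserted mask tokens.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    tokens: List[str] = []
    labels: List[Optional[str]] = []
    mask_idx: List[int] = []
    for word, tag in zip(words, tags):
        mask_idx.append(len(tokens))  # position of the mask about to be added
        tokens.extend([MASK, word])
        labels.extend([tag, None])
    return tokens, labels, mask_idx


def read_predictions(per_token_preds: List[str], mask_idx: List[int]) -> List[str]:
    """Keep only predictions made at mask positions: one tag per original word."""
    return [per_token_preds[i] for i in mask_idx]


words = ["Alice", "lives", "in", "Toronto"]
tags = ["B-PER", "O", "O", "B-LOC"]
tokens, labels, mask_idx = mask_augment(words, tags)
print(tokens)
# ['[MASK]', 'Alice', '[MASK]', 'lives', '[MASK]', 'in', '[MASK]', 'Toronto']
print(mask_idx)
# [0, 2, 4, 6]
```

Attaching the label to the mask token rather than to the word means the training loss is computed only at mask positions, which is where the abstract says the prediction for each word is made.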

Original language: English (US)
Title of host publication: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Proceedings of the Workshop
Editors: Nafise Sadat Moosavi, Iryna Gurevych, Yufang Hou, Gyuwan Kim, Jin Kim Young, Tal Schuster, Ameeta Agrawal
Publisher: Association for Computational Linguistics (ACL)
Pages: 219-226
Number of pages: 8
ISBN (Electronic): 9781959429791
State: Published - 2023
Event: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Toronto, Canada
Duration: Jul 13 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
Country/Territory: Canada
City: Toronto
Period: 7/13/23 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
