What is it about?
Recognizing the names of identifiable entities such as buildings, medicines, and products in unstructured text is crucial for many applications and services. We have developed a scalable framework, TAFSIL, to recognize a wide variety of entities across several languages spoken by more than 1.5 billion people. This paper demonstrates the efficacy of the proposed framework and the high quality of the generated dataset.
Why is it important?
Data-hungry AI systems often struggle with low-resource languages and with recognizing new and unseen entities. Our TAFSIL framework addresses both problems by enabling the creation of fine-grained entity recognition datasets in different taxonomies for six languages spoken across various South and Southeast Asian countries.
Perspectives
I hope this article and its resources will pave the way for advances in AI solutions, especially for low-resource languages. It is a great source of inspiration that this research may benefit billions of people.
Prachuryya Kaushik
Indian Institute of Technology Guwahati
Read the Original
This page is a summary of: TAFSIL: Taxonomy Adaptable Fine-grained Entity Recognition through Distant Supervision for Indian Languages, July 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3726302.3730341.