What is it about?
In corporate and academic environments, knowing the areas of expertise of different professionals is highly relevant. This information supports tasks such as finding experts in a particular field, identifying which researchers are potentially eligible for grants, and link prediction. We explored machine learning techniques for recognizing researchers' main areas of expertise, using several representations of the titles of their scientific productions as input to classification algorithms. By representing the title text as TF-IDF-weighted character n-grams, we surpassed the previous state-of-the-art results on this problem, achieving an accuracy of 95.91%.
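To make the representation concrete, the sketch below computes TF-IDF-weighted character n-grams for a few titles in plain Python. It is an illustration only, not the paper's implementation: the function names, the n-gram range, the smoothed IDF formula, and the example titles are all assumptions chosen for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=3, n_max=3):
    """Return every character n-gram of `text` for n in [n_min, n_max]."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = " ".join(text.lower().split())
    return [
        normalized[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(normalized) - n + 1)
    ]


def tfidf_vectors(titles, n_min=3, n_max=3):
    """Map each title to a dict {n-gram: L2-normalized TF-IDF weight}."""
    counts = [Counter(char_ngrams(t, n_min, n_max)) for t in titles]
    n_docs = len(titles)
    # Document frequency: in how many titles does each n-gram occur?
    doc_freq = Counter(gram for c in counts for gram in c)
    # Smoothed IDF (an assumption): ln((1 + N) / (1 + df)) + 1.
    idf = {
        gram: math.log((1 + n_docs) / (1 + df)) + 1.0
        for gram, df in doc_freq.items()
    }
    vectors = []
    for c in counts:
        weights = {gram: tf * idf[gram] for gram, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({gram: w / norm for gram, w in weights.items()})
    return vectors


# Hypothetical titles, used only to exercise the functions.
example_titles = [
    "Deep learning for image classification",
    "Learning to rank documents",
]
example_vectors = tfidf_vectors(example_titles)
```

An n-gram that occurs in only one title (such as "dee") receives a larger weight than one shared by both titles (such as "lea"), which is what lets rare, field-specific character sequences stand out.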
Why is it important?
We proposed and compared several machine learning techniques for recognizing researchers' areas of expertise, using the titles of their scientific productions as the data source, with the aim of improving on related approaches. The titles were represented using different strategies, such as TF-IDF character n-grams (which have shown good results in text classification tasks) and word embeddings (namely, the Word2Vec approach). We found that character-level n-grams with TF-IDF outperform the word-level representation, the word embedding approach (Word2Vec), and previous approaches that used the same dataset.
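A comparison between character-level and word-level TF-IDF features could be set up along the following lines with scikit-learn. This is a minimal sketch under stated assumptions: the summary does not say which classifier, n-gram range, or preprocessing was used, so the logistic regression model, the (2, 4) range, the `build_classifier` helper, and the toy titles and labels are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: one title per line, with an area label.
titles = [
    "Deep neural networks for image recognition",
    "Convolutional models for object detection",
    "Query optimization in relational databases",
    "Indexing strategies for distributed databases",
]
labels = ["machine learning", "machine learning", "databases", "databases"]


def build_classifier(analyzer, ngram_range):
    """TF-IDF features followed by a linear classifier (an assumed choice)."""
    return make_pipeline(
        TfidfVectorizer(analyzer=analyzer, ngram_range=ngram_range),
        LogisticRegression(max_iter=1000),
    )


# Character-level n-grams versus single words as features.
char_model = build_classifier("char", (2, 4)).fit(titles, labels)
word_model = build_classifier("word", (1, 1)).fit(titles, labels)

new_title = ["Convolutional networks for image segmentation"]
char_prediction = char_model.predict(new_title)[0]
word_prediction = word_model.predict(new_title)[0]
print("char n-grams:", char_prediction)
print("words:", word_prediction)
```

On real data, the two models would be compared on a held-out set rather than on a single title. Character n-grams can match fragments of words, so inflected or misspelled terms still share features with their base forms.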
Read the Original
This page is a summary of: Improving researcher’s area of expertise identification using TF-IDF Characters N-grams, June 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3466933.3466984.