Open Access Repository
ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task
PDF: 139114 - ETNLP...pdf (full text restricted; request a copy)
Abstract
© 2019 Association for Computational Linguistics (ACL). All rights reserved. Given the many recent advances in embedding models, selecting the pre-trained word embedding (a.k.a. word representation) models that best fit a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we release an open-source system implementing the proposed approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
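The evaluation step the abstract describes rests on the standard word-analogy test: for each analogy (a : b :: c : d), an embedding scores a hit when the vector b − a + c is nearest, by cosine similarity, to d. The sketch below illustrates that metric with NumPy; the function name, toy vocabulary, and 2-D vectors are illustrative assumptions, not ETNLP's actual API.

```python
# Hypothetical sketch of word-analogy evaluation (not ETNLP's real API).
# For each analogy (a : b :: c : d), we check whether b - a + c is nearest
# to d under cosine similarity.
import numpy as np

def analogy_accuracy(vocab, vectors, analogies):
    """Fraction of (a, b, c, d) analogies where argmax cos(b - a + c, w) == d."""
    index = {w: i for i, w in enumerate(vocab)}
    # L2-normalise rows so a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    correct = 0
    for a, b, c, d in analogies:
        query = unit[index[b]] - unit[index[a]] + unit[index[c]]
        sims = unit @ query
        # Exclude the three query words themselves, as is standard practice.
        for w in (a, b, c):
            sims[index[w]] = -np.inf
        if vocab[int(np.argmax(sims))] == d:
            correct += 1
    return correct / len(analogies)

# Toy 2-D vectors arranged so that "man : woman :: king : queen" holds.
vocab = ["man", "woman", "king", "queen"]
vectors = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.5],
                    [0.1, 1.5]])
acc = analogy_accuracy(vocab, vectors, [("man", "woman", "king", "queen")])
# acc is 1.0 on this toy set
```

In practice one would run this over each candidate set of pre-trained embeddings against the same analogy list and prefer the embedding with the higher accuracy, which is the selection principle the paper applies with its large Vietnamese analogy list.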
| Item Type | Conference Publication |
|---|---|
| Authors/Creators | Vu, X-S; Vu, T; Tran, SN; Jiang, L |
| Keywords | word embedding |
| Journal or Publication Title | Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference |
| ISSN | 1313-8502 |
| DOI / ID Number | 10.26615/978-954-452-056-4_147 |
| Copyright Information | Copyright unknown |