Open Access Repository

ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task


Vu, X-S, Vu, T, Tran, SN and Jiang, L 2019, 'ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task', in Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference, pp. 1285-1294, doi: 10.26615/978-954-452-056-4_147.

Full text restricted

Abstract

© 2019 Association for Computational Linguistics (ACL). All rights reserved. Given the many recent advances in embedding models, selecting the pre-trained word embedding (a.k.a. word representation) models best suited to a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese by selecting which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
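The word-analogy evaluation described in the abstract (scoring an embedding by how often it solves "a is to b as c is to ?") can be sketched with the standard 3CosAdd method. This is a minimal illustrative sketch, not ETNLP's actual API; the toy vectors and helper names below are assumptions for demonstration only:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word d maximizing cos(emb[d], emb[b] - emb[a] + emb[c]),
    excluding the three query words themselves."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, quads):
    """Fraction of (a, b, c, d) analogy quadruples the embedding solves correctly."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in quads)
    return correct / len(quads)

# Toy embedding constructed so that "man : woman :: king : queen" holds.
emb = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.0],
}
```

In practice one would run such an accuracy score over a large analogy list (the paper builds one for Vietnamese) for each candidate pre-trained embedding, then pick the highest-scoring embeddings for the downstream task.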

Item Type: Conference Publication
Authors/Creators: Vu, X-S and Vu, T and Tran, SN and Jiang, L
Keywords: word embedding
Journal or Publication Title: Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference
ISSN: 1313-8502
DOI / ID Number: 10.26615/978-954-452-056-4_147
Copyright Information: Copyright unknown

