Using machine learning approach to identify synonyms for document mining

Amy J.C. Trappey*, Charles Trappey, Jheng Long Wu, Kevin T.C. Tsai

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Technical and knowledge documents, such as research papers, patents, and technical documents (e.g., requests for quotation, RFQs), are important knowledge references for multiple purposes. For example, enterprises and R&D institutions often need to conduct literature and patent searches and analyses before, during, and after R&D and commercialization. These knowledge discovery processes help them identify prior art related to their current R&D efforts, so that they avoid duplicating research or infringing upon existing intellectual property rights (IPRs). Documents commonly contain many synonyms (i.e., words and phrases with near-identical meanings), which can hinder search results if queries do not account for them. For instance, a “freedom-to-operate” (FTO) patent search may not find all related patents if synonyms are not taken into consideration. This research develops methodologies for generating domain-specific “word” and “phrase” synonym dictionaries using machine learning. Both dictionaries are generated and validated using more than 2000 solar-power-related patents as the test document set. The test results show that, in the solar power domain, both the word-level and phrase-level dictionaries identify synonyms effectively and thus significantly improve patent search results.
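
To illustrate why a query that ignores synonyms can miss relevant patents, the following is a minimal Python sketch of synonym-aware query expansion. It is not the authors' method: the paper learns its dictionaries with machine learning, whereas the `SYNONYMS` dictionary, the `DOCS` texts, and the function names here are hypothetical and hand-written purely for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, hand-written synonym dictionary (assumption for illustration).
# The paper derives such dictionaries from patents; that step is not shown.
SYNONYMS: Dict[str, Set[str]] = {
    "solar cell": {"photovoltaic cell", "pv cell"},
}

# Hypothetical miniature document set standing in for patent texts.
DOCS: List[str] = [
    "A solar cell with an anti-reflective coating.",
    "A photovoltaic cell mounted on a tracking frame.",
    "A wind turbine blade assembly.",
]


def expand_query(terms: List[str], synonyms: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return, for each query term, the set containing the term and its synonyms."""
    return [{t.lower()} | synonyms.get(t.lower(), set()) for t in terms]


def matches(document: str, expanded: List[Set[str]]) -> bool:
    """A document matches if every query term has at least one alternative in it."""
    text = document.lower()
    return all(any(alt in text for alt in group) for group in expanded)


def search(
    documents: List[str],
    terms: List[str],
    synonyms: Optional[Dict[str, Set[str]]] = None,
) -> List[int]:
    """Return the indices of matching documents, optionally expanding with synonyms."""
    expanded = expand_query(terms, synonyms or {})
    return [i for i, doc in enumerate(documents) if matches(doc, expanded)]


plain_hits = search(DOCS, ["solar cell"])               # exact phrase only
expanded_hits = search(DOCS, ["solar cell"], SYNONYMS)  # phrase plus synonyms
```

In this toy setting the plain query retrieves only the first document, while the expanded query also retrieves the second, which uses "photovoltaic cell" instead. Simple substring matching is used only to keep the sketch short; a real search system would tokenize and index the text.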

Original language: English
Title of host publication: Transdisciplinary Engineering for Complex Socio-technical Systems - Proceedings of the 26th ISTE International Conference on Transdisciplinary Engineering
Editors: Kazuo Hiekata, Brian Moser, Masato Inoue, Josip Stjepandic, Nel Wognum
Publisher: IOS Press BV
Pages: 509-518
Number of pages: 10
ISBN (Electronic): 9781614994398
DOI: https://doi.org/10.3233/ATDE190158
State: Published - 7 Oct 2019
Event: 26th ISTE International Conference on Transdisciplinary Engineering, TE 2019 - Tokyo, Japan
Duration: 30 Jul 2019 to 1 Aug 2019

Publication series

Name: Advances in Transdisciplinary Engineering
Volume: 10

Conference

Conference: 26th ISTE International Conference on Transdisciplinary Engineering, TE 2019
Country: Japan
City: Tokyo
Period: 30/07/19 to 1/08/19

Keywords

  • Machine learning
  • Pattern-based extraction
  • Self-supervised learning
  • Synonym extraction

Cite this

Trappey, A. J. C., Trappey, C., Wu, J. L., & Tsai, K. T. C. (2019). Using machine learning approach to identify synonyms for document mining. In K. Hiekata, B. Moser, M. Inoue, J. Stjepandic, & N. Wognum (Eds.), Transdisciplinary Engineering for Complex Socio-technical Systems - Proceedings of the 26th ISTE International Conference on Transdisciplinary Engineering (pp. 509-518). (Advances in Transdisciplinary Engineering; Vol. 10). IOS Press BV. https://doi.org/10.3233/ATDE190158