Intelligent compilation of patent summaries using machine learning and natural language processing techniques

A.J.C. Trappey, Charles Trappey, J.L. Wu*, W.C. Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Patents are a type of intellectual property that confers ownership and monopolistic rights. They are publicly accessible published documents, often with illustrations, registered by governments and international organizations. Registration allows people familiar with the domain to understand how to re-create the new and useful invention, but restricts its manufacture unless the owner licenses the patent or enters into a legal agreement to sell ownership of it. Patents reward the costly research and development efforts of inventors while spreading new knowledge and accelerating innovation. This research uses natural language processing, deep learning techniques, and machine learning algorithms to extract the essential knowledge of patent documents within a given domain as a means to evaluate their worth and technical advantage. Manual patent abstraction is a time-consuming, labor-intensive, and subjective process that becomes ineffective in both cost and outcome as the size of the patent knowledge domain increases. This research develops an intelligent patent summarization methodology using machine learning approaches so that extremely large patent domains can be summarized effectively and objectively, especially where the cost and time requirements make manual summarization infeasible. The system learns to automatically summarize patent documents in natural language text for any given technical domain. At the core of the summarization system, the machine learning solution identifies key technical terminology (words, phrases, and sentences) in the context of the semantic relationships among training patents and their corresponding summaries. To verify the performance of the proposed methodology, ROUGE metrics are used to evaluate the precision, recall, accuracy, and consistency of the knowledge generated by the summarization system.
The smart machinery technologies domain, with its sub-domains of control intelligence, sensor intelligence, and intelligent decision-making, provides the case studies for training the patent summarization system. The cases use 1708 training pairs of patents and summaries, while testing uses 30 randomly selected patents. The case implementation and verification show that the summary reports achieve average precision and recall of 90% and 84%, respectively.
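As an illustration of the evaluation step, the following minimal Python sketch computes ROUGE-N precision and recall from clipped n-gram overlap between a generated summary and a reference summary. It is not the authors' implementation: the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the sample sentences are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall) of ROUGE-N for two plain-text summaries.

    Assumes lowercased whitespace tokenization, which is a simplification;
    real evaluations usually add stemming and punctuation handling.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    return precision, recall


# Made-up example strings, not data from the study.
generated = "the sensor detects motion"
reference = "the sensor detects motion and heat"
precision, recall = rouge_n(generated, reference, n=1)
print(f"ROUGE-1 precision={precision:.2f} recall={recall:.2f}")
```

Average precision and recall over a test set, such as the 30 test patents mentioned above, would then be the mean of these per-document scores.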
Original language: English
Article number: 101027
Journal: Advanced Engineering Informatics
State: Published - Jan 2020
