In PLoS ONE; h5-index 176.0
International Patent Classification (IPC) codes are assigned to patent documents. Because patent examiners assign them manually, selecting the appropriate codes from about 70,000 candidates takes considerable time and effort. Hence, some research has been conducted on patent classification with machine learning. However, patent documents are voluminous, and training with all the claims (the part describing the content of the patent) as input exhausts the available memory even when the batch size is set very small. Therefore, most existing methods discard some information, for example by using only the first claim as input. In this study, we propose a model that considers the contents of all claims by extracting the important information from them as input. In addition, we focus on the hierarchical structure of the IPC and propose a new decoder architecture that takes it into account. Finally, we conducted an experiment using actual patent data to verify the prediction accuracy. The results showed a significant improvement in accuracy over existing methods, and we also discuss the practical applicability of the method.
Hoshino Yuki, Utsumi Yoshimasa, Matsuda Yoshiro, Tanaka Yoshitoshi, Nakata Kazuhide
2023
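
The abstract refers to the hierarchical structure of the IPC. As a minimal sketch of what that hierarchy looks like as label data, the Python snippet below splits a full IPC symbol into its five standard levels (section, class, subclass, main group, subgroup). It is not the authors' decoder or preprocessing; the names `IPCLevels` and `split_ipc` are hypothetical, and the regular expression assumes symbols written in the common form `G06F 16/35`.

```python
import re
from typing import NamedTuple


class IPCLevels(NamedTuple):
    """Hypothetical container for the five levels of one IPC symbol."""
    section: str     # e.g. "G"
    ipc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06F"
    main_group: str  # e.g. "G06F 16/00"
    subgroup: str    # e.g. "G06F 16/35"


# Assumed input form: section letter A-H, two-digit class, subclass letter,
# then "group/subgroup" digits, with optional whitespace before the group.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def split_ipc(symbol: str) -> IPCLevels:
    """Split a full IPC symbol into its hierarchical prefixes."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IPCLevels(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {group}/00",  # main groups end in /00
        subgroup=f"{subclass} {group}/{subgroup}",
    )


levels = split_ipc("G06F 16/35")
print(levels)
```

Each coarser level is a prefix of the finer ones, so a classifier that predicts level by level can restrict its candidates at each step to the children of the level already chosen.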