Journal of Guangxi Normal University (Natural Science Edition), 2011, Vol. 29, Issue (1): 167-172.

Extraction of Web Mathematical Formulas Based on Nutch

CUI Lin-wei, SU Wei, GUO Wei, LI Lian   

  1. College of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000, China
  • Received: 2010-12-22 Published: 2018-11-16

Abstract: This paper introduces methods for recognizing and extracting mathematical expressions in a formula-based mathematics search engine. It summarizes the characteristic features of MathML, OpenMath, LaTeX, and infix notation when they are embedded in a Web page, and presents a feature-based heuristic method for recognizing and extracting such expressions. Experiments show that the method is effective and useful.
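
As a rough illustration of what a feature-based heuristic of this kind might look like, the sketch below scans a page for surface features of three of the formats named above. The patterns, the name extract_formulas, and the choice of delimiters are assumptions made for this example only; the abstract does not state the paper's actual features, and infix detection is left out because it depends on context that simple delimiters do not capture.

```python
import re

# Hypothetical surface features, one pattern per embedding format.
# These are illustrative assumptions, not the features used in the paper.
PATTERNS = {
    # MathML is embedded as a <math> ... </math> element.
    "MathML": re.compile(r"<math\b.*?</math>", re.IGNORECASE | re.DOTALL),
    # OpenMath objects are wrapped in <OMOBJ> ... </OMOBJ>.
    "OpenMath": re.compile(r"<OMOBJ\b.*?</OMOBJ>", re.IGNORECASE | re.DOTALL),
    # LaTeX is assumed to be delimited by $$...$$, $...$, \(...\) or \[...\].
    "LaTeX": re.compile(
        r"\$\$.+?\$\$|\$[^$\n]+?\$|\\\(.+?\\\)|\\\[.+?\\\]", re.DOTALL
    ),
}


def extract_formulas(html):
    """Return (format, snippet) pairs in the order they appear in html."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(html):
            hits.append((match.start(), name, match.group(0)))
    hits.sort()  # order by position in the page
    return [(name, text) for _, name, text in hits]


if __name__ == "__main__":
    page = "<p><math><mi>x</mi></math> and $a^2+b^2=c^2$</p>"
    for fmt, snippet in extract_formulas(page):
        print(fmt, snippet)
```

In a crawler such as Nutch, a step like this would presumably run on the fetched page content before indexing; that placement is likewise an assumption here.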

Key words: search engine, crawler, formula search, mathematical formulas, MathML, OpenMath

CLC Number: TP391.3