TY - JOUR
T1 - Mathematical formula representation via tree embeddings
AU - Wang, Zichao
AU - Lan, Andrew
AU - Baraniuk, Richard
N1 - Publisher Copyright: Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2021
Y1 - 2021
N2 - We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes the formula's operator tree into a vector and a tree decoder that generates a formula from a vector in operator tree format. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar formula retrieval task on a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.
AB - We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes the formula's operator tree into a vector and a tree decoder that generates a formula from a vector in operator tree format. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar formula retrieval task on a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.
UR - http://www.scopus.com/inward/record.url?scp=85109647988&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109647988&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85109647988
SN - 1613-0073
VL - 2895
SP - 121
EP - 133
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 3rd International Workshop on Intelligent Textbooks, iTextbooks 2021
Y2 - 15 June 2021
ER -