Combining neural networks and pattern matching for ontology mining - a meta learning inspired approach

Roussinov, Dmitri and Puchnina, Nadezhda (2019) Combining neural networks and pattern matching for ontology mining - a meta learning inspired approach. In: The 13th IEEE International Conference On Semantic Computing, 2019-01-30 - 2019-02-01. (https://doi.org/10.1109/ICOSC.2019.8665528)

Text. Filename: Roussinov_Puchnina_IEEE_ICSC_2019_Combining_neural_networks_and_pattern_matching_for_ontology_mining.pdf
Accepted Author Manuscript

Abstract

Several applications that process natural language text need to validate automatically whether an entity belongs to a given category (e.g. France is a country; Gladiator is a movie, but not a country). Meta-learning is a recent and powerful machine learning approach whose goal is to train a model (or a family of models) on a variety of learning tasks so that it can solve new learning tasks more efficiently, e.g. using a smaller number of training samples or less time. We present an original approach inspired by meta-learning that consists of two tiers of models: for any arbitrary category, our general model supplies high-confidence training instances (seeds) for our category-specific models. Our general model is based on pattern matching and is optimized for precision at top N, while its recall is not important. Our category-specific models are based on recurrent neural networks (RNNs), which have recently proved extremely effective in several natural language applications, such as machine translation, sentiment analysis, parsing, and chatbots. Following meta-learning principles, we train our highest-level (general) model in such a way that the second-tier category-specific models, which depend on it, are optimized for the best possible performance in a specific application. This work is important because our approach can verify membership in an arbitrary category defined by a sequence of words, including longer and more complex categories such as "Ridley Scott movie" or "City in southern Germany" that are currently not supported by existing manually created ontologies (such as Freebase, WordNet or Wikidata). Also, our approach uses only raw text and thus can be useful when no such ontologies are available, which is a common situation for languages other than English.
Even the largest English ontologies are known to have low coverage, insufficient for many practical applications such as automated question answering, which we use here to illustrate the advantages of our approach. We rigorously test it on a larger number of questions than previous studies and demonstrate that, when coupled with a simple answer-scoring mechanism, our meta-learning-inspired approach 1) provides up to 50% improvement over prior approaches that do not use any manually curated knowledge bases and 2) achieves state-of-the-art performance among all current approaches, including those that take advantage of such knowledge bases.
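
The first tier described in the abstract (a pattern-matching model that proposes high-confidence seeds and is judged by precision at top N) can be illustrated with a minimal sketch. This is not the authors' implementation: the two textual patterns ("X is a C" and "X, a C"), the use of capitalised word sequences as candidates, and the names `mine_seeds` and `precision_at_n` are all assumptions made for illustration. The second tier (category-specific RNNs trained on the seeds) is not shown.

```python
import re
from collections import Counter

# Assumption: a candidate is a run of capitalised words, e.g. "Ridley Scott".
CANDIDATE = r"\b([A-Z][\w-]*(?: [A-Z][\w-]*)*)"


def mine_seeds(corpus, category, top_n=3):
    """Rank candidate members of `category` by how often they match
    simple, hypothetical textual patterns in raw sentences."""
    cat = re.escape(category)
    patterns = [
        re.compile(CANDIDATE + r" is an? " + cat + r"\b"),  # "X is a C"
        re.compile(CANDIDATE + r", an? " + cat + r"\b"),    # "X, a C"
    ]
    counts = Counter()
    for sentence in corpus:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                counts[match.group(1)] += 1
    # Keep only the most frequently matched candidates: recall is ignored.
    return [candidate for candidate, _ in counts.most_common(top_n)]


def precision_at_n(ranked, gold, n):
    """Fraction of the top-n ranked candidates that are true members."""
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for candidate in top if candidate in gold) / len(top)


if __name__ == "__main__":
    corpus = [
        "France is a country in Western Europe.",
        "France, a country famous for wine, borders Spain.",
        "Germany is a country with sixteen states.",
        "Africa is a country, some people wrongly say.",
        "Gladiator is a Ridley Scott movie.",
    ]
    seeds = mine_seeds(corpus, "country", top_n=3)
    print(seeds)
    print(precision_at_n(seeds, {"France", "Germany"}, 3))
    print(mine_seeds(corpus, "Ridley Scott movie", top_n=1))
```

In this toy setting, lowering `top_n` trades recall for precision, which mirrors the abstract's point that the general model only needs to be reliable on its highest-ranked candidates, since those are the ones passed on as training seeds.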