Conference paper, 2016

Deep Neural Networks for Web Page Information Extraction

Tomas Gogar
Ondrej Hubacek
Jan Sedivy

Abstract

Web wrappers are systems for extracting structured information from web pages. Currently, wrappers need to be adapted to a particular website template before they can start the extraction process. In this work, we present a new method that uses convolutional neural networks to learn a wrapper capable of extracting information from previously unseen templates. This wrapper therefore needs no site-specific initialization and can extract information from a single web page. We also propose a method for spatial text encoding, which allows us to encode the visual and textual content of a web page into a single neural net. The first experiments with product information extraction showed very promising results and suggest that this approach can lead to a general site-independent web wrapper.
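
The abstract does not spell out how the spatial text encoding works. As a rough illustration only, and not the authors' implementation, one way to combine text with layout is to rasterize each text box onto a coarse grid and mark, in every covered cell, a few hashed token features. Everything in the sketch below is an assumption made for illustration: the function names, the `(text, x, y, w, h)` input format, the grid size, the number of hash buckets, and the use of MD5 for stable hashing.

```python
import hashlib


def token_bucket(token, n_buckets):
    """Map a token to a stable bucket index (hashing trick, assumed here)."""
    digest = hashlib.md5(token.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


def cell_range(start, length, page_size, grid_size):
    """Inclusive range of grid cells covered by [start, start + length) pixels."""
    first = int(start * grid_size / page_size)
    last = int((start + max(length, 1) - 1) * grid_size / page_size)
    first = max(0, min(grid_size - 1, first))
    last = max(first, min(grid_size - 1, last))
    return first, last


def encode_text_spatially(text_boxes, page_w, page_h,
                          grid_w=8, grid_h=8, n_buckets=16):
    """Return a grid_h x grid_w x n_buckets nested list of floats.

    text_boxes is an iterable of (text, x, y, w, h) tuples in page pixels
    (a hypothetical input format). Every grid cell overlapped by a box gets
    1.0 in the buckets of that box's tokens; all other entries stay 0.0.
    """
    grid = [[[0.0] * n_buckets for _ in range(grid_w)] for _ in range(grid_h)]
    for text, x, y, w, h in text_boxes:
        buckets = {token_bucket(t, n_buckets) for t in text.split()}
        col0, col1 = cell_range(x, w, page_w, grid_w)
        row0, row1 = cell_range(y, h, page_h, grid_h)
        for r in range(row0, row1 + 1):
            for c in range(col0, col1 + 1):
                for b in buckets:
                    grid[r][c][b] = 1.0
    return grid
```

A tensor of this shape could, in principle, be stacked with a downsampled screenshot so that a convolutional network sees visual and textual channels side by side; whether the paper does exactly this is not stated in the abstract.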
Origin: Files produced by the author(s)

Dates and versions

hal-01557648, version 1 (06-07-2017)

Cite

Tomas Gogar, Ondrej Hubacek, Jan Sedivy. Deep Neural Networks for Web Page Information Extraction. 12th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2016, Thessaloniki, Greece. pp. 154-163, ⟨10.1007/978-3-319-44944-9_14⟩. ⟨hal-01557648⟩