Automatic detection of webpages that share the same web template
[EN] Template extraction is the process of isolating the template of a given webpage. It is widely used in several disciplines, including webpages development, content extraction, block detection, and webpages indexing. One of the main goals of template extraction is identifying a set of webpages with the same template without having to load and analyze too many webpages prior to identifying the template. This work introduces a new technique to automatically discover a reduced set of webpages in a website that implement the template. This set is computed with an hyperlink analysis that computes a very small set with a high level of confidence. ; This work has been partially supported by the Spanish Ministerio de Econom´ıa y Competitividad (Secretar´ıa de Estado de Investigacion, Desarrollo e Innovaci ´ on) ´ under grant TIN2013-44742-C4-1-R and by the Generalitat Valenciana under grant PROMETEO/2011/052. David Insa was partially supported by the Spanish Ministerio de Eduacion under FPU grant AP2010-4415. Salvador Tamarit was partially supported by research project POLCA, Programming Large Scale Heterogeneous Infrastructures (610686), funded by the European Union, STREP FP7. ; Alarte, J.; Insa Cabrera, D.; Silva Galiana, JF.; Tamarit Muñoz, S. (2014). Automatic detection of webpages that share the same web template. Electronic Proceedings in Theoretical Computer Science. 163:2-15. https://doi.org/10.4204/EPTCS.163.2 ; S ; 2 ; 15 ; 163