Tags Extraction from Spatial Documents by Search Engines
S. Borhaninejad, F. Hakimpour, E. Hamzei
Selective access to information on the Web is today provided by search engines, but when the content includes spatial information the search task becomes more complex and requires special capabilities in the search engine. The purpose of this study is to extract the information contained in GML documents, and to implement and evaluate a retrieval method for this extracted information in an integrated approach. Our proposed system consists of three components: crawler, database and user interface.

1- Crawler: This component is the main innovation of this study. A crawler is a piece of software that, after receiving an initial seed, visits Web pages, opens the links on each page and visits the pages behind those links. The crawler repeats this for new pages until all pages have been reviewed and no new pages remain. The crawlers of typical spatial search engines analyze and process HTML documents and extract the spatial information contained in them. In our proposed system, the crawler processes the text of GML documents instead of HTML documents and extracts the spatial information from them. The crawler in this system has two main tasks:

- Detection of GML documents among documents of different formats.

- Parsing of GML documents and extraction of the spatial information.

2- Database: The database has two major tasks in this system:

- Storing the data collected by the crawler

- Indexing the information

3- User Interface: This component provides the interaction between the user and the system; users send their queries to the system through this interface.
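The crawler behaviour described under component 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the names `is_gml`, `crawl`, `toy_fetch` and `TOY_WEB` are hypothetical, the Web is simulated by an in-memory dictionary, and a document is assumed to count as GML when it is well-formed XML containing at least one element in an OGC GML namespace.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Assumption: GML 3.1 uses "http://www.opengis.net/gml" and GML 3.2 uses
# "http://www.opengis.net/gml/3.2"; a prefix match covers both.
GML_NS_PREFIX = "{http://www.opengis.net/gml"


def is_gml(text):
    """Return True if text is well-formed XML with an element in a GML namespace."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not XML at all (e.g. loose HTML, plain text, empty body)
    return any(el.tag.startswith(GML_NS_PREFIX) for el in root.iter())


def crawl(seeds, fetch):
    """Breadth-first crawl from the seed URLs.

    fetch(url) is a hypothetical callable returning (body, links).
    Returns a dict mapping the URL of each detected GML document to its text.
    """
    queue = deque(seeds)
    seen = set(seeds)
    gml_docs = {}
    while queue:  # stops when no unvisited pages remain
        url = queue.popleft()
        body, links = fetch(url)
        if is_gml(body):
            gml_docs[url] = body
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return gml_docs


# Toy stand-in for the Web: url -> (body, outgoing links).
TOY_WEB = {
    "http://example.test/index.html": (
        "<html><body>home</body></html>",
        ["http://example.test/roads.gml", "http://example.test/about.html"],
    ),
    "http://example.test/about.html": (
        "<html><body>about</body></html>",
        ["http://example.test/index.html"],
    ),
    "http://example.test/roads.gml": (
        '<gml:Point xmlns:gml="http://www.opengis.net/gml">'
        "<gml:pos>35.7 51.4</gml:pos></gml:Point>",
        [],
    ),
}


def toy_fetch(url):
    """Look the page up in the toy Web; unknown URLs yield an empty page."""
    return TOY_WEB.get(url, ("", []))


if __name__ == "__main__":
    print(list(crawl(["http://example.test/index.html"], toy_fetch)))
```

The `seen` set is what makes the loop terminate on pages that link back to each other, which matches the stopping condition "until no new pages remain".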

In general, the search process of this system is carried out in two phases: offline and online. The offline phase comprises crawling and storing the information in the database. The online phase comprises the user interface and the ranking operation.
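The split between the two phases might look roughly like the following sketch, which uses an in-memory SQLite database. The table layout, the function names `open_store`, `store` and `search`, and the ranking rule (squared distance from the centre of the query bounding box) are all assumptions made for illustration; the abstract does not specify the schema or the ranking function.

```python
import sqlite3


def open_store():
    """Create the store and its indexes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (url TEXT, feature_type TEXT, x REAL, y REAL)"
    )
    # Indexing task of the database component: one index on the non-spatial
    # attribute, one on the coordinates.
    conn.execute("CREATE INDEX idx_type ON features (feature_type)")
    conn.execute("CREATE INDEX idx_xy ON features (x, y)")
    return conn


def store(conn, records):
    """Offline phase: persist (url, feature_type, x, y) records from the crawler."""
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?)", records)
    conn.commit()


def search(conn, feature_type, bbox):
    """Online phase: filter by feature type and bounding box, then rank.

    bbox is (min_x, min_y, max_x, max_y). Results are ordered by squared
    distance from the box centre, nearest first (an assumed ranking rule).
    """
    min_x, min_y, max_x, max_y = bbox
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2
    return conn.execute(
        "SELECT url, x, y FROM features "
        "WHERE feature_type = ? AND x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
        "ORDER BY (x - ?) * (x - ?) + (y - ?) * (y - ?)",
        (feature_type, min_x, max_x, min_y, max_y, cx, cx, cy, cy),
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    store(conn, [
        ("a.gml", "Road", 1.0, 1.0),
        ("b.gml", "Road", 4.0, 4.0),
        ("c.gml", "River", 1.0, 1.0),
        ("d.gml", "Road", 50.0, 50.0),
    ])
    print(search(conn, "Road", (0, 0, 10, 10)))
```

Combining the feature-type filter and the coordinate filter in one query is one simple way to treat spatial and non-spatial criteria together.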

In summary, this study addresses the following objectives:

1- Extraction of spatial information embedded in Web documents: Spatial documents include explicit spatial information, such as the coordinates or the type of a feature; extracting this information improves the response rate of spatial queries in search engines.

2- Implementation and evaluation of an integrated spatial information retrieval approach.
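As an illustration of the first objective, the following sketch pulls the feature type and the coordinates out of a small GML fragment. It assumes the GML 3.1 namespace, features wrapped in `gml:featureMember`, and two-dimensional coordinates in `gml:pos` or `gml:posList`; the `app` namespace, the sample document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"  # assumed GML 3.1 namespace
POS_TAGS = (f"{{{GML}}}pos", f"{{{GML}}}posList")

SAMPLE = """
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
                       xmlns:app="http://example.test/app">
  <gml:featureMember>
    <app:Road>
      <app:name>Main Street</app:name>
      <gml:LineString><gml:posList>0 0 10 5</gml:posList></gml:LineString>
    </app:Road>
  </gml:featureMember>
  <gml:featureMember>
    <app:School>
      <gml:Point><gml:pos>3.5 4.5</gml:pos></gml:Point>
    </app:School>
  </gml:featureMember>
</gml:FeatureCollection>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_features(text):
    """Return a list of (feature_type, [(x, y), ...]) tuples.

    The feature type is taken to be the element name of each child of
    gml:featureMember; coordinates are read as 2D pairs (an assumption).
    """
    root = ET.fromstring(text)
    features = []
    for member in root.iter(f"{{{GML}}}featureMember"):
        for feature in member:
            coords = []
            for el in feature.iter():
                if el.tag in POS_TAGS:
                    nums = [float(v) for v in (el.text or "").split()]
                    coords.extend(zip(nums[0::2], nums[1::2]))
            features.append((local_name(feature.tag), coords))
    return features


if __name__ == "__main__":
    for feature_type, coords in extract_features(SAMPLE):
        print(feature_type, coords)
```

Records of this shape carry both the explicit spatial part (coordinates) and the non-spatial part (feature type) of each feature.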

We have implemented this system as a pilot on an application server that simulates the Web. As a spatial search engine, our system provides search capability across GML documents, which is an important step toward improving the efficiency of search engines.

Although today's engineers and specialists in many fields need raw spatial data and look for it on the World Wide Web, most spatial search engines are based on map representation and pay less attention to spatial data. There is a substantial volume of spatial documents and information on the Web; however, the sheer size of the Web makes these documents and this information hard to find among other information. Our proposed system, as a spatial search engine, makes it possible to search across GML documents and thus improves the efficiency of spatial search engines. Since GML documents include explicit spatial information along with non-spatial information, the main advantage of this system over other spatial search engines is its integrated approach to spatial and non-spatial data.

Keywords: Spatial Search Engine, Spatial Documents, Crawler, GML
