Many spatial datasets are available to users today, each offering a different representation of the real world. Public and private organizations have long modeled and represented the world as spatial information, and more recently spatial data has also been generated through volunteered geographic information (VGI) projects such as Google Maps and OpenStreetMap. These differing representations can cause problems for data producers and users during processing steps such as integration, data quality estimation, updating, and multi-scale analysis. Objects that represent the same entity in different datasets therefore need to be linked to each other, a process known in the literature as “data matching” or “object matching”. Matching methods differ by vector data type (point, line, or polygon); the subject of this article is linear object matching. Previous studies have matched linear objects using geometric properties, semantic properties, or a combination of the two. This study used only geometric properties to identify corresponding objects, because approaches based on semantic matching become less effective, or fail, when data such as attribute information are missing from one of the datasets.
To address this problem, a five-stage approach was proposed to improve the matching of roads between datasets of different scales and sources. The framework was based on graph theory and used geometric criteria to match the road networks. Its goal was to determine the degree of similarity between objects using geometric criteria, namely distance, orientation, area, shape, and buffer overlap area, with the weights of the criteria derived from the spatial cognition of experts. In the first stage, the datasets were converted to a common format and coordinate system, and topological errors were eliminated. In the second stage, ambiguity in the definition of objects in the datasets was resolved by applying a predefined graph structure. In the third and fourth stages, the geometric criteria were extracted from the objects, the degree of similarity between objects was calculated, and the corresponding objects were identified. In the final stage, the proposed approach was evaluated using the precision, recall, and F-score indices.
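The weighted combination of geometric criteria described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the midpoint-based distance measure, the `max_dist` cutoff, and the weights in the usage example are all assumptions, and only two of the five criteria (orientation and distance) are shown, for straight two-point segments.

```python
import math

# A segment is ((x1, y1), (x2, y2)). All names below are hypothetical.

def orientation_similarity(a, b):
    """Similarity in [0, 1] from the angle between two segments.

    The digitizing direction is ignored, so a segment and its reverse
    are treated as having the same orientation.
    """
    ang_a = math.atan2(a[1][1] - a[0][1], a[1][0] - a[0][0])
    ang_b = math.atan2(b[1][1] - b[0][1], b[1][0] - b[0][0])
    diff = abs(ang_a - ang_b) % math.pi   # undirected angle in [0, pi)
    diff = min(diff, math.pi - diff)      # fold into [0, pi/2]
    return 1.0 - diff / (math.pi / 2)

def distance_similarity(a, b, max_dist):
    """Similarity in [0, 1] from the distance between segment midpoints.

    Midpoint distance is an assumed stand-in for whatever distance
    measure the study used; distances >= max_dist score 0.
    """
    mid_a = ((a[0][0] + a[1][0]) / 2, (a[0][1] + a[1][1]) / 2)
    mid_b = ((b[0][0] + b[1][0]) / 2, (b[0][1] + b[1][1]) / 2)
    return max(0.0, 1.0 - math.dist(mid_a, mid_b) / max_dist)

def weighted_similarity(scores, weights):
    """Weighted mean of per-criterion scores; weights need not sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[k] * scores[k] for k in weights) / total

# Usage: two parallel roads 2 units apart, digitized in opposite directions.
road_a = ((0.0, 0.0), (10.0, 0.0))
road_b = ((10.0, 2.0), (0.0, 2.0))
scores = {
    "orientation": orientation_similarity(road_a, road_b),
    "distance": distance_similarity(road_a, road_b, max_dist=10.0),
}
weights = {"orientation": 1.0, "distance": 1.0}  # illustrative, not expert-derived
degree = weighted_similarity(scores, weights)
```

In a fuller version, the remaining criteria (area, shape, and buffer overlap area) would each contribute a score in the same range, and the weights would come from the expert judgments mentioned above rather than being set by hand.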
The results showed that one, two, or three geometric criteria extracted from the features are not sufficient for accurate and adequate matching when the datasets differ in both scale and source, even though previous studies achieved acceptable accuracy and adequacy with such criteria on datasets with different scales and the same source, or with the same scale and different sources. In this study, in addition to resolving ambiguity in the definition of objects, an F-score of 82.67% was obtained by using five geometric criteria extracted from the objects. It is worth noting that this value was obtained for datasets with a large difference in scale and with different sources, without using other criteria such as semantic or topological criteria.
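For readers unfamiliar with the evaluation indices, the sketch below shows how precision, recall, and F-score are commonly computed for a set of matched pairs. It assumes the standard definitions (F-score as the harmonic mean of precision and recall) and a hypothetical representation of matches as pairs of object identifiers; the source does not state the exact formulas or data layout used in the study.

```python
def matching_scores(predicted, reference):
    """Return (precision, recall, f_score) for two collections of match pairs.

    predicted: pairs produced by the matching procedure
    reference: pairs accepted as correct (e.g. a manually checked set)
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)   # pairs found and correct
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Usage with made-up identifiers: 2 of 3 predicted pairs are correct,
# and 2 of 4 reference pairs were found.
predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
reference = [("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")]
precision, recall, f_score = matching_scores(predicted, reference)
```

The identifiers and pairs in the usage example are invented for illustration and do not correspond to the data behind the reported 82.67%.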