Advances in remote sensing technologies provide a wide range of information for object detection problems, which makes the interpretation of optical remote sensing images easier. An important class of these interpretation tasks is object detection, where most research relies on neural networks and deep learning techniques. The design of the network is an important step that affects detection accuracy. Recent research in deep learning and convolutional neural networks shows that deeper networks can achieve better accuracy. However, previous studies have found that excessively deep networks can cause other problems, such as a growing number of trainable parameters, vanishing gradients, and extracted features that go unused. These problems reduce the accuracy of the network in recognizing objects. Many studies on convolutional networks have noted this issue and have tried to address it by examining different topologies or proposing new training methods. In this article, a model is developed that preserves the extracted features and passes them on to subsequent layers. The proposed architecture is a stack of several blocks arranged in sequence. Each block receives its input from the previous block and performs the relevant computations. Each block consists of several cells, and each cell has two convolution layers. To make efficient use of all the features of the training images, the filters in the convolution layers use kernels of size 1×1 and 3×3. In the combining stage, the output of the 3×3 layer is integrated with the information from the previous layers. The architecture of each cell therefore keeps all the features extracted by previous layers so that they can be used in subsequent cells. With these connections between layers, the network can be made deeper while suffering less from vanishing gradients. In addition to mitigating the gradient problem, this architecture considerably reduces the number of trainable parameters and the duration of the training phase. The result is an improvement in the ability of existing models to distinguish objects of multiple classes.
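The cell structure described above (a 1×1 convolution, then a 3×3 convolution whose output is combined with the features of earlier layers) can be illustrated with a rough parameter count. The following is a minimal sketch, not the authors' implementation: it assumes that "combining" means channel-wise concatenation, and the names `growth` and `bottleneck`, as well as all channel widths, are hypothetical values chosen only for illustration.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a single k x k convolution layer."""
    return k * k * in_ch * out_ch + out_ch


def dense_block_params(in_ch, num_cells, growth, bottleneck):
    """Parameter count of a block of cells (1x1 conv -> 3x3 conv).

    Assumption: each cell's 3x3 output (`growth` channels) is
    concatenated onto the cell's input, so later cells see all
    previously extracted features.
    """
    total = 0
    channels = in_ch
    for _ in range(num_cells):
        total += conv_params(channels, bottleneck, 1)  # 1x1 layer
        total += conv_params(bottleneck, growth, 3)    # 3x3 layer
        channels += growth                             # concatenation
    return total, channels


def plain_block_params(in_ch, num_layers, width):
    """Parameter count of a plain stack of 3x3 convolutions, for comparison."""
    total = 0
    channels = in_ch
    for _ in range(num_layers):
        total += conv_params(channels, width, 3)
        channels = width
    return total, channels


if __name__ == "__main__":
    # Hypothetical sizes: 4 cells (8 conv layers) versus 8 plain 3x3 layers,
    # both ending with 192 output channels.
    print(dense_block_params(64, num_cells=4, growth=32, bottleneck=128))
    print(plain_block_params(64, num_layers=8, width=192))
```

Under these assumed sizes, the concatenating block reaches the same output width with far fewer parameters than the plain stack, because each 3×3 layer only has to produce a small number of new channels.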
For this purpose, a collection of 320 training images was first assembled and preprocessed. The proposed network is used as the feature extractor of the Faster R-CNN model and is trained on this image collection. To evaluate the proposed method, a part of Beijing International Airport and a part of Imam Khomeini International Airport were selected as the first and second case study areas. The F1-measure values for the two regions are 97.9% and 93.7%, respectively. By comparison, the ResNet architecture with 101 convolution layers and 14.4 million more trainable parameters than the proposed architecture achieved values of 96.7% and 93% for the same criterion. Finally, the results of the proposed model were compared with those of several well-known existing network models. The experimental results indicate the reliability and efficiency of the proposed method.
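For reference, the F1-measure used as the evaluation criterion is the harmonic mean of precision and recall. The sketch below computes it from detection counts. The counts in the usage example are hypothetical and are not taken from the experiments reported here.

```python
def f1_score(tp, fp, fn):
    """F1-measure from true positives, false positives and false negatives.

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 45 correctly detected objects,
    # 3 false detections, 2 missed objects.
    print(f"F1 = {100 * f1_score(45, 3, 2):.1f}%")
```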
To improve the proposed architecture, dilated convolution operators could be used to extract more prominent features. Furthermore, to develop and generalize the approach, the proposed method could be applied in two stages to high-resolution remote sensing images: in the first stage, the location of the airport is identified, and in the second stage, the planes inside each airport are detected by the proposed method.
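As a brief illustration of why dilated convolutions are a candidate improvement, the sketch below computes how dilation enlarges the area covered by a kernel without adding parameters. It assumes stride-1 convolutions, and the layer configuration in the usage example is hypothetical rather than part of the proposed architecture.

```python
def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    rf = 1
    for k, dilation in layers:
        rf += (k - 1) * dilation
    return rf


if __name__ == "__main__":
    # A 3x3 kernel with dilation 2 covers a 5x5 area with the same 9 weights.
    print(effective_kernel_size(3, 2))
    # Hypothetical stack of three 3x3 layers with dilations 1, 2 and 4.
    print(receptive_field([(3, 1), (3, 2), (3, 4)]))
```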