Article

Improved Matching Algorithm with Anchor Argument for Rotate Target Detection

Kangkang Wang, Bowen Chen, Xianyun Wu and Yunsong Li
1 State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an 710071, China
2 Computer Vision Technology Department, Baidu Inc., Beijing 100085, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11534; https://doi.org/10.3390/app122211534
Submission received: 16 October 2022 / Revised: 8 November 2022 / Accepted: 10 November 2022 / Published: 14 November 2022
(This article belongs to the Collection Space Applications)

Abstract

Convolutional neural networks (CNNs) have been widely used for object detection in remote sensing. Remote sensing targets can appear at arbitrary angles, and many anchor-based methods therefore use a large number of anchors at different angles, which causes efficiency and precision problems. To address this, this paper presents a novel matching algorithm for the stage in which rotated anchors are matched to objects: auxiliary copies generated for each oriented anchor determine more accurate rotated regions of interest (RRoIs) for target regression. The method retains the high recall rate brought by a large number of anchor boxes at different angles while avoiding the computation they would otherwise require. We evaluate our improved algorithm against Rotation RetinaNet on the remote sensing datasets DOTA and HRSC2016, which provide rotated bounding box annotations. For targets with high aspect ratios, such as large vehicles and ships, our method outperforms Rotation RetinaNet and achieves better performance.

1. Introduction

With the continuous development of aerospace technology, object detection in remote sensing imagery has become one of the fundamental and challenging tasks in computer vision. In recent years, thanks to the effective feature learning ability of CNNs, tremendous successes have been achieved in a variety of computer vision tasks, such as image classification, semantic segmentation and object detection. However, aerial images differ from natural images in several important respects because they are recorded from a satellite view. Remote sensing images have more complex and diverse backgrounds with multi-scale, arbitrarily oriented targets, so matching rotated object boxes with anchors is difficult, which limits the application of standard detectors and brings challenges to remote sensing object detection.
Many traditional object detection algorithms, which are built on the assumption of horizontal bounding boxes, have been applied to the detection of aerial objects. The region proposal network (RPN) was first proposed in Faster R-CNN [1]. An RPN is implemented as a fully convolutional network, which has the great advantage of avoiding the time-consuming region proposal step of traditional object detection algorithms. The authors of [2] apply an RPN to enhance object cues and weaken non-object information. However, due to the arbitrary orientations, multiple scales and cluttered arrangement of aerial objects, these algorithms do not perform well in practice. As shown in Figure 1, horizontal bounding boxes cannot accurately locate oriented objects and contain a lot of background. Moreover, the overlap between the horizontal bounding boxes of adjacent objects is large, which suppresses the detection of densely arranged objects, and precise localizations may be discarded by the conventional non-maximum suppression (NMS) technique. Thus, oriented bounding boxes are necessary for oriented object detection in aerial images.
Inspired by the arbitrarily oriented text detection model RRPN [3], many methods have been proposed that use rotated regions of interest (RRoIs), which match the oriented ground truth (GT) bounding boxes better. Although these rotation detectors have achieved good results, the computational cost increases dramatically because a large number of oriented anchors are generated at each pixel; the RRPN, for example, uses five times as many anchors. Reference [4] reports the NMS latency of its algorithm: when the number of anchors grows from 200 to 1000, the NMS execution time grows from 10 ms to 35 ms, as shown in Table 1. Alternatively, one can set more reasonable matching rules between the oriented GT bounding boxes and the regions of interest (RoIs), as in DAL [5]. However, such methods lead to complex model structures and slow convergence.
To exploit the accuracy of oriented anchor matching while avoiding excessive computation, we propose a new matching strategy for anchors and targets. As shown in Figure 1, the smaller the angle interval of the oriented anchors set at each pixel (i.e., the more anchors there are at different angles), the more anchors a target matches and the better the detection effect in the matching stage. Motivated by this, we separate the anchors used in the matching stage from those used in the regression stage. In the matching stage, we design several auxiliary copies for each regression anchor to reduce the angle interval of the oriented anchors and improve detection, while avoiding the invalid computation these auxiliary copies would otherwise cause in the regression step. The benefits are illustrated in Figure 2. With the proposed framework, we achieve performances 6.96% and 9.90% higher than the baseline on rotation-sensitive targets, namely the large vehicle category in DOTA and the ships in HRSC2016, respectively.

2. Related Work

Recently, many arbitrarily oriented object detectors have been proposed for aerial images and scene text detection. The authors of [3] employed a rotated RPN to generate rotated region proposals based on the Faster R-CNN framework and further performed rotated bounding box regression. TextBoxes++ [6] adopted the framework of TextBoxes [7] and proposed an SSD-based oriented text detection method. The RRD [8] further improved TextBoxes++ by using different features for the classification and regression tasks: the regression branch extracts rotation-sensitive features, and the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. Aerial object detection is a challenging computer vision task as well. The authors of [9] proposed an RoI transformer that transforms horizontal proposals into rotated ones, together with a rotated position-sensitive RoI align (RPS RoI align) module to extract rotation-invariant features. To improve sensitivity to small targets, the authors of [2] designed a sampling fusion network based on multi-layer features and effective anchor sampling, with a supervised pixel attention network and a channel attention network working jointly to suppress noise and enhance features. Dynamic anchor learning [5] uses a newly defined matching degree to comprehensively evaluate the localization potential of anchors and realizes a more effective label allocation process. The authors of [10] proposed an approximate skew IoU loss to address the non-differentiability of the skew IoU computation and obtain more accurate rotation estimations.
For the small targets that often appear in remote sensing detection, the authors of [11,12] proposed detection methods based on a multidirectional derivative-based weighted contrast measure and on a local hypergraph dissimilarity measure (LHDM), respectively. The authors of [13] proposed scale-aware rotated object detection. Deep learning has many applications in other scenarios as well [14,15,16,17]. Notably, most existing methods use the same anchor setting in both the training stage and the test stage.

3. The Proposed Method

The proposed method is built on the oriented anchor and bounding box shown in Figure 3. First, we design multiple copies of each oriented anchor at different angles and define the intersection over union (IoU) between a target and an anchor as the largest skew IoU over the target and all copies; these copies retain the high matching accuracy of a multi-anchor scheme in the matching stage and determine more accurate RoIs. Second, having determined the accurate receptive field, we apply only one anchor to regress the coordinates of the rotated bounding box, which avoids the complexity and computational burden of a multi-anchor structure. Third, the scheme is straightforward and can improve the performance of existing multi-anchor models while reducing their computational cost, without requiring heavy computation or complicated structures; a sketch of this separation follows below. In this section, we first introduce the existing rotated-anchor object detection algorithm and explain the matching omissions that arise from the angle difference between targets and anchors. We then propose our method, an auxiliary anchor matching method, which densely matches targets with more oriented anchors, without greatly increasing the amount of calculation, to avoid the matching omissions caused by large angle differences between targets and the preset rotated anchors.
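As a concrete illustration, the following sketch shows where the auxiliary copies live in training and why inference is unaffected. All names (assign_labels, decode_and_nms, the model interface) are hypothetical, since the paper does not publish code; this is a sketch of the structure, not the authors' implementation.

```python
# Hypothetical sketch of the matching/regression separation described above.
# Auxiliary angle copies exist only inside assign_labels(); the network heads
# always operate on the original anchors, so inference cost is unchanged.

def train_step(images, gt_boxes, model, anchors):
    features = model.backbone(images)
    cls_scores, box_deltas = model.heads(features)         # one output per original anchor
    labels, matched_gt = assign_labels(anchors, gt_boxes)  # angle copies used only here
    return model.loss(cls_scores, box_deltas, labels, matched_gt)

def inference_step(images, model, anchors):
    features = model.backbone(images)
    cls_scores, box_deltas = model.heads(features)         # no copies at test time
    return decode_and_nms(anchors, cls_scores, box_deltas)
```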

3.1. RetinaNet Based on Rotation Anchor

RetinaNet is one of the most advanced one-stage detection frameworks and can be applied for real-time inference. It uses a ResNet with a Feature Pyramid Network (FPN) as the backbone to generate a multi-scale convolutional feature pyramid. Two subnetworks are attached to the backbone: one classifies the anchor boxes, and the other regresses them to the ground truth target boxes.
In the traditional RetinaNet framework, the anchor areas on pyramid levels $P_3$ to $P_7$ are $32^2$ to $512^2$, respectively, where $P_i$ denotes the pyramid level and $P_i$ has half the resolution of $P_{i-1}$. At each pyramid level, anchors of different scales and sizes are used. In the RetinaNet based on rotated anchors, an orientation parameter is added to each anchor box to propose RRoIs. Therefore, at each pyramid level from $P_3$ to $P_7$, Rotation RetinaNet uses anchors with five different aspect ratios, three different sizes and six different angles; the sketch below enumerates such a grid.
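The exact ratio, scale and angle values are not listed in the paper, so the constants below are assumptions for illustration; only the counts (five ratios, three sizes, six angles) and the base areas come from the text.

```python
from itertools import product

# Illustrative rotated-anchor grid for Rotation RetinaNet; the specific
# ASPECT_RATIOS/SCALES/ANGLES values are assumptions, not the authors' settings.
ASPECT_RATIOS = (0.2, 0.5, 1.0, 2.0, 5.0)              # five aspect ratios
SCALES = (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))          # three sizes per level
ANGLES = tuple(range(-90, 0, 15))                      # six angles, 15-degree interval

def anchors_for_level(base_area):
    """Yield (w, h, angle_deg) for every anchor shape at one pyramid level."""
    for ratio, scale, angle in product(ASPECT_RATIOS, SCALES, ANGLES):
        area = base_area * scale ** 2
        h = (area / ratio) ** 0.5
        w = h * ratio
        yield (w, h, angle)

# P3..P7 with base areas 32^2 .. 512^2, as in the paper.
levels = {f"P{i}": (2 ** (i + 2)) ** 2 for i in range(3, 8)}
per_location = len(ASPECT_RATIOS) * len(SCALES) * len(ANGLES)  # 90 anchors per pixel
```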
However, we found that in Rotation RetinaNet, the detection confidence for rotated targets with large aspect ratios is very sensitive to the preset angles of the anchor boxes, as shown in Figure 4. The same object at different orientations can receive remarkably different confidences. The two objects have the same image features, and the angle augmentation during training is random and not specific to either, so the different results are caused by neither the data nor the network structure.
In fact, this is not a coincidence but a necessity. The target that achieves high confidence is oriented at the same angle as an anchor. For a target with a large aspect ratio, if we add an anchor oriented at the same angle as target 2 to the network, the confidence of target 2 will also improve. It is therefore reasonable to assume that if we set more anchors with different orientations in the network, the network will learn from more positive examples, and its recall rate will be higher.

3.2. Anchor Argument Matching

Following the above analysis, we believe the capability of the network structure is actually sufficient but has not been fully exploited. If anchors are set at sufficiently many angles, the detection results of the network will no longer be affected by the target angle. During training, an RRoI with classification and regression ability may be treated as a negative sample merely because the IoU between the target and the oriented anchor is low. In fact, a recent article has shown that many anchors labeled as negative samples can nevertheless predict the correct target [4]. Motivated by this, our fundamental idea is to reduce the number of anchors with regression ability that are treated as negative samples in the network.
In rotated object detection with oriented anchors, we propose an anchor argument matching method, which optimizes the anchor-target matching stage. In particular, we generate multiple duplicates at different angles for each oriented anchor to assist the original anchor in matching. During training, before calculating the IoU between the target and each rotated anchor, we first generate the duplicates for that anchor. The angular step $S_{New}$ between the auxiliary copies (including the original anchor) is defined as:

$$S_{New} = \frac{S_{Old}}{times} \quad (1)$$
where $S_{Old}$ is the angular step between two adjacent anchors used for regression, and $times$ is the factor by which the copies refine that step. $S_{New}$ is the step between two adjacent anchors after the duplicate anchors are added. The angles of these auxiliary duplicates are:
$$A_{copy} = A + x \times S_{New} \quad (2)$$
where $x \in \{1, 2, \ldots, times-1\}$, $A_{copy}$ is the anchor angle of each copy, and $A$ is the original angle of the anchor. The number of auxiliary copies is controlled by the range of $x$. Every other structure in the network remains unchanged; thus, the computation of prediction does not increase.
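As a worked example, with the settings used later in Section 4.2 ($S_{Old} = 15^\circ$, $times = 3$):

$$S_{New} = \frac{15^\circ}{3} = 5^\circ, \qquad A_{copy} \in \{A + 5^\circ,\ A + 10^\circ\},$$

so each regression anchor gains two auxiliary copies (three matching candidates per anchor, as illustrated in Figure 3), and because adjacent regression anchors sit $15^\circ$ apart, the matching stage effectively sees a uniform $5^\circ$ angular grid.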
In the matching stage, the IoU is defined as the maximum of the IoUs between the target and all anchors, including the original anchor and its auxiliary copies:
$$\text{IoU} = \max(\text{IoU}_{ori}, \text{IoU}_{copy}) \quad (3)$$
where $\text{IoU}_{ori}$ is the IoU between the target and the original anchor, and $\text{IoU}_{copy}$ is the IoU between the target and an auxiliary copy; the maximization runs over all auxiliary copies. The resulting IoU is then compared with the threshold to distinguish positive and negative samples. In this way, we avoid missing positive samples due to the angle-sensitive behavior of the IoU when a rotated anchor matches a target with a high aspect ratio, and we obtain a better matching effect through more copies. The method only increases the number of positive samples, not negative ones.
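A minimal sketch of this matching rule follows, assuming helper routines `rotate_anchor` and `skew_iou` (hypothetical names; a polygon-based `skew_iou` is sketched at the end of this section) and illustrative threshold values:

```python
# Sketch of the anchor argument matching rule (Equations (1)-(3)).
# rotate_anchor() and skew_iou() are assumed helpers; thresholds are illustrative.

def iou_with_copies(anchor, gt, s_old=15.0, times=3):
    """IoU of a target and an anchor, maximised over the anchor's
    auxiliary angle copies."""
    s_new = s_old / times                                  # Equation (1)
    offsets = [x * s_new for x in range(1, times)]         # x in {1, ..., times-1}
    candidates = [anchor] + [rotate_anchor(anchor, d) for d in offsets]  # Equation (2)
    return max(skew_iou(c, gt) for c in candidates)        # Equation (3)

def label_anchor(anchor, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Assign a training label using the augmented IoU; only the original
    anchor (never its copies) is refined by the regression head."""
    best = max((iou_with_copies(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return 1    # positive sample
    if best < neg_thr:
        return 0    # negative sample
    return -1       # ignored during training
```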
Why does the proposed method work? Its fairness can be explained as follows: the same target rotated to any angle can match the same number of anchors. The method does not require lowering the IoU threshold; instead, it raises some anchor-target IoUs through maximization, as shown in Figure 2. Provided the other modules are unaffected, the more anchors a target matches, the greater the probability of a high-quality detection surviving NMS. As the number of auxiliary copies at different angles (controlled by $x$ in Equation (2)) increases, the treatment of targets at arbitrary angles becomes fairer. Therefore, targets with wide aspect ratios are treated equally by the network regardless of their angles. Moreover, since our method is fairer to targets at arbitrary angles, it can in theory reduce the need for random rotation augmentation when detecting targets with high aspect ratios.
The factors affecting the IoU between two rotated rectangular boxes are their positions, area ratio, aspect ratios and angles. Changing only the IoU threshold affects the feasible range of all four quantities at once; the method in this paper affects only the feasible range of the angle.
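For reference, the skew IoU of two rotated boxes can be computed by clipping them as polygons. This sketch uses the shapely library as one convenient option (an assumption; any polygon-clipping routine would do):

```python
import math
from shapely.geometry import Polygon

def box_to_polygon(cx, cy, w, h, angle_deg):
    """Corner polygon of a w-by-h box rotated by angle_deg about its centre."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return Polygon([(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners])

def skew_iou(box_a, box_b):
    """Skew IoU of two rotated boxes given as (cx, cy, w, h, angle_deg) tuples."""
    pa, pb = box_to_polygon(*box_a), box_to_polygon(*box_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter) if inter > 0 else 0.0

# Two 100 x 20 boxes at the same centre, 15 degrees apart: even a modest
# angle offset roughly halves the IoU for such an elongated pair.
print(skew_iou((0, 0, 100, 20, 0), (0, 0, 100, 20, 15)))
```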

4. Results

4.1. Datasets

To evaluate the effect of the designed structure, we tested it on two public remote sensing datasets.
DOTA is a large remote sensing dataset released in 2018. It contains 2806 aerial images with sizes ranging from 800 × 800 to 4000 × 4000 pixels and 15 target categories: planes, ships, storage tanks, baseball diamonds, tennis courts, swimming pools, ground track fields, harbors, bridges, large vehicles, small vehicles, helicopters, roundabouts, soccer ball fields and basketball courts, with a total of 188,282 instances.
HRSC2016 was released in 2016 for ship detection. All images come from six famous ports, and most have a resolution greater than 1000 × 600. The dataset contains 1070 images with 2976 instances in total.

4.2. Experiment Details

For all datasets, the model was trained on the training set for a total of 20 epochs. We used ResNet50 to initialize the model parameters. The initial learning rate was $5 \times 10^{-4}$ and was decreased by a factor of 10 at epochs 12 and 16. The weight decay and momentum were 0.0001 and 0.9, respectively, and the batch size was 1. The anchor areas on pyramid levels $P_3$ to $P_7$ were $32^2$ to $512^2$, respectively. In Rotation RetinaNet, the angle interval between the rotated anchors was 15 degrees. In our model, the original step size was 15 degrees, and the new step size was 5 degrees. As the goal of the auxiliary anchors is better matching, a smaller step size generally gives better performance; in practice, an even smaller value was unnecessary once this step size had resolved most of the matching problems. These settings are collected below.
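A minimal sketch that gathers these hyperparameters in one place; the dictionary layout is illustrative, not the authors' configuration file:

```python
# Training configuration from Section 4.2, collected for reference.
# The structure is an assumption for readability, not the authors' config format.
TRAIN_CONFIG = {
    "backbone": "ResNet-50",
    "epochs": 20,
    "batch_size": 1,
    "optimizer": {
        "lr": 5e-4,
        "lr_decay": {"factor": 0.1, "at_epochs": [12, 16]},
        "momentum": 0.9,
        "weight_decay": 1e-4,
    },
    "anchors": {
        "pyramid_levels": ["P3", "P4", "P5", "P6", "P7"],
        "base_areas": [32 ** 2, 64 ** 2, 128 ** 2, 256 ** 2, 512 ** 2],
        "angle_step_regression": 15.0,  # S_Old, degrees
        "angle_step_matching": 5.0,     # S_New, degrees (times = 3)
    },
}
```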

4.3. Experiment Results

We compared the performance of our proposed model with that of Rotation RetinaNet on the DOTA and HRSC2016 datasets, as shown in Table 2 and Table 3. Our model performs significantly better than the baseline on target categories with large aspect ratios, such as LV in DOTA and ships in HRSC2016, by about 6.96 and 9.90 percentage points, respectively. In Table 2, some categories did not improve, but we still achieved a 1.46 mean AP increase over the baseline. Figure 5 shows the precision-recall curves for the large aspect ratio targets of the two datasets: at the same precision, the recall rates of our model are higher than the baseline's. To verify the superiority of our model for oriented targets in particular angle regions, we also compared the recall improvement on targets whose orientations lie between the anchor angles preset by Rotation RetinaNet, namely targets oriented at 5–10, 20–25, 35–40, 50–55, 65–70 and 80–85 degrees. Rotation RetinaNet treats targets at these angles unequally because it has no rotated anchors that match them as well as targets at other angles. Figure 6 shows that our method clearly improves recall for targets at these angles.

5. Conclusions

In this paper, we proposed a novel algorithm for matching rotated anchor boxes to oriented targets. The algorithm takes advantage of the high matching accuracy of a large number of anchor boxes at different angles: for each oriented anchor, it computes the maximum skew IoU between all anchor copies and the target, and it determines more accurate RRoIs in the matching stage to improve the recall rate for high aspect ratio targets. In the experiments, we used the remote sensing image datasets DOTA and HRSC2016, which are labeled with rotated bounding boxes. For target categories with high aspect ratios, our model achieves higher mAP and a higher recall rate at the same precision. Based on the existing literature and the observed experimental behavior, we proposed an improvement to the multi-angle anchor matching stage. The method requires no change to the structure of existing rotated-anchor matching networks, nor does it depend on optimizing the loss function; it can be applied directly to existing networks. It provides more accurate RoIs for the subsequent coordinate regression and improves the detection recall rate. In future work, we will apply the algorithm to networks under frameworks other than RetinaNet and further explore other ways to separate matching anchors from regression anchors.

Author Contributions

Conceptualization, K.W. and X.W.; methodology, K.W. and B.C.; software, K.W. and B.C.; validation, Y.L.; formal analysis, K.W. and X.W.; writing—original draft preparation, B.C. and X.W.; visualization, B.C.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the China Postdoctoral Science Foundation (2013M540735); in part by the National Natural Science Foundation of China under Grants 61901388, 61301291, 61701360, 61502367, 61501346, 61571345, 91538101, 61801359 and 61401337; in part by the 111 Project under Grant B08038; in part by the Yangtse River Scholar Bonus Schemes; in part by the Ten Thousand Talent Program; in part by the Fundamental Research Funds for the Central Universities; and in part by the Youth Innovation Team of Shaanxi Universities.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
  2. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8231–8240.
  3. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122.
  4. Zhang, T.; Lin, J.; Hu, P.; Zhao, B.; Aly, M.M.S. PSRR-Maxpool NMS: Pyramid Shifted Maxpool NMS with Relationship Recovery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
  5. Ming, Q.; Zhou, Z.; Miao, L.; Zhang, H.; Li, L. Dynamic Anchor Learning for Arbitrary-Oriented Object Detection. arXiv 2020, arXiv:2012.04150.
  6. Liao, M.; Shi, B.; Bai, X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Trans. Image Process. 2018, 27, 3676–3690.
  7. Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
  8. Liao, M.; Zhu, Z.; Shi, B.; Xia, G.-S.; Bai, X. Rotation-Sensitive Regression for Oriented Scene Text Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
  9. Ding, J.; Xue, N.; Long, Y.; Xia, G.-S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  10. Yang, X.; Liu, Q.; Yan, J.; Li, A.; Zhang, Z.; Yu, G. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019.
  11. Lu, R.; Yang, X.; Li, W.; Fan, J.; Li, D.; Jing, X. Robust Infrared Small Target Detection via Multidirectional Derivative-Based Weighted Contrast Measure. IEEE Geosci. Remote Sens. Lett. 2020, 19, 7000105.
  12. Lu, R.; Yang, X.; Jing, X.; Chen, L.; Fan, J.; Li, W.; Li, D. Infrared Small Target Detection Based on Local Hypergraph Dissimilarity Measure. IEEE Geosci. Remote Sens. Lett. 2020, 19, 7000405.
  13. Wang, Y.; Zhang, Y.; Zhang, Y.; Zhao, L.; Sun, X.; Guo, Z. SARD: Towards Scale-Aware Rotated Object Detection in Aerial Imagery. IEEE Access 2019, 7, 173855–173865.
  14. Wu, D.; Wu, C. Research on the Time-Dependent Split Delivery Green Vehicle Routing Problem for Fresh Agricultural Products with Multiple Time Windows. Agriculture 2022, 12, 793.
  15. Chen, H.; Miao, F.; Chen, Y.; Xiong, Y.; Chen, T. A Hyperspectral Image Classification Method Using Multifeature Vectors and Optimized KELM. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2781–2795.
  16. Deng, W.; Zhang, L.; Zhou, X.; Zhou, Y.; Sun, Y.; Zhu, W.; Chen, H.; Deng, W.; Chen, H.; Zhao, H. Multi-Strategy Particle Swarm and Ant Colony Hybrid Optimization for Airport Taxiway Planning Problem. Inf. Sci. 2022, 612, 576–593.
  17. Ren, Z.; Han, X.; Yu, X.; Skjetne, R.; Leira, B.J.; Sævik, S.; Zhu, M. Data-Driven Simultaneous Identification of the 6DOF Dynamic Model and Wave Load for a Ship in Waves. Mech. Syst. Signal Process. 2023, 184, 109422.
Figure 1. (a) Oriented objects with horizontal bounding boxes. (b) Oriented objects with oriented bounding boxes.
Figure 2. The number of anchors matched to a target under the same IoU threshold. The baseline's matching method fluctuates greatly between 4, 6 and 7 matches, with a variance of 1.396, which produces great unfairness in matching. Our method fluctuates little: most targets match 8 anchors, a few match 7 or 10, and the variance is 0.5, which is fairer in matching when the receptive field is appropriate.
Figure 3. Schematic diagram comparing the traditional matching method with ours. The figure shows a simplified process of matching an anchor with a target at an IoU threshold of 0.7. Our method configures two auxiliary anchors for each anchor, computes three IoUs with the target box and takes their maximum.
Figure 4. The auxiliary anchor method improves the scores of targets at non-anchor angles. (a) Baseline score when the target and anchor angles are the same. (b,c) Baseline scores when the target and anchor angles differ: the score decreases by 0.19 as the angle difference grows. (d) Our method's score when the target and anchor angles are the same. (e,f) Our method's scores when the target and anchor angles differ: the score decreases by only 0.02. The method effectively improves a target's score when its angle differs from the anchor's.
Figure 5. Comparison of precision-recall curves between the baseline method and ours. (a) DOTA. (b) HRSC2016.
Figure 6. Recall rates for targets at different angles; the ordinate is the recall rate at the same precision. (a) DOTA. (b) HRSC2016. Our method effectively improves the recall rate when the target angle differs from the anchor angles. When the target angle matches an anchor angle, recall is essentially unchanged on DOTA and still improved on HRSC2016.
Table 1. The relationship between anchor number and NMS latency, from Reference [4].

Anchor Number	NMS Execution Time
200	10 ms
1000	35 ms
Table 2. Detection accuracy (AP for each category) on different objects in DOTA. The short names for the categories are: PL—plane, BD—baseball diamond, BR—bridge, GTF—ground track field, SV—small vehicle, LV—large vehicle, SH—ship, TC—tennis court, BC—basketball court.

Method	PL	BD	BR	GTF	SV	LV	SH	TC	BC
Baseline	80.95	65.22	24.68	61.52	46.80	60.30	60.82	90.41	50.97
Our method	80.86	66.52	22.22	62.98	47.90	67.26	60.79	90.49	55.83
Table 3. Detection accuracy on HRSC2016.

Method	Backbone	mAP
R-RetinaNet	ResNet-50	63.04
Our method	ResNet-50	72.94
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
