Review

UAV Detection and Tracking in Urban Environments Using Passive Sensors: A Survey

1 HDU-ITMO Joint School, Hangzhou Dianzi University, Hangzhou 310018, China
2 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
3 Hangzhou Security and Technology Evaluation Center, Hangzhou 310020, China
4 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11320; https://doi.org/10.3390/app132011320
Submission received: 15 September 2023 / Revised: 6 October 2023 / Accepted: 10 October 2023 / Published: 15 October 2023
(This article belongs to the Special Issue Deep Learning and Edge Computing for Internet of Things)

Abstract:
Unmanned aerial vehicles (UAVs) have gained significant popularity across various domains, but their proliferation also raises concerns about security, public safety, and privacy. Consequently, the detection and tracking of UAVs have become crucial. Among UAV-monitoring technologies, those suitable for urban Internet-of-Things (IoT) environments primarily include radio frequency (RF), acoustic, and visual technologies. In this article, we provide a comprehensive review of passive UAV surveillance technologies, encompassing RF-based, acoustic-based, and vision-based methods for UAV detection, localization, and tracking. Our research reveals that certain lightweight deep learning models for UAV detection have been effectively downsized for deployment on edge devices, facilitating the integration of edge computing and deep learning. In city-wide anti-UAV deployments, the large volume of data produced by the many monitoring facilities integrated into urban infrastructure makes a centralized computing center difficult to realize; performing the computation on edge devices instead enables faster UAV detection. Currently, a wide range of anti-UAV systems have been deployed in both commercial and military sectors to address the challenges posed by UAVs, and we provide an overview of these existing systems. Furthermore, we propose several suggestions for developing general-purpose UAV-monitoring systems tailored for urban environments: considering the specific requirements of the application scenario, integrating detection and tracking mechanisms with appropriate countermeasures, designing for scalability and modularity, and leveraging advanced data analytics and machine learning techniques. To promote further research in the field of UAV-monitoring systems, we have compiled publicly available datasets comprising visual, acoustic, and radio frequency data. These datasets can be employed to evaluate the effectiveness of various UAV-monitoring techniques and algorithms; all of them are linked in the text or in the references, most have been validated in multiple studies, and researchers can find more specific information in the corresponding papers or documents. By presenting this comprehensive overview and providing valuable insights, we aim to advance the development of UAV surveillance technologies, address the challenges posed by UAV proliferation, and foster innovation in the field of UAV monitoring and security.

1. Introduction

With the rapid advancement of technology, unmanned aerial vehicles (UAVs) have found numerous urban IoT (Internet of Things) applications in fields such as rescue operations [1], surveillance [2,3], edge computing [4,5], disaster area reconstruction [6,7], aerial base stations [8,9], intelligent transportation [10,11], wireless power transfer [12], environmental monitoring [13], and more [14,15]. According to market forecasts, the UAV market will experience a compound annual growth rate (CAGR) of 7.9%, expanding from USD 26.2 billion in 2022 to USD 38.3 billion in 2027 [16]. However, the increasing use of UAVs also raises concerns about security, public safety, and privacy. To address these issues, researchers have been exploring various UAV monitoring technologies. UAV surveillance can be achieved through four primary technologies: radar, radio frequency (RF), visual, and acoustic surveillance.
Radar surveillance is a well-established and widely adopted method for airspace monitoring and reconnaissance. It can detect UAVs that lack radio communication or operate fully autonomously. Doppler radar uses the Doppler effect to measure the velocity of moving objects, and micro-Doppler radar is an advanced variant that can additionally detect speed and motion differences within an object. Micro-Doppler radar is an ideal method for detecting UAVs because their propellers create large linear velocity differences [17]. Other applicable radar technologies include non-line-of-sight radar, ultra-wideband radar, millimeter-wave radar [18], and chaotic, mono-static, bi-static [19], and multi-static radars [20]. Because UAVs have low radar cross sections [21] and fly at low speeds, detecting them with radar is difficult and complex [22]. Experimental results indicate that the detection range of radar rarely surpasses 10,000 m [23].
The RF signals emitted by UAVs can be intercepted and analyzed to track and locate them [24]. In many cases, manually operated UAVs communicate with a ground station and a GNSS (Global Navigation Satellite System) for operation, making it possible to intercept signals and obtain information such as coordinates and video feeds. However, autonomous UAVs that rely solely on onboard sensors may not emit RF signals, making them more challenging to locate and track [17]. RF surveillance systems can detect and locate UAVs within a range of 5000 m, but their performance can be degraded by factors such as multipath and non-line-of-sight propagation.
In recent years, computer vision-based methods have become increasingly popular for detecting and monitoring UAVs. By leveraging deep learning techniques, models can be trained to automatically extract appearance and motion features from datasets of UAV images and videos, enabling the identification and tracking of these vehicles using video surveillance from cameras [25]. In addition, infrared cameras can be used to detect and identify UAVs in low ambient light conditions [26]. By combining computer vision techniques with infrared cameras, UAVs can be detected and monitored effectively under various lighting and environmental conditions. Vision-based methods struggle to distinguish UAVs from birds, particularly at long range; in fact, identifying UAVs beyond 1000 m becomes exceedingly difficult, if not impossible.
Sound waves emitted by the power unit and propeller blades during UAV flight can be detected using an acoustic microphone, which converts the pressure of these sound waves into electrical signals. These sound waves create an individualized “audio fingerprint” for each UAV, enabling individual identification. In practice, however, detecting them can be challenging due to factors such as ambient noise and sound wave attenuation. Deep learning models have been applied to improve the effectiveness of this method: trained on datasets of acoustic signatures, they learn to distinguish UAV signals from noise and other interference, enhancing the accuracy and reliability of the approach [27]. Nevertheless, acoustic monitoring is highly susceptible to ambient noise and has a limited detection range; in reported tests, the maximum detection range for UAVs does not exceed 300 m.
Table 1 provides a comprehensive overview of the aforementioned monitoring techniques. It is crucial to acknowledge that the detection distances presented are derived from existing literature and systems, and may exhibit variations based on factors such as UAV type, hardware parameters, and associated algorithms. To improve the effectiveness of UAV recognition, UAV detection systems typically utilize a combination of two to three of the aforementioned technologies [22].
In the past, UAV-monitoring systems were primarily deployed in critical military and civilian facilities such as airports and military bases. However, with the increasing popularity of UAVs, the need for UAV-monitoring systems has expanded to a wider range of settings, including construction sites, communities, shopping malls, schools, and other locations. This has created a demand for UAV-monitoring systems that are more cost-effective, scalable, and responsive. To meet this challenge, researchers have been exploring ways to detect UAVs using lower-cost and passive sensors [28]. The development of software-defined radio has greatly reduced the cost of RF detection, making it more accessible to a broader range of users. In recent years, neural network-enhanced RF-based detection, visual-based detection, and acoustic-based detection have emerged as promising options for general UAV-monitoring systems in urban environments. With the development of lightweight models, some models can now achieve acceptable results with very limited computing resources, which makes it possible to use edge computing for UAV detection. While radar surveillance is highly effective in detecting aircraft, its use is limited to specific locations due to the high cost and radiation associated with the technology. As a result, it may not be suitable for detecting illegal UAVs in urban areas.
Our findings indicate that within the research community, there is a notable divergence in focus regarding anti-UAV research. While some researchers prioritize the development of low-cost and lightweight anti-UAV solutions, the majority of researchers show a greater interest in enhancing the effectiveness of UAV detection and tracking. Our survey aims to address this disparity and raise awareness among researchers regarding the broader anti-UAV requirements in urban environments. We believe there is a pressing need for further exploration of low-cost and passive sensor-based anti-UAV systems. By directing attention towards these areas, we hope to foster increased research and innovation in developing comprehensive anti-UAV solutions that are both cost-effective and capable of meeting the specific challenges posed by urban environments.
This paper provides a comprehensive review of passive UAV surveillance technologies, encompassing RF-based, acoustic-based, and vision-based methods for UAV detection, localization, and tracking. Our main contributions are as follows. First, we review the detection, localization, and tracking literature for each of the three passive sensing modalities, covering both traditional signal processing approaches and deep learning approaches. Second, we show that several lightweight deep UAV detection models have been effectively downsized for deployment on edge devices, and we argue that performing computation at the edge can relieve the data-volume bottleneck that makes a centralized computing center impractical in city-wide anti-UAV deployments. Third, we survey existing military and commercial anti-UAV systems and offer suggestions for developing general-purpose UAV-monitoring systems for urban environments, covering the specific requirements of the application scenario, the integration of detection and tracking with appropriate countermeasures, scalable and modular design, and advanced data analytics and machine learning techniques. Finally, we compile publicly available visual, acoustic, and RF datasets that can be employed to evaluate UAV-monitoring techniques and algorithms; all of these datasets are linked in the text or in the references, most have been validated in multiple studies, and more specific information can be found in the corresponding papers or documents.
The rest of the paper is organized as follows. Section 2 gives a comprehensive review of UAV detection and identification methods. Section 3 describes UAV localization and tracking methods. Anti-UAV systems are introduced in Section 4. Finally, the conclusion is presented in Section 5.

2. UAV Detection and Identification

In this section, we conduct a comprehensive analysis and comparison of RF-based, acoustic-based, and vision-based methods for the detection and identification of UAVs in urban IoT environments. We explore these methods from various perspectives, including traditional methods, deep learning methods, and the available public and semi-public datasets. Additionally, we discuss the specific challenges associated with identifying UAV intrusions in IoT environments and review recent research focused on leveraging edge devices for UAV detection. Table 2 provides a performance comparison of various UAV detection methods, with data obtained from the literature. Deep learning-based vision methods demonstrate good accuracy even when experiments are conducted on public datasets that are not specifically optimized for UAVs. Moreover, recent studies have shown that after fine-tuning, optimization, and training on specialized UAV datasets, the accuracy can exceed 90%, indicating a high level of competitiveness.

2.1. RF-Based Detection

The process of RF-based UAV detection typically involves feature extraction from the received RF signal, followed by comparison of the extracted features with a UAV RF feature database to achieve identification. In recent years, learning-based approaches have gained considerable traction in UAV detection, leading to significant changes in RF-based detection methods. Generally, these approaches fall into two categories. The first extracts signal features using signal processing methods such as the Fourier transform, followed by classification with SVMs, decision trees, and similar classifiers [29,32]. The second applies simple processing to the signal and then uses deep neural networks to extract features for UAV signal identification [33,34,35,36,37]. This section provides an overview of both traditional RF-based detection methods and deep learning-based approaches, and introduces commonly used datasets for UAV detection.

2.1.1. Traditional RF-Based Detection

RF detection involves identifying both the flight control signal and the image transmission signal exchanged between the control station and the UAV. The principle is to collect the raw RF signal and analyze information such as its frequency, symbol rate, modulation type, frequency-hopping behavior, and channel bandwidth, and then extract the ‘fingerprint’ feature that identifies its uniqueness. To improve identification accuracy, researchers often extract multiple features to serve as the ‘fingerprint’ information for UAVs. In this section, we examine the methodologies employed in recent years for extracting UAV spectral features.
The fractal dimension (FD) [29] is a statistical metric that characterizes the complexity of a signal and can also be used to quantify its roughness or irregularity. There are two methods for calculating the FD, namely the time-domain method and the phase-space-domain method. Commonly used FDs include the correlation dimension, the Hausdorff dimension, and the box dimension [40]. The Higuchi algorithm is a highly efficient method for calculating the FD [41]. FD features have been shown to achieve identification accuracies of up to 100%.
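To make the feature concrete, the following is a minimal sketch of the Higuchi algorithm in Python. The function name, the choice of k_max, and the use of NumPy are our own illustrative assumptions rather than the setup of Ref. [41].

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Estimate the fractal dimension of a 1-D signal via the Higuchi algorithm."""
    n = len(x)
    lk = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)          # sub-sampled series x[m], x[m+k], ...
            if len(idx) < 2:
                continue
            # total absolute increment, rescaled by the Higuchi normalization factor
            length = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / ((len(idx) - 1) * k)
            lengths.append(length / k)
        lk.append(np.mean(lengths))
    # FD is the slope of log L(k) versus log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, k_max + 1)), np.log(lk), 1)
    return slope

# Example: Brownian-like noise has an FD close to 1.5
print(higuchi_fd(np.cumsum(np.random.randn(1024))))
```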
The bispectrum is a powerful tool for analyzing nonlinear, non-Gaussian, non-minimum-phase stationary random signals, and it is effective at suppressing white Gaussian noise (WGN). For better online signal classification and object recognition, several bispectrum-based integrated feature extraction methods have been developed [42], including the axially integrated bispectra (AIB) [30] and the square integrated bispectra (SIB) [31]. The AIB and SIB features exhibit accuracies of 98% and 96%, respectively.
The signal frequency spectrum (SFS) [32] is a graphical representation of the frequency distribution of a signal, which can be used to analyze its frequency content. Throughout the flight of a UAV, the continuous rotation of its propellers induces vibrations that propagate throughout the fuselage. By applying the short-time Fourier transform (STFT) to the received wireless signal, UAV and non-UAV signals can be distinguished by the differences in their respective SFS. The accuracy of this approach can reach 97.85%.
Wavelet energy entropy (WEE) [32] measures the change in signal entropy after a wavelet transform. The wavelet transform can capture the sudden, short-term signal changes that occur as the UAV corrects for wind disturbances and moves in three-dimensional space during flight. The accuracy of the WEE method is 93.75%.
Power spectral entropy (PSE) [32] is a signal feature that describes the relationship between the power spectrum and the entropy rate, and it can effectively differentiate between signals. The existing literature indicates that, under the same acquisition environment, UAV signals exhibit significantly higher PSE values than non-UAV signals. With PSE, an accuracy of 83.85% has been achieved.
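As an illustration, PSE can be computed as the Shannon entropy of the normalized power spectrum. The sketch below uses Welch's method via SciPy; the segment length and the base-2 logarithm are our own choices and may differ from the estimator used in Ref. [32].

```python
import numpy as np
from scipy.signal import welch

def power_spectral_entropy(x, fs):
    """Shannon entropy of the normalized power spectrum of signal x."""
    f, pxx = welch(x, fs=fs, nperseg=1024)   # Welch power spectral density estimate
    p = pxx / np.sum(pxx)                    # normalize to a probability distribution
    p = p[p > 0]                             # guard against log(0)
    return -np.sum(p * np.log2(p))

# Example: PSE of one second of synthetic noise sampled at 20 kHz
print(power_spectral_entropy(np.random.randn(20000), fs=20000))
```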
As shown in Table 3, the accuracies of FD, AIB, and SIB are higher than those of SFS, WEE, and PSE. However, the former (FD, AIB, and SIB) can only accurately identify known UAV types, and their identification accuracy is low for unknown UAVs. The latter (SFS, WEE, and PSE) identify UAVs based on the physical characteristics of their flight and maintain a high accuracy rate even for previously unseen UAVs. Utilizing multiple fingerprint features simultaneously can enhance the accuracy of UAV identification.

2.1.2. RF-Based Detection Using Deep Learning

As learning methods continue to evolve, researchers are exploring new ways to improve the effectiveness of signal feature extraction, and a wide range of possibilities for learning-based feature extraction have been explored in the literature. Recent results [43,44] have demonstrated that convolutional neural networks (CNNs) are highly suitable for RF-based detection and classification, highlighting the potential of deep learning in this field. By leveraging the ability of CNNs to automatically extract relevant features from raw signal data, researchers have achieved significant improvements in UAV detection and classification performance.
As shown in Ref. [33], a CNN can directly extract features from compressed sensing signals without requiring reconstruction algorithms to restore the original signal; extracting features in this way is somewhat simpler than with traditional methods, and the final classification accuracy can reach about 99%. In Ref. [34], the authors fed the spectrogram directly into a CNN to extract features, achieving a prediction accuracy of up to 100%. S. Lu et al. [35] extracted features directly from a time-frequency waterfall map by utilizing a network architecture comprising EESP cells (according to the description in this literature, EESP may refer to the ESP, efficient spatial pyramid [21]) and employing the VGG feature extraction method, achieving an accuracy of over 98%. In Ref. [36], after preprocessing the compressive signal using a wavelet transform, the signal was mapped to a two-channel image, which was fed into a VGG-16 model to detect and classify UAVs; using bispectral feature extraction based on the two-dimensional Fourier transform, the authors were able to extract more frequency-domain features. This method is 100% accurate in predicting whether UAVs are present or absent; when classifying different types of UAVs, the accuracy is 98.9%, and when the classification includes the different operating modes of the various UAV types, the accuracy is 90.2%. Li et al. [37] directly converted the two frequency-domain dimensions (f1, f2) and the frequency-domain amplitude |B(f1, f2)| into grayscale images and embedded them into a Siamese network, supported by an image augmentation strategy; the accuracies for UAV type detection and operation mode detection are 98.57% and 92.31%, respectively.
These deep learning-based methods typically convert radio frequency signals into images such as spectrograms, and then use deep learning networks to extract features and identify UAVs. Table 4 presents a comparison of the aforementioned methods.
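The shared pipeline, signal → spectrogram image → CNN classifier, can be sketched as follows. This is a minimal PyTorch illustration of the idea, not any of the specific architectures above; the layer sizes and the 128 × 128 input are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Toy CNN that classifies single-channel spectrogram 'images' of RF signals."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (batch, 1, freq_bins, time_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)    # class logits, e.g., UAV types

# Example: a batch of four 128x128 spectrograms, three hypothetical UAV classes
logits = SpectrogramCNN(n_classes=3)(torch.randn(4, 1, 128, 128))
print(logits.shape)                  # (4, 3)
```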
Recent literature on RF-based UAV detection has identified two major development directions. The first combines UAV physical characteristics with signal processing techniques to extract more effective spectral features, such as wavelet energy entropy (WEE) and the spectral flatness measure (SFM); this approach aims to boost the discriminative capability of the extracted features by leveraging unique physical properties of UAV signals. The second develops network models that extract RF features faster and more accurately, focusing on deep learning architectures that effectively capture the relevant features from the RF signal data. By improving the efficiency and accuracy of feature extraction, both directions aim to enhance the overall performance of RF-based UAV detection systems.

2.1.3. Datasets and Metrics for RF-Based Detection

In this section, public and semi-public datasets in recent years will be reviewed in detail.
There are several publicly available RF-based UAV datasets that researchers can use to develop and evaluate RF-based UAV detection methods. These include the DroneRF dataset [45], Drone remote controller RF signal dataset [46], Radio-Frequency Control and Video Signal Recordings of Drones [47], Dronesignals Dataset [48], DroneDetect Dataset [49], VTIDroneSETFFT [50], and the CARDINAL RF [51]. These datasets contain recorded segments from 3, 17, 10, 9, 7, 3, and 6 different UAVs, respectively, with VTIDroneSETFFT being the only dataset to contain records of the simultaneous operation of multiple UAVs. Unlike vision-based UAV datasets, there are relatively few publicly available RF-based UAV datasets, and RF datasets are frequently quite sizable due to the considerable amount of data acquired. Despite these challenges, RF-based UAV detection methods have shown promising results in detecting and classifying UAVs based on their RF emissions. Researchers interested in accessing these datasets can find the corresponding links in the references section.

2.2. Acoustic-Based Detection

Audio-based UAV detection is another commonly used technical solution for UAV detection. While sound waves and electromagnetic waves are different, audio-based detection and radio-frequency detection share some similarities due to their wave nature. Acoustic-based detection can be broadly categorized into two main types, i.e., traditional acoustic-based detection and acoustic-based detection using deep learning.

2.2.1. Traditional Acoustic-Based Detection

Cepstrum analysis is a widely used technique for analyzing audio signals, and it entails computing the inverse Fourier transform of the logarithmic power spectrum values. Many feature extraction methods for audio signals are developed based on cepstrum coefficients. The commonly used cepstrum coefficients include Linear Predictive Cepstrum Coefficient (LPCC) and Mel Frequency Cepstrum Coefficient (MFCC). By applying these techniques to audio-based UAV detection, researchers have achieved promising results in identifying UAVs based on their unique acoustic features.
The Mel frequency is a non-linear frequency scale designed to model how the human ear perceives sound; it does not correspond linearly to frequency in Hz. The MFCC is a spectral feature computed on the Mel frequency scale, and it has found extensive use in the domain of speech recognition. However, the accuracy of MFCC calculation decreases with increasing frequency, so in practical applications only low-frequency MFCCs are typically used, while medium- to high-frequency MFCCs are discarded. Despite this limitation, MFCC remains the most widely employed audio feature in current UAV recognition tasks, as it has demonstrated effectiveness in identifying UAVs based on their unique acoustic features [38,52,53,54,55]. The accuracy when using MFCC can be over 97%.

MFCC extraction proceeds as follows. The audio signal is first segmented into frames and windowed: the signal is divided into short time intervals, and a window function is applied to each interval to minimize spectral leakage. Each frame then undergoes the fast Fourier transform (FFT) to derive its linear spectrum. The linear spectrum is filtered by a bank of Mel filters spaced according to the Mel frequency scale, and the logarithm of the filtered spectrum is computed to obtain the log-Mel spectrogram. Finally, the discrete cosine transform (DCT) is applied to decorrelate the log-Mel filter-bank energies and transform them into the cepstral domain; a desired number of DCT coefficients are retained to obtain the MFCC, which can be used as a feature for audio-based UAV detection.
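The whole pipeline is available off the shelf; the sketch below computes MFCCs with librosa. The file name, frame sizes, and number of coefficients are illustrative assumptions; only 13 low-order coefficients are kept, consistent with the practice of discarding mid- to high-frequency MFCCs described above.

```python
import librosa

# Hypothetical recording; sr=None keeps the file's native sample rate.
y, sr = librosa.load("uav_clip.wav", sr=None)

# Framing/windowing, FFT, Mel filter bank, log, and DCT are all performed internally.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr,
    n_mfcc=13,       # retain only the low-order cepstral coefficients
    n_fft=2048,      # frame length for the FFT
    hop_length=512,  # frame step
    n_mels=40,       # size of the Mel filter bank
)
print(mfcc.shape)    # (13, n_frames): one 13-dim feature vector per frame
```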
Linear predictive cepstral coefficients (LPCC) are another commonly used feature in audio signal analysis [56] and can be viewed as a variant of MFCC. Like MFCC, LPCC extraction involves preprocessing, framing, filter-bank processing, cepstrum computation, the DCT, and other steps applied to the speech signal. However, in the filter-bank processing stage, LPCC uses linear prediction (LP) filters instead of triangular filter banks to extract the resonance-peak information of the signal. The LP filters can extract formant information more accurately, making LPCC more effective than MFCC in the presence of Gaussian white noise; in practice, however, this benefit must be balanced against the high complexity of the LP filters and the corresponding increase in computational cost. In the literature, LPCC is usually used as a supplement to MFCC, and the two features are commonly combined to improve the performance of UAV recognition [53].

2.2.2. Acoustic-Based Detection Using Deep Learning

Acoustic-based UAV detection has two primary directions of research. The first direction involves accurately and quickly separating sound sources and identifying UAV acoustic signals in noisy environments, which can be achieved using techniques such as ICA and fastICA [57]. The second direction focuses on exploring UAV detection in time-varying scenarios, which is important due to the rapid movement of UAVs and the need to track their acoustic signatures over time [57]. While acoustic-based UAV detection is less commonly used in commercial and military anti-UAV systems due to its short detection range, this approach still has significant potential for application in urban IoT environments such as residential areas, schools, and commercial areas. By leveraging the unique acoustic signatures of UAVs, this approach can help to identify and track UAVs in real-time, providing valuable information for UAV detection and countermeasure systems. Future research in this area may focus on developing more accurate and robust acoustic-based UAV detection methods that can function across a range of acoustic environments and under various operating conditions of UAVs.
Independent Component Analysis (ICA) is a computational method that can be employed to decompose multivariate signals into additive subcomponents. This is done under the assumption that the subcomponents are non-Gaussian and statistically independent of one another. ICA is a special case of blind source separation, and it can be used to separate acoustic signals from mixed recordings. ICA’s ability to separate signals that are mixed with strong interference sources makes it a promising technique for detecting single or multiple UAVs when the signals are contaminated with noise [57]. In practice, ICA can be used as a part of signal preprocessing and combined with other analysis methods. For example, in Refs. [53,57], a combination of ICA and MFCC is used for UAV identification, with ICA used to preprocess the audio signals and extract independent acoustic sources before applying MFCC for feature extraction.
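The sketch below shows the preprocessing role ICA plays in this pipeline, using scikit-learn's FastICA on a toy two-channel mixture; the synthetic signals and mixing matrix are illustrative stand-ins, not data from Refs. [53,57].

```python
import numpy as np
from sklearn.decomposition import FastICA

# Build a toy two-microphone recording: a rotor-like tone mixed with broadband noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
s1 = np.sin(2 * np.pi * 180 * t)               # stand-in for a rotor harmonic
s2 = rng.normal(size=t.shape)                  # stand-in for background noise
A = np.array([[0.7, 0.3], [0.4, 0.6]])         # hypothetical mixing matrix
X = np.c_[s1, s2] @ A.T                        # (n_samples, 2) mixed channels

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                   # columns: estimated independent sources
# Each recovered source can then be passed to MFCC extraction, as in Refs. [53,57].
```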
In addition to traditional signal processing techniques, learning-based methods are also widely used for feature extraction in audio-based UAV detection. Recent research [38,39] has shown that deep learning models can extract features directly from UAV acoustic signals. In Ref. [27], the authors converted audio clips into spectrograms and used them as input for deep learning algorithms, extracting multiple features from the spectrograms generated by the audio signals to train the models. In Ref. [39], the received audio signal was converted into a Mel spectrum by the short-time Fourier transform, recorded as an image, and then input into a specially designed lightweight convolutional neural network to extract signal features. In Ref. [38], log-Mel spectrograms and MFCC were used as inputs to a CNN to classify the acoustic signals as either indicative of UAV activity or not. Table 5 presents the accuracy of these methods and showcases their effectiveness in detecting UAVs using acoustic signals.

2.2.3. Datasets and Metrics for Acoustic-Based Detection

To evaluate the performance of acoustic-based detection, the following two datasets are usually used: the DroneAudioDataset [58] (available from the GitHub repository https://github.com/saraalemadi/DroneAudioDataset, accessed on 1 June 2023) and Casabianca’s Dataset [59] (available from the GitHub repository https://github.com/pcasabianca/Acoustic-UAV-Identification, accessed on 1 June 2023). The DroneAudioDataset contains recordings of UAV propeller noise in an indoor environment, with the audio clips categorized as either ’Drone’ or ’Unknown’; the ’Unknown’ category consists of white noise and silent clips that do not contain any UAV signals. In contrast, Casabianca’s Dataset contains recordings of a variety of UAVs against a range of background sounds, including airplanes, helicopters, traffic, thunderstorms, wind, and quiet environments. These datasets provide a valuable resource for researchers to develop and evaluate audio-based UAV detection methods under different acoustic conditions.
Since the number of acoustic UAV datasets is limited, researchers often have to create their own datasets by recording UAV acoustic signals from publicly available videos or by conducting their own experiments. This involves using appropriate recording equipment, such as microphones, and selecting suitable recording locations and environmental conditions to capture a diverse range of UAV acoustic signatures. Despite the challenges associated with creating and using self-collected datasets, this approach allows researchers to tailor their experiments to specific application scenarios and to evaluate the performance of their proposed methods under realistic conditions. However, it is important to ensure that the datasets are properly annotated and validated to ensure the accuracy and reliability of the results.

2.3. Vision-Based Detection

Object detection is a crucial task in computer vision that involves detecting instances of certain types of visual objects (e.g., animals, people, or plants) in an image. By building computational models and techniques to detect objects, computer vision applications can answer a basic question: what object is where? Object detection is widely recognized as a fundamental challenge, and it provides the basis for numerous associated tasks; for instance, object detection methods can be extended to instance segmentation, image captioning, and target tracking. From an application standpoint, research on object detection can be broadly categorized into two primary topics: “general object detection”, which aims to detect various types of objects within a unified framework, mimicking the visual perception and recognition capabilities of human beings, and “detection applications”, which focuses on designing specialized approaches tailored to specific scenarios, such as pedestrian detection, face detection, and other relevant domains. Over the past few years, rapidly advancing deep learning techniques have injected new dynamism into object detection, bringing significant breakthroughs, and object detection has become increasingly popular across a diverse spectrum of real-world applications, including path planning, robot vision, and video surveillance. The following subsections introduce typical object detection methods, covering both traditional methods and deep learning-based approaches, as well as the UAV object detection datasets in common use. Table 6 shows the results of some state-of-the-art methods on different datasets.

2.3.1. Traditional Object Detection

Viola Jones Detector

The VJ detector [60] relies on a simple yet effective object detection technique known as sliding windows. As part of this method, the window’s position and scale are scanned in order to determine if there is a human face in any of the windows. Although the sliding windows technique used by the VJ detector appears to be straightforward, the computational demands of this approach were beyond the capabilities of computers at the time of its development. This method combines three essential techniques: “integral image”, “feature selection”, and “detection cascade”, which significantly enhance the speed of detection.

HOG Detector

HOG [61] can be regarded as a significant improvement over previous techniques such as scale-invariant feature transforms and shape contexts. It is computed on a dense grid of uniformly spaced cells in order to balance feature invariance (including translation, scaling, illumination, and other factors) against nonlinearity and the need to distinguish between different object classes. Classification accuracy is further improved by applying overlapping local contrast normalization over “blocks”. Although HOG is capable of detecting objects of various classes, it was primarily motivated by the pedestrian detection problem. To detect objects of different sizes, HOG detectors rescale the input image several times while keeping the detection window size fixed. For many years, the HOG descriptor has been a key component of many object detectors across a variety of computer vision applications.
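For reference, a minimal sketch of extracting HOG features with scikit-image is given below. The sample image and parameter values are illustrative; in the original detector, a linear SVM would be trained on such descriptors over sliding windows.

```python
from skimage import color, data
from skimage.feature import hog

# Any grayscale image works; here we use a bundled sample image for illustration.
img = color.rgb2gray(data.astronaut())

# 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks with L2-Hys normalization,
# matching the classic HOG configuration for pedestrian detection.
features, hog_image = hog(
    img, orientations=9, pixels_per_cell=(8, 8),
    cells_per_block=(2, 2), block_norm="L2-Hys", visualize=True,
)
print(features.shape)  # flattened descriptor; a linear SVM classifies these vectors
```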

2.3.2. Deep Learning Based Object Detection

RCNN

It is easy to understand how RCNN (Regions with CNN features) [62] works. Initially, selective search is used to extract a set of object candidate boxes. Each candidate box is then rescaled to a fixed-size image and fed into a CNN model pretrained on ImageNet, such as AlexNet, to extract features. The final step is to determine whether objects are present within each region and to identify object classes using a linear SVM classifier. RCNN demonstrated a substantial increase in mean average precision (mAP), reaching 58.5% on VOC07.

SPPNet

SPPNet (Spatial Pyramid Pooling Network) was proposed by K. He et al. [63] in 2014 to address the limitation of fixed-size inputs in CNNs. Its major contribution is the Spatial Pyramid Pooling (SPP) layer, which allows a CNN to generate a fixed-length representation regardless of the image size, without rescaling. SPPNet also avoids repeated computation of convolutional features by computing the feature map only once over the entire image. When used for object detection, it generates a fixed-length representation for any region, making it unnecessary to rescale the image or the region of interest. With this approach, SPPNet detects objects more than 20 times faster than RCNN without compromising detection accuracy (VOC07 mAP = 59.2%).

Fast RCNN

In 2015, R. Girshick proposed the Fast RCNN detector [64], which improves upon RCNN and SPPNet. With Fast RCNN, both the bounding box regressor and the object classifier are trained within the same network configuration, an improvement over the two-stage training used by RCNN and SPPNet. With its streamlined architecture, Fast RCNN provides a significant speedup over previous methods, detecting objects more than 200 times faster than RCNN while still achieving high accuracy: it improved the mAP from 58.5% (RCNN) to 70.0% on the VOC07 dataset.

Faster RCNN

S. Ren et al. introduced the Faster RCNN detector [65] shortly after Fast RCNN made its debut in 2015. Faster RCNN is considered the first end-to-end, near-real-time deep learning detector. It achieved a COCO mAP@.5 of 42.7%, a COCO mAP@[.5, .95] of 21.9%, a VOC07 mAP of 73.2%, and a VOC12 mAP of 70.4%. By integrating proposal detection, feature extraction, and bounding box regression into a single framework, Faster RCNN provides a complete solution for object detection. Today, Faster RCNN is widely used for object detection due to its enhanced speed and accuracy.

YOLO

YOLO (You Only Look Once) was proposed by R. Joseph et al. in 2015 as a one-stage object detector [66]. YOLO is known for its remarkable speed, with the fast version running at 155 fps (VOC07 mAP = 52.7%) and the enhanced version at 45 fps (VOC07 mAP = 63.4%, VOC12 mAP = 57.9%). Unlike previous object detection methods that followed a “proposal detection + verification” paradigm, YOLO applies a single neural network to the whole input image: the image is divided into a grid of cells, and each cell predicts bounding boxes and class probabilities. However, when it comes to localizing small objects, its accuracy falls short of two-stage detectors. YOLO is very popular in the field of UAV detection because of its extremely fast speed and relatively high accuracy, and there have been many studies on YOLO-based UAV detection in recent years [67,68].
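For a sense of how such YOLO-based UAV detection studies are typically set up, the sketch below fine-tunes a modern YOLO descendant using the third-party ultralytics package; the dataset configuration file uav.yaml and the image path are hypothetical placeholders, and the original 2015 network used a different toolchain.

```python
from ultralytics import YOLO

# Load a small pretrained model and fine-tune it on a UAV dataset.
# "uav.yaml" is a hypothetical dataset config (image paths + class names).
model = YOLO("yolov8n.pt")
model.train(data="uav.yaml", epochs=100, imgsz=640)

# Inference on a single frame; "frame.jpg" is a placeholder image path.
results = model("frame.jpg")
for box in results[0].boxes:
    print(box.xyxy, box.conf)   # predicted box corners and confidence
```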

SSD

W. Liu et al. presented SSD [69] in 2015 as the second one-stage detector of the deep learning era. It introduced multi-reference and multi-resolution detection techniques that greatly enhance detection accuracy, especially for small objects. SSD outperformed previous detectors in both speed and accuracy, with a VOC07 mAP of 76.8%, a VOC12 mAP of 74.9%, a COCO mAP@.5 of 46.5%, an mAP@[.5, .95] of 26.8%, and a fast version running at 59 fps. Unlike previous detectors, which ran detection only on the top layer of the network, SSD detects objects at five different scales on different layers.

RetinaNet

RetinaNet [70], proposed by Lin et al. in 2018, is a one-stage detector that overcomes the challenge of training a single-stage detector under an extreme foreground-background class imbalance. While two-stage detectors achieve higher accuracy by filtering dense candidate locations in the proposal stage, single-stage detectors have been limited by the difficulty of training on an unbalanced set of positive and negative examples. To address this problem, a novel loss function called “focal loss” was proposed, which alters the standard cross-entropy loss so that the detector focuses more on difficult, misclassified examples during training, suppressing the loss contributed by well-classified, easy examples. The use of focal loss greatly improves the accuracy of RetinaNet, allowing it to achieve a COCO mAP@.5 of 59.1% and an mAP@[.5, .95] of 39.1% while maintaining the speed of previous single-stage detectors.
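The binary form of focal loss is compact enough to state directly; the PyTorch sketch below follows the published formulation, with alpha = 0.25 and gamma = 2 as the commonly reported defaults.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross-entropy down-weighted for well-classified examples.

    logits, targets: tensors of the same shape; targets hold 0/1 labels.
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma -> near 0 for easy examples, near 1 for hard ones
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: random logits against a heavily imbalanced label vector
loss = focal_loss(torch.randn(1000), (torch.rand(1000) < 0.01).float())
```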

2.4. Datasets and Metrics for Vision-Based Detection

In this section, public and semi-public datasets in recent years will be reviewed.

2.4.1. Well-Known Datasets

The construction of larger datasets with smaller biases is crucial for the development of advanced computer vision algorithms. Over the last decade, numerous benchmarks and datasets have been released for object detection, including the PASCAL VOC challenge [71,72], the MS-COCO detection challenge [73], and others. Table 6 shows the detection accuracies of the above deep learning methods on the VOC07, VOC12, and MS-COCO datasets.

2.4.2. Object Detection Dataset for UAVs

The main image-based UAV datasets are the Real World dataset [74], the Det-Fly dataset [75], the MIDGARD dataset [76], the USC-Drone dataset [77], and the DUT-Anti-UAV dataset [78]. These five datasets contain 56,821, 13,271, 8775, 18,778, and 10,109 images, respectively, at resolutions of 640 × 480, 3840 × 2160, 752 × 480, and 1920 × 1080 pixels, with DUT-Anti-UAV at various resolutions. Their samples are shown in Figure 1. Compared to other UAV datasets, the Real World dataset stands out for its diversity in UAV types and environments, despite the low resolution of its images; this is because all of its data are obtained from YouTube videos, whereas other datasets are typically collected by the researchers themselves. The Det-Fly dataset addresses the shortcomings of single-view UAV data by providing multi-view data that covers a wider range of angles and perspectives. The MIDGARD and USC-Drone datasets also pair a single type of UAV with rich environments, but they are limited by having a single viewpoint. The DUT-Anti-UAV dataset contains more than 35 types of UAVs against various backgrounds, including sky, dark clouds, jungle, high-rise buildings, residential buildings, farmland, and playgrounds; various lighting conditions (dawn, day, dusk, and night) as well as different weather conditions are also covered. These datasets can be accessed through the following links: https://github.com/Maciullo/DroneDetectionDataset, https://github.com/Jake-WU/Det-Fly, https://mrs.felk.cvut.cz/midgard, https://github.com/chelicynly/A-Deep-Learning-Approach-to-Drone-Monitoring and https://github.com/wangdongdut/DUT-Anti-UAV, accessed on 1 June 2023.
There are also publicly available video-based UAV datasets, such as Anti-UAV [79] and USC-GRAD-STDdb [80]. The DUT-Anti-UAV dataset likewise contains 24,804 video frames, mostly at a resolution of 1920 × 1080 pixels. The Anti-UAV dataset comprises 318 RGB-T pairs, each consisting of an RGB video and a thermal video; these videos were recorded at 25 FPS and feature various UAVs flying in the sky. It is publicly available at https://github.com/ucas-vg/Anti-UAV, accessed on 1 June 2023. Meanwhile, the USC-GRAD-STDdb dataset contains 115 video segments with a total of over 25,000 annotated frames in HD 720p resolution, and is available at https://citius.usc.es/t/usc-grad-stddb, accessed on 1 June 2023.

2.5. Anti-UAV in IoT Environments

With the development of UAV technology, UAVs have become an important part of the modern IoT. Anonymous UAVs may hijack, steal, or disrupt data transmission within the IoT. Anti-UAV efforts in IoT environments therefore face new application scenarios, such as detecting only the UAVs that attempt to invade the IoT rather than all UAVs that enter an area. Moreover, the IoT contains a great number of edge devices, so detecting and tracking UAVs with the help of edge computing is also a research direction. This subsection reviews recent research on countering UAVs in IoT environments.

2.5.1. Detect and Identify UAVs Trying to Hack IoT

The anti-UAV methods discussed above all target UAVs that intrude into areas of interest. In an IoT environment, however, one sometimes only wants to detect and identify anonymous UAVs that attempt to invade the IoT itself. A blockchain-based UAV authentication network is an effective solution [81,82].
BANDA [81] (Blockchain-Assisted Network for Drone Authentication) is a system that leverages blockchain technology to enhance the authentication and security of UAVs. It operates by accepting real-time inputs about UAV information and add-on packages through cloud-based infrastructure. These inputs are then verified using proof-of-authority algorithms and smart contracts. The decentralized nature of BANDA is achieved through the deployment of systems on UAVs, ground control stations, and UAV defense systems. This distributed network allows for secure authentication and prevents malicious attacks and intrusions from anonymous UAVs. By utilizing a blockchain-based approach, BANDA ensures the integrity and authenticity of UAV information, enabling reliable identification and verification processes. This helps in mitigating security risks associated with unauthorized or malicious UAV activities. The optimized application of BANDA enables robust protection against potential threats and enhances the overall security of UAV operations.
RBFNNs [82], radial basis function neural networks, form the basis of a blockchain-based model that enhances data integrity and storage capabilities, enabling intelligent decision-making across different Internet of Drones (IoD) environments. By leveraging blockchain technology, this model facilitates decentralized predictive analytics and the application of deep learning methods in a distributed manner, and it proves to be a feasible and effective approach for the IoD environment. The use of blockchain in developing decentralized predictive analytics ensures data integrity and enables the secure sharing of deep learning methods; this aligns well with the requirements and constraints of network intrusion detection, making RBFNNs a suitable choice for developing classifiers in such scenarios. By combining the strengths of blockchain and deep learning, the RBFNN model provides a robust and secure framework for intelligent decision making in the IoD environment, offering reliable and efficient network intrusion detection while maintaining data integrity and compliance with system constraints.

2.5.2. Anti-UAV with Edge Computing

Edge computing stores and processes data on edge devices, offering fast data processing and analysis and strong real-time performance. However, edge devices have limited computing power, so they can only perform lightweight operations. Currently, edge computing's application in UAV detection and tracking primarily revolves around sensor data fusion, which effectively reduces data storage and bandwidth requirements while improving latency and response time. An example of a commercially available multi-sensor fusion networking device is Droneshield's SmartHub Mk2. Furthermore, the literature has explored lightweight deep network models that enable quick and accurate UAV detection and tracking within the constraints of limited computational resources. These studies shed light on the potential of leveraging edge computing for UAV detection and tracking.

Carolyn J. Swinney et al. [34] introduced a cost-effective early warning system for UAV detection and classification. The system is composed of a BladeRF software-defined radio (SDR), a wideband antenna, and a Raspberry Pi 4, which together form an edge node. Remarkably, this setup is designed to be affordable, with a total cost of under USD 540. It produced an overall accuracy of 100% for two-class detection and 90.9% for UAV type classification on the UAVs tested; the inference times range from 15 to 28 s for two-class detection and from 18 to 28 s for six-class UAV type classification.

RF-UAVNet [83] is a lightweight RF-based convolutional neural network. Its grouped convolution layers significantly reduce network size and computing cost, while multi-level skip connections and multi-gap mechanisms effectively improve accuracy. Notably, it achieves an accuracy of approximately 99.9% for UAV detection, 98.6% for UAV classification, and 95.3% for operation recognition, with a complexity of merely 11,000 parameters.

TIB-Net [84] introduces a cyclic pathway in the iterative backbone to keep the model size lightweight while utilizing low-level feature information, and its integrated spatial attention module further improves performance. TIB-Net stands out not only for its compact size but also for its efficiency: with a model size of less than 700 KB and only 0.1 million parameters, it achieves notable results (approximately 89.2% for UAV detection) while maintaining a lightweight structure.

In addition, other lightweight models, such as the vision-based MOB-YOLO [85], can be deployed on edge nodes. However, the accuracy of MOB-YOLO is relatively low at 49.62%; it is reasonable to assume that this model could achieve higher accuracy if trained on a dedicated UAV dataset, and further research and training may enhance its performance in UAV detection and classification tasks. According to our observation, the development of these lightweight models mainly involves measures that reduce model size and computation while retaining as much low-level information as possible, together with mechanisms that improve accuracy.
In the future, city-wide anti-UAV systems are poised to become integral to the development of smart cities. However, the vast amount of surveillance data generated in a city’s airspace poses a significant challenge for a centralized computing center. To address this challenge, deploying detection models directly on edge devices and performing UAV detection there can greatly reduce data transmission requirements and expedite the detection process.

3. UAV Localization and Tracking

UAV localization and tracking play a crucial role in anti-UAV research. This section reviews UAV localization and tracking methods. UAV localization research primarily concentrates on RF- and acoustic-based methods, which are preferred for their ability to accurately determine the location of UAVs. For UAV tracking, we categorize the studies into filter-based approaches and deep Siamese network approaches: filter-based methods utilize various filters to track UAVs, while deep Siamese network approaches leverage deep learning techniques. The advancements made in these research areas are discussed in detail, highlighting the progress and innovations achieved in UAV tracking.

3.1. UAV Localization

RF-based positioning technology has become increasingly mature, leading to numerous studies on RF-based UAV positioning. RF sensors are the only technology that can locate both the UAV and the pilot. On the other hand, acoustic-based UAV positioning technology is feasible for short-distance detection [22]. Although acoustic-based UAV localization technology started later than other methods, it has achieved remarkable results in recent years. In contrast, vision-based UAV tracking technology has its advantages, but due to the rapid movement of UAVs, the effect of visual positioning technology is not ideal. This section focuses on reviewing the RF and acoustic-based UAV localization technologies.

3.1.1. RF-Based Localization

There are a variety of RF-based localization methods, such as angle of arrival (AOA) measurement, time of arrival (TOA) measurement, time difference of arrival (TDOA) measurement, and received signal strength (RSS) measurement. TOA localization requires both the transmitter and receiver clocks to be synchronized, which is typically difficult to accomplish in anti-UAV scenarios. Similarly, TDOA positioning relies on calculating the difference in signal arrival times at the receivers, which can be challenging with widely spaced receivers in an anti-UAV system. The accuracy of RSS-based positioning is generally low, so this method has limited application in UAV positioning. Consequently, most studies of UAV localization have focused on AOA methods [29,86]. Despite the challenges associated with RF-based UAV positioning, these methods have shown promising results and have the advantage of being able to locate both the UAV and the pilot. Future research may focus on developing more accurate and robust RF-based UAV localization methods that can operate in a range of environments and under various UAV operating conditions.
In recent years, various parametric models have been proposed to compute AOA, such as beamforming, subspace-based methods, cross-correlation-based methods, maximum likelihood methods, and sparse array processing methods. These methods are based on pre-established models for position estimation, which can be cumbersome and computationally intensive. Figure 2 illustrates the specific positioning model. To address these issues, many studies have begun to use machine learning approaches to solve the AOA estimation problem [87,88]. From experimental results, machine learning-based approaches have shown superiority over traditional methods in terms of processing speed and accuracy. However, these methods may only achieve satisfactory results when the distribution of the training set and the test set are the same, which can be difficult to achieve in practical applications where it is challenging to cover all possible scenarios in the training set. To address these challenges, researchers are now using deep neural networks for emission source localization [86,89]. Deep Convolutional Neural Networks have been used for enhancing angle classification accuracy, as they allow high-level features to be learned from high-dimensional unstructured data at multiple scales [90]. Compared to traditional MD-based and ML-based methods, CNN-based methods have demonstrated superior accuracy levels, according to simulation results. Table 7 [90] displays the advantages and disadvantages of the methods.
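To ground the discussion, the following is a minimal sketch of one classical subspace-based AOA estimator (MUSIC) for a uniform linear array; the array geometry, grid resolution, and the toy single-source simulation are our own illustrative assumptions.

```python
import numpy as np

def music_aoa(X, n_sources, d=0.5, n_grid=361):
    """MUSIC angle-of-arrival sketch for a uniform linear array.

    X: (n_antennas, n_snapshots) complex baseband samples.
    d: element spacing in wavelengths.
    """
    m = X.shape[0]
    R = X @ X.conj().T / X.shape[1]             # sample covariance matrix
    _, vecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    En = vecs[:, : m - n_sources]               # noise subspace
    thetas = np.linspace(-90, 90, n_grid)
    spectrum = []
    for th in thetas:
        a = np.exp(-2j * np.pi * d * np.arange(m) * np.sin(np.deg2rad(th)))
        spectrum.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    return thetas[int(np.argmax(spectrum))]     # peak of the MUSIC pseudospectrum

# Toy check: one narrowband source at 20 degrees, 4-element array, light noise
m, T = 4, 200
steer = np.exp(-2j * np.pi * 0.5 * np.arange(m) * np.sin(np.deg2rad(20)))
s = np.exp(2j * np.pi * np.random.rand(T))
X = np.outer(steer, s) + 0.05 * (np.random.randn(m, T) + 1j * np.random.randn(m, T))
print(music_aoa(X, n_sources=1))                # approximately 20
```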

3.1.2. Acoustic-Based Localization

There are two primary strategies for acoustic-based UAV positioning: AOA and TDOA [97]. The AOA method has been introduced in the previous section, and in this section, we focus on the TDOA method. The TDOA-based sound source localization method is implemented using a microphone array to find the three-dimensional position of the sound source. The TDOA source estimation involves a two-step procedure. First, the time delay between pairs of microphones in the array is estimated using the generalized cross-correlation function. This function measures the time difference between the arrival of a sound wave at different microphones in the array. Once the time delays have been estimated, they can be used as input to an equation that relates the time delays to the source location. Inverting this equation yields an estimate of the source location. The accuracy of the estimation depends on the quantity and distribution of microphones in the array, as well as the characteristics of the sound wave being detected [98,99]. The time complexity of modern TDOA-SSL methods is generally low for three-dimensional estimation of the source position, but multipath propagation and environmental noise can affect its positioning process. To address these challenges, some researchers have proposed using machine learning methods to limit these effects [100], while others have used bionic methods to improve accuracy [101].
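As an illustration of the first step, the time delay between a pair of microphones is commonly estimated with the PHAT-weighted generalized cross-correlation; the sketch below is a minimal NumPy version, and the sign convention and interpolation-free peak picking are simplifications.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay (in seconds) of 'sig' relative to 'ref' via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase information only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # > 0 when 'sig' lags 'ref'

# Toy check: the second microphone hears the same waveform 40 samples later
fs = 16000
x = np.random.randn(fs)
delayed = np.concatenate((np.zeros(40), x))[:fs]
print(gcc_phat(delayed, x, fs) * fs)        # approximately 40 samples
```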
Acoustic-based UAV localization achieves a maximum detection range of about 100 m when the SNR is below −5 dB [100]. Under far-field conditions, however, acoustic localization may fall short of expectations; in such cases, RF-based localization can compensate. The use of multiple complementary technologies within an anti-UAV system is therefore necessary to achieve effective UAV detection and tracking.

3.2. UAV Tracking

UAV tracking is an object tracking task that involves estimating the state and trajectory of the UAV. Object tracking (OT) is a fundamental open problem that has been extensively studied over the past decade. Two prominent paradigms for OT are filter-based methods and deep Siamese networks (SNs) [102]. In this section, we review recent literature on UAV tracking and discuss the advancements made in this field.

3.2.1. Filter-Based Trackers

The core of the prediction problem is to compute a probability distribution over the system state, which is particularly challenging when only part of the data is observed. In such cases, it is necessary to solve for a conditional distribution, also known as the posterior distribution. In time-varying systems, the current posterior distribution is often referred to as the filtering distribution; in object tracking, for example, inference about the current state of the system is precisely such a filtering task. Bayesian filtering is a widely used recursive framework for solving filtering problems. The Kalman filter is a popular Bayesian filter based on a linear-Gaussian state-space model, and it solves linear filtering problems recursively. The small memory footprint and high speed of the Kalman filter make it well suited to real-time applications and embedded systems, as it only needs to retain the previous state and can efficiently update the estimated state in real time. Different types of Kalman filters are among the most important technologies for moving-target tracking [103,104,105]. In these studies, the Kalman filter is typically used for tracking after UAV detection and recognition; in Ref. [105], for instance, the authors focus on tracking a large number of targets simultaneously using the Kalman filter.
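A minimal constant-velocity Kalman filter for 2D UAV tracking, written in Python/NumPy, illustrates the recursive predict-update cycle described above. The state layout, noise levels, and position-only measurement model are illustrative assumptions rather than values from the cited studies.

```python
import numpy as np

dt = 0.1                      # frame interval (s), assumed
F = np.array([[1, 0, dt, 0],  # constant-velocity transition: x' = x + vx*dt
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],   # the sensor observes position only
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)          # process noise (maneuvering uncertainty)
R = 4.0 * np.eye(2)           # measurement noise (e.g., detector jitter)

x = np.zeros(4)               # state: [px, py, vx, vy]
P = 100.0 * np.eye(4)         # initial uncertainty

def kf_step(x, P, z):
    # Predict: propagate state and covariance through the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: fuse the new position measurement z.
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Feed noisy detections of a UAV moving at (5, 2) m/s.
rng = np.random.default_rng(0)
for k in range(50):
    truth = np.array([5.0, 2.0]) * k * dt
    z = truth + rng.normal(0, 2.0, size=2)
    x, P = kf_step(x, P, z)
print(x[:2], x[2:])           # position and velocity estimates
```

Measurements from additional sensors can be fused in the same way, by applying one update step per sensor with that sensor's own H and R.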
In anti-UAV systems and UAV detection and tracking tasks, there is broad consensus that different types of sensors should be combined, as they complement each other and greatly improve detection performance. The Kalman filter is also known for its strengths in multi-sensor data fusion, which is one reason it is so widely used in UAV tracking tasks: by integrating information from multiple sensors, it can provide more accurate and reliable estimates of UAV states and trajectories. The Kalman filter therefore plays a crucial role in improving the effectiveness of multi-sensor anti-UAV systems.
Discriminative correlation filters (DCFs) are widely used in visual tracking tasks. The DCF-based tracking approach trains correlation filters online over regions of interest by minimizing a least-squares loss; the trained filters are then applied to consecutive frames via the fast Fourier transform (FFT) to detect the object [102]. DCFs have also proven highly competitive in vision-based UAV tracking tasks [106,107,108]. By exploiting discriminative information between the target object and the background, DCF-based methods achieve high tracking accuracy and robustness in challenging scenarios such as occlusion and deformation. As a result, DCF-based methods have gained significant attention in the field of UAV tracking and have been shown to outperform many other state-of-the-art methods.
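The NumPy sketch below shows the core of a MOSSE-style correlation filter, one simple DCF variant: the filter is trained in the Fourier domain by ridge regression toward a Gaussian response peaked on the target, then evaluated on a new frame by element-wise multiplication (i.e., FFT-domain correlation). Window size, the regularizer, and the learning rate are illustrative assumptions, and real trackers add windowing and feature extraction omitted here.

```python
import numpy as np

def gaussian_response(h, w, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch center."""
    ys, xs = np.mgrid[:h, :w]
    return np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2) / (2 * sigma**2))

class MosseFilter:
    def __init__(self, patch, lam=1e-2, lr=0.125):
        self.lam, self.lr = lam, lr
        G = np.fft.fft2(gaussian_response(*patch.shape))
        F = np.fft.fft2(patch)
        self.A = G * np.conj(F)          # numerator   (cross power)
        self.B = F * np.conj(F) + lam    # denominator (auto power + ridge)

    def detect(self, patch):
        """Correlate the filter with a new search patch via the FFT."""
        resp = np.real(np.fft.ifft2((self.A / self.B) * np.fft.fft2(patch)))
        dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
        h, w = patch.shape
        return dy - h // 2, dx - w // 2, resp   # offset from the patch center

    def update(self, patch):
        """Online update: running average of numerator and denominator."""
        G = np.fft.fft2(gaussian_response(*patch.shape))
        F = np.fft.fft2(patch)
        self.A = (1 - self.lr) * self.A + self.lr * G * np.conj(F)
        self.B = (1 - self.lr) * self.B + self.lr * (F * np.conj(F) + self.lam)

# Usage: initialize on the first target patch, then detect/update each frame.
patch0 = np.random.rand(64, 64)          # stand-in for a cropped UAV patch
tracker = MosseFilter(patch0)
dy, dx, _ = tracker.detect(np.roll(patch0, (3, 5), axis=(0, 1)))  # ~ (3, 5)
```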

3.2.2. Deep Siamese Networks

Deep Siamese networks (SNs) have become a popular approach for learning similarities between a target template and search image regions in visual tracking tasks. With SNs, real-time applications can fully leverage end-to-end learning and overcome the limitations of pretrained CNN features. Siamese trackers are trained offline on videos to cope with various tracking challenges, such as rotations, perspective changes, and lighting changes, which enables them to accurately localize target objects in consecutive frames [102]. Over the years, several advanced Siamese-based trackers have been proposed, such as SiamFC (fully convolutional Siamese network) [109], SiamRPN++ (Siamese region proposal network plus plus) [110], and LTMU (high-performance long-term tracking with meta-updater) [111]. Their robustness in challenging scenarios, combined with state-of-the-art performance on several benchmark datasets, has established them as highly competitive methods for UAV tracking.

SiamFC [109]

This classic tracking algorithm uses a fully convolutional Siamese network to cross-correlate a template patch with the search region in order to locate the target, and it employs a multi-scale strategy to determine the appropriate scale of the tracked object. By leveraging discriminative information learned from offline training videos, the Siamese-based tracker can accurately estimate the position and scale of the target object in real time. This method has been shown to achieve state-of-the-art performance on several benchmark datasets and has gained considerable attention in the UAV tracking field.
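At its core, SiamFC-style matching is a cross-correlation between the embedded template and the embedded search region, which in PyTorch can be expressed by using the template features as a convolution kernel. The tiny backbone below is a stand-in for illustration only, not the AlexNet-style network of the original paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEmbed(nn.Module):
    """Stand-in fully convolutional backbone (SiamFC uses a deeper network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

def siamfc_response(embed, template, search):
    """Score map = cross-correlation of template features over search features."""
    z = embed(template)                    # (1, C, hz, wz) exemplar features
    x = embed(search)                      # (1, C, hx, wx) search features
    # Treat the exemplar embedding as a convolution kernel over the search map.
    return F.conv2d(x, z)                  # (1, 1, hx-hz+1, wx-wz+1) score map

embed = TinyEmbed().eval()
template = torch.randn(1, 3, 127, 127)     # exemplar crop of the UAV
search = torch.randn(1, 3, 255, 255)       # larger search region
with torch.no_grad():
    score = siamfc_response(embed, template, search)
peak = torch.nonzero(score[0, 0] == score.max())  # candidate target location
```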

SiamRPN++ [110]

SiamRPN++ is an advanced Siamese tracking algorithm that incorporates a region proposal network (RPN) into the Siamese architecture and enables a very deep backbone network. The framework consists of two branches: a classification branch, which selects the optimal anchor, and a regression branch, which predicts the offsets of that anchor. By leveraging the RPN mechanism, SiamRPN++ generates high-quality proposals for the target object, improving both the robustness and the speed of tracking. Thanks to the RPN and the removal of the multi-scale strategy, SiamRPN++ is more robust and faster than SiamFC. It has demonstrated state-of-the-art performance on several benchmark datasets and has become one of the most competitive tracking methods in the field of UAV tracking.
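A schematic of the Siamese RPN head's two branches might look as follows in PyTorch, with the depthwise cross-correlation used by SiamRPN++ feeding separate classification and regression convolutions. The layer sizes and anchor count are illustrative assumptions, and this sketch omits the backbone and the multi-level aggregation of the full method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def depthwise_xcorr(x, z):
    """Per-channel cross-correlation of search features x with template z."""
    b, c, hz, wz = z.shape
    out = F.conv2d(x.reshape(1, b * c, *x.shape[2:]),
                   z.reshape(b * c, 1, hz, wz), groups=b * c)
    return out.reshape(b, c, *out.shape[2:])

class SiamRPNHead(nn.Module):
    def __init__(self, channels=256, num_anchors=5):
        super().__init__()
        self.cls = nn.Conv2d(channels, 2 * num_anchors, 1)   # fg/bg per anchor
        self.reg = nn.Conv2d(channels, 4 * num_anchors, 1)   # (dx, dy, dw, dh)

    def forward(self, z_feat, x_feat):
        corr = depthwise_xcorr(x_feat, z_feat)
        return self.cls(corr), self.reg(corr)

head = SiamRPNHead()
z_feat = torch.randn(1, 256, 7, 7)       # template embedding
x_feat = torch.randn(1, 256, 31, 31)     # search-region embedding
cls_map, reg_map = head(z_feat, x_feat)  # (1, 10, 25, 25) and (1, 20, 25, 25)
```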

LTMU [111]

This method is not only a short-term tracker but also a long-term tracker. The main contribution of this method is its offline-trained meta-updater, which is utilized to determine whether the tracker needs to be updated in the current frame or not, resulting in improved robustness. Furthermore, a long-term tracking framework is designed that utilizes a SiamRPN-based re-detector, an online verifier, and an online local tracker in conjunction with the proposed meta-updater. By integrating these components, the tracker can maintain accurate tracking of the target object over an extended period, even in scenarios where the object undergoes significant changes, such as occlusion or appearance variations. In both short- and long-term tracking benchmarks, this method has demonstrated excellent discriminative capability and robustness, making it an extremely competitive method.
While the Kalman filter is widely used in UAV tracking tasks due to its low computational cost, achieving high accuracy results in practical applications is often challenging [112]. Recently, DCF methods and Siamese tracking algorithms have shown excellent performance in object tracking, thanks to their use of end-to-end offline learning. However, this has only been possible with the availability of large-scale training datasets [102]. Despite these advancements, achieving low-cost, high-precision real-time UAV tracking remains a challenge in anti-UAV research. The development of more efficient algorithms and the availability of more diverse and comprehensive datasets for training and testing are essential for further advancing the field of UAV tracking.

4. Anti-UAV System

In recent years, there has been a significant amount of literature on anti-UAV systems, and several anti-UAV systems have been used commercially and militarily.
One such system is DedroneTracker [113], a multi-sensor platform (RF, PTZ camera, radar) with countermeasure capabilities (jamming) released by Dedrone. The system can be extended and customized to meet specific field requirements and can automatically capture a portfolio of forensic data, including UAV make, model, time and duration of UAV activity, and video verification. Dedrone also offers RF sensors and jammers, with a detection range of up to 1.5 km (up to 5 km in special cases) and a maximum jamming range of about 2 km. Droneshield [114] likewise provides anti-UAV defense solutions, offering a range of standalone portable products as well as rapidly deployable fixed-site solutions. It employs a variety of surveillance technologies, including radar, audio, video, and radio frequency, to detect UAVs and provide effective countermeasures. Droneshield's jammer can immediately cut the video transmission back to the UAS operator and force the UAV into a controlled vertical landing or a return to its operator or starting point. Another RF-based UAV detection system is ARDRONIS [115], developed by Rohde & Schwarz. ARDRONIS detects frequency-hopping spread-spectrum (FHSS) and WLAN signals, with a detection range of up to 7 km for commercial off-the-shelf remote-control signals and up to 5 km for UAVs such as the DJI Phantom 4 under ideal conditions. The AUDS system [116], on the other hand, uses radar, video, and thermal imaging to detect and track UAVs and has directional radio-frequency inhibition capabilities; with its Air Security Radar, the detection range can reach up to 10 km. The ORELIA Drone-Detector system [117] consists of an acoustic sensor and software for protected-object monitoring, target tracing, and sensor adjustment. Its detection range is about 100 m, and multiple acoustic detectors can be installed to cover all directions. Falcon Shield [118] is a rapidly deployable, scalable, and modular system that combines electro-optics, electronic surveillance, and radar sensors. ELTA Systems [119] is another solution that combines radar, RF, and photoelectric sensors to detect and track UAVs more than 5 km away and to take soft- and hard-kill measures against them.
Each system uses different surveillance techniques and implements various functions, as summarized in Table 8; examples of these systems are shown in Figure 3. These systems have demonstrated their effectiveness in UAV detection and tracking. To enhance their detection capabilities, the majority of anti-UAV systems use multiple types of sensors. These systems primarily find application in scenarios such as airport operations and military protection. However, Dedrone stands out as a rare provider that specifically addresses urban settings, including sports events and outdoor concerts. Dedrone and ARDRONIS both offer video evidence recording, although they employ different methods: Dedrone uses the visual monitor within the system to record video evidence directly when a UAV is detected, whereas ARDRONIS obtains the UAV's field of view by intercepting the UAV's video transmission signal. It is important to note that while Dedrone can record and save evidence whenever a UAV is detected, ARDRONIS can only record video when it successfully decodes the UAV's video transmission signal. Dedrone's products and design concepts serve as a valuable source of inspiration for tackling the challenges of anti-UAV systems in urban environments: its emphasis on scalable software platforms and on leveraging existing infrastructure offers valuable insights for research and development in this field. By incorporating such approaches, the effectiveness and adaptability of anti-UAV systems in urban settings can be enhanced.
Based on our observation, the more mature anti-UAV systems are currently mainly focused on military applications, and these systems are often implemented using a combination of multiple sensors. RF monitoring, radar, and vision sensors are the three most commonly used types of multi-sensors. However, radar is not suitable for use in certain locations, such as urban environments, due to its strong radiation characteristics [22]. In urban environments, RF surveillance emerges as a crucial method for locating UAV pilots, while visual surveillance plays a pivotal role in capturing UAV flight videos as evidence of intrusions. This combination of RF and visual surveillance techniques provides a highly competitive solution for UAV detection in urban settings. Although the acoustic-based method has shortcomings such as small detection range and being easily affected by noise, it can also provide an effective supplement for UAV detection under certain conditions, such as poor visual conditions and complex electromagnetic environments.
Inspired by the above anti-UAV systems, we propose several suggestions for developing effective and scalable general anti-UAV systems. First, it is essential to consider the specific needs and requirements of the application scenario when selecting and combining sensors. Indeed, various environmental factors can impact the effectiveness of different surveillance methods in urban environments. For instance, visual surveillance may face challenges in locations with low visibility caused by heavy fog, sand, or dust. Similarly, complex electromagnetic environments can negatively affect the performance of RF surveillance. Additionally, acoustic surveillance may not be suitable in areas with strong winds or high levels of ambient noise. It is essential to consider these factors when selecting the most appropriate surveillance method for UAV detection, ensuring optimal performance in diverse urban scenarios.
Second, the detection and tracking components of the system should be integrated with appropriate countermeasure capabilities. One of the most prevalent countermeasures against intruding UAVs is the RF jamming gun, which disrupts the communication and control links of a UAV and forces it to either land or return to its point of origin. Another common countermeasure is deploying interceptor UAVs equipped with nets, cables, or other mechanisms to physically capture unauthorized UAVs. Both RF jamming guns and UAV capture methods are effective in mitigating the risks and potential threats posed by unauthorized UAV activities.
Third, the system should be designed to be scalable and modular, allowing for easy deployment and adaptation to changing conditions. The design concept put forth by Dedrone Company offers valuable inspiration. It emphasizes the importance of creating scalable anti-UAV systems with a platform at the core. Such a design enables easier integration and maintenance of sensors in the future. By adopting a scalable approach, anti-UAV systems can adapt to evolving threats and technological advancements, ensuring flexibility and efficiency in the long run.
Fourth, data analytics and machine learning techniques can be employed to enhance the accuracy and efficiency of UAV detection and tracking. Indeed, machine learning techniques have already found extensive application in the field of UAV detection. The continuous advancements in machine learning algorithms and the availability of more comprehensive datasets are key factors in enhancing UAV detection capabilities. By leveraging more efficient learning methods and utilizing diverse and representative datasets, the accuracy and effectiveness of UAV detection systems can be significantly improved. This ongoing development in machine learning holds promise for further advancements in UAV detection technology.
Fifth, although accuracy is an important evaluation indicator, lightweight anti-UAV systems that sacrifice some accuracy appear to be in greater demand in less critical scenarios. In mobile scenarios or under budget constraints, lightweight systems that can operate on portable devices offer a more suitable solution: they are easy to deploy in various environments and provide flexibility and cost-effectiveness. By leveraging lightweight anti-UAV systems, organizations can enhance their UAV detection and mitigation capabilities while maintaining operational efficiency.
Furthermore, compatibility with existing sensors, such as ubiquitous video surveillance equipment, could significantly reduce the cost of anti-UAV systems. This suggestion of leveraging existing video surveillance equipment, inspired by Dedrone’s products, is indeed valuable. By ensuring compatibility and utilization of the already deployed video surveillance infrastructure, the deployment costs of anti-UAV systems can be significantly reduced. This approach aligns with the concept of using edge computing to assist in UAV detection, as previously discussed. By calibrating the physical locations of these monitoring devices, it becomes possible to detect and track illegally intruding UAVs effectively. This application scenario showcases the potential for cost-effective and efficient UAV detection by leveraging existing resources and edge computing capabilities.

5. Conclusions

In this paper, we have summarized the technical classification and implementation methods of UAV detection and tracking in urban IoT environments. We have also reviewed the performance of edge computing and deep learning empowered anti-UAV systems, highlighting their strengths and limitations. Furthermore, we have proposed several suggestions for developing effective and scalable general anti-UAV systems, such as considering the specific needs of the application scenario, integrating detection and tracking components with appropriate countermeasures, designing for scalability and modularity, employing data analytics and machine learning techniques, and ensuring compliance with relevant regulations.
To facilitate further research in the field of anti-UAV systems, we have also presented publicly available visual, acoustic, and radiofrequency datasets that can be used to evaluate the performance of different anti-UAV techniques and algorithms. We hope that these datasets will be useful for researchers in developing and testing new anti-UAV systems and techniques.

Author Contributions

Conceptualization, T.F., P.L. and H.H.; validation, F.X., H.L. and H.H.; formal analysis, P.L., Y.C. and T.F.; investigation, F.X., Y.H. and H.H.; resources, X.Y., Y.H. and H.H.; data curation, X.Y., H.L. and F.X.; writing—original draft preparation, X.Y. and Y.C.; writing—review and editing, X.Y. and P.L.; supervision, P.L. and T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Zhejiang Public Information Industry Co., Ltd.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tian, S.; Li, Y.; Zhang, X.; Zheng, L.; Cheng, L.; She, W.; Xie, W. Fast UAV path planning in urban environments based on three-step experience buffer sampling DDPG. Digit. Commun. Netw. 2023.
2. Ding, G.; Wu, Q.; Zhang, L.; Lin, Y.; Tsiftsis, T.A.; Yao, Y.D. An amateur drone surveillance system based on the cognitive Internet of Things. IEEE Commun. Mag. 2018, 56, 29–35.
3. Shen, Y.; Pan, Z.; Liu, N.; You, X. Performance analysis of legitimate UAV surveillance system with suspicious relay and anti-surveillance technology. Digit. Commun. Netw. 2022, 8, 853–863.
4. Lin, N.; Tang, H.; Zhao, L.; Wan, S.; Hawbani, A.; Guizani, M. A PDDQNLP Algorithm for Energy Efficient Computation Offloading in UAV-assisted MEC. IEEE Trans. Wirel. Commun. 2023.
5. Deng, X.; Wang, L.; Gui, J.; Jiang, P.; Chen, X.; Zeng, F.; Wan, S. A review of 6G autonomous intelligent transportation systems: Mechanisms, applications and challenges. J. Syst. Archit. 2023, 142, 102929.
6. Liu, J.; Peng, J.; Xu, W.; Liang, W.; Liu, T.; Peng, X.; Xu, Z.; Li, Z.; Jia, X. Maximizing Sensor Lifetime via Multi-node Partial-Charging on Sensors. IEEE Trans. Mob. Comput. 2022, 22, 6571–6584.
7. Xu, W.; Xie, H.; Wang, C.; Liang, W.; Jia, X.; Xu, Z.; Zhou, P.; Wu, W.; Chen, X. An Approximation Algorithm for the h-Hop Independently Submodular Maximization Problem and Its Applications. IEEE/ACM Trans. Netw. 2023, 31, 1216–1229.
8. Cheng, Z.; Liwang, M.; Chen, N.; Huang, L.; Guizani, N.; Du, X. Learning-based user association and dynamic resource allocation in multi-connectivity enabled unmanned aerial vehicle networks. Digit. Commun. Netw. 2022, in press.
9. Heidari, A.; Jafari Navimipour, N.; Unal, M.; Zhang, G. Machine learning applications in internet-of-drones: Systematic review, recent deployments, and open issues. ACM Comput. Surv. 2023, 55, 1–45.
10. Wang, L.; Deng, X.; Gui, J.; Jiang, P.; Zeng, F.; Wan, S. A review of Urban Air Mobility-enabled Intelligent Transportation Systems: Mechanisms, applications and challenges. J. Syst. Archit. 2023, 141, 102902.
11. Iftikhar, S.; Asim, M.; Zhang, Z.; Muthanna, A.; Chen, J.; El-Affendi, M.; Sedik, A.; Abd El-Latif, A.A. Target Detection and Recognition for Traffic Congestion in Smart Cities Using Deep Learning-Enabled UAVs: A Review and Analysis. Appl. Sci. 2023, 13, 3995.
12. Shi, J.; Cong, P.; Zhao, L.; Wang, X.; Wan, S.; Guizani, M. A two-stage strategy for UAV-enabled wireless power transfer in unknown environments. IEEE Trans. Mob. Comput. 2023, in press.
13. Abbas, N.; Abbas, Z.; Liu, X.; Khan, S.S.; Foster, E.D.; Larkin, S. A Survey: Future Smart Cities Based on Advance Control of Unmanned Aerial Vehicles (UAVs). Appl. Sci. 2023, 13, 9881.
14. Motlagh, N.H.; Taleb, T.; Arouk, O. Low-altitude unmanned aerial vehicles-based internet of things services: Comprehensive survey and future perspectives. IEEE Internet Things J. 2016, 3, 899–922.
15. Weng, L.; Zhang, Y.; Yang, Y.; Fang, M.; Yu, Z. A mobility compensation method for drones in SG-eIoT. Digit. Commun. Netw. 2021, 7, 196–200.
16. Research and Markets. UAV Market by Point of Sale, Systems, Platform (Civil & Commercial, and Defense & Government), Function, End Use, Application, Type (Fixed Wing, Rotary Wing, Hybrid), Mode of Operation, Mtow, Range & Region-Global Forecast to 2027; Technical Report; Research and Markets: Dublin, Ireland, 2022.
17. Chamola, V.; Kotesh, P.; Agarwal, A.; Naren; Gupta, N.; Guizani, M. A Comprehensive Review of Unmanned Aerial Vehicle Attacks and Neutralization Techniques. Ad Hoc Netw. 2021, 111, 102324.
18. Li, S.; Chai, Y.; Guo, M.; Liu, Y. Research on Detection Method of UAV Based on micro-Doppler Effect. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 3118–3122.
19. Pappu, C.S.; Beal, A.N.; Flores, B.C. Chaos based frequency modulation for joint monostatic and bistatic radar-communication systems. Remote Sens. 2021, 13, 4113.
20. Abd, M.H.; Al-Suhail, G.A.; Tahir, F.R.; Ali Ali, A.M.; Abbood, H.A.; Dashtipour, K.; Jamal, S.S.; Ahmad, J. Synchronization of monostatic radar using a time-delayed chaos-based FM waveform. Remote Sens. 2022, 14, 1984.
21. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 552–568.
22. Shi, X.; Yang, C.; Xie, W.; Liang, C.; Shi, Z.; Chen, J. Anti-Drone System with Multiple Surveillance Technologies: Architecture, Implementation, and Challenges. IEEE Commun. Mag. 2018, 56, 68–74.
23. Farlik, J.; Kratky, M.; Casar, J.; Stary, V. Radar cross section and detection of small unmanned aerial vehicles. In Proceedings of the 2016 17th International Conference on Mechatronics-Mechatronika (ME), Prague, Czech Republic, 7–9 December 2016; pp. 1–7.
24. Zhang, J.; Liu, M.; Zhao, N.; Chen, Y.; Yang, Q.; Ding, Z. Spectrum and energy efficient multi-antenna spectrum sensing for green UAV communication. Digit. Commun. Netw. 2022, 9, 846–855.
25. Unlu, E.; Zenou, E.; Riviere, N.; Dupouy, P.E. Deep learning-based strategies for the detection and tracking of drones using several cameras. IPSJ Trans. Comput. Vis. Appl. 2019, 11, 7.
26. Fang, H.; Xia, M.; Zhou, G.; Chang, Y.; Yan, L. Infrared small UAV target detection based on residual image prediction via global and local dilated residual networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 7002305.
27. Al-Emadi, S.; Al-Ali, A.; Mohammad, A.; Al-Ali, A. Audio based drone detection and identification using deep learning. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 459–464.
28. Dumitrescu, C.; Minea, M.; Costea, I.M.; Cosmin Chiva, I.; Semenescu, A. Development of an acoustic system for UAV detection. Sensors 2020, 20, 4870.
29. Nie, W.; Han, Z.C.; Zhou, M.; Xie, L.B.; Jiang, Q. UAV detection and identification based on WiFi signal and RF fingerprint. IEEE Sens. J. 2021, 21, 13540–13550.
30. Tugnait, J.K. Detection of non-Gaussian signals using integrated polyspectrum. IEEE Trans. Signal Process. 1994, 42, 3137–3149.
31. Yao, Y.; Yu, L.; Chen, Y. Specific Emitter Identification Based on Square Integral Bispectrum Features. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1311–1314.
32. Nie, W.; Han, Z.C.; Li, Y.; He, W.; Xie, L.B.; Yang, X.L.; Zhou, M. UAV detection and localization based on multi-dimensional signal features. IEEE Sens. J. 2021, 22, 5150–5162.
33. Mo, Y.; Huang, J.; Qian, G. UAV Tracking by Identification Using Deep Convolutional Neural Network. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2022; pp. 1887–1892.
34. Swinney, C.J.; Woods, J.C. Low-Cost Raspberry-Pi-Based UAS Detection and Classification System Using Machine Learning. Aerospace 2022, 9, 738.
35. Lu, S.; Wang, W.; Zhang, M.; Li, B.; Han, Y.; Sun, D. Detect the Video Recording Act of UAV through Spectrum Recognition. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 24–26 June 2022; pp. 559–564.
36. He, Z.; Huang, J.; Qian, G. UAV Detection and Identification Based on Radio Frequency Using Transfer Learning. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Virtual, 9–12 December 2022; pp. 1812–1817.
37. Li, T.; Hong, Z.; Cai, Q.; Yu, L.; Wen, Z.; Yang, R. BisSiam: Bispectrum Siamese network based contrastive learning for UAV anomaly detection. IEEE Trans. Knowl. Data Eng. 2021.
38. Dong, Q.; Liu, Y.; Liu, X. Drone sound detection system based on feature result-level fusion using deep learning. Multimed. Tools Appl. 2023, 82, 149–171.
39. Aydın, İ.; Kızılay, E. Development of a new Light-Weight Convolutional Neural Network for acoustic-based amateur drone detection. Appl. Acoust. 2022, 193, 108773.
40. Schweinhart, B. Persistent homology and the upper box dimension. Discret. Comput. Geom. 2021, 65, 331–364.
41. Higuchi, T. Approach to an irregular time series on the basis of the fractal theory. Phys. D Nonlinear Phenom. 1988, 31, 277–283.
42. Zhang, X.D.; Shi, Y.; Bao, Z. A new feature vector using selected bispectra for signal classification with application in radar target recognition. IEEE Trans. Signal Process. 2001, 49, 1875–1885.
43. Al-Sa’d, M.F.; Al-Ali, A.; Mohamed, A.; Khattab, T.; Erbad, A. RF-based drone detection and identification using deep learning approaches: An initiative towards a large open source drone database. Future Gener. Comput. Syst. 2019, 100, 86–97.
44. Mo, Y.; Huang, J.; Qian, G. Deep Learning Approach to UAV Detection and Classification by Using Compressively Sensed RF Signal. Sensors 2022, 22, 3072.
45. Allahham, M.S.; Al-Sa’d, M.F.; Al-Ali, A.; Mohamed, A.; Khattab, T.; Erbad, A. DroneRF dataset: A dataset of drones for RF-based detection, classification and identification. Data Brief 2019, 26, 104313.
46. Ezuma, M.; Erden, F.; Anjinappa, C.K.; Ozdemir, O.; Guvenc, I. Drone Remote Controller RF Signal Dataset. 2020. Available online: https://ieee-dataport.org/open-access/drone-remote-controller-rf-signal-dataset (accessed on 1 June 2023).
47. Vuorenmaa, M.; Marin, J.; Heino, M.; Turunen, M.; Riihonen, T. Radio-Frequency Control and Video Signal Recordings of Drones. 2020. Available online: https://zenodo.org/record/4264467 (accessed on 1 June 2023).
48. Basak, S.; Rajendran, S.; Pollin, S.; Scheers, B. Drone classification from RF fingerprints using deep residual nets. In Proceedings of the 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS), Bangalore, India, 5–9 January 2021; pp. 548–555.
49. Swinney, C.J.; Woods, J.C. DroneDetect Dataset: A Radio Frequency dataset of Unmanned Aerial System (UAS) Signals for Machine Learning Detection & Classification. 2021. Available online: https://ieee-dataport.org/open-access/dronedetect-dataset-radio-frequency-dataset-unmanned-aerial-system-uas-signals-machine (accessed on 1 June 2023).
50. Sazdić-Jotić, B.; Pokrajac, I.; Bajčetić, J.; Bondžulić, B.; Obradović, D. Single and multiple drones detection and identification using RF based deep learning algorithm. Expert Syst. Appl. 2022, 187, 115928.
51. Medaiyese, O.; Ezuma, M.; Lauf, A.; Adeniran, A. Cardinal RF (CardRF): An Outdoor UAV/UAS/Drone RF Signals with Bluetooth and WiFi Signals Dataset. 2022. Available online: https://ieee-dataport.org/documents/cardinal-rf-cardrf-outdoor-uavuasdrone-rf-signals-bluetooth-and-wifi-signals-dataset (accessed on 1 June 2023).
52. Svanström, F.; Alonso-Fernandez, F.; Englund, C. Drone Detection and Tracking in Real-Time by Fusion of Different Sensing Modalities. Drones 2022, 6, 317.
53. Uddin, Z.; Qamar, A.; Alharbi, A.G.; Orakzai, F.A.; Ahmad, A. Detection of Multiple Drones in a Time-Varying Scenario Using Acoustic Signals. Sustainability 2022, 14, 4041.
54. Jamil, S.; Rahman, M.; Ullah, A.; Badnava, S.; Forsat, M.; Mirjavadi, S.S. Malicious UAV detection using integrated audio and visual features for public safety applications. Sensors 2020, 20, 3923.
55. Guo, J.; Ahmad, I.; Chang, K. Classification, positioning, and tracking of drones by HMM using acoustic circular microphone array beamforming. EURASIP J. Wirel. Commun. Netw. 2020, 2020, 1–19.
56. Gupta, H.; Gupta, D. LPC and LPCC method of feature extraction in Speech Recognition System. In Proceedings of the 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence), Noida, India, 14–15 January 2016; pp. 498–502.
57. Uddin, Z.; Altaf, M.; Bilal, M.; Nkenyereye, L.; Bashir, A.K. Amateur Drones Detection: A machine learning approach utilizing the acoustic signals in the presence of strong interference. Comput. Commun. 2020, 154, 236–245.
58. Al-Emadi, S.A.; Al-Ali, A.K.; Al-Ali, A.; Mohamed, A. Audio Based Drone Detection and Identification using Deep Learning. In Proceedings of the IWCMC 2019 Vehicular Symposium (IWCMC-VehicularCom 2019), Tangier, Morocco, 24–28 June 2019.
59. Casabianca, P.; Zhang, Y. Acoustic-based UAV detection using late fusion of deep neural networks. Drones 2021, 5, 54.
60. Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154.
61. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the CVPR’05, San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893.
62. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the CVPR, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
63. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
64. Girshick, R. Fast R-CNN. In Proceedings of the ICCV, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
65. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
66. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
67. Ajakwe, S.O.; Ihekoronye, V.U.; Kim, D.S.; Lee, J.M. DRONET: Multi-Tasking Framework for Real-Time Industrial Facility Aerial Surveillance and Safety. Drones 2022, 6, 46.
68. Wang, J.; Hongjun, W.; Liu, J.; Zhou, R.; Chen, C.; Liu, C. Fast and Accurate Detection of UAV Objects Based on Mobile-Yolo Network. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing (WCSP), Virtually, 1–3 November 2022; pp. 1–5.
69. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
70. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
71. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
72. Everingham, M.; Eslami, S.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
73. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
74. Pawełczyk, M.; Wojtyra, M. Real world object detection dataset for quadcopter unmanned aerial vehicle detection. IEEE Access 2020, 8, 174394–174409.
75. Zheng, Y.; Chen, Z.; Lv, D.; Li, Z.; Lan, Z.; Zhao, S. Air-to-air visual detection of micro-UAVs: An experimental evaluation of deep learning. IEEE Robot. Autom. Lett. 2021, 6, 1020–1027.
76. Walter, V.; Vrba, M.; Saska, M. On training datasets for machine learning-based visual relative localization of micro-scale UAVs. In Proceedings of the ICRA, Paris, France, 31 May–31 August 2020; pp. 10674–10680.
77. Chen, Y.; Aggarwal, P.; Choi, J.; Kuo, C.C.J. A deep learning approach to drone monitoring. In Proceedings of the APSIPA ASC, Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 686–691.
78. Zhao, J.; Zhang, J.; Li, D.; Wang, D. Vision-Based Anti-UAV Detection and Tracking. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25323–25334.
79. Jiang, N.; Wang, K.; Peng, X.; Yu, X.; Wang, Q.; Xing, J.; Li, G.; Zhao, J.; Guo, G.; Han, Z. Anti-UAV: A large multi-modal benchmark for UAV tracking. arXiv 2021, arXiv:2101.08466.
80. Bosquet, B.; Mucientes, M.; Brea, V. STDnet: A ConvNet for Small Target Detection. In Proceedings of the 29th British Machine Vision Conference, Newcastle, UK, 3–6 September 2018.
81. Ajakwe, S.O.; Saviour, I.I.; Kim, J.H.; Kim, D.S.; Lee, J.M. BANDA: A Novel Blockchain-Assisted Network for Drone Authentication. In Proceedings of the 2023 Fourteenth International Conference on Ubiquitous and Future Networks (ICUFN), Paris, France, 4–7 July 2023; pp. 120–125.
82. Heidari, A.; Navimipour, N.J.; Unal, M. A Secure Intrusion Detection Platform Using Blockchain and Radial Basis Function Neural Networks for Internet of Drones. IEEE Internet Things J. 2023, 10, 8445–8454.
83. Huynh-The, T.; Pham, Q.V.; Nguyen, T.V.; Da Costa, D.B.; Kim, D.S. RF-UAVNet: High-performance convolutional network for RF-based drone surveillance systems. IEEE Access 2022, 10, 49696–49707.
84. Sun, H.; Yang, J.; Shen, J.; Liang, D.; Ning-Zhong, L.; Zhou, H. TIB-Net: Drone detection network with tiny iterative backbone. IEEE Access 2020, 8, 130697–130707.
85. Liu, Y.; Liu, D.; Wang, B.; Chen, B. Mob-YOLO: A Lightweight UAV Object Detection Method. In Proceedings of the 2022 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Harbin, China, 22–24 December 2022; pp. 1–6.
86. Golam, M.; Akter, R.; Naufal, R.; Doan, V.S.; Lee, J.M.; Kim, D.S. Blockchain Inspired Intruder UAV Localization Using Lightweight CNN for Internet of Battlefield Things. In Proceedings of the MILCOM 2022—2022 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 28 November–2 December 2022; pp. 342–349.
87. Barthelme, A.; Utschick, W. DoA estimation using neural network-based covariance matrix reconstruction. IEEE Signal Process. Lett. 2021, 28, 783–787.
88. Wang, M.; Yang, S.; Wu, S.; Luo, F. A RBFNN approach for DoA estimation of ultra wideband antenna array. Neurocomputing 2008, 71, 631–640.
89. Wu, L.; Liu, Z.M.; Huang, Z.T. Deep Convolution Network for Direction of Arrival Estimation With Sparse Prior. IEEE Signal Process. Lett. 2019, 26, 1688–1692.
90. Akter, R.; Doan, V.S.; Huynh-The, T.; Kim, D.S. RFDOA-Net: An Efficient ConvNet for RF-Based DOA Estimation in UAV Surveillance Systems. IEEE Trans. Veh. Technol. 2021, 70, 12209–12214.
91. Van Veen, B.D.; Buckley, K.M. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Mag. 1988, 5, 4–24.
92. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
93. Roy, R.; Kailath, T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 984–995.
94. Wu, L.L.; Huang, Z.T. Coherent SVR learning for wideband direction-of-arrival estimation. IEEE Signal Process. Lett. 2019, 26, 642–646.
95. Sun, Y.; Chen, J.; Yuen, C.; Rahardja, S. Indoor sound source localization with probabilistic neural network. IEEE Trans. Ind. Electron. 2017, 65, 6403–6413.
96. Chakrabarty, S.; Habets, E.A. Multi-speaker DOA estimation using deep convolutional networks trained with noise signals. IEEE J. Sel. Top. Signal Process. 2019, 13, 8–21.
97. Blanchard, T.; Thomas, J.H.; Raoof, K. Acoustic localization and tracking of a multi-rotor unmanned aerial vehicle using an array with few microphones. J. Acoust. Soc. Am. 2020, 148, 1456–1467.
98. Alameda-Pineda, X.; Horaud, R. A geometric approach to sound source localization from time-delay estimates. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1082–1095.
99. Chen, J.; Zhao, Y.; Zhao, C.; Zhao, Y. Improved two-step weighted least squares algorithm for TDOA-based source localization. In Proceedings of the 2018 19th International Radar Symposium (IRS), Piscataway, NJ, USA, 20–22 June 2018; pp. 1–6.
100. Shi, Z.; Chang, X.; Yang, C.; Wu, Z.; Wu, J. An acoustic-based surveillance system for amateur drones detection and localization. IEEE Trans. Veh. Technol. 2020, 69, 2731–2739.
101. Heydari, Z.; Mahabadi, A. Real-time TDOA-based stationary sound source direction finding. Multimed. Tools Appl. 2023, 1–32.
102. Javed, S.; Danelljan, M.; Khan, F.S.; Khan, M.H.; Felsberg, M.; Matas, J. Visual object tracking with discriminative filters and siamese networks: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6552–6574.
103. Jahangir, M.; Ahmad, B.I.; Baker, C.J. Robust drone classification using two-stage decision trees and results from SESAR SAFIR trials. In Proceedings of the 2020 IEEE International Radar Conference (RADAR), Washington, DC, USA, 28–30 April 2020; pp. 636–641.
104. Jouaber, S.; Bonnabel, S.; Velasco-Forero, S.; Pilte, M. NNAKF: A Neural Network Adapted Kalman Filter for Target Tracking. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 4075–4079.
105. Campbell, M.A.; Clark, D.E.; de Melo, F. An algorithm for large-scale multitarget tracking and parameter estimation. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 2053–2066.
106. Huang, Z.; Fu, C.; Li, Y.; Lin, F.; Lu, P. Learning aberrance repressed correlation filters for real-time UAV tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2891–2900.
107. Li, Y.; Fu, C.; Ding, F.; Huang, Z.; Pan, J. Augmented memory for correlation filters in real-time UAV tracking. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 1559–1566.
108. Yuan, D.; Chang, X.; Li, Z.; He, Z. Learning adaptive spatial-temporal context-aware correlation filters for UAV tracking. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 1–18.
109. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 October 2016; Part II 14, pp. 850–865.
110. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
111. Dai, K.; Zhang, Y.; Wang, D.; Li, J.; Lu, H.; Yang, X. High-performance long-term tracking with meta-updater. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6298–6307.
112. Zitar, R.A.; Mohsen, A.; Seghrouchni, A.E.; Barbaresco, F.; Al-Dmour, N.A. Intensive Review of Drones Detection and Tracking: Linear Kalman Filter Versus Nonlinear Regression, an Analysis Case. Arch. Comput. Methods Eng. 2023, 30, 2811–2830.
113. Dedrone. Counter Drone Software. Available online: https://www.dedrone.com/products/counter-drone-software (accessed on 1 June 2023).
114. DroneShield. DroneSentry-X Vehicle, Ship and Fixed Site C-UAS Detect-and-Defeat. Available online: https://www.droneshield.com/products/sentry-x (accessed on 1 June 2023).
115. Rohde & Schwarz. R&S®ARDRONIS for Effective Drone Defense. Available online: https://www.rohde-schwarz.com.cn/products/aerospace-defense-security/counter-drone-systems_250881.html (accessed on 1 June 2023).
116. Blighter. AUDS Anti-UAV Defence System. Available online: https://www.blighter.com/products/auds-anti-uav-defence-system/ (accessed on 1 June 2023).
117. Dronebouncer. Orelia Drone-Detector. Available online: http://dronebouncer.com/en/orelia-drone-detector/ (accessed on 1 June 2023).
118. Leonardo. Falcon Shield. Available online: https://uk.leonardo.com/en/innovation/falcon-shield/ (accessed on 1 June 2023).
119. IAI. ELTA Systems. Available online: https://www.iai.co.il/about/groups/elta-systems (accessed on 1 June 2023).
Figure 1. Some example images in (a) Real World, (b) Det-Fly, (c) MIDGARD, and (d) USC-Drone.
Figure 2. The positioning model of AOA involves projecting the position of the UAV onto the ground using the AOA of the line-of-sight (LOS) path β_i measured by the i-th receiver. The height h_i of the UAV can then be calculated using the pitch angle α_i of each receiver and the projected position pos of the UAV on the ground. By averaging the heights calculated from multiple receivers, the overall height h of the UAV is obtained. Finally, the UAV's spatial position pos_d is determined by combining the height h with the projected ground position pos.
Figure 3. Examples of anti-UAV systems: (a) DeDrone, (b) Droneshield, (c) ARDRONIS, (d) AUDS, (e) ORELIA, (f) Falcon Shield, (g) ELTA. The aforementioned images are sourced from the official websites of the respective anti-UAV systems.
Table 1. Comparison of UAV surveillance technologies.

| Surveillance Technology | Localization/Tracking Method | Detection Range | Challenges |
|---|---|---|---|
| Radar | Doppler-based tracking; delay-based localization | 10,000 m | Low radar cross-section; low speed and altitude |
| Acoustic | TDOA/AOA-based localization | 0–300 m | High ambient noise |
| Vision | Motion-based tracking | 100–1000 m | Confusion with birds; indistinguishable small objects |
| RF | RSS/AOA-based localization | 5000 m | Ambient RF noise; multipath; non-line-of-sight |
Table 2. Comparison of different UAV detection methods.

| Category | Method | Accuracy |
|---|---|---|
| Traditional RF-based detection | Fractal dimension (FD) [29] | 100% |
| | Axially integrated bispectra (AIB) [30] | 98% |
| | Square integrated bispectra (SIB) [31] | 96% |
| | Signal spectrum (SFS) [32] | 97.85% |
| | Wavelet energy entropy (WEE) [32] | 93.75% |
| | Power spectral entropy (PSE) [32] | 83.85% |
| RF-based detection using deep learning | Y. Mo et al. [33] | 99% |
| | C. J. Swinney et al. [34] | 100% |
| | S. Lu et al. [35] | 98% |
| | Z. He et al. [36] | 90.2% |
| | T. Li et al. [37] | 98.57% |
| Traditional acoustic-based detection | Mel frequency cepstrum coefficient (MFCC) | 97% |
| Acoustic-based detection using deep learning | S. Al-Emadi et al. [27] | 96.38% |
| | Q. Dong et al. [38] | 99% |
| | İ. Aydın et al. [39] | 98.78% |
| Vision-based detection using deep learning | RCNN | 58.50% |
| | SPPNet | 59.20% |
| | Fast RCNN | 70.00% |
| | Faster RCNN | 73.20% |
| | YOLOv3 | 63.40% |
| | SSD | 76.80% |
| | RetinaNet | 59.10% |
Traditional RF methods based on RF feature extraction have made progress in recent years, while deep learning-based RF approaches convert RF signals into images (e.g., spectrograms), facilitating feature extraction with deep networks. Traditional acoustic methods mainly rely on Mel frequency cepstrum coefficient (MFCC) features, often supplemented by linear predictive cepstrum coefficients (LPCC), whereas deep learning-based acoustic methods convert sound signals into spectrograms and extract relevant features from them. In recent years, vision-based detection methods, especially those based on deep learning, have gained prominence. The vision models listed were evaluated on public datasets that are not specifically optimized for UAV detection; when these models are optimized and trained on specialized UAV datasets, detection accuracy can often exceed 90%. The effectiveness of UAV detection methods depends on factors such as dataset quality, training techniques, and optimization for the specific use case.
Table 3. Comparison of traditional RF-based detection. The processed RF features are input into a machine learning algorithm (KNN, SVM, RandF) to identify the UAV, and the reported accuracy is the optimized result.

| Fingerprint | Accuracy |
|---|---|
| Fractal dimension (FD) [29] | 100% |
| Axially integrated bispectra (AIB) [30] | 98% |
| Square integrated bispectra (SIB) [31] | 96% |
| Signal spectrum (SFS) [32] | 97.85% |
| Wavelet energy entropy (WEE) [32] | 93.75% |
| Power spectral entropy (PSE) [32] | 83.85% |
Table 4. Comparison of RF-based detection using deep learning.

| Authors | Accuracy |
|---|---|
| Y. Mo et al. [33] | 99% |
| C. J. Swinney et al. [34] | 100% |
| S. Lu et al. [35] | 98% |
| Z. He et al. [36] | 90.2% |
| T. Li et al. [37] | 98.57% |
Table 5. Comparison of acoustic-based detection using deep learning.

| Authors | Accuracy |
|---|---|
| S. Al-Emadi et al. [27] | 96.38% |
| Q. Dong et al. [38] | 99% |
| İ. Aydın et al. [39] | 98.78% |
Table 6. Accuracy of object detection on the VOC07, VOC12, and MS-COCO datasets.

| Detector | VOC07 mAP | VOC12 mAP | COCO mAP@0.5 | COCO mAP@[.5, .95] |
|---|---|---|---|---|
| RCNN | 58.50% | 53.70% | - | - |
| SPPNet | 59.20% | - | - | - |
| Fast RCNN | 70.00% | 68.40% | 35.90% | 19.70% |
| Faster RCNN | 73.20% | 70.40% | 42.70% | 21.90% |
| YOLOv3 | 63.40% | 57.90% | - | - |
| SSD | 76.80% | 74.90% | 46.50% | 26.80% |
| RetinaNet | - | - | 59.10% | 39.10% |
Table 7. Comparison of AOA estimation techniques.

| Approach | Model | Advantages | Disadvantages |
|---|---|---|---|
| Model-driven | Beamforming [91] | Easy installation | High computational complexity; vulnerable at high SNR |
| | MUSIC [92] | High accuracy | Performance degradation in the far field; high installation cost |
| | ESPRIT [93] | Array calibration is not required | Lower accuracy in non-ideal sensor design; more complex than MUSIC; highly expensive system |
| Machine learning | CWSVR [94] | Independent of array geometry | Degraded performance under non-identical distributions |
| | GCA [95] | Less expensive than MD approaches | Unable to process high-density data |
| Deep learning | Spectrum CNN [89] | Constructs complicated propagation models | Low accuracy at low SNR |
| | Multi-speaker CNN [96] | Easy to perform under non-identical distributions | High computational expense |
Table 8. Comparison of anti-UAV systems, summarized from the system descriptions in the text ("-": not specified).

| System | Surveillance Technologies | Countermeasures | Video Recording | Detection Range |
|---|---|---|---|---|
| DedroneTracker | RF, visual (PTZ camera), radar | RF jamming | Yes | 5000 m |
| Droneshield | Radar, acoustic, visual, RF | RF jamming | - | - |
| ARDRONIS | RF | - | Yes (intercepted video link) | 5000 m |
| AUDS | Radar, visual, thermal | RF inhibition | - | 10,000 m |
| ORELIA | Acoustic | - | - | 100 m |
| Falcon Shield | Electro-optics, electronic surveillance, radar | - | - | - |
| ELTA | Radar, RF, photoelectric | Soft and hard kill | - | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
