Article

Dense Feature Matching for Hazard Detection and Avoidance Using Machine Learning in Complex Unstructured Scenarios

Space Technologies Laboratory, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114, USA
* Author to whom correspondence should be addressed.
Aerospace 2024, 11(5), 351; https://doi.org/10.3390/aerospace11050351
Submission received: 1 February 2024 / Revised: 13 April 2024 / Accepted: 25 April 2024 / Published: 28 April 2024

Abstract

Exploring the Moon and Mars is a crucial step in advancing space exploration. Numerous missions aim to land and conduct research at various lunar locations, some of which have challenging surfaces with few distinguishable features. Some of these areas are cataloged as lunar light plains. Their main characteristics are that they are almost featureless and reflect more light than other lunar surfaces, which poses a challenge during navigation and landing. This paper compares traditional feature matching techniques, specifically the scale-invariant feature transform and oriented FAST and rotated BRIEF, with novel machine learning approaches for dense feature matching in challenging, unstructured scenarios, focusing on lunar light plains. Traditional feature detection methods often struggle in environments characterized by uniform terrain and unique lighting conditions, where unique, distinguishable features are rare. Our study addresses these challenges and underscores the robustness of machine learning. The methodology involves an experimental analysis using images that mimic lunar-like landscapes representing these light plains to generate and compare feature maps derived from traditional and learning-based methods. These maps are evaluated based on their density and accuracy, which are critical for the effective structure-from-motion reconstruction commonly utilized in navigation for landing. The results demonstrate that machine learning techniques enhance feature detection and matching, providing more detailed representations of environments with sparse features. This improvement indicates a significant potential for machine learning to boost hazard detection and avoidance in space exploration and other complex applications.

1. Introduction

In the domain of autonomous navigation and hazard detection and avoidance, feature matching stands as a pivotal technique for relative navigation, especially in unstructured and challenging environments during entry, descent, and landing maneuvers on other celestial bodies. Traditional feature matching methods, such as the scale-invariant feature transform (SIFT) [1] and oriented FAST and rotated BRIEF (ORB) [2], have been foundational in interpreting and navigating terrestrial and extraterrestrial landscapes by identifying and mapping the environment. However, their efficacy is often challenged in environments with few distinct features, such as the lunar plains, which can present unique difficulties such as apparently uniform, flat terrain, variable illumination, and other photometric effects like the opposition effect, which can reduce visibility [3]. Examples of these landscapes on the Moon are referred to as lunar light plains [4]. They are smooth surfaces composed of maria-like deposits that are higher in albedo (reflectance). The light plains cover approximately 9.5% of the lunar surface, while the maria cover about 16% [5]. Figure 1 highlights in green some of the identified light plains across the lunar surface.
Furthermore, developing autonomous navigation systems capable of operating effectively in extraterrestrial environments poses unique challenges. One of the most significant hurdles is the creation of robust and reliable solutions within a limited time frame [7] and without the luxury of extensive ground intervention or pre-existing detailed maps [8]. Traditional feature matching methods, while effective in controlled or well-understood environments, often fall short when they must rapidly adapt to the unpredictable and feature-sparse landscapes encountered in space exploration. Systems that can autonomously detect hazards, navigate, and map in real time, without prior terrain knowledge, are imperative, since not every mission can guarantee enough data to characterize its landing site. This necessitates the exploration of more adaptive, self-sufficient algorithms capable of on-the-fly learning and decision-making. With their ability to learn from and adapt to new data, machine-learning approaches present a promising avenue for addressing these challenges [9].
By leveraging the potential of machine learning (ML), we aim to develop methodologies that not only overcome the limitations of traditional feature matching in unknown terrains but also cater to the urgent need for rapid deployment and autonomous functionality in such critical missions [10]. Figure 2 compares two different surfaces from the Apollo missions. The left image shows the benefit of rocks and shadows, which create contrast and a structure that a feature matcher can use. The right image shows an area made of uniform material reflecting light uniformly. Even for the human eye, it is difficult to perceive differences and focal planes given the surface’s homogeneity. This is an unstructured scenario, as it lacks features that form shapes recognizable as an identifiable pattern.
This paper aims to bridge the gap between traditional feature-matching techniques and learning-based approaches applied to the aerospace context. It is essential for this study to clarify a fundamental difference in how features are detected and matched. Traditional matchers commonly base their detection on local descriptors computed from the pixels surrounding keypoints. Learning-based methods, on the other hand, extract features using different structures, such as convolutions or, more recently, transformers. By comparing the performance of SIFT and ORB with advanced state-of-the-art ML methods under these challenging conditions [9,10,12,13,14], this study seeks to provide empirical evidence supporting the advantages of ML in generating detailed feature maps for hazard detection and avoidance. This comparison illustrates the limitations of traditional methods and highlights the potential of ML in autonomous navigation and hazard detection in uncharted and feature-sparse domains. Finally, this paper presents results comparing the implementation of DISK and SuperPoint for feature detection and LightGlue for feature matching.
Our main contributions are:
  • Demonstrating that ML-based feature detectors and matchers can produce denser and more accurate feature maps in unstructured environments with complex light conditions.
  • Showing that estimates of geometric features required for safe landing, such as slope, are more accurate with these methods.
  • Providing a proof of concept running on embedded hardware.
The rest of this paper is structured as follows: Section 2 provides background literature on traditional and ML feature detectors and matchers, as well as on recent implementations of hazard detection for safe landing. Section 3 discusses how we implement and test these solutions and why they are essential for a safe landing. Section 4 illustrates the results and discusses the advantages of these methods over traditional methods for this application; this section also includes results from the embedded implementation. Finally, Section 5 provides closing statements and future work on this topic.

2. Background

2.1. Lunar Light Plains

Research focusing on sparse environments like lunar landscapes reveals significant challenges for feature detection. The unique lunar conditions, such as homogeneous terrain and varying lighting due to the opposition effect, demand more sophisticated approaches. These landscapes are referred to as lunar light plains. They are smooth surfaces composed of maria-like deposits that are higher in albedo (reflectance) than the flat, dark maria. Light plains cover about 9.5% of the lunar surface, while the maria cover about 16% [5]. Figure 3 illustrates an example of the different light conditions a lunar landscape can experience during different seasons.
For this reason, some state-of-the-art hazard detection and avoidance methods have transitioned to light detection and ranging (LiDAR) approaches, as the laser allows quick scanning and reconstruction of surfaces with high quality and robustness to variable light conditions [15,16]. However, no complete LiDAR-based system has flown or achieved the required NASA Technology Readiness Level, owing to size, weight, and power consumption constraints, and the industry is currently working on developing better systems [17,18]. Some of these systems will be tested in the new lunar mission campaign, Commercial Lunar Payload Services (CLPS) [19]. This highlights a critical gap: traditional algorithms struggle, modern approaches are still in development, and ML techniques can offer significant advancements using passive sensors for missions that may not be able to incorporate a complete LiDAR system.

2.2. Traditional Feature Matching

The landscape of feature matching has been significantly shaped by algorithms like the scale-invariant feature transform and oriented FAST and rotated BRIEF [1,2]. Initially developed for structured environments, SIFT has been widely recognized for its robustness to scale and rotation changes, proving highly effective in various image contexts. This robustness has made SIFT an algorithm of choice for critical applications such as structure-from-motion (SfM) reconstruction [20]. On the other hand, ORB stands out for its computational efficiency. This key factor has led to widespread use, particularly in applications requiring real-time processing, such as visual simultaneous localization and mapping (Visual SLAM). ORB’s efficiency, combined with its capability to provide reliable feature detection and matching in near real-time, makes it particularly suitable for determining camera pose in structured environments [21].
However, these algorithms struggle in unstructured environments. The effectiveness of a feature in such settings is highly contingent upon its distinctiveness relative to other detected features. In scenarios where a feature closely resembles others, it becomes less distinctive, complicating the matching process and diminishing its utility. This similarity can lead to an increased likelihood of false positives, adversely affecting the accuracy and density of feature maps [22]. Such issues highlight the need for algorithms that can adapt to the varying characteristics of structured and unstructured environments, ensuring reliable feature detection and matching across a broad spectrum of scenarios. This adaptability is essential for accurate mapping and effective camera pose estimation in diverse and dynamically changing environments.

2.3. Machine Learning Feature Detection and Matching

2.3.1. Learning-Based Feature Detection

The integration of machine learning into feature detection, represented by techniques like SuperPoint [12], ALIKED [13], and DISK [9], marks a significant paradigm shift in the field of image recognition and matching.
SuperPoint, developed by researchers at Magic Leap, is a machine learning-based technique designed to efficiently detect and describe local features so that similar images can be matched within large datasets. It extracts and represents local features compactly and robustly using a fully convolutional neural network architecture and introduces the concept of homographic adaptation for self-supervised training. This capability allows it to match images despite substantial scale, rotation, and illumination changes. Matching is then performed via a similarity search in which descriptors are compared with those of reference images using distance metrics. This approach has made SuperPoint popular for computer vision tasks like image retrieval, object recognition, and video tracking due to its efficiency and robustness in large-scale image matching. It also achieved state-of-the-art homography estimation results on HPatches compared to LIFT, SIFT, and ORB.
ALIKED (a lighter keypoint and descriptor extraction network via deformable transformation) [13] represents another leap forward. It merges global and local image descriptors, focusing on affine-local regions to capture extensive contextual information while maintaining invariance to scale, rotation, and illumination. ALIKED extracts keypoints from these regions and generates descriptors resistant to affine transformations by introducing a geometrically invariant sparse deformable descriptor head (SDDH). This head learns the deformable positions of supporting features for each keypoint and constructs deformable descriptors. During the similarity search phase, these descriptors are compared with those of reference images using metrics such as Euclidean distance or cosine similarity. ALIKED has demonstrated its effectiveness in various computer vision tasks, including image retrieval, object recognition, and video tracking, due to its comprehensive feature capture and transformation invariance.
DISK (DIScrete Keypoints), developed at EPFL, introduces a novel approach by learning keypoint detection with policy gradients, concentrating detections on the most informative image regions. DISK’s key aspects include learning a policy that predicts the importance of different image regions, extracting local features based on these predictions, and forming dense descriptors that capture local and global image information. Its probabilistic, end-to-end training of detection and matching enhances the matching process and has been shown to outperform existing methods in tasks such as image retrieval, object recognition, and video tracking, making DISK a formidable tool in large-scale image-matching applications.
All these methods—SuperPoint, ALIKED, and DISK—demonstrate the transformative impact of machine learning, offering advanced solutions to complex computer vision challenges and significantly improving the effectiveness of image retrieval, object recognition, and video tracking in large-scale applications.

2.3.2. Learning-Based Feature Matching

Just like feature detection, the integration of learning-based methods into feature matching is represented by state-of-the-art techniques like SuperGlue [14], D2-Net [23], and LightGlue [10], marking a similarly robust shift in feature matching.
SuperGlue, a state-of-the-art keypoint matcher, has revolutionized the realm of computer vision with its combination of geometric and contextual information. Developed by researchers at Magic Leap and ETH Zurich, SuperGlue employs deep learning techniques (typically paired with SuperPoint, although it is compatible with other detectors) and provides robust matching capabilities. By considering both geometric and contextual cues through a graph neural network with attention, SuperGlue can determine correspondences even in challenging scenarios, such as occlusions and changes in viewpoint. This holistic approach ensures that keypoints are matched accurately, providing a reliable foundation for various computer vision tasks, including 3D reconstruction, image-based localization, and augmented reality.
D2-Net, a recent addition to the keypoint matching landscape, excels in achieving precise and repeatable keypoint detection and description. Developed by Inria, D2-Net utilizes deep learning to produce distinctive and robust local descriptors. Its training process incorporates a margin-based loss, enhancing the discriminative power of these descriptors. This, in turn, enables D2-Net to excel in tasks that demand highly reliable keypoint matching, such as image stitching, visual SLAM, and SfM.
LightGlue, an emerging keypoint matcher, is poised to significantly impact large-scale image-matching tasks. Developed by researchers at ETH Zurich, LightGlue adopts a lightweight approach to keypoint matching. It prioritizes efficiency without compromising accuracy, making it suitable for real-time applications. LightGlue’s minimalist design ensures minimal computational overhead while maintaining robustness in keypoint matching across various environmental conditions. This efficiency-driven approach positions LightGlue as a promising solution for resource-constrained scenarios, such as mobile robotics, embedded systems, and edge computing.
SuperGlue, D2-Net, and LightGlue exemplify the evolution of machine learning-based keypoint matching techniques. These methods advance the state of the art in terms of accuracy and robustness and cater to diverse application scenarios, from high-precision computer vision tasks to resource-efficient real-time applications.

2.4. Hazard Detection and Avoidance

Prior efforts have been made in the development of hazard detection and avoidance (HDA). Epp et al. introduce the ALHAT project, which employs advanced sensors, guidance algorithms, and simulation tools to achieve precise lunar landings. The project integrates systems engineering, GNC technologies, and hardware-in-the-loop testing to navigate and land safely on the lunar surface, regardless of terrain or lighting conditions. It emphasizes autonomy in detecting and avoiding hazards, using technologies like terrain relative navigation and LiDAR for accurate landings within stringent requirements, thus supporting future crewed, cargo, and robotic lunar missions without reliance on pre-existing lunar navigation systems [24].
Crane explores the integration of autonomous HDA into the guidance system of lunar landers, focusing on small spacecraft. The work develops an information-seeking control strategy that adapts the landing trajectory to improve visual data collection for hazard mapping and relies on a monocular camera, which suits missions with strict mass and power constraints [25].
Yu et al. present a crater detection and matching method to improve visual navigation during the descent phase of planetary landings. It introduces an image region pairing method for detecting craters and employs a winner-takes-all (WTA) rule for crater matching, aiming to enhance landing precision by reducing false detection and matching rates using real and simulated planetary images [26].
Jiang et al. discuss two innovative schemes for autonomous hazard detection and avoidance (AHDA) in planetary landings. The first, an explicit relay AHDA strategy, employs infrared and radar sensors for coarse and precise hazard detection, improving robustness under various lighting and atmospheric conditions. The second scheme, an implicit relay AHDA strategy, uses a single 3D imaging sensor for both detection phases, offering a streamlined approach adaptable to different mission requirements [27].
Iiyama et al. introduce a deep reinforcement learning framework for autonomously selecting safe landing sites and planning divert maneuvers for lunar landings. By considering the terrain’s safety and the feasibility of divert maneuvers during descent, this framework simultaneously optimizes landing site selection and guidance strategies. It achieves a high success rate in simulations of challenging landing scenarios by effectively updating the target landing site and control parameters in response to new observations, showcasing the potential for improving safety and efficiency in planetary landings [28].
Finally, Ghilardi and Furfaro propose a solution using vision transformers to extract terrain features from low-light RGB imagery. Their solution is compared with a U-Net architecture and shows better results when extracting information to detect hazards in a low-light environment [29].
It is essential to note that the implementation of HDA systems varies significantly due to mission-specific landing requirements, the availability of sensors, and computational constraints [30,31]. A notable recent implementation of HDA technology was utilized during the landing of the Mars 2020 Perseverance rover [32]. This system integrated terrain-relative navigation (TRN) technology with a camera that resized imagery for real-time processing against a pre-loaded map developed with the assistance of the Mars Reconnaissance Orbiter [8]. Although these systems require further testing, they are expected to become more established as more missions head to the Moon and other celestial bodies. Newer approaches incorporate the hazard footprint and spacecraft geometry but demand substantial computational power [33]. With numerous missions planned for the Moon and beyond, safe landing systems have been developed and implemented, especially for the upcoming CLPS missions [34,35,36] scheduled for 2024. These missions will collect valuable data to validate camera-based HDA algorithms. However, one limitation of these algorithms is their processing efficiency, which often results in sparse outputs that limit measurement resolution, for example, of the slope angle. While prior research and the current state of the art assume LiDAR technology’s readiness for spaceflight [15,28,37], it is essential to recognize that this active sensor is still undergoing development for space applications, particularly regarding power consumption, mass, and volume [18,38]. As mentioned earlier, CLPS missions will test some of the first LiDAR systems with a constrained number of lasers, providing limited data primarily for odometry. Consequently, passive sensors like cameras remain the most viable choice for these applications despite their sensitivity to varying light conditions.
One of the foremost challenges in hazard-relative navigation is achieving near “real-time” HDA. Providing waypoints to the guidance, navigation, and control (GNC) system involves multiple steps. Classical computer vision methods demand substantial computational resources, leading to time-consuming operations. Consequently, images are often down-sampled (resized) to enhance processing speed at the expense of accuracy. For instance, the detection and segmentation of hazards such as rocks, shadows, and craters [7] can be performed on full-resolution images with the assistance of compact, more portable neural networks running in a constrained environment such as a Raspberry Pi.

3. Methodology

3.1. Simulation Environment

To simulate the lunar plains and the light conditions during different seasons, high-resolution lunar maps were obtained. The maps were derived by Goddard Space Flight Center using data from NASA’s Lunar Reconnaissance Orbiter [39]. These are some of the most up-to-date high-resolution digital elevation maps (DEMs) of several regions surrounding high-priority lunar south pole landing sites, generated exclusively from laser altimetry data acquired by LRO-LOLA [40]. These DEMs have a median RMS height (Z) error of ≈0.30–0.50 m and a median RMS slope error of ≈1.5–2.5°, and some of this variation is due to the changing light conditions encountered during the different lunar seasons [41]. Figure 4 shows a mesh of South Pole Malapert Crater 1 constructed from the raw data, which are stored as XYZI binary double-precision floating point tables with four columns, X, Y, Z, and RDRid, where X, Y, and Z are the polar stereographic coordinates (in km) and RDRid is the LOLA RDR ID.
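As an illustration of how such a table can be ingested before meshing, the following is a minimal sketch assuming the file is a raw little-endian float64 dump with four columns in the order X, Y, Z, RDRid; the file name is a placeholder, not part of the original pipeline.

```python
import numpy as np

# Minimal sketch: load a LOLA XYZI table assumed to be a raw little-endian
# float64 dump with four columns (X, Y, Z, RDRid). File name is a placeholder.
raw = np.fromfile("malapert_crater_1.xyzi", dtype="<f8").reshape(-1, 4)
xyz_km = raw[:, :3]        # polar stereographic coordinates in km
rdr_id = raw[:, 3]         # LOLA RDR ID of each point
xyz_m = xyz_km * 1000.0    # convert to meters before meshing in Blender
```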
From this large map, smaller meshes that match the physical criteria of a lunar light plain are selected. These meshes are loaded into Blender, which provides a rendering framework and control over how the light is ray-traced and where the Sun is positioned. They are then subdivided to generate a more detailed surface. To generate the ground truth measurement, the mesh’s point cloud is used to calculate the best plane approximation for that area. The ground truth point cloud of the selected area has approximately 1 million points with heights between −0.2 and 0.16 m. Figure 5 illustrates the height map, where yellow is the highest point and deep blue is the lowest. The ground truth slope of this mesh is numerically effectively 0.0°.

3.2. Spacecraft and Camera

To model the images, the spacecraft trajectory is based on [42], which describes a possible landing scenario for Intuitive Machines’ Nova-C lunar lander. During this maneuver, an attitude hold is maintained while acquiring the images to maximize the baseline effectiveness for SfM. The output then provides a set of hazard-relative candidates to the navigation system, where final filtering is performed to account for available fuel. Figure 6 displays the spacecraft during this approach. Instead of hovering over the surface like the Chinese Chang’e or India’s Vikram lander, the spacecraft performs its analysis as it approaches the landing site. This is similar to Neil Armstrong looking through the Lunar Module window to find a place to land during the final approach of Apollo 11 [43].
To generate the images, a pinhole camera model is assumed, as the lens used to generate the images has a narrow field of view. Table 1 contains the values used to define the pinhole camera intrinsics K.
The camera intrinsics K can be defined as:
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
where $f_x$ and $f_y$ are the focal lengths and $c_x$ and $c_y$ are the principal point coordinates, respectively. Using the values in Table 1, K becomes:
K = \begin{bmatrix} 7363.63 & 0 & 1023.5 \\ 0 & 7363.63 & 1023.5 \\ 0 & 0 & 1 \end{bmatrix}.
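For reference, this intrinsic matrix can be assembled directly, for example in Python with NumPy (a small illustrative snippet, not part of the original pipeline):

```python
import numpy as np

# Pinhole intrinsics from Table 1 (all values in pixels).
fx = fy = 7363.63
cx = cy = 1023.5   # principal point of a 2048 x 2048 image
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
```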
Figure 7 illustrates the camera coordinate frame. The image coordinate frame aligns with the $XY$ plane of the camera frame and is defined by u and v, where u aligns with X and v with Y, respectively. This convention defines the camera extrinsics used to project the 2D features when performing SfM to estimate their respective 3D world positions.
The extrinsics for each image, T, can be defined using a homogeneous transform as
T = \begin{bmatrix} R_{3 \times 3} & t_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix},
where R represents the camera’s attitude rotation and t its translation vector for each image.
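As an illustration of how these quantities combine, the sketch below builds T and projects a world point into pixel coordinates using the K defined above; R and t are placeholder values, not the actual trajectory data, and a world-to-camera convention is assumed.

```python
import numpy as np

# Illustrative only: placeholder extrinsics (world-to-camera convention assumed).
R = np.eye(3)                            # camera attitude
t = np.array([[0.0], [0.0], [400.0]])    # translation, e.g., 400 m range

T = np.block([[R, t],
              [np.zeros((1, 3)), np.ones((1, 1))]])

X_w = np.array([10.0, -5.0, 0.0, 1.0])   # homogeneous world point
x_cam = (T @ X_w)[:3]                    # point expressed in the camera frame
u, v, w = K @ x_cam                      # K from the snippet above
print(u / w, v / w)                      # pixel coordinates (u, v)
```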

3.3. HDA, SfM, and Slope Estimation

Hazard detection and avoidance and structure-from-motion are pivotal in ensuring the safe landing of spacecraft on planetary bodies like the Moon. As detailed in the papers on lunar landings [35,42], HDA systems use advanced algorithms to identify and avoid potential hazards such as rocks, craters, and uneven terrain within the few seconds available during descent. This rapid assessment is crucial in a space environment where real-time decisions are necessary for the safety of the mission.
SfM, a technique used in these HDA systems, plays a crucial role by creating 3D models of the landing area from 2D images. It aids in accurately assessing the topography of the landing site, allowing for a better understanding of the terrain’s slope, roughness, and other 3D-derived features. This information is essential for identifying the safest possible landing site. To assess the SfM component, the matched features are processed using a simple pipeline based on the direct linear transformation (DLT) [22], implemented with OpenCV’s triangulation routine [44]. Other methods might be more optimal for the two-image case, for example, approaches in which the solution reduces to solving a polynomial, turning an iterative problem into a linear approximation with low reprojection error [45].
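A minimal sketch of this triangulation step is shown below, assuming the intrinsics K, per-image extrinsics (R, t), and the matched pixel coordinates are available; function and variable names are illustrative.

```python
import cv2
import numpy as np

# DLT triangulation sketch with OpenCV. pts1/pts2 are matched pixel
# coordinates of shape (N, 2); R1, R2 are 3x3 and t1, t2 are 3x1.
def triangulate(K, R1, t1, R2, t2, pts1, pts2):
    P1 = K @ np.hstack([R1, t1])            # 3x4 projection matrix, image 1
    P2 = K @ np.hstack([R2, t2])            # 3x4 projection matrix, image 2
    X_h = cv2.triangulatePoints(P1, P2,
                                pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))   # 4xN homogeneous
    return (X_h[:3] / X_h[3]).T             # Nx3 Euclidean 3D points
```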
The 3D points obtained from the OpenCV triangulation are then processed using singular value decomposition (SVD) to find their normal vector and define a plane. To obtain the slope, the point cloud $PC = (x, y, z)$ composed of the triangulated coordinates is decomposed via SVD, from which $U$, $\Sigma$, and $V$ are obtained:
U, \Sigma, V = \mathrm{SVD}(PC).
From the decomposition, the singular vector $\mathbf{p}$ associated with the smallest singular value is extracted and used to fit the coefficients of a plane $ax + by + cz = 0$. Then, with the known plane coefficients and the local gravity direction $\mathbf{g} = (0, 0, 1)$, the angle of the plane’s normal vector can be measured using the following expression:
\mathrm{Slope} = \arccos\left(\frac{|c|}{\lVert \mathbf{p} \rVert}\right).
Finally, the slope is converted from radians to degrees. If the resulting slope is greater than 90°, it is instead calculated as
\mathrm{Slope} = 180^{\circ} - \mathrm{Slope}_{\mathrm{SVD}}.
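The following is a compact sketch of this plane fit and slope computation, assuming the triangulated points are centered before the decomposition (NumPy returns the right singular vectors in Vt, so the normal is taken from its last row; the paper's exact conventions may differ).

```python
import numpy as np

# Plane fit and slope estimate from the triangulated point cloud.
def slope_deg(points):                       # points: (N, 3) array
    centered = points - points.mean(axis=0)  # center so the plane passes through 0
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    p = Vt[-1]                               # normal = smallest singular vector
    slope = np.degrees(np.arccos(abs(p[2]) / np.linalg.norm(p)))
    if slope > 90.0:                         # fold obtuse angles back, as above
        slope = 180.0 - slope
    return slope
```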
Figure 8 illustrates a pair of images with the detected features. The right image is the reconstructed point cloud with the plane fit to calculate the slope. The green dots are features matched successfully, and the red dots are the reprojected points.

3.4. Learning-Based Methods Configuration

Transfer learning was used to evaluate the effectiveness of the learning-based methods and understand how well they could adapt to this scenario without any further training. It has been proven that transfer learning is an excellent tool for initializing an ML network, which can then be tweaked depending on the application’s requirements [46]. The networks were not tweaked or tuned for this experiment, and the hyperparameters were left at the defaults from their initial training. SuperPoint and SuperGlue were trained using the MS-COCO dataset [47]. The weights used are the trained defaults from their public repositories [48]. The SuperPoint encoder is a VGG-like [49] network composed of eight 3 × 3 convolution layers sized 64-64-64-64-128-128-128-128. For every two layers, there is a 2 × 2 max pool layer. Each decoder head has a single 3 × 3 convolutional layer of 256 units, followed by a 1 × 1 convolution layer with 65 and 256 units for the interest point detector and descriptor, respectively. Additionally, all of the convolution layers in the network have a ReLU non-linear activation and BatchNorm normalization [12,14].
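For illustration, a sketch of this VGG-like encoder in PyTorch is shown below; pooling is placed after each pair of convolutions except the last, which yields the 1/8-resolution feature map consumed by the 65-channel detector head, and the official implementation may differ in details.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 convolution with BatchNorm and ReLU, as described in the text.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SuperPointEncoderSketch(nn.Module):
    """Illustrative sketch of the described encoder, not the official network."""
    def __init__(self):
        super().__init__()
        widths = [64, 64, 64, 64, 128, 128, 128, 128]
        layers, c_in = [], 1                         # grayscale input
        for i, c_out in enumerate(widths):
            layers.append(conv_block(c_in, c_out))
            if i % 2 == 1 and i < len(widths) - 1:   # pool after each pair but the last
                layers.append(nn.MaxPool2d(2, 2))
            c_in = c_out
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)                          # dense 128-channel feature map
```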
ALIKED was trained using the MegaDepth [50] dataset, and the weights are available on their public repository [51]. This architecture consists of three components: feature encoding, aggregation, and keypoint and descriptor extraction. Due to the size of the network, a brief description is provided, but the reader is encouraged to read the original implementation paper. The feature encoder extracts low-level features at different scaling levels; these are then aggregated to define a feature for keypoint detection and extraction. Then, the score map head, via a convolution, identifies a score map of the feature, which is processed using a non-maximum suppression to identify local maxima [13].
DISK was trained using a custom dataset curated by EPFL, and its weights are also available on its public repository [52]. This network uses a variation of U-Net [53]. The model has four down-and-up blocks, each consisting of a single convolutional layer with 5 × 5 kernels. It also uses instance normalization instead of BatchNorm normalization and PReLU nonlinearities. The model has approximately 1.1 M parameters, with a formal receptive field of 219 × 219 pixels [9].
Finally, for feature matching, a Jupyter notebook with an implementation of LightGlue is available [54,55]. This implementation gives the reader different examples of interacting with various feature detectors. LightGlue builds on recent advances in transformers. The backbone is a transformer that assigns a state to each local feature, which is refined with self-attention layers; cross-attention layers then compare the states between images, and the best scores are passed to a sigmoid function to determine matches based on similarity and matchability scores. A key benefit of this methodology is that analyzing matchability allows unmatchable features to be pruned early, reducing the data volume and computation time [10].
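For reference, feature extraction and matching with this tooling look roughly like the sketch below, which follows the usage documented in the public LightGlue repository [54]; image paths are placeholders and argument names may change between versions.

```python
from lightglue import LightGlue, DISK
from lightglue.utils import load_image, rbd

extractor = DISK(max_num_keypoints=2048).eval()   # or SuperPoint / ALIKED
matcher = LightGlue(features="disk").eval()       # move to GPU with .cuda() if available

image0 = load_image("renders/plain_t0.png")       # placeholder paths
image1 = load_image("renders/plain_t1.png")

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]

matches = matches01["matches"]                    # (M, 2) index pairs
pts0 = feats0["keypoints"][matches[..., 0]]       # matched pixels in image 0
pts1 = feats1["keypoints"][matches[..., 1]]       # matched pixels in image 1
```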

4. Results and Discussion

The following experiment evaluates the algorithms closer to the surface, where photometric effects can make the surface appear brighter toward the camera. The chosen altitude for the trajectory is 400 m, and images are taken 50 m apart. The spacecraft holds its attitude to avoid rotations along the trajectory and maximize the chance of matching features to reconstruct the surface as it heads to the intended landing site. Figure 9 illustrates a trajectory used to evaluate feature detection over terrain that appears similar from image to image but is unique and highly uniform. The left image is the first image, acquired at $t_0$. The center image was acquired at $t_1$, 50 m after the first image. The right image is a normal map of the surface, showing the terrain detail that is missing from the rendered views and how light bounces off the terrain.

4.1. Feature Matching Example—ORB and SIFT

Figure 10 illustrates a comparison between features found and successfully matched using traditional SIFT and ORB with the OpenCV library [44]. These images correspond to the spacecraft moving forward in the direction the camera is facing, with a 50 m baseline between images at an altitude of 400 m. The camera is pointing 45° from the spacecraft nadir axis. Both detectors were set to find 8192 keypoints. The keypoints were matched using a nearest-neighbor brute-force matcher; ORB descriptor distances were compared using the Hamming norm, and SIFT used the L2 norm.
These initial matches were first filtered using Lowe’s ratio test. Finally, the matches were filtered by finding the points that best fit the homography between both images, providing a good solution for performing the triangulation with SfM. However, given the light conditions, the number of detected points decreased, thereby lowering the resolution of the reconstructed landing site. For the darker surface, the points were 1555 and 2522 for ORB and SIFT, respectively. For the lunar plain, the points were 1464 and 2743 for ORB and SIFT, respectively. It is also important to note that this example was developed with a set of images containing more detail and contrast, under two different light conditions, to show that traditional feature matching can work as long as there are enough patterns or structures in the image.
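The traditional pipeline described above can be sketched as follows with OpenCV; the ratio threshold of 0.75 and the RANSAC reprojection threshold of 3 px are illustrative defaults, not the exact values used in the experiments.

```python
import cv2
import numpy as np

def match_traditional(img0, img1, use_sift=True):
    # Detect up to 8192 keypoints with SIFT (L2 norm) or ORB (Hamming norm).
    if use_sift:
        det, norm = cv2.SIFT_create(nfeatures=8192), cv2.NORM_L2
    else:
        det, norm = cv2.ORB_create(nfeatures=8192), cv2.NORM_HAMMING
    kp0, des0 = det.detectAndCompute(img0, None)
    kp1, des1 = det.detectAndCompute(img1, None)

    # Brute-force nearest-neighbor matching followed by Lowe's ratio test.
    knn = cv2.BFMatcher(norm).knnMatch(des0, des1, k=2)
    good = [m for m, n in (p for p in knn if len(p) == 2)
            if m.distance < 0.75 * n.distance]

    # Keep only matches consistent with a homography (RANSAC).
    pts0 = np.float32([kp0[m.queryIdx].pt for m in good])
    pts1 = np.float32([kp1[m.trainIdx].pt for m in good])
    _, mask = cv2.findHomography(pts0, pts1, cv2.RANSAC, 3.0)
    inliers = mask.ravel().astype(bool)
    return pts0[inliers], pts1[inliers]
```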

4.2. Feature Matching Testing on a Desktop Computer

Figure 11 illustrates the testing of the methods mentioned above. The first set of matches was produced using ORB, which detected 100 features and yielded a slope of 0.05°. In the second case, SuperPoint detected 79 features with a slope of 2.1°. In the third case, ALIKED detected 88 features with a slope of 5.56°. In the fourth and last case, DISK detected 3291 features with a slope of 0.03°. These runs were performed on full-resolution images (2048 by 2048 pixels). In a practical situation on constrained hardware, having this amount of data to analyze simultaneously is excessive; therefore, Section 4.3 illustrates results from an implemented pipeline in which the image is split efficiently using quadtrees, as explained in [42]. Additionally, these results were obtained using full floating point precision on a regular desktop computer with an AMD Ryzen 7 3700X 8-core processor (AMD, Santa Clara, CA, USA) and 32 GB of RAM. Table 2 provides a brief summary of the results for this run. The best-performing methods from this run are the ones implemented on the embedded hardware.

4.3. Jetson TX2 Results

Flight software development and validation are always complex [56]. Notably, for complex deep learning models, these environments can make deployment and benchmarking difficult, depending on the size of the network and its memory requirements. Most common machine learning frameworks, such as PyTorch or TensorFlow, are built around Python to make them more accessible to the scientific community; this makes the development of architectures easy, but deployment on embedded hardware remains complex. For this purpose, an NVIDIA Jetson TX2 running Ubuntu 18 (NVIDIA, Santa Clara, CA, USA) was set up with the Open Neural Network Exchange (ONNX) standard to run machine learning models optimized for this specific pipeline. A significant caveat of this implementation is that the numerical precision obtained on the desktop computer is affected by quantization to lower precision for the constrained processor of the TX2. The ONNX model is optimized and exported using the tools in [55] with 16-bit floating point precision. Figure 12 illustrates a flowchart of how this process is achieved using the mentioned methods and default pre-trained networks.
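Running the exported graph on the target then reduces to a generic ONNX Runtime session, roughly as sketched below; the model file name, input shape, and output layout are placeholders, the real export in this work used the LightGlue-ONNX tooling [55], and the input dtype depends on how the graph was exported.

```python
import numpy as np
import onnxruntime as ort

# Generic inference sketch for an exported feature-extraction graph.
sess = ort.InferenceSession(
    "disk_lightglue_fp16.onnx",                       # placeholder file name
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

image = np.random.rand(1, 1, 512, 512).astype(np.float32)  # dummy grayscale input
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: image})               # keypoints, scores, descriptors
```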
The two ML pipelines implemented and evaluated were the DISK and SuperPoint feature detectors, each combined with the LightGlue feature matcher. As illustrated before, the first combination has proven to be the best at detecting quality features in these unstructured environments, while the second has been proven to generate high-quality matches in many indoor and outdoor scenarios. Figure 13 and Figure 14 illustrate the feature matching on the TX2 at different resolutions and the estimated slope for SuperPoint and DISK, respectively. Table 3 provides the metrics for the different resolutions and the estimated slope.
Although the main point of this paper is to examine the use of these learning-based methods on lunar light plains, some additional cases were evaluated to further validate the best method (DISK + LightGlue) under different light conditions and a different slope. Figure 15 shows a variation of the surface slope from 0° to 2°, a small and barely noticeable difference; the center and right images depict the two frames showing the motion.
Figure 16 shows the results of using DISK and LightGlue on a 2° slope at two different resolutions. These images show a different component of the feature matching: green dots represent the matched features, and red dots represent the reprojected features used to verify an accurate 3D triangulation before performing the slope analysis. Table 4 provides the different measured metrics.
Finally, as a benchmark to show the robustness of DISK + LightGlue under different light conditions, Figure 17 shows the results for the surfaces seen in Figure 3. This example makes visible how effective the method remains at detecting complete surfaces under the varying conditions, which allows the system to react better under highly variable illumination. Table 5 shows the different metrics from this analysis. As the lighting reaches either extreme, observability will vary depending on the features on the surface. Notice how one of the darker scenarios creates an outlier in the triangulation process; this outlier produced a slope of ≈3.2°. RANSAC could remove this point from the cloud before fitting the plane, yielding a more accurate result. To show the raw data, the outlier was left in the calculation, increasing the error in the slope estimate. For the lander pictured in Figure 6, a slope of 10° is the limit before tilting [42], so even with this outlier, the plain is still suitable if chosen as a landing site.
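Such an outlier rejection step could be a simple RANSAC plane fit applied before the SVD slope computation, for example as in the sketch below; the iteration count and inlier tolerance are illustrative values, not tuned parameters from this work.

```python
import numpy as np

def ransac_plane_inliers(points, iters=200, tol=0.05, seed=0):
    # points: (N, 3) triangulated cloud; returns a boolean inlier mask.
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:             # degenerate (collinear) sample
            continue
        n /= np.linalg.norm(n)
        dist = np.abs((points - sample[0]) @ n)  # point-to-plane distances
        inliers = dist < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Usage: feed only the inliers to the SVD slope estimate sketched earlier.
# clean_points = points[ransac_plane_inliers(points)]
```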

4.4. Discussion

All the methods highlighted in this paper have their advantages and disadvantages. ORB and SIFT have been tested extensively and work well in structured environments with specific light conditions. Only some ML methods are ready for production environments, for example, SuperPoint with SuperGlue or DISK with LightGlue; others, such as ALIKED, require extra porting of some network layers to make them compatible with the limitations encountered on embedded devices. SuperPoint, although trained on thousands of 240 × 320 images from the MS-COCO dataset, still requires some network tuning for tasks like this one, where illumination changes can nearly wash out the image depending on the sunlight’s angle of incidence on the surface and the camera pointing direction. Numerical precision did not seem to affect the feature detection, but image resolution in the constrained environment did. On the other hand, DISK performed well even in the most complex lighting scenarios, as it detects many features even at small resolutions where detail could be lost. This advantage is significant because, based on the HDA logic using quadtrees [42], it is more efficient to analyze a certain number of regions of interest (ROIs) that could be safe for landing than to analyze the complete image and deal with too many data points. At 400 m in altitude, for the camera described previously, an ROI of ≈20 × 20 m is equivalent to a 256 × 256 image patch. This means that the algorithm can process four landing sites in 7 s. The benefit of the DISK + LightGlue pipeline is that it is more robust in feature detection and matching over wider illumination ranges, as seen previously, providing a better chance of safely estimating the slope of an unstructured landing site. All of these benefits occur while running in a constrained environment.
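As a rough cross-check of this ROI sizing (an approximation that assumes the intrinsics of Table 1, the 45° off-nadir pointing described earlier, and a patch near the image center), the slant range and ground sample distance give

r \approx \frac{400\ \mathrm{m}}{\cos 45^{\circ}} \approx 566\ \mathrm{m}, \qquad \mathrm{GSD} \approx \frac{r}{f_x} \approx \frac{566\ \mathrm{m}}{7363.63\ \mathrm{px}} \approx 0.077\ \mathrm{m/px}, \qquad 256\ \mathrm{px} \times 0.077\ \mathrm{m/px} \approx 20\ \mathrm{m},

which is consistent with the ≈20 × 20 m ROI quoted above.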

5. Conclusions

This paper has presented a comprehensive study comparing traditional feature-matching techniques to advanced ML approaches for dense feature matching in unstructured scenarios, focusing on lunar light plains. Our research demonstrates that learning-based methods significantly outperform traditional methods in environments characterized by few distinct features and challenging lighting conditions, which are common in extraterrestrial exploration scenarios like those encountered on the Moon’s surface. Through experimental analysis, simulated lunar-like landscapes, and flight-like computational hardware, we have shown that machine learning-based feature detection and matching provide denser, more accurate feature maps essential for effective SfM reconstruction, particularly slope estimation (as seen in Table 3), which is critical for spacecraft safety. This improvement is pivotal for autonomous navigation systems operating in unpredictable, feature-sparse landscapes without prior terrain knowledge or extensive ground intervention. Our findings highlight the potential of machine learning to revolutionize HDA in planetary and asteroid exploration by offering a more adaptive, efficient, and reliable approach to navigating and mapping uncharted terrains. DISK combined with LightGlue proved to be an efficient method that generates surfaces of the highest quality in feature-sparse environments, providing navigation with better features whether used by the system for TRN or for rebuilding a surface for HDA.
Furthermore, the successful implementation of learning-based feature detection and matching methods underscores the need for continued research and development in this field. Future work should focus on optimizing these ML algorithms for real-time applications in space exploration using limited flight hardware, including further reducing computational requirements, improving the robustness of feature detection under a broader range of environmental conditions, and incorporating fusion with other sensors such as Doppler radar or LiDAR.
In summary, our study provides empirical evidence supporting the advantages of learning-based methods over traditional feature-matching techniques in handling the unique challenges of space exploration across different illumination and structure scenarios to provide navigation references. This research contributes to the ongoing efforts to explore and understand celestial bodies more safely and effectively by paving the way for more sophisticated autonomous navigation and hazard detection systems. Launches to the Moon are synchronized with the light conditions, as the vision systems’ navigation capabilities are tied to how the algorithms process the data. Learning-based methods like DISK and LightGlue pave the way for spacecraft to fly in less constrained scenarios, increasing the range of operation and making it less reliant on these specific launch windows.

Author Contributions

Conceptualization, methodology, software, and writing—original draft, D.P.; resources, supervision, and writing—review & editing, T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Simulation data were generated at MicaPlex, Embry–Riddle Aeronautical University. Derived data supporting the findings of this study are available from the corresponding author—D.P.—on request.

Acknowledgments

The authors would like to thank Madhur Tiwari for insightful conversations and proofreading. We also want to acknowledge the use of imagery from Lunar QuickMap (https://quickmap.lroc.asu.edu (accessed on 24 April 2024)), a collaboration between NASA, Arizona State University and Applied Coherent Technology Corp.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ORB	Oriented FAST and Rotated BRIEF
SIFT	Scale-Invariant Feature Transform
ML	Machine Learning
ALIKED	A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation
DISK	DIScrete Keypoints
SfM	Structure-from-Motion
LiDAR	Light Detection and Ranging
TRN	Terrain Relative Navigation
CLPS	Commercial Lunar Payload Services
SLAM	Simultaneous Localization and Mapping
HDA	Hazard Detection and Avoidance
GNC	Guidance, Navigation, and Control
DEM	Digital Elevation Map
DLT	Direct Linear Transformation
ONNX	Open Neural Network Exchange

References

  1. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  2. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2564–2571. [Google Scholar]
  3. Hapke, B.; Nelson, R.; Smythe, W. The opposition effect of the moon: Coherent backscatter and shadow hiding. Icarus 1998, 133, 89–97. [Google Scholar] [CrossRef]
  4. Meyer, C.H.; Robinson, M.; Denevi, B.; Boyd, A. A new global map of light plains from the Lunar Reconnaissance Orbiter. In Proceedings of the 49th Lunar and Planetary Science Conference, Woodlands, TX, USA, 19–23 March 2018; Lunar and Planetary Institute: Houston, TX, USA, 2018. Available online: https://www.hou.usra.edu/meetings/lpsc2018/pdf/1474.pdf (accessed on 10 January 2024).
  5. Meyer, H.M.; Denevi, B.W.; Robinson, M.S.; Boyd, A.K. The Global Distribution of Lunar Light Plains From the Lunar Reconnaissance Orbiter Camera. J. Geophys. Res. Planets 2020, 125, e2019JE006073. [Google Scholar] [CrossRef]
  6. NASA/ASU/ACT. 2023. Lunar QuickMap. Available online: https://quickmap.lroc.asu.edu/help?extent=-90%2C-26.8649195%2C90%2C33.7123568&id=lroc&layerListFilter=&showTerrain=true&queryOpts=N4XyA&trailType=0&layers=NrBsFYBoAZIRnpEBmZcAsjYIHYFcAbAyAbwF8BdC0ypZaOAThkQRXWUwW0nyJqoCKQA&proj=10 (accessed on 2 February 2024).
  7. Posada, D.; Jordan, J.; Radulovic, A.; Hong, L.; Malik, A.; Henderson, T. Detection and Initial Assessment of Lunar Landing Sites Using Neural Networks. In Proceedings of the 2022 AAS/AIAA Astrodynamics Specialist Conference, Charlotte, NC, USA, 7–11 August 2022. [Google Scholar]
  8. Cheng, Y.; Ansar, A.; Johnson, A. Making an onboard reference map From MRO/CTX imagery for Mars 2020 lander vision system. Earth Space Sci. 2021, 8, e2020EA001560. [Google Scholar] [CrossRef]
  9. Tyszkiewicz, M.; Fua, P.; Trulls, E. DISK: Learning local features with policy gradient. Adv. Neural Inf. Process. Syst. 2020, 33, 14254–14265. [Google Scholar]
  10. Lindenberger, P.; Sarlin, P.E.; Pollefeys, M. LightGlue: Local Feature Matching at Light Speed. arXiv 2023, arXiv:2306.13643. [Google Scholar]
  11. Lawrence, S.; Robinson, M.; Broxton, M.; Stopar, J.; Close, W.; Grunsfeld, J.; Ingram, R.; Jefferson, L.; Locke, S.; Mitchell, R.; et al. The Apollo digital image archive: New research and data products. In Proceedings of the NLSI Lunar Science Conference, Moffett Field, CA, USA, 20–23 July 2008; Volume 2066. [Google Scholar]
  12. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236. [Google Scholar]
  13. Zhao, X.; Wu, X.; Chen, W.; Chen, P.C.; Xu, Q.; Li, Z. ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation. IEEE Trans. Instrum. Meas. 2023, 72, 5014016. [Google Scholar] [CrossRef]
  14. Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947. [Google Scholar]
  15. Johnson, A.E.; Klumpp, A.R.; Collier, J.B.; Wolf, A.A. Lidar-based hazard avoidance for safe landing on Mars. J. Guid. Control Dyn. 2002, 25, 1091–1099. [Google Scholar] [CrossRef]
  16. Restrepo, C.I.; Chen, P.T.; Sostaric, R.R.; Carson, J.M. Next-generation nasa hazard detection system development. In Proceedings of the AIAA Scitech 2020 Forum, Orlando, FL, USA, 6–10 January 2020; p. 0368. [Google Scholar]
  17. Brady, T.; Bailey, E.; Crain, T.; Paschall, S. ALHAT system validation. In Proceedings of the 8th International ESA Conference on Guidance, Navigation and Control Systems, Loutraki, Greece, 5 June 2011; p. JSC-CN-23833. [Google Scholar]
  18. Amzajerdian, F.; Pierrottet, D.; Petway, L.B.; Hines, G.D.; Roback, V.E.; Reisse, R.A. Lidar sensors for autonomous landing and hazard avoidance. In Proceedings of the AIAA Space 2013 Conference and Exposition, San Diego, CA, USA, 10–12 September 2013; p. 5312. [Google Scholar]
  19. Cummings, C. Impact Story: Navigation Doppler LIDAR. News, 24 April 2023. Available online: https://www.psionicnav.com/news/impact-story-navigation-doppler-lidar (accessed on 15 January 2024).
  20. Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
  21. Campos, C.; Elvira, R.; Gómez, J.J.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
  22. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  23. Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8092–8101. [Google Scholar]
  24. Epp, C.D.; Smith, T.B. Autonomous precision landing and hazard detection and avoidance technology (ALHAT). In Proceedings of the 2007 IEEE Aerospace Conference, New Orleans, LA, USA, 23–27 September 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–7. [Google Scholar]
  25. Crane, E.S. Vision-Based Hazard Estimation during Autonomous Lunar Landing. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2014. [Google Scholar]
  26. Yu, M.; Cui, H.; Tian, Y. A new approach based on crater detection and matching for visual navigation in planetary landing. Adv. Space Res. 2014, 53, 1810–1821. [Google Scholar] [CrossRef]
  27. Jiang, X.; Li, S.; Tao, T. Innovative hazard detection and avoidance strategy for autonomous safe planetary landing. Acta Astronaut. 2016, 126, 66–76. [Google Scholar] [CrossRef]
  28. Iiyama, K.; Tomita, K.; Jagatia, B.A.; Nakagawa, T.; Ho, K. Deep reinforcement learning for safe landing site selection with concurrent consideration of divert maneuvers. arXiv 2021, arXiv:2102.12432. [Google Scholar]
  29. Ghilardi, L.; Furfaro, R. Image-Based Lunar Hazard Detection in Low Illumination Simulated Conditions via Vision Transformers. Sensors 2023, 23, 7844. [Google Scholar] [CrossRef] [PubMed]
  30. Villalpando, C.Y.; Johnson, A.E.; Some, R.; Oberlin, J.; Goldberg, S. Investigation of the tilera processor for real time hazard detection and avoidance on the altair lunar lander. In Proceedings of the 2010 IEEE Aerospace Conference, Big Sky, MT, USA, 6–13 March 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–9. [Google Scholar]
  31. Johnson, A.E.; Keim, J.A.; Ivanov, T. Analysis of flash lidar field test data for safe lunar landing. In Proceedings of the 2010 IEEE Aerospace Conference, Big Sky, MT, USA, 6–13 March 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–11. [Google Scholar]
Figure 1. Sample of a global map of lunar light plains generated using the LROC Quickmap tool [6]. These plains are distributed across the entire globe, including the South Pole, where multiple lunar missions are headed.
Figure 2. Example of a structured (Left) and an unstructured (Right) surface; the latter is unstructured due to its even terrain, lack of hazards, and lighting conditions. Images from the Apollo Archive [11].
Figure 3. The same lunar surface rendered in Blender at different Sun angles. Depending on the surface material composition, at certain angles the surface loses almost all detail, making it difficult for traditional feature matchers.
Figure 4. Blender simulation with an approximately 20 by 20 km Malapert Crater 1 mesh.
Figure 5. Height map of the selected area within the mesh. An ideal lunar plain candidate has low elevation variation.
Figure 6. Example of trajectory used to generate data and world coordinates. Lunar lander illustration courtesy of Intuitive Machines.
Figure 7. Spacecraft camera frames for SfM. The X-axis is aligned in both the camera and spacecraft coordinate frames. Both images are taken at known times with estimated poses to triangulate features and build the 3D surface.
Figure 8. SfM pipeline. (Left and Center): Example of two RGB images used for feature extraction. These features are then matched and triangulated to reconstruct a 3D surface by estimating a point cloud. (Right): SfM output and fitted plane on the estimated point cloud. The white line in the center image shows where the first image ends.
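The triangulate-then-fit workflow summarized in the caption can be sketched in a few lines of Python/OpenCV. The function below is an illustration only, under stated assumptions: a pinhole intrinsic matrix K, known world-to-camera rotations R and translations t, already-matched pixel coordinates, and a vertical "up" axis along +Z. The function name and pose dictionary layout are hypothetical, not the authors' implementation.

```python
import cv2
import numpy as np

def estimate_surface_slope(K, pose1, pose2, pts1, pts2, up=np.array([0.0, 0.0, 1.0])):
    """Triangulate matched pixel coordinates from two known camera poses,
    fit a plane to the resulting point cloud, and return the angle between
    the plane normal and the 'up' direction (degrees)."""
    # Projection matrices P = K [R | t] for each world-to-camera pose.
    P1 = K @ np.hstack([pose1["R"], pose1["t"].reshape(3, 1)])
    P2 = K @ np.hstack([pose2["R"], pose2["t"].reshape(3, 1)])

    # Triangulate; OpenCV expects 2xN arrays and returns 4xN homogeneous points.
    X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
    X = (X_h[:3] / X_h[3]).T                      # Nx3 points in the world frame

    # Least-squares plane fit: the normal is the singular vector of the
    # centered point cloud associated with the smallest singular value.
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]

    # Slope = angle between the fitted plane normal and the vertical direction.
    cos_angle = abs(normal @ up) / (np.linalg.norm(normal) * np.linalg.norm(up))
    slope_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return slope_deg, X
```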
Figure 9. Example of an image sequence acquired with a baseline of 50 m at 400 m altitude. The difference is barely noticeable to the naked eye, given the light conditions and the surface. Right is the normal map of the top image, showing the geometry of the surface.
Figure 10. Example of keypoint detection and matching. Left is ORB; right is SIFT. Blue are detected keypoints, and green are matches filtered using the homography between the images.
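A minimal OpenCV sketch of the ORB detection and homography-filtered matching illustrated above is given below. The image file names and parameter values (number of features, RANSAC reprojection threshold) are placeholders rather than the settings used in this work.

```python
import cv2
import numpy as np

# Placeholder file names; any pair of overlapping grayscale descent images works.
img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute binary descriptors in both images.
orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-check for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Keep only matches consistent with a RANSAC-estimated homography,
# mirroring the "homography as filter" step in the caption.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
good = [m for m, inlier in zip(matches, mask.ravel()) if inlier]
print(f"{len(matches)} raw matches, {len(good)} homography-consistent matches")
```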
Figure 11. Initial comparison of all four methods and their respective surface reconstruction. From top to bottom are shown ORB, SuperPoint, ALIKED, and DISK on a desktop computer. Green are matched features and blue are 3D features.
Figure 12. Flowchart illustrating the process of converting the models from the x86 development environment to the ONNX format and the pipeline for evaluating their implementation.
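The conversion and evaluation steps in the flowchart can be approximated with the standard PyTorch-to-ONNX export followed by ONNX Runtime inference on the CPU execution provider. The snippet below is a sketch only: the tiny stand-in network, file names, and tensor shapes are illustrative and do not reproduce the actual SuperPoint/ALIKED/DISK export.

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in network used only to illustrate the export path;
# the real detectors (SuperPoint, ALIKED, DISK) are far larger.
class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = TinyDetector().eval()

# Step 1 (x86 workstation): trace the model and write an ONNX graph with
# dynamic spatial axes so different input resolutions can be evaluated.
dummy = torch.randn(1, 1, 512, 512)
torch.onnx.export(
    model, dummy, "detector.onnx",
    input_names=["image"], output_names=["features"],
    dynamic_axes={"image": {2: "height", 3: "width"},
                  "features": {2: "height", 3: "width"}},
)

# Step 2 (Jetson TX2): load the exported graph with ONNX Runtime and run it
# on the CPU execution provider.
session = ort.InferenceSession("detector.onnx", providers=["CPUExecutionProvider"])
image = np.random.rand(1, 1, 256, 256).astype(np.float32)   # placeholder frame
(features,) = session.run(None, {"image": image})
print(features.shape)
```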
Figure 13. SuperPoint + LightGlue feature matching and slope estimation running on an NVIDIA Jetson TX2 CPU using the ONNX standard. Green are matched features and blue are 3D features.
Figure 14. DISK + LightGlue feature matching and slope estimation running on an NVIDIA Jetson TX2 CPU using the ONNX standard. Green are matched features and blue are 3D features.
Figure 15. Lunar light plain with a 2° slope. (Left): Blender simulation environment; (Center and Right): trajectory images for SfM.
Figure 16. DISK + LightGlue feature matching and slope estimation running on an NVIDIA Jetson TX2 CPU using the ONNX standard for a ground-truth (GT) surface with a slope of 2°.
Figure 17. DISK + LightGlue feature matching and slope estimation running on an NVIDIA Jetson TX2 CPU using the ONNX standard under different light conditions to illustrate robustness for a slope of ≈0°. Green are matched features, red are reprojected 2D features, and blue are 3D features.
Table 1. Hazard navigation camera intrinsic parameters.

Parameter        Value
Focal Length     16.2 mm
Sensor Size      4.51 mm
Image Width      2048 px
Image Height     2048 px
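For illustration, the parameters in Table 1 can be converted into a pixel-unit pinhole intrinsic matrix, which the SfM triangulation step requires. The sketch below assumes a square sensor, a principal point at the image center, zero skew, and no lens distortion; it is not necessarily the exact camera model used in the simulation.

```python
import numpy as np

# Camera parameters from Table 1 (sensor assumed square; distortion neglected).
focal_length_mm = 16.2
sensor_size_mm = 4.51
image_width_px = 2048
image_height_px = 2048

# Focal length in pixels: f_px = f_mm * (image width in px) / (sensor width in mm).
f_px = focal_length_mm * image_width_px / sensor_size_mm  # ~7356 px

# Pinhole intrinsic matrix with the principal point at the image center.
K = np.array([
    [f_px, 0.0,  image_width_px / 2],
    [0.0,  f_px, image_height_px / 2],
    [0.0,  0.0,  1.0],
])
print(K)
```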
Table 2. Results from running the different methods on a desktop computer with higher numerical precision.

Pipeline     Resolution (Pixels)   Number of Matches   Slope (Degrees)
ORB          2048 × 2048           100                 0.05
SuperPoint   2048 × 2048           79                  2.1
ALIKED       2048 × 2048           88                  5.56
DISK         2048 × 2048           3291                0.03
Table 3. Analysis of time consumed and good matches detected on the Jetson TX2 versus input image resolution for the SuperPoint and DISK detectors with the LightGlue matcher.

Pipeline                 Resolution (Pixels)   Number of Matches   Slope (Degrees)   ≈Time (Seconds)
SuperPoint + LightGlue   256 × 256             1                   - *               0.86
                         512 × 512             23                  6.51              2.99
                         1024 × 1024           127                 0.91              12.41
DISK + LightGlue         256 × 256             513                 0.91              7.42
                         512 × 512             2220                0.063             35.85
                         1024 × 1024           8322                0.033             1494.46
* A single estimated point is not accurate.
Table 4. Good matches detected on the Jetson TX2 versus input image resolution for the DISK detector and LightGlue matcher for a light plain with a slope of 2°.

Pipeline           Resolution (Pixels)   Number of Matches   Slope (Degrees)
DISK + LightGlue   256 × 256             437                 3.86
                   512 × 512             2227                2.322
Table 5. Good matches detected on the Jetson TX2 versus input image resolution for the DISK detector and LightGlue matcher for a light plain under varying light conditions with a slope of ≈0°.

Pipeline           Resolution (Pixels)   Number of Matches   Slope (Degrees)
DISK + LightGlue   256 × 256             513                 0.297
                   256 × 256             623                 0.231
                   256 × 256             570                 3.221
                   256 × 256             488                 0.292