Article
Peer-Review Record

Redundant Space Manipulator Autonomous Guidance for In-Orbit Servicing via Deep Reinforcement Learning

by Matteo D’Ambrosio *, Lorenzo Capra, Andrea Brandonisio, Stefano Silvestrini and Michèle Lavagna
Aerospace 2024, 11(5), 341; https://doi.org/10.3390/aerospace11050341
Submission received: 29 March 2024 / Revised: 22 April 2024 / Accepted: 22 April 2024 / Published: 25 April 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Redundant Space Manipulator Autonomous Guidance for In-Orbit Servicing via Deep Reinforcement Learning

Manuscript ID: aerospace-2964549

The article discusses the application of Deep Reinforcement Learning (DRL) to the path-planning problem of a spacecraft-mounted redundant manipulator during the motion synchronization phase with a target spacecraft. The Proximal Policy Optimization (PPO) DRL algorithm is implemented to optimize the guidance law for the manipulator. While the topic is potentially interesting, the authors must address the following critical observations in depth within the article.

1. The motivation behind the research is missing from the introduction.

2. Since the system dynamics are obtained using the MATLAB library SPART (SPAce Robotics Toolkit), how well will the considered parameters reflect/mimic the real-world dynamics? This needs to be addressed in detail.

3. It is claimed that PPO's loss function provides more stability during training. How does it differ from TRPO's? Please explain and, if possible, compare both techniques in the results and discussion (as in Fig. 4); the standard objectives of both algorithms are sketched after this list for reference.

4. Some symbols used in the equations are not appropriately defined or explained.

5. How is the nominal initial manipulator state obtained in Equation 15?

6. Since the success rate is 100%, what are the minimum error thresholds (line 234)?

7. For instance, what if the first entry/training using the proposed technique contains larger steady-state errors? What will be the system outcome and its performance? (Please try to link this to the result of Fig. 9 and Lines 31* to 321.)

8. How could the success rate in the final testing be increased? This needs to be addressed as future research directions.


9. If possible, please try using fractional-order controllers to obtain smaller steady-state errors, which might improve the performance of the proposed technique (a reference form is sketched after this list).
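For reference on point 3, the standard objectives of the two algorithms, as given in the original PPO and TRPO papers rather than taken from the manuscript under review, are:

    L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \quad \text{(PPO)}

    \max_\theta\ \mathbb{E}_t\!\left[r_t(\theta)\,\hat{A}_t\right]
    \quad \text{subject to} \quad
    \mathbb{E}_t\!\left[D_{\mathrm{KL}}\!\left(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\|\,\pi_\theta(\cdot \mid s_t)\right)\right] \le \delta \quad \text{(TRPO)}

PPO replaces TRPO's hard KL-divergence constraint with the clipped surrogate loss, avoiding second-order optimization; this is what the claimed training stability usually refers to, and the requested comparison would quantify it for the present case study.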
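To make the suggestion in point 9 concrete, the fractional-order PI^{\lambda}D^{\mu} controller commonly used in the fractional-control literature (not part of the reviewed manuscript) has the transfer function

    C(s) = K_p + K_i\,s^{-\lambda} + K_d\,s^{\mu}, \qquad \lambda,\ \mu > 0,

which recovers the classical PID controller for \lambda = \mu = 1 and offers the two extra exponents as tuning parameters for reducing steady-state error.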

Comments on the Quality of English Language

Some minor proofreading errors should be corrected.

Author Response

Please see attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Please respond to the following questions

- Line 49: Please elaborate on "high-DOF" for space applications; how many DOF are considered high?

- Line 55: The first sentence ("The case study...") is redundant; remove it.

- Line 58: Instead of "angular orientation", use "orientation"; "angular" is redundant since orientation always refers to angles.

- Line 102: What is LVLH?

- The parameters in Eq. 3 do not have dimensions and are not defined properly.

- Why is the translational movement of the target disregarded?

- Cite some references where Eq. 1 is explained clearly. Since the robot arm moves with respect to its base, the states should be relative values. The states in Eq. 2 also need a clear explanation: it is not clear whether the orientation R is expressed in Euler angles or quaternions, and how r0 and R0 become q0 is not explained.

- The authors claim that the PID gains are tuned through a trial-and-error process. Please explain exactly the process used for tuning the gains. Since the agent has not been trained beforehand, what guidance law is used during tuning?

- Please define all of the states in the observation space.

- Can you explain why the reward does not reach a stable value in Fig. 4?


- RL needs numerous episodes and extensive data gathering to train the agent. How could this method be implemented in reality? (A typical simulation-only training setup is sketched after this list.)
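For context on the last comment, the sketch below shows how such an agent is typically trained entirely in simulation before any hardware deployment, so the numerous training episodes never touch the real system. It assumes a Gymnasium-style environment and the Stable-Baselines3 PPO implementation; the environment name is hypothetical, and neither tool is confirmed to be used in the manuscript.

    import gymnasium as gym
    from stable_baselines3 import PPO

    # Hypothetical simulated environment for the manipulator motion-synchronization
    # task; the authors' own simulator would be wrapped behind the same interface.
    env = gym.make("SpaceManipulatorSync-v0")

    # All episodes are rolled out against the simulator, so data gathering is
    # cheap and no real hardware is involved during training.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000_000)

    # The trained policy is then frozen and deployed on (or fine-tuned for) the
    # real system, e.g. via sim-to-real transfer or offline data as discussed in Round 2.
    model.save("ppo_space_manipulator")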

Comments on the Quality of English Language

The manuscript is well written; only minor changes are needed.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have answered all my comments. Regarding the real-time deployment of this RL method, sim2real is a promising approach that has been used before. The authors could also consider using offline data; see:

Ball, Philip J., et al. "Efficient online reinforcement learning with offline data." International Conference on Machine Learning. PMLR, 2023.

The uploaded manuscript is incomplete and does not include the references; this should be corrected by the authors.

Author Response

Please see attachment.

Author Response File: Author Response.docx
