Article
Peer-Review Record

Comparison and Optimal Method of Detecting the Number of Maize Seedlings Based on Deep Learning

by Zhijie Jia 1, Xinlong Zhang 2, Hongye Yang 3, Yuan Lu 4, Jiale Liu 4, Xun Yu 3, Dayun Feng 3, Kexin Gao 3, Jianfu Xue 1,*, Bo Ming 3,*, Chenwei Nie 5 and Shaokun Li 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 13 March 2024 / Revised: 22 April 2024 / Accepted: 26 April 2024 / Published: 28 April 2024
(This article belongs to the Special Issue UAS in Smart Agriculture: 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors compared the performance of different models in detecting maize seedlings. The paper is well-written, and its structure is very clear. However, I have a question regarding the image augmentation part. The paper states, 'Training and testing datasets were created from the resulting 14,815 images. The dataset was then divided, with 90% of images used for model training and 10% for validating model performance.' Were the datasets randomly split? If so, this could potentially introduce data leakage issues, as the model might be validated on images similar to those it has already seen during training, albeit with different contrasts or sizes. Ideally, the dataset should be separated based on the fields to prevent such issues. Image augmentation can be applied to the training dataset, but it is unnecessary for the validation dataset. My second question is where the definition of the YOLOv8n is. The transition from YOLOv8 to YOLOv8n is not explained.

Author Response

Dear Reviewer,

Thank you for your constructive and insightful comments on our manuscript. We appreciate your concerns regarding the dataset splitting and the transition from YOLOv8 to YOLOv8n.

*Data Leakage Concerns:* Regarding your query about potential data leakage due to the random splitting of images for training and validation, we recognize the importance of preventing such issues. Although our dataset split was initially random, we aim to ensure that our model can accurately process image data under a variety of real-world conditions. These conditions include different lighting situations and shooting heights, which can significantly impact image quality. Therefore, our data augmentation strategy is specifically designed to simulate these diverse conditions, ensuring the augmentations reflect the environments the model will likely encounter. This approach ensures that our model evaluations are robust and represent performance in real-world applications rather than just under controlled conditions. Additionally, for further validation of the model across different densities, periods, and heights, we used a completely new set of images that were not subjected to any form of image augmentation.
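
A contrast jitter of the kind used to simulate varying lighting conditions can be sketched as follows (a minimal Python illustration; the factor range and pixel values are placeholders, not the study's actual augmentation pipeline):

```python
import random

def adjust_contrast(pixels, factor):
    """Scale pixel deviations from the mean by `factor`.

    factor < 1 lowers contrast, factor > 1 raises it; values are
    clamped to the valid 8-bit range [0, 255].
    """
    mean = sum(pixels) / len(pixels)
    return [max(0, min(255, round(mean + factor * (p - mean)))) for p in pixels]

def augment(pixels, rng=random):
    """Apply a random contrast jitter (80%-120%), mimicking the
    lighting variation encountered at capture time."""
    return adjust_contrast(pixels, rng.uniform(0.8, 1.2))

# Toy single-channel "image" as a flat pixel list
img = [50, 100, 150, 200]
low = adjust_contrast(img, 0.8)  # deviations shrink toward the mean (125)
print(low)  # → [65, 105, 145, 185]
```

The same pattern extends to brightness shifts or rescaling; the key point is that each transform preserves the label (seedling positions) while changing the photometric conditions.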

*Transition from YOLOv8 to YOLOv8n:* We realize that we did not provide a detailed explanation of the modifications from YOLOv8 to YOLOv8n in our original submission. YOLOv8n introduces several architectural and algorithmic refinements aimed at improving detection accuracy and processing speed, particularly in computational resource-constrained environments. We will include a detailed comparison and definition of these changes in the revised version of our manuscript.

We hope these clarifications address your concerns. We have incorporated all relevant details into the revised manuscript to ensure it meets the journal’s standards. We look forward to your further guidance and feedback.

Thank you once again for your valuable time and thoughtful critique.

Sincerely,

Zhijie Jia

Reviewer 2 Report

Comments and Suggestions for Authors

Comments:

1. Highlight the novelty and importance of your research in your abstract.

2. Add 1 to 2 keywords, please.

3. Unify the literature in accordance with the guidelines.

4. Complete the literature sources both in the methodology and in the analysis of the results.

5. Improve the readability of diagrams 1, 4, 8, 10.

6. The analysis of some diagrams is too general. For example, the analysis in lines 297 to 310 is not precise.

7. "Stellar performance was also observed under cooperative and individual farmer planting conditions." Please explain - why?

Comments on the Quality of English Language

 Minor editing of English language required.

Author Response

Dear Reviewer,

Thank you for your detailed review and valuable suggestions. Below are our specific responses to your comments and the corresponding changes made in the revised manuscript:

Novelty and Importance in the Abstract: Thank you for pointing out the need to emphasize the novelty and importance of our research in the abstract. We have revised this section to better highlight the innovative aspects of our study and its potential contributions to the field.

Addition of Keywords: Based on your suggestion, we have added two keywords to better reflect the theme and scope of our paper.

Unification of Literature References: We have carefully checked and standardized all literature citations according to the journal's guidelines to ensure consistency and accuracy.

Completion of Literature Sources: In the methodology and results analysis sections, we have supplemented the required literature sources to ensure the traceability and verifiability of our research.

Improvement of Diagrams' Readability: Following your advice, we have redesigned Diagrams 1, 4, 8, and 10, improving labels, colors, and layouts to enhance their clarity and readability.

Specificity in Diagram Analysis: In response to your comment about the general analysis in lines 297 to 310, we have expanded and specified this section, providing more detailed data support and explanations.

Explanation of 'Stellar Performance Under Cooperative and Individual Farmer Planting Conditions': You asked for an explanation of this phenomenon. We have supplemented the text with relevant data and analysis, specifically noting that the observed rRMSE values were all below 1.23%, demonstrating the model's high adaptability and accuracy across different planting environments. We further discussed the potential mechanisms behind this performance, including how the model effectively handles diverse planting patterns and environmental variables.
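
The rRMSE metric cited above is conventionally computed as the root-mean-square error normalized by the mean observed value, expressed as a percentage; a minimal sketch (the counts below are illustrative, not data from the study):

```python
import math

def rrmse(observed, predicted):
    """Relative RMSE: RMSE normalized by the mean of the
    observed values, expressed as a percentage."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)

# Hypothetical per-plot seedling counts (observed vs. detected)
obs = [100, 102, 98, 101]
pred = [101, 101, 99, 100]
print(f"{rrmse(obs, pred):.2f}%")  # → 1.00%
```

A low rRMSE (e.g. below 1.23%) therefore means the detection error is small relative to the typical stand count, which is what makes the metric comparable across plots of different densities.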

We look forward to your further guidance and feedback, hoping that the revised manuscript meets the standards of the journal. Thank you once again for your attention to and constructive suggestions for our work.

Sincerely,

Zhijie Jia

Reviewer 3 Report

Comments and Suggestions for Authors

In this study, the authors investigate the significance of quantifying early maize seedlings and explore how factors such as planting density, flight altitude, and growth stage influence detection accuracy.

To enhance the clarity of the contributions made by this work, it is essential to provide a more detailed explanation of its novel findings and implications.

Furthermore, emphasizing the importance of detailing the preprocessing steps applied to the images, including data augmentation techniques, prior to their use in the experiments is crucial.

Additionally, making the dataset publicly available online and highlighting the hyperparameters used for training will greatly contribute to the reproducibility of the experiments.

Comments on the Quality of English Language

The paper is generally well-written and easy to understand. The language used is clear, and the concepts are effectively explained. However, it may be beneficial to consider reducing the text and shortening some sections to enhance readability.

Author Response

Dear Reviewer,

Thank you very much for your detailed review and valuable comments, which have greatly assisted us in improving the quality of our manuscript.

*Detailed Explanation of New Findings and Their Significance:* You pointed out the need for a more detailed explanation of our research findings and their significance, and we completely agree. In the revised manuscript, we have expanded the introduction and supplemented the discussion section. These additions will help readers better understand our research results and their potential value in practical applications.

*Detailed Description of Image Preprocessing and Data Augmentation Techniques:* We recognize that our initial submission did not adequately describe the image preprocessing and data augmentation steps in the experimental section. Therefore, we have added detailed information on the importance of data augmentation techniques in the revised manuscript. This information will ensure that other researchers can precisely replicate our experimental conditions.

*Importance of Publicly Sharing the Dataset and Describing Hyperparameters:* To enhance the transparency and reproducibility of our paper, we have decided to publicly share the dataset we used. We believe this will greatly assist our peers in verifying and extending our research findings.

We look forward to your further guidance and feedback, and hope that the revised manuscript meets the publication standards of the journal. Thank you once again for your attention to and insightful comments on our work.

Sincerely,

Zhijie Jia

Reviewer 4 Report

Comments and Suggestions for Authors

Major:

1. The modified YOLOv8n version must be compared with the default YOLOv8n model to measure the improvement.

2. The modified YOLOv8n version must be compared with the default YOLOv8m and YOLOv8l models, because it may be simpler to select another default model rather than make modifications.

Minor:

[Line 23] Unclear description of which stages are discussed.

[Lines 66-84] A different reference system is applied.

[Lines 53-55] Skeletonization is not a traditional preprocessing technique. Source [8] was not found and is referenced incorrectly.

[Line 57] What technology was evolved? (unclear)

[Lines 51-53] Well-known definitions are referenced (sources [4-7]).

[Line 87] "Impressive accuracy" - no quantitative data about the achieved accuracy was given to support this conclusion.

[Lines 87-93] The explanation of the authors' own research, goal, and objectives is too short and too abstract.

[Fig. 3] V2, V3 - unclear meaning. Must be explained.

[Line 156] Why is the image referenced? Is it not yours?

[Fig. 4] The green boxes are not visible. They look like a mass of points.

[Lines 136, 149, 151, 164] There is a contradiction in the number of collected images: 6k × 4k / (1k × 1k) = 24, so 24 × 1000 = 24,000. But the stated number is 14,815? Additionally, the number per region must be known, so the number of images should be placed in Table 1.

[Line 159] It is called "augmentation"; "enhancement" is another thing.

[Lines 170-174] It must be clearly explained that a modified YOLOv8n version was used: first explain the modification, then provide the architecture depiction.

[Line 176] It is called an "architecture" rather than a "framework".

[Ch. 2] The YOLOv8 and YOLOv5 frameworks or their arXiv documents must be referenced, as must the other applied architectures, to cite the original authors.

[Ch. 2] The modified YOLOv8n version was applied in the experiment; another title must be used to reflect the changes.

[Line 271] If 8 densities are multiplied by 20 images, the resulting amount must be 160 images.

[Figs. 10 and 11] There must be some principle by which the comparable architectures were selected. It must be explained.

[Fig. 12] Dongsheng village: the images were collected at different altitudes. Which altitude was applied in the calculations? This is not explained in the text.


Author Response

Dear Reviewer,

Thank you very much for your thorough review and the detailed suggestions. We have comprehensively revised and enhanced our manuscript based on your feedback. Here are our specific responses to each of your comments:

Regarding the comparison between the default and modified versions of YOLOv8n: We did not alter the structure of the YOLOv8n model but chose to use the default version for our experiments. We have provided detailed reasons for choosing this model over YOLOv8m or YOLOv8l.

[Line 23] Clarity in discussion stages: We have revised the abstract to more clearly describe the various stages of our research.

[Lines 66-84] Use of different reference systems: This section has been updated and corrected with the latest research and references.

[Lines 53-55] Regarding skeletonization techniques: We have corrected the reference source and changed the term "traditional" to "image" to accurately describe its use in image processing.

[Line 57] Description of technological advancements: We have clearly outlined how advancements in image processing and analysis algorithms have propelled our research forward.

[Line 87] "Impressive accuracy": We have added specific data on accuracy to support our description of the model's performance.

[Lines 87-93] Explanation of research objectives and purposes: This section has been expanded to include a detailed description of our research intentions and goals, avoiding overly abstract statements.

[Figure 3] Explanation of V2, V3 meanings: We have clearly explained the meanings of these variables in the figure captions.

[Figure 4] Visibility of green boxes: We adjusted the image's color and contrast to ensure all markers are clearly visible.

[Lines 136, 149, 151, 164] Inconsistency in the number of pictures: We have corrected these figures to ensure consistency across all data.

[Line 159] Use of the term "enhancement": We have corrected "Enhancement" to "Augmentation" to accurately describe the technique used in our data processing.

[Lines 170-174] Explanation of the use of the default version of YOLOv8n: We clarified that the default version of YOLOv8n was used in the experiments and explained why no structural modifications were made.

[Chapter 2] References to the YOLOv8 and YOLOv5 frameworks: Regarding the citation issue, we have clarified that the original authors have not published formal academic articles on these models, thus we cannot provide traditional journal or conference paper citations. Instead, we have cited articles that utilize these models.

[Figures 10 and 11] Principles for choosing comparable architectures: We have detailed the criteria for selecting specific models for comparison, including differences in performance and architecture.

[Figure 12] Pictures collected at different heights: We have clarified that all images were collected at a flight altitude of 20 meters to ensure consistency in the experiments.

We appreciate your feedback and hope that these revisions meet the journal's requirements. We look forward to your further guidance and feedback.

Sincerely,

Zhijie Jia

Reviewer 5 Report

Comments and Suggestions for Authors

In this study, work was carried out to evaluate the effectiveness of deep learning models used for detecting maize seedlings. The key factors influencing detection accuracy were selected: planting density, plant growth stage, and flight height above the ground. Several deep learning models were evaluated: YOLOv8n, YOLOv3-tiny, Deformable-DETR, and Faster R-CNN. The single-stage models YOLOv8n and YOLOv5n showed stable performance at lower planting densities. Plant density and growth stage significantly affected the seedling detection accuracy of all models. Flight altitude was negatively correlated with image resolution and detection results, with detection degrading at higher altitudes.

This work was completed to a good standard; a large amount of work was carried out. During the research, parameters were identified whose influence had the greatest impact on detection accuracy. However, as I read, several questions arose:

1) YOLOv5n is mentioned in the discussion and conclusions, although no results for this model were given.

2) Table 3 presented a large number of models, but the results were given only for: YOLOv8n, YOLOv3-tiny, Deformable-DETR, and Faster R-CNN.

3) It would be good if the principle of selecting key factors influencing the result were described. Why didn’t they take into account other factors that often arise when working in the field?

4) It may be worth making recommendations to improve the accuracy of corn detection at crop densities above 105,000 plants/ha.

5) It is advisable to compare the results and data obtained with global studies; it may be worth expanding the list of references.

Author Response

Dear Reviewer,

Thank you very much for your detailed review and valuable comments on our manuscript. Your queries and suggestions have been instrumental in refining our paper. Here are our responses to your questions and the corresponding revisions we have made:

Mention of the YOLOv5n Model: You rightly pointed out that we mentioned the YOLOv5n model in the discussion and conclusion sections without presenting its results. This oversight has been corrected in the revised manuscript, where we have now included the results for the YOLOv5n model, ensuring that all mentioned models are comprehensively reported.

Results in Table 3: The original version of Table 3 did list several models but only presented results for some. We have amended this in the revised manuscript by supplementing the results section to include detailed comparisons for all models mentioned.

Principles for Selecting Key Influencing Factors: Thank you for highlighting the lack of explanation regarding the principles for selecting key influencing factors. In the revised manuscript, we have detailed the scientific rationale for choosing planting density, growth stage, and flight height as study variables:

Flight Height: This is a primary factor affecting image resolution and, consequently, detection accuracy. Changes in height directly influence the level of visible detail in images, thus determining detection efficiency. In practical applications, the operational height of drones is adjusted based on task requirements and technical limitations, making it crucial to assess how different heights affect detection performance.

Growth Stage: This is a critical period in agricultural production, determining the timeliness of crop health monitoring. Our research aims to identify the earliest and latest times during the crop growth cycle when seedling detection is effective, maximizing the practical value of monitoring.

Planting Density: Different planting densities affect the occlusion between plants and the complexity of the images, thereby impacting the performance of detection algorithms. By studying the effect of planting density, we aim to better understand and optimize the adaptability and accuracy of algorithms across various agricultural settings.

These factors were chosen based on the needs of actual agricultural production and preliminary experimental data, aiming to comprehensively evaluate the model's potential and limitations in real-world scenarios. We believe that exploring these key factors significantly enhances the practical application value and scientific depth of our research.

Suggestions for Improving Detection Accuracy at High Crop Densities: Your suggestion is very apt. In the revised manuscript, we have added a section discussing several strategies that could potentially enhance detection accuracy under high crop density conditions.

Comparison with Global Research and Expansion of the Reference List: We have included a comparative analysis with global studies in the revised manuscript and expanded the list of references. This ensures a broader academic context and acknowledges our study's contributions within the global research landscape.

We look forward to your further guidance and feedback, hoping that the revised manuscript meets the journal's publication standards. Thank you once again for your thorough review and constructive suggestions.

Sincerely,

Zhijie Jia

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have not addressed my concern regarding the data augmentation section. It is essential for them to conduct data splitting before implementing data augmentation. Training the model with 80% contrast and then validating it with 120% contrast of the same image constitutes data leakage, which can lead to artificially inflated accuracy scores. Therefore, the authors must redesign their experiments to ensure proper data handling.

Author Response

Dear Reviewer,

First and foremost, thank you for your in-depth review and valuable feedback on the data augmentation section of our manuscript. We take your concerns about potential data leakage in our experimental design very seriously. To address your feedback, we have revised our methods to ensure data splitting is conducted prior to any data augmentation.

Specifically, we have applied data augmentation only to the training dataset after ensuring a complete separation between the training and validation datasets. This revised approach effectively mitigates the risk of data leakage between the training and validation phases. Based on this new experimental design, we have obtained fresh results, which have been updated in the revised manuscript.
In Sections 3.2 to 3.4, all the images we used were entirely new and had not appeared in the training set. The content of these images was obtained through independent inference by the well-trained object detection model, and the results were manually verified to ensure the accuracy and reliability of the data.
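
The split-then-augment workflow described above can be sketched as follows (the file identifiers and the augmentation function are illustrative placeholders, not the study's actual pipeline):

```python
import random

def split_then_augment(image_ids, augment_fn, val_fraction=0.1, seed=42):
    """Split the raw images first, then augment only the training
    split, so no augmented copy of a validation image can leak
    into training."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    val_ids = ids[:n_val]
    train_ids = ids[n_val:]
    # Augmentation happens strictly after the split, and only on train.
    train_set = [aug for i in train_ids for aug in augment_fn(i)]
    return train_set, val_ids

# Placeholder augmentation: each raw image yields itself plus two variants.
fake_augment = lambda img_id: [img_id, f"{img_id}_contrast", f"{img_id}_scaled"]
train, val = split_then_augment([f"img{i}" for i in range(100)], fake_augment)
print(len(train), len(val))  # → 270 10
```

Because the validation identifiers are fixed before any transform is applied, no 80%-contrast and 120%-contrast versions of the same raw image can end up on opposite sides of the split.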

We believe these changes adequately address your concerns and enhance the rigor and validity of our research. We have included these details in the revised manuscript for your review.

Thank you once again for your insightful feedback.

Yours sincerely,
Zhijie Jia

Reviewer 4 Report

Comments and Suggestions for Authors

Minor:
"Regarding the comparison between the default and modified versions of YOLOv8n: We did not alter the structure of the YOLOv8n model but chose to use the default version for our experiments."
[Lines 226-229] must be rewritten, because they can be misunderstood to mean that you have made modifications to the YOLOv8 architecture.
[Lines 208-263] Sorry, but I do not understand why you have described the YOLOv8 architecture in such detail if no modifications were made. By the same principle, the other architectures would then have to be described in equal detail.
But it is not useful to repeat already-described architectures. In this section, it is more logical to describe the architectures applied in your experiment, providing readers with the appropriate references on the CNN architectures.

[Whole document] The document must be reread and the usage of definitions corrected. The content can otherwise be understood incorrectly, for example, "CNN architecture" vs. "model", etc.

[Line 21] "target detection" is a different field - it should be "object detection".
[Lines 108-110] The study focuses on comparing CNN models based on different popular architectures (these must be named).
Please check and coordinate the text in the abstract and introduction related to the aim and objectives. They currently differ due to incorrect usage of definitions.
[Table 2] The specific YOLOv5 model is not indicated (s, m, l, x). What does "3" mean? A version? Please be precise.

Author Response

Dear Reviewer,

Thank you once again for your constructive feedback. Based on your recommendations, we have made significant revisions to our manuscript. Below are the specific changes we have implemented:

Lines 226-229 and Lines 208-263: We have rewritten the sections to provide a clearer and more balanced overview of the object detection models used in our study. The revised text now succinctly outlines the distinctions between one-stage and two-stage object detection models, focusing on their processing speeds, accuracy, and adaptability to various image complexities without implying any modifications to the YOLOv8 architecture.

Section 2.4.1 (One-stage object detection models): This section now includes a brief yet comprehensive description of one-stage models such as YOLOv8n, YOLOv5n, YOLOv3-tiny, SSD, FCOS, and RetinaNet. We emphasize their direct prediction capabilities and suitability for real-time processing, without detailing unnecessary architecture specifics unless they are central to our experimental results.

Section 2.4.2 (Two-stage object detection models): Similarly, for two-stage models like Faster R-CNN, Cascade R-CNN, and Deformable DETR, we have outlined their mechanisms for generating and refining object candidates, focusing on how these features enhance detection accuracy in complex scenarios.

Terminology and consistency: We have rigorously reviewed the terminology used throughout the manuscript to ensure consistency and clarity, especially distinguishing between "CNN architecture" and "model." We have corrected "target detection" to "object detection" as per the standard terminology in the field.

Table 2: We have updated the information to accurately reflect the specific versions of YOLOv5 (s, m, l, x) and clarified what '3' refers to, ensuring it matches the context of our study.

We hope these amendments address your concerns and enhance the manuscript's clarity and technical accuracy. We appreciate your thorough review and look forward to your further suggestions.

Sincerely,

Zhijie Jia

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed my concerns. 
