Open Access Article

Peer-Review Record

CMCA-YOLO: A Study on a Real-Time Object Detection Model for Parking Lot Surveillance Imagery

Electronics 2024, 13(8), 1557; https://doi.org/10.3390/electronics13081557

by Ning Zhao¹, Ke Wang¹, Jiaxing Yang¹, Fengkai Luan¹, Liping Yuan² and Hu Zhang¹,*

Reviewer 1: Keyan Chen

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Submission received: 5 March 2024 / Revised: 15 April 2024 / Accepted: 17 April 2024 / Published: 19 April 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a specific scene-targeted object detection method with clear logical structure. However, the following issues should be addressed:

The abbreviations used in the paper should be supplemented with their full names when they first appear, such as CMCA.
The introduction section should be divided into more distinct paragraphs. For example, the first paragraph can be split into two parts: research background and significance, and traditional methods and problems.
The introduction section should also review some current advanced methods, such as Transformer-based or CNN-Transformer methods like "End-to-end object detection with transformers" and "Building extraction from remote sensing images with sparse token transformers", as well as foundation model methods like SAM and RSPrompter.
The related research should be described in separate sections, each focusing on a central topic.
The layout of Figure 4 should be adjusted to landscape orientation.
The algorithm steps 1 and 2 should be more concise and precise, using mathematical symbols where appropriate instead of relying solely on descriptive sentences.
More comparisons with other advanced methods should be added.

Comments on the Quality of English Language

n/a

Author Response

Dear Reviewer,

Thank you for your constructive feedback and the opportunity to refine our manuscript. We have carefully addressed each of your suggestions, making specific modifications throughout the document. Below, we detail the amendments made in response to each of your comments:

1. Abbreviations and Terminology: In lines 13-14, we have meticulously revised the manuscript to clearly introduce and define all abbreviations, including CMCA, at their first appearance. This ensures clarity and accessibility for all readers, enhancing the manuscript's readability.

2. Introduction Structure: Based on your valuable suggestion, we have reorganized the introduction section into three distinct sub-sections, with modifications made in lines 28, 40, and 91. This reorganization not only clarifies the structure of our discussion but also explicitly delineates the scope of our research, its challenges, and the contributions our work offers to the field.

3. Review of Advanced Methods: In lines 57-74, we have expanded our introduction to include a review of current advanced methods, contextualizing our work within the spectrum of contemporary research and highlighting its relevance and novelty.

4. Related Research Sections: Adhering to your advice, in lines 141, 172, and 217, we have segmented the related research section into discrete segments, each centered around a pivotal theme. This organization facilitates a more focused and thorough examination of each area, improving the manuscript's overall coherence and depth of analysis.

5. Figure 4 Layout Adjustment: In line 385, we have adjusted Figure 4 to a landscape format as recommended. This modification enhances the figure’s legibility and integrates more seamlessly with the manuscript’s layout, thereby improving the visual presentation of our data.

6. Algorithm Description: In the sections following lines 420 and 467, we refined our description of the algorithm steps, making them more concise and incorporating mathematical symbols for clarity and precision. This clarification aims to make the algorithmic processes more accessible while maintaining scientific rigor.

7. Comparisons with Other Methods: In line 569, we have included comparisons with advanced methods like RetinaNet and YOLOv8, focusing on models with feasibility for deployment on edge devices. This comparison not only demonstrates the efficacy of our method but also its practical applicability in resource-constrained environments.

We are confident that these revisions substantially improve our manuscript and effectively address the points you've raised. Your insightful comments have been instrumental in guiding these enhancements, and we are grateful for the detailed feedback. We eagerly anticipate your further feedback and hope that our revisions meet your approval.

Best regards,

Mr. Wang

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a new object detection model called CMCA-YOLO, designed specifically for parking lot surveillance scenarios. They introduced the CMCA (Criss-Cross Multi-Spectral Channel Attention) module that combines cross-attention and multi-spectral channel attention mechanisms to enhance recognition of small and overlapping targets and then created a dedicated parking lot scene dataset with 4,502 images to train and evaluate the model. Ablation studies and visualizations validate the effectiveness of the proposed approach in handling challenging parking lot surveillance scenarios.

However, I have the following concerns:

(1) In the paper, the author mentions that their dataset is a custom dataset. Do they plan to publish their dataset so that the community can test their claims and build enhanced models based on their efforts? This would be beneficial for others looking to validate their findings and improve upon their work.

(2) Did the authors develop the YOLO code themselves? If not, why is there no mention of a GitHub repository that they have used or built upon? If they did develop the YOLO code, I believe they should open-source it for peer review, given that implementing YOLO is much more complex than what the current paper covers. With that being said, I hope the authors can make their models open-source for examination by the community.

(3) In the literature review section, I noticed that Transformer-based object detection and segmentation were not discussed. Considering that state-of-the-art models such as Segment Anything from Meta are based on Transformers, I believe the authors should include a brief discussion about this topic. Even though Transformer models may be too large to fit into IoT devices for edge computation, it's important not to overlook their significance in the field.

(4) Considering the CMCA structures, while I acknowledge that ablation studies address their effectiveness to some extent, I wonder about the potential of using Neural Architecture Search (NAS) to discover even better structures than the ones designed by the authors. Can the current structure outperform the candidates found through Neural Architecture Search? I suggest the authors include a brief section discussing the rationale behind why they chose the current design. This would provide valuable insights into their decision-making process and the potential superiority of their chosen architecture.

Comments on the Quality of English Language

English looks good to me.

Author Response

Dear Reviewer,

Thank you for your thorough review and insightful suggestions regarding our manuscript on the CMCA-YOLO model tailored for parking lot surveillance scenarios. We have carefully considered your comments and have made the following amendments and clarifications to our paper:

1. Dataset Publication: We appreciate your inquiry about the availability of our custom dataset. We indeed plan to make our dataset publicly available and are currently navigating the process with our collaborating companies and the broader community. While the dataset may contain sensitive information regarding vehicles and pedestrians, we are committed to ensuring it adheres to privacy standards and believe it will be available for public access soon. This will undoubtedly facilitate further validation and enhancement of our work by the research community.

2. YOLO Code Development and Sharing: Our work involved targeted optimizations to the original YOLOv5 code, detailed in Section III of our paper. We referenced the YOLOv5 architecture's original code at line 813, citing literature [18]. We are in the process of making our optimized code publicly available, pending the approval process of our collaborating company. We believe sharing the optimized code details will significantly aid peers in reproducing our work, and we anticipate releasing it shortly.

3. Discussion on Transformer-based Models: In lines 57-74, we discussed Transformer-based object detection and acknowledged your suggestion regarding the potential unsuitability of Transformer architectures for embedding into IoT devices for edge computing.

4. Use of Neural Architecture Search (NAS): In the revised section spanning lines 601-620, we clarified that the use of NAS did not significantly enhance model performance. This finding led us to prioritize computational efficiency and interpretability, especially considering our future research directions towards model light-weighting and deployment on edge devices. We have thus minimized the use of NAS to ensure a judicious use of computational resources while maintaining the model’s interpretability and practical applicability.

Your feedback has been invaluable in enhancing the quality and clarity of our manuscript. We have made these amendments to address your concerns and hope that our responses and modifications meet your expectations. We look forward to any further suggestions you may have and are optimistic about the contributions our research can make to the field of intelligent surveillance.

Best regards,

Mr. Wang

Reviewer 3 Report

Comments and Suggestions for Authors

Article methodology is sound. Authors are encouraged to carefully revise grammar and style, as well as the presentation. Figures and Tables should be improved.

Some comments:

Line 26: Introduction

Line 126: Add space after citation (all manuscript)

Add a remainder of the paper paragraph at the end of the introduction, referencing all sections.

Line 574-578: Paragraph should be supported with statistics for the claims.

Line 257: strengthens->improves

Line 431: one of the key contributions of the manuscript is the presented dataset with high-resolution images. If it is released publicly, please add the link to the article; as it can be of use for the community.

Line 466: Consider adding measurements using an edge device.

The article would benefit from additional experimentation, and comparison with other methodologies. It only compares to YOLOv5, what about YOLOv7 or V8?

Add a section with detection accuracy in complex scenes, with some example use cases.

Some general considerations:

There are many recent manuscripts presenting variants of YOLO for several applications [1,2,3], in particular for handling parking lots [3,4,5]. Elaborate on your proposal and your contribution in comparison to other approaches.

[1] Vijayakumar, A., & Vairavasundaram, S. (2024). YOLO-based Object Detection Models: A Review and its Applications. Multimedia Tools and Applications, 1-40.

[2] Chen, Xinqiang, et al. "Ship imaging trajectory extraction via an aggregated you only look once (YOLO) model." Engineering Applications of Artificial Intelligence 130 (2024): 107742.

[3] Wang, C., He, W., Nie, Y., Guo, J., Liu, C., Wang, Y., & Han, K. (2024). Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. Advances in Neural Information Processing Systems, 36.

[4] Nguyen, D. L., Vo, X. T., Priadana, A., & Jo, K. H. (2023, February). YOLO5PKLot: A Parking Lot Detection Network Based on Improved YOLOv5 for Smart Parking Management System. In International Workshop on Frontiers of Computer Vision (pp. 95-106). Singapore: Springer Nature Singapore.

[5] Ogawa, M., Arnon, T., & Gruber, E. (2023). Identifying Parking Lot Occupancy with YOLOv5. Journal of Student Research, 12(4).

[6] Retrialisca, F., Hariyanti, E., Eranti Putri, T., Inaiyah Agustin, E., Cahyana, C., & Arsya Kamelia, A. (2024). Genius Parking Space Detection System Based on Improve YOLO-V8. International Journal of Computing and Digital Systems, 16(1), 1-10.

Some broader insights:

Consider adding a layer of interpretability using Large Language Models (LLMs) on top of the detector, for instance LLaMA-2, Zephyr 7B Alpha or Mistral, as in [6,7]. Consider exploring, or at least discussing, the possibility of using VLMs for the task [8,9].

[6] Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. 2023 https://arxiv.org/abs/2305.18290

[7] de Zarzà, I.; de Curtò, J.; Roig, G.; Calafate, C.T. LLM Multimodal Traffic Accident Forecasting. Sensors 2023, 23, 9225. https://doi.org/10.3390/s23229225

[8] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. https://arxiv.org/abs/2301.12597

[9] Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim. CLAMP: Contrastive LAnguage Model Prompt-tuning https://arxiv.org/abs/2312.01629

Comments on the Quality of English Language

Authors are encouraged to carefully revise grammar and style, as well as the presentation. Figures and Tables should be improved.

Author Response

Dear Reviewer,

Thank you for your comprehensive review and constructive feedback on our manuscript. We have meticulously addressed each of your comments and suggestions to ensure our paper meets the highest standards of quality and clarity. Below are the modifications and clarifications made in response to your review:

1. Grammar and Style Revisions: We have conducted a thorough revision of grammar and style throughout the manuscript. Specific attention was given to Line 27 and Line 302, where spaces were added after each citation to adhere to proper formatting standards.

2. Introduction Summary and Paper Structure: We added a summary of the introduction and a detailed description of the remainder of the paper in Lines 128-139.

3. Support for Claims with Statistics: In response to your comment regarding the need for statistical support for our claims in Lines 574-578, we wish to clarify that the section in question provides a qualitative analysis, supplementing the visual evidence presented in Figure 7. This segment details the efficacy of the Grad-CAM visualization technique as an analytical tool, offering a qualitative complement to the visual comparisons. Figure 7 illustrates the CMCA-YOLO model's superior focus and accuracy in detecting targets compared to the YOLOv5s model, showcasing a qualitative advantage that does not necessarily require statistical validation. The descriptive nature of this section aims to highlight the qualitative distinctions and methodological superiority of the CMCA-YOLO model in handling complex detection scenarios, as visually evidenced by the comparative focus levels demonstrated in the figure.

4. Dataset Publication: We are in the process of making our dataset publicly available and are currently navigating the necessary procedures with our collaborating company and the community. We believe this dataset will be a valuable resource for the research community and anticipate its release soon.

5. Comparisons with Other Methodologies: We have expanded our comparison with existing advanced methodologies in Table 2 and Lines 569-581, including a detailed comparison and analysis with YOLOv8. We found that YOLOv7's performance was relatively lower for our specific research objectives, which is why we focused on comparing with YOLOv8.

6. Discussion of Cited References: We have referenced and discussed relevant literature in Lines 156-171. Regarding the paper by Retrialisca et al., we were unable to access the full text and, based on the abstract, determined that its content was already covered in our discussion of related work. Therefore, we decided not to include it as a reference.

7. Incorporation of Large Language Models (LLMs): We find your suggestion regarding the use of LLMs highly innovative and inspirational. We have considered this proposal as a potential direction for future research in Lines 762-777, citing the literature you mentioned.

We are grateful for your insightful feedback, which has significantly contributed to improving our manuscript. We have endeavored to address all your concerns and suggestions thoroughly and hope that our revisions meet your approval. We look forward to any further feedback you may have and are eager to contribute our research to the advancements in intelligent surveillance technologies.

Best regards,

Mr. Wang

Reviewer 4 Report

Comments and Suggestions for Authors

A comprehensive article of significant volume has been submitted for publication. The research area and emerging issues, research methodology, and research results are discussed in detail. Better research results were obtained compared to analogues. I suggest accepting the article for publication. The article is sufficiently comprehensive. I found all the answers to my questions in the article.

As for the shortcomings, I could mention several:

- There are some typographical errors, and some titles of the figures start with a lowercase letter.
- In some figures, the text is excessively small.

- The most serious drawback is that the article is indeed lengthy, and there is an excess of information. The extensive volume of the article makes it harder to read and understand. Therefore, I suggest considering ways to optimize the article's length. Purify the main ideas of the article.

Comments on the Quality of English Language

There are some grammatical errors, and some words are stuck together. But in general, the writing style is scientific and the number of mistakes is small.

Author Response

Dear Reviewer,

Thank you very much for your thorough review and constructive feedback on our manuscript. We greatly appreciate the time and effort you have invested in evaluating our work. Following your suggestions, we have made several adjustments and corrections to enhance the clarity, readability, and overall quality of our paper.

Regarding the typographical errors and the issue of figure titles starting with a lowercase letter, we have meticulously reviewed the manuscript and made necessary corrections, including adjustments in Lines 370, 676, and 686, among others. We aimed to ensure consistency and adherence to proper formatting standards throughout the document.

In response to your comment about the small text size in some figures, we have revised and adjusted the sizes of the figures to make the text more legible and the figures themselves more comprehensible to the readers. We believe these modifications will significantly improve the visual presentation of our data and findings.

Addressing the concern about the article's length, we understand the importance of conciseness and the need for readers to easily grasp and appreciate the main ideas of our research. With this in mind, we have carefully reviewed the entire manuscript to identify and revise sections where the presentation could be optimized. By refining our descriptions and focusing on the essential aspects of our research, we have sought to purify the main ideas, making the article more accessible and engaging to our audience without compromising the integrity of the research.

We hope that these revisions adequately address the issues you raised and enhance the manuscript's suitability for publication. We are grateful for your positive assessment of our work and for highlighting areas for improvement. Your feedback has been instrumental in refining our paper, and we believe the modifications made have significantly improved the quality of our manuscript.

Best regards,

Mr. Wang

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Manuscript has overall been improved and the main concerns have been addressed.

Some comments:

Include detailed specifics on model parameters, training details, and any data preprocessing steps.

Some paragraphs sound a bit repetitive, carefully revise grammar and style.

Recently, YOLO-World [1], a Large Multimodal Model (LMM) for open-vocabulary object detection has been introduced. You can also add it to the discussion.

[1] YOLO-World: Real-Time Open-Vocabulary Object Detection. Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

Comments on the Quality of English Language

Minor editing of English needed.

Author Response

Dear Reviewer,

Thank you for forwarding the reviewer’s comments on our manuscript. We appreciate the thoughtful feedback and have addressed each comment to further improve the quality of our manuscript. Below, we provide detailed responses to the reviewer's comments:

1. Detailed Specifics on Model Parameters, Training Details, and Data Preprocessing Steps: As suggested, we have provided detailed information on the model parameters, training details, and data preprocessing steps. These specifics are now thoroughly outlined in Section 4.2, particularly from lines 508 to 545 of the manuscript.

2. Repetitive Paragraphs and Grammar Style: We have carefully reviewed the manuscript and revised the sections that were previously noted as repetitive. The grammar and style have been refined to enhance clarity and readability.

3. Inclusion of YOLO-World in the Discussion: We acknowledge the importance of the YOLO-World model in the field of open-vocabulary object detection. Accordingly, we have discussed the implications and relevance of YOLO-World in the prospective section of our conclusions. This discussion should help to contextualize our findings within the broader scope of current advancements in the field.

We hope that these revisions adequately address the concerns raised by the reviewer. We appreciate the opportunity to revise our manuscript and believe that these changes have significantly improved the quality of our work. We look forward to the possibility of our manuscript being accepted for publication.

Best regards,

Mr. Wang
