Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs

Electronics 2024, 13(8), 1550; https://doi.org/10.3390/electronics13081550

by Qiao Tan and Jingjing Zhang^*

Reviewer 1: Anonymous

Reviewer 2:

Célia Tavares

Reviewer 3:

Yulia Kumar

Electronics 2024, 13(8), 1550; https://doi.org/10.3390/electronics13081550

Submission received: 20 March 2024 / Revised: 9 April 2024 / Accepted: 15 April 2024 / Published: 19 April 2024

(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper introduces Neighbor Attention Initialization (NAI), leveraging smaller PLMs to optimize training efficiency and cost-effectiveness. The authors validate their proposed approach and present the results of various experiments on GPT-2.

The authors claim to propose a "novel and innovative" approach named Neighbor Attention Initialization (NAI) to accelerate PLM training. However, the concept of leveraging smaller models for parameter initialization is not entirely novel and has been explored in previous works. Pls see

https://arxiv.org/abs/2204.07143

https://arxiv.org/pdf/2010.12256.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0925231222006786?via%3Dihub

The paper primarily focuses on experiments conducted on GPT-2, which limits the generalizability of the proposed approach to other PLMs.

The paper does not discuss potential limitations or challenges associated with applying NAI to different model architectures or domains, thereby overlooking important considerations for real-world applicability.

Many typos, grammatical errors and language issues.

Provide details for "LAMBADA, PTB, WikiText2, and WikiText103 datasets.

Authors compare bert2BERT and the proposed NAI to the GPTbase model (tables 1 and 2). In the discussion presented in section 4, highlight NAI outperforms the GPTbase by what percentage?

In the experimental setup pls detail how and where did you run these experiments. If Google Colab was used, pls provide details for the parameters.

Comments on the Quality of English Language

Various typos and grammatical errors.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This article analyses a very current topic: the training of large pretrained language models.

This article focuses on presenting a new approach to LLMs pretraining, which allows for a reduction of computational costs of around 31%. This new approach is called "NAI - neighbour attention initialization" and by preserving the capabilities of the source model (smaller PLMs), NAI can ensure an effective initialization of the target model.

Although it is clear what the authors have researched and developed through the article, I believe that the introduction should have ended with a research question that clearly expresses why a study of this nature was carried out.

Regarding the related research, I believe that an adequate description of what already exists has been given, specifically bert2BERT, which is one of the approaches that has more similarities with NAI. Furthermore, this section finishes by stating that GPT2 served as the study's foundation. However, given the availability of more recent iterations of the GPT model, I believe it would be important to mention that its use is due to information access constraints.

The study and the algorithm were extensively detailed and comparisons with other approaches were also shown, which is relevant.

The selection of bibliographical references is quite extensive and up-to-date.

Overall, I believe that the study presented in the article contributes positively to the field and opens possibilities for future research, especially for larger-scale studies that could help validate the algorithm further.

To finish, some aspects need attention in order to enhance the overall readability and clarity of the article. Thus, some linguistic aspects should be corrected, as follows:

· Line 3 – “This paper we proposes” – correct to “This paper proposes”

· Line 10 – “demonstrate” – correct to “demonstrated”

· Line 46 – “we conduct” – correct to “we conducted”

· Line 81 – Double full stop

· Line 129 – “which employ” – correct to “which employs”

· Line 192 – Double full stop

Moreover, regarding the format of the article, there are several cases in which the figures are positioned before being mentioned for the first time. This method is unusual, and I believe it would be important to correct the cases where this happens so that the figures appear after being mentioned for the first time.

· Figure 2 is mentioned in the introduction, but it is only shown in section 2, where no mention is made about it;

· Figure 3 is mentioned in 3.1, but it is only shown in 3.1.2;

· Figure 4 is mentioned in 3.2, but it was shown previously in 3.1.2;

· Figure 8 and Figure 9 are presented, but not mentioned in the text.

· Figure 10 and 11 are shown in 4.1, but are only mentioned in 4.2.

Comments on the Quality of English Language

Overall, the article is well-written, but certain aspects need to be corrected (they are identified in the review).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents solid research and is therefore of significant relevance, making its publication timely and pertinent. The exploration of Neighbor Attention Initialization (NAI) applied to larger Language Models (LLMs) promises to be a fascinating area for future applications. Although the primary focus on GPT-2 research was in Spring 2023, the timing for publishing this paper remains appropriate, offering valuable insights.

It is suggested to revise the title to avoid using a colon, opting for "Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs" instead. Additionally, incorporating references within the abstract is not standard practice and should be reconsidered. Minor linguistic errors and typos need addressing.

Regarding Figures 1, 2, 5, and 8, while they are indeed engaging, their design should be reconsidered to ensure inclusivity and accessibility for a color-blind audience. Lastly, incorporating a more thorough discussion on biases within the paper is recommended, as this represents a critical and necessary focus area.

Article Menu

Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs

Further Information

Guidelines

MDPI Initiatives

Follow MDPI