Evaluating Text Quality of GPT Engine Davinci-003 and GPT Engine Davinci Generation Using BLEU Score

Heryanto, Yayan (2023) Evaluating Text Quality of GPT Engine Davinci-003 and GPT Engine Davinci Generation Using BLEU Score. SAGA: Journal of Technology and Information Systems, 4 (1). pp. 121-129. ISSN 2985-8933

Abstract

Text generation has progressed significantly in natural language processing with the use of Transformer-based language models such as GPT (Generative Pre-trained Transformer). In this study, we evaluate text quality using the BLEU (Bilingual Evaluation Understudy) score for two prominent GPT engines: Davinci-003 and Davinci. We used questions and answers related to Python, drawn from internet sources, as input data. Davinci-003 achieved the higher BLEU score, 0.035, compared with 0.021 for Davinci. Davinci was faster, with an average response time of 4.20 seconds versus 6.59 seconds for Davinci-003. The choice between Davinci-003 and Davinci for chatbot development should therefore be based on the specific project requirements. If text quality is the priority, Davinci-003 is the better choice because of its higher BLEU score. If faster response times matter more, Davinci may be the more suitable option.
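To make the metric concrete, the following is a minimal sketch of a sentence-level BLEU computation: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration only. The record does not state which BLEU implementation, n-gram order, tokenization, or smoothing the study used, so the function name `bleu`, the whitespace tokenizer, the single-reference setup, and the example sentences are all assumptions and are not expected to reproduce the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: uniform weights, one reference, no smoothing.

    Assumption: lowercase whitespace tokenization, chosen for illustration.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, a single zero precision makes BLEU zero.
            return 0.0
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which penalizes short candidates.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # Hypothetical reference answer and model output, not data from the study.
    reference = "a list is a mutable ordered sequence of items"
    candidate = "a list is a mutable sequence"
    print(f"BLEU = {bleu(candidate, reference):.3f}")
```

Because this sketch applies no smoothing, any answer that shares no 4-gram with its reference scores exactly zero. That behaviour is one reason averaged BLEU scores for free-form question answering can be very low, although whether it explains the values reported here cannot be determined from this record.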

Item Type: Article
Subjects: Q Science > QA Mathematics > QA76 Computer software
T Technology > T Technology (General)
Divisions: Artikel > Informatika dan Sistem Informasi
Depositing User: - Abdurrahman -
Date Deposited: 26 Mar 2024 02:32
Last Modified: 26 Mar 2024 02:32
URI: http://repository.unas.ac.id/id/eprint/10454
