Turkish Text Summarization with Artificial Intelligence and Deep Learning Methods: Improving the Performance of Pretrained Large Language Models for Turkish through Vectorial Cosine Similarity-Based Content Selection
DOI: https://doi.org/10.5281/zenodo.15775633
Keywords: Artificial Intelligence, Deep Learning, Pre-trained Language Models, ROUGE Score
Abstract
This paper presents an innovative solution for Turkish text summarization that leverages artificial intelligence and deep learning-based large language models (LLMs), focusing specifically on the use of pre-trained language models. The primary objective is to adapt pre-trained models to the task of Turkish text summarization, thereby proposing a cost-efficient, performance-oriented approach that eliminates the need for extensive training. The theoretical foundation of the research rests on the numerical representation of the distributional regularities of language, positing that exploiting the similarity ratios of these regularities can enhance the structured output generation of generative language models. Accordingly, large language models are expected to generate more structured and coherent outputs by exploiting the contextual similarity between sentence vectors in a vector space in which each word is represented as a vector. This two-stage approach aims to improve the consistency of generative outputs by increasing the regularity of the input vectors through a separate process that computes the similarities between them. Concretely, a cosine similarity-based content selection strategy over sentence vectors is proposed for Turkish text summarization with large language models. The objective is to select the most meaningful sentences, providing the model with more targeted and summarization-friendly input, thereby enhancing summarization performance while reducing computational cost. The study investigates improving the performance of the T5 model by feeding it the sentences that exhibit the highest vectorial similarity. Evaluation is conducted using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The results indicate that selecting sentences with high vectorial cosine similarity contributes to a significant improvement in summarization quality. Furthermore, statistical analyses using t-tests demonstrate that the method enhances summarization performance in certain cases.
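To make the content-selection step concrete, the sketch below shows one plausible reading of the approach in Python. It is illustrative only, not the authors' implementation: the multilingual sentence encoder and the Turkish mT5 summarization checkpoint named below are assumed stand-ins for the models actually used, sentences are ranked by cosine similarity to the mean document embedding (the abstract does not fix the reference vector), and sentence splitting is deliberately naive.

```python
# Minimal sketch of cosine-similarity-based content selection for Turkish
# summarization. Model names are assumptions for illustration, not the
# paper's released code.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
from rouge_score import rouge_scorer

def select_top_sentences(sentences, encoder, k=3):
    """Keep the k sentences whose vectors are most cosine-similar to the
    mean document vector (one plausible reading of 'highest similarity')."""
    emb = encoder.encode(sentences)          # (n, d) sentence embeddings
    centroid = emb.mean(axis=0)              # document-level vector
    sims = emb @ centroid / (
        np.linalg.norm(emb, axis=1) * np.linalg.norm(centroid) + 1e-12
    )                                        # cos(s_i, centroid) for each i
    top = sorted(np.argsort(-sims)[:k])      # top-k, original order preserved
    return " ".join(sentences[i] for i in top)

# Assumed checkpoints: a multilingual sentence encoder and a community
# Turkish mT5 summarizer; substitute whichever models are available.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
summarizer = pipeline("summarization",
                      model="ozcangundes/mt5-small-turkish-summarization")

document = (
    "Yapay zeka yöntemleri metin özetlemede yaygın olarak kullanılmaktadır. "
    "Önceden eğitilmiş dil modelleri Türkçe özetleme için uyarlanabilir. "
    "Kosinüs benzerliği cümle vektörleri arasındaki yakınlığı ölçer. "
    "Hava bugün oldukça güneşliydi."
)
sentences = [s.strip() for s in document.split(".") if s.strip()]  # naive split
focused_input = select_top_sentences(sentences, encoder, k=3)
summary = summarizer(focused_input, max_length=64, min_length=8)[0]["summary_text"]

# Metric computation mirrors the paper's evaluation; the reference summary
# here is a placeholder.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])
print(scorer.score("Önceden eğitilmiş modeller Türkçe özetlemede kullanılır.", summary))
```

As in the study, the generated summaries can then be compared against reference summaries with ROUGE-1, ROUGE-2, and ROUGE-L; statistical significance of any gain would be checked separately, e.g. with a paired t-test.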
Copyright (c) 2025 ISPEC JOURNAL OF SCIENCE INSTITUTE

This work is licensed under a Creative Commons Attribution 4.0 International License.