TY - GEN
T1 - BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Yong, Zheng Xin
AU - Schoelkopf, Hailey
AU - Muennighoff, Niklas
AU - Aji, Alham Fikri
AU - Adelani, David Ifeoluwa
AU - Almubarak, Khalid
AU - Bari, M. Saiful
AU - Sutawika, Lintang
AU - Kasai, Jungo
AU - Baruwa, Ahmed
AU - Winata, Genta Indra
AU - Biderman, Stella
AU - Raff, Edward
AU - Radev, Dragomir
AU - Nikoulina, Vassilina
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - The BLOOM model is a large, publicly available multilingual language model, but its pretraining was limited to 46 languages. To extend the benefits of BLOOM to other languages without incurring prohibitively large costs, it is desirable to adapt BLOOM to new languages not seen during pretraining. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages in a resource-constrained setting. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, we find that adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system; it is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find that including a new language in the multitask finetuning mixture is the most effective method of teaching BLOOMZ a new language. We conclude that, with sufficient training data, language adaptation can generalize well to diverse languages. Our code is available at https://github.com/bigscience-workshop/multilingual-modeling.
UR - https://www.scopus.com/pages/publications/85174388196
M3 - Conference contribution
AN - SCOPUS:85174388196
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 11682
EP - 11703
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -