PhoPhoNO Digital Archive


Seunghun LEE, Keita KURABE

This archive is a collection of text, audio, and plots of data from five languages: four Tibeto-Burman languages (Drenjongke, Dzongkha, Burmese, and Tamang) and Mon, an Austroasiatic language. The dataset is based on materials collected from 2017 to 2020.

(This content is published as an outcome of an Information Resources Center (IRC) project.)


This project was featured in a topical article on the IRC website as quoted below.

This 2020 IRC project is an archive of audio files of Tibeto-Burman languages. We expanded the target languages of the existing online resource “PhoPhoNO Digital Archive” in March 2021. This resource presently includes text, audio, and plots of data from four Tibeto-Burman languages: Drenjongke, Dzongkha, Tamang, and Burmese, as well as Mon (Austroasiatic). The dataset is based on materials collected from 2017 to 2020.

Drenjongke is spoken by about 80,000 people in Sikkim, India. Tamang is spoken by about 1.35 million people in Nepal and India. Dzongkha is spoken by about 226,000 people in Bhutan. Burmese is spoken by about 33 million people in Myanmar. Mon is spoken by about 1 million people in Myanmar and Thailand.

For Drenjongke, for example, the resource provides audio recordings of segments, words, and sentences pronounced by multiple speakers, as well as videos of nursery rhymes being read and sung. You can listen to sample data (8-bit) on the webpage. To download the original data (16-bit), please contact us using the order form on the website. The Drenjongke nursery rhymes can be downloaded free of charge.