Professor Steven Bird

B.Sc. M.Sc. Ph.D. (Edin)


Steven Bird is conducting social and technological experiments in the future evolution of the world's languages. Together with his students and colleagues, he is developing scalable methods for preserving disappearing words and worldviews for future generations of speakers and scholars. He is collaborating with speech communities in diasporas and ancestral homelands to design new approaches to language maintenance and revitalisation.

Steven studied computer science at the University of Melbourne before completing a PhD in computational linguistics at the University of Edinburgh. He has conducted fieldwork on endangered languages in West Africa, South America, Central Asia, Melanesia, and Australia. He has held academic positions at the Universities of Edinburgh, Pennsylvania, and Melbourne, and at UC Berkeley. He holds a secondary appointment as Senior Research Scientist at the International Computer Science Institute, UC Berkeley. He serves as Linguist at the Nawarddeken Academy in West Arnhem.

Steven is leading the Top End Language Lab.

Prospective students are encouraged to consult

Research Interests

  • Language Maintenance and Revitalisation
  • Language Documentation and Description
  • Participatory Design
  • Computational Linguistics
  • Digital Archives


Publications & Resources


Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media, Inc.

Kan, M-Y., & Bird, S. (eds) (2009). ACL Anthology Reference Corpus. Philadelphia: Linguistic Data Consortium.

Bird, S. (2003). Grassfields Bantu Fieldwork: Dschang Lexicon. University of Pennsylvania Press.

Bird, S. (2003). Grassfields Bantu Fieldwork: Dschang Tone Paradigms. University of Pennsylvania Press.

Book Chapters

Bird, S., & Simons, G. (2004). Building an Open Language Archives Community on the DC Foundation. In D.I. Hillman & E.L. Westbrooks (eds.), Metadata in Practice (pp. 203-222). ALA Editions.

Journal Articles

Bird, S., & Lee, H. (2014). Computational support for early elicitation and classification of tone. Language Documentation and Conservation, 8, 453-461.

Bird, S., Chiang, D., Frowein, F., Berez, A.L., Eby, M., Hanke, F., Shelby, R., Vaswani, A., & Wan, A. (2013). The International Workshop on Language Preservation: An Experiment in Text Collection and Language Technology. Language Documentation & Conservation, 7, 155-167.

Bird, S. (2011). Bootstrapping the Language Archive: New prospects for Natural Language Processing in Preserving Linguistic Heritage. Linguistic Issues in Language Technology, 6(4). 

Lai, C., & Bird, S. (2010). Querying Linguistic Trees. Journal of Logic, Language and Information, 19, 53-73.

Bird, S. (2009). Natural Language Processing and Linguistic Fieldwork. Computational Linguistics, 35(3), 469-474.

Robinson, S., Aumann, G., & Bird, S. (2007). Managing Fieldwork Data with Toolbox and the Natural Language Toolkit. Language Documentation and Conservation, 1(1), 44-57.

Goldman, J., Renals, S., Bird, S., de Jong, F., Federico, M., Fleischhauer, C., Kornbluh, M., Lamel, L., Oard, D. W., Stewart, C., & Wright, R. (2005). Accessing the spoken word. International Journal on Digital Libraries, 5(4), 287-298.

Simons, G., & Bird, S. (2003). Building an Open Language Archives Community on the OAI Foundation. Library Hi Tech, 21(2), 210-218.

Bird, S., & Simons, G. (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37, 375-388.

Bird, S., & Simons, G. (2003). Seven Dimensions of Portability for Language Documentation. Language, 79(3), 557-582.

Simons, G., & Bird, S. (2003). The Open Language Archives Community: An Infrastructure for Distributed Archiving of Language Resources. Literary and Linguistic Computing, 18(2), 117-128.

Conference Proceedings & Papers

Bird, S. (2016). Social Mobile Technologies for Reconnecting Indigenous and Immigrant Communities. People.Policy.Place Seminar. Northern Institute, CDU.

Duong, L., Anastasopoulos, A., Chiang, D., Bird, S., & Cohn, T. (2016). An attentional model for speech translation without transcription. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Duong, L., Cohn, T., Bird, S., & Cook, P. (2015). A neural network model for low-resource universal dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 339-348). Lisbon, Portugal.

Adams, O., Neubig, G., Cohn, T., & Bird, S. (2015). Inducing bilingual lexicons from small quantities of sentence-aligned phonemic transcriptions. In Proceedings of the International Workshop on Spoken Language Translation (pp. 248-255). Da Nang, Vietnam.

Burford, C., Bird, S., & Baldwin, T. (2015). Collective document classification with implicit inter-document semantic relationships. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (pp. 106-116). Denver, USA.

Duong, L., Cohn, T., Bird, S., & Cook, P. (2015). Cross-lingual transfer for unsupervised dependency parsing without parallel data. In Proceedings of the 19th Conference on Computational Language Learning (pp. 113–122). Beijing, China.

Duong, L., Cohn, T., Bird, S., & Cook, P. (2015). Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (pp. 845–850). Beijing, China.

Duong, L., Cohn, T., Verspoor, K., Bird, S., & Cook, P. (2014). What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 886–897). Doha, Qatar.

Bird, S., Gawne, L., Gelbart, K., & McAlister, I. (2014). Collecting bilingual audio in remote indigenous villages. In Proceedings of the 25th International Conference on Computational Linguistics (pp. 1015-1024). Dublin, Ireland.

Bird, S., Hanke, F.R., Adams, O., & Lee, H. (2014). Aikuma: A Mobile App for Collaborative Language Documentation. Workshop on the Use of Computational Methods in the Study of Endangered Languages. Baltimore, USA.

Bird, S., & Curran, J. (2006). Building a Search Engine to Drive Problem-Based Learning. In M. Goldweber & P. Salomoni (eds.), Proceedings of the Eleventh Annual Conference on Innovation and Technology in Computer Science Education (pp. 153-157). New York, NY: ACM.


Alwan, A., Bourlard, H., Furui, S., Bird, S., & Harrington, J. (2001). Speech Communication - Special issue on speech annotation and corpus tools, 33(1-2).

Guest Editor

Bird, S., Meyer, B., & Christophersen, P. (eds). (2014). ALGORITHMICS: Higher Education Scored Study. Victorian Certificate of Education.

Bird, S., & Hyman, L. (eds). (2014). How to study a tone language. Language Documentation and Conservation.

Professional Positions, Memberships & Awards

Professional Positions

2015 - current: Senior Research Scientist, International Computer Science Institute, University of California Berkeley
2015 - current: Visiting Professor, Department of Linguistics, University of California Berkeley
2011 - 2014: Convenor of CS4HS Victoria workshops
2002 - current: Associate Professor, Department of Computing and Information Systems, University of Melbourne
2002 - 2015: Senior Research Associate, Linguistic Data Consortium, University of Pennsylvania
2001 - current: Lead developer of the Natural Language Toolkit, open source software for language analysis
2000 - current: Coordinator of the Open Language Archives Community
1998 - 2002: Adjunct Associate Professor, Department of Computer and Information Science and Department of Linguistics, University of Pennsylvania
1990 - 1998: Research Fellow, Centre for Cognitive Science, University of Edinburgh