Amazing work being done by Masakhane in the African NLP space
This article highlights the awesome work the Masakhane community is doing to support African languages, from building datasets and tools to conducting research.
Hi once again, and welcome to this new article about an awesome community doing really amazing work to strengthen Natural Language Processing (NLP) research in African languages, for Africans.
Africa has over 2000 languages. Despite this, African languages account for a tiny portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers, and no benchmarks to compare techniques.
Masakhane is a research effort, originally focused on machine translation for African languages, that is open-source, continent-wide, distributed, and online. It aims to build a community of Natural Language Processing researchers, to connect and grow it, and to spur and share further research that enables language preservation and tool building and increases the global visibility and relevance of African languages.
Masakhane is pushing to build datasets and tools that facilitate Natural Language Processing in African languages, and to pose new research problems that enrich the NLP research landscape.
It also aims to build a community that helps discover best practices for distributed research, which other emerging research communities can then apply.
The best thing about Masakhane is that its approach doesn't require any prior NLP experience to get involved with the community. If you are passionate about solving African challenges with NLP, Masakhane offers barrier-free, open access to a first hands-on NLP experience with African languages. It enables anyone to train machine translation models on a parallel corpus of their own choice and share the results online.
It requires no academic prerequisites to conduct research or contribute to the Masakhane community.
Wondering how you can connect with this awesome community?
Want to be a Masakhanian? Don't worry: Masakhane has an active Slack workspace and a mailing list, where you can find all the updates and ongoing research.
Work done by Masakhane
- Connecting people from different backgrounds to solve research problems in African languages: It is simple for any researcher in the Masakhane community to start research on any language they prefer, because support from native speakers is there. You don't need to worry about learning the African language you want to work on. I have seen many people from abroad who were interested in Swahili research get support from native Swahili speakers in the Masakhane community to complete their projects. This is awesome, and it helps people across the world understand different cultures and languages.
- Machine Translation for African languages: Masakhane has an open-source online web platform for machine translation dedicated solely to African languages. Masakhane Web aims to host the machine translation models already trained by the Masakhane community and allows users to contribute new data for retraining and improving the models. If you would like to contribute to this project, train a model in your language, or collaborate with Masakhane, find out how at https://github.com/dsfsi/masakhane-web or reach out to any of the Masakhane Web contributors. The machine translation project now covers 52 African languages with benchmarks, which can be seen on the Masakhane project's GitHub page.
- Language models for African languages: The Masakhane community has published several papers on African language modeling. This is one of the amazing initiatives breaking down the barrier of language model availability for African languages. You can find the publication here.
- Data Gathering: One of the data collection projects run by the Masakhane community is Text & Speech for East Africa, which aimed to deliver open, accessible, and high-quality text and speech datasets for low-resourced East African languages from Uganda, Tanzania, and Kenya. The project focused on data for Luganda, Runyankore-Rukiga, Acholi, Swahili, and a subset of Luhya languages that are spoken across the border between Uganda and Kenya. Read more about the project here.
- Named Entity Recognition for African Names: The project is called MasakhaNER. It focuses on information extraction, specifically identifying African names, places, and people in text. This matters because the majority of existing NER datasets for African languages come from WikiNER, which is automatically annotated and very noisy because the text quality for African languages is not verified. Only a few African languages have human-annotated NER datasets. You can find the project on the Masakhane GitHub page.
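To make the named entity recognition task above a little more concrete, here is a minimal sketch of what an annotated sentence can look like and how the labeled names and places are read out of it. It assumes the BIO tagging convention (B- starts an entity, I- continues it, O is outside any entity), which is common in human-annotated NER datasets. The function name `bio_to_entities`, the example sentence, and its tags are hypothetical and written by hand for illustration; they are not taken from the MasakhaNER data or code.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent "I-" tag) ends the open entity, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical Swahili sentence with hand-written tags (illustration only).
tokens = ["Julius", "Nyerere", "alizaliwa", "Butiama", ",", "Tanzania"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-LOC"]

print(bio_to_entities(tokens, tags))
```

A human annotator who speaks the language supplies the tags; a trained NER model then tries to predict the same tags for new sentences, which is why verified, human-annotated data is so valuable.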
How can you contribute?
- Training a model: Depending on the research problem you want to contribute to, you can take part in training different models as a Masakhane community member.
- Documentation: You can join the members who help document projects and write up the progress of ongoing work.
- Analysis: If you enjoy analyzing models and data, through Masakhane you can take part in analyzing datasets of African languages to solve various problems.
- Data: People are interested in using new technology to advance their own languages. No one is happy to see their language misrepresented on various platforms. This is your chance to contribute through data collection activities in the Masakhane community and make data more accessible to researchers, data scientists, software developers, and others working to solve African challenges.
- Mentorship: You can contribute by offering advice, helping to tune models for the languages you prefer or have experience with, and helping people get started. This is especially useful for members who want to do research on your native language: supporting and complementing their work is a great way to establish best practices for addressing the specific challenges of an African language.
- Computation infrastructure: If you have computing infrastructure and would really love to support Masakhane initiatives but don't have time to train NLP models yourself, you can help the community with infrastructure or simply donate.
- Brainstorm: You can join the weekly meetings and contribute suggestions, ideas, or advice depending on the topic. Also, if there is a challenge in your community that NLP technologies could solve, feel free to share it with the Masakhane community to find the best way of solving it.
Masakhane is gaining momentum around the world right now. Research institutions are investing heavily in Masakhane as a central platform for the resources they need to include one or more African languages in their work. Masakhane has many research collaborations with Google and Facebook, as well as with institutions in Africa. Organizations, companies, and investors are joining too, working with Masakhane members on several projects that have African languages at their core. There is much to gain from investing in Masakhane right now, as it is the hottest thing in natural language processing for African languages.
Thank you for taking the time to read this article. I hope it helps you understand the progress of natural language processing for African languages and how you can contribute to the initiatives started by the Masakhane community.