Understanding the concepts behind machine translation

This article introduces Machine Translation: its background, its main types, the technologies behind it, and its current state.

"Machine Translation is wonderful technology but it is not wonderful to trust" by Mohd Mustafa

Machine Translation started around the 1950s and involved a lot of manual processing. Limited computing power, data availability, and storage capacity made the work especially challenging.

Around the 2000s, developers began using statistical databases to teach computers to translate text, but a good deal of manual labor was still required.

Around 2016, developers at Google came up with the idea of using neural learning models and Artificial Intelligence to train translation engines. The resulting engine showed a significant improvement over earlier machine translation engines: translations became both more effective and faster across many languages.

Neural Machine Translation proved so effective that Google changed course and adopted it as its primary development model, followed by Microsoft and Amazon.

What is Machine Translation (MT)?

Machine Translation is the process of using Artificial Intelligence to automatically translate content from one language to another without human input.

Image shows the translation of one language into multiple languages

Translation software that learns over time and can be customized with fixed business terminology is an asset. Machine translation can save significant time, since it can translate entire text documents in seconds. Bear in mind, however, that human translators should always post-edit machine-translated text.

Employees can communicate and collaborate across time zones with MT software. With a shared knowledge of corporate terminology, the likelihood of judgment errors is diminished.

Most MT software also provides consistent translations. Human translations, by contrast, often reflect the translator's feelings and opinions, so the sentiment of a text can shift depending on who is doing the translating.

Types of Machine Translation

The honest truth about where MT works best: it performs better on more structured content, such as technical documentation and intellectual property documents.

For colloquial content such as marketing, branding, or other customer-facing material, MT is optional, simply because the results will need more human editing.

MT tools vary according to their use cases. Selecting the right tool for your business depends on the use case, budget, and available computing power. Some MT tools are expensive, and you can incur costs that add no profit to your business.

Understanding the types of MT helps you select the right one for your use case. Let's look at what those types are:

  • Rule-Based Machine Translation (RBMT): the earliest form of Machine Translation, which involves many manual operations. It uses grammar and language rules developed by language experts, together with dictionaries that can be customized to a specific topic or industry.
  • Statistical Machine Translation (SMT): an improvement on Rule-Based Machine Translation. It automatically maps sentences in one human language, for example Swahili, into another human language, such as English, using statistical models learned from existing translations.
  • Neural Machine Translation (NMT): a smart form of MT that uses Artificial Intelligence to produce more accurate and faster translations than the other types of MT. Before NMT, machines relied on statistical models and rigid sets of rules that didn't accommodate the flexibility and figurative nature of language.
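
To make the rule-based idea concrete, here is a minimal sketch of a toy translator: a small bilingual dictionary plus one hand-written reordering rule. The four-word dictionary, the part-of-speech tags, and the function name `translate_rule_based` are all assumptions made up for illustration; a real RBMT system would need far more rules and vocabulary.

```python
# Toy rule-based translator: a bilingual dictionary plus one reordering rule.
# The word list and the single rule are illustrative, not a real RBMT system.

DICTIONARY = {
    "kitabu": ("book", "NOUN"),
    "mtoto": ("child", "NOUN"),
    "kizuri": ("good", "ADJ"),
    "mzuri": ("good", "ADJ"),
}

def translate_rule_based(sentence):
    """Translate a Swahili phrase word by word, then apply a reordering rule."""
    tagged = []
    for word in sentence.lower().split():
        # Unknown words are passed through unchanged and tagged as unknown.
        tagged.append(DICTIONARY.get(word, (word, "UNK")))

    # Rule: Swahili places the adjective after the noun, English before it.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)

print(translate_rule_based("kitabu kizuri"))  # good book
print(translate_rule_based("mtoto mzuri"))    # good child
```

Every new word and every new grammatical pattern needs another dictionary entry or rule, which is the manual effort that the statistical and neural approaches try to avoid.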

Language barriers affect various business activities. As the world becomes smaller with technology, businesses encounter difficulty accommodating the needs of an increasingly international consumer base. Hiring a translation company and translators can be expensive, but utilizing technology to perform document translation services is a cost-effective option for increasing understanding and promoting inclusivity.

How Does Machine Translation Work?

It is very interesting to understand how Machine Translation engines such as Masakhane translate, Google Translate, Amazon, and Microsoft Translator work.

We will look at Neural Machine Translation, currently the most widely used form of MT technology in the world.

Neural Machine Translation is a single system that can be trained directly on source and target text, without the separate specialized systems that SMT needs.

Image by DeepAI

In simple terms, to teach a machine how to translate you need data: a collection of millions of sentences, depending on the languages you want to work on, paired with their correct translations. You feed those sentence pairs into a neural network, and it learns how to translate from the examples. So for a translator to become smart, it has to be trained on millions of sentences.
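
The idea of learning from example pairs can be seen even without a neural network. The sketch below is not how an NMT engine learns; it swaps in a simple counting technique (the Dice coefficient over word co-occurrences) just to show that paired sentences carry enough signal to recover word translations. The three-pair corpus and the name `learn_word_pairs` are assumptions for illustration.

```python
from collections import Counter

# Tiny hand-made parallel corpus (Swahili, English); real systems use millions of pairs.
PARALLEL_CORPUS = [
    ("maji baridi", "cold water"),
    ("maji moto", "hot water"),
    ("chai moto", "hot tea"),
]

def learn_word_pairs(corpus):
    """Guess a translation for each source word using the Dice coefficient."""
    source_counts, target_counts, pair_counts = Counter(), Counter(), Counter()
    for source, target in corpus:
        source_words, target_words = set(source.split()), set(target.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        # Count every source/target word pair that appears in the same sentence pair.
        pair_counts.update((s, t) for s in source_words for t in target_words)

    best = {}
    for (s, t), together in pair_counts.items():
        # Dice score: high when the two words mostly appear together.
        score = 2 * together / (source_counts[s] + target_counts[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}

lexicon = learn_word_pairs(PARALLEL_CORPUS)
print(lexicon["maji"], lexicon["moto"], lexicon["chai"], lexicon["baridi"])
# water hot tea cold
```

With only three sentence pairs the counts are already enough to separate the words; a neural network works from the same kind of paired data but learns far richer patterns than word-to-word matches.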

Sounds easy?

Oops! No, it is very technical. Let's see what is behind this.

Every language has 2 important components:

  • Tokens - smallest unit of language.
  • Grammar - defines the ordering of tokens.

Take the sentence It is Sunny: it has only 3 tokens, which are It, is, and Sunny. If languages depended only on tokens and grammar could be ignored, language translation would be much easier.
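
As a minimal sketch, splitting on whitespace is enough to get the three tokens of this example. The function name `tokenize` is an assumption, and whitespace splitting is only the simplest possible choice; production systems usually use more elaborate tokenizers that also handle punctuation and pieces of words.

```python
def tokenize(sentence):
    """Split a sentence into lowercase word tokens on whitespace."""
    return sentence.lower().split()

tokens = tokenize("It is Sunny")
print(tokens)       # ['it', 'is', 'sunny']
print(len(tokens))  # 3
```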

Grammar is the sensitive part of language translation. It involves syntax analysis and semantic analysis, and this is where the complexity of translation begins, simply because languages differ in their syntax and semantics.

But do computers understand human language grammar the same way as we humans do?

The answer is no, simply because computers only understand numbers.

Instead of you defining the grammar for the computer to understand, a neural network learns it for you. A neural network is able to learn the patterns in data and use them to translate from a source language (for example Swahili) to a target language (for example English).

Inputs and outputs are both sentences, but the computer handles them as numeric values. First of all, the sentences are converted into numeric form (vectors and matrices).
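
One simple way to picture this conversion, sketched below under the assumption of a word-level vocabulary: give every distinct token an integer ID, then turn each ID into a one-hot vector. The helper names `build_vocab`, `to_ids`, and `one_hot` are made up for this example; real NMT systems go a step further and learn dense vectors (embeddings) instead of one-hot vectors.

```python
def build_vocab(sentences):
    """Map each distinct token to an integer ID, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_ids(sentence, vocab):
    """Replace each token of a sentence with its integer ID."""
    return [vocab[token] for token in sentence.lower().split()]

def one_hot(token_id, size):
    """Vector of zeros with a single 1 at the position of the token ID."""
    vector = [0] * size
    vector[token_id] = 1
    return vector

vocab = build_vocab(["it is sunny", "it is cold"])
ids = to_ids("it is cold", vocab)
matrix = [one_hot(i, len(vocab)) for i in ids]

print(vocab)   # {'it': 0, 'is': 1, 'sunny': 2, 'cold': 3}
print(ids)     # [0, 1, 3]
print(matrix)  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
```

Each sentence becomes a matrix with one row per token, which is a form a neural network can work with.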

{Sentence (Swahili) — to — Vector form}

The resulting vector is then converted into a sentence in the second language (English).

{Vector form  — to — Sentence(English)}

This process is called the encoder-decoder architecture.

Encoder-Decoder Architecture

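This architecture can be built with various improved methods rather than a plain Recurrent Neural Network (RNN). A plain RNN reads a sentence one token at a time in a single direction, so at each token it has seen only the tokens before it and not the ones after it, which leads to imperfect translations. It also tends to lose information about early tokens as the sentence grows. Long Short-Term Memory (LSTM) improves on the RNN by adding gates that help it retain information over longer spans, but it can still get confused by long sentences.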
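
To see why a one-directional RNN only knows about earlier tokens, here is a minimal sketch of the recurrence in plain Python. The weights are random and untrained, the sizes are tiny, and the names `rnn_step` and `run_rnn` are assumptions for illustration; it shows the mechanics only, not a working translator.

```python
import math
import random

def rnn_step(x, h, W_x, W_h):
    """One recurrent step: new_h = tanh(W_x @ x + W_h @ h)."""
    new_h = []
    for row_x, row_h in zip(W_x, W_h):
        total = sum(w * xi for w, xi in zip(row_x, x))
        total += sum(w * hi for w, hi in zip(row_h, h))
        new_h.append(math.tanh(total))
    return new_h

def run_rnn(inputs, W_x, W_h, hidden_size):
    """Read the inputs left to right and return the hidden state after each token."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h)
        states.append(h)
    return states

random.seed(0)
input_size, hidden_size = 3, 2
# Random, untrained weights; training would adjust these values.
W_x = [[random.uniform(-1, 1) for _ in range(input_size)] for _ in range(hidden_size)]
W_h = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]

# Three one-hot token vectors standing in for "it", "is", "sunny".
sentence = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
states = run_rnn(sentence, W_x, W_h, hidden_size)
print(len(states), len(states[-1]))  # 3 2
```

The state for the first token is computed before the network has read anything else, so it cannot depend on the tokens that follow. Only the last state has seen the whole sentence.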

A further improvement is the Bidirectional Recurrent Neural Network. Instead of running an RNN only in the forward direction starting from the first token, we run a second one starting from the last token, from back to front. A bidirectional RNN thus adds a hidden layer that passes information in the backward direction, so each token's representation reflects both the tokens before it and the tokens after it.

Bidirectional RNN architecture
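
A minimal sketch of the bidirectional idea, assuming the same kind of simple untrained RNN for both directions: run one pass front to back, another back to front, and join the two states for each token. The names `run_rnn`, `random_matrix`, and `run_bidirectional` are assumptions for this example, and joining by concatenation is one common choice rather than the only one.

```python
import math
import random

def run_rnn(inputs, W_x, W_h):
    """Simple RNN: return the hidden state after each input, in input order."""
    h = [0.0] * len(W_h)
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(row_x, x))
                + sum(w * hi for w, hi in zip(row_h, h))
            )
            for row_x, row_h in zip(W_x, W_h)
        ]
        states.append(h)
    return states

def random_matrix(rows, cols, rng):
    """Matrix of random, untrained weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def run_bidirectional(inputs, forward_weights, backward_weights):
    """Concatenate a front-to-back pass with a back-to-front pass, token by token."""
    forward = run_rnn(inputs, *forward_weights)
    # Feed the sentence reversed, then flip the states back into sentence order.
    backward = run_rnn(inputs[::-1], *backward_weights)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
input_size, hidden_size = 3, 2
forward_weights = (random_matrix(hidden_size, input_size, rng),
                   random_matrix(hidden_size, hidden_size, rng))
backward_weights = (random_matrix(hidden_size, input_size, rng),
                    random_matrix(hidden_size, hidden_size, rng))

sentence = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
states = run_bidirectional(sentence, forward_weights, backward_weights)
print(len(states), len(states[0]))  # 3 4
```

Each token now gets a vector twice as long: the first half summarizes what came before it, the second half what comes after it.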

Finally, let's finish with the attention mechanism. The attention mechanism is a part of a neural architecture that makes it possible to dynamically highlight relevant features of the input data, which in NLP is typically a sequence of textual elements. It can be applied directly to the raw input or to its higher-level representation.

Attention Mechanism

A neural network can be seen as an effort to mimic the actions of the human brain in a simplified manner. The attention mechanism is an attempt to implement, in deep neural networks, the same action of selectively concentrating on a few relevant things while ignoring others.
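
Selective concentration can be sketched in a few lines. The example below assumes dot-product scoring, which is one common way to compute attention and not necessarily what any particular engine uses; the encoder states and the query are hand-picked numbers, and the names `softmax` and `attend` are assumptions for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Dot-product attention: weight each encoder state by its similarity to the query."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    # The context vector is the weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context

# One vector per source token (hand-picked values, not learned).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# What the decoder is "looking for" at the current output step.
query = [0.0, 2.0]

weights, context = attend(query, encoder_states)
print(weights)
print(context)
```

The second source token matches the query best, so it receives the largest weight and dominates the context vector, while the other tokens are mostly ignored.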

Final Thoughts

As translation tools become more reliable, translation agencies will face more competition to deliver better-quality translations with faster turnaround.

This means marketing and sales will be needed to stay competitive in the market. Being able to sell your services will be crucial to convince clients to choose your agency instead of your competitors.

Translation agencies will always be needed to provide accurate services and proofreading to eliminate errors, simply because most existing translation tools are limited by the amount of data they have been trained on.

Also, when it comes to confidential information such as contracts and medical documents, companies are not ready to expose their data online.

"Words travel worlds. Translators do the driving." by Anna Rusconi