Recent breakthroughs in natural language processing technology make it straightforward to create Arabic chatbots. The new Arabic AI chatbot technology uses machine learning to understand both the structure of the language and the “meaning” of the words.
Arabic is the fourth most spoken language on the internet, but it is one of the hardest languages for non-native speakers to learn.
This is because it is different from most languages in a few ways.
In addition to the above, there are many forms and dialects of Arabic. These forms and dialects are related to each other but do not fully overlap. In fact, the speaker of one dialect may not understand another dialect; for all intents and purposes, they are different languages.
All these factors make Arabic more difficult for humans to learn.
Does that mean, however, that it is also more difficult for machines to learn? Unsurprisingly, the answer is yes.
All of the above creates challenges for Arabic natural language processing (NLP). The first step for any NLP algorithm is making sense of the language, i.e. splitting sentences into discrete units of meaning. This task is called tokenization, and each discrete unit of meaning is called a token.
The more systematic and orderly a language is, the easier it is to tokenize.
The same challenges that make Arabic hard to learn for humans mean that Arabic is hard to tokenize compared to most other common languages.
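To make the tokenization problem concrete, the short Python sketch below compares naive whitespace tokenization of an English phrase with its Arabic equivalent. The example sentence is our own illustration. In Arabic, the conjunction, the preposition, the article and the pronouns attach directly to the word. Splitting on spaces therefore produces far fewer tokens, and each one packs in several units of meaning.

```python
def whitespace_tokenize(sentence: str) -> list[str]:
    """Naive tokenizer: split on runs of whitespace only."""
    return sentence.split()

# "And with the pen we wrote it" in English and in Arabic (illustrative example).
english = "and with the pen we wrote it"
arabic = "وبالقلم كتبناها"  # wa+bi+al+qalam (and with the pen), katab+na+ha (wrote+we+it)

english_tokens = whitespace_tokenize(english)
arabic_tokens = whitespace_tokenize(arabic)

print(len(english_tokens), english_tokens)  # 7 tokens
print(len(arabic_tokens), arabic_tokens)    # 2 tokens
```

Each Arabic "token" here bundles three or four units of meaning. A useful Arabic tokenizer therefore has to look inside words, not just between them.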
Before we can understand the significance of the latest breakthroughs, we first need to understand how language models for NLP were previously created.
The job of tokenizing the language required a great deal of manual intervention on the part of the NLP researcher. Every language had to be tokenized independently and essentially manually.
As you can imagine, this job of tokenizing the language was particularly difficult for Arabic.
Once the language was tokenized, AI algorithms could be applied to understanding it, i.e. building a map of meaning that captures how the words of the language relate to each other.
This understanding step could be automated if the tokenization was reliable. The problem, however, was that Arabic tokenization was tricky, so even the understanding algorithms had to be configured manually along with the tokenization.
And the end result was not good: the level of Arabic understanding was poor compared with, say, English. Of course, research has always focused much more on English than on Arabic, which also played a role, but the difficulty of the language made a good result almost impossible.
Naturally, AI researchers wondered whether the tokenization itself could be done by machine learning. This would make both the tokenization and the understanding algorithms indifferent to the underlying language (language agnostic), and therefore make training an AI on a new language much faster and more effective.
And this is ultimately where the breakthrough came, in late 2018. An AI could now be trained on Arabic without any manual intervention, and as a result NLP performance improved markedly.
In principle, Arabic chatbot AI platforms could now become much better, with a level of Arabic understanding similar to that achieved in other languages.
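The Python sketch below illustrates the kind of hand-written rules this involved. The affix lists and the `rule_segment` function are hypothetical and deliberately tiny; real rule-based systems used far larger lexicons. The sketch also shows why rules alone are fragile. The word وجد ("he found") begins with the letter و as part of its root, but the rule wrongly treats it as the conjunction "and".

```python
# Hypothetical hand-written segmentation rules, for illustration only.
PROCLITICS = ("و", "ف", "ب", "ل")    # and, so, with/by, for/to
ARTICLE = "ال"                        # the
ENCLITICS = ("ها", "هم", "نا", "ه")   # her/it, them, us/our, him/his

def rule_segment(word: str, min_stem: int = 2) -> list[str]:
    """Split one Arabic word into prefixes, stem and suffix using fixed rules."""
    prefixes: list[str] = []
    # Rule 1: peel off up to two single-letter proclitics.
    while len(prefixes) < 2 and word[:1] in PROCLITICS and len(word) - 1 >= min_stem:
        prefixes.append(word[0])
        word = word[1:]
    # Rule 2: peel off the definite article.
    if word.startswith(ARTICLE) and len(word) - len(ARTICLE) >= min_stem:
        prefixes.append(ARTICLE)
        word = word[len(ARTICLE):]
    # Rule 3: peel off at most one pronoun suffix (two-letter suffixes are tried first).
    suffixes: list[str] = []
    for suffix in ENCLITICS:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            suffixes.append(suffix)
            word = word[: -len(suffix)]
            break
    return prefixes + [word] + suffixes

print(rule_segment("وبالكتاب"))  # and + with + the + book
print(rule_segment("كتابه"))     # book + his
print(rule_segment("وجد"))       # wrongly split: here و is part of the root
```

Every exception like وجد needs another hand-written rule or lexicon entry. That is why this approach scaled so poorly.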
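A common way to represent such a "map of meaning" is to give each word a vector and measure how related two words are with cosine similarity. The three-dimensional vectors below are hand-picked toy values, not learned from data; real models learn vectors with hundreds of dimensions. This Python sketch shows only how related words end up closer together.

```python
import math

# Toy word vectors, chosen by hand for illustration (not learned from a corpus).
EMBEDDINGS = {
    "كتاب": (0.9, 0.1, 0.0),   # book
    "قلم": (0.8, 0.2, 0.1),    # pen
    "سيارة": (0.0, 0.1, 0.9),  # car
}

def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word: str) -> str:
    """Return the other vocabulary word whose vector is most similar to `word`."""
    candidates = (w for w in EMBEDDINGS if w != word)
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))

print(nearest("كتاب"))  # the pen vector is the closest to the book vector
```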
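The article does not name the learned tokenization technique, so the Python sketch below rests on an assumption. It uses byte-pair encoding (BPE), a widely used subword tokenization algorithm that learns its units from raw text instead of from hand-written rules. Starting from single characters, BPE repeatedly merges the most frequent adjacent pair of symbols in a corpus, here a tiny made-up one. Nothing in the code is specific to Arabic, which is what makes this kind of approach language agnostic.

```python
from collections import Counter

def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of words."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a tuple of characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged: Counter = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges

def bpe_tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Tiny made-up corpus: the book, the books, the writer, the office, a book, the pen.
corpus = ["الكتاب", "الكتب", "الكاتب", "المكتب", "كتاب", "القلم"]
merges = learn_bpe(corpus, num_merges=2)
print(merges)                           # the article "ال" and the pair "كت" are learned first
print(bpe_tokenize("المكتبة", merges))  # an unseen word (the library) still splits into learned units
```

The article ال is learned as a unit purely because it is frequent; no one had to write a rule for it.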
The fact that this breakthrough occurred doesn’t necessarily mean that the quality of Arabic chatbots instantly improved.
For customers to experience these benefits, chatbot AI platforms first had to update their algorithms to use the latest technology. Given their investment in the previous technology, many platforms have been slow to do so.
Beyond that, platforms need to put many features in place to ensure that Arabic chatbots deliver a good experience for end users. For example, the user interface needs to accommodate Arabic, which is written right to left. This can be as simple as making sure that the chat is aligned correctly and that buttons are shown in the right order.
Working with multiple languages on different platforms can be difficult. Some platforms require bots in different languages to be built as separate bots, which is obviously highly inefficient.
A good platform is truly multilingual and therefore allows multiple translations of all content within the platform's user interface.
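As a rough sketch of what accommodating Arabic can mean in a chat UI, the Python below chooses a message bubble's direction from its first strong directional character. It uses the standard-library `unicodedata.bidirectional` function. This approximates the base-direction rule of the Unicode Bidirectional Algorithm (UAX #9) and is not a full implementation. The `bubble` class name and the HTML layout are hypothetical. Setting `dir="rtl"` on a container also typically makes the browser lay out inline or flexbox buttons inside it from right to left.

```python
import html
import unicodedata

def base_direction(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strong directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letters (AL = Arabic letters)
            return "rtl"
        if bidi == "L":          # left-to-right letters such as Latin
            return "ltr"
    return "ltr"  # no strong character, e.g. only digits or punctuation

def render_bubble(text: str) -> str:
    """Render a chat bubble whose direction and alignment follow its text (hypothetical markup)."""
    direction = base_direction(text)
    align = "right" if direction == "rtl" else "left"
    return (
        f'<div class="bubble" dir="{direction}" style="text-align: {align}">'
        f"{html.escape(text)}</div>"
    )

print(render_bubble("مرحبا! كيف يمكنني مساعدتك؟"))
print(render_bubble("Hello! How can I help?"))
```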
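Here is a minimal Python sketch of storing every piece of bot content with one translation per language. The `BotMessage` schema, the language codes and the fall-back-to-English rule are assumptions for illustration, not any particular platform's data model.

```python
from dataclasses import dataclass, field

@dataclass
class BotMessage:
    """One piece of bot content with a translation per language code (hypothetical schema)."""
    key: str
    translations: dict[str, str] = field(default_factory=dict)

    def text(self, lang: str, fallback: str = "en") -> str:
        """Return the translation for `lang`, or the fallback language if it is missing."""
        if lang in self.translations:
            return self.translations[lang]
        return self.translations[fallback]

greeting = BotMessage(
    key="greeting",
    translations={
        "en": "Hello! How can I help you?",
        "ar": "مرحبا! كيف يمكنني مساعدتك؟",
    },
)

print(greeting.text("ar"))  # Arabic translation
print(greeting.text("fr"))  # no French translation, so the English text is used
```

Because one bot holds every translation, adding a language means adding entries rather than building and maintaining a separate bot.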
In addition, language needs to be tracked as a variable of the conversation so that the AI can detect the language accurately and conversational designers can design logic around the language.
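The Python sketch below shows one way to keep the language as a conversation variable and branch on it. The `detect_language` heuristic, which measures the share of Arabic-script letters, is a deliberately crude stand-in for a real language-identification model. It cannot tell dialects apart, and it cannot recognize Arabic written in Latin letters. The queue names are hypothetical.

```python
import unicodedata
from dataclasses import dataclass, field

def detect_language(text: str) -> str:
    """Crude heuristic: "ar" if most letters are Arabic script, "en" otherwise."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"  # e.g. only digits or emoji
    arabic = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC"))
    return "ar" if arabic / len(letters) >= 0.5 else "en"

@dataclass
class ConversationState:
    variables: dict[str, str] = field(default_factory=dict)

    def update_language(self, user_text: str) -> str:
        """Store the detected language, keeping the previous value when detection is inconclusive."""
        lang = detect_language(user_text)
        if lang != "unknown":
            self.variables["language"] = lang
        return self.variables.get("language", "en")

def route(state: ConversationState, user_text: str) -> str:
    """Example of designer logic that branches on the language variable."""
    if state.update_language(user_text) == "ar":
        return "arabic_support_queue"
    return "default_queue"

state = ConversationState()
print(route(state, "أريد تغيير كلمة المرور"))  # "I want to change the password"
print(route(state, "12345"))                   # no letters, so the Arabic setting is kept
```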
Aside from language-specific functionality, creating a great chatbot requires the general functionality of the chatbot platform to be excellent. There are two categories of functionality that are important.
Ultimately the quality of the chatbot experience created for the end user is directly related to the power of the tool used to create it, from the language understanding to the graphical UIs.
It is also often the case, especially in the Arab world, that companies require an on-prem Arabic chatbot, which is an important consideration when selecting a platform. An on-prem Arabic chatbot needs to be built with an on-prem platform that not only offers an on-prem UI but also hosts the full natural language understanding (NLU) engine and the trained language model on-prem.
Even with a good platform, creating a great chatbot in Arabic is still challenging. There is a limited number of Arabic speakers in the AI world, so it can be hard to find the right people for the project. There is no need to find people to write the underlying NLU algorithms, as these are provided out of the box. It can, however, be hard to find competent designers who speak all the languages or dialects that the chatbot supports. It is therefore important that the platform lets non-technical staff update and maintain content and translations easily, as the designer is unlikely to speak every supported language.
Obviously, the fact that high-quality Arabic chatbots are now coming online means that adoption of this technology will increase. This increasing adoption will solve the problem of resource constraints and give potential buyers of the technology a clear idea of best practice.
The breakthroughs in NLP technology apply not only to chatbots but also to other AI applications. We are now seeing multi-faceted systems that use Arabic AI in different ways, from sentiment analysis of news stories to summarizing or generating text, tasks that previously only humans could do. A chatbot often serves as the user interface not only to different AI technologies but also to the screens of other systems, such as websites or web apps.
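One low-tech way to let non-technical translators maintain content is a spreadsheet with one column per language, exported as CSV. The Python sketch below uses only the standard `csv` module. It reads such an export and lists which content keys still lack a translation in each language. The column layout is an assumption, not a format that any specific platform requires.

```python
import csv
import io

# Hypothetical export: one row per content key, one column per supported language.
CSV_EXPORT = """key,en,ar
greeting,Hello! How can I help you?,مرحبا! كيف يمكنني مساعدتك؟
goodbye,Goodbye!,
"""

def missing_translations(csv_text: str) -> dict[str, list[str]]:
    """Map each language column to the content keys whose cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    languages = [name for name in reader.fieldnames or [] if name != "key"]
    missing: dict[str, list[str]] = {lang: [] for lang in languages}
    for row in reader:
        for lang in languages:
            if not (row.get(lang) or "").strip():
                missing[lang].append(row["key"])
    return missing

print(missing_translations(CSV_EXPORT))  # the Arabic column is missing "goodbye"
```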
Of course, even though there has been a significant leap in the power of Arabic NLU, the NLU could always be better. Research continues to make the NLU engines even better and no doubt new breakthroughs will come. Until NLU reaches human levels, there will always be work to do.
The next step for all NLU engines, regardless of language, is to handle multi-turn dialogs better. This means allowing a human to have a multi-step conversation with the bot in a narrow topic domain, as opposed to just issuing one-off commands or questions. The related next step for chatbot platforms is to make multi-turn dialogs easy to create.
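Multi-turn dialogs in a narrow domain are often built around slot filling. The bot remembers what the user has already said and asks only for what is still missing. The Python sketch below assumes a hypothetical booking domain and receives each turn's entities already extracted by the NLU engine. The slot names and prompts are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical narrow domain: a booking needs a city and a date.
REQUIRED_SLOTS = ("city", "date")
PROMPTS = {"city": "Which city?", "date": "Which date?"}

@dataclass
class BookingDialog:
    slots: dict[str, str] = field(default_factory=dict)

    def turn(self, entities: dict[str, str]) -> str:
        """Handle one user turn, given the entities the NLU engine extracted from it."""
        # Remember any required slots mentioned in this turn.
        self.slots.update({k: v for k, v in entities.items() if k in REQUIRED_SLOTS})
        # Ask for the first slot that is still missing.
        for slot in REQUIRED_SLOTS:
            if slot not in self.slots:
                return PROMPTS[slot]
        return f"Booking confirmed for {self.slots['city']} on {self.slots['date']}."

dialog = BookingDialog()
print(dialog.turn({}))                      # Which city?
print(dialog.turn({"city": "Dubai"}))       # Which date?
print(dialog.turn({"date": "2024-05-01"}))  # Booking confirmed for Dubai on 2024-05-01.
```

A one-off command bot would need the city and the date in a single message. The stored `slots` are what allow that information to arrive over several turns.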
Multi-turn dialog is particularly important for voice interfaces such as Alexa.
While we have been discussing breakthroughs in machine-learning-driven tokenization and their implications for Arabic NLP, a related topic is Arabic speech-to-text transcription. Arabic speech-to-text transcription still lags behind that of other languages, but we are hopeful that the progress in NLP described here will help close the gap in the near future.