A Proposed Semantic Machine Translation System for Translating Arabic Text to Arabic Sign Language
Ameera M. Almasoud and Hend S. Al-Khalifa
College of Computer and Information Science
King Saud University
Riyadh, Saudi Arabia

ABSTRACT

Arabic Sign Language (ArSL) is the native language of the Arab deaf community. ArSL allows deaf people to communicate among themselves and with non-deaf people around them to express their needs, thoughts and feelings. In contrast to spoken languages, Sign Language (SL) depends on the hands and facial expressions, rather than sounds, to express a person's thoughts. In recent years, interest in automatically translating text to sign language for different languages has increased. However, only a small set of these works specialize in ArSL. These works generally translate word by word, without taking into account the semantics of the translated sentence or the rules for translating Arabic text to Arabic sign language. In this paper we present a proposed system for translating Arabic text to Arabic sign language in the jurisprudence of prayer domain. The proposed system will translate Arabic text to ArSL by applying ArSL translation rules as well as using a domain ontology.

Categories and Subject Descriptors
J.5 [Arts and Humanities]: language translation, linguistics.

General Terms
Performance, Design, Languages.

Keywords
Accessibility, Semantic translation, Arabic sign language, SignWriting, Rule-based approach, Ontology.

There are 17 million deaf people in the Arab world and 88,000 deaf people in Saudi Arabia alone [1]. Arabic Sign Language (ArSL) is the native language for many Arab deaf people. Also, deaf people face many difficulties when communicating with hearing people and in education, because there are limited resources of information written in their language. As a result, an automatic translation system from Arabic text to ArSL can help in making information and services accessible to the Arab deaf community. Previous work on translating Arabic text to ArSL is scarce, and most of it only translates words to signs, without taking into account the semantics of the translated sentence or the rules for translating Arabic text to Arabic sign language. To resolve this problem, we aim in this paper to enhance previous research in this field by adding an extra layer of semantics while translating Arabic text to ArSL; this solution is aided by the power of semantic web technologies. Our proposed semantic translation system is limited to the jurisprudence of prayer, because it is a small domain with a limited vocabulary and it is really needed by Arab deaf Muslims.

The main objectives of pursuing such a solution are: (1) to enhance Arabic text to Arabic sign language translation using the power of semantic web technologies (i.e. ontologies) and (2) to advance the research in the domain of automatic Arabic sign language translation.

The paper is organized as follows: in section 2 we present a brief background of SL, ArSL, notation systems and ontologies. In section 3 we discuss previous research in translating text to SL in different languages. In section 4 we present our proposed system architecture and evaluation criteria. Finally, in section 5 we conclude the paper with future work.

1. Sign language

In the last century, Sign Language (SL) has gained increased attention and universal recognition by many scientists in the field of language and computer sciences. This is because SL is
those used in spoken languages that occur simultaneously, however they are linear and sequential in spoken languages [2].

Sign language differs across regions and countries, but all sign languages agree on several basic parts of a sign, such as Manual Features (MFs) and Non-Manual Features (NMFs), and they also agree in defining the signing space. Manual features are signs performed by one or both hands using different shapes, locations, movements and orientations to represent meaning. NMFs are features that do not involve the hands and are used to give meaning and/or feeling, or to represent the morphological and syntactic markers of a sentence [4]. Movements of body parts such as the head and shoulders, eye movements, eyebrows, and facial expressions like puffed cheeks and mouth patterns are kinds of NMFs. The signing space is the private area surrounding the signer that is used to express signs. This area extends from the top of the head to waist level and outward by the length of the signer's arms [2].

2. Arabic sign language

Arabic sign language differs across Arab regions and countries, with many dialects. This difference makes communication between deaf people in different Arab countries difficult.

A need arose to unify Arabic sign language across all Arab countries. This drove the Council of Arab Ministers of Social Affairs (CAMSA) to decide to develop a unified Arabic sign language dictionary and publish it in all countries, in an attempt to give Arab deaf people a common language in addition to their local language [3]. This dictionary is mostly used in education and in common communication, such as by sign language interpreters on television.

Arabic sign language, like other known sign languages, depends on three basic factors to represent the manual features: hand shape, hand location and orientation. This is in addition to the non-manual features, which are related to the head, face, eyes, eyebrows, shoulders and facial expressions like puffed cheeks and mouth pattern movements. ArSL is limited to representing nouns, adjectives and verbs. Prepositions and adverbs are represented in the context of articulation by specifying locations, orientations and movement. Intensifiers are represented by iteration [5]. The forming and sequencing of signs in the articulation follow the Arabic sign language grammar and rules.

3. Transcription and notation systems

Sign language is represented visually and cannot be read like other written languages. There have been a few attempts to write sign language; however, all of these attempts are not usable because of their weakness. They also contain symbols that are difficult to understand and learn. We will give a brief overview of the ways to write SL in a notation system for the purpose of using it in machine translation.

Stokoe notation was developed in 1960 by William Stokoe. It is written using symbols similar in form to the symbols of the English alphabet [6]. Figure 1 shows an example of Stokoe notation for the word “story”: B means a flat hand shape, a means the palm facing up, z means side to side (both the right and left hands are side to side) and ~ means up and down.

Figure 1: Stokoe notation example for the word “story” [7]

HamNoSys was developed in 1989 by the Hamburg University Research Group. The root of this system is Stokoe notation, in addition to a set of parameters set at the end of the word representation, such as: shape, location, orientation, extended finger orientation and movement of both the dominant and the non-dominant hands, in addition to NMF representation [11]. Figure 2 shows an example of HamNoSys notation for the sentence “Oh! Look! There!”. Its symbols denote a hand shape with an extended index finger, a hand orientation away from the body, a palm facing down, and a hand location in front of the signer's neck.

Figure 2: HamNoSys notation example for the sentence “Oh! Look! There!” [9]

Gloss notation is a textual representation of sign language used for transcribing sign language video sequences [12].

SignWriting was developed by Valerie Sutton in 1974 [13]. The symbols used in this system are pictures that resemble the real forms. Figure 3 shows an example of SignWriting notation for the word “girl”. Its symbols denote the head of the signer, with the shaded parts denoting the signer's hair; a hand with the index finger pointing out, where the shaded square means that the back of the hand is facing outward; the points on the face that the signer must touch with the index finger; and a downward motion of the hand.

Figure 3: SignWriting notation example for the word “girl” [45]

                     Stokoe                 Gloss                  HamNoSys               SignWriting
Representation       Symbolic               Textual                Symbolic               Symbolic
Language dependency  language-dependent     language-dependent     language-independent   language-independent
Uses                 Intermediate           Intermediate           Intermediate           Intermediate or final
                     representation in the  representation in the  representation in the  representation
Usability by Deaf    Not practical          Not practical          Not practical          Practical
Way of Writing       Horizontally           Multi level            Horizontally           Vertically
                     (from left to right)                          (any order)            (from top to bottom)
Number of symbols    ~55 [7]                -                      ~210 [10]              ~639 [14]
NMF                  Not Supported          Supported              Supported              Strongly supported

As we can see from TABLE II, SignWriting is the best choice for our system: it is language independent; it contains a large number of basic symbols, which gives a chance to build a large number of final symbols; it has better support for NMFs; and it is understandable, practical and usable by deaf people in their daily life, for example in education, communication and reading.

4. Ontology

There are many definitions of ontologies; Studer et al. [22] defined ontologies as:

"an abstract model of some phenomenon in the world by having identified the relevant concepts of that phenomenon. Explicit means that the type of concepts used, and the constraints on their use are explicitly defined. Formal refers to the fact that the ontology should be machine readable. Shared reflects the notion that an ontology captures consensual knowledge, that is, it is not private of some individual, but accepted by a group"

Ontologies are important nowadays for sharing a common understanding of domain knowledge and for knowing how knowledge is structured and related. They also help in reusing these knowledge artefacts.

Ontologies can be classified according to [24] along two dimensions: (1) the amount and type of structure of the conceptualization and (2) the subject of the conceptualization. The first dimension is classified into three categories: (1) Terminological ontology, designed to represent terms that are used to represent knowledge in a certain domain, such as lexicons, (2) Information ontology, designed to record and structure the database of a certain domain, and (3) Knowledge modelling ontology, designed to specify the conceptualizations of the knowledge.

The second dimension is classified into four categories: (1) Application ontology, designed to model the knowledge required for a specific application, (2) Domain ontology, designed to represent knowledge relevant to a certain domain, (3) Generic ontology, designed to represent knowledge relevant to many domains, and (4) Representation ontology, designed to represent a framework with a neutral view with respect to world entities.

In our system we will represent the jurisprudence of prayer domain using a Domain ontology designed to represent knowledge relevant to this domain.

5. Sign Language Machine translation

Sign language machine translation follows two approaches: the Rule-based and the Data-driven approach. The Data-driven approach, also known as the corpus-based approach, can be divided into Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT) methodologies.

The Data-driven approach requires a prerequisite corpus to work on, and the accuracy and quality of the translation depend on the corpus size. On the other hand, the Rule-based approach, the second approach, is based on linguistic rules. It has two paths: the direct path and the indirect path. The direct path approach is used in bilingual
dictionaries that require translating a word to the corresponding word only, without any detailed analysis of the syntactic structure of the input text or any relation to the meaning of the words or the relationships between them. The indirect path approach is the most sophisticated and widely used approach in machine translation. This approach analyses the syntactic structure of the input text, creates an intermediate or abstract representation of it, and then generates the target language text from it; this means that we need to specify the word structure, sentence structure and semantic structure in successive processes [2].

According to the nature of the intermediate representation, the indirect approach can be divided into Transfer-based and Interlingua-based methodologies. The Transfer-based methodology is language-dependent: it needs to know the source and target languages. Its analysis of the source language sentence is shallow and works on the syntactic level. Interlingua-based analysis is a deeper analysis of the source language sentence, which creates structures of a more semantic nature. This structure can be transferred into a language-independent semantic representation that we can use to produce a translation in any target language. In our system we will follow the Rule-based approach for two reasons: (1) we do not have an Arabic corpus to work on and (2) no previous Arabic work followed this way.

Previous research in machine translation of written text to signed language follows two approaches, as mentioned in the previous section: the Rule-based and the Data-driven approach.

In our literature review we will focus on previous research that used the rule-based approach for translating text to SL. In fact, there are a number of successful rule-based systems that translate text to sign language. We can divide these works into three groups: International research, Arabic research and SignWriting research.

International research is any work carried out to convert non-Arabic text to sign language. TEAM [28] and eSIGN [18] (essential Sign Language Information on Government Networks) are two sample projects that translate English text to American SL. ViSiCAST [32] is another project that translates English text to British SL. Also, Zijl [33] developed a system to translate English text to South African SL. Baldassarri et al. [35] developed a system to translate Spanish text to Spanish SL. Dasgupta et al. [36] developed a system to translate English text to Indian sign language. Sarkar et al. [37] developed a system to translate Bangla text to Bangla SL. Jemni and Elghoul [38] developed a system to translate a given text to SL for multiple languages.

Arabic research on translating Arabic text to Arabic SL is rare. For instance, Mohandes [39] developed a system to translate Arabic text into Arabic SL. This system is one stage in the process of developing a system to translate Arabic speech to Arabic sign language. The system has a database that stores a dictionary of Arabic words with the corresponding signs and the file names of the sign representation videos. If the user enters a word that is available in the database, the recorded clip is shown; if the word is not included, finger spelling is done. Similarly, Tawassol [40] is another Arabic system for translating Arabic text to Arabic SL. The system is used as an educational tool. It contains a translator, a dictionary of Arabic words for a set of categories, and a finger spelling editor. The system uses the Buckwalter Arabic Morphological Analyzer to analyse the input text and Vcommunicator Gesture Builder 2.0 with the Sign Smith Studio program to generate the animation output.

As we can see from the previous work, the final output is either a video clip or an animated avatar; none have used SignWriting notation as an output. This does not mean that SignWriting is not usable. Actually, SignWriting is used in other applications, either as the final stage of the translation or as an intermediate stage. For instance, the JSPad system [41] is used to write Japanese sign language (JSL) using SignWriting. The system takes a Japanese text and splits it into signs; these signs are mapped to SignWriting symbols by referring to the JSL dictionary, and are then displayed on the screen so that users can edit the generated signs and add them to the dictionary. Likewise, Ahmed and Seong [43] developed a system for writing and reading text messages in signs as an alternative to SMS on mobile phones. The SignWriting notation system was used to convert text to sign messages and signs to text messages in two-way communication. Brito and Pereira [44] also proposed a model to support sign language content development and deployment in digital television scenarios by using SignWriting.

To further extend the research in SignWriting and Arabic text to ArSL translation, our proposed system will benefit from the two domains, as we will describe next.

PROPOSED SYSTEM

Given the previous work in the domains of text to SL translation and SignWriting notation, our proposed solution will enhance previous techniques used to translate Arabic text to ArSL by considering ArSL translation rules and using a domain ontology to produce SignWriting notation. The SignWriting will be used as the final output of the system or as an intermediate level for future avatar animation.

Next we will describe in detail the components of our proposed system.

1. Domain Ontology component description

The domain of jurisprudence of prayer will consist of a set of classes in a taxonomic (subclass) hierarchy, as follows:

• “دين” (religion) class is a super class of “دين إسلامي” (Islamic religion) class.
 “‫ ”دﯾﻦ إﺳﻼﻣﻲ‬class has three sub classes:”‫”ﻋﺒﺎدات‬,”‫ ”ﻣﻌﺎﻣﻼت‬and Morphological analysis: This process takes Arabic text as an
“‫”أﺧﻼق‬. input and sends each sentence to the Morphological Analysis and
 “‫ ”ﻋﺒﺎدات‬class has three sub classes:”‫”أرﻛﺎن‬,”‫”واﺟﺒﺎت‬ Disambiguation for Arabic (MADA) tool for Part of Speech
and”‫”ﻣﺴﻨﻮﻧﺎت‬. (POS) tagging. MADA returns a feature line for each word in the
 “‫”أرﻛﺎن‬ class has five sub classes:” ‫ﻧﻄﻖ‬ inputted sentence, feature line consist of a set of
<feature>:<value> pairs. Word features such as (Gender, Mood,
 “‫ ”اﻟﺼﻼة‬class has a set of sub classes:”‫”ﻧﺎﻓﻠﺔ‬,”‫ ”ﻓﺮض‬and” ‫أھﻞ‬
‫”اﻷﻋﺬار‬, etc. Case … etc), POS (Nouns, Verbs, Adjectives, Pronouns … etc)
 “‫ ”ﻧﺎﻓﻠﺔ‬class has a set of instances:”‫”اﻟﺴﻨﻦ اﻟﺮواﺗﺐ‬,”‫”اﻟﻮﺗﺮ‬ and proclitic (the word, question, conjunction, preposition... etc),
,”‫”اﻟﻜﺴﻮف‬,”‫”اﻟﺨﺴﻮف‬,”‫”اﻟﺠﻨﺎزة‬,”‫”اﻟﻀﺤﻰ‬, ”‫ ”اﻹﺳﺘﺨﺎرة‬and ” ‫ﺗﺤﯿﺔ‬ enclitics associated with (person, gender, number), the rest
‫”اﻟﻤﺴﺠﺪ‬. include the diacritic form (diac), the lexeme/lemma (lex), the
Buckwalter tag (bw) and the gloss (gloss) [12].
Also, there will be a set of properties for connecting classes and
instances with each other, this include: Grammatical transformation: The grammatical transformation
process takes the previous results as input and applies the Arabic
 Has. Sign Language rules on each word depending on its feature.
 Is-a.
 Is a kind of. Semantic translation: This process takes the result of the
 Is a synonym of. previous process and search for each word in the Domain
Ontology to get the word sign code. If the word does not have a
Figure 4 illustrates an example of an ontology component.
corresponding sign then replace this word by one of its synonyms
that have a sign in the SignWriting Database (DB). Then, replace
each sign code by the corresponding sign symbol stored in the
SignWriting DB. If the word does not have a corresponding sign
in the domain ontology, it will be finger spelled.


Based on our literature review, experts' evaluations have been

used widely to evaluate the translation result, e.g. [21],[22]. The
reason is that the translation can take different correct ways, only
the experts of the Arabic Sign Language can decide upon its
check their accuracy, (2) ask experts to translate a set of sentences
manually and compare their results to our system translation
Figure 4: Ontology component illustration

System architecture

The architecture of our system is illustrated in Figure 5. The

system is composed of a set of processes, namely: Morphological
analysis, Grammatical transformation and Semantic translation.

Figure 5: Proposed system architecture
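
To make the morphological analysis and semantic translation processes more concrete, the following is a minimal Python sketch under stated assumptions, not the actual implementation of the proposed system. It assumes a simplified feature line made of whitespace-separated <feature>:<value> pairs (the real MADA output format may differ), skips the grammatical transformation process entirely, and uses small in-memory dictionaries in place of the domain ontology and the SignWriting DB. All names (parse_feature_line, find_sign_code, SIGN_CODES, SYNONYMS, SIGNWRITING_DB), the transliterated toy words, the sign codes and the bracketed placeholder symbols are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: transliterated words, made-up sign codes and
# placeholder strings standing in for real SignWriting symbols.
SIGN_CODES: Dict[str, str] = {"salah": "S1", "fard": "S2"}   # ontology: word -> sign code
SYNONYMS: Dict[str, List[str]] = {"faridah": ["fard"]}       # ontology: "is a synonym of"
SIGNWRITING_DB: Dict[str, str] = {"S1": "[sign S1]", "S2": "[sign S2]"}


def parse_feature_line(line: str) -> Dict[str, str]:
    """Split a line of whitespace-separated <feature>:<value> pairs into a dict.

    Assumption: values contain no spaces; tokens without a colon are ignored.
    """
    features: Dict[str, str] = {}
    for token in line.split():
        name, sep, value = token.partition(":")
        if sep and name:
            features[name] = value
    return features


def find_sign_code(word: str) -> Optional[str]:
    """Return the sign code of the word, or of its first synonym that has one."""
    for candidate in [word] + SYNONYMS.get(word, []):
        code = SIGN_CODES.get(candidate)
        if code is not None and code in SIGNWRITING_DB:
            return code
    return None


def finger_spell(word: str) -> List[str]:
    """Fallback: emit one placeholder symbol per letter of the word."""
    return [f"[letter {ch}]" for ch in word]


def translate(feature_lines: List[str]) -> List[str]:
    """Map each analysed word (one feature line per word) to sign symbols."""
    output: List[str] = []
    for line in feature_lines:
        lemma = parse_feature_line(line).get("lex", "")
        code = find_sign_code(lemma)
        if code is not None:
            output.append(SIGNWRITING_DB[code])
        else:
            output.extend(finger_spell(lemma))
    return output


if __name__ == "__main__":
    lines = ["lex:salah pos:noun", "lex:faridah pos:noun", "lex:wudu pos:noun"]
    print(translate(lines))
```

In this sketch the first word is found directly, the second is resolved through a synonym, and the third has no sign and is finger spelled letter by letter. Looking up the lemma (lex) rather than the surface form is a design choice of the sketch; the proposed system may use other features returned by the morphological analysis.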

