Kumano 2002

Systems and Computers in Japan, Vol. 33, No. 8, 2002
Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J84-D-II, No. 6, June 2001, pp. 1175–1184

A Translation Aid System by Retrieving Bilingual News Database

Tadashi Kumano,1 Isao Goto,1 Hideki Tanaka,1 Noriyoshi Uratani,2 and Terumasa Ehara2
1 ATR Spoken Language Translation Research Laboratories, Kyoto, 619–0288 Japan
2 NHK Science and Technical Research Laboratories, Tokyo, 157–8510 Japan
SUMMARY

Machine translation technology is currently incapable of producing translations of the high quality required for purposes such as broadcast news. Such translations still require skilled human translators. We have developed a translation aid system to support translators in such tasks. The system retrieves news articles by answering user queries, and shows the entire article together with the corresponding translated article. The system does not require manual alignment of each sentence with its translation when storing articles in a database. Thus, it is capable of handling flexible translations. Moreover, the system helps users learn not just the translations for queried expressions, but also the facts described in the articles, which can aid in producing good translations. The results of a user inquiry demonstrated the validity of the system. © 2002 Wiley Periodicals, Inc. Syst Comp Jpn, 33(8): 19–29, 2002; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10072

Key words: translation aid system; translation memory; broadcast news; bilingual news database.

1. Overview

In recent years, machine translation software has attained a level suitable for practical applications and is being used to provide information in foreign languages for applications such as web page content or certain types of correspondence. However, skilled translators are still required for high-quality translations. The recent growth in international communications has increased the need for high-quality translations, and increased the pressure on human translators to produce better translations more quickly.

Kay [4] proposed the germinal concept of Machine-Aided Human Translations (MAHT), and various systems based on the idea have been proposed and developed [3, 10]. We developed a translation example retrieving system, one product of our research and development efforts to find ways to assist in translating Japanese broadcast news into English.

The system is a type of Translation Memory system. A Translation Memory system has a database that contains the source text and its corresponding translation in the target language. A user enters an expression to be translated, and the system searches its database and returns an expression identical or similar to the input paired with its translation. The user can translate by referring to this translation pair. Several systems that implement Translation Memory have been proposed thus far [1–3, 5, 11–14], and some commercial products are currently available.*

* For example, TRADOS Translator’s Workbench (TRADOS Corp.; http://www.trados.com/) or Transit (STAR; http://www.star-ag.chi/).

Most of these systems file and display sentences as translation-pair units. The translation documents will be registered in Translation Memory with the source and
target sentences aligned manually or automatically without considering their contexts. Some systems utilize translation-pair units that are larger than sentences. The translation retrieval system proposed by Kitamura and Yamamoto [5] stores articles that are assigned word and sentence level alignment automatically, but displays sentential units as search results. In addition to search results, the translation information management system of Ando’s group [1] can display original documents, because it registers reference information between sentences and documents. However, it still stores translation pairs in sentential units.

Our system stores articles, which are independent documents composed of several sentences. The system shows article pairs as translation units, because broadcast news translations are difficult to store in sentential units. Broadcast news is rarely translated literally; it is modified to suit the intended audience or news style of each language. Such highly “free translations” are difficult to store as translation units because it is too hard to align each sentence, even for human experts. It is not possible to determine whether one can reuse a registered translation without its context. In other words, sentential alignment, as used in other systems, is unsuitable for broadcast news translation. We resolved these difficulties by using article pairs as translation units. Our system stores aligned articles, which are complete documents composed of several sentences, and displays them to the user. We will show that the Translation Memory system can be effectively applied to broadcast news translation work.

Our system is useful for checking fixed translations such as proper names. In addition, it supports translators in making context-sensitive translations, a characteristic of broadcast news translation. Translators can use our system as a tool not only for checking past work, but also for retrieving articles when researching past events.

In a survey, users responded favorably to the concept of “showing whole articles.” Over 60% of the users pointed out that the system contributed to translation efficiency. Contributions to translation quality were also noted. Therefore, we can conclude that our system provides an effective aid for broadcast news translation work. The system database will be expanded through the automatic addition of new articles. Daily expansion will result in greater usefulness and stronger support.

In Section 2, we explain the characteristics of Japanese–English translations of broadcast news and detailed work procedures, and then discuss the system design policy. Section 3 describes the system framework and provides a specification of its components, and then introduces the current status. The validity of our system is discussed in light of the results of a user inquiry for system evaluation in Section 4. Section 5 summarizes the paper and discusses future plans.

2. System Design

In this section, we will explain the characteristics of broadcast news translations and the associated translation style, and then describe the system design policy for assisting broadcast news translation.

2.1. Broadcast news translation work

NHK (Japan Broadcasting Corporation) has a translation section that translates Japanese news into English. The English news thus produced is used for the supplementary sound channel of domestic TV programs and international broadcasting.* Approximately 90 English articles are released daily. About 20 translators work simultaneously at the busiest time before the prime time news.

* NHK homepage (http://www.nhk.or.jp/index-e.html).

The source texts, Japanese news articles, are complete documents composed of about five sentences. The average sentence contains 90 letters [6]. For the subchannel of domestic news, a Japanese article is translated into English without changing its length. Several Japanese articles might be summarized in an English translation for international news; contents may be added or deleted as appropriate for audiences outside Japan.

English news articles are produced as follows. First, “Japanese writers” write English articles based on Japanese articles. Next, “English news rewriters” check the quality of the English. In the final step, “editors” check the content.

Working translators refer to various materials in addition to dictionaries and reference books, including past translations, personal memorandums, electronically shared translation memorandums, newspapers, journals, web pages, and so on. The knowledge sought by translators is classified as follows:

• Linguistic information: the English (Japanese) equivalent for an expression
  – Terms for proper names or fixed expressions
  – A variety of translations, depending on context
• Historical knowledge required to understand and translate the source article
  – The details of the topic event, its historical background, and past translations of these events

For a broadcast news assisting system, the most important design factor is fast turnaround. Broadcast news requires speedy translations. New and important articles to
be translated might arrive just as programs are about to air. While translation quality is important, speed has first priority.

2.2. Translation style of broadcast news

For technical document translations, a common area for professional translators, technical terms or translation styles must be unified within a document or series of documents. The source texts are written in plain style, contain many fixed expressions and few rhetorical passages, and are translated literally. These characteristics make such translation tasks more suitable for the successful application of Translation Memory systems. The source document and its translation can be aligned (semi)automatically, and many useful translation examples can be archived. Translation Memory is very effective with constant terminologies or translation styles in circumstances where several translators work together.

On the other hand, literal translations are not always adequate for literary texts. The source texts often contain words or phrases that are difficult to translate into the target language. Rather than a constant terminology or translation style, varied wording is preferred. Literary translations require a tool for finding a variety of translations or a new relation between matters, rather than one based on translation examples.†

† Yanase, the literary translator, referred to the effectiveness of using electronic dictionaries for obtaining ideas in his essay [16].

Broadcast news translations share characteristics of both of the above types of translation. This must be taken into consideration when selecting a method for providing translation support. Appendix A gives an example of a Japanese news article and its English translation. The characteristics of broadcast news translation will be enumerated in light of the translation pair given in Appendix A.

• Changing content or composition according to audience:
  Contents may be added or deleted, as appropriate for specific audiences. The Japanese article in Appendix A reports the earthquake magnitudes and risk of seismic sea waves. The English article says nothing about them, but gives the location of the Izu Islands for a non-Japanese audience.
• Constant terminology for proper nouns, the most common type of fixed expressions:
  Proper names must always be translated into the same unchanging expressions. There are also fixed expressions that are specific to a domain, such as economics. The Japanese word “Kishocho” must always be matched with the English equivalent “The Meteorological Agency.”
• Appropriate style for English news:
  The English news will be broadcast not as a translation of the Japanese news, but as an independent broadcast. The English sentences must be natural and fit the particular news style.

2.3. Problems and system requirement

Existing Translation Memory systems store small translation units, such as sentences, with alignment information. The alignment between source and target texts requires some human input.* Manual alignment has the following problems:

* This does not mean that existing TM systems have no function to assist the work. Some commercial systems provide advanced supporting functions as their selling points.

1. The alignment represents additional work, imposing a heavy burden in busy work environments with a shortage of workers, or in environments without permanent staff who would themselves benefit from enriching the database.
2. It is very difficult or inefficient to align source texts and “free translations” having differing content or information structures, because the correct manual alignment takes too much time to determine. Moreover, not all sentences can be matched with a corresponding sentence. If a source sentence does not have a corresponding translation, it is not registered into the database. It is also difficult to determine whether registered sentences can be reused without context.

For the above reasons, we concluded that the existing Translation Memory method is not suitable for use with broadcast news translation work.

Given the characteristics of broadcast news translations and the translation style described in the preceding section, we consider the following requirements for such a system:

• No additional work is required for translation data registration.
• Users can obtain contextual information for the whole article and a variety of translations useful for “free translations.”
• Users can obtain both linguistic and historical information on an expression.
• Quick response, easy to use, and no constraints on query input.

2.4. Storing/displaying articles as alignment units

Based on the system requirements given in the previous section, we propose a new framework that employs
articles as a translation pair unit for storing and displaying results. The system stores the source article and its corresponding article without sentential alignment. When a user inputs a query expression, the system will show the whole article, including the expression and the corresponding article as they are. Users can easily find the information they need, since most articles are composed of five or six sentences. In addition, the users of such a system, who will be professional translators, will have a good grasp of both the source and target languages.

Making articles the units of such a system has the following advantages:

• Involves no manual sentential alignment. Since each article is given its own article number and corresponding article information when it is created, data registration occurs automatically.
• Consulting translations with full context. The user can examine the search expression and its translation in their original contexts, which helps to determine whether the search result should be reused or not. The system is useful in obtaining knowledge for highly “free translations.”
• Obtaining historical background. The system functions as a tool for surveying past events (and their translations) because it shows the user whole articles. Users are simply required to input an expression to obtain a translation or to track down related facts. The user concurrently obtains historical information by entering a query for translation.

2.5. System design policy

Our system design policy consisted of the following four points:

1. Storing and showing articles as translation pair units.
2. Quick similar expression retrieval. Implementing a function for quick similar expression retrieval for browsing many translation examples in less time. Accepting every type of string, from a word to a sentence, as query inputs. Employing a simple string-match-based method without concept extension for quick response and accuracy, enabling detection of subtle differences among proper names.
3. Accepts both Japanese and English as query inputs. Accepting both the source and target languages as query inputs enables flexible response to search requests.
4. Uses a Web browser as a user interface. Web browsers can be used as a shared interface from various work environments and computer platforms, while the familiar interface and commands provide easy access to the system for users.

3. The System Framework and Working Status

3.1. The system framework and specification of its components

The architecture of our system is shown in Fig. 1. The process of retrieval using the system is described as follows.

1. When a query expression is input into the query input page, the query expression is passed from the client to the search server. The search server executes an exact match, and the client is shown the search result, if a result is returned.
2. If there is no article identical to the query expression, or if a user presses the “relax” button, the client sends the input to the keyword extract server. The client obtains keywords and the distance in character length between every adjoining keyword from the keyword extract server, and passes them to the search server to execute a similar expression search.
3. As a search result, the client shows the title list of articles in which keywords appear most frequently in the same order as the input expression. The retrieved articles are listed by degree of similarity. When the user selects a title from the list, a bilingual article pair is shown with the part identical or similar to the input highlighted on the input language side.
4. The user can view articles in which fewer keywords appear than the current search result by pressing the “relax” button for a search request. The user can return to the previous result by pressing the “tighten” button.

3.1.1. Search server

The search server receives an article retrieve request from the client over the network. It executes an “exact match search” when given the query input as the search request. It executes a “similar expression search” when given the keyword sequences and the distance between every adjoining keyword, information obtained by passing the input expression to the keyword extract server. Search options can force the system to ignore the order of keywords or the distance between keywords. When several expressions are combined as an input, their order and distances between expressions are disregarded. See Ref. 15 for an in-depth discussion of the search algorithm.

The search server was implemented while keeping in mind the following points:
• Employment of a search method having less index information:
  The search method using suffix array index employed by Tanaka and colleagues [15] is fast, especially for longer strings, but is not suitable for a practical system because the index file is quite large and takes too much time to update. Instead of this method, we employed a search method using simple character (for Japanese) and word (for English) index, to reduce the size of the index file to about one-sixth. The increase in retrieval time caused by changing the search method is relatively unimportant, since a similar expression search is a combination of searches for extracted keywords, and these keywords are rarely long.
• Multithreaded search process implementation:
  The search program was designed to be multithreaded to permit multiple concurrent subsearches of keywords extracted from query expressions in parallel threads. The multithreaded processes enable hosts in multiprocessor environments to make several keyword searches simultaneously, thereby decreasing processing time. Keyword search threads invoked by independent search requests run in parallel. They are scheduled to give precedence during execution to threads that have longer keywords. In general, longer keywords tend to occur less often in the database, and are therefore processed more quickly. Giving precedence to threads that are likely to finish quickly has the effect of preventing time-consuming keyword searches from interrupting other search requests, even in single-processor environments.
• Caching past search results:
  Each of the keyword search results can be reused in other searches unless the database is updated. For reuse, the system caches results as far as memory permits. Caching significantly improves system response when search requests involving the same keyword are repeatedly submitted by several users translating articles on similar topics simultaneously.

The search server is implemented in C++ and runs under Solaris 2.6 or later environments.

Fig. 1. The system framework.

3.1.2. Keyword extract server

The keyword extract server receives an input expression in Japanese or English from the client over a network. It extracts content words as keywords from the input after morphological analysis and returns keywords with distance information between keywords (see Ref. 15). For morphological analysis, the server uses a morphological analysis program for Japanese developed by NHK for broadcast news articles, and the simple POS tagger for English created for the system.

3.1.3. Client

Mediating between the servers and the user, the client is implemented as Perl scripts (using mod_perl) running on
the Web server (Apache). The client consists of the query input page, in which users input query expressions or specify retrieval options, and the result-browsing page (Fig. 2). The query input page can be selected from the following three types according to the query type and user skill:

• Simple search page:
  All retrieval options are set to the standard and hidden from users.
• Normal search page:
  Users set their own options.
• Advanced search page:
  Users can extract keywords from input expressions, and edit the keywords for retrieval.

Fig. 2. The user interface of the system.

3.2. Working status on the site at which broadcast news programs are made

The system began providing full-scale 24-hour service at the NHK International Broadcasting English Center in July 2000. The system runs on a Sun Ultra 10S (UltraSPARC-IIi 333 MHz; memory 1 GB). About 15 users use the retrieval function simultaneously at the busiest times, while creating the supplementary sound channel for the TV news program at 7 p.m.

The system contained a database composed of about 320,000 articles in Japanese and 70,000 articles in English (about 40,000 articles are matched to a corresponding article in the opposite language) when the full-scale service was started. After the system began running, English articles were registered into the database with Japanese articles every day in the middle of the night. About 300 articles in Japanese and 90 articles in English (with about 30 being paired articles) are added daily.

4. User Evaluation and Discussion

We carried out a user inquiry to evaluate the system contribution to broadcast news translation work. The inquiry was carried out for about 2 months after the system began providing full-scale service, and 41 people responded. We will discuss the usefulness of the system based on the results of the inquiry (see Appendix B).

4.1. System contribution to translation work

Before system introduction, users were able to retrieve past articles using their creation date as a key, but could not use expressions or even words to retrieve information. We asked users to describe how doing translations had changed after the system introduction.

4.1.1. Changes in translation speed and quality

To the question, “Did the time required for translation change when you began using the system?” (Question 2), over 60% of the users stated that the system improved the speed of translation, and that the system was helpful in translation by providing information about both past translations and past events.

To another question, “Did your translation quality change as a result of using the system?” (Question 3), over 30% of the users responded that the system improved translation quality, by making it easy to look up translations of proper names and to obtain knowledge of past events required to perform the translation. A user pointed out that the faster searches enabled by the system encouraged frequent searches, even when time was limited, thereby improving translation quality.

We can conclude from the above responses that the system introduction created mutually reinforcing effects on translation speed and quality.
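For reference, the similar expression search and the “relax” behavior described in Section 3.1 can be illustrated with a minimal Python sketch. It is not the algorithm of Ref. 15: it assumes a simplified ranking rule (the number of query keywords found in the article in query order), ignores the keyword distance information, and uses short ASCII strings as stand-ins for Japanese source text. All names (ARTICLES, keyword_positions, in_order_matches, similar_search) and the sample articles are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy database: article id -> (source-side text, its translation).
# Whole article pairs are stored; no sentence alignment is recorded.
ARTICLES = {
    1: ("strong earthquake observed in izu islands this morning",
        "A strong quake hit the Izu Islands."),
    2: ("meteorological agency warns of earthquake in izu islands",
        "The Meteorological Agency issued a warning."),
    3: ("izu islands ferry service resumes",
        "Ferry service to the Izu Islands resumed."),
}


@lru_cache(maxsize=1024)
def keyword_positions(keyword):
    """Return {article_id: start offsets} for one keyword.

    Results are cached so that repeated searches for the same keyword are
    cheap; the cache is only valid until the database is updated.
    """
    hits = {}
    for art_id, (source, _translation) in ARTICLES.items():
        offsets, start = [], source.find(keyword)
        while start != -1:
            offsets.append(start)
            start = source.find(keyword, start + 1)
        if offsets:
            hits[art_id] = tuple(offsets)
    return hits


def in_order_matches(keywords, art_id):
    """Largest number of query keywords found in the article in query order."""
    # best[k] = smallest offset at which a chain of k matched keywords can end.
    best = [-1]
    for kw in keywords:
        offsets = keyword_positions(kw).get(art_id, ())
        # Visit longer chains first so one keyword extends each chain only once.
        for k in range(len(best) - 1, -1, -1):
            nxt = next((o for o in offsets if o > best[k]), None)
            if nxt is None:
                continue
            if k + 1 == len(best):
                best.append(nxt)
            else:
                best[k + 1] = min(best[k + 1], nxt)
    return len(best) - 1


def similar_search(keywords, min_matched):
    """Return (article_id, matched_count) pairs, best match first."""
    scored = [(a, in_order_matches(keywords, a)) for a in ARTICLES]
    kept = [(a, n) for a, n in scored if n >= min_matched]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


query = ("earthquake", "izu islands")
strict = similar_search(query, min_matched=2)   # all keywords, in order
relaxed = similar_search(query, min_matched=1)  # after pressing "relax"
```

In this sketch, pressing “relax” corresponds to lowering min_matched by one and “tighten” to raising it again; calling keyword_positions.cache_clear() after a database update would discard the cached keyword results.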
4.1.2. Changes in translation work procedure

To the question, “Did using the system change your translation procedures?” (Question 4), about 80% of the users replied that they had not changed their translation procedures, but some favorable responses indicated that the system had made it easier to perform research when preparing, or when translating.

To the question, “Did the system change the frequency at which you added words to your own memorandum or shared term list?” (Question 5), half of the users surveyed responded that such activity clearly decreased. These responses indicate that a system that enables users to retrieve past translations easily can serve as an alternative to storing translations manually for reuse, which is time-consuming.

The system can thus be considered to contribute not just to translation speed and quality, but also to reductions in translation-related tasks, such as researching material and storing translation knowledge.

4.2. Considerations related to system design policy

Our system shows retrieved articles and their translations as whole articles. While parts identical or similar to the input are highlighted, the corresponding translation is not marked in the article. These system features were intended to achieve two objectives. One was to satisfy search requests for both past translations and historical information; the other was to enable users to survey translation pairs with context where alignment in sentential units or some other unit is impossible due to the freeform nature of the translation required. The questions were drawn up to determine to what extent this method of displaying search results achieved the objectives and met the users’ demand.

4.2.1. Purpose of searches

To the question, “What is your primary reason for performing searches?” (Question 6), about 70% of the users replied that they use the system to obtain both past translations and historical information. To Question 4, one response was that the user came to rely on the system in place of other materials. These responses show that the system functions as a research method for surveying past events, replacing other research methods.

Furthermore, answers to Question 6 reveal that over 50% of the users, who use the system for the purpose of learning both past translations and past facts, are not conscious of a difference between the purposes when retrieving. This suggests that such users will lose the chance to obtain information from the whole articles if the system shows just the identical or similar part to the input and the corresponding part of its translation. The responses above indicate that showing the whole articles functions effectively because the system can be used for researching both past translations and past facts.

4.2.2. Effect of showing whole articles

To the question, “Which do you prefer: showing the whole articles and the equivalent translated articles, or showing only identical or similar expressions to the input and the matching translations (if this is possible)?” (Question 8), about 70% of the users selected “showing the whole article,” the implementation of our system. Although many users sometimes experience trouble, as shown in the replies to another question, “Do you have any trouble finding the information you want from a search result translation?” (Question 7), some comments in response to Question 8 suggest that it is difficult to determine whether the search result can be reused for translation without referring to context, or that looking over the whole articles helps translations. Thus, in general, users thought highly of showing the whole articles.

4.3. Adequacy of search algorithm

Our system employed the similar expression search algorithm [15], which can handle long expressions, to allow input to range from a word to a sentence without restrictions. We asked users what element they used as input. To the question, “What element do you use as input?” (Question 9), about 40% of the users answered that they used longer expressions, such as phrases, clauses, and sentences, rather than words. Hence, a search algorithm capable of flexibly executing similar expression search against longer expressions is useful.

5. Conclusion

In this paper, we proposed a translation example retrieving system for assisting broadcast news translation work and described its implementation. Given the characteristics of broadcast news translations, which are typically free translations involving changes in content or structure, we based our system design policy on the concept of “showing whole articles.” We constructed a mechanism to enable provision of both a translation of an expression and historical information, required for translation work, in a single step. Our survey of users confirmed the system contribution to translation speed and quality, and resulted in favorable comments on the system from the users.

Some future problems are the following.

• Many users sometimes have trouble finding information from the translation part of the search
result (user inquiry [Question 7]). Some Japa- Annual Meeting of the Information Processing Society
nese–English alignment indication will be useful of Japan, Osaka, 1996;II:385–386. (in Japanese)
for such cases. We have been working on calculat- 6. Kumano T, Tanaka H, Kim Y, Uratani N. Primary
ing a useful alignment relation fully automatically investigation on NHK’s Japanese and English news
between bilingual documents containing highly database. Proc 2nd Annual Meeting of the Associa-
freeform translations such as news translations [7, tion for Natural Language Processing, Tokyo, 1996,
8], and will continue to work to find useful ways p 41–44. (in Japanese)
to present alignment information. 7. Kumano T, Tanaka H, Ehara T. Statistical alignment
• Because the system offered users an easy way to between Japanese and English news elements. Proc
refer to past translations, some users ended up 53rd Annual Meeting of the Information Processing
relying only on the search results of the system, Society of Japan, Osaka, 1996;II:53–54. (in Japanese)
instead of referring to other materials (inquiry [Question 4]). However, past translations are sometimes wrong or inappropriate, and the possibility of repeating past mistranslations has been pointed out (inquiry [Question 3]). To resolve this problem, we proposed an integrated translation-assisting environment, “Translators’ Workbench” [9], which seeks to realize an advanced support system. This environment will have an interface that enables users to access not only past translations, but also other materials, such as dictionaries, reference volumes, and Web pages in a unified manner and allows one-step comparison of these various information sources. The validity of the method will be discussed in another paper.

Acknowledgments. We express our gratitude to Seiichi Yamamoto, director of ATR SLT labs, and Hideki Kashioka, senior researcher, for providing the opportunity to write this paper. We also thank Shin’yo Matsuda and Taizan Suzuki for their help in developing the system.

REFERENCES

1. Ando S, Sato K, Okumura A. The development of multilingual translation know-how sharing system. Proc 5th Annual Meeting of the Association for Natural Language Processing, Tokyo, 2000, p 25–28. (in Japanese)
2. Hutchins J. The state of machine translation in Europe. Proc 2nd Conference of the Association for Machine Translation in the Americas, Montreal, 1996, p 198–205.
3. Isabelle P, Church KW (editors). Special issue on: New tools for human translators. Machine Translation 1997;12.
4. Kay M. The proper place of men and machine in language translation. Working Paper CSL-80-11, Xerox Palo Alto Research Center, 1980. (Reprinted in Machine Translation 1997;12:3–34.)
5. Kitamura M, Yamamoto H. Translation retrieval system using alignment data from parallel texts. Proc 53rd
8. Kumano T, Tanaka H, Uratani N, Ehara T. Translation examples browser: Japanese to English translation aid for news articles. Proc Natural Language Processing and Industrial Applications, Moncton, Canada, 1998;I:96–102.
9. Kumano T, Goto I, Uratani N, Ehara T. Translators’ Workbench: An integrated translation-aiding environment using bi-text editor as a mediator between users and computer resources. Proc 6th Annual Meeting of the Association for Natural Language Processing, Ishikawa, 2000, p 143–146. (in Japanese)
10. Melby A. On human–machine interaction in translation. In Nirenburg S (editor). Machine Translation: Theoretical and methodological issues. Cambridge University Press; 1987. p 145–154.
11. Nakamura M. Translation support by retrieving bilingual texts. Proc 38th Annual Meeting of the Information Processing Society of Japan, Tokyo, 1989;I:357–358. (in Japanese)
12. Sato S. CTM: an example-based translation aid system. Proc 14th Int Conference on COLING, Nantes 1992;4:1259–1263.
13. Sumita E, Tsutsumi Y. A translation aid system using flexible text retrieval based on syntax-matching. TRL Research Report TR87-1019, IBM Tokyo Research Laboratory, 1988.
14. Takeda A, Furugori T. A sample-based system for helping Japanese write English sentences. J Inf Process Soc Japan 1994;35:53–61. (in Japanese)
15. Tanaka H, Kumano T, Uratani N, Ehara T. An efficient way of gauging similarity between long Japanese news expressions. J Nat Language Process 1999;6:96–116. (in Japanese)
16. Yanase N. Dictionaries are Joyce-ful. TBS-Britannica; 1994. (in Japanese)

APPENDIX

A. Sample of a Japanese News Article and Its Translation into English

Literal translation into English:
1. There was a strong earthquake at 6:42 this morning in Izu Islands, the site of recent numerous earthquakes. An earthquake of a little less than five in seismic intensity was observed at Shikine Island.
2. In addition, an event of seismic intensity four was observed for Niijima and Kozu Island, events of seismic intensity three for Toshima Island and Miyake Island, and events of seismic intensity two and one for various parts of Kanto Area and Shizuoka Prefecture.
3. There is no risk of tsunamis resulting from this earthquake.
4. According to observations by Kishocho (the Meteorological Agency), the earthquake epicenter was located in the sea at a depth of ten kilometers near Niijima and Kozu Island. The magnitude of the earthquakes was estimated to be five point one.
5. In Izu Islands, where seismic activity has been observed from the end of June, repeated cycles of seismic activity and dormancy have been observed. On the 30th of the previous month, a single strong earthquake having seismic intensity of a little less than six was observed at Miyake Island, while two earthquakes having seismic intensity of five were also observed there.
6. In a series of seismic events, seventeen earthquakes having seismic intensity over five have been observed up to this point, including strong tremors with a seismic intensity of a little less than six observed four times at Kozu Island, Niijima, and Miyake Island.

Final version of translation into English:

1. A strong earthquake jolted Shikine Island, one of the Izu Islands south of Tokyo, early on Thursday morning.
2. The Meteorological Agency says the quake measured five-minus on the Japanese scale of seven.
3. The quake affected other islands nearby.
4. Seismic activity began in the area in late July, and 17 quakes of similar or stronger intensity have occurred.
5. Officials are warning of more similar or stronger earthquakes around Niijima and Kozu Island.
6. Tokyo police say there have been no reports of damage from the latest quake.

B. User Inquiry and Its Response

Each question from the user inquiry, the choices, respondent number, and typical comments are shown below. The answers in parentheses are responses other than the choices given.

• Respondents’ professions (duplication included):
Japanese writer ... 32
Editor ... 7
Other (announcer, simultaneous translator, etc.) ... 7
Total ... 41

[Question 1] Do you frequently use the system?
Frequently use ... 29
Occasionally use ... 11
Not use ... 1
  – It is troublesome to find information I need from the search results.
  – I don’t know how to copy and paste.

[Question 2] Did the time required for translation change when you began using the system?
Decreased ... 24
  – It made it more convenient to look up proper names.
  – It provided speedy reference for background facts.
  – It allowed me to consult past translations for fixed expressions.
No change ... 11
  – I expect to take less time once I have become familiar with the system.
Increased ... 2
  – I can check translations carefully using the system, but I find I tend to spend more time on such research.
  – It takes time to find the appropriate search results.

[Question 3] Did your translation quality change as a result of using the system?
It got better ... 12
  – It allowed me to make translations based on information on past events.
  – It made it easy to look up words and phrases, such as proper names.
  – It allowed me to eliminate instances where I used a common noun instead of a proper name due to time limitations.
  – It made it possible to find words for difficult technical terms from past translations.
No change ... 24
  – Translation quality depends on a translator’s skill. The quality of past translations varies, so translators need to treat them as supplementary materials.
It got worse ... 0
(Both sides) ... 1
  – Without careful research, it sometimes leads to perpetuating past mistranslations.

[Question 4] Did using the system change your translation procedures?
Yes ... 8
  – I got into the habit of researching difficult parts before making the translation.
  – I looked up reference information more often while making translations.
  – I began using only the system, in place of other materials.
No ... 29

[Question 5] Did using the system change the frequency at which you added words to your own memorandum or shared term lists?
No change, or increased ... 16
Obviously decreased ... 16

[Question 6] What is your primary reason for performing searches?
To obtain a translation of the input expression ... 8
To find a whole article related to the input expression and its translation ... 4
Both of the above ... 27
  ⇒ Do you make a distinction between these purposes?
  Yes ... 11
  No ... 15

[Question 7] Do you have any trouble finding the information you want from a search result translation?
No ... 14
  – It’s easy to find, because the articles are relatively short.
Sometimes ... 24
Yes ... 1
  – The equivalent expression in the translation article should be highlighted, too.

[Question 8] Which do you prefer: showing the whole articles and the equivalent translated articles, or showing only identical or similar expressions to the input and the matching translations (if this is possible)?
Showing whole articles ... 26
  – The context surrounding an expression is important.
  – It is difficult to select a translation when looking only at the expression itself, without context.
  – The articles are relatively short, so it is better to look over the whole articles.
Showing identical or similar expressions to the input and their translations ... 5
  – I want to save time in finding corresponding translations.
(Select result display style) ... 7
  – It is better to select a display style that can be adjusted depending on time limitations.

[Question 9] What element do you use as input (duplication included)?
Words ... 38
Phrases or clauses ... 24
Sentences or longer ... 6

[Question 10] What proportion of the time do you use Japanese as input?
–30% ... 5
40%–60% ... 9
70%– ... 25
  – I input Japanese when looking up translations for expressions.
  – When I am researching historical information, I use both Japanese and English for input.

[Question 11] How do you input a query expression?
Input directly from the keyboard ... 39
Copy and paste expression from the source Japanese article ... 1
AUTHORS (from left to right)
Tadashi Kumano received his M.E. degree in computer science from Tokyo Institute of Technology in 1995. He was a
researcher at NHK Science and Technical Research Laboratories from 1995 to 2000, and has been a researcher at ATR Spoken
Language Translation Research Laboratories since 2000. His research interests include natural language processing (translation
aid, machine translation, and natural language generation), information retrieval, and artificial intelligence. He is a member of
IEICE, the Information Processing Society of Japan, the Japanese Society of Artificial Intelligence, and the Association for
Natural Language Processing.
Isao Goto received his M.E. degree in electrical engineering from Waseda University in 1997. He then joined NHK (Japan
Broadcasting Corporation) and has been with NHK Science and Technical Research Laboratories since 1999. He is engaged in
research on natural language processing, information retrieval, machine translation, and language corpus. He is a member of
the Association for Natural Language Processing and the Information Processing Society of Japan.
Hideki Tanaka received his B.E., M.E., and Ph.D. degrees from Kyusyu University in 1982, 1984, and 1995, respectively. He joined
NHK (Japan Broadcasting Corporation) in 1984 and has worked with the Science and Technical Research Laboratories since
1987. He has been engaged in research on machine translation, information retrieval, and speech recognition. He moved to
ATR (Advanced Telecommunications Research Institute International) Spoken Language Translation Research Laboratories in
2000 and is currently a department head. He is a member of the Association for Natural Language Processing, the Information
Processing Society of Japan, and the Association for Computational Linguistics.
Noriyoshi Uratani received his M.E. degree in electrical engineering from the University of Tokyo in 1975. He then
joined NHK (Japan Broadcasting Corporation) and has been with NHK Science and Technical Research Laboratories since
1979. He received his Ph.D. degree in computer science from the University of Tokyo in 1997. He is currently engaged in
research on natural language processing, human interfaces, and information retrieval. He is a member of IEICE, the Association
for Natural Language Processing, and the Information Processing Society of Japan.
Terumasa Ehara received his B.E. degree in electrical engineering from Waseda University in 1967. He joined NHK
(Japan Broadcasting Corporation) and has been with NHK Science and Technical Research Laboratories since 1970. He received
his Ph.D. degree in computer science from Tokyo Institute of Technology in 1997. He is currently engaged in research on natural
language processing, machine translation, and language corpus. He is a member of IEICE, the Association for Natural Language
Processing, and the Information Processing Society of Japan.
© 2002 Wiley Periodicals, Inc.
