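The rise of artificial intelligence has brought the promise of change to many fields, machine translation among them. In academic research, reading large volumes of English literature and writing papers in English are unavoidable, and English is a major obstacle for Chinese students and teachers. Academic English also differs from everyday English: its dense specialist vocabulary and terminology cost us a great deal of effort. So can today's AI-based machine translation engines be used for academic work? I ran a simple test.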
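Test subjects and method
The test covers machine translation products from several major IT vendors in China and abroad: Baidu, Tencent, Youdao, and Sogou from China; Microsoft, Google, and IBM from abroad; and TransGod, a Chinese startup. All of them claim that their translation engines use artificial intelligence.
In this test, Baidu, Youdao, Sogou, Google, and Microsoft were called directly through their cloud service APIs, while Tencent, TransGod, and IBM were used through their browser interfaces. The results are for reference only.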
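Quality comparison: English to Chinese
We use a sentence from an academic paper published in Nature this year. The source text and the translations are shown in the table below: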
Tool | Translation |
---|---|
Source | First, Voronoi tessellations of these surfaces show that disclinations (particles with topological charge equal to 6 less their coordination number) densely and uniformly cover the liquid-like sample, but are much rarer in the more ordered sample and are clustered in scars (middle panel of Fig. a). |
Sogou Shenzhi (搜狗深智) | 首先,这些表面的Voronoi镶嵌显示出错向(拓扑电荷等于6减去它们的配位数的粒子)密集且均匀地覆盖液体状样品,但是在更有序的样品中更罕见,并且聚集在疤痕中(图a的中间面板)。 |
Microsoft DNN | 首先, 这些表面的Voronoi 方格花纹表明, disclinations (与拓扑电荷相等于6的粒子的协调数) 密度和均匀地覆盖液体样的样品, 但在更有序的样品中更少见, 并且是在疤痕中聚集(图a 的中间板)。 |
Baidu Translate (百度翻译) | 首先,这些表面的Voronoi镶嵌显示偏转(具有拓扑电荷的粒子等于它们的配位数的6)密集且均匀地覆盖液体样,但是在更有序的样品中更稀少,并且聚集在疤痕中(图A的中间面板)。 |
Google Neural Translation | 首先,这些表面的Voronoi镶嵌表明,向量(具有拓扑电荷等于其配位数减去6的颗粒)密集均匀地覆盖液体状样品,但在更有序的样品中更为罕见,并且聚集在疤痕中(中间面板)图a)。 |
Youdao Zhiyun (有道智云) | 首先,这些表面的Voronoi tessellations表明,不倾斜(带有拓扑电荷的粒子比它们的配位数少6)密集且均匀地覆盖在类液样品上,但在有序的样品中更少见,并聚集在疤痕(图中面板)。一个)。 |
IBM Watson | 首先,这些表面的Voronoi睾丸显示,具有拓扑电荷的粒子(具有等于6的拓扑电荷的粒子的配位数小于其配位数) ,均匀地覆盖液体样样本,但在更有序的样本中是更罕见的,并且是在图的中间板上的。 a)。 |
Tencent Fanyijun (腾讯翻译君) | 首先,这些曲面上的voronoi图表明,具有拓扑电荷等于6的粒子,其配位数小于6配位数,且均匀覆盖液体样样品,但在较有序样品中较少见,并聚集在瘢痕(中的)中。 |
TransGod Atman engine | 首先,这些表面的Voronoi图样显示,这种溶解(具有拓扑电荷的粒子等于6少,它们的协调数量)密集且均匀覆盖液态样的样品,但在更有序的样本中却非常棒,聚集在扇形(图的中间板)。 答)。 |
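For a machine, this English sentence poses several challenges.
- Terminology: Voronoi tessellations and disclinations.
- Structure: the sentence has several parts and contains two parenthetical notes.
- Grammar: in "topological charge equal to 6 less their coordination number", what exactly does "less" mean?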
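Of the tools above, the output from Tencent and TransGod is clearly poor, so I will not discuss it here. Let us look at the others.
First, Voronoi tessellations should be rendered as Voronoi镶嵌, a particular kind of tessellation. On this term Youdao gave up outright and left it untranslated, Microsoft's rendering (方格花纹, a checkered pattern) is inaccurate, and IBM confused tessellations with testis (睾丸), a laughable mistake. The second term, disclination, means 旋错 or 向错. IBM and Microsoft skipped the word, and the renderings from Youdao, Google, and Baidu are all inaccurate; by comparison, Sogou's (错向) comes closest to the actual meaning. A further term, topological charge, should be 拓扑荷, but every engine produced 拓扑电荷 (literally "topological electric charge"). The difference is a single character, yet it can easily mislead readers unfamiliar with the field. Terminology shows the level of each tool, and it is exactly what makes academic text hard to translate. On this point Sogou did best.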
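The terminology comparison above was done by reading each output, but a rough first pass can be automated. The following is a minimal sketch, not the method used in this test: it assumes a small glossary built from the preferred renderings named above and looks for them as exact substrings in the opening clause of four outputs copied from the table. The names `GLOSSARY`, `OUTPUTS`, and `matched_terms` are made up for illustration.

```python
# Preferred renderings, taken from the discussion above.
GLOSSARY = {
    "Voronoi tessellations": ["Voronoi镶嵌"],
    "disclinations": ["旋错", "向错"],
    "topological charge": ["拓扑荷"],
}

# Opening clauses of four outputs, copied from the table.
OUTPUTS = {
    "Sogou": "首先,这些表面的Voronoi镶嵌显示出错向(拓扑电荷等于6减去它们的配位数的粒子)",
    "Microsoft": "首先, 这些表面的Voronoi 方格花纹表明, disclinations (与拓扑电荷相等于6的粒子的协调数)",
    "Baidu": "首先,这些表面的Voronoi镶嵌显示偏转(具有拓扑电荷的粒子等于它们的配位数的6)",
    "Youdao": "首先,这些表面的Voronoi tessellations表明,不倾斜(带有拓扑电荷的粒子比它们的配位数少6)",
}


def matched_terms(translation: str) -> list[str]:
    """Return the glossary terms for which a preferred rendering appears verbatim."""
    return [
        term
        for term, renderings in GLOSSARY.items()
        if any(rendering in translation for rendering in renderings)
    ]


for engine, text in OUTPUTS.items():
    print(engine, matched_terms(text))
```

Exact matching is deliberately strict: it credits Sogou and Baidu only for Voronoi镶嵌 and cannot tell that Sogou's 错向 is a near miss while Baidu's 偏转 is simply wrong. A check like this can narrow down where to look, but it does not replace reading the output.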
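Second, consider the parenthetical phrase "particles with topological charge equal to 6 less their coordination number". Its actual meaning is 拓扑荷等于6减去它们的配位数的粒子, that is, particles whose topological charge equals 6 minus their coordination number. Here Microsoft, Baidu, Youdao, and IBM all failed, producing sentences far from the real meaning. Google's sentence reads smoothly but says exactly the opposite (the coordination number minus 6). Only Sogou conveyed the meaning accurately.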
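The arithmetic behind the phrase is simple enough to write down. The sketch below contrasts the correct reading, 6 minus the coordination number, with the reversed reading that Google produced. The function names and the sample coordination numbers 5, 6, and 7 are chosen here for illustration and do not come from the paper.

```python
def topological_charge(coordination_number: int) -> int:
    """Charge as the sentence defines it: 6 less (minus) the coordination number."""
    return 6 - coordination_number


def reversed_reading(coordination_number: int) -> int:
    """The opposite reading (coordination number minus 6), shown only for contrast."""
    return coordination_number - 6


for z in (5, 6, 7):
    print(z, topological_charge(z), reversed_reading(z))
# prints:
# 5 1 -1
# 6 0 0
# 7 -1 1
```

The two readings agree only for a coordination number of 6, where the charge is zero; everywhere else the reversed reading flips the sign, which is why a fluent but inverted translation matters here.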
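Finally, the sentence as a whole. Most of the tools recover the meaning of the rest of the sentence, but IBM and Youdao stumbled on the note in the final parentheses and did not understand "Fig. a". Microsoft also has some minor errors. Google's and Sogou's translations are very close, but Sogou's is the more accurate (Google's has an extra parenthesis).
In summary, Sogou took first place in this English-to-Chinese test by a wide margin. Apart from the terminology issues noted earlier, it translated this fairly complex sentence accurately, which is impressive.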
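Quality comparison: Chinese to English
The Chinese-to-English comparison uses a sentence from a blog post on ScienceNet (科学网). The source text and the translations are shown in the table below: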
Tool | Translation |
---|---|
Source | 从科学发展史来看,牛顿力学问世以来,还原论的研究方法主宰了现代科学中的众多领域,该法把系统分解为大量的基本单元,认为这些单元的行为及其相互作用遵从普遍而简单的自然法则,虽然该方法取得了巨大的成功,但是它同样存在极大的局限性,因为它仅仅适用于“简单系统’,而无法或根本不适于自然界中大量普遍存在的各种各样的“复杂系统”。 |
Sogou Shenzhi (搜狗深智) | Judging from the history of scientific development, since Newton's mechanics came into being, reductionism has dominated many fields in modern science. this method breaks down the system into a large number of basic units, believing that the behavior and interaction of these units follow the universal and simple natural law. although this method has achieved great success, it also has great limitations because it is only suitable for " simple systems" and cannot or is not suitable for a large number of various " complex systems" commonly existing in nature. |
Google Neural Translation | From the perspective of the history of scientific development, since the birth of Newtonian mechanics, the research method of reductionism has dominated many fields in modern science. This method decomposes the system into a large number of basic units, and believes that the behavior of these units and their interactions are universal and simple. The natural law, although the method has achieved great success, it also has great limitations because it only applies to "simple systems" and cannot or is not suitable for the wide variety of ubiquitous species in nature, Complex system. |
Youdao Zhiyun (有道智云) | Came out from the point of the history of science, Newtonian mechanics, the research method of the reductionism dominate the many fields of modern science, the basic unit of the system is decomposed into a lot of, think that the behavior of these units and their interaction follow a common and simple natural law, although this method has been a huge success, but it also has great limitation, because it only applies to the "simple systems', and can't or don't fit for the nature of the prevalence of various complex systems. |
Microsoft DNN | From the perspective of the history of scientific development, since the advent of Newton's mechanics, Reductionism has dominated many fields in modern science, which decompose the system into a large number of basic units, thinking that the behavior of these units and their interactions conform to the universal and simple laws of nature, although this method has achieved great success, But it also has great limitations because it applies only to "simple systems" and is not or simply unsuitable for a large variety of "complex systems" in nature that are ubiquitous. |
Baidu Translate (百度翻译) | From the history of science, since the advent of Newtonian mechanics, the research method of reductionism has dominated many fields in modern science. This method decomposes the system into a large number of basic units, and considers the behavior and interaction of these units to follow the general and simple law of nature, although the method has achieved great success, but it has achieved great success. It also has great limitations, because it is only applicable to "simple systems", but can not or can not be suitable for a variety of "complex systems" that are widespread in nature. |
IBM Watson Language Translator | In the history of scientific development, the theory of reductionism has dominated many fields in modern science, and the law breaks down the system into a large number of basic units, and believes that the behavior of these units and their interactions conform to universal, simple natural laws. While the approach has been a great success, it also has great limitations, because it applies only to "simple systems", and to a wide variety of "complex systems" that are not or are not suitable for large quantities in nature. |
Tencent Fanyijun (腾讯翻译君) | From the perspective of scientific history, since Newton mechanics came out, research methods of reductionism dominated many fields of modern science, which decomposed systems into large quantities of basic units. Although the behavior and interaction of these units conform to universal and simple natural laws, although this method has achieved great success, it also has great limitations. Because it applies only to simple systems, it cannot or isn't suitable for many kinds of complex systems that exist universally in nature. |
TransGod Atman engine | Not supported |
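The Chinese source has several difficulties:
- Academic terms, such as 牛顿力学 (Newtonian mechanics) and 还原论 (reductionism);
- The sentence is long, with several turns in meaning;
- Many modifiers, such as 极大 (great), 仅仅 (only), and 各种各样 (all kinds of).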
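Let us look at each tool's output.
The first clause, 从科学发展史来看, already shows differences. The renderings from Baidu, Tencent, and IBM are incomplete compared with the source, Microsoft's and Google's are identical, and Youdao and Sogou each phrase it differently. Of those two, Youdao's wording is somewhat odd, while Sogou's fits the original meaning.
In the second clause, IBM and Youdao both have omissions. The accurate translation of 牛顿力学 is "Newtonian mechanics"; Google and Baidu got it right, while Microsoft, Tencent, and Sogou got it wrong. In the specific wording, Microsoft and Baidu are the same, Sogou and Google each have their merits, and Tencent's is the simplest.
Looking at overall sentence structure, only Google, Tencent, and Baidu break the text into sentences correctly and capitalize the first letter of each, which is surely very basic. Sogou's meaning is accurate, while Google slipped by leaving a stray phrase, "The natural law". Youdao's overall quality feels poor. Microsoft's overall quality is acceptable, but it has some small slips, such as "decompose" lacking singular agreement (it should be "decomposes"). In Baidu's output, "but it has achieved great success" is a repetition, and its last sentence is also weak. IBM's last sentence is completely wrong. Tencent uses the past tense but not consistently, has two "although" clauses in a row, and its overall wording is on the simple side.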
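Two of the surface problems described above, sentences that start in lowercase and phrases repeated within a short span, can be flagged mechanically. The sketch below is one simple way to do that and is not how this comparison was carried out. It assumes that sentences end with `.`, `!`, or `?` followed by whitespace and that a repeated four-word sequence is suspicious. The function names are made up for illustration, and the two excerpts are copied from the table (both cut off before the end of the output).

```python
import re
from collections import Counter


def lowercase_sentence_starts(text: str) -> list[str]:
    """Return the first word of every sentence that begins with a lowercase letter."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.split()[0] for s in sentences if s and s[0].islower()]


def repeated_ngrams(text: str, n: int = 4) -> list[str]:
    """Return every n-word sequence that occurs more than once, ignoring case."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(gram) for gram, count in grams.items() if count > 1]


# Start of the Sogou output, copied from the table.
SOGOU = (
    "Judging from the history of scientific development, since Newton's "
    "mechanics came into being, reductionism has dominated many fields in "
    "modern science. this method breaks down the system into a large number "
    "of basic units, believing that the behavior and interaction of these "
    "units follow the universal and simple natural law. although this method "
    "has achieved great success, it also has great limitations"
)

# Middle of the Baidu output, copied from the table.
BAIDU = (
    "although the method has achieved great success, but it has achieved "
    "great success. It also has great limitations"
)

print(lowercase_sentence_starts(SOGOU))  # → ['this', 'although']
print(repeated_ngrams(BAIDU))            # → ['has achieved great success']
```

Checks like these only catch formatting and repetition. Whether a sentence means the right thing, such as IBM's final sentence, still has to be judged by a reader.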
In sum, no engine is perfect at Chinese to English. Relatively speaking, apart from one slip, Google's quality is the best.
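Conclusion
Although the test material is limited, the gaps between the engines are already quite clear. Overall, for English to Chinese the Sogou Shenzhi engine has a sizable advantage, while for Chinese to English Google leads, though not by much. Google Translate can now be accessed directly, but given that Sogou supports uploading documents for batch translation, I strongly recommend Sogou Translate.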