NMT Conference and Webinar Questions and Answers
During conferences and webinars, the Omniscien team is often asked many questions about Neural Machine Translation. We have collected many of the common questions and provide the answers here in an easy reference list. Please keep in mind that technology evolves quickly: each response was accurate at the time the question was asked, but later advances may have made some responses outdated. If you have more questions, please feel free to Contact Us.
The Questions and Answers List
Output words are predicted from the encoding of the full input sentence and all previously produced output words. Do previously translated phrases/segments in the same project predict future segments, or is the encoding/context confined to the segment?
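To illustrate the first part of this question, here is a minimal, hypothetical sketch of the autoregressive decoding loop: each output word is conditioned on the encoding of the full input sentence and on all previously produced output words, and that state is confined to the current segment. The `encode` and `decode_step` functions are placeholders, not part of any specific toolkit.

```python
# Hypothetical sketch of autoregressive NMT decoding. "encode" and "decode_step"
# are placeholder functions standing in for the real encoder and decoder.
def translate(source_tokens, encode, decode_step, max_len=100):
    source_encoding = encode(source_tokens)    # encoding of the full input sentence
    output = ["<s>"]                           # start-of-sentence marker
    for _ in range(max_len):
        # The next word is predicted from the source encoding plus all previously
        # produced output words; nothing from other segments is used.
        next_word = decode_step(source_encoding, output)
        if next_word == "</s>":                # end-of-sentence marker
            break
        output.append(next_word)
    return output[1:]
```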
For decoding to occur, coding is necessary, but what is the conceptual relationship between coding, decoding, and translation?
Typically, the input and output of SMT systems are sentences. This limits SMT systems in modeling discourse phenomena. Could we go beyond sentences with NMT systems?
Two objectives were mentioned: Adequacy and Fluency. Where does grammar fit into this, and does NMT check grammar rules or only see sequences of words?
Some researchers have proposed full character-level NMT to solve the rare-word problem in NMT, especially in the last two months. What is the performance when using character-level NMT?
Do you think that sequence-to-sequence gives better results than a character-based model?
You mentioned that the machine learns “It’s raining cats and dogs” as a whole phrase. How does it learn something more complex, like “Sarah looked up, the sky was bleak but there shouldn’t be any cats and dogs coming out of it”?
When comparing SMT to NMT, what is the difference in the time required to train an engine?
Is monolingual data still useful in NMT as it is for an SMT language model or is parallel data the only data useful at training time?
At Omniscien we have developed techniques and technologies to leverage monolingual data for data synthesis and data manufacturing that offer significant benefits to translation quality. This is available as part of our Professional Guided Custom MT engines.
Can you do “incremental” training of NMT?
How much parallel text do we need when we use NMT systems compared to traditional SMT systems? Is it true that NMT systems need more data than SMT systems?
The difference in the type of data that goes into an NMT system is that NMT is currently trained only on translated (parallel) text. For instance, NMT systems are not trained on any additional target-language text, which is a huge part of traditional SMT engines. There is no separate language model.
Do you think that we need more accurate (higher quality) data than is used to train SMT models?
One of the things that’s made SMT commercially feasible is adaptivity: adjusting a system’s behaviour based on small bits of human input, without retraining on the entire data set (too costly). As I understand it, that also required technology innovation. Do current NMT models have this capability, or is adaptive NMT still some way off, or not feasible with current technology at all?
In theory, it should be fairly straightforward to take an existing model, perform some additional training on data that might be different in nature (different domains, different styles), and then adapt the existing model to the new domain. So there is definitely some hope for that. Theoretically, this could work on a per-sentence level: you have an existing system, new sentences come in, you adapt to them, and your system knows about those sentences. While this works in the context of incremental training, it is not yet available as real-time learning; in the real-time context it is still in the research phase.
There are some other important things to consider about real-time learning. NMT engines cannot “unlearn”, so if the input sentence is bad, the engine is going to learn how to do things badly. If the system learns from a sentence that was entered in an editor only moments before, later translations may be produced incorrectly. In general, we would not recommend this approach, as the data it is learning from has not yet been through quality assurance and been signed off.
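For context, incremental training amounts to continuing to optimize an existing model’s weights on new in-domain data, typically with a small learning rate so the original knowledge is not overwritten. The sketch below is a generic, hypothetical PyTorch-style illustration; the tiny linear model and random data are stand-ins, not Omniscien’s implementation.

```python
import torch
from torch import nn, optim

# Hypothetical stand-in for an already-trained NMT model; in practice this would
# be the existing encoder-decoder network, not a single linear layer.
pretrained_model = nn.Linear(16, 16)

# Hypothetical in-domain data: (source_vector, target_vector) pairs already
# converted to tensors by the usual preprocessing pipeline.
new_domain_pairs = [(torch.randn(16), torch.randn(16)) for _ in range(8)]

# Incremental training = continue optimizing the existing weights on the new data,
# usually with a small learning rate.
optimizer = optim.SGD(pretrained_model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(3):
    for src, tgt in new_domain_pairs:
        optimizer.zero_grad()
        loss = loss_fn(pretrained_model(src), tgt)
        loss.backward()
        optimizer.step()
```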
Is self-learning from post-editing easier with NMT than with SMT?
Is predictability supposed to work even with source text? Would it be possible, using the same approach, to assess the suitability of a source text for MT?
At Omniscien we use this approach as part of our data synthesis and data creation processes when customizing an engine with a Professional Guided Custom MT Engine.
Do words have fixed embeddings? How is the information in the embedding decided on?
Could you please explain the concept of “embedding” in more detail?
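As a purely illustrative answer to the two embedding questions above: an embedding maps each word to a dense vector of real numbers, and the values are not fixed in advance but learned during training so that words used in similar contexts end up with similar vectors. A toy sketch with made-up numbers:

```python
import numpy as np

# Toy illustration of word embeddings: each vocabulary word maps to a dense vector
# (here 4 dimensions; real systems use hundreds). The values below are made up;
# in an NMT system they are learned during training rather than fixed in advance.
embeddings = {
    "cat":   np.array([0.21, -0.53, 0.88, 0.10]),
    "dog":   np.array([0.19, -0.48, 0.91, 0.05]),
    "train": np.array([-0.72, 0.33, -0.05, 0.61]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words end up with similar vectors, so "cat" vs. "dog" scores higher
# than "cat" vs. "train".
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["train"]))
```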
Do NMT systems have problems with long sentences, as SMT systems do?
How much effort is being devoted to translation into highly inflected languages?
Could you recommend a brief bibliography on Neural Machine Translation (NMT)?
A fairly comprehensive bibliography for neural machine translation can be found here.
In December 2016, Google’s GNMT team also proposed multilingual NMT, training several language pairs at the same time with the same encoder-decoder. What do you think about the issues in multilingual NMT?
Can NMT be applied when the source and target languages are the same? For example, English to English. And will it perform better than translating between two different languages?
Are there specific languages that NMT is better suited for than others?
Are there cases where Rule-Based Machine Translation (RBMT) or SMT is better than NMT?
While the NMT technology is evolving quickly, at this moment SMT still offers more control. NMT uses a very different technique to translate and is more difficult to guide for structure, writing style, and glossary vocabulary.
The Omniscien team has worked around this in part with the advanced customization techniques used to synthesize and create data for training. SMT has a language model to guide many features, which is currently not part of the NMT approach. SMT systems also tend to translate better on very short (1-2 words) and very long (50+ words) text.
Finally, SMT systems still offer more control through their rules, for example for the markup and handling of complex content such as patents or eCommerce product catalogs, as well as conversions such as “on-the-fly” conversion of measurements.
Based on all the above, Omniscien has developed a Hybrid MT model that combines the strengths of both technologies seamlessly to deliver higher translation quality.
Do you think linguists can be involved and work together with the developers in training the machines in order to achieve better results? Is this something that’s happening already perhaps?
There is a benefit to linguistic knowledge in preprocessing and preparing language data. We have seen benefits from linguistically motivated preprocessing rules for morphology or reordering in SMT systems. It is not clear yet whether these or similar inputs would also benefit NMT systems. Since NMT systems are data-driven and typically use unannotated text, there is no obvious place where linguistic insight would be helpful. However, as with SMT, post-edited feedback can be used to improve translation quality in NMT.
What will be required to integrate NMT with translation software, CAT and TMS systems such as SDL Studio, XTM, MemSource and memoQ?
Omniscien has a well-established API that is fully integrated with a wide range of CAT and TMS systems and is also in use by many enterprises directly.
An engine in Language Studio can be configured using a unique Domain Code (ID number) for an NMT-specific, SMT-specific, or hybrid NMT/SMT workflow, and all the appropriate pre- and post-processing then takes place automatically. From an external integration perspective, the technology used to perform the translation is transparent.
Do you see room for curated language/knowledge bases (such as FrameNet, for example) in the future of Neural MT? If so, which path do you think is a promising one?
What are some of the common search (decode) algorithms used to find the translation in neural machine translation? (i.e., does it still employ beam search like phrase-based SMT, or is this substituted by the output word prediction model?)
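Beam search is indeed commonly used in NMT decoding as well: the word prediction model scores candidate next words, and the search keeps only the top-scoring partial translations at each step. Below is a minimal, hypothetical sketch; the `next_word_probs` function stands in for the NMT model’s output distribution and is not a real API.

```python
import heapq
import math

# Minimal beam-search sketch. "next_word_probs" is a hypothetical function that
# returns a {word: probability} dict given a partial output sequence.
def beam_search(next_word_probs, beam_size=4, max_len=20):
    beams = [(0.0, ["<s>"])]                      # (negative log-prob, partial output)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":                 # finished hypotheses are carried over
                candidates.append((score, seq))
                continue
            for word, prob in next_word_probs(seq).items():
                candidates.append((score - math.log(prob), seq + [word]))
        # Keep only the top-k partial translations (lowest negative log-prob).
        beams = heapq.nsmallest(beam_size, candidates, key=lambda c: c[0])
    return min(beams, key=lambda c: c[0])[1]
```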
I am wondering who will be absorbing the cost of implementing NMT? Will it be the end client, the LSPs or translators?
Similarly, at translation runtime (decoding), GPU technology is faster than CPU, but costlier. Using a CPU to decode NMT delivers about 600 words per minute, while the same CPU for SMT would deliver 3,000 words per minute, five times faster. Using a GPU requires new hardware investments, which at this time are quite costly, and depending on the GPU technology used, can be several times faster than SMT. Our current speeds are 9,000 words per minute with GPU. With the upcoming release of Language Studio 6.0, our speeds will increase to 40,000-45,000 words per minute.
On the one hand, the cost will likely come down over the coming years as GPU capacity increases and the systems become more efficient; on the other hand, cost/benefit will depend largely on the use cases and the industry. For now, we have kept our prices at the same level for several years while passing this benefit on to our customers.
Is human/BLEU scoring the preferred/only way of performing LQA for NMT output? Are models like DQF and MQM used as part of the research or is the NMT output quality above what these systems can show?
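BLEU is typically computed automatically against reference translations, while frameworks such as DQF and MQM rely on human error annotation rather than automatic scoring. As a hedged example, assuming the open-source sacrebleu package is installed, a corpus-level BLEU score can be computed roughly like this:

```python
# Example of corpus-level BLEU with the sacrebleu package (pip install sacrebleu).
# The sentences below are toy data for illustration only.
import sacrebleu

hypotheses = ["The cat sat on the mat."]
references = [["The cat is sitting on the mat."]]   # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)
```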
In the German-English context, how would an NMT system handle a sentence such as “Ich stehe um 8 Uhr auf.”, where the two parts of a “trennbares Verb” should be identified as belonging together, although distantly separated within the sentence?
What is the added value of Neural MT, compared to SMT and what are their drawbacks?
Have the word alignment algorithms been successfully transported to the publicly available cloud engines? What problems exist in creating a proper many-to-many word alignment? (i.e. 0, 1, 2 or more words in the source language can be mapped to 0, 1, 2 or more words in the target language).
In Language Studio, our upcoming NMT offerings will include word alignment information in the same manner as our SMT offerings, available as part of the programmatically accessible workflow and log files. As with our current core SMT platform, which is a hybrid rules, syntax, and SMT engine, our NMT offering will also be hybrid to ensure that features such as word alignment are backwards compatible.
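To make the many-to-many idea from the question concrete, word alignments are often represented simply as pairs of source and target token indices. The hypothetical example below reuses the German separable-verb sentence discussed elsewhere on this page, where “stehe … auf” aligns to “get up”:

```python
# Illustrative (hypothetical) many-to-many word alignment, expressed as
# (source_index, target_index) pairs. A source word may align to zero, one, or
# several target words, and vice versa.
source = ["Ich", "stehe", "um", "8", "Uhr", "auf"]
target = ["I", "get", "up", "at", "8", "o'clock"]

# "stehe ... auf" (separable verb) aligns to "get up"; "Uhr" aligns to "o'clock".
alignment = [(0, 0), (1, 1), (1, 2), (5, 1), (5, 2), (2, 3), (3, 4), (4, 5)]

for s, t in alignment:
    print(f"{source[s]} -> {target[t]}")
```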
How do you think the reliability/accuracy of NMT will evolve? As with high-frequency trading, might we get to a situation in which humans cannot check the reliability/accuracy of NMT output?
In the near term, NMT does indeed deliver higher quality general fluency, readability, and accuracy; this is important. However, a major feature that is available in Language Studio Professional SMT custom engines is the ability to manage and control terminology and writing style. This means NMT output may be more understandable, but may not match your style, resulting in greater levels of editing prior to publishing. To put this in context, if you compared the writing style of an automotive engineering manual to marketing copy for an automotive company, they are very different. If your marketing content reads like an engineering manual, you would have to rewrite large portions of the sentences. At this time, both approaches to MT have benefits and disadvantages. We are working to resolve this and the other outstanding limitations of NMT for our own upcoming NMT product offerings.
Hence at this time, depending on the application, the technology choice might vary.
As to applications where “raw” MT is used (as opposed to post-edited MT), again the application requirements will drive the choice of technology. For example, where terminology normalization is required, such as for analytics applications, SMT is far better suited since it allows full control over terminology, even at runtime. If fluency is required in the output, some NMT engines will do better than SMT, so again a detailed understanding of the requirements is needed to assist with the choice of technology.
Related Links
Pages
- Introduction to Machine Translation at Omniscien
- Hybrid Neural and Statistical Machine Translation
- Custom Machine Translation Engines
- Powerful Tools for Data Creation, Preparation, and Analysis
- Clean Data Machine Translation
- Industry Domains
- Ways to Translate
- Supported Languages
- Supported Document Formats
- Deployment Models
- Data Security & Privacy
- Secure by Design
- Localization Glossary - Terms that you should know
Products
- Language Studio
  Enterprise-class private and secure machine translation.
- Media Studio
  Media and Subtitle Translation, Project Management and Data Processing
- Workflow Studio
  Enterprise-class private and secure machine translation.
FAQ
- FAQ Home Page
- Primer: Different Types of Machine Translation
- What is Custom Machine Translation?
- What is Generic Machine Translation?
- What is Do-It-Yourself (DIY) Machine Translation?
- What is Hybrid Machine Translation?
- What is Neural Machine Translation?
- What is Statistical Machine Translation?
- What is Rules-Based Machine Translation?
- What is Syntax-Based Machine Translation?
- What is the difference between “Clean Data MT” and “Dirty Data MT”?