
Optical Character Recognition, Language ID and Machine Translation for Invoice Automation

Use Case Scenario

A German retailer receives more than 20,000 invoices per month from providers around the world. Many of the invoices are not in German, which makes it difficult for the finance team to process them efficiently and leads to many human errors caused by language barriers.

  • 20,000+ invoices per month are received by email at accounts@<domain>.com
  • Invoices arrive in 40+ languages, including Chinese, Slovenian, French, and English.
  • Only 5 people in the finance team support invoice processing.
  • Due to language barriers, ingesting and validating invoices is a very slow, manual task.
  • Without automation, the growing number of invoices is becoming impossible to process.
  • The cost and time required for processing are becoming excessive.
  • The number of human errors due to language-related issues and misunderstandings is high.


  1. Language Studio workflow tools were configured to check the mailbox every minute.
  2. Invoices arrived either as attached documents or embedded in the email body. Language Studio JavaScript workflow tools were used to determine whether the invoice was an attachment or part of the main body.
  3. Once the location of the invoice was determined, the following steps occurred:
    1. If the invoice was an image or a PDF file, it was processed via OCR and converted to Microsoft Word format.
    2. If the invoice was in one of a range of other formats that the organization could not view natively (that is, in Microsoft Word or a PDF viewer), it was converted to Microsoft Word format, keeping all of the original formatting and layout, such as tables and fonts.
  4. Once the invoice was normalized to a small number of standard formats, the following occurred:
    1. Automatic Language ID was performed to determine the language in which the invoice was submitted.
    2. In the rare case where the language could not be determined, the original email was sent to another internal email address for human review and resolution.
  5. Once the language was determined, any invoice not in German was automatically machine translated from the identified source language into German.
  6. Further analysis was performed on the document using a number of Natural Language Processing (NLP) tools, such as Named Entity Recognition (NER), to identify and highlight the areas of the document that were important to the finance team. This made human review of the invoice faster.
  7. Using the source email address as a reference (suppliers had to register the email addresses permitted to submit invoices), the invoice was then automatically submitted to the organization's invoice management and processing system.
  8. The invoice management system stored both the original and the translated copies of all invoices in case later validation was needed.
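
The routing decisions in steps 2 to 7 can be sketched in a few lines of Python. This is only an illustrative sketch, not the Language Studio implementation: the step names, the `OCR_TYPES` set, and the `locate_invoice` and `plan_steps` helpers are hypothetical, and the detected language is passed in as a plain code because the Language ID, OCR, and translation services themselves are not shown. Only the standard library `email` package is used.

```python
from email.message import EmailMessage

# Assumed format groups; the source does not list the exact formats handled.
OCR_TYPES = {"application/pdf", "image/png", "image/jpeg", "image/tiff"}
WORD_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)
TARGET_LANGUAGE = "de"  # the finance team works in German


def locate_invoice(msg):
    """Return ("attachment", part) if the email has an attachment,
    otherwise ("body", part) for the main text body (step 2)."""
    for part in msg.iter_attachments():
        return "attachment", part
    return "body", msg.get_body(preferencelist=("plain", "html"))


def plan_steps(content_type, language):
    """List the hypothetical processing steps for one invoice (steps 3 to 7).

    `language` is a language code from some Language ID tool,
    or None when the language could not be determined.
    """
    steps = []
    if content_type in OCR_TYPES:
        steps.append("ocr_to_word")          # step 3.1
    elif content_type != WORD_TYPE:
        steps.append("convert_to_word")      # step 3.2
    if language is None:
        # Step 4.2: stop automation and hand the email to a person.
        return steps + ["forward_for_human_review"]
    if language != TARGET_LANGUAGE:
        steps.append(f"translate_{language}_to_{TARGET_LANGUAGE}")  # step 5
    steps += ["highlight_entities", "submit_to_invoice_system"]     # steps 6, 7
    return steps


# Build a sample email with a placeholder PDF attachment.
msg = EmailMessage()
msg["From"] = "billing@supplier.example"
msg.set_content("Please find our invoice attached.")
msg.add_attachment(
    b"%PDF-1.4 placeholder",
    maintype="application",
    subtype="pdf",
    filename="invoice.pdf",
)

location, part = locate_invoice(msg)
print(location, part.get_content_type())
print(plan_steps(part.get_content_type(), "fr"))
```

Returning a list of step names rather than calling services directly keeps the routing logic easy to test on its own; in a real workflow each name would map to an OCR, conversion, translation, or NER call.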


  • Invoices were converted automatically into a format and language that the finance team could process efficiently.
  • The 5-person finance team assigned to process invoices could process all 20,000 invoices per month. Assuming about 20 working days per month, this averages 1,000 invoices per 8-hour work day, which is 200 invoices per team member per day, or 25 invoices per team member per hour.
  • Language Studio tools were an invisible part of the process and enhanced the data and information available to the user so that they could work more efficiently.
  • The need to engage human translators and other resources was reduced to less than 1% of cases. This was only necessary when the quality of the invoice document was poor (e.g. background images behind text in the invoice). These issues were further reduced over time by working with suppliers to ensure that they provided clear invoice documents.
  • Instead of adding headcount to process the growing number of invoices, the organization kept the process within the existing 5 finance team members and significantly reduced the need to engage translators.
  • The number of errors in processing invoices due to language-related issues and misunderstandings was greatly reduced.
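
The throughput figures quoted in the results can be checked with simple arithmetic. The sketch below assumes 20 working days per month, which the source does not state but which is implied by the figure of 1,000 invoices per day. The seconds-per-invoice value is derived here for illustration and does not appear in the source.

```python
INVOICES_PER_MONTH = 20_000
TEAM_SIZE = 5
HOURS_PER_DAY = 8
WORKING_DAYS_PER_MONTH = 20  # assumption, implied by 1,000 invoices per day

per_day = INVOICES_PER_MONTH / WORKING_DAYS_PER_MONTH
per_member_per_day = per_day / TEAM_SIZE
per_member_per_hour = per_member_per_day / HOURS_PER_DAY
seconds_per_invoice = 3600 / per_member_per_hour

print(f"Invoices per work day:            {per_day:.0f}")
print(f"Invoices per team member per day: {per_member_per_day:.0f}")
print(f"Invoices per team member per hour: {per_member_per_hour:.0f}")
print(f"Seconds available per invoice:    {seconds_per_invoice:.0f}")
```

At 25 invoices per person per hour, each invoice gets a little under two and a half minutes of human attention, which is why the automated conversion, translation, and highlighting steps matter.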

