Slide 1
Language Studio
Optical Character Recognition / OCR
Supported Languages

210 written languages are supported for Optical Character Recognition. Convert images and Adobe PDF files to editable formats for translation and further processing.

0
Written Languages
Block
Overview
Analyze and Recognize
Technical Features
Integrate and Scale
Languages
Recognize Me!!

Written Languages

This page provides a list of languages which are supported for Optical Character Recognition (OSR) processing.

Other types of language processing components such as Machine Translation (MT), Autonomous Speech Recognition (ASR), Sentiment Analysis, Named Entity Recognition, Glossary and Terminology Extraction and Automated Language Identification each have their own specific language support list.

Language Studio OCR provides support for the highest number of recognition languages on the market. It offers recognition of languages with Latin, Cyrillic, Greek or Armenian characters, as well as Arabic, Bangla, Burmese, Farsi, Hebrew, Chinese, Japanese, Korean, Russian, Thai and other languages.

In addition, support is provided for recognition of historic documents printed during the 17th through the 19th centuries in English, French, German, Italian and Spanish, recognition of artificial languages (Esperanto, Interlingua, Ido and Occidental) recognition of programming languages (Basic, C/C++, COBOL, Fortran, JAVA, and Pascal), simple chemical formulas and standard digits.

To further increase the recognition accuracy, integrated dictionaries are provided for many languages. To increase recognition of unusual words and atypical fonts, a small integrated utility can be used for implementing custom user dictionaries and creating user specific character patterns.

Key: With dictionary support.
  • Abkhaz
  • Adyghe
  • Afrikaans
  • Agul
  • Albanian
  • Altaic
  • Arabic (Saudi Arabia) 
  • Armenian (Eastern) 
  • Armenian (Grabar) 
  • Armenian (Western) 
  • Avar
  • Aymara
  • Azerbaijani (Cyrillic)
  • Azerbaijani (Latin) 
  • Bashkir 
  • Basic programming language
  • Basque
  • Belarussian
  • Bemba
  • Blackfoot
  • Breton
  • Bugotu
  • Bulgarian 
  • Bangla (technical preview)
  • Burmese (technical preview)
  • Buryat
  • C/C++ programming language
  • Catalan 
  • Chamorro
  • Chechen
  • Chinese Simplified
  • Chinese Traditional
  • Chukcha
  • Chuvash
  • For MICR (CMC-7) text type
  • Cobol programming language
  • Corsican
  • Crimean Tatar
  • Croatian 
  • Crow
  • Czech 
  • Danish 
  • Dargwa
  • Numbers
  • Dungan
  • Dutch (Netherlands) 
  • Dutch (Belgium) 
  • For MICR (E-13B) text type
  • English 
  • Eskimo (Cyrillic)
  • Eskimo (Latin)
  • Esperanto
  • Estonian 
  • Even
  • Evenki
  • Faeroese
  • Farsi 
  • Fijian
  • Finnish 
  • Fortran programming language
  • French 
  • Frisian
  • Friulian
  • Scottish Gaelic
  • Gagauz
  • Galician
  • Ganda
  • Georgian
  • German 
  • German (Luxembourg)
  • German (new spelling) 
  • Greek 
  • Guarani
  • Hani
  • Hausa
  • Hawaiian
  • Hebrew 
  • Hungarian 
  • Icelandic
  • Ido
  • Indonesian 
  • Ingush
  • Interlingua
  • Irish
  • Italian 
  • Japanese 
  • Japanese (Modern) 
  • Java programming language
  • Kabardian
  • Kalmyk
  • Karachay-Balkar
  • Karakalpak
  • Kasub
  • Kawa
  • Kazakh
  • Khakas
  • Khanty
  • Kikuyu
  • Kirghiz
  • Kongo
  • Korean 
  • Korean (Hangul) 
  • Koryak
  • Kpelle
  • Kumyk
  • Kurdish
  • Lak
  • Sami (Lappish)
  • Latin 
  • Latvian 
  • Latvian language written in Gothic script
  • Lezgin
  • Lithuanian 
  • Luba
  • Macedonian
  • Malagasy
  • Malay
  • Malinke
  • Maltese
  • Mansi
  • Maori
  • Mari
  • Maya
  • Miao
  • Minangkabau
  • Mohawk
  • Mongol
  • Mordvin
  • Nahuatl
  • Nenets
  • Nivkh
  • Nogay
  • Norwegian 
  • Norwegian (Bokmal) 
  • Norwegian (Nynorsk) 
  • Nyanja
  • Occidental
  • For OCR-A text type
  • For OCR-B text type
  • Ojibway
  • Old English 
  • Old French 
  • Old German 
  • Old Italian 
  • Old Slavonic
  • Old Spanish 
  • Ossetian
  • Papiamento
  • Pascal programming language
  • Tok Pisin
  • Polish 
  • Portuguese (Brazil) 
  • Portuguese (Portugal) 
  • Provencal
  • Quechua
  • Rhaeto-Romanic
  • Romanian 
  • Romanian (Moldavia)
  • Romany
  • Ruanda
  • Rundi
  • Russian (old spelling) 
  • Russian 
  • Russian (with accents marking stress position) 
  • Samoan
  • Selkup
  • Serbian (Cyrillic)
  • Serbian (Latin)
  • Shona
  • Simple chemical formulas
  • Simple mathematical formulas
  • Sioux (Dakota)
  • Slovak 
  • Slovenian 
  • Somali
  • Sorbian
  • Sotho
  • Spanish 
  • Sunda
  • Swahili
  • Swazi
  • Swedish 
  • Tabassaran
  • Tagalog
  • Tahitian
  • Tajik
  • Tatar 
  • Thai 
  • Jingpo
  • Tongan
  • Tswana
  • Tun
  • Turkish 
  • Turkmen
  • Turkmen (Latin)
  • Tuvan
  • Udmurt
  • Uighur (Cyrillic)
  • Uighur (Latin)
  • Ukrainian 
  • Uzbek (Cyrillic)
  • Uzbek (Latin)
  • Vietnamese 
  • Cebuano
  • Welsh
  • Wolof
  • Xhosa
  • Yakut
  • Yiddish
  • Zapotec
  • Zulu
FREE WEBINAR: AI and Language Processing Innovation – What Is It Good For? Real-World Use CasesWatch the Replay
+