Workflow Studio is designed to scale to process large volumes of data. It is optimized for environments such as AWS, where capacity can be added and removed on demand. The platform is capable enough that Omniscien manages all of its own data processing and workflows with exactly the same tools.
The following five core functional modules underpin the platform and make workflow automation and language processing accessible to all.
Workflow Automation Server
At the heart of the Workflow Studio platform is the Workflow Automation Server, which executes all workflows and contains the Natural Language Processing (NLP), text and file manipulation, and workflow scripting engine features. The platform also embeds a range of other technologies, such as web crawling, document format conversion, and data extraction.
When more than one Workflow Automation Server instance is deployed, each instance can be configured for a different role, capacity, and purpose. A server can be dedicated to specific kinds of jobs or handle a range of job types, and each instance sets the number of concurrent jobs it runs for each job type.
Dynamic Job Distribution
Workflow Studio can operate as a standalone system or as a distributed job management platform designed to scale by spreading large job workloads across multiple servers.
Once one Workflow Automation Server instance is assigned the Dynamic Job Distribution role, all job requests are centralized and managed via an easy-to-use web-based portal. In this load balancer role, new Workflow Automation Server instances automatically register their presence, and servers can be added to or removed from the pool dynamically without any manual effort, so capacity can follow load as it increases or decreases.
Dynamic scaling based on load is ideal for cloud environments such as AWS, Microsoft Azure, or Google Cloud, which can quickly start new instances to handle sudden bursts of traffic. Each server instance can be configured to process one or more job types, with job volumes matched to its configuration and capacity.
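To illustrate how auto-registration and per-job-type capacity fit together, here is a minimal Python sketch of a load balancer. It is not Workflow Studio's implementation or API: the `Dispatcher` and `Server` classes, their methods, and the "most free slots wins" selection rule are hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Server:
    """A registered Workflow Automation Server (hypothetical model)."""
    name: str
    limits: Dict[str, int]  # job type -> maximum concurrent jobs on this server
    running: Dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def free_slots(self, job_type: str) -> int:
        # Job types the server is not configured for have a limit of 0.
        return self.limits.get(job_type, 0) - self.running[job_type]


class Dispatcher:
    """Toy load balancer: servers register themselves and each job goes to the
    server with the most free slots for the requested job type."""

    def __init__(self) -> None:
        self.servers: Dict[str, Server] = {}

    def register(self, name: str, limits: Dict[str, int]) -> None:
        self.servers[name] = Server(name, dict(limits))

    def deregister(self, name: str) -> None:
        self.servers.pop(name, None)

    def dispatch(self, job_type: str) -> Optional[str]:
        eligible = [s for s in self.servers.values() if s.free_slots(job_type) > 0]
        if not eligible:
            return None  # no capacity anywhere: the job stays in the queue
        # max() keeps the first best candidate, so ties go to the earliest registration.
        best = max(eligible, key=lambda s: s.free_slots(job_type))
        best.running[job_type] += 1
        return best.name

    def complete(self, name: str, job_type: str) -> None:
        server = self.servers.get(name)
        if server is not None and server.running[job_type] > 0:
            server.running[job_type] -= 1
```

In a real deployment, registration would happen over the network and a job with no free slot would wait in the central queue; in this sketch `dispatch` simply returns `None` in that case.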
Folder and Data Source Monitoring
Jobs can be started or registered in a variety of ways. A simple REST API call from a custom application adds a job to the job queue, where it is processed and tracked to completion. Often, however, a data source can be monitored so that the appearance of a new file or record automatically triggers job registration.
Folders can be monitored on a wide variety of file storage systems. Folders can be on a local disk, a network disk, or remote storage such as AWS S3, FTP, SFTP, Google Drive, OneDrive Business, Dropbox, or Box.com. Mail server folders can also be monitored. Files are automatically detected, copied to the server, and processed, and the output is returned. Output can take the form of a new file being delivered, a REST API call notifying an external system that a job is complete, or an email containing the processed output.
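The detection step can be sketched as a polling pass over a local folder. This is an illustrative assumption, not the product's mechanism: `scan_drop_folder` and the `register_job` callback are hypothetical names, and remote stores such as S3 or SFTP would need their own listing calls.

```python
import os
from typing import Callable, Dict, List


def scan_drop_folder(folder: str, seen: Dict[str, float],
                     register_job: Callable[[str], None]) -> List[str]:
    """One polling pass: register a job for every new or modified file in `folder`.

    `seen` maps file paths to the modification time already handled, so repeated
    passes only pick up files that appeared or changed since the previous pass.
    """
    registered = []
    with os.scandir(folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file():
                continue  # subfolders are ignored in this sketch
            mtime = entry.stat().st_mtime
            if seen.get(entry.path) == mtime:
                continue  # this version of the file was already registered
            seen[entry.path] = mtime
            register_job(entry.path)
            registered.append(entry.path)
    return registered
```

A monitor would call this in a loop with a short sleep between passes. Tracking the modification time means a file replaced under the same name is picked up again; a production monitor would also wait until a file has finished copying before registering it.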
Natural Language Processing and Data Processing
Embedded in Workflow Studio is a range of Natural Language Processing (NLP) tools. These tools analyze and automatically process text for a wide variety of tasks. The following is a partial list of the NLP-related tools embedded in Workflow Studio and accessible via LSScript and LSTools:
- Language Identification: Identify the language of sentences and files.
- Sentiment Analysis: Determine whether the tone of the text is positive, negative, or neutral.
- Syntax Parsing: Grammatically analyze the structure of sentences.
- Part of Speech: Determine the part of speech of each word in a sentence.
- Named Entity Recognition: Extract people names, organization names, locations, dates, currencies, and other entities.
- Term Extraction and Generation: Extract predefined and relevant terms from a given text.
- Domain Identification: Identify the sphere and domain of a given text.
- Document Alignment: Match documents in different languages that correspond to each other into document pairs.
- Sentence Alignment: Match corresponding sentences across aligned document pairs.
- Word Stemming: Reduce words back to their stem form.
- Optical Character Recognition (OCR): Analyze images and PDF files and convert them into text or Microsoft Office Documents.
- Automated Speech Recognition (ASR): Transcribe speech into text.
- Document Conversion: Convert documents between a variety of different formats.
- Web Crawling: Download entire websites.
- Data Mining: Analyze large bodies of content and extract useful data.
- Machine Translation: Translate text across 600+ language pairs.
And many more...
Workflow Studio incorporates hundreds of features and functions for file, job, data, and language processing. The feature list below is only the tip of the iceberg. Contact an Omniscien team member for a demo to better understand the platform's features and capabilities.
- A comprehensive toolkit for NLP, file, data, and text manipulation.
- Integrated into LSScript for easy workflow utilization.
LSScript Workflow Scripting
Language Studio Integration
- Language Studio connectors provide seamless processing for document translation and workflow before and after translation.
Media Studio Integration
- Subtitle Optimized Machine Translation and other media related workflows are pre-configured out-of-the-box.
- Workflow Studio can monitor a wide range of sources to register jobs for processing.
- Job sources include:
Email, FTP and SFTP, Local Disk, REST API (Call Out), REST API (Call In), AWS S3, Google Drive, Dropbox, Box.com, OneDrive Business, and more.
Job Queue Management
- Job queues are managed via a simple, easy-to-use REST API or a secure web-based portal.
- Add, pause, delete, cancel, rerun, adjust priorities, and check the status of jobs.
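The queue operations listed above can be sketched as a small in-memory Python class. All names (`JobQueue`, `next_job`, the status strings) and the convention that a lower number means higher priority are assumptions for illustration, not the Workflow Studio REST API.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    priority: int            # lower number runs sooner (an assumption of this sketch)
    seq: int                 # submission order, used to break priority ties
    status: str = "queued"   # queued, paused, running, or cancelled


class JobQueue:
    """Toy in-memory job queue illustrating add, pause, delete, cancel, rerun,
    priority changes, and status checks."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}
        self._counter = itertools.count()

    def add(self, job_id: str, priority: int = 5) -> None:
        self.jobs[job_id] = Job(job_id, priority, next(self._counter))

    def pause(self, job_id: str) -> None:
        if self.jobs[job_id].status == "queued":
            self.jobs[job_id].status = "paused"

    def resume(self, job_id: str) -> None:
        if self.jobs[job_id].status == "paused":
            self.jobs[job_id].status = "queued"

    def cancel(self, job_id: str) -> None:
        if self.jobs[job_id].status in ("queued", "paused", "running"):
            self.jobs[job_id].status = "cancelled"

    def delete(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)

    def rerun(self, job_id: str) -> None:
        # Requeue the job behind others of the same priority.
        job = self.jobs[job_id]
        job.status, job.seq = "queued", next(self._counter)

    def set_priority(self, job_id: str, priority: int) -> None:
        self.jobs[job_id].priority = priority

    def status(self, job_id: str) -> str:
        return self.jobs[job_id].status

    def next_job(self) -> Optional[str]:
        """Start the highest-priority queued job, or return None if none is ready."""
        ready = [j for j in self.jobs.values() if j.status == "queued"]
        if not ready:
            return None
        job = min(ready, key=lambda j: (j.priority, j.seq))
        job.status = "running"
        return job.job_id
```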
Job Completion Callback
- As a job progresses, and when it completes (successfully or not), its status can be sent to a callback REST API in an external application.
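A status callback of this kind is typically an HTTP POST with a JSON body. The sketch below uses only Python's standard library; the `send_status_callback` name and the payload fields are hypothetical, since the actual callback schema is not described here.

```python
import json
import urllib.request
from typing import Optional


def send_status_callback(callback_url: str, job_id: str, status: str,
                         progress: Optional[int] = None,
                         error: Optional[str] = None,
                         timeout: float = 10.0) -> int:
    """POST a job-status notification as JSON and return the HTTP status code.

    The payload fields (job_id, status, progress, error) are hypothetical; a real
    integration would follow the schema the receiving application expects.
    """
    payload = {"job_id": job_id, "status": status}
    if progress is not None:
        payload["progress"] = progress  # e.g. percent complete while running
    if error is not None:
        payload["error"] = error        # failure reason for a failed job
    request = urllib.request.Request(
        callback_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```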
Multi-Layered Job Dependencies
- Jobs can be linked and defined as dependent on the successful completion of earlier jobs.
- Workflows can be built by using the output of each job as the input to the jobs that depend on it.
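Dependency handling of this kind amounts to running jobs in topological order. The following sketch uses Python's `graphlib.TopologicalSorter` (Python 3.9+); the `run_dependent_jobs` function, the `run` callback, and the rule that dependents of a failed job are skipped are assumptions for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


def run_dependent_jobs(
    deps: Dict[str, List[str]],
    run: Callable[[str, Dict[str, str]], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run jobs in dependency order, passing each job the outputs of its prerequisites.

    `deps` maps a job to the jobs it depends on. `run(job, inputs)` executes one job
    and raises an exception on failure. Returns (status per job, output per succeeded job).
    """
    status: Dict[str, str] = {}
    outputs: Dict[str, str] = {}
    # static_order() yields every job after all of its prerequisites.
    for job in TopologicalSorter(deps).static_order():
        prereqs = deps.get(job, [])
        if any(status.get(p) != "succeeded" for p in prereqs):
            status[job] = "skipped"  # a prerequisite failed or was itself skipped
            continue
        try:
            outputs[job] = run(job, {p: outputs[p] for p in prereqs})
            status[job] = "succeeded"
        except Exception:
            status[job] = "failed"
    return status, outputs
```

`static_order()` raises `graphlib.CycleError` if the dependencies form a cycle, which is a reasonable way to reject an invalid workflow definition.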
Job Status Monitoring
- Track job status in real-time.
- Standardized feedback processes and approaches for workflow status tracking.
Automated Drop Folder Processing
- Process thousands of files by simply copying them into a pre-designated folder.
- Files are automatically submitted for processing, and the results are returned to another folder on completion.
Automated Housekeeping and Cleanup
- Post-job cleanup and general housekeeping keep systems healthy and free of stale files and data.
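Age-based cleanup of a working folder is one common housekeeping policy. The sketch below is a hypothetical example, not Workflow Studio's cleanup logic; the `clean_stale_files` name and its age threshold are assumptions.

```python
import os
import time
from typing import List, Optional


def clean_stale_files(root: str, max_age_days: float,
                      now: Optional[float] = None) -> List[str]:
    """Delete files under `root` whose modification time is older than `max_age_days`.

    Returns the sorted list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

For example, `clean_stale_files("/var/jobs/work", max_age_days=7)` would remove files untouched for a week; the path is illustrative.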