Developers: | ORBL |
System launch date: | 2020/02/13 |
Industries: | Housing and public utilities, consumer and household services, real estate, trade, pharmaceuticals, medicine, healthcare |
Technology: | Cybersecurity - Biometric identification, Speech technologies, Video analytics systems |
2020: Launch of the speech recognition system into commercial operation
On February 13, 2020, ORBL brought a B2B product into commercial operation: a high-accuracy speech recognition system. It is a voice decoder that converts a spontaneous stream of speech into text.
What data can be obtained with the speech recognition program:
- the emotional tone of the speech (eight emotions, ranging from anger to joy);
- demographic characteristics of the speaker (gender and age);
- a full transcript of the audio data for further analytics.
"Requests from existing clients who already use video analytics prompted the launch of this ORBL feature. The point is that together the two solutions, face recognition and speech recognition, create a synergy: the information on which management decisions are based becomes more complete. From a technical point of view, speech recognition puts no additional load on the equipment; both processes run on one device within a common architecture. Recognizing faces and voices at the same time helps the overall solution work optimally and lets the business benefit from a unified architecture", noted Anton Rudov, CEO and founder of ORBL.
Before bringing the product into commercial operation, ORBL ran several pilots with banks and medical institutions. According to the developer, feedback from customers indicates that the technology can be used in a wide variety of industries:
- Retail and the service industry. These are primarily companies that want to record the work of employees in offices and sales departments (shops) not only on video but also on audio. In retail, voice recognition is needed to understand how communication with buyers went, for example at the checkout. In services (for example, beauty salons and dealerships) the solution helps monitor the work of consultants. Employees' speech is recorded by personal microphones and then converted to text, which the system analyzes for compliance with scripts. This also helps considerably when reviewing conflict situations with clients.
- Call centers can likewise monitor how closely employees follow scripts during conversations with clients. This is especially relevant for bank support services.
- Marketing departments can analyze clients' mood and the emotions they experience during a conversation, and then derive average satisfaction scores.
- Medicine. For example, when performing an autopsy, pathologists can enter data into an information system by filling in the required fields by voice.
- Authentication systems. A person's voice is unique and cannot be forged. Various helpdesk systems, for example, are interested in voice-based access control. Voice biometrics for phone access to an account not only recognizes a voice and checks it against a sample in the database, but can also distinguish a "live" interlocutor from speech recorded in advance. A "captcha" is used for this purpose: the robot asks the interlocutor to repeat a randomly selected phrase.
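The script-compliance check described for retail and call centers can be illustrated with a minimal sketch. The article does not say how ORBL compares a transcript with a script, so everything here is an assumption: the function names (`normalize`, `phrase_score`, `script_compliance`) are hypothetical, and the fuzzy matching uses Python's standard `difflib.SequenceMatcher` simply because it tolerates small recognition errors.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def phrase_score(phrase: str, transcript_words: list[str]) -> float:
    """Best similarity (0..1) between the phrase and any transcript
    window containing the same number of words."""
    phrase_words = normalize(phrase)
    n = len(phrase_words)
    if n == 0 or len(transcript_words) < n:
        return 0.0
    target = " ".join(phrase_words)
    best = 0.0
    for i in range(len(transcript_words) - n + 1):
        window = " ".join(transcript_words[i:i + n])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def script_compliance(transcript: str, required_phrases: list[str],
                      threshold: float = 0.8) -> dict:
    """Report which required script phrases were (approximately) spoken.

    The 0.8 threshold is an illustrative assumption, not a vendor setting.
    """
    words = normalize(transcript)
    missing = [p for p in required_phrases
               if phrase_score(p, words) < threshold]
    covered = len(required_phrases) - len(missing)
    coverage = covered / len(required_phrases) if required_phrases else 1.0
    return {"coverage": coverage, "missing": missing}


if __name__ == "__main__":
    transcript = ("Hello, welcome to our store. Do you have a loyalty card? "
                  "Thank you for your purchase")
    script = ["welcome to our store",
              "do you have a loyalty card",
              "would you like a bag"]
    print(script_compliance(transcript, script))
```

In this example two of the three script phrases are found, so the report lists "would you like a bag" as missing. A production system would more likely match on meaning rather than on character similarity; the sketch only shows the shape of the check.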
Specifications:
- The program can process audio data received from different devices, for example a dictaphone or a phone. The final recognition quality depends on the quality of the source data. The minimum result is "capture" of a keyword; with sufficient recording quality, the customer receives a ready transcript.
- Recognition accuracy for dictaphone recordings reaches 95-96% on spontaneous speech (i.e. when the person is not dictating or trying to pronounce words especially clearly). This yields a ready text with a minimal number of typos and errors, which can easily be corrected manually.
- The express speaker circuit gives somewhat lower accuracy, 80%. This is because its sampling rate is 8 kHz, whereas a dictaphone records at 44 kHz. However, a recognition dictionary for a specific subject area raises accuracy by 10%. For a real estate agency, for example, such a dictionary can include the names of residential complexes or slang terms for apartment layout types.
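The article does not state which metric stands behind the accuracy figures above. One common convention, assumed here purely for illustration, is word accuracy, defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognized text divided by the number of reference words. The function names below are hypothetical.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance over words: the minimum number of
    substitutions, insertions and deletions turning one list into the other."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER, clamped at 0 (WER can exceed 1)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_accuracy("two rooms near park", "two rooms near bark"))
```

Under this definition, an accuracy of 95% corresponds to roughly one wrong, missing, or extra word per twenty words of the reference transcript.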
As for speech recognition accuracy in access control systems, the result of voice processing is expressed as a match percentage: how closely the voice matches the reference recording in the biometrics database.
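How such a match percentage is computed is not disclosed in the article. A minimal sketch of one widespread approach, assumed here for illustration only, compares fixed-length voice feature vectors ("embeddings") by cosine similarity and scales the result to 0-100. The vectors, the function names, and the 80% acceptance threshold are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def match_percent(sample: list[float], reference: list[float]) -> float:
    """Scale similarity to 0..100; negative similarity counts as 0."""
    similarity = cosine_similarity(sample, reference)
    return 100.0 * max(0.0, min(1.0, similarity))


def verify(sample: list[float], reference: list[float],
           threshold_percent: float = 80.0) -> bool:
    """Accept the speaker if the match percentage reaches the threshold."""
    return match_percent(sample, reference) >= threshold_percent


if __name__ == "__main__":
    enrolled = [3.0, 4.0]   # hypothetical reference embedding
    incoming = [4.0, 3.0]   # hypothetical embedding of the new sample
    print(round(match_percent(incoming, enrolled), 2), verify(incoming, enrolled))
```

Real embeddings have hundreds of dimensions and are produced by a trained speaker model; the two-dimensional vectors only keep the arithmetic easy to follow.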
Differences from similar systems, as cited by the developer:
- It is a stacked solution: face and speech recognition in a single architecture. Along with high voice recognition accuracy, the system provides face recognition with an accuracy of 99.99997% at a head rotation of up to 65 degrees, a minimum illumination of 60 lx, and a speed of 0.3-0.4 seconds.
- Horizontal scaling, which provides the minimum TCO for the client.
- It works not only from the cloud but also locally (most solutions on the market are cloud-based). ORBL does not use public clouds, which, according to the developer, considerably reduces the risk of data leaks. Data is processed either on the ORBL server or on customers' servers.
- Unlike other local solutions, the ORBL product requires fewer hardware resources, because data processing (both video and audio) is performed on the graphics cards of mid-range computers. This saves resources.
So far, the system's speech-to-text capabilities are limited to existing requests. For example, it is not configured to place punctuation marks correctly, since it is mostly used for standardized filling of fields rather than for producing literary text. The developer's plans for 2020 also include intonation-aware speech recognition, which will place punctuation marks automatically during transcription. A machine learning algorithm that analyzes a body of audio data (for example, audiobooks) and finds patterns between a speaker's intonation and the punctuation marks in the text may well be able to handle this task. Also planned is an improved text analytics function. With it, customers will be able to analyze text channels of communication with their audience (e-mail, chats, forums) for mentions of specified topics and key phrases. ORBL believes this function is also useful for training chatbots, so that they can easily read typos and errors in messages from clients. In addition, the company is working on speech synthesis, on the basis of which it plans to build full-fledged voice bots that can, for example, advise clients by phone.