How This Voice AI Startup Is Creating Human-Like Conversations

For this week’s startup feature, Analytics India Magazine spoke to Abhimanyu, co-founder of Agara, to learn how the company is using real-time voice AI to automate the entire customer support function in enterprises.

Agara is an autonomous digital voice agent powered by real-time voice AI. The platform is contextualised and pre-trained to power natural conversations over voice, without human assistance.

The voice agent applies autonomous technology to solve a real-world problem in customer support. Agara’s proprietary machine learning models are pre-trained on industry-specific customer care data.

Abhimanyu said, “Focused currently on voice communication, our AI platform works much like a human brain, with four lobes: one each for Speech Recognition, Natural Language Understanding, the Conversations Module and Text-to-Speech. The lobes handle speech-to-text conversion, recognise intents, detect the consumer’s emotional state, extract information from the discussion and respond.”
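To make the four-lobe description concrete, here is a minimal Python sketch of how such stages could be chained for a single caller turn. It is an illustration under stated assumptions, not Agara’s implementation: the `VoicePipeline` class, the `Understanding` record and the `fake_*` stand-in functions are all hypothetical, and the real stages are proprietary ML models. The article also describes an SLU module running in parallel with ASR, which this linear chain leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Understanding:
    """Hypothetical structured output of the NLU stage."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)
    emotion: str = "neutral"


# Each "lobe" is modelled as a plain function (hypothetical signatures).
SpeechToText = Callable[[bytes], str]
Understand = Callable[[str], Understanding]
Converse = Callable[[Understanding], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains the four stages: ASR -> NLU -> conversation -> TTS."""
    asr: SpeechToText
    nlu: Understand
    conversation: Converse
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.asr(audio)                         # 1. speech recognition
        understanding = self.nlu(text)                 # 2. intent, entities, emotion
        reply_text = self.conversation(understanding)  # 3. choose the response
        return self.tts(reply_text)                    # 4. synthesise the reply


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already text


def fake_nlu(text: str) -> Understanding:
    if "refund" in text.lower():
        return Understanding(intent="request_refund")
    return Understanding(intent="unknown")


def fake_conversation(u: Understanding) -> str:
    replies = {
        "request_refund": "Sure, can I have your order number?",
        "unknown": "Sorry, could you repeat that?",
    }
    return replies[u.intent]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # a real TTS engine would return audio samples


pipeline = VoicePipeline(fake_asr, fake_nlu, fake_conversation, fake_tts)
print(pipeline.handle_turn(b"I want a refund"))
# prints b'Sure, can I have your order number?'
```

Keeping each stage behind a plain callable is one way to swap a proprietary model for a public service (or vice versa) without touching the rest of the turn logic.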

What’s the differentiator

Abhimanyu said, “Agara is built specifically for autonomous voice conversations. Every aspect of our product is hyper-optimised to deliver quick, intelligent responses to customers over voice.”

He highlighted a few key innovations from their lab, such as:

  • Patent-pending, proprietary Spoken Language Understanding modules specialised in accurately identifying key entities in a conversation, such as names, numbers, email addresses, cities, prices and more.
  • Best-in-class speech recognition accuracy on telephone audio
  • Robust behaviour with accents and noisy inputs
  • A mix of GPU- and CPU-based infrastructure for low-latency responses
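Entities such as email addresses and numbers are hard partly because callers say them out loud in words. According to the article, Agara’s SLU modules work directly on audio; the sketch below swaps in a much simpler, text-level technique (rule-based normalisation of a transcript) purely to illustrate the problem. The word maps and both function names are hypothetical.

```python
import re

# Hypothetical word-to-symbol map for spoken email addresses.
_EMAIL_WORDS = {"at": "@", "dot": ".", "underscore": "_", "dash": "-", "hyphen": "-"}

# Hypothetical spoken-digit map ("oh" is often said for zero on phone calls).
_DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalise_spoken_email(phrase: str) -> str:
    """Turn 'john dot doe at gmail dot com' into 'john.doe@gmail.com'."""
    tokens = phrase.lower().split()
    return "".join(_EMAIL_WORDS.get(tok, tok) for tok in tokens)


def spoken_digits_to_str(phrase: str) -> str:
    """Turn 'nine eight double seven' into '9877'; non-digit words are skipped."""
    out = []
    repeat = 1
    for tok in re.findall(r"[a-z]+|\d", phrase.lower()):
        if tok in ("double", "triple"):
            repeat = 2 if tok == "double" else 3
            continue
        digit = tok if tok.isdigit() else _DIGIT_WORDS.get(tok)
        if digit is not None:
            out.append(digit * repeat)
        repeat = 1  # a multiplier only applies to the very next word
    return "".join(out)


print(normalise_spoken_email("John dot Doe at gmail dot com"))
print(spoken_digits_to_str("nine eight double seven"))
```

Rules like these break easily on accents, noise and mis-transcriptions, which is the kind of failure the article says Agara’s audio-level SLU models are meant to avoid.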

Abhimanyu also listed some of the key technology differentiators, such as:

  • Conversational flexibility with pre-built conversation blocks
  • No-code workflow building
  • Integrated multimodal communication
  • Pure autonomy focus
  • Industry-leading speech understanding
  • One workflow invoked from anywhere
  • Vertically integrated system / 100% autonomous call centre

Use of AI at Agara

According to Abhimanyu, Agara’s custom-trained ASR (automatic speech recognition) system has been trained on several hundred hours of customer support phone call recordings for optimal, context-specific speech recognition. The ASR runs on GPU-based infrastructure to ensure low latency. Alongside the proprietary ASR, Agara also uses a public ASR, namely the Google Enhanced Phone model, for transcription.
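For readers curious what requesting Google’s enhanced phone model looks like, here is a sketch that builds the JSON body for a Cloud Speech-to-Text v1 `speech:recognize` call with `model` set to `phone_call` and `useEnhanced` set to true. The helper name and the audio settings (8 kHz mu-law, `en-US`) are assumptions for illustration, not Agara’s configuration; language availability for enhanced models should be checked against Google’s documentation, and actually sending the request needs OAuth credentials, which are not shown.

```python
import base64
import json

# Public REST endpoint for synchronous recognition (v1 API).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_phone_recognize_request(audio_bytes: bytes, language_code: str = "en-US") -> dict:
    """Hypothetical helper: JSON body asking for the enhanced phone-call model."""
    return {
        "config": {
            "encoding": "MULAW",       # 8-bit mu-law, common for telephony audio (assumed)
            "sampleRateHertz": 8000,   # narrowband phone audio (assumed)
            "languageCode": language_code,
            "model": "phone_call",     # model tuned for phone audio
            "useEnhanced": True,       # request the enhanced variant of that model
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


body = build_phone_recognize_request(b"\xff\x7f" * 4)
print(json.dumps(body["config"], indent=2))
```

The body would be POSTed to `RECOGNIZE_URL`; long calls would normally go through the asynchronous or streaming APIs instead of this synchronous one.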

“Agara’s extended R&D efforts on speech recognition highlighted the need for an additional capability that accounts for errors often found in public speech recognition systems. These errors usually came in one of three forms: irrecoverable transcript errors, accents and intonations, and ambient noise,” Abhimanyu said.

In parallel with the ASRs, the conversational AI platform uses SLU (Spoken Language Understanding) modules to capture specific entities and intents from the caller’s speech, remaining robust to accents and noise. The SLUs are custom machine learning models developed in-house at Agara. They operate directly on speech input to identify and extract specific entities and intents; they do not generate transcripts and output only the required entity or intent.


Talking about the proprietary NLP models, the co-founder said they combine the outputs of the speech-to-text system and the SLU to build a structured understanding of what the caller said. The models are pre-trained on industry-specific datasets to accurately identify the intent, entities, tone and sentiment in the user’s speech.
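One simple way to picture “combining” two sources is a per-slot merge: the transcript-based NLU and the SLU each propose entity values with a confidence score, and the most confident proposal above a threshold wins. This is a hypothetical late-fusion sketch, not a description of Agara’s models; the `Candidate` type, the source labels and the 0.5 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Candidate:
    slot: str          # e.g. "order_id", "city"
    value: str
    confidence: float  # 0.0-1.0, as reported by the producing model
    source: str        # hypothetical labels: "asr_nlu" or "slu"


def fuse(candidates: Iterable[Candidate], min_confidence: float = 0.5) -> Dict[str, Candidate]:
    """Keep, per slot, the highest-confidence candidate at or above the threshold."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        if cand.confidence < min_confidence:
            continue  # too uncertain to use at all
        current = best.get(cand.slot)
        if current is None or cand.confidence > current.confidence:
            best[cand.slot] = cand
    return best


# Noisy audio: the transcript mangled the order ID, the SLU heard it cleanly.
asr_side = [
    Candidate("city", "Bangalore", 0.62, "asr_nlu"),
    Candidate("order_id", "A1B2", 0.41, "asr_nlu"),
]
slu_side = [Candidate("order_id", "A1B3", 0.88, "slu")]

result = fuse(asr_side + slu_side)
for slot, cand in result.items():
    print(slot, cand.value, cand.source)
```

A production system would more likely learn how to weigh the two sources, but the idea is the same: neither the transcript nor the SLU output is trusted unconditionally.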

The Conversation Blocks handle complex, multi-turn conversations with the caller to collect relevant data, adapt naturally to any change in context, and resolve caller requests fully autonomously. The platform also uses customised versions of publicly available text-to-speech services to deliver responses naturally.
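A common way to build such multi-turn behaviour is slot filling: the block keeps asking for whatever required data is still missing, and accepts corrections as the conversation moves on. The sketch below shows that technique in its simplest form; `ConversationBlock`, `run_turn` and the example slots are hypothetical and are not Agara’s actual Conversation Blocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConversationBlock:
    """Hypothetical block that collects required slots before resolving a request."""
    intent: str
    required: List[str]
    prompts: Dict[str, str]
    filled: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        for slot in self.required:
            if slot not in self.filled:
                return self.prompts[slot]
        return None  # everything has been collected

    def update(self, entities: Dict[str, str]) -> None:
        # Newer values overwrite older ones, so a caller can correct themselves.
        for slot, value in entities.items():
            if slot in self.required:
                self.filled[slot] = value


def run_turn(block: ConversationBlock, entities: Dict[str, str]) -> str:
    """Apply one turn's extracted entities and return the agent's next utterance."""
    block.update(entities)
    prompt = block.next_prompt()
    if prompt is not None:
        return prompt
    details = ", ".join(f"{k}={block.filled[k]}" for k in block.required)
    return f"Resolving {block.intent} with {details}."


block = ConversationBlock(
    intent="reschedule_delivery",
    required=["order_id", "date"],
    prompts={"order_id": "What is your order number?", "date": "Which date suits you?"},
)
print(run_turn(block, {}))                     # asks for the order number
print(run_turn(block, {"order_id": "A1B3"}))   # asks for the date
print(run_turn(block, {"date": "Friday"}))     # resolves
print(run_turn(block, {"date": "Monday"}))     # caller changes their mind
```

Because the block only tracks which slots are still missing, it does not matter in which order the caller supplies them, which is what makes this style of flow tolerant of natural, out-of-order speech.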

Core tech stack

Abhimanyu said the core tech stack consists of several microservices running on top of AWS.

  • The company uses Golang for the core backend services.
  • The ML prediction services use Python and GCP speech services.
  • React powers all the frontend apps.
  • All metadata, including logs, audio and transcripts, is stored in AWS S3.
  • All transactional and platform data goes into a Postgres database.
  • The development and deployment pipeline is automated using GitHub Actions, Terraform and AWS Fargate.
  • Languages: Golang, Python, TypeScript
  • Frameworks: React, Vue
  • Data Storage: AWS RDS Postgres, Redis, Elasticsearch, AWS S3
  • Cloud Hosting: AWS ECS, AWS EC2
  • Deployment: AWS Fargate, Terraform, GitHub Actions

Recent funding

The company has raised a $4.3 million Pre-Series A extension led by UTEC, a Japan-based early-stage deep-tech venture capital firm. Existing investors Blume Ventures and RTP Global also participated in the round.


Agara will focus on three main areas over the next few years:

  • Significantly improve speech understanding accuracy: Agara is in the process of transcribing and annotating data and training its machine learning models to extract vast amounts of intelligence. The platform also manages a growing global panel of data creators. Over the next two years, Agara intends to harness both to significantly improve the accuracy of its proprietary speech understanding.
  • Create near-human conversations that blur the distinction between people and machines, to significantly improve customer experience: Agara uses automated text-to-speech systems to synthesise voice on the fly, and is investing considerable resources in building a text-to-speech system that is highly human-like and tuned specifically to the needs of a person-to-person conversation.
  • Enhance the suite of capabilities that lets users create highly engaging conversation flows across a multitude of scenarios in minutes: with a drag-and-drop framework, Agara intends to simplify creating, managing and improving conversation flows, without the need for any technical expertise.

Ambika Choudhury

A technical journalist who loves writing about machine learning and artificial intelligence. A lover of music, writing and learning something out of the box.

