@ajgolledge ajgolledge commented Jun 3, 2025

This PR provides two new services from Aristech:

Speech-To-Text

This service is called "aristech-transcribe" and can be invoked from the Call-API "startConversation" under this name, together with the following JSON parameter:

{ "language": "de_DE" }

Note that this is in locale format (underscore-separated), not BCP 47. Simply using "de" also works; I have not noticed any difference when specifying a region, and the same holds for English ("en").
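
For callers that hold BCP 47 tags such as "de-DE", a conversion to the locale format might look like the sketch below. The helper name to_locale_format is hypothetical and not part of this PR; it only handles a language subtag plus an optional region and ignores anything after that.

```rust
/// Hypothetical helper (not part of this PR): normalizes a language tag to the
/// underscore-separated locale form, e.g. "de-DE" -> "de_DE".
/// Only the language and an optional region subtag are considered.
fn to_locale_format(tag: &str) -> String {
    let mut parts = tag.split(|c: char| c == '-' || c == '_');
    let language = parts.next().unwrap_or("").to_ascii_lowercase();
    match parts.next() {
        // A region is present: upper-case it and join with an underscore.
        Some(region) if !region.is_empty() => {
            format!("{}_{}", language, region.to_ascii_uppercase())
        }
        // No region: the bare language code is passed through.
        _ => language,
    }
}

fn main() {
    for tag in ["de-DE", "de_DE", "en", "en-us"] {
        println!("{} -> {}", tag, to_locale_format(tag));
    }
}
```

Passing the bare language through unchanged matches the observation that "de" and "en" are accepted as they are.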

An entry like this in the ivr.toml file ensures that authentication is taken care of.

[[contextSwitch.service]]
name = "aristech-transcribe"
params = { apiKey = "an-apikey" }

The following are still open issues:

  • Determine whether credentials-based authentication is likely to become necessary in the future, or whether we can rely on apiKey alone.
  • Is there a silence timeout and, if so, is it configurable? Does the silence_timeout field in EndpointSpec have any effect?
  • When running the example with the default microphone settings (as opposed to explicitly using 16 kHz), does the conversion function currently in use (audio::into_i16) get in the way? In other words, does omitting it improve the performance of the example?
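
For the last point, it may help to see roughly how much work such a conversion does. The sketch below turns normalized f32 samples into 16-bit PCM. It is not the actual audio::into_i16, whose signature is not shown here; f32_to_i16 is a hypothetical stand-in, and it assumes the microphone delivers f32 samples in the range -1.0 to 1.0.

```rust
/// Hypothetical stand-in for a float-to-PCM conversion (not the actual
/// audio::into_i16): converts normalized f32 samples to 16-bit PCM.
fn f32_to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        // Clamp first so out-of-range input saturates instead of wrapping.
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16)
        .collect()
}

fn main() {
    let input = [0.0_f32, 0.5, 1.0, -1.0, 2.0];
    println!("{:?}", f32_to_i16(&input));
}
```

A conversion of this shape is a single pass with one multiplication per sample, so if the example does get slower with it, resampling or buffering would be the more likely suspects. That is a guess until measured.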

Text-To-Speech

This service is called "aristech-synthesize" and can be invoked from the Call-API "startConversation" under this name, together with the following JSON parameter:

{ "voice": "anne_de_DE" }

Currently the only alternative voice available to us is "tom_de_DE".

An entry like this in the ivr.toml file ensures that authentication is taken care of.

[[contextSwitch.service]]
name = "aristech-synthesize"
params = { endpoint = "https://example.com", token = "a-valid-token", secret = "a-valid-secret" }
sampleRate = 22050

Both voices available to us currently work at a sample rate of 22050 Hz. Not specifying this can lead to amusing results 😄
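
To illustrate why the rate matters: audio synthesized at 22050 Hz but played back at a different rate changes speed and pitch by the ratio of the two rates. The sketch below is plain arithmetic and independent of this PR's code; the 16000 Hz figure is only an assumed example of a mismatched rate, not a known default.

```rust
/// Duration in seconds of `sample_count` mono samples played at `sample_rate` Hz.
fn duration_secs(sample_count: usize, sample_rate: u32) -> f64 {
    sample_count as f64 / sample_rate as f64
}

/// Playback speed factor when audio produced at `actual_rate` is played at
/// `assumed_rate`: above 1.0 is faster and higher, below 1.0 slower and lower.
fn speed_factor(actual_rate: u32, assumed_rate: u32) -> f64 {
    assumed_rate as f64 / actual_rate as f64
}

fn main() {
    let samples = 22050 * 3; // three seconds of audio at the voices' rate
    println!("at 22050 Hz: {:.2} s", duration_secs(samples, 22050));
    // 16000 Hz is an assumed mismatched rate, used here for illustration only.
    println!("at 16000 Hz: {:.2} s", duration_secs(samples, 16000));
    println!("speed factor: {:.3}", speed_factor(22050, 16000));
}
```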

Open Issues

  • Are any other voices available to us apart from "tom_de_DE" and "anne_de_DE"?
@pragmatrix pragmatrix marked this pull request as draft June 4, 2025 05:27
@pragmatrix
Owner

Just minor changes: in transcribe.rs I've removed the empty string ("") as the default for model / prompt and adjusted the test cases. I like the deserialization of the different credentials options; I'll adopt this for Azure.

@pragmatrix
Owner

As discussed, merging even though some open issues remain.

@pragmatrix pragmatrix marked this pull request as ready for review June 11, 2025 06:50
@pragmatrix pragmatrix merged commit 1acd92b into pragmatrix:master Jun 11, 2025
3 checks passed
@pragmatrix pragmatrix mentioned this pull request Jun 11, 2025
5 tasks