Use Open JTalk from Elixir. This package builds a local open_jtalk CLI and, by default, bundles a UTF-8 dictionary and an HTS voice (you can disable this), exposing convenient functions:
- `OpenJTalk.say/2`: synthesize and play via a system audio player
- `OpenJTalk.to_wav_file/2`: synthesize text to a WAV file
- `OpenJTalk.to_wav_binary/2`: synthesize and return WAV bytes
- `OpenJTalk.Wav.concat_binaries/1`: merge multiple WAV binaries (same format)
- `OpenJTalk.Wav.concat_files/1`: merge multiple WAV files from paths (same format)
Add the dependency to your mix.exs:
```elixir
def deps do
  [
    {:open_jtalk_elixir, "~> 0.3"}
  ]
end
```

Then:
```sh
mix deps.get
mix compile
```

On first compile the project may download and build MeCab, HTS Engine API, and Open JTalk. By default it also downloads and bundles a UTF-8 dictionary and a Mei voice into `priv/` (you can turn this off with `OPENJTALK_BUNDLE_ASSETS=0`).
You’ll need common build tools: gcc/g++, make, curl, tar, unzip. On macOS, the Xcode Command Line Tools are sufficient.
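Before the first compile, it can be handy to confirm that these tools are on `$PATH`. The following is a minimal sketch and not part of this package: the module name `BuildToolsSketch` is hypothetical, and the tool list is simply the one given above.

```elixir
defmodule BuildToolsSketch do
  # Tool names taken from the prerequisites listed above.
  @tools ~w(gcc g++ make curl tar unzip)

  # Returns the subset of `tools` that cannot be found on $PATH.
  def missing(tools \\ @tools) do
    Enum.reject(tools, &System.find_executable/1)
  end
end

case BuildToolsSketch.missing() do
  [] -> IO.puts("all build tools found")
  missing -> IO.puts("missing build tools: #{Enum.join(missing, ", ")}")
end
```

Saved as a script, this can be run with `elixir` before `mix compile`.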
Optional environment flags (honored by the Makefile):
- `OPENJTALK_FULL_STATIC=1`: attempt a fully static `open_jtalk` (Linux only; requires static libstdc++)
- `OPENJTALK_BUNDLE_ASSETS=0|1`: whether to bundle dictionary/voice into `priv/`
Host builds (compile and run on the same machine):
- Linux x86_64
- Linux aarch64
- macOS 14 (arm64, Apple Silicon)
Cross-compile (host → target):
- Linux x86_64 → Nerves rpi4 (aarch64)
```elixir
# play via system audio player (aplay/paplay/afplay/play)
OpenJTalk.say("元氣ですかあ 、元氣が有れば、なんでもできる")
```

All synthesis calls accept the same options (values are clamped):
- `:timbre`: voice color offset `-0.8..0.8` (default `0.0`)
- `:pitch_shift`: semitones `-24..24` (default `0`)
- `:rate`: speaking speed `0.5..2.0` (default `1.0`)
- `:gain`: output gain in dB (default `0`)
- `:voice`: path to a `.htsvoice` file (optional)
- `:dictionary`: path to a directory containing `sys.dic` (optional)
- `:timeout`: max runtime in ms (default `20_000`)
- `:out`: output WAV path (only for `to_wav_file/2`)
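To illustrate what "clamped" means for the ranged options, the sketch below pins out-of-range values to the nearest documented bound. It is an assumption about the behavior, not the library's actual code; `ClampSketch` and `clamp_opts/1` are hypothetical names, and options without a documented range (such as `:gain`) are passed through unchanged.

```elixir
defmodule ClampSketch do
  # Ranges copied from the option list above.
  @ranges %{timbre: {-0.8, 0.8}, pitch_shift: {-24, 24}, rate: {0.5, 2.0}}

  # Pins each ranged option to its documented bounds; leaves others as-is.
  def clamp_opts(opts) do
    Enum.map(opts, fn {key, value} ->
      case Map.fetch(@ranges, key) do
        {:ok, {lo, hi}} -> {key, value |> max(lo) |> min(hi)}
        :error -> {key, value}
      end
    end)
  end
end

IO.inspect(ClampSketch.clamp_opts(rate: 3.0, pitch_shift: -30, gain: 5))
```

In practice you would pass the raw options straight to a synthesis call such as `OpenJTalk.to_wav_binary/2`, since the clamping is stated to happen on the library side.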
You can combine multiple WAVs (same format: channels/rate/bit depth/etc.) into one:
```elixir
{:ok, a} = OpenJTalk.to_wav_binary("これは一つ目。")
{:ok, b} = OpenJTalk.to_wav_binary("これは二つ目。")
{:ok, c} = OpenJTalk.to_wav_binary("これは三つ目。")

{:ok, merged} = OpenJTalk.Wav.concat_binaries([a, b, c])

# or from files:
# {:ok, merged} = OpenJTalk.Wav.concat_files(["a.wav", "b.wav", "c.wav"])
```

The package resolves required assets in this order:
- Environment variable override
- Bundled asset in `priv/`
- System-installed location
For the `open_jtalk` CLI:

- Env: `OPENJTALK_CLI` (full path to `open_jtalk`).
- Bundled: `priv/bin/open_jtalk` (built during compile).
- System: `open_jtalk` found on `$PATH`.
For the dictionary:

- Env: `OPENJTALK_DICTIONARY_DIR` (directory containing `sys.dic`).
- Bundled: `priv/dictionary/sys.dic` or any `priv/dictionary/**/sys.dic` (e.g. `naist-jdic`).
- System: common locations such as `/var/lib/mecab/dic/open-jtalk/naist-jdic`, `/usr/lib/*/mecab/dic/open-jtalk/naist-jdic`, etc.
For the voice:

- Env: `OPENJTALK_VOICE` (path to a `.htsvoice` file).
- Bundled: first file matching `priv/voices/**/*.htsvoice`.
- System: standard locations like `/usr/share/hts-voice/**` or `/usr/local/share/hts-voice/**`.
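The env, bundled, system precedence can be pictured as "first candidate that exists wins". Below is a minimal sketch of that idea for the dictionary only. It is not the library's implementation: `ResolveSketch` and its functions are hypothetical, the bundled lookup is simplified to a single directory rather than the `**` search, and only one of the system locations is tried.

```elixir
defmodule ResolveSketch do
  # Returns {:ok, source, dir} for the first directory that contains sys.dic,
  # or :error when none of the candidates qualifies.
  def first_dictionary(candidates) do
    Enum.find_value(candidates, :error, fn {source, dir} ->
      if is_binary(dir) and File.regular?(Path.join(dir, "sys.dic")) do
        {:ok, source, dir}
      end
    end)
  end

  # Candidate directories in priority order: env override, bundled, system.
  # `priv_dir` is supplied by the caller (an assumption for this sketch).
  def dictionary_candidates(priv_dir) do
    [
      {:env, System.get_env("OPENJTALK_DICTIONARY_DIR")},
      {:bundled, Path.join(priv_dir, "dictionary")},
      {:system, "/var/lib/mecab/dic/open-jtalk/naist-jdic"}
    ]
  end
end

"priv"
|> ResolveSketch.dictionary_candidates()
|> ResolveSketch.first_dictionary()
|> IO.inspect(label: "dictionary")
```

Tagging each candidate with its source makes it easy to report where an asset came from, which is useful when debugging an unexpected voice or dictionary.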
If you change environment variables at runtime (or move files), refresh the cached paths:
```elixir
OpenJTalk.Assets.reset_cache()
```

This library is Nerves-aware. When `MIX_TARGET` is set, the build defaults to:
- `OPENJTALK_FULL_STATIC=1`: try to statically link the CLI on Linux targets when possible
- `OPENJTALK_BUNDLE_ASSETS=1`: bundle CLI, dictionary, and voice into `priv/`
So for many projects no extra configuration is needed.
```sh
export MIX_TARGET=rpi4
mix deps.get
mix compile
mix firmware
```

On the device:
```elixir
{:ok, info} = OpenJTalk.info()
# bundled assets should show up as :bundled

OpenJTalk.say("こんにちは")
```

`OpenJTalk.say/2` requires a system audio player. Most Nerves images use ALSA `aplay`. If your image does not include a player:
- add one to the system image, or
- use `OpenJTalk.to_wav_file/2` and play the WAV with your chosen mechanism.
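As one possible "chosen mechanism", the sketch below looks for a player on `$PATH` and hands it an existing WAV file. `PlayerSketch` is a hypothetical helper, not part of this library; the candidate names are the players mentioned earlier in this README, and passing the file path as the only argument is an assumption that holds for their basic usage.

```elixir
defmodule PlayerSketch do
  # Player names mentioned earlier in this README.
  @players ~w(aplay paplay afplay play)

  # Returns {:ok, path} for the first player found on $PATH, else :error.
  def find_player(candidates \\ @players) do
    Enum.find_value(candidates, :error, fn name ->
      case System.find_executable(name) do
        nil -> nil
        path -> {:ok, path}
      end
    end)
  end

  # Plays an existing WAV file with the first available player.
  # Returns :ok, :error (no player), or {:error, {:player_exit, status}}.
  def play(wav_path) do
    with {:ok, player} <- find_player() do
      {_output, status} = System.cmd(player, [wav_path], stderr_to_stdout: true)
      if status == 0, do: :ok, else: {:error, {:player_exit, status}}
    end
  end
end
```

After `OpenJTalk.to_wav_file/2` has written the file named by `:out`, that same path can be handed to `PlayerSketch.play/1`.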
Bundling the full dictionary + voice + binary increases firmware size. Approximate (uncompressed) sizes:
- Dictionary: ~103 MB
- Mei voice: ~2.2 MB
- CLI binary: ~2.4 MB
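Those figures add up to roughly 108 MB. To see what a particular build actually bundled, a sketch like the following sums the file sizes under a directory. `SizeSketch` is a hypothetical helper, and the `priv` path passed to it is an assumption about where your build placed the assets.

```elixir
defmodule SizeSketch do
  # Sums the sizes (in bytes) of all regular files under `dir`, recursively.
  def dir_size_bytes(dir) do
    dir
    |> Path.join("**/*")
    |> Path.wildcard(match_dot: true)
    |> Enum.filter(&File.regular?/1)
    |> Enum.map(fn path -> File.stat!(path).size end)
    |> Enum.sum()
  end
end

bytes = SizeSketch.dir_size_bytes("priv")
IO.puts("priv/ holds #{Float.round(bytes / 1_000_000, 1)} MB")
```

A directory that does not exist simply yields 0 bytes, so the helper is safe to call before the first compile.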
If that’s too large you can avoid bundling at compile time and provision assets separately (rootfs overlay, /data, OTA, etc.):
```sh
MIX_TARGET=rpi4 OPENJTALK_BUNDLE_ASSETS=0 mix deps.compile open_jtalk_elixir
```

Then point the library to the provisioned assets (for example in `config/runtime.exs`):
```elixir
System.put_env("OPENJTALK_CLI", "/data/open_jtalk/bin/open_jtalk")
System.put_env("OPENJTALK_DICTIONARY_DIR", "/data/open_jtalk/dic")
System.put_env("OPENJTALK_VOICE", "/data/open_jtalk/voices/mei_normal.htsvoice")
OpenJTalk.Assets.reset_cache()
```

How you provision those files into your image is outside the scope of this library.
This package does not redistribute third-party assets by default. At compile time it may download and build the following components:
- Open JTalk 1.11
  - License: Modified BSD (3-Clause)
  - Source: http://open-jtalk.sourceforge.net/
- HTS Engine API 1.10
  - License: Modified BSD (3-Clause)
  - Source: http://hts-engine.sourceforge.net/
- MeCab 0.996
  - License: Tri-licensed (GPL / LGPL / BSD); used under BSD terms
  - Source: https://taku910.github.io/mecab/
- Open JTalk Dictionary (NAIST-JDIC UTF-8) 1.11
  - License: BSD-style by NAIST
  - Source: https://sourceforge.net/projects/open-jtalk/files/Dictionary/
- HTS Voice “Mei” (MMDAgent_Example 1.8)
  - License: CC BY 3.0
  - Source: https://sourceforge.net/projects/mmdagent/files/MMDAgent_Example/
  - Attribution: “HTS Voice ‘Mei’ © Nagoya Institute of Technology, licensed CC BY 3.0.”