Short name: SMILE
Long name: text-to-Speech and speech-to-text using MachIne LEarning
Company: Tara Hill National Park Teo
Call Stage 1: F4Fp-SME-COD210216 (see call details)
Proposal number: F4Fp-SME-COD210216-01
SUMMARY REMARKS & TESTBEDS
The aim of the proposed SMILE experiments is to enhance the scalability of the Discover Places application using open-source machine learning libraries. In developing tours to date, we have encountered serious scalability issues in recording and editing audio content for tour sites, driven by the cost and effort of sourcing actors and sound engineers. The open-source machine learning library TTS (Text-to-Speech for all) offers a potential solution: it can model an actor’s voice from Tara Hill National Park’s existing, professionally produced tour guide content.
In this proposal, we intend to train a Text-to-Speech model on our professionally produced audio content. We also intend to test the DeepSpeech Speech-to-Text framework to support spoken interaction with tourists. The proposed experiments are expected to bring an immediate benefit to the scalability of the Discover Places tour guide application by supporting the rapid production of high-quality audio content that is tailored to each tourist site and can be updated easily when required.
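As a rough illustration of the data preparation that training a voice model on existing recordings usually involves, the sketch below pairs each audio clip with its script text and writes an LJSpeech-style metadata file (one `id|text|normalized text` line per clip), a layout that many open-source TTS toolkits can read. The directory layout, the file naming, and the `build_metadata` helper are assumptions made for illustration and are not taken from the proposal.

```python
from pathlib import Path


def build_metadata(audio_dir, transcript_dir, out_file):
    """Write an LJSpeech-style metadata file pairing clips with their text.

    Assumed (hypothetical) layout: audio_dir holds <clip_id>.wav files and
    transcript_dir holds matching <clip_id>.txt script files.
    Returns the list of clip ids that were written.
    """
    audio_dir, transcript_dir = Path(audio_dir), Path(transcript_dir)
    written = []
    with open(out_file, "w", encoding="utf-8", newline="") as fh:
        for wav in sorted(audio_dir.glob("*.wav")):
            script = transcript_dir / (wav.stem + ".txt")
            if not script.exists():
                continue  # skip clips that have no matching script
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join(script.read_text(encoding="utf-8").split())
            # The pipe is the field separator, so remove it from the text.
            text = text.replace("|", " ")
            if not text:
                continue  # skip empty scripts
            fh.write(f"{wav.stem}|{text}|{text}\n")
            written.append(wav.stem)
    return written
```

Calling `build_metadata("wavs", "scripts", "metadata.csv")` with these assumed folder names would produce one line per clip that has both a recording and a script. Clips without a script are skipped rather than guessed at, since a mismatched transcript tends to degrade a trained voice.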