Improving speech recognition in automated testing

The continuous development of our test environments is a job for creative minds.

We are located in a region with many good scientists and technical universities, and we are committed to investing in the innovative minds of our students and the fresh skills they can offer. One such student is Simon, who studies at FAU here in Erlangen. For the last few months, he has been an intern in our testing team. Here he explains his ambitions, the challenges he faced and his successful implementation of improved speech recognition in automated testing.

My main aim was to gain practical experience in software development and, at the same time, learn more about the current state of defibrillator development. In discussions with my team, one goal came to the fore: to improve and expand the speech recognition used in test automation, which checks the timing and content of the defibrillators’ voice prompts and then automatically converts them into actions.

The challenges I faced ranged from concurrency issues (due to the simultaneous recording of audio, translation/recognition and evaluation of the output) to the adaptation of language models and the programming of a smooth interface for transferring audio to speech recognition.
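To make the concurrency problem concrete, here is a minimal sketch of a three-stage pipeline in which recording, recognition and evaluation run at the same time. It is an illustration only, not the code from the project: the function names, the queue sizes and the `transcribe` callable are all assumptions, and a real recognizer would take the place of `transcribe`.

```python
import queue
import threading

STOP = object()  # sentinel that marks the end of a stream


def record(chunks, audio_q):
    """Producer: push audio chunks, then signal the end of the recording."""
    for chunk in chunks:
        audio_q.put(chunk)
    audio_q.put(STOP)


def recognize(audio_q, text_q, transcribe):
    """Middle stage: turn each chunk into text with the given callable."""
    while True:
        chunk = audio_q.get()
        if chunk is STOP:
            text_q.put(STOP)  # pass the end marker on to the next stage
            return
        text_q.put(transcribe(chunk))


def evaluate(text_q, results):
    """Consumer: collect the recognized prompts for later checks."""
    while True:
        text = text_q.get()
        if text is STOP:
            return
        results.append(text)


def run_pipeline(chunks, transcribe):
    # Bounded queues: a slow stage blocks the stage before it instead of
    # letting unprocessed audio pile up.
    audio_q = queue.Queue(maxsize=8)
    text_q = queue.Queue(maxsize=8)
    results = []
    threads = [
        threading.Thread(target=record, args=(chunks, audio_q)),
        threading.Thread(target=recognize, args=(audio_q, text_q, transcribe)),
        threading.Thread(target=evaluate, args=(text_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each stage owns its data and talks to its neighbours only through thread-safe queues, and the sentinel travels through every stage so that all threads terminate. Keeping shared state this small is one common way to avoid the race conditions and deadlocks mentioned above.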

The solutions I developed were based on meticulous concept drawings that illustrated the library’s process relationships. This overview made it easier to identify and resolve race conditions and deadlocks. It also led to the integration of an additional language model, which improves accuracy, measured as word error rate, at the cost of a longer runtime.
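For readers unfamiliar with the metric: the word error rate is the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name is an assumption, and the example prompt in the usage note is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (one fewer than in curr) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With a hypothetical four-word prompt, `word_error_rate("start chest compressions now", "start chest compression now")` gives 0.25, because one of the four reference words was recognized incorrectly.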

Word boundaries instead of time intervals

The segments to be recognized are often split at fixed time intervals, so long words can cross a segment boundary and are then either not recognized or recognized twice. To resolve this problem, I implemented a “Voice Activity Detection” function that extends the transfer from audio recording to speech recognition. It detects the speech in the audio and splits it only at word boundaries, so that words are preserved intact.
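The idea of cutting at pauses rather than at fixed times can be sketched with a very simple energy-based detector. This is a stand-in for illustration and not the detection method used in the project, which is not described here. The function name, the frame length, the threshold and the minimum pause length are all assumptions.

```python
def split_at_pauses(samples, frame_len=160, threshold=0.01, min_silence_frames=3):
    """Return (start, end) sample ranges that contain speech.

    A frame counts as speech when its mean squared amplitude is above
    `threshold`. A segment is closed only after `min_silence_frames` quiet
    frames in a row, so a short gap inside a word does not cause a cut.
    """
    segments = []
    start = None          # first sample of the current speech segment
    last_speech_end = 0   # end of the most recent speech frame
    silent = 0            # consecutive quiet frames inside a segment

    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            if start is None:
                start = pos
            last_speech_end = pos + len(frame)
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_silence_frames:
                segments.append((start, last_speech_end))
                start = None
                silent = 0

    if start is not None:  # audio ended while speech was still active
        segments.append((start, last_speech_end))
    return segments
```

Each returned range can then be passed to the recognizer as one unit. Because a cut happens only after a sufficiently long pause, a long word is never divided between two segments, which is what causes missed or duplicated words with fixed time windows.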

Looking back, during my time at Corscience I was given all the freedom I needed to be able to develop productively, as well as the opportunity to familiarize myself with the most important areas of software development. From day one, the team was welcoming, friendly and always ready to help, and I was able to tap into individual team members’ expertise in their particular field.

Compared with my experience in university research, I have found interdisciplinary collaboration within the company to be far more effective and results-oriented, thanks to the shared objectives and the business background. At university, there is often a different focus and less dynamism.

All in all, I found my internship very rewarding. On top of developing my technical skills, I also gained insights into how other teams work together, their fields of work and how the results of research are applied in practice.

Simon achieved a lot with us and will continue to do so after his Bachelor’s thesis “Auditory Attention Decoding” is done. We are delighted that, as a Master’s working student, Simon will again contribute to maintaining our product promise – delivering reliable and sophisticated products.
