
Somnolent Listener


Somnolent Listener is a partly intelligent, partly hearing-impaired speech recognition system that
interprets speech | produces subtitles | captures keywords | records a transcript | creates visualizations


The idea centers on making speeches and lectures more interactive. The listener records the speech, produces real-time subtitles, maintains a transcript of the entire recording, and takes notes at the same time.

There are three associated visualizations. You can switch between them with the arrow keys or the number keys:
1. Speaker Haze - A particle system that maps the volume level of each audio input produced during the piece.
2. Aural Planets - A planetary system that responds whenever it encounters an audio input.
3. Keynotes - Uses the recorded transcript to generate keywords. Each keyword has a weight based on the number of times it is repeated, and is associated with other keywords based on how close together in time they were spoken. It acts as an automatic note-taking system for the student or listener.
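
The Keynotes idea can be sketched in plain JavaScript. This is a minimal illustration, not the project's actual code: the function name buildKeynotes, the stop-word list, the { word, time } input shape, and the 5-second window are all assumptions made for the example.

```javascript
// Hypothetical stop-word list; a real note taker would need a longer one.
const STOP_WORDS = new Set(['the', 'a', 'an', 'is', 'of', 'and', 'to', 'in', 'it']);

// entries: [{ word: 'gravity', time: 1.2 }, ...] with time in seconds.
// The entries are assumed to be in the order they were spoken.
function buildKeynotes(entries, windowSeconds = 5) {
  const keywords = entries
    .map((e) => ({ word: e.word.toLowerCase(), time: e.time }))
    .filter((e) => !STOP_WORDS.has(e.word));

  // Weight = number of times the keyword was spoken.
  const weights = new Map();
  for (const { word } of keywords) {
    weights.set(word, (weights.get(word) || 0) + 1);
  }

  // Association = number of times two different keywords were spoken
  // within windowSeconds of each other.
  const associations = new Map();
  for (let i = 0; i < keywords.length; i++) {
    for (let j = i + 1; j < keywords.length; j++) {
      // Entries are time-ordered, so later ones are even further away.
      if (keywords[j].time - keywords[i].time > windowSeconds) break;
      if (keywords[i].word === keywords[j].word) continue;
      const key = [keywords[i].word, keywords[j].word].sort().join('|');
      associations.set(key, (associations.get(key) || 0) + 1);
    }
  }
  return { weights, associations };
}
```

The weights could drive the size of each keyword on screen, and the associations could decide which keywords are drawn close together or connected.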

The listener is prone to making errors while listening, hence the title Somnolent Listener.


The idea was inspired by my own inability to take notes during a lecture. Taking notes made me split my attention between listening and writing. The continuity of notes is also generally lost over a longer period of time. And it is difficult to capture everything that is said during a lecture unless the lecture is recorded on video.

The question: "How to make lectures more informative and interactive?"
The solution: "To translate the lecture into a performance."

As described above, the piece adds several dimensions to a spoken-word setting so that the audience feels more engaged and can more easily keep track of what was said.

The piece was built using:
1. A microphone for audio input.
2. Processing.js and p5.js for creating the visualizations and the note taker.
3. A speech recognition library to convert speech to text.
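
For illustration, the three pieces above might be wired together roughly as follows. This is only a sketch under assumptions, not the project's source: it assumes p5.js, p5.sound.js and p5.speech.js are loaded through script tags in a browser that supports speech recognition, and the helper addToTranscript, the variable names and the canvas size are invented for the example.

```javascript
// Assumes p5.js, p5.sound.js and p5.speech.js are loaded via <script> tags.
let mic;             // p5.AudioIn from p5.sound
let speechRec;       // p5.SpeechRec from p5.speech
let subtitle = '';   // most recent recognised phrase
let transcript = []; // every phrase heard so far, with a timestamp

// Hypothetical helper: return a new transcript with the phrase appended.
// Blank phrases are ignored.
function addToTranscript(log, phrase, timeMs) {
  const text = phrase.trim();
  return text ? log.concat({ text, timeMs }) : log;
}

function setup() {
  createCanvas(640, 360);

  // Microphone input, used here only for its volume level.
  mic = new p5.AudioIn();
  mic.start();

  // Speech to text; gotSpeech runs whenever a phrase is recognised.
  speechRec = new p5.SpeechRec('en-US', gotSpeech);
  speechRec.continuous = true;      // keep listening after each phrase
  speechRec.interimResults = false; // only report finished phrases
  speechRec.start();
}

function gotSpeech() {
  if (!speechRec.resultValue) return;
  subtitle = speechRec.resultString;
  transcript = addToTranscript(transcript, subtitle, millis());
}

function draw() {
  background(0);
  const level = mic.getLevel(); // amplitude between 0.0 and 1.0
  const diameter = 20 + level * 300;
  noStroke();
  fill(255);
  ellipse(width / 2, height / 2, diameter, diameter); // louder input, bigger circle
  textAlign(CENTER);
  text(subtitle, width / 2, height - 30); // subtitle line
}
```

Keeping the transcript as a list of timestamped phrases means the same data can feed both the subtitles and any later keyword extraction.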

The source code can be found on GitHub.


This simple system has several possible applications:
1. It acts as an automatic note-taking device for a student or listener, so that a person can give the speaker undivided attention without worrying about capturing everything that is said.
2. For a person whose first language isn't English, it can often be difficult to follow what is being said. The subtitles help such an audience member understand the words.
3. The transcript can be used to look back at the lecture and find references. And if one is sleepy or inattentive in class, it shows what was said.
4. The visualizations make the lecture more engaging.


Project by Utsav Chadha

Thank you:
Daniel Shiffman, instructor and guide for Nature of Code
Luke DuBois, for the p5.speech.js library
Jason Sigal, for the p5.sound.js library



This project was created as part of the Nature of Code class at NYU ITP (Spring 2017).