Behavioral Prototype: Spotify Gesture Control

Feb 25, 2024 • 7 minute read

Intro and Prototype

In this project, our team built and tested a behavioral prototype for a Gesture Recognition Platform designed for Spotify. The challenge was to explore user interaction scenarios using the Wizard of Oz technique, which allows design assumptions to be tested when the actual technology is not readily available.

We wanted to test which gestures users would most intuitively associate with the following actions: pausing/playing a track, increasing/decreasing the volume, and skipping to the next or previous track. Since no gesture recognition technology currently exists for Spotify, we needed a way to simulate the experience convincingly enough to elicit genuine reactions from our participants.

Spotify syncs playback across devices: performing any of the actions above on a mobile device also takes effect on any other device signed in to the same account. Our strategy for this prototype was therefore to have one person (the “wizard”) control the account from a mobile device while the participant interacted with a computer that had the same Spotify account open. When the participant made the correct gesture, the wizard would perform the associated action. Because the wizard was hidden from the participants, it appeared as if the gestures themselves were causing the changes.
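
Had the recognition technology existed, the same account-level control our wizard performed by hand could be driven programmatically through the Spotify Web API's player endpoints. Below is a minimal sketch of that mapping; the requests-based client and the OAuth token handling are assumptions for illustration, not part of our prototype:

```python
import os
import requests

# Hypothetical setup: an OAuth token with the user-modify-playback-state scope.
TOKEN = os.environ["SPOTIFY_TOKEN"]
PLAYER = "https://api.spotify.com/v1/me/player"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def play():
    requests.put(f"{PLAYER}/play", headers=HEADERS)

def pause():
    requests.put(f"{PLAYER}/pause", headers=HEADERS)

def next_track():
    requests.post(f"{PLAYER}/next", headers=HEADERS)

def previous_track():
    requests.post(f"{PLAYER}/previous", headers=HEADERS)

def set_volume(percent: int):
    # 0-100, applied to the currently active device.
    requests.put(f"{PLAYER}/volume",
                 params={"volume_percent": percent}, headers=HEADERS)
```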

The predetermined “most intuitive” gestures we decided on were as follows (a rough sketch of how they might eventually be detected appears after the list):

  • Pause/play: palm out

  • Skip song/rewind: swipe right/left

  • Volume up/down: raise hand/lower hand
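
For reference, below is a rough sketch of how these gestures might eventually be recognized from webcam hand landmarks using MediaPipe Hands. The swipe and raise/lower heuristics and their thresholds are our guesses, not a tested implementation, and the palm-out pose would need a separate static check of finger extension:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def classify(prev, curr):
    """Rough frame-to-frame heuristics over the wrist landmark
    (normalized 0-1 image coordinates). Thresholds are guesses."""
    if prev is None:
        return None
    dx, dy = curr.x - prev.x, curr.y - prev.y
    if dx > 0.15:
        return "skip"         # swipe right
    if dx < -0.15:
        return "rewind"       # swipe left
    if dy < -0.10:
        return "volume_up"    # hand raised (image y grows downward)
    if dy > 0.10:
        return "volume_down"  # hand lowered
    return None               # palm-out would need a static pose check

cap = cv2.VideoCapture(0)
prev_wrist = None
with mp_hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            wrist = results.multi_hand_landmarks[0].landmark[mp_hands.HandLandmark.WRIST]
            gesture = classify(prev_wrist, wrist)
            if gesture:
                # Here we would call the matching player action;
                # a real version would also debounce repeated triggers.
                print(gesture)
            prev_wrist = wrist
        else:
            prev_wrist = None
```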

Facilitator Setup

During our user testing setup, participants sat in front of a laptop with Spotify Desktop open while a speaker positioned behind the laptop played music. The speaker was connected via Bluetooth to a phone logged in to the same Spotify account. This setup allowed the wizard to control playback from the mobile device in response to the participants' gestures, creating the illusion of direct interaction with Spotify via gestures.

Speaker hidden behind laptop

View from participant

User Testing

Participant #1

Participant 1 successfully gestured out the interactions on their first try without any hiccups. We first asked the participant to pause and play the Spotify Top Hits Today playlist by holding their palm out in a stop motion; this gesture works both to start and stop the song. We then suggested that the user might want to skip a song and move forward in their playlist, and they proceeded to swipe left and right to skip tracks accordingly. Finally, we had the participant increase and decrease the volume of the song they were listening to, using a rising and falling gesture with the palm of their hand.

Participant #2

Participant 2 experienced slightly more difficulty. We followed the same structure and asked them to perform the three tasks, starting with pause/play, followed by skip/rewind and volume. The first task proved more difficult than the rest: they did not quite understand the concept of pausing and playing, and instead tried clicking the screen, which we noted. The remaining tasks were understandable once we briefed them on the gestural interactions we were testing. From this test, we noted that first-time users can face a real barrier when interacting with the product, and we would like to accommodate those experiencing that hurdle.

Participant #3

Participant 3 had initial difficulty understanding how the gesture recognition system would be used. Their first reaction was to use the trackpad rather than holding their hand in the air and making a motion, so our facilitator had to clarify again that the system would use the camera and suggest they try a hand motion. On the initial task, they attempted to point at and click the screen before transitioning into correctly pausing and playing the desired song in the playlist. The remaining tasks became easier once they crossed that initial learning barrier.

Evaluation

The project prompt tasked us with crafting a video under two minutes in length to illustrate user interaction with the app. To evaluate the prototype with clarity and structure, we first outlined how effectiveness would be measured. In addition to addressing the provided design research and usability questions, we assessed our prototype against the following criteria:

Design Research and Usability Questions:

  • How can the user effectively control the interface using hand gestures?

  • What are the most intuitive gestures for this application?

  • What level of accuracy is required in this gesture recognition technology?

Evaluation Criteria:

  • Feasibility: whether the gestural interactions can be completed with minimal hiccups and are easily understood.

  • Usability: whether the user can perform all of the tasks accurately and without significant delay, using only their hand in the air.

Video of User Testing

Analysis and Reflection

We felt that our Wizard of Oz prototype went well, as most of our participants were able to perform the gestures successfully without the facilitator telling them what to do. The environmental setup was kept simple so the participants wouldn't feel crowded or intimidated. Measured against our success criteria of feasibility and usability, the gestures were easy to figure out, and our wizard introduced no significant delay between a participant's gesture and the response.

The prototype was effective in achieving the goal we set out with: testing the “most intuitive” gestures. Overall, participants had the easiest time intuitively coming up with and understanding the gestures for increasing/decreasing the volume. All of them made the same gesture (raising/lowering their hand), and it was the one we had initially predicted would be most intuitive. By contrast, all three users produced a different gesture for play/pause.

After presenting in class, feedback indicated that our simulated gesture recognition experience effectively mirrored the envisioned technology and prompted genuine reactions from participants. However, some users were confused when gestures didn't produce the expected results; for example, one participant initially attempted trackpad gestures instead of in-air gestures, highlighting the need for clearer guidance. Although participants had clear preferences among gestures, they struggled to devise alternative intuitive options, suggesting that gesture intuitiveness warrants further exploration. Future iterations will involve more in-depth participant engagement and contextual guidance, such as accounting for the user's physical context (standing, in a car, etc.), to make the system clearer and more effective.