
StoryLab Science

Empowering Future Scientists Through Immersive Storytelling.

Project Management
UI Design
UX Research
Prompt Engineering
Instructional Design



Figma Prototype

Video Demo

Research Report

StoryLab Science Presentation


Felege Gebru, Partner
Anabela Tasevska, Illustrator
Michael Davies, Developer


Master's Project

Science education in elementary classrooms is often allotted limited instructional time, making it a challenging subject to teach. To address this, we designed StoryLab Science as an NGSS-aligned supplementary educational tool that promotes scientific literacy and confidence in learners. StoryLab Science uses question-driven narratives and multiple representations of science to deepen learners' understanding. Learners communicate their knowledge to a teachable agent, and StoryLab Science provides timely AI feedback to reinforce correct answers and address misconceptions.

The Design Process

January - March

Research & Needfinding 

My project partner and I wanted to learn why science education is not prioritized in elementary schools. Our approach involved conducting expert interviews with faculty members from the Stanford Graduate School of Education, engaging with elementary and middle school teachers, and gathering valuable perspectives from parents through surveys. To deepen our understanding of the subject, we also undertook an in-depth literature review to capture the broader educational landscape.

We identified 3 key challenges:

  1. Pedagogical Approaches: Research indicates that elementary students frequently conceptualize science in ways that may differ from teachers' expectations. This disparity can hinder the learning process, as individual misconceptions often go unaddressed within classroom settings.

  2. Prioritization within the Standard Curriculum: In elementary education, there is often a heightened focus on reading and math, primarily due to the significance of these subjects in standardized assessments. A survey revealed that the preparation for these tests can pose a challenge to the effective inclusion of science education within the elementary school curriculum.

  3. Student Readiness: Understanding scientific concepts can present a challenge for young children, as it presupposes a foundation in skills such as reading comprehension and critical thinking. Since these skills are still in the developmental stages for kids, science education might be difficult for them.

With these challenges in mind, we aimed to answer the question:


How might we design a tool that is rooted in pedagogy, can be integrated into the school curriculum, and meets students where they are?



Initially, we developed narratives with scientific information and used questions to assess learners' understanding. When learners answered questions incorrectly, we provided them with scaffolded explanations. However, during user testing, we discovered that the explanations were not as clear as we had hoped, and the approach of embedding scientific information within the narratives did not keep learners fully engaged.


In the next iteration, we removed the scientific information from the story. Instead, we adopted a new approach by creating multiple representations of science concepts. These representations are multimodal, offering learners various ways to explore the scientific concepts. Through user testing, we found that this shift provided learners with greater agency, empowering them to actively discover scientific principles on their own terms.


Additionally, we conducted user testing to gauge the comfort level and engagement of learners with a teaching agent. We observed that learners became more invested in the story. They actively took on the role of the teacher within the narrative, guiding the character's understanding of science to advance the story.


We recognized the need for providing personalized feedback on learner responses, leading us to explore the use of ChatGPT. As our first prototyping step, we developed a “Prompt Generator” tool to help us efficiently prompt engineer for feedback tailored to each individual learner. A key component of our feedback approach involves incorporating students’ interests, such as basketball or ballet, to create a stronger connection to the subject matter. Using the Prompt Generator, we fine-tuned our prompts, considering various parameters like name, age, interests, question, and responses, for effective and relevant feedback.
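To illustrate the idea, here is a minimal sketch of how a Prompt Generator like ours can assemble a feedback prompt from per-learner parameters. The function name, wording, and sample values (Ava, basketball) are hypothetical, not our exact shipped prompts; only the parameter set (name, age, interests, question, response) comes from our process.

```python
# Hypothetical Prompt Generator sketch: wording and sample data are
# illustrative placeholders, not the exact prompts used in StoryLab Science.

def build_feedback_prompt(name, age, interests, question, response):
    """Assemble a personalized feedback prompt from learner parameters."""
    interest_list = ", ".join(interests)
    return (
        f"You are a friendly science tutor for {name}, a {age}-year-old "
        f"who loves {interest_list}.\n"
        f"Question asked: {question}\n"
        f"{name}'s answer: {response}\n"
        "Give warm, age-appropriate feedback: reinforce what is correct, "
        "gently address any misconceptions, and connect the concept to "
        f"{name}'s interests."
    )

prompt = build_feedback_prompt(
    name="Ava",
    age=8,
    interests=["basketball"],
    question="Why do objects fall to the ground?",
    response="Because gravity pulls them down.",
)
print(prompt)
```

Keeping the parameters explicit like this is what let us iterate quickly: we could vary one field at a time and compare the resulting feedback.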


Utilizing the generated prompts, we initially tested the outcomes in the OpenAI Playground, enabling rapid iterations to identify the most effective prompts for feedback. Using those findings, we designed our own interface, as depicted in the image on the right, leveraging GPT running on our own servers. If you'd like to see some of our tests, check out the appendix of our research report!

From our user testing, we found that learners prefer verbalizing their understanding over writing or answering multiple-choice questions. With a vision of enabling learners to engage in verbal communication with the teachable agents in the story, we recognized the need to implement speech-to-text technology. This essential step would allow us to convert their spoken responses into text format and feed them into ChatGPT to provide valuable feedback.


To begin prototyping, we used a series of YouTube clips of kids explaining science concepts such as gravity and transcribed them with the Google Cloud Speech-to-Text online service. Then we stress-tested transcriptions using audio from our own user testers in noisier environments.


After analyzing these transcription results, we computed the Word Error Rate, a critical metric in speech recognition. Word Error Rate measures the accuracy of a transcription compared to the reference transcript, with lower values indicating higher accuracy. The initial results were promising, and even in cases where transcriptions were not entirely accurate, ChatGPT demonstrated its ability to interpret the data and provide valuable feedback. I explain this further in the Efficacy section below.
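For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance (insertions, deletions, substitutions) between the reference and the hypothesis, divided by the reference length. A minimal sketch, with illustrative sentences rather than actual study transcripts:

```python
# Word Error Rate (WER): word-level Levenshtein distance / reference length.
# Example sentences are illustrative, not real transcripts from our study.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming table of edit distances between word prefixes.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("pulls" -> "pull") over a 4-word reference.
print(wer("gravity pulls things down", "gravity pull things down"))  # → 0.25
```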

After obtaining positive results from testing with recorded audio, we integrated real-time speech recognition into one of our prototypes. This addition enhanced the learning experience, enabling learners to engage in live conversations with the characters in the story!

The StoryLab Science Approach

April - June

Theory of Change

Following prototyping and user testing, we arrived at an approach that led to a comprehensive beta lesson for StoryLab Science. To navigate the remainder of the design journey, we solidified our Theory of Change. Here is a high-level overview of our Theory of Change:


Now, let's explore the core mechanics further to better understand the user's journey:


1. Storytelling is a key element in our approach, aiming to engage learners, nurture curiosity, and foster a positive learning experience. Additionally, research has demonstrated the effectiveness of narratives in communicating scientific concepts and creating an engaging and memorable learning journey.

2. Learners will explore multiple representations of the same concept, including images, videos, and interactive simulations, to develop an understanding of the subject matter.


3. Learners will then apply their comprehension of science concepts to verbally teach a character. Research indicates that when learners are prompted to teach a teachable agent, they invest more time in learning and achieve a deeper understanding of the material.

4. Lastly, StoryLab offers timely and relevant AI feedback to enhance learning. This feedback reinforces correct answers and addresses misconceptions, providing learners the support they need to stay on track and maintain momentum.

Beta Prototype

To test our approach, we focused the first StoryLab Science beta prototype lesson on forces and interactions, specifically gravity, magnetism, and electricity. We also created the following NGSS-aligned learning objectives:

  • Learners will be able to answer questions about a topic, referring explicitly to the multimodal media as the basis for their answers, demonstrating understanding of the concepts of forces and interactions (3-PS2-3, RI.3.1).

  • Learners will be able to relay information to a teachable agent about a relationship between a series of scientific ideas or concepts in multimodal media, using language that pertains to cause/effect, with a focus on forces and interactions (3-PS2-3, RI.3.3).

  • Learners will be able to integrate information from several texts on the same topic in order to speak about the subject knowledgeably, in the context of problem-solving related to forces and interactions (3-5-ETS1-2, RI.5.9).

We designed a narrative around these learning outcomes to keep our learners engaged. Here is a quick introduction to the story: The Cosmic Adventures of Moo-nica: Exploring Space and the Forces Within.

Pictured below, we have Finn, the space explorer, and Baldwing, his trusty hairless chicken sidekick, who are tasked with helping Moo-nica, the space cow, find her way back home. Baldwing is a teachable agent that uses the information learned to advance the plot of the story.


You can find our Figma prototype here and below are some images of our prototype as well as a short video demo of the StoryLab Science approach:

Validating StoryLab Science's Efficacy

June - August



Affordances of Technology

To ensure effective learning outcomes in StoryLab Science, we require:

  • Speech-to-text capabilities for learner verbalization.

  • Successful prompt engineering for high-quality feedback and effective interactions.

  • AI-generated feedback with interest-based metaphors.

  • Integration of multimodal resources, such as video, pictures, audio, and text.

To understand the practical implications of our tool, we tested the speech-to-text technology in both quiet and noisy environments. For the clean input, we collected audio recordings from nine children explaining scientific concepts in controlled studio settings with minimal background noise. We used the Speech-to-Text API to process the audio, assessed transcription accuracy with the Word Error Rate formula, and fed the transcriptions into our ChatGPT prompt. Additionally, we employed qualitative codes and a rubric to assess the effectiveness of the feedback.


To test the noisy environments, we placed learners in areas with background noise and allowed them to record their answers without specific microphone instructions. This accounted for real-life scenarios where learners might not speak directly into the microphone. Similarly, we calculated the word error rate, used the Speech-to-Text data as input for ChatGPT, and qualitatively analyzed the accuracy and effectiveness of the feedback.


Even though it was clear that the clean input did significantly better than the noisy input in the speech-to-text transcriptions, we wanted to see how ChatGPT would respond to both these inputs. Below you will find the qualitative rubric that we created, rating the feedback on a scale of 1-3. We also qualitatively coded the feedback and took into account factors such as accuracy, positive affirmation, reference to learners’ interests or prior knowledge, and addressing misconceptions.


In order to evaluate the feedback we received, we first input the transcription of the learner's response into our prompt. Once we got a response back from ChatGPT, we qualitatively coded it to highlight areas of successful feedback. Lastly, we used our rubric to analyze the feedback and give it an appropriate score.
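The coding and scoring were done by hand, but the logic can be sketched in code. The qualitative codes below come from our rubric; the particular mapping from codes to a 1-3 rating is a hypothetical illustration, not the exact rule we applied.

```python
# Hypothetical scoring helper. The code labels mirror our qualitative codes
# (accuracy, affirmation, interest reference, misconception handling), but
# this code-to-score mapping is illustrative, not our exact manual rule.

CODES = ("accurate", "affirming", "references_interests",
         "addresses_misconceptions")

def rubric_score(codes: set) -> int:
    """Map manually assigned qualitative codes to a 1-3 rubric rating."""
    if "accurate" not in codes:
        return 1                 # inaccurate feedback scores lowest
    if codes.issuperset(CODES):
        return 3                 # accurate and relevant, needs no adjustment
    return 2                     # accurate and relevant, may need adjustments

# Three hypothetical coded responses, then the average rating.
scores = [rubric_score({"accurate", "affirming"}),
          rubric_score(set(CODES)),
          rubric_score({"affirming"})]
average = sum(scores) / len(scores)
print(scores, round(average, 1))  # → [2, 3, 1] 2.0
```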


With this methodology, we got the following results: 

  • With the clean data used for transcription as input to our ChatGPT prompt, we received feedback with an average rating of 2.1 out of 3, meaning that, by our rubric, it was accurate and relevant but may need adjustments.

  • The noisy data actually scored higher, with an average rating of 2.6 out of 3, giving even more effective feedback.


So even though the word error rate was higher for the noisy data, ChatGPT was still able to make sense of the input and provide relevant and useful feedback.

Learning Outcomes

We conducted a learner study with 3 participants, where they interacted with various representations of science concepts that focused on gravity, magnetism, and electricity.


Pictured below are a few examples of interactive content that our learners found particularly engaging: simulations and videos ranked among the most popular choices.

After interacting, they then verbally taught a character in the story. We recorded the learners and manually transcribed their responses.

We then created qualitative codes, pictured below, derived from our learning objectives to analyze the transcriptions and evaluate the learners' understanding. These codes included explicit references to multimodal media, the ability to verbally relay scientific ideas, and mentioning correct information from personal experiences.

Pictured below is an example of one of our participant’s responses. This participant explicitly referred to the examples presented in the videos and text for the subject matter topics. They effectively communicated about gravity, magnetism, and electricity to a teachable agent.


Our analysis revealed that 2 of the 3 participants were able to effectively achieve the learning outcomes and demonstrate their scientific literacy in gravity, magnetism, and electricity. It is important to note that even the one participant who didn't reach the learning outcomes still felt empowered to continue the learning experience, and that all the participants felt comfortable verbally relaying information to a teachable agent.

Positive Learning Experience

During testing with 5 learners, all the participants successfully completed the StoryLab Science experience. They engaged with the narrative, explored the science concepts, and effectively communicated their knowledge. Throughout the testing, we observed positive behaviors such as smiling and laughter, which indicated that users genuinely enjoyed the learning experience. Additionally, learners demonstrated great enthusiasm and engagement with the platform throughout the entire process. Pictured below is what some of our users had to say.


Additionally, it was important for us to know what the individuals closest to the learners thought about StoryLab Science, so we conducted a survey with 20 parents of children under the age of 13. In the survey, we asked them to watch the StoryLab Science promo video and answer a series of questions. We found that 100% of the parents would use the product with their child and that 95% would want the tool to be used in the classroom. Pictured below is what some of the parents had to say.
