Interim Assessments Shift from Test Prep to Real Growth in Re-Imagined Khan Academy

Khan Academy introduces new interim assessments designed to move beyond test prep, focusing on real student growth through adaptive, multi-format questions and personalized learning pathways.

From "Test Prep" to Real Growth: A New Way to Look at Interim Assessments | Transcript:

Hi everybody. Thank you for joining us. We're really excited today to give you a little sneak preview of our brand new interim assessments that are going to be part of Re-Imagined Khan Academy. With me today I have Lauren Deeders, who's our director of assessments, and Peter Jacobson, our senior product manager. I realize that when you got your emails or your announcements, they said that Kristen, our chief academic officer, would be here. Unfortunately, she was not able to join us today, but Lauren is way more than qualified since she's in charge of this product, so she can tell us everything we need to know.

As far as our agenda, we'll start with why we are developing interim assessments and why we started on this journey. Then Peter will give us a bit of an overview demo, and then we'll answer your questions. Please feel free to ask questions in the chat as you think of them. I'll keep track of them, and we'll take about 10 minutes at the end to answer them. If anybody has a question that we don't get to, don't worry. I will send you a response by email. So Lauren, I'm going to hand it over to you.

Awesome. Thanks, Denise. And welcome, everyone. Thank you so much for joining us today and giving us this opportunity to talk about the really exciting work that our team's been doing for the last two-plus years. I'm really excited to share some of our thinking, and also what's coming with Peter in the demo.

So I wanted to set some context for us: why are we developing interim assessments? Why interim assessments? Why now? For folks here who are familiar with what we have on Khan Academy already, you may say, "Hey, don't you have unit tests? Don't you have quizzes and course challenges?" And we do. We have those assessments on our platform. We call those formative assessments, or assessments for learning, which are built into the instructional flow. And we saw an opportunity, with AI but also to create a fully comprehensive assessment and learning system, to develop interim assessments.

And so for us, when we embarked on this journey, we thought: what are some of the pain points we can solve, and what are some of those pain points we can solve with AI? When we started this thinking (if you don't mind going to the next slide), we started thinking about some of those pain points from our own experience, for those of us here who have worked on assessments, but also from some of the conversations that we've had with students, with teachers, and with district administrators. This list here is by no means exhaustive, so I'd love to hear in the chat if there are other pain points for you all. We always love to learn from our partners.

But one of the big ones is all multiple choice. I have a measurement and psychometric background, and from that point of view multiple choice is great because it allows you to assess a lot of skills in a really short amount of time. But multiple choice questions don't really mimic the real-world application of those skills, and they don't allow for a nuanced understanding or measurement of the extent of students' knowledge. It's very much right or wrong, and there's no real continuum for understanding what it is a student knows and can do.

Another pain point we identified as we embarked on this journey was lack of actionability. With a lot of assessments, when you look at those score reports, they have that sound measurement information, but they fall short of helping administrators, teachers, and students identify gaps, continue growing, and identify very clear, actionable next steps.

And then lack of trust. Assessment items and assessments are time-consuming and expensive to create. And so we don't have massive item banks that underpin these assessments. That results in the assessment industry tightly guarding those banks, because we want to make sure our test is secure. But it also means that folks who are really interested in what students will be seeing don't get to see that. And that's administrators, teachers, and caregivers who may just want to peek under the hood and understand what's going to be on the assessment.

And then finally, time. A lot of assessments out there, in our experience, make the promise of being something you can complete in a class period, but that's often not the case.

And so when we thought about these pain points, we thought: how are we going to address these? First, we want an authentic assessment. We want to move beyond multiple choice. There's still going to be some multiple choice, since it is familiar and that's helpful, but we want to move beyond it to other types of items to truly measure the extent of what a student knows and can do, looking at that continuum of student knowledge. This is with interactive item types. This is with short response in mathematics. And this is also with conversational assessment, which I'm really excited that you'll get to see a little bit of today.

Actionable. We really believe in that measurement rigor and those quantitative results in those score reports, but we also want to provide qualitative information and clear next steps. So what are some of the trends that we're seeing at the class level, at the district level? What are some of the strengths? What are the areas of growth? Where are there some common areas for a class to continue improving? Are there common errors that the class makes? We want to point those out.

And then open. When you have a large enough item bank underpinning your assessment, you don't really have to worry so much about test security. And so we're working really hard with our team, doing a lot of research on the combination of AI and psychometrics, to build toward a future where we have an open item bank, or a large enough item bank, that anyone anywhere can just take the assessment, see what's on there, and build up that trust in assessments again.

And then finally, efficiency. We want to make sure that when we say our assessment fits within a class period, it actually does. And so far, that's what we're seeing.

Before I transition over to Peter, I want to address the combination of assessment and AI: how does AI change what we can do in assessment? Our point of view here at Khan Academy is that the intentional use of AI can make more authentic assessments possible through conversations. You may know, or have even seen this yourself in playing with generative AI, that it's actually really good with language and having conversations. Through our work and our research, we've been creating these conversational contexts to prompt and probe how a student arrived at their answer and understand some of their conceptual knowledge as well. They allow us to mimic, not fully replace but mimic, in an assessment the conversations that a teacher has with their student when they may ask, "Tell me about this step here. Tell me why you did it that way," or "Tell me a little bit more about why you chose that answer."

And then finally, conversational assessments allow us to explore more nuanced scoring. So we can award students credit for what they know and what they demonstrate, even if they don't get the answer fully correct. And then we can make that student thinking visible to the teacher so they can see those insights and maybe diagnose, "Oh, this is what I should do next." For us, that's how we see this combination of AI and assessment, because it's not just about AI. It's really about the foundation and our intended goal, which is to improve student learning outcomes.

And so I want to leave you with our theory of action that's guided our work as we've explored conversational assessment the past couple of years. Our belief is that if educators integrate conversational assessments that prompt students to explain their thinking, then students will engage in deeper cognitive processing, verbalize their conceptual understanding, and develop metacognitive awareness. So even going back after they've had a conversation, they can look at their answer and say, "Ooh, after I've talked through this, maybe I want to go back and change that." As a result, educators will gain more diagnostic insights into student thinking than a traditional assessment allows, enabling more targeted instruction, which will ultimately improve student learning outcomes and equity and access to rigorous tasks. Because at the end of the day, we want to improve those student learning outcomes. So I'll stop there and pass it over to you, Peter.

All right. Thank you, Lauren. My name is Peter Jacobson. I am a product manager on the assessments team here at Khan Academy, and I really love what I do. I get to work with tons of creative people to solve challenging problems and be on the edge of technology, which is really fun. I'm excited today to get to share what we're doing and also peek under the hood. So I'm going to give you a demo of the student experience of our assessment, and I'm also going to show you a little bit of the underlying systems, the "how I built this" part of it. So I'm first going to give myself a little test to see how well I can screen share here. Can I get a thumbs up from you, Lauren or Denise, if you're now seeing the assessment experience? Wonderful.

All right. Lauren mentioned that we are trying to minimize multiple choice in an effort to be more authentic. So I'm going to walk you through a sandbox assessment. This is not the actual assessment we're delivering to our students, but the experience is very much indicative of what students see. We're also trying to demonstrate all the different types of items that we have here, so I'm going to have to be on my toes because they'll be displayed to me in some random order. At the very beginning, we want to get students in the right mindset as they start to take the assessment: take your time. We are also building adaptive assessments, so what you answer is final, because it dictates the next item that you'll be given.

So many of our item types are constructed response. Here, this is a numerical input. I'm not going to ask you to read all of these items, but I can go through them quickly so you see a smattering of different examples. This is a geometry proof, and our content creators have done a nice job of using a drop-down format to simulate the missing pieces of a proof right now. But this may even be another opportunity for leaning more into AI in the future, because proofs are sometimes difficult to assess. One of the common item types that we have we call internally an expression. This allows you to put in a more complex equation. So in this case, I can put in the equation for how, what is it, the sale price changes with the price of cupcakes. And then we also have a number pad, which also allows us to make it accessible. Again, we're not completely devoid of multiple choice questions, so we do have some examples of multi-check or single-select items.

Ah, and here we are. Now I get to one of the special ones that we've talked about. Lauren was mentioning conversational assessment. Our particular version of that we call explain your thinking, and this is an explain your thinking question. You'll already notice that there's an experience interruption where it's telling the student this is a two-part question. The reason for this is that in some of our early testing, we were finding that students would take the sudden appearance of an AI as some sort of cue that they had gotten something wrong. So we're trying to set expectations that you're going to have this conversation no matter what, so that it is not interpreted as some sort of measure of correctness.

So you get into a two-part question. The first part of explain your thinking is very much like anything else. Right now I'm in a geometry item, and I'm being asked for the tangent. I'm going to use that same expression widget we were talking about before, and I'm going to remember my, what is it, SOH-CAH-TOA. So opposite over adjacent, right here. And I can jump in and enter this.

And now we are in the meat of the explain your thinking experience. You still have the first part on your left, but now you have this new conversation that's starting with our AI, who we call Conductor. You may be familiar with Khanmigo, which is the AI tutor that is very helpful in Khan Academy practice. We have felt the need to create our own persona for the assessment use case, because in assessments things are a little bit different. We don't want to be giving away the answer. We don't want to be causing fairness issues where some students are getting more hints and other students are not receiving them. So I'll show you a little bit more of how we're working on this once you see the system architecture.

But here's our AI, Conductor. You'll also notice that Conductor is not just asking, "How did you calculate the tangent?" This question is actually building upon that: "How can you tell whether the tangent of 30.5° will be the same in all right triangles with a 30.5° angle?" So I'm going to go ahead and take the role of a student with a partial understanding, and I'll say something like, "Tangent is always opposite over adjacent."

Now, that is true, but that's not really what the question is asking. And so at this point I'm getting more probing, and maybe I'll be a little bit more realistic and even go, "I don't know." Conductor is continuing to probe to see if there's something that I can get out of this to help explain it. So I'm looking at the angles. Well, all right triangles with a 30.5° angle are going to be similar, so their sides are proportional. At this point, you'll notice that Conductor is ending the conversation. That's because I have given that extra bit of knowledge. I have satisfied the criteria that the AI behind the scenes is looking at: applying my knowledge of solving for tangent, and understanding that the proportionality of the sides is going to keep the tangent the same all the time. So I've hit those key criteria, and then Conductor moves us forward.

Now I'm going to keep going through the assessment. Maybe I'll even further demonstrate being a kid right here. Maybe I didn't even read the question. We do have some gates and flags to try and make sure students aren't just rushing through it, and this also provides data for our teachers if a lot of rushing flags are triggered. We also have short response item types. You've mostly been seeing math right now, but this is a very big part of our ELA assessment as well. I won't bore you with my math. "Divide by three." That's a very unsatisfactory answer right there. And this one is what we call interactive graph. One of the things I'm particularly proud of as a product manager: drag-and-drop type exercises are very difficult to make accessible, and we have actually created this so it can all be done by keypad, so it's accessible for anyone using a screen reader. We actually just had a VPAT come back, and we're very proud of the results: a very strong accessibility rating.

And now I'm at the end of my little sandbox assessment. After a little bit of thinking, Conductor, again our AI guide through this assessment experience, is providing insights. In this particular case, since there was only one explain your thinking question, these insights are really just based on that one question. But on most of our assessments, we have many of these explain your thinking interactions. And we're also building toward incorporating short response, to really provide students with a good sense of where they were strong and where they were weak, in a narrative format that's not just going to be a score that's difficult to interpret. So this is one of the places where we're very excited about being able to glean more detailed insights from these nuanced interactions, insights that are much more helpful and actionable than just getting the score.

So give me a moment to go back to the screen here. All right. I'm feeling a sigh of relief. I navigated the trickiest part, which is switching screens. Here is the behind-the-scenes architecture of what I just demonstrated, with explain your thinking in particular. You'll notice there are three pink boxes in this. These are our AI agents. If you're at all familiar with AI, oftentimes you have different agents for different tasks, and in our particular experience we have those three AI agents. But you'll also notice the very first part of our question, which we call the conversation starter. In the example we looked at, that's "Why are all triangles with this angle going to have the same tangent?" That conversation starter is not written by AI. We find that it's really important to make sure these conversations get off on the right foot. And so our content experts have been creating these specifically to get at that higher-order conceptual understanding, some of the stuff that's more difficult to assess with traditional multiple choice questions. They've crafted this to get the conversation to really go in the right place.

Then the student responds, which you saw me do, and then it goes to our first agent, which is the scorer. If the student is able to immediately explain the underlying conceptual knowledge that's required, we don't force it to be a conversation. If all of the underlying criteria are met, the conversation will end, and they'll just go on to the next question. But in many cases, we've observed it's not a one-shot deal. So if not everything is satisfied, then we have a response. This is where the AI is trying to figure out how to make a productive conversation out of this. What we've found is we actually had to add this third agent, the self-critique agent.
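To make the flow in this part of the talk concrete, here is a minimal Python sketch of the loop as described: an expert-written conversation starter, a scorer that checks the student's reply against criteria, a responder that drafts a follow-up, and a self-critique check (whose role is explained just after this point) that blocks follow-ups that would give something away, all capped at a fixed number of turns. Everything in it is an assumption for illustration: the function names, the keyword matching standing in for the AI agents, the fallback prompt, and the cap of four turns. None of it is Khan Academy's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical neutral prompt used when a drafted follow-up is rejected.
FALLBACK_PROMPT = "Can you tell me more about your thinking?"


@dataclass
class Conversation:
    # criterion id -> lowercase keyword (toy stand-in for a real rubric)
    criteria: Dict[str, str]
    satisfied: Set[str] = field(default_factory=set)
    transcript: List[str] = field(default_factory=list)

    def unmet(self) -> List[str]:
        """Criteria the student has not yet demonstrated."""
        return [c for c in self.criteria if c not in self.satisfied]


def score(reply: str, convo: Conversation) -> Set[str]:
    """Scorer stand-in: a criterion counts as met if its keyword appears."""
    text = reply.lower()
    return {c for c, keyword in convo.criteria.items() if keyword in text}


def respond(convo: Conversation) -> str:
    """Responder stand-in: deliberately naive, it names the first unmet idea."""
    return f"Have you thought about {convo.criteria[convo.unmet()[0]]}?"


def critique(draft: str, convo: Conversation) -> bool:
    """Self-critique stand-in: accept a draft only if it mentions no unmet criterion."""
    text = draft.lower()
    return not any(convo.criteria[c] in text for c in convo.unmet())


def run_conversation(starter: str, replies: List[str],
                     criteria: Dict[str, str], max_turns: int = 4) -> Conversation:
    """Run the loop until all criteria are met or the turn cap is reached."""
    convo = Conversation(criteria=dict(criteria))
    prompt = starter  # written by content experts, not generated
    for reply in replies[:max_turns]:
        convo.transcript += [f"AI: {prompt}", f"Student: {reply}"]
        convo.satisfied |= score(reply, convo)
        if not convo.unmet():
            break  # everything demonstrated: end the conversation early
        draft = respond(convo)
        prompt = draft if critique(draft, convo) else FALLBACK_PROMPT
    return convo


if __name__ == "__main__":
    demo = run_conversation(
        "How can you tell whether the tangent of 30.5 degrees will be the "
        "same in all right triangles with a 30.5 degree angle?",
        [
            "Tangent is always opposite over adjacent.",
            "I don't know.",
            "All right triangles with a 30.5 degree angle are similar, "
            "so their sides are proportional.",
        ],
        {"ratio": "opposite over adjacent",
         "similar": "similar",
         "proportional": "proportional"},
    )
    print("\n".join(demo.transcript))
    print("Satisfied:", sorted(demo.satisfied))
```

The responder here is deliberately naive so that the critique step has something to catch: in the demo run, both drafted follow-ups mention an idea the student has not raised yet, so both are replaced with the neutral fallback prompt.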

What the self-critique agent is doing is actually looking at the other AI and checking its work. You're not supposed to give away the answer. And so the self-critique goes back and looks: was the response actually giving something away? This has been a really good way for us to control for leading questions. Again, in an assessment environment, we're not trying to provide the most possible help at all times. We're trying to make sure that what help is being provided is fair across all students. So we're trying not to ask leading questions or give away the underlying criteria. Only the things that students have already brought up in conversation are on the table for the AI to reference. And then this proceeds until either the student finishes the conversation by succeeding with the criteria or we cap it at a certain turn length, because again, as Lauren mentioned, we really do value time, and we don't want these experiences to be so long that they bleed into multiple class periods.

We've had some really exciting research already happening. There's a lot to unpack on the screen, so I'll just try and harp on one important piece, which is really that second bullet. We've been looking at different items and how they perform. What we're noticing is that many students will start with an incomplete understanding. So that first turn in the conversation demonstrates maybe some but not complete understanding. And in many cases, by the end of the conversation, they are showing that they actually understand the criteria. So what we're getting at is, had this been a one-shot deal, just a short response, the student may not have been able to fully show what they know. There's actually more information being revealed as the conversation progresses. And this student example, I think, is a really good version of this.

The question is, "How can you tell which interval has the greatest average rate of change?" The student is honest: "I found this pretty confusing." Not showing understanding. "Can you tell me more about what you found confusing? Let's work together." "I thought the rate of change varied for each one." "It's a good observation. How did you determine the rate of change for each interval?" And here the student is starting to put it together: "The change of y over x." "Great. When you calculated the change of y over x for each interval, how did you decide which one had the greatest rate of change?" And now the student is almost realizing the answer in this conversation: "Oh, it's from x to z, uh, to 7." If I remember correctly, this is showing the slope on a graph where it's the steepest. And then, "Thank you for explaining your thinking." So you can almost see the student understanding before your eyes. And again, you can change your answer, so students can even correct for a mistake, which is some of that metacognitive idea that Lauren was referencing as well. We're not surfacing that as yet, but that certainly gets me excited for the future.

Almost everything I've shown you so far is really for math, but we are piloting ELA assessments right now. In fact, right now we're in our winter testing window, and we are trying out our first explain your thinking questions for ELA, and we're still experimenting to find what works. But I am a former secondary English teacher, and from my experience, one of the things I noticed is that many students could maybe identify, on a literary question, that this is the main idea or this is the author's purpose. They might even be able to select the evidence. But connecting the dots of why that evidence supports that claim, that author's purpose, whatever they've selected, that's where things often broke down. I'm really hoping that explain your thinking shines in pulling out reasoning for students. That's where I see a lot of potential. But again, it's very early, so I don't want to overstate how well this works in the ELA context. We're hopeful, but we're still learning.

And I already mentioned learner insights. This is drawing from all of those richer conversations to summarize what the student understands and maybe where they have unfinished learning. This gives you a little bit of a sense of the architecture behind that. For each assessment, if you look at student A over there, they might have four explain your thinking conversations. The criteria from those four, the ones that were satisfied and the ones that were missed, all populate a summarized learner insight for that student. It's originally written in language for consumption by a teacher, but we then modify it so it can also be shown to a student on the end-of-test card, which is what you were seeing in those screenshots.

But where we get excited, and in fact as we partnered with districts this was almost where they were thinking themselves, is: could you summarize the summaries across all my students? And so this is where we have these rollups. You can think of all of these explain your thinking questions from multiple students in a class feeding the summaries for the individual students, and then those individual summaries getting summarized as what we call a rollup. So you can see the lay of the land across all of the students and get those bullets: these were the criteria that people got, these are the ones they understood, these are maybe the corrections that were happening. This is a really exciting piece for us, and again a place where you can get more actionable data from an interim than you would normally be able to get.

So I'm going to close us with a little bit of what's next. Lauren referred to this in some of the early slides. I would argue that what's next for us is the "what's next." We want to make sure that you take a test and you're able to do something with that information. And in a complete learning system, if you have assessments with practice, the obvious thing to ask is how those results inform the practice that you're going to be doing. What is the most productive place for students to be using their time? So we're working on personalized recommendations. The assessments produce an overall score and reporting category scores, which would be similar to a domain score. Each of those scores then populates a pathway that's personalized to the student. So they might be strong in a couple of reporting categories, weak in another, and really weak in another one still. All of those can work together to provide recommendations in our trusted, proven, high-quality Khan Academy content. And we don't cut the teacher out of the process. Students can always have a bad day on assessments, so teachers can modify those recommendations in order to make sure that they're challenging students appropriately.

But it's really the last bullet point that I'm the most excited about. We are always ensuring, and this is really kudos to our content team who's working on these recommended pathways, that there is grade-level content no matter what level a student scores at. What I mean by that is, imagine you score high on the assessment in a given reporting category. We're going to be focusing on grade-level content with some enrichment opportunities in that pathway. Imagine you score maybe in the middle of the road. You might have some gaps.

We're still going to try and prioritize the grade-level content, but there might be some foundational work that's a grade level below, or that still needs some work. And if you're scoring well below grade level in a given reporting category, again, you'll see up here that there's still a focus on the core grade-level material, but there's also going to be some gap filling and maybe even some really deep foundational work that needs to support that. I love this because I have seen it myself, where students score well below grade level and they get sequestered in this "I'm two years behind" land, which is bad for self-efficacy, and it sets them apart from the rest of the class, which in some ways makes it even more difficult to stay on pace with what's going on. So I'm very excited about this.

And then just one more slide about the future. We've talked a lot about these adaptive assessments, the interims that we're building right now. This coming school year, we'll have interims in geometry, algebra 1, middle school math, and high school ELA in grades 9 all the way to 12. But our assessment journey is not done right there. We're also expanding the portfolio. So we'll be piloting algebra 2 and middle school ELA grades, and then even further down the road, the idea is to get all the way down to third grade.
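The pathway logic described a moment ago (every score band keeps grade-level content, with enrichment, foundational work, or gap filling layered on, and teachers able to adjust) can be sketched roughly as follows. This is an illustrative Python sketch only: the 0-100 scale, the cut scores, the band names, and the content labels are all assumptions, since the talk does not give actual thresholds or data structures.

```python
from typing import Dict, List, Optional

# Assumed cut points on an assumed 0-100 scale; the talk gives no real thresholds.
HIGH_CUT = 75
LOW_CUT = 40

# Every band starts with grade-level content, as the talk emphasizes.
PATHWAY_BY_BAND: Dict[str, List[str]] = {
    "high": ["grade_level", "enrichment"],
    "middle": ["grade_level", "foundational"],
    "well_below": ["grade_level", "gap_filling", "deep_foundational"],
}


def band(score: float) -> str:
    """Map a reporting category score to a hypothetical performance band."""
    if score >= HIGH_CUT:
        return "high"
    if score >= LOW_CUT:
        return "middle"
    return "well_below"


def recommend(category_scores: Dict[str, float],
              teacher_overrides: Optional[Dict[str, str]] = None) -> Dict[str, List[str]]:
    """Build one pathway per reporting category, honoring teacher overrides."""
    overrides = teacher_overrides or {}
    plan: Dict[str, List[str]] = {}
    for category, score in category_scores.items():
        chosen_band = overrides.get(category, band(score))
        plan[category] = list(PATHWAY_BY_BAND[chosen_band])
    return plan


if __name__ == "__main__":
    scores = {"Geometry": 82, "Functions": 55, "Statistics": 20}
    print(recommend(scores))
    # A teacher who knows the student had a bad day can move a category up a band.
    print(recommend(scores, teacher_overrides={"Statistics": "middle"}))
```

Keeping "grade_level" as the first entry in every band is the point of the design: a low score adds supporting work rather than replacing the core material.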

So we'll be going into elementary math and ELA. Assessments is a long journey, but we're very excited about what we're doing and very excited that you've all been able to join us. And I think we may in fact have time for questions, so I'd love to see what you all have. I'll stop the screen share so I can actually see. Denise, maybe you can surface anything that might be coming through.

Sure. First, I want to thank you, Peter and Lauren, for sharing all that information. And for those of you joining us who are current district partners, I'm assuming that you got the announcement that we will be offering this to you next year as part of Re-Imagined Khan Academy at no charge. So it's yours to use. If you have not yet had a conversation with either a sales rep or a district success manager, those are coming. In each of the emails that you've been given, there's also a link to fill out for more information. Avivve, can you put that link or the contact link in the chat if anybody wants to reach out now?

And yes, we did have a couple of questions. Jennifer, I think we may have answered your question, but I'm going to ask Peter to go back to it again so that it's crystal clear. So, are students allowed to skip any questions or any assessment items, and in particular the explain your thinking?

We currently do not allow students to skip questions. They need to put some sort of input into all of the different item types. However, for explain your thinking, we do have an opportunity for students to exit the conversation prematurely. Because there's the opportunity for partial credit, we do try to encourage students to at least put something in there, but we're also not going to make it a beatdown, and you're not going to have to say "IDK" four times if that's the case. We're continuing to look to improve these experiences. So we're trying to find maybe even more elegant ways to have the early outs, again so that we're very efficient with time.

So the very direct answer is students must put an answer for everything. This is part of an adaptive assessment. But we are looking for ways to give students more options to exit out, and that may be an opportunity for some discovery for us. Student choice is definitely something we'd love to see in the future.

Thank you, Peter. Lauren answered this next question in the chat, but I want to make sure everybody got the answer to it. Rachel asked, "Can users select the standards to assess or are they preset by Khan Academy?" Lauren, I'll turn that over to you.

Yeah, I love this question, and I think it's a question we've gotten from several of our district partners, and we've done collaborative research with them. Right now the answer is no. We have blueprints that our content experts have worked on to try and follow the general scope and sequence as best as possible. But we know it's not perfect for any one system. And so right now the answer is no, but one of our longer-term plans is to think through ways we can build a modular system, so that within a certain time period you can say, "Well, we assessed that," or "We taught this and this, but we didn't get here," so you can make it your own. But we're not quite there yet. From my perspective, the thing I'm always thinking about, and that we hear from district folks, is the comparability of scores. So that'll be a fun challenge we get to work through as we build toward that system. But I definitely hear the importance of that, because we know not everyone teaches everything in the exact same order.

Thank you, Lauren. Okay, we just got another question from Sherry: "What are the established blueprints built from? What is the scope and sequence?" I'm going to start briefly, and then Lauren, I'll let you finish. We've looked at state standards, and for any of the states that you all are in, for the district partners that we're offering the early access to, there will be a blueprint that is aligned specifically to your state standards. Those are in process right now, and they will be available very soon. Lauren?

And so the blueprints are actually built by our content experts. All of our content experts, our teachers, and our assessment experts are building these from those standards and the scope and sequence. We looked across all of the available scopes and sequences and found a general best path, or "happy path," knowing it wouldn't fit every exact one, but it was the best combination of every scope and sequence. So after this call, maybe I can talk to you, Denise, and share the ones that we pulled from, so you have that information. But we didn't want to just pull from one. We looked across and said, what are the commonalities? What is the best happy path, given that there are so many different options out there? Great question.

And Sherry, I'll follow up with you with that information. Does anybody else have any other questions? Besides just questions, we'd also love to hear if there is anything that you want us to include for the future. One of the benefits of partnering with us for early access is that we're going to ask you for your feedback, and we want to know what you need and what you think could make assessment work better for student learning. So please share anything that you have. Thank you so much for joining us.
