Thursday, May 2, 2024

Texas will use computers to grade written answers on STAAR exams

The state has rolled out a new method of grading this year's STAAR tests: computers, rather than humans, will do the initial grading of students' written responses.

In December 2023, the Texas Education Agency (TEA) began implementing a hybrid scoring model that uses a mix of an automated scoring engine and human graders for STAAR tests. The engine grades the written responses, and 25% of those responses are then re-scored by humans, according to a report by the TEA.
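
In code terms, the re-score step amounts to drawing a sample of machine-graded answers for human review. The short sketch below illustrates that idea with made-up data; the sampling method, the data shapes, and the score scale are assumptions for illustration, not details of the TEA's actual system.

    import random

    # Hypothetical machine-scored responses: (response_id, engine_score) pairs.
    # The 0-2 score range is an assumption, not the actual STAAR rubric.
    machine_scored = [(f"resp-{i}", random.randint(0, 2)) for i in range(1000)]

    # Draw a 25% sample for human re-scoring, as the TEA's hybrid model describes.
    sample_size = len(machine_scored) // 4
    human_rescore_queue = random.sample(machine_scored, sample_size)

    print(f"{len(human_rescore_queue)} of {len(machine_scored)} responses queued for human re-score")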

“The Texas hybrid scoring model uses an automated scoring engine to augment the work of human scorers, allowing us to score constructed responses faster and at a lower cost,” TEA spokesperson Ricky Garcia said in an email. “The automated scoring engine uses features associated with writing quality and features associated with response meaning. Writing quality features include measures of syntax, grammatical/mechanical correctness, spelling correctness, text complexity, paragraphing quality, and sentence variation and quality.”
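The writing-quality measures Garcia lists can be approximated with simple text statistics. The sketch below computes a few rough stand-ins (sentence length, sentence-length variation, vocabulary diversity); the specific features and formulas are assumptions for illustration, not the TEA's actual feature set.

    import re
    from statistics import mean, pstdev

    def writing_quality_features(text: str) -> dict:
        """Compute a few simple surface features of writing quality.

        These loosely mirror the kinds of measures the TEA describes
        (sentence variation, text complexity, mechanics), but they are
        rough stand-ins -- the real engine's features are not public.
        """
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]

        return {
            "word_count": len(words),
            "sentence_count": len(sentences),
            "avg_sentence_length": mean(sentence_lengths) if sentence_lengths else 0.0,
            # Spread of sentence lengths as a crude proxy for "sentence variation".
            "sentence_length_spread": pstdev(sentence_lengths) if len(sentence_lengths) > 1 else 0.0,
            # Distinct words divided by total words as a crude complexity measure.
            "type_token_ratio": len({w.lower() for w in words}) / len(words) if words else 0.0,
        }

    if __name__ == "__main__":
        sample = ("The water cycle moves water through the air and land. "
                  "Evaporation lifts water into the sky, and rain returns it to the ground.")
        print(writing_quality_features(sample))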

The STAAR test was redesigned in 2023 to include fewer multiple-choice questions and more open-ended questions similar to those teachers ask in the classroom. Because of the increase in written responses, the TEA shifted to a hybrid model to avoid the cost of hiring more human graders to review the tests. According to the TEA, the change saves the agency $15 million to $20 million per year.

The technology uses natural language processing, which is a building block of artificial intelligence chatbots such as GPT-4, the Texas Tribune reports.

Garcia said that while there are foundational aspects of the automated scoring engine that could be considered AI, the program is not like ChatGPT.

“ChatGPT is a generative AI software, the ASE (automated scoring engine) is not,” Garcia said.

As students sit down to take their STAAR exams this month, local school districts have voiced concerns about the new grading process.

Gatesville ISD Superintendent Barrett Pollard said that because the technology is still new, the automated scoring might look for key words or phrases rather than understand the intent and effectiveness of students' writing.

“In other words, a poorly written response may have a few of the key terms the AI is looking for and receive a decent grade,” Pollard said. “Conversely, a very well-crafted and effective response may not contain some of the key words and phrases and receive a poor score. This would be a real disservice to students. If you have read or watched any of the news reports concerning artificial intelligence, you will know that it is prone to mistakes and inaccuracies.”

Oglesby ISD Superintendent Shane Webb echoed those concerns about using an AI-powered grading system, including the possibility of lower test scores.

“A human grader is going to understand a student’s voice, and a computer grader might not see those things,” Webb said.

According to the TEA, the automated scoring engine is trained on a sample of 3,000 human-scored responses. The engine analyzes those responses to identify patterns so it can emulate how humans score the questions, and it flags answers it is not confident about for human review.
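
Conceptually, that process resembles fitting a text model to hand-scored examples and routing low-confidence predictions to people. The sketch below shows one way that could look, using a TF-IDF plus logistic-regression model, a toy training set, and an arbitrary confidence threshold; all of those choices are assumptions for illustration, not the TEA's actual engine.

    # A minimal sketch of the hybrid-scoring idea: fit a model to human-scored
    # responses, then flag any response the model scores with low confidence.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Stand-in for the human-scored responses used to train the engine
    # (the real set has about 3,000; the scores here are a made-up 0-2 rubric).
    train_texts = [
        "Water evaporates, forms clouds, and falls as rain.",
        "The water cycle is when water goes up and comes down.",
        "idk",
        "Rain comes from clouds that form when water evaporates from lakes.",
        "water",
        "Evaporation, condensation, and precipitation move water through the cycle.",
    ]
    train_scores = [2, 1, 0, 2, 0, 2]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(train_texts)
    model = LogisticRegression(max_iter=1000).fit(X, train_scores)

    def score_response(text: str, confidence_threshold: float = 0.7):
        """Return (predicted_score, needs_human_review)."""
        probs = model.predict_proba(vectorizer.transform([text]))[0]
        best = probs.argmax()
        # With this tiny toy training set, most responses will be flagged.
        return model.classes_[best], probs[best] < confidence_threshold

    print(score_response("Clouds form when water evaporates and then it rains."))
    print(score_response("The mitochondria is the powerhouse of the cell."))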

The State of Texas Assessments of Academic Readiness (STAAR) is a standardized academic achievement test administered every year to students from third grade through high school. It includes assessments in mathematics, reading and language arts, science, and social studies.