Most schools assign each student a “grade point average”, i.e., a number that averages over many teacher evaluations of that student. Many schools also assign each teacher an “average student evaluation”, i.e., a number that averages over many student evaluations of that teacher. Many workplaces similarly post evaluations which average worker performance ratings across different tasks. And sports leagues often post rankings of teams, which average over team performance across many contests.

A lot rides on such metrics, even though they are simple aggregates over contests of varying difficulty, which creates incentives for players to “game” these metrics. For example, students seek to take, and teachers seek to teach, easy/fun classes; workers seek to do easy tasks, and sports teams seek to play easy opponents.

Yet we have long known of a better way, one I described briefly in 2001: *stat-model-based summary evaluations.*

For example, imagine that a college took all of their student grade transcripts as data, and from that made a best-fit statistical linear regression model. Such a model would predict the grade of each student in each class by using a linear combination of features of each class, such as subject, location, time of day and week, and *also* “fixed effects” for dates, professors, and especially students. That is, the regression formula would include a term in its sum for each student, a term that is a coefficient for that student, times one or zero depending on if that datum is about a grade for that student.
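In symbols, one way to write such a model is sketched below. The notation is my own, chosen only for illustration, and is not taken from any particular paper:

$$
g_{sc} = x_c^\top \beta + \pi_{p(c)} + \delta_{t(c)} + \theta_s + \varepsilon_{sc}
$$

Here $g_{sc}$ is the grade of student $s$ in class $c$, $x_c$ is a vector of class features (subject, location, time of day and week), $\pi_{p(c)}$ is the fixed effect of the professor who taught class $c$, $\delta_{t(c)}$ is the fixed effect of its date, $\theta_s$ is the fixed effect of student $s$, and $\varepsilon_{sc}$ is an error term. The student coefficient $\theta_s$ is the number of interest.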

Such a fixed effects regression coefficient regarding a student should effectively correct for whether the student took easy or hard majors, classes, profs, times of day, year of degree, etc. Furthermore, standard stat methods would give us a “standard error” uncertainty range for this coefficient, so that we are not fooled into thinking we know this parameter more precisely than we do.
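To make this concrete, here is a minimal sketch of such a regression on made-up data, using only NumPy. Everything in it is an assumption for illustration: the synthetic transcripts, the variable names (`ability`, `difficulty`, `gpc`, `gpc_se`), and the choice to use only student and class fixed effects, with abler students drifting toward harder classes. A real model would include the other class features discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_students, n_classes, per_student = 40, 12, 6
ability = rng.normal(0.0, 0.5, n_students)    # hypothetical true student quality
difficulty = rng.normal(0.0, 0.5, n_classes)  # hypothetical class harshness

# Build synthetic transcripts: abler students lean toward harder classes.
rows = []
for s in range(n_students):
    weights = np.exp(2.0 * ability[s] * difficulty)
    taken = rng.choice(n_classes, size=per_student, replace=False,
                       p=weights / weights.sum())
    for c in taken:
        g = 3.0 + ability[s] - difficulty[c] + rng.normal(0.0, 0.3)
        rows.append((s, c, g))

student = np.array([r[0] for r in rows])
klass = np.array([r[1] for r in rows])
grade = np.array([r[2] for r in rows])
n_obs = len(grade)

# Design matrix: one dummy per student, plus one dummy per class
# except class 0, which serves as the baseline for identification.
X = np.zeros((n_obs, n_students + n_classes - 1))
obs = np.arange(n_obs)
X[obs, student] = 1.0
has_dummy = klass > 0
X[obs[has_dummy], n_students + klass[has_dummy] - 1] = 1.0

# Ordinary least squares fit and the usual standard errors.
beta, *_ = np.linalg.lstsq(X, grade, rcond=None)
resid = grade - X @ beta
dof = n_obs - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.pinv(X.T @ X)

gpc = beta[:n_students]                       # student fixed effects
gpc_se = np.sqrt(np.diag(cov)[:n_students])   # their standard errors
gpa = np.array([grade[student == s].mean() for s in range(n_students)])

# Compare how well each metric tracks the (normally unobserved) ability.
print("corr(GPA, ability):", np.corrcoef(gpa, ability)[0, 1])
print("corr(GPC, ability):", np.corrcoef(gpc, ability)[0, 1])
```

The student coefficients here are measured relative to the baseline class, so only differences between students are meaningful, and each comes with its own `gpc_se` uncertainty.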

Thus a “grade point coefficient”, i.e., a G.P.C., should do better than a G.P.A. as a measure of the overall quality of each student. And the more that potential employers, grad schools, etc. focused on G.P.C.s instead of G.P.A.s, the less incentive students would have to search out easy classes, profs, etc. We could do the same for student evaluations of professors, and the more we relied on prof fixed effects to judge profs, the less incentive they would have to teach easy classes, or to give students As to bribe them to give high evaluations.

The general idea is simple: fit performance data to a statistical model that estimates each performance outcome as a function of the various context parameters that one would expect to influence performance, plus a parameter representing the quality of each contestant. Then use those contestant parameter estimates as our best estimates of contestant quality. Such statistical models are pretty easy to construct, and most universities contain hundreds of people who are up to this task. And once such models are made and listened to, then contestants should focus more on improving their quality, and less on trying to game the evaluation metric.

Yes, as new data comes in, the models would get adjusted, meaning that contestant estimates would change a little over time, even after a contestant stopped having new performances. Yes, there will be questions of how many context parameters to include in such a model, but there are standard stat tools for addressing such questions. Yes, even after using such tools, there will remain some degrees of freedom regarding the types and functional forms of the model, and how best to encode key relevant factors. And yes, authorities can and would use those remaining degrees of freedom to get evaluation results more in their preferred directions.

But even so, this should be a huge improvement over the status quo. Instead of students looking for easy classes to get easier *A*s, they’d focus instead on improving their overall abilities.

To prove this concept, all we need is one grad student (or exceptional undergrad) with stat training willing to try it, and one university willing to give that student access to their student transcripts (or student evals of profs). Once the resulting models passed some sanity tests, we’d try to get that university to let its students put their G.P.C.s onto their student transcripts. Then we’d try to get the larger world to care about G.P.C.s. So, who wants to try this?

P.S. I’ve posted previously on how broken many of our eval systems are, and how a better entry-level job eval system could allow such jobs to compete with college.

**Added:** This paper and this paper show in detail how to do the stats.

One could get more than one useful number per student by adding terms that interact the student fixed effect terms with other features of classes. That second paper shows a two number system is more informative, but is rejected because “gains realized with the two-component index are offset by the additional complexity involved in explaining the two-component index to students, employers, college administrators and faculty.”
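As a hedged illustration of such an interaction (my own notation, not necessarily the form used in that paper), suppose $q_c$ is some numeric feature of class $c$, such as how quantitative it is. Then a two-number model might look like:

$$
g_{sc} = (\text{class terms})_c + \theta_s + \phi_s \, q_c + \varepsilon_{sc}
$$

Each student then gets a pair $(\theta_s, \phi_s)$: a baseline level, and a coefficient saying how that student's performance shifts as classes score higher on $q_c$.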

One might allow students to experiment with classes in new subjects by including a term that encodes such cases. One might include terms for race, gender, age, etc. of students, though I’d prefer transcripts to show student GPCs with and without such terms.

**Added 17Oct:** This book by Valen Johnson considers in detail models like those I describe above, wherein the performance of a student in a class is a linear combination of a student term, a class term, and an error. Sometimes, however, rather than estimating a grade point, these models estimate discrete grades, using several terms per class to describe the cutoffs in the underlying parameter between different discrete grades.
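A generic ordered-response form of this idea can be written as follows. This is a textbook-style sketch in my own notation, and I am not claiming it matches Johnson's exact specification:

$$
z_{sc} = \theta_s + \varepsilon_{sc}, \qquad
\text{grade}_{sc} = k \iff \tau_{c,k-1} < z_{sc} \le \tau_{c,k}
$$

Here $z_{sc}$ is an unobserved performance level, $\theta_s$ is the student term, and the $\tau_{c,k}$ are the per-class cutoffs between adjacent discrete grades. A class that grades leniently simply has low cutoffs.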

The student term sets an “adjusted GPA” and Johnson proposes to “allow students to optionally report adjusted GPAs on their transcripts.” He reports that when he attempted but failed to get Duke to do this in 1996, this was the biggest issue:

> When the achievement index was considered for use as a mechanism to adjust GPAs for students at Duke, instructors who regularly assigned uniformly high grades quickly realized that the achievement index adjustment will make their grades irrelevant in the calculation of student GPAs. Worse still, many students notice the same thing. To thwart the adoption of the achievement index, these high-grading instructors and their student benefactors adopted the position that an A represented an objective assessment of student performance. An A was an A was an A. For them, it represented “excellent” performance on some well-defined but unobservable scale. Indeed, by the end of the debate, several literary theorists had finally identified an objective piece of text: a student grade. (p.222)

Apparently Johnson and others have long tried but failed to get schools to adopt GPCs and variations on them.
