So where did standardized testing come from anyway? That’s not just a rhetorical question. There is a “father” of the multiple-choice test, someone who actually sat down and wrote the first one. His name was Frederick J. Kelly, and he devised it in 1914. It’s pretty shocking that if someone gave it to you today, the first multiple-choice test would seem quite familiar, at least in form. It has changed so little in the last eight or nine decades that you might not even notice the test was an antique until you realized that, in content, it addressed virtually nothing about the world since the invention of the radio.
…
Thus was born the timed reading test. The modern world of 1914 needed people who could come up with the exact right answer in the exact right amount of time, in a test that could be graded quickly and accurately by anyone. The Kansas Silent Reading Test was as close to the Model T form of automobile production as an educator could get in this world. It was the perfect test for the machine age, the Fordist ideal of “any color you want so long as it’s black.”
Students in a classroom during scholarship examinations, 16 April 1940.
- Public domain image from State Library of Queensland, Australia, available at Wikipedia.
Seventy years and not much changes.
Your grandparents were subjected to standardized testing too. I especially love the test proctor lurking ominously in the back corner.
(Photo: robotpolisher)
From a teacher in my course on authentic assessment of student learning:
The millennial generation is already inundated with technology and it’s up to the instructors to harness that and utilize it in the class.
My reply:
I think ultimately this may be the most important thing we can do: to carefully identify what kinds of technology our students are already familiar with (such as texting) and discover ways to leverage this prior knowledge for learning and assessment (such as texting collaboratively on Twitter).
Vygotsky’s Zone of Proximal Development is a good model here — if we stretch our students a bit beyond their current use of technology, they will grow, but if we stretch them too far beyond where they are, they may not be able to make the leap.
Pew Internet collects a lot of data on what technologies are being used and how they are being used. This might be an excellent way to go about selecting the appropriate technology for our assessments.
Teachers have their own Zone of Proximal Development. A teacher who uses Skype to talk to faraway loved ones is probably more likely to use similar technology in the classroom.
Maybe faculty development efforts in technology start at the wrong place.
There is much greater risk for a teacher to try a new technology in the classroom without having already learned to use it in “real life.” Your family will be forgiving if your Skype video isn’t perfect. Your students might not be. Maybe we should be offering workshops for faculty (and students) to learn to use technology outside of the classroom where the stakes aren’t so high, and then let the technology migrate back into the classroom on its own. Adults need both safety and relevance for learning, and both are abundant when you have a video chat with your grandkids.
“We considered that our society is replete with tests to measure ‘objectively’ a host of skills and abilities. In addition to tests of abilities, such as the SAT, ACT, MCAT, LSAT, and GRE tests, we have several intelligence tests. These tests have all been standardized and made as objective as the test makers could think to do. In fact, because of the widespread belief in their objectivity, those who perform poorly on these tests are more limited in their options for further education, may spend a lifetime in self-recrimination, and often avoid entering careers or taking up activities for which they believe they have no ability because of their performance on the tests.
“The problem begins when some who are excluded from the group entitled to these resources take their exclusion to have psychological significance rather than to be the result of convenience for a group of decision makers. Even if more resources become available, we tend to stick to the original criteria as if they had some sense separate from that which decision makers invested them to justify the decision.
“For example, because there are limited places in college classrooms, we need to find a way to exclude some potential students. We create a test, say the SAT, and overlook meaningful alternative criteria for selection. We take things a step further and overlook the alternative criteria for the kinds of questions that could be on the test. We then use scores to exclude. We can justify the use of the scores by studies showing that they predict college performance and ignore the possibility that other tests inquiring into other abilities might do just as well. If Harvard randomly selected students from those who applied, educated them, and granted them a Harvard degree, they might well be as successful as those who get in via their SAT scores. But if we allowed everyone to go to Harvard who wanted to, many would not want to go because the glamour of being “selected” would be gone.
“In this way the relationship between evaluation and the perception of limited resources is reciprocal and interactive, each causing the other. The consequence of these evaluations for all but perhaps the top five percent of applicants is a lifetime of feeling inadequate. People at ages fifty and sixty still degrade themselves based on how they performed on a test they took in their teens. Those who did not score higher on tests like these should look at them once again as adults and consider the questions they could not answer. Do we really care if we don’t know which picture represents the results of unfolding a piece of paper? Would we respect someone simply because she knows the answer to this or some other arbitrary question? As adults, shouldn’t we realize that a group of people decided to ask this question but they could have asked a different one to which we might have known the answer. If we had these people in front of us so that we could judge them the way they had indirectly judged us, what would we find? Couldn’t we think of questions that they would not be able to answer? As long as we’re oblivious to context, we don’t think of things like this.”
Langer, E. (2005). On becoming an artist: reinventing yourself through mindful creativity. New York: Random House, pp. 115-116.
Assessment for online teaching and learning
Discussion post from a professional development course for online community college teachers on assessment, grading, and feedback:
The administrators want “data” they can show to their “stakeholders” (not my favorite word) to demonstrate the “effectiveness” of teaching and learning. As you know, I take a rather … ahem … non-traditional approach to this topic, which is one of the reasons I was asked to develop this course. My own personal view is that “data” showing that students scored 80 percent on a test before, and then 90 percent on a test later, don’t really say much about the effectiveness of teaching and learning, unless what you’re teaching your students is how to take multiple-choice tests.
There is a whole series of issues here involving test validity, levels of knowledge that can be assessed, learning domains, learning styles, expectations of the discipline, accreditation requirements, learner preparation and prior knowledge, and so forth. Most administrators are not “real” educators <grin> and come from the epistemological view that quantitative data from test scores are an objective measurement of real learning. I take the view that assessing through multiple-choice tests, especially in areas such as the humanities, is never truly objective, and is probably not even measuring the learning that is actually happening.
We have to play the hand we’re dealt. If they want multiple-choice tests, give them multiple-choice tests. It is very unlikely you will persuade a committee to take an alternative approach, since these tests are cheap and easy to administer, and everyone can agree that 90 is a higher score than 80 (probably the real reasons they are used). So I give them all the spreadsheets they want. Then I go about subverting the system by focusing my efforts on authentic assessment models, such as student portfolios or whatever. A funny thing happens. The administrators who really lack the imagination to see assessment as anything but a spreadsheet get an opportunity to see what authentic assessment looks like, and they love it! They want to host portfolio shows and invite their “stakeholders,” and put student videos on the campus YouTube site, along with PowerPoint presentations and photographs of smiling students with their projects, and otherwise show the world what their students can actually DO!
So that’s my strategy. Give them what they ask for, then SHOW them what my students can actually DO. I don’t know if they ever really understand it as being “real” assessment, but they usually dig it and can see the benefit to the institution (whether or not they see the benefit for the learners). I’d rather work the fringe, wiggle through the gaps, and play within whatever sandbox they give me, than try to get them to imagine something about learning they’ve probably never experienced themselves.
The line for co-conspirators forms to the left. :-)



