The Story behind the Recent Changes in the ANSI Testing Procedure
Interview with Jennifer Snow Wolff
Jennifer spearheaded the initial research on how to test symbols.
This research was funded by Electromark and became the basis for the new ANSI Z535.3
testing criteria. Shelly Waters Deppa,
the chairperson of Z535.3, took the lead in modifying Jennifer’s research and
including it in the final standard.

How has the ANSI symbol standard changed?

In the most recent (1995) revision of the ANSI standards, the
recommended procedure for testing safety symbols has gotten much
more complicated—or at least it seems that way.

The section on how to test a safety symbol, formerly Annex A and now Annex B, went from 2 pages
to 15 pages. Multiple-choice testing, which used to be
recommended, is now recommended only with so many additions that it
is no longer the easy test method it once was. The standard has
added preliminary testing and testing in context, and it suggests
making revisions.

What is behind all these changes?

Validity. That’s what’s behind the changes. If your test has “validity” it means you
have a good test. The test tests what it says it does. If your test
says the symbol is bad, you can be sure the symbol really is bad.
The biggest problem with the old standard was multiple choice.
Every one of us has taken a multiple-choice test where it was
easy to guess the right answer, even though we hadn’t studied, just
because all the other choices were so obviously wrong. So everybody
knows that in a multiple-choice test it’s very important that
all the answers are at least plausible.
But the million-dollar question was, “Exactly how hard is it to make a good multiple-choice
test?” The answer turned out to be: almost impossible!

How did the research get started?

Electromark decided to sponsor some
independent researchers to answer the question of how effective a
multiple-choice symbol questionnaire can be. Dr. Mike Wogalter, of
North Carolina State University, then asked me to help out. I was
then an Information Design graduate student at Georgia Institute of
Technology. Dr. Wogalter, who of course is a recognized expert in
the area of symbol testing, supervised my work. In fact, the work
was so significant that it was later published in Human Factors, a
prestigious journal that publishes only 15% of all articles
submitted. A committee of professors in statistics, communications
and psychology oversaw my research design. The project required
research and tests involving hundreds of subjects.

Who else was doing research into symbol effectiveness at the time?

There are
a lot of experts and researchers who have been working on
methods for testing safety symbols for years: experts like
Robert Dewar, Dr. Silver, Shelly Deppa and Harm Zwaga. Many of these
scientists have been contributing to the International Organization
for Standardization (ISO), and their work has been verified. But they all
knew that multiple choice is pretty tricky to do right!

What’s so great about the open-ended testing method?

It’s more valid for two
reasons. It provides not only a test answer, but also valuable
information for redesigning a better symbol. The open-ended method
just asks, “What does this symbol mean?” Unlike multiple choice, an
open-ended test question doesn’t give the test-taker any “unfair
clues”.

Why is preliminary testing important?

Early informal
open-ended testing saves time in the long run. Doing informal
open-ended testing of symbols at the beginning of the testing
procedure is a good idea. First, it gives the testers an idea of how
good the symbol is. Second, it gives designers the information they
need to “correct” any problems the symbol has. Without this early
testing, a lot of time and money is spent testing bad symbols that
should never even have gotten to first base.

Designers will and should generate a lot of new symbol variants when designing a better
symbol, so a quick method is needed to choose among them. Harm
Zwaga’s “early estimation” method has been tested and validated by
ISO, so we have appropriated it for use by ANSI. Without this
method, testers either have to laboriously test ALL possible symbol
variants, or just guess which ones are best. This method is simply
to put all the symbols in a circle and ask test-takers what percent
of the population would understand each one. The three best are
taken and tested.
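As a rough illustration of the selection step described above, the sketch below averages each test-taker's estimate of the percentage of the population that would understand each symbol variant, then keeps the three highest-rated variants for further testing. The variant names, the sample estimates and the function name `pick_top_variants` are hypothetical. Using a simple mean is also an assumption, since the interview does not say how the estimates are combined.

```python
from statistics import mean


def pick_top_variants(estimates, keep=3):
    """Rank symbol variants by mean estimated comprehension (percent).

    `estimates` maps a variant name to the list of percentages that
    individual test-takers guessed for it. Averaging with a plain mean
    is an assumption made for this sketch.
    """
    averages = {name: mean(values) for name, values in estimates.items()}
    # Sort by mean estimate, highest first; ties break on the variant name.
    ranked = sorted(averages.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:keep]


# Hypothetical estimates from four test-takers for five symbol variants.
sample_estimates = {
    "variant_a": [80, 70, 90, 85],
    "variant_b": [40, 55, 50, 45],
    "variant_c": [60, 65, 70, 75],
    "variant_d": [90, 85, 95, 80],
    "variant_e": [30, 20, 35, 25],
}

for name, average in pick_top_variants(sample_estimates):
    print(f"{name}: {average:.1f}%")
```

With these made-up numbers, the three variants carried forward would be variant_d, variant_a and variant_c, in that order.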

What is so important about context in symbol testing?

In real life, a symbol is seen “in context”, i.e., in a
work setting. Testing a symbol “with context” just means that the
test-taker is given information about where the symbol would be in
real life. A test can either provide a descriptive sentence or a
picture of the real-life setting. A test where the symbol is only
seen on a blank piece of paper lacks the “real-world validity” of a
test with context. Because of the availability of cheap,
color, photo-quality scanners and printers, it is now very easy for
any company to provide low-cost photographic context when testing a
symbol. Therefore, the ANSI Standards Committee felt justified, for
the first time, in suggesting this as both a realistic and affordable
method.

Symbols are designed and copyrighted by Paul Arthur.

Also, the use of external context could reduce the cost of producing symbols
that perform at an acceptable, above-criterion level (85% or 67%).

The connection between simple symbols and context should not be
underestimated. ANSI and designers alike recommend simplicity in
designing a symbol. In general, a simple, clean symbol is easier to
read quickly, especially in smoky, foggy or dark conditions when
it’s most important to see the symbol. In the past, researchers were
puzzled that sometimes the most cluttered and busy symbols tested
better than simpler symbols that seemed far better.

This research suggested that these “busy and
cluttered” symbols are testing better only because they contain
“contextual information” in the symbol
itself. But in the real world, that symbol would not do better
because that entire context is in the world, and it isn’t needed in
the symbol. The simpler symbol that tests worse is actually the
better symbol! This is because without context, the test-taker
supplies their own context, which may have nothing to do with the
symbol’s purpose.

Take the example of the symbol meaning “Wear
Safety Shoes”. A test-taker who wanted to buy shoes might think it
meant “Shoe Store Here”. An outdoor enthusiast might think it meant
“Hiking Trail”. But when that symbol is hanging on the wall at a
construction site, nobody is going to think it means “Shoe Store” or
“Hiking Trail”.

Why is Multiple Choice so bad?

Well, multiple-choice tests are bad for lots of reasons. First, scores depend
on the plausibility of the alternative answers, and constructing a multiple-choice
test with plausible distracters is difficult. Second, participants
will get a certain percentage right by chance. Third, the multiple-choice
test has low ecological validity, i.e., it does not reflect the
cognitive processes involved in real-world symbol comprehension.
And finally, in the real world, a series of alternatives (most of
which are incorrect) is not posted next to symbols.
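To make the first two points concrete, here is a small sketch of a standard guessing model. The model is an assumption and is not taken from the research itself: a test-taker who truly understands the symbol answers correctly, and everyone else guesses evenly among the options that still look plausible. The function name and the sample numbers are hypothetical.

```python
def expected_score(true_comprehension, plausible_options):
    """Expected fraction correct under a simple guessing model.

    true_comprehension: fraction of test-takers who really understand
        the symbol (0.0 to 1.0).
    plausible_options: number of answer choices a guesser still has to
        pick among, counting the correct one (at least 1).
    """
    if not 0.0 <= true_comprehension <= 1.0:
        raise ValueError("true_comprehension must be between 0 and 1")
    if plausible_options < 1:
        raise ValueError("plausible_options must be at least 1")
    # Everyone who does not understand the symbol guesses at this rate.
    guess_rate = 1.0 / plausible_options
    return true_comprehension + (1.0 - true_comprehension) * guess_rate


# Hypothetical symbol that only 40% of people truly understand.
for options in (4, 2, 1):
    score = expected_score(0.40, options)
    print(f"{options} plausible option(s): {score:.0%} correct")
```

Under these assumptions, a symbol that only 40% of people understand would score about 55% when all four answers are plausible, and about 70% when two of the distracters are obviously wrong. That is the sense in which weak distracters can flatter a weak symbol.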

But can’t any smart person know if a multiple-choice test is good or not?

No. It is very difficult to construct a multiple-choice test with
plausible distracter sentences! “Distracter sentences” are the
“incorrect” answers that are supposed to distract the test-taker
from choosing the correct answer. We conducted an entire open-ended
test just to see if we could come up with three plausible
distracters for a four-answer multiple-choice test... and we still
didn’t get three options for each symbol that rated as “plausible”.
So we conducted two more ratings and still had not found enough!
That’s when we decided it’s pretty hard to come up with three
plausible answers in a multiple-choice test.

How bad were the bad answers?

We collected “actual answers” from people to our
multiple-choice questions. We even analyzed multiple-choice tests
that had been constructed by some of the most respected
researchers in the US, and our test-takers rated their answers as having
very low plausibility. These implausible answers could inflate the
scores of bad symbols by as much as 30%! So even the most sincere
researchers could unwittingly pass a bad safety symbol and have that
symbol used in the workplace. And if that symbol fails to
communicate a safety message, it means that someone could be injured
or even die!