Electromark - Leading Source for Utility Marking Products
The Story behind the Recent Changes in the ANSI Testing Procedure

Interview with Jennifer Snow Wolff

 

Jennifer spearheaded the initial research on how to test symbols. This research was funded by Electromark and became the basis for the new ANSI Z535.3 testing criteria. Shelly Waters Deppa, the chairperson of Z535.3, took the lead in modifying Jennifer’s research and including it in the final standard.

How has the ANSI symbol standard changed?

In the most recent (1995) revision of the ANSI standards, the recommended procedure for testing safety symbols has gotten much more complicated—or at least it seems that way.

The section on how to test a safety symbol, formerly Annex A and now Annex B, went from 2 pages to 15 pages. Multiple-choice testing, which used to be recommended, is now recommended only with so many additions that it is no longer the easy test method it once was. The standard has also added preliminary testing and testing in context, and it suggests making revisions.

What is behind all these changes?

Validity. That’s what’s behind the changes. If your test has “validity”, it means you have a good test: the test measures what it says it measures. If your test says the symbol is bad, you can be sure the symbol really is bad. The biggest problem with the old standard was multiple choice. Every one of us has taken a multiple-choice test where it was easy to guess the right answer, even though we hadn’t studied, just because all the other choices were so obviously wrong. So everybody knows that in a multiple-choice test it’s very important that all the answers are at least plausible.

But the million-dollar question was “Exactly how hard is it to make a good multiple-choice test?” The answer turned out to be: almost impossible!

How did the research get started?

Electromark decided to sponsor some independent researchers to answer the question of how effective a multiple-choice symbol questionnaire can be. Dr. Mike Wogalter, of North Carolina State University, then asked me to help out. I was then an Information Design graduate student at Georgia Institute of Technology. Dr. Wogalter, a recognized expert in the area of symbol testing, supervised my work. In fact, the work was so significant that it was later published by Human Factors, a prestigious journal that publishes only 15% of all articles submitted. A committee of professors in statistics, communications and psychology oversaw my research design. The project included tests involving hundreds of subjects.

Who else was doing research into Symbol effectiveness at the time?

A lot of experts and researchers have been working on methods for testing safety symbols for years: experts like Robert Dewar, Dr. Silver, Shelly Deppa and Harm Zwaga. Many of these scientists have been contributing to the International Organization for Standardization (ISO), and their work has been verified. But they all knew that multiple choice is pretty tricky to do right!

What’s so great about the open-ended Testing Method?

It’s more valid for two reasons. It provides not only a test answer, but also valuable information for redesigning a better symbol. The open-ended method just asks, “What does this symbol mean?” Unlike multiple choice, an open-ended test question doesn’t give the test-taker any “unfair clues”.


Why is preliminary testing important?

Early informal open-ended testing saves time in the long run. Doing informal open-ended testing of symbols at the beginning of the testing procedure is a good idea. First, it gives the testers an idea of how good the symbol is. Second, it gives designers the information they need to “correct” any problems the symbol has. Without this early testing, a lot of time and money is spent testing bad symbols that should never have gotten to first base.

Designers will and should generate a lot of new symbol variants when designing a better symbol, so a quick method is needed to choose among them. Harm Zwaga’s “early estimation” method has been tested and validated by ISO, so we have adopted it for use by ANSI. Without this method, testers either have to laboriously test ALL possible symbol variants, or just guess which ones are best. The method is simple: put all the symbols in a circle and ask test-takers what percent of the population would understand each one. The three best are then taken and tested.
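The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_variants`, the variant names and the numbers are hypothetical, and averaging the estimates with a simple mean is an assumption, since the interview does not say how the estimates are combined.

```python
from statistics import mean


def rank_variants(estimates, keep=3):
    """Return the `keep` variant names with the highest mean estimate.

    `estimates` maps a variant name to the list of percentages (0-100)
    that individual test-takers gave when asked what share of the
    population would understand that variant.
    Assumption: estimates are combined with a plain arithmetic mean.
    """
    averages = {name: mean(values) for name, values in estimates.items()}
    # Highest mean first; variants with equal means keep their input order.
    ranked = sorted(averages, key=averages.get, reverse=True)
    return ranked[:keep]


# Hypothetical data: four symbol variants, three test-takers each.
example = {
    "variant_a": [80, 70, 90],
    "variant_b": [40, 50, 45],
    "variant_c": [60, 65, 70],
    "variant_d": [85, 90, 95],
}

print(rank_variants(example))
```

With this made-up data, the three variants carried forward to full testing are `variant_d`, `variant_a` and `variant_c`, in that order.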

What is so important about context in symbol testing?

In real life, a symbol is seen “in context”, i.e., in a work setting. Testing a symbol “with context” just means that the test-taker is given information about where the symbol would be in real life. A test can provide either a descriptive sentence or a picture of the real-life setting. A test where the symbol is only seen on a blank piece of paper lacks the “real-world validity” of a test with context. Today, because of the availability of cheap, color, photo-quality scanners and printers, it is very easy for any company to provide low-cost photographic context when testing a symbol. Therefore, the ANSI Standards Committee felt justified, for the first time, in suggesting this as both a realistic and an affordable method.


Symbols are designed and copyrighted by Paul Arthur.

Also, the use of external context could reduce the cost of producing symbols with acceptable, above-criterion performance (85% or 67%).

The connection between simple symbols and context cannot be overstated. ANSI and designers alike recommend simplicity in designing a symbol. In general, a simple, clean symbol is easier to read quickly, especially in smoky, foggy or dark conditions when it’s most important to see the symbol. In the past, researchers were puzzled that sometimes the most cluttered and busy symbols tested better than simpler symbols that seemed far better.


This research suggested that these “busy and cluttered” symbols test better only because they contain “contextual information” in the symbol itself. But in the real world, that symbol would not do better, because the context is already there in the setting and isn’t needed in the symbol. The simpler symbol that tests worse is actually the better symbol! This is because, without context, the test-taker supplies their own context, which may have nothing to do with the symbol’s purpose.

Take this example of the symbol meaning “Wear Safety Shoes”. A test-taker who wanted to buy shoes might think it meant “Shoe Store Here”. An outdoor enthusiast might think it meant “Hiking Trail”. But when that symbol is hanging on the wall at a construction site, nobody is going to think it means “Shoe Store” or “Hiking Trail”.


Why is Multiple Choice so bad?

Well, multiple-choice tests are bad for lots of reasons. Constructing a multiple-choice test with plausible distracters is difficult. First, scores depend on the plausibility of the alternative answers. Second, participants will get a certain percentage right by chance. Third, the multiple-choice test has low ecological validity, i.e., it does not reflect the cognitive processes involved in real-world symbol comprehension. And, finally, in the real world, a series of alternatives (most of which are incorrect) is not posted next to symbols.
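The first two points can be made concrete with a simple guessing model. The sketch below is an illustration only, not the model used in the research: it assumes that a test-taker who does not know the meaning discards the obviously implausible distracters and guesses evenly among the remaining options. The function name `expected_score` and the input numbers are hypothetical.

```python
def expected_score(p_know, n_options=4, n_implausible=0):
    """Expected share of correct answers under a simple guessing model.

    p_know        -- share of test-takers who truly understand the symbol
    n_options     -- answers offered per question (correct one included)
    n_implausible -- distracters so implausible that guessers rule them out

    Assumption: everyone who does not know the meaning guesses uniformly
    among the options that still look plausible.
    """
    if not 0.0 <= p_know <= 1.0:
        raise ValueError("p_know must be between 0 and 1")
    if not 0 <= n_implausible < n_options:
        raise ValueError("n_implausible must be between 0 and n_options - 1")
    remaining = n_options - n_implausible
    return p_know + (1.0 - p_know) / remaining


# Hypothetical symbol that only 40% of people truly understand.
for ruled_out in (0, 2, 3):
    score = expected_score(0.4, n_options=4, n_implausible=ruled_out)
    print(f"{ruled_out} implausible distracters: {score:.2f}")
```

Under these assumptions, a symbol that only 40% of people understand would score about 55% on a four-answer test with three plausible distracters, about 70% if two distracters are obviously wrong, and 100% if all three are.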

But can’t any smart person know if a multiple choice test is good or not?

No. It is very difficult to construct a multiple-choice test with plausible distracter sentences! A “distracter sentence” is one of the “incorrect” answers that are supposed to distract the test-taker from choosing the correct answer. We conducted an entire open-ended test just to see if we could come up with three plausible distracters for a four-answer multiple-choice test... and we still didn’t get three options for each symbol that rated as “plausible”. So we conducted two more ratings and still had not found enough! That’s when we decided it’s pretty hard to come up with three plausible answers in a multiple-choice test.

How bad were the bad answers?

We collected “actual answers” that people gave to our multiple-choice questions. We even analyzed multiple-choice tests that had been constructed by some of the most respected researchers in the US, and our test-takers rated their distracters as having very low plausibility. These implausible answers could inflate the scores of bad symbols by as much as 30%! So even the most sincere researchers could unwittingly pass a bad safety symbol and have that symbol used in the workplace. And if that symbol fails to communicate a safety message, it means that someone could be injured or even die!


 
 
 

© 2007 Electromark