Category Archives: Item Response Theory

Photographs As Items for Assessment – Free Example

Photographs of people in activity are a promising newer area for developing business-relevant assessments, and they have been in use for years in healthcare.  The approach was originally developed in the Netherlands to help patients suffering from fear of pain when moving the body (kinesiophobia); the University of Maastricht’s website has details on citations and a free compressed (zipped) short version of the main test.

Clearly, this same approach could be used to develop more engaging employee and organizational assessments that may be difficult to fake and that have better face validity and more workplace fidelity than other types of items.  Further, with cheap and even free video sites, video items could also play a bigger role in future assessments.

Consider these possible fruitful examples.

a) Vocational Interest Assessment

Vocational interest tests help people identify career paths for which their interests, values, and aptitudes are particularly suited.  But almost all are purely text-based.  What if each career alternative had photographs of the tasks in each job or job family, with video vignettes of major tasks?  Perhaps this could be a fun way to assess what activities and careers would ultimately help the person realize their goals.  Take another look at the picture at the top of this article.  It’s an actual picture from PHODA’s assessment, but couldn’t it represent the task of lifting articles out of a trunk for the job of a taxicab driver?

b) Employee Selection

Cognitive and knowledge-based tests are often used to select new employees, but not nearly as often as, or instead of, the ubiquitous job interview.  What if good instruments could be developed, perhaps with a combination of item types, that include pictures?  People could rate pictures like this one on the degree to which they look similar to their own desk – would you expect highly conscientious people to endorse this picture?

I would guess that highly conscientious and prudent people would be unlikely to indicate that this picture reflects their own office.  Sales convention pictures would be good for the high end of extraversion; pictures of police taking down violent offenders, for low levels of agreeableness.  The potential for pre-hire selection, especially for adding to Computer-Adaptive Testing item banks, is tremendous.

c) Culture & Climate

It may be difficult to identify static pictures that reflect various organizational cultural differences, but videos could certainly be used to assess these.

Limitations
As optimistic as I am about the potential for picture-based items to take a larger role in organizational assessment, I recognize there are also downsides.  First, while digital cameras are cheap, actors may not be.  If you can find existing workplaces where you can take these pictures, it may help you avoid hiring actors for static pictures, but perhaps not for videos that could really suffer with amateur actors.

Second, one New Zealand user of the PHODA complains that if the photographs are context-specific, they can lose value in other contexts.  I remember once when I worked for AT&T Microelectronics, we hired Wally Borman to redo his 1970s-era rater training videos because, while the content was good, the actors wore sideburns, bell-bottoms and leisure suits.  This was never going to be very persuasive as “cutting edge” to managers in a bleeding-edge semiconductor factory (computer chips).

Do you see the same potential for photograph-based items as The Scientific Leader?


Australian CAT for Kids

The Australian State of Victoria’s standard educational assessments include computer-adaptive tests (CATs), according to its new, free manual on report interpretation.  I was pleased to discover that the Victorian Curriculum and Assessment Authority uses the most modern form of human assessment to help children of all ages learn.

In particular, it is noteworthy that their easy-to-read manual includes an understanding of Rasch Measurement.  It notes the specific locations where there are items that are out of scope for a given assessment.  In these places, the child is mismatched with the test – the questions are either too hard or too easy to produce a trustworthy metric.
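To make the mismatch point concrete, here is a minimal sketch of the dichotomous Rasch model, in which the chance of a correct answer depends only on the gap between the child's ability and the item's difficulty (both in logits). The function names and the example values are my own illustrations, not anything taken from the Victorian manual. The information an item contributes is p × (1 − p), which is largest when the item is matched to the child and shrinks toward zero when the item is far too hard or far too easy.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_information(ability, difficulty):
    """Statistical information one item contributes at this ability: p * (1 - p)."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)

# Hypothetical child at 0 logits facing three hypothetical items.
child_ability = 0.0
for label, difficulty in [("well matched", 0.0), ("too hard", 4.0), ("too easy", -4.0)]:
    p = rasch_probability(child_ability, difficulty)
    info = item_information(child_ability, difficulty)
    print(f"{label:12s} p(correct)={p:.3f} information={info:.3f}")
```

Items that carry almost no information add almost nothing to the precision of the child's measure, which is why a test built mostly from such items cannot produce a trustworthy result.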

I’m hopeful that Australia’s educational leadership rubs off on more schools around the world.


CAT for Senior Citizens

Professor Jette

Can Computer-Adaptive Testing Help Senior Citizens?  Research from Boston University suggests it can.  Professor Alan Jette, Director of the Boston University Health & Disability Research Institute, published a recent paper examining disability assessments in traditional and computer forms.  With data from 671 older adults residing in care facilities, CAT compared favorably with fixed-form scales, even for a version of the CAT with only 10 questions.  In his study, each CAT was administered in less than three minutes, and the CAT scores were highly correlated with those from the original instrument.

His research strongly suggests that in situations where time is a scarce resource and measurement fidelity is still important, Computer-Adaptive Measurement approaches are often more useful than others.


Leader Due Diligence – Psychometrics as Part of Financial Transparency?

With Bernie Madoff being the latest in a series of massive financial frauds caused by leaders who misrepresented themselves, the time may have come to broaden the financial world’s definition of “transparency”.  I’d like to offer a broader view that includes publicly available reports on leadership knowledge, skills, abilities, traits, values and interests.  Would you have invested in Madoff’s Ponzi scheme if you had previously reviewed a report from a trusted authority on leadership assessment that noted he is low on conscientiousness and prudence?  How would a board view this same report on a founder-CEO?

How well do you know your leaders?


Poor leadership is common, but leaders rarely fail in such a public way.  In one study of nearly 400 Fortune 1000 companies, 47% of executives and managers rated their company’s overall leadership as fair or poor, and only 8% rated it as excellent (Csoka, 1998).  Personality traits predict both performance and ineffective leadership.  For example, conscientiousness is one of the “Big 5” factors of normal personality that has been shown to consistently predict both job performance and dishonest behavior in the workplace.  Former professors of mine, Robert and Joyce Hogan, have written extensively about this area, and have authored some of the better classical test theory instruments for normal personality, the “dark side” or dysfunctional leadership, and leader motives, values and preferences.  None of these sorts of assessments are typically used systematically to plan CEO development in private by the board.  And it is entirely unheard of for these reports to be shared publicly with prospective customers, partners and shareholders.  Perhaps we should reconsider making these transparent, systematically, given the risk and lack of confidence in markets of late?   The free paper I drafted, “The Three Stooges of Operational Risk: Advances in Leadership Due Diligence and Rasch Measurement”, proposes a way of improving our leadership assessments.  If desired, they could be used for this transparency purpose.   I welcome your feedback.

Special thanks to Alexei M for inspiring this idea.

References

Csoka, L. S. (1998).  Bridging the Leadership Gap.  New York: Conference Board.

Hogan, R., Curphy, G., & Hogan, J. (1994).  What We Know About Leadership: Effectiveness & Personality.  American Psychologist, 49(6), 493-504.

Robie, C., Brown, D., & Bly, P. (2008, March).  Relationship Between Major Personality Traits and Managerial Performance: Moderating Effects of Derailing Traits.  International Journal of Management, 25(1), 131-139.

Madoff Destroys $50 Billion with “Giant Ponzi Scheme”

Bernie Madoff is the latest in the series of senior executives to destroy value, this time with an apparent $50 billion fraud, according to the Financial Times.  Madoff, a former Chairman of the NASDAQ stock market, on Thursday admitted to his employees, including his two sons, that his operations were “all just one big lie” and “basically, a giant Ponzi scheme”.  The alleged fraud is the largest investor fraud ever blamed on a single individual.

Previously, I had written about the “Three Stooges of Operational Risk”, where I detailed senior executive destruction from Ken Lay of Enron, Bernie Ebbers of WorldCom and most recently, Dick Fuld‘s follies with Lehman Brothers.  In two of those three I noted the dishonesty and fraud that accounted for their downfall, similar to Madoff.  But unlike Madoff, they were less candid about their fraud.  After Madoff’s brazen alleged admission, is there any uncertainty that leadership due diligence is a critical part of the selection process for hiring senior executives?  Could it be any more clear that the pre-hire assessment procedure is a non-trivial subset of Enterprise Risk Management?

In fairness, these Industrial Organizational Psychology methods have their limitations.  No forecast could ever be perfect, and even the best assessment procedures only account for 30-60% of the variance in job performance.  But it’s relatively rare that factors such as conscientiousness are used to screen executives – and conscientiousness strongly predicts dishonest and imprudent behavior in the workplace like that of Madoff.  With new methods from Rasch Measurement, Computer-Adaptive Testing, and an innovation from the Scientific Leader, “Inverted Computer Adaptive Testing” using Virtual Reality, it’s increasingly difficult for people to fake or misrepresent themselves on these assessments.

How much risk are you accepting when you use standard interviews to hire your employees?


Utah Leadership Supports Computer-Adaptive Testing In Spite of “No Bureaucrat Left Behind” Act

Nine schools in Utah have found the benefits of Computer-Adaptive Testing to trump older methods.  Adaptive tests change to match a student’s skill level, avoiding wasted time and effort on questions that are far below or above their proficiency level.  They’re also at least 20% shorter.  This allows for periodic reassessment, and personalized focus on the specific curricular areas a learner needs to work on.  Each student is treated as a special, unique person.
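For readers who want to see how a test can "change to match a student's skill level", here is a minimal sketch of one common adaptive strategy under the Rasch model: after each answer, re-estimate the student's ability and then ask the unused question whose difficulty is closest to that estimate. This is an illustration under my own assumptions, not a description of the software the Utah schools use. The item bank, the simulated student, and every function name are hypothetical, and the clamp to ±6 logits is a crude device to keep the estimate finite while all answers so far are right or all are wrong.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, iterations=20):
    """Newton-Raphson maximum-likelihood estimate from (difficulty, correct) pairs."""
    ability = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(ability, d) for d, _ in responses]
        gradient = sum(int(x) - p for (_, x), p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, gradient / information))  # limit step size
        ability = max(-6.0, min(6.0, ability + step))        # keep estimate finite
    return ability

def run_cat(item_bank, answer_fn, max_items=10):
    """Ask the unused item closest to the current estimate, then re-estimate."""
    remaining = list(item_bank)
    responses = []
    ability = 0.0
    while remaining and len(responses) < max_items:
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        ability = estimate_ability(responses)
    return ability, [d for d, _ in responses]

# Hypothetical bank of 13 item difficulties from -3.0 to 3.0 logits.
bank = [x / 2 for x in range(-6, 7)]

def simulated_student(difficulty):
    """Hypothetical student who answers correctly whenever the item is below 1.5 logits."""
    return difficulty < 1.5

final_ability, asked = run_cat(bank, simulated_student, max_items=6)
print("items asked:", asked)
print("estimated ability:", round(final_ability, 2))
```

Notice that the easy items at the bottom of the bank are never asked of this student. Real adaptive tests add refinements such as stopping once the standard error is small enough, content balancing, and item exposure control.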

But the US Federal Government’s Department of Education is behind the times, and is making it difficult for Utah to use the modern psychometric methods, according to Utah’s Daily Herald.  The “No Child Left Behind Act” requires outdated, non-adaptive methods to be used in addition to the modern approaches.  While the DoE’s request for peer review looks good on the surface, peer review is rarely used in applied settings.  The instruments I’ve developed would certainly pass the scrutiny of my peers, and the feedback they give is useful.  But these extra steps are typically unnecessary to ensure that instrumentation is useful, as long as professionals develop the Computer-Adaptive Tests.  It’s downright destructive to children for the federal government to force Utah to use outdated, longer, and less precise measures of learning.  While I presume those favored by Washington are “peer reviewed”, I suspect that the review committee is selected by those who are friends of politicians, and that its members are likely unskilled in the recent developments in computer-adaptive measurement.

Fortunately, Utah appears to have visionary, contemporary leadership that steadfastly supports good measurement to help children learn.  The Utah Legislature, the State School Board and the Governor all approved the plan to continue to use adaptive testing – and the Feds require the outdated assessments to be used as well.  This is a hassle, an unnecessary cost, and an opportunity cost – the children could have been spending the time they’ll take on the DoE tests on learning something new.  Are you a visionary leader like the folks in Utah?  More by The Scientific Leader on Computer-Adaptive Measurement, applied to organizations and business, is free here.


Clash of the Psychometric Titans: Rasch and IRT

Does the approach you take to measuring your customers, employees or patients really matter?  If bad decisions are costly, then yes, it really does.  As an Industrial/Organizational Psychologist, I was taught about the science of human measurement.  This included historical treatment of “true score” or classical test theory and also Item Response Theory (IRT).   Classical approaches are slow to create, and require comparisons with others (“norms”) to make sense of them.  But when you measure temperature with a thermometer, do you need to know the distribution of other thermometers in the area for it to make sense?  No – the physical and biological sciences have a long history of successful measurement that predates the social sciences’ pseudo-measurement approach.

IRT is relatively better than Classical Test Theory; however, it violates some of the physical science axioms for measurement, is very complex, requires large sample sizes, and produces weird results that are nearly impossible to explain to the untrained.

Unfortunately, I/O Psychologists like me don’t get training in Rasch Measurement, other than believing that it’s the same as the simplest form of IRT.  This isn’t accurate – rather, these are two competing paradigms, and 12 years past my Ph.D., I’ve decided to dedicate myself to Rasch Measurement for both practical and scientific reasons. 

Practical Benefits of Rasch:

  • Smaller, unrepresentative samples are sufficiently useful
  • Accuracy and precision have same meaning in psychology as in physics and biology.  This makes it easier to communicate with physical and biological-science colleagues
  • People and items are on the same “ruler”.  For development, this is extremely useful to help focus learning on areas that are “just right” and not too hard or easy.
  • Raw scores are sufficient.  When communicating the results of a test, quiz, or assessment, this is essential.  With IRT, you can have a lower raw score but have a higher result – good luck explaining that to parents and juries.
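The raw-score point in the last bullet can be checked with a small sketch. It fits an ability estimate under a two-parameter IRT model (2PL), where each item has its own discrimination as well as a difficulty; setting every discrimination to 1 gives the Rasch model. All item parameters and response patterns below are made-up values chosen for illustration, and the function name is my own. Under Rasch, two people with the same raw score on the same items get the same estimate no matter which items they answered correctly. Under 2PL, the estimate depends on which items were answered correctly, so a person with a lower raw score can come out higher.

```python
import math

def estimate_ability(responses, items, iterations=50):
    """Newton-Raphson ML ability estimate for a 2PL model.

    items is a list of (discrimination, difficulty) pairs; responses is a
    matching list of 0/1 answers. Discriminations of 1.0 give the Rasch model.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items]
        gradient = sum(a * (x - p) for (a, _), x, p in zip(items, responses, probs))
        information = sum(a * a * p * (1.0 - p) for (a, _), p in zip(items, probs))
        theta += max(-1.0, min(1.0, gradient / information))  # limit step size
    return theta

difficulties = [-1.0, 0.0, 1.0, 2.0]                     # hypothetical items
rasch_items = [(1.0, b) for b in difficulties]           # equal discrimination
irt_items = list(zip([0.5, 0.5, 2.0, 2.0], difficulties))  # unequal discrimination

pattern_a = [1, 1, 0, 0]  # raw score 2
pattern_b = [0, 0, 1, 1]  # raw score 2, different items correct
pattern_c = [1, 1, 1, 0]  # raw score 3

rasch_a = estimate_ability(pattern_a, rasch_items)
rasch_b = estimate_ability(pattern_b, rasch_items)
irt_b = estimate_ability(pattern_b, irt_items)
irt_c = estimate_ability(pattern_c, irt_items)

print("Rasch, raw score 2, patterns A and B:", round(rasch_a, 3), round(rasch_b, 3))
print("2PL, raw score 2 (B) vs raw score 3 (C):", round(irt_b, 3), round(irt_c, 3))
```

In this example the 2PL estimate for pattern B (two correct) is higher than the one for pattern C (three correct), because B's correct answers fall on the items the model weights most heavily.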

After taking two terrific classes from Mike Linacre, I changed my mind and switched all my practice and science to Rasch.  Statistics.com still has the classes, covering the basics, advanced methods, and the extremely powerful Many-Facet Rasch Measurement that he invented.   While the class format is a bit awkward and highly self-paced, Linacre is an enthusiastic and responsive teacher.  Highly recommended.
