31. Burden of Proof, Useful Data, Reliability, Radiological Diagnosis, and Iridology

The eyes are the windows to the soul.

“If you cannot measure it, it does not exist.” Young psychometrician

Hmm, lost an eye?  My professional opinion is to cover your good eye with gauze so you can only see light or dark.  Non-parametric user

If you believe in Science (note the capital ‘S’), you require proof.  I must admit that a claim without any valid theoretical underpinning requires additional proof.  Therefore, I guiltily admit, I would require more stringent proof to believe in any medicine based on chakras, meridians, or other alt- approaches, which lack any physiological, chemical, microscopic, or neurological evidence, than in a medicine that uses proteins known to affect the correct biological system and has some animal research validating it.  I also believe in the existence of bones.  Don’t get me wrong, I would still require two adequate and well-controlled studies for any drug, but for a chakra-based treatment, I would likely require more.

For me, the converse is also true.  I am, above all else, an empiricist.  If something has weak theory but proven value, I would use it.  I would still maintain a healthy skepticism, since I am also very familiar with the placebo effect.  More on this later.

Now there is a difference between proving a theory and finding that the theory has useful implications.  Here is where I differ from the Agency.  I require clinical, not just statistical, significance.  I have known ESP researchers who have demonstrated (to my satisfaction) that ESP works (consistently p < 0.0001).  On the other hand, from what I’ve seen, all demonstrations, even among ‘gifted’ telepaths, show that its effect size is trivial.  If it is demonstrated that an effect size is greater than zero, but less than 1%, I might smile and say thank you, but no thank you.  For example, if a measurement yields 10% or less true prediction (i.e., 90% error variance, or noise), I regard it as potentially useful, but inadmissible for individual prediction.
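The gap between statistical and clinical significance is easy to see in a simulation.  Here is a minimal sketch (the 0.05 “ESP-sized” effect is my hypothetical, not a number from any study): with a million observations, a correlation explaining a quarter of a percent of the variance is overwhelmingly “significant.”

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
true_effect = 0.05  # hypothetical, trivially small effect

x = rng.standard_normal(n)
y = true_effect * x + math.sqrt(1 - true_effect**2) * rng.standard_normal(n)

r = np.corrcoef(x, y)[0, 1]
z = r * math.sqrt(n)                  # large-sample z statistic for H0: rho = 0
p = math.erfc(z / math.sqrt(2))       # two-sided normal-approximation p-value

print(f"r = {r:.4f}, r^2 = {r*r:.4%}, z = {z:.1f} (p effectively zero)")
```

The p-value underflows to zero, yet r² is about 0.25% – greater than zero, but a “thank you, but no thank you.”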

One gift from psychometrics is the concept of reliability.  Let me define terms: reliability is basically how repeatable a set of measurements is, usually measured by a correlation coefficient.  A zero indicates that the measurement is useless – totally useless – and a 1.00 indicates that it consistently gets the same measurement.  I’ll return to this in more detail below.  Now I should also point out that a reliable measurement may not be useful; that is the validity of a scale.  For example, one can very consistently measure a man’s shoe size, but its utility as a measure of ‘manliness’ might be contested.  However, it should be obvious that a reliable measurement is a necessary, but not sufficient, condition for a measurement to be valid (i.e., useful).  Let me be more mathematical.  If I had a perfect, gold-standard measurement of something (call it ‘T’ for TRUE) and a secondary measurement of it (call it ‘x’) and correlated them, the maximum correlation r{T,x} has a limit of √r{x,x’}, where r{x,x’} is something called the reliability of ‘x’.
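This ceiling is easy to demonstrate by simulation.  A minimal sketch, under classical test theory assumptions (observed score = true score + independent noise; the reliability of 0.49 is my arbitrary choice so its square root is a round 0.7): correlating two repeat measurements recovers the reliability, while correlating one measurement against the gold standard tops out near √reliability.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
reliability = 0.49  # arbitrary: chosen so sqrt(reliability) = 0.70

T = rng.standard_normal(n)                        # true (gold-standard) score
noise_sd = np.sqrt((1 - reliability) / reliability)
x1 = T + noise_sd * rng.standard_normal(n)        # first measurement
x2 = T + noise_sd * rng.standard_normal(n)        # repeat measurement

r_xx = np.corrcoef(x1, x2)[0, 1]   # test-retest reliability r{x,x'}
r_tx = np.corrcoef(T, x1)[0, 1]    # validity against the gold standard r{T,x}

print(f"reliability r(x,x') ~ {r_xx:.2f}")
print(f"validity    r(T,x)  ~ {r_tx:.2f}  (sqrt of reliability = {np.sqrt(reliability):.2f})")
```

No amount of clever analysis downstream raises r{T,x} above that √0.49 ceiling; only improving the measurement itself does.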

Let me spill the beans on radiology’s dirty little secret.  When you get an X-ray taken and a radiologist reads it, their readings suck!  I was working at a diagnostics firm when I was told this dirty little secret.  You might think I exaggerate, so let me give you the data.  If you give two radiologists the same image, they should give the same reading.  If you correlate the different radiologists’ readings, they should correlate 1.0.  Would you believe 0.3-0.4?  If you read an early blog (4. Meaningful ways to determine the adequacy of a treatment effect when you lack an intuitive knowledge of the dependent variable), you would note that the amount of prediction from one measurement to the other (r) is r².  So 0.3 has a prediction of 9% of the variance and 0.4 has a prediction of 16% of the variance.  Or 0.3 is 91% NOISE and 0.4 is 84% NOISE.  Statisticians who have worked with radiological readings have learned this dirty little secret the hard way.  In fact, this is an excellent way to weed out new, inexperienced statisticians from their more seasoned brethren.

Would I go to a hospital and get an X-ray if I had a totally broken bone?  Yes.  Is it worthwhile to go to a hospital to have them read a chip or a subtle crack?  NO.  If I tell them what to look for, they might find it (or claim to find it – same thing).  I was recently asked by a colleague to review a research methodology whose primary endpoint was reading CT scans.  I told him my concern.  Given the unreliability and the subjective flaws I’ve experienced, I told him to ensure adequate training of these experienced radiologists, use a single central lab, have the images blinded by study, region, site, patient, and date (especially pre- vs. post-), and then test the radiologists for their inter-rater reliability.
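To make the r-versus-r² arithmetic concrete, here is a small simulation (the reader-noise level is a hypothetical I tuned to land in the 0.3-0.4 range reported above, not real radiology data): two readers see the same underlying image severity plus large reader-specific error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
# hypothetical "true" severity of each image, plus large reader-specific noise
truth = rng.standard_normal(n)
reader_a = truth + 1.3 * rng.standard_normal(n)
reader_b = truth + 1.3 * rng.standard_normal(n)

r = np.corrcoef(reader_a, reader_b)[0, 1]
print(f"inter-rater r = {r:.2f}, shared variance r^2 = {r*r:.0%}, noise = {1 - r*r:.0%}")
```

An inter-rater correlation near 0.37 sounds merely mediocre until you square it: the two readers share about 14% of their variance, and the rest is noise.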
While there is excellent theoretical reason to believe that the key parameter should be bone appearance (it was actually muscle ossification), one might want a more reliable measurement, or ways to make it more reliable.  Faced with this unreliable measurement, the clinical team suggested categorizing the data into a three-point rating.  I told them that was the wrong way to go (see 7a Assumptions of Statistical Tests: Ordinal Data).  When you have poor data, the only thing to do is to acknowledge it and try to improve it, not throw away more information!  The best analogy is telling a person who lost an eye that the best treatment is to cover the good eye with thick gauze so they can only see light or dark.  Categorization of poorly measured data is completely counterproductive!
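The cost of that categorization can be quantified.  A minimal sketch with simulated data (the noise level and tertile cut points are my assumptions, purely illustrative): take an already-noisy continuous reading, collapse it into a three-point rating, and watch its correlation with the truth drop further.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
truth = rng.standard_normal(n)
x = truth + 1.0 * rng.standard_normal(n)   # unreliable continuous reading

# collapse the continuous reading into a 3-point rating at its tertiles
cuts = np.quantile(x, [1 / 3, 2 / 3])
x3 = np.digitize(x, cuts)                  # ratings 0, 1, 2

r_cont = np.corrcoef(truth, x)[0, 1]
r_cat = np.corrcoef(truth, x3)[0, 1]
print(f"continuous r = {r_cont:.2f}, 3-point r = {r_cat:.2f}")
```

The measurement was already half noise; trichotomizing it throws away still more of the signal – gauze over the good eye.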

A week ago, my wife and I went to a health food store.  She went for a probiotic which claimed to contain 10,000,000,000 active cultures.  As I said above, I’m an empiricist.  There is some evidence about the microbiome in the scientific literature, and, more importantly, it makes my wife happy.  So we bought some huge pills.  At the cash register I noticed a flier for an iridology reading.  I was curious, so I Googled it.

Iridology is an alternative <science> (‘<‘ and ‘>’ indicate air-quotes) which allegedly examines the iris of the eye to <determine> sickness or potential weakness in a human’s biological systems.  For example, the innermost part of the iris is said to reflect the health of the stomach.  It was based on a <scientist> who made a single observation about an owl’s eye and then created a system to evaluate all bodily systems.  If you Google iridology, one of the first entries is the confession of a former iridologist (Confession of a Former Iridologist) who observed that his <readings> were totally unreliable.  The placement of the camera, the lights in the room, and even his own ratings varied with each measurement.  More to the point, the <science> totally lacks any biological basis; e.g., any color seen in the iris is not from chemicals or metals seeping into the iris (by some magical means) but from melanin.

Furthermore, scientific studies of the accuracy of iridology have found it totally lacking.  For example, the website Iridology is Nonsense cites a 1979 study in which “one iridologist, for example, decided that 88% of the normal patients had kidney disease, while another judged that 74% of patients sick enough to need artificial kidney treatment were normal”.

The site concludes: “If you encounter anyone practicing iridology, please complain to your state attorney general.”  Perhaps the same warning should be made for clinical radiologists.
