The data we will consider come from a randomized controlled trial (RCT). The main purpose of the RCT was to investigate the effect of Acceptance and Commitment Therapy (ACT) in 126 patients with severe health anxiety. During the study, patients were asked to complete several questionnaires, including the SF-36 and SF-12, at different time points.
SF-36
The SF-36 is a generic, multi-purpose, short-form health survey with only 36 questions. It yields an 8-scale profile of scores (physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional and mental health) as well as physical (PCS) and mental (MCS) health summary measures. The 8 scale scores and the summary measures range from 0 to 100, with higher scores indicating better health. The scores are treated as continuous.
SF-12
The SF-12 Health Survey is a 12-item subset of the SF-36 that measures the same 8 health scales and yields the same PCS and MCS summary measures. It is a brief, reliable measure of overall health status, is useful in large population health surveys, and has been used extensively as a screening tool. Its score ranges are the same as for the SF-36, and the scores are treated as continuous.
You decide to design a validation study of the criterion validity of the Physical Component Summary (PCS) computed from the SF-12, comparing it with the PCS computed from the SF-36 (the gold standard).
Question 1 - The gold standard
1.1 Discuss whether the choice of gold standard and sample is appropriate for this purpose.
Question 2 – limits of agreement
A paired t-test comparing the PCS from the SF-36 with the PCS from the SF-12 gives the following output:
. ttest pcs_sf36 = pcs_sf12
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
pcs_sf36 |     123    47.36617    .7058742    7.828523    45.96882    48.76352
pcs_sf12 |     123    44.82154    .6827179    7.571708    43.47003    46.17305
---------+--------------------------------------------------------------------
    diff |     123    2.544632    .3344371    3.709087    1.882581    3.206684
------------------------------------------------------------------------------
     mean(diff) = mean(pcs_sf36 - pcs_sf12)                      t =   7.6087
 Ho: mean(diff) = 0                              degrees of freedom =      122
 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
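To cross-check the output, you can recompute the standard error and t statistic from the summary statistics of the differences alone. The Python sketch below does this. The function name `paired_t_from_summary` is hypothetical and is not part of Stata or any library. The only inputs are the mean, SD and number of pairs printed in the `diff` row.

```python
from math import sqrt


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (SE, t, df) for a paired t-test given summary statistics of the differences."""
    se = sd_diff / sqrt(n)   # standard error of the mean difference
    t = mean_diff / se       # test statistic for H0: mean(diff) = 0
    return se, t, n - 1      # degrees of freedom = number of complete pairs - 1


# Values for diff = pcs_sf36 - pcs_sf12 copied from the Stata output
se, t, df = paired_t_from_summary(2.544632, 3.709087, 123)
print(f"SE = {se:.4f}, t = {t:.4f}, df = {df}")  # prints: SE = 0.3344, t = 7.6087, df = 122
```

The 95% confidence interval in the output is built from this SE and the t quantile with 122 degrees of freedom (about 1.98). It describes the uncertainty of the mean difference, not the spread of individual differences.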
2.1 What is the mean difference between pcs_sf36 and pcs_sf12?
2.2 What are the limits of agreement (i.e. the prediction interval)?
LOA = δ ± 1.96 × SD, where δ is the mean difference and SD is the standard deviation of the differences.
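The formula can be written as a small Python function. This is a minimal sketch: `limits_of_agreement` is a hypothetical helper name. A common slip is to plug in the standard error of the mean difference (.3344371) instead of the SD of the differences (3.709087). The comment in the code flags this.

```python
def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Return (LOAlower, LOAupper) = mean difference -/+ z * SD of the differences."""
    half_width = z * sd_diff  # use the SD of the differences, not the standard error
    return mean_diff - half_width, mean_diff + half_width


# Mean and SD of pcs_sf36 - pcs_sf12 copied from the paired t-test output
lower, upper = limits_of_agreement(2.544632, 3.709087)
print(f"LOA: {lower:.2f} to {upper:.2f}")  # prints: LOA: -4.73 to 9.81
```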
2.3 Explain what the limits of agreement tell you.
Question 3 – Bland and Altman plot
PCS scores range from 0 to 100, and the summary statistics of the two variables are:
. sum pcs_sf36 pcs_sf12
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    pcs_sf36 |        125    47.37597    7.768135   29.12679   66.29746
    pcs_sf12 |        123    44.82154    7.571708   25.04456   57.46968
From the paired t-test, the mean difference between pcs_sf36 and pcs_sf12 is 2.54. The limits of agreement were found in question 2.2.
3.1 Draw a Bland-Altman limits of agreement plot with values on the x- and y-axes.
Label both axes and add numbers to them. Also draw horizontal lines for zero, LOAupper, LOAlower and the systematic error (the mean difference).
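A Bland-Altman plot is drawn from two derived quantities for each patient: the mean of the two measurements (x-axis) and their difference (y-axis). Four horizontal reference lines are added to it. The Python sketch below computes these quantities. It rests on two assumptions. First, `bland_altman` is a hypothetical helper. Second, missing values are marked as `None`, and only complete pairs are kept; this mirrors why the paired t-test used 123 pairs while `sum` reports 125 observations for pcs_sf36. The raw study data are not available here, so the inputs are hypothetical toy numbers, not results.

```python
from statistics import mean, stdev


def bland_altman(a, b, z=1.96):
    """Coordinates and reference lines for a Bland-Altman plot of paired measurements a and b."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]  # complete pairs only
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    xs = [(x + y) / 2 for x, y in pairs]  # x-axis: mean of the two methods
    ds = [x - y for x, y in pairs]        # y-axis: difference a - b
    bias = mean(ds)                       # systematic error (mean difference)
    sd = stdev(ds)                        # sample SD of the differences
    return {
        "x": xs,
        "y": ds,
        "zero": 0.0,
        "bias": bias,
        "loa_lower": bias - z * sd,
        "loa_upper": bias + z * sd,
    }


# Hypothetical toy scores (NOT the study data), only to show the shape of the result
sf36_toy = [50.0, 48.0, 45.0, None]
sf12_toy = [47.0, 46.0, 43.0, 41.0]
result = bland_altman(sf36_toy, sf12_toy)
print(result["x"], result["y"], round(result["bias"], 3))
```

With real data, you would plot the `x`/`y` lists as a scatter and draw the lines at `zero`, `bias`, `loa_lower` and `loa_upper`, for example with matplotlib's `scatter` and `axhline`.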
In another study, 16,060 patients were screened for cancer using both a gold standard (criterion) and a new test instrument (test). You hypothesize that the new instrument is as good as the gold standard.
Question 4 – sensitivity and specificity
To test your hypothesis, you calculate sensitivity and specificity:
. diagt test criterion
           |       criterion
      test |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |       969        231 |     1,200
    Normal |       164     14,696 |    14,860
-----------+----------------------+----------
     Total |     1,133     14,927 |    16,060
True abnormal diagnosis defined as test = 1
                                                [95% Confidence Interval]
---------------------------------------------------------------------------
Prevalence                            Pr(A)      7.5%      7.1%      7.9%
---------------------------------------------------------------------------
Sensitivity                         Pr(+|A)     80.8%     78.4%     82.9%
Specificity                         Pr(-|N)     98.9%     98.7%     99.1%
ROC area                  (Sens. + Spec.)/2      0.90      0.89      0.91
---------------------------------------------------------------------------
Likelihood ratio (+)        Pr(+|A)/Pr(+|N)     73.17     62.68     85.41
Likelihood ratio (-)        Pr(-|A)/Pr(-|N)      0.19      0.17      0.22
Odds ratio                      LR(+)/LR(-)    375.90    304.60    463.88
Positive predictive value           Pr(A|+)     85.5%     83.3%     87.5%
Negative predictive value           Pr(N|-)     98.5%     98.2%     98.6%
---------------------------------------------------------------------------
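Look at the line "True abnormal diagnosis defined as test = 1". It shows that `diagt test criterion` treated `test` as the true diagnosis and `criterion` as the test result, which is the reverse of the design described above. The Python sketch below recomputes the measures from the four cell counts in both orientations. `diagnostic_measures` is a hypothetical helper, not part of diagt. With `criterion` as the reference, the sensitivity is 969/1,133 ≈ 85.5% and the specificity 14,696/14,927 ≈ 98.5%. These are the values printed above as PPV and NPV. Conversely, the printed sensitivity and specificity become the predictive values.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Accuracy of a test against a reference standard from the four cells of a 2x2 table.

    tp: test +, reference +     fp: test +, reference -
    fn: test -, reference +     tn: test -, reference -
    """
    sens = tp / (tp + fn)  # Pr(test + | reference +)
    spec = tn / (tn + fp)  # Pr(test - | reference -)
    return {
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),  # Pr(reference + | test +)
        "npv": tn / (tn + fn),  # Pr(reference - | test -)
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# `criterion` as reference (the study design): `test` is the new instrument
as_designed = diagnostic_measures(tp=969, fp=231, fn=164, tn=14696)

# `test` as reference, which is how the Stata output above was computed
as_printed = diagnostic_measures(tp=969, fp=164, fn=231, tn=14696)

for name in as_designed:
    print(f"{name:12s} {as_designed[name]:8.4f} {as_printed[name]:8.4f}")
```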
4.1 How large do you think sensitivity and specificity should be before you would conclude that the two methods agree?
Explain to the group what sensitivity and specificity mean. What is the misclassification (the proportion of patients classified differently by the two methods)?
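The misclassification is the share of patients in the two off-diagonal cells of the 2×2 table. Unlike sensitivity and specificity, this share does not depend on which method is taken as the reference. The Python sketch below uses a hypothetical helper, `misclassification`, with the counts from the table.

```python
def misclassification(fp, fn, n):
    """Proportion of subjects placed in the off-diagonal cells (the two methods disagree)."""
    return (fp + fn) / n


# Off-diagonal cells of the 2x2 table: 231 (test abnormal, criterion negative)
# and 164 (test normal, criterion positive); n = 16,060 patients
rate = misclassification(fp=231, fn=164, n=16060)
print(f"{231 + 164} of 16060 misclassified ({rate:.1%})")  # prints: 395 of 16060 misclassified (2.5%)
```

The prevalence is only about 7.5%, so this overall figure is dominated by the large number of negatives. That is one reason sensitivity and specificity are reported separately.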
4.2 Explain the positive and negative predictive values and summarise your findings.
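Predictive values depend on prevalence as well as on sensitivity and specificity. By Bayes' theorem, PPV = sens·prev / (sens·prev + (1 − spec)(1 − prev)). The Python sketch below applies this to the sensitivity, specificity and prevalence printed by diagt, recomputed as exact fractions from the table; at the study prevalence it reproduces the printed PPV (969/1,133) and NPV (14,696/14,927). `predictive_values` is a hypothetical helper. The 1% prevalence is a hypothetical value chosen only to show the effect.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


# Exact fractions behind the printed diagt estimates (80.8%, 98.9%, 7.5%)
sens, spec, prev = 969 / 1200, 14696 / 14860, 1200 / 16060
study_ppv, study_npv = predictive_values(sens, spec, prev)
# Hypothetical lower prevalence (1%), e.g. an unselected screening population
low_ppv, low_npv = predictive_values(sens, spec, 0.01)
print(f"prevalence {prev:.3f}: PPV {study_ppv:.3f}, NPV {study_npv:.3f}")
print(f"prevalence 0.010: PPV {low_ppv:.3f}, NPV {low_npv:.3f}")
```

At a lower prevalence the PPV falls and the NPV rises, even though sensitivity and specificity are unchanged.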