Group work - interpretation & responsiveness answers

1. MIC-predictive

You have collected data from a clinical trial evaluating the effect of lumbar spinal surgery compared to exercise therapy. As a secondary aim you have decided to look at how to interpret the Oswestry Disability Index (ODI) and want to calculate the MIC-predictive.

Recall from the introductory course that the ODI has 10 items, each item has 6 response options, and the scale range is 0-100 (high score equals high disability). It is based on a reflective model. You can view the ODI in full by clicking here: ODI

For this you need to download the dataset (unpack the zip file). The dataset can be read by Stata 12-16.

1.1 Find the proportion of improved patients in the population


Use the summarize command to answer the question.


1.1 What is the proportion of improved patients in the population and why is it important?


We first look at the proportion of improved patients to see whether it deviates from 50%. If it does not, we do not need to apply the adjustment to the MIC-predictive.

. summarize anc

    Variable |        Obs        Mean    Std. Dev.       Min        Max
         anc |        168    .5357143    .5002138          0          1

The proportion of improved patients is 0.54.

1.2 Perform the logistic regression used to calculate the MIC-predictive


Use the logit command with the anchor (anc) as the dependent variable and the ODI change score (odich) as the independent variable.


1.2 What are the intercept C, the regression coefficient Bx and their SE’s (standard errors)?


We carry out the following logistic regression model:

. logit anc odich

Iteration 0:   log likelihood = -115.39304  
Iteration 1:   log likelihood = -87.337139  
Iteration 2:   log likelihood = -86.540531  
Iteration 3:   log likelihood = -86.538959  
Iteration 4:   log likelihood = -86.538959  

Logistic regression                             Number of obs     =        167
                                                LR chi2(1)        =      57.71
                                                Prob > chi2       =     0.0000
Log likelihood = -86.538959                     Pseudo R2         =     0.2501

         anc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       odich |   .1181484   .0206487     5.72   0.000     .0776776    .1586192
       _cons |  -1.009753   .2530678    -3.99   0.000    -1.505757    -.513749

The values are:

C = -1.010 (SE = 0.253)
Bx = 0.118 (SE = 0.021)

1.3 Calculate the correlation between the intercept C and the regression coefficient Bx


Right after the logistic regression you make a postestimation using the estat vce, corr command.


1.3 What is the correlation between C and Bx?


We run the postestimation command estat vce with the corr option:

. estat vce, corr

Correlation matrix of coefficients of logit model

             | anc                
        e(V) |    odich     _cons 
anc          |                    
       odich |   1.0000           
       _cons |  -0.6810    1.0000 

The correlation between C and Bx = -0.68

1.4 Calculate the MIC-predictive and CI’s using the Excel spreadsheet available at the homepage



1.4 What is the MIC-predictive and its CI?


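The important coefficients we need are the intercept C = -1.010 (SE = 0.253), the regression coefficient Bx = 0.118 (SE = 0.021) and the correlation between C and Bx (-0.681).

If we enter these values in the spreadsheet we get the MIC-predictive (CI) = 9.76 (6.629; 13.263)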
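The point estimate can be cross-checked by hand. The sketch below is in Python rather than Stata and assumes the usual predictive-modelling definition, MIC-predictive = (ln(p/(1 − p)) − C)/Bx, where p is the proportion of improved patients. The spreadsheet's own formula is not shown on this page, so this definition and the function name are assumptions. The CI is not reproduced here.

```python
import math


def mic_predictive(intercept, slope, prop_improved):
    """Change score at which the predicted odds of improvement
    equal the observed (pre-test) odds of improvement."""
    log_odds = math.log(prop_improved / (1 - prop_improved))
    return (log_odds - intercept) / slope


# Values taken from the summarize and logit output above.
mic = mic_predictive(intercept=-1.009753, slope=0.1181484,
                     prop_improved=0.5357143)
print(round(mic, 2))  # 9.76
```

This agrees with the spreadsheet result of 9.76.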

1.5 Using the formula in the slideshow, calculate the adjusted MIC-predictive value


The formula is as follows:

MIC-adjusted = MIC-predictive − (0.09 + 0.103 x cor) x SD-change x log−odds(imp)


These coefficients are found by using the following commands:

esize twosample odich, by(anc) pbcorr     // Point biserial correlation
sum odich     // SD of the change score
sum anc     // Proportion of improved patients


1.5.1 What is the adjusted MIC-predictive?


We need to calculate the formula:

MIC-adjusted = MIC-predictive − (0.09 + 0.103 x cor) x SD-change x log−odds(imp)

To do this, we need to find the point biserial correlation (cor) and SD-change:

. esize twosample odich, by(anc) pbcorr   // Point biserial correlation

Effect size based on mean comparison

                               Obs per group:
                              No improvement =         78
                                 Improvement =         89
        Effect Size |   Estimate     [95% Conf. Interval]
   Point-Biserial r |  -.5214815    -.6133492   -.4059314

. sum odich       // SD of the change score

    Variable |        Obs        Mean    Std. Dev.       Min        Max
       odich |        190    9.953843    14.98928        -30         62

MIC-adjusted = 9.758 − (0.09 + 0.103 x (-0.521)) x 14.989 x ln(0.536/(1-0.536))

The MIC-adjusted = 9.68. This value is very close to the MIC-predictive as the prevalence is close to 50%.
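The arithmetic can be checked with a few lines of Python (a sketch, not part of the Stata workflow; the function name is illustrative). The reported value of 9.68 is reproduced when the point-biserial correlation is entered exactly as esize prints it (-0.521). The sign of that estimate depends on the order of the two groups in esize, so the sketch also shows the result for +0.521; which sign the slideshow formula expects is not stated here and is left as an assumption.

```python
import math


def mic_adjusted(mic_pred, cor, sd_change, prop_improved):
    """Adjusted MIC-predictive, following the formula given above."""
    log_odds_imp = math.log(prop_improved / (1 - prop_improved))
    return mic_pred - (0.09 + 0.103 * cor) * sd_change * log_odds_imp


# Inputs from the Stata output above.
as_reported = mic_adjusted(9.758, cor=-0.5214815,
                           sd_change=14.98928, prop_improved=0.5357143)
# Same inputs, correlation entered with a positive sign (assumption).
positive_cor = mic_adjusted(9.758, cor=0.5214815,
                            sd_change=14.98928, prop_improved=0.5357143)

print(round(as_reported, 2))   # 9.68
print(round(positive_cor, 2))  # 9.45
```

Either way the adjustment is small, because the log-odds of improvement are close to zero when about half of the patients are improved.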

1.5.2 Describe what the MIC-predictive value and the CI’s mean. How can you use it?


The MIC-predictive reflects the gMIC, which is the mean of all iMICs in a group (see Terluin et al. (2017)). It is determined using logistic modelling and is more precise than a MIC estimated using ROC analysis. The width of the CI indicates how precise our estimate is and whether we can trust it. The MIC-predictive and NNT can be used to interpret clinical trials in a more user-friendly and clinically relevant way.

2. van der Windt: Responsiveness & interpretation

This part requires that you have read the article:

van der Windt DA, van der Heijden GJ, de Winter AF, Koes BW, Deville W, Bouter LM. The responsiveness of the Shoulder Disability Questionnaire. Ann Rheum Dis. 1998;57(2):82-7. You can find it here.

Article summary

Some years ago, the responsiveness of the Shoulder Disability Questionnaire (SDQ) was evaluated in a general practice setting in patients with shoulder pain. The SDQ was compared with the Pain Severity Score (PSS) and Functional Status Questionnaire (FSQ).

The SDQ (Shoulder Disability Questionnaire) consists of 16 items and is scored on a scale from 0–100, with higher scores indicating more severe disability (see Appendix 1 at the end of the page).

The PSS (Pain Severity Score) is a single question about the severity of pain, scored on a scale of 0–10. The PSS is also converted to a scale of 0–100, with higher scores indicating more severe pain.

The FSQ (Functional Status Questionnaire) consists of a three-point scale (1: little discomfort during daily activities; 2: much discomfort during daily activities; 3: unable to perform daily activities).

Clinical improvements and deterioration were documented through self-reported changes since the beginning of the episode. No change and little improvement were considered clinical stability. Measurements were taken upon entrance into the study and at one and six months’ follow-up.

In the study, responsiveness was assessed in terms of Guyatt’s responsiveness ratio and a ROC curve. The most important findings are presented in Table 3 and Figure 1 (see below).

Table 3. Mean change scores (SD) and responsiveness ratios for the SDQ and PSS after 1 and 6 months.

* Substitution of missing values was conducted for patients reporting complete recovery: for 74 patients at one month (PSS only) and for 157 patients at six months (PSS and SDQ).
† Responsiveness ratio: the ratio of the mean change score in clinically improved patients to the variability (SD) of change in scores in clinically stable patients.

Figure 1. ROC curves for change scores of the SDQ, PSS and FSQ at 1 month.

Note: True positive rate (sensitivity) and false positive rate (100-specificity) are for discriminating between patients reporting clinical improvement or clinical stability. Potential cut-off points: SDQ-change-score = 18.75 (optimal trade-off), sensitivity 74%, specificity 77%; SDQ-change-score = 40 (mean change in clinically improved patients), sensitivity 46%, specificity 98%.


2.1 How were the responsiveness ratios for the SDQ and PSS calculated? Can you check these calculations using the presented data?


The mean change scores for the Shoulder Disability Questionnaire (SDQ) and the Pain Severity Score (PSS) for patients with ‘clinical improvement’ after one and six months appear in the first row of Table 3. These are divided by the standard deviation of the change scores (SD-change) of stable patients, found in the next row.

For example, for the SDQ after six months, Guyatt’s responsiveness ratio was calculated as 51/27 = 1.89. Note that the authors did NOT use the MINIMAL important change (MIC) in the numerator but the mean change in improved patients. Thus, Guyatt’s responsiveness ratio is overestimated. Moreover, this ratio is more a measure of interpretability than a measure of responsiveness.
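The check for the SDQ at six months can be written out as a minimal Python sketch (the function name is illustrative; the two inputs are the values quoted above from Table 3):

```python
def responsiveness_ratio(mean_change_improved, sd_change_stable):
    """Ratio as computed in the article: mean change in improved
    patients divided by the SD of change in stable patients."""
    return mean_change_improved / sd_change_stable


ratio_sdq_6m = responsiveness_ratio(51, 27)
print(round(ratio_sdq_6m, 2))  # 1.89
```

The same function can be applied to the other columns of Table 3 to verify the remaining ratios.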

2.2 Calculate the smallest detectable change (SDC) and the limits of agreement (LOA) for the SDQ after the one-month follow-up. Clarify the possible difference between these two measures and explain which one you prefer in this case.


Because the SEM is not given, the smallest detectable change (SDC) must be calculated with the SD-change:

SDC = 1.96 x SD-change = 1.96 x 18 = 35.3

Limits of agreement (LOA) = Mean change in stable group ± 1.96 x SD-change = 4 ± 1.96 x 18 = (-31.3; 39.3)

These measures give a different outcome because, in the stable group, there was a mean difference of 4 points between the first and second measurements. If the mean change in the stable group were zero, the SDC (so calculated) and the limits of agreement would arrive at the same outcome. But if there is a systematic difference (as expected), preference should be given to the limits of agreement.

(NB: these calculations are actually based on SEM-consistency, because it is part of SD-change. LOA cannot be deduced from SEM-agreement)
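The two calculations can be compared side by side in a short Python sketch (function names are illustrative; the inputs are the one-month SDQ values for the stable group used above, mean change 4 and SD-change 18):

```python
def sdc_from_sd_change(sd_change, z=1.96):
    """Smallest detectable change computed directly from SD-change."""
    return z * sd_change


def limits_of_agreement(mean_change, sd_change, z=1.96):
    """Lower and upper limits of agreement around the mean change."""
    half_width = z * sd_change
    return mean_change - half_width, mean_change + half_width


sdc = sdc_from_sd_change(18)
loa_low, loa_high = limits_of_agreement(4, 18)
print(round(sdc, 1), round(loa_low, 1), round(loa_high, 1))  # 35.3 -31.3 39.3
```

Setting the mean change to 0 gives limits of ±35.3, which shows that the two measures differ only by the systematic change in the stable group.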

2.3 What do you think about the chosen external criterion? How would the SDC and responsiveness ratio change if the category ‘little improvement’ was considered ‘clinical improvement’?


The chosen external criterion for clinical stability, ‘recovery as experienced by the patient’, is subjective. The literature expresses doubt about the reliability and validity of this criterion, because it is derived from only one question and often correlates better with the last measurement than with the first. As expected, self-reported recovery (‘much improved’) corresponds poorly to the smallest clinically relevant change. The chosen criterion may classify a number of patients who did have clinically relevant improvements as stable.

If the category ‘little improvement’ is counted as ‘clinically relevant improvement’, the SD of the stable group would become smaller, thus giving the SDC a smaller value. The denominator of the responsiveness ratio is reduced, but so is the numerator: this is because patients with slight improvements are also counted as improved. What effect that has on the responsiveness ratio is hard to predict.

2.4 On the basis of the data in Table 3, draw an anchor-based MIC distribution after one month for the SDQ: distribution of scores for the group without clinically relevant improvement or deterioration and distribution of scores for the group with clinically relevant improvement.


Relevant numbers needed to draw the graph:

Figure 2. Anchor-based MIC distribution graph

2.5 Estimate the MIC value with optimal sensitivity and specificity for the SDQ using the ROC curve.


The optimal sensitivity and specificity are 74% and 77%, respectively. This gives a ROC cut-off value of 18.75 points.
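As an illustration, the two cut-off points reported in the note to Figure 1 can be compared in Python. The sketch assumes that the "optimal trade-off" is judged by the sum of sensitivity and specificity (Youden's index); the article may have used a different criterion, and only the two cut-offs quoted above are available here.

```python
# Cut-off -> (sensitivity, specificity), from the note to Figure 1.
cutoffs = {
    18.75: (0.74, 0.77),
    40.0: (0.46, 0.98),
}


def youden_j(sensitivity, specificity):
    """Youden's index: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


best_cutoff = max(cutoffs, key=lambda c: youden_j(*cutoffs[c]))
print(best_cutoff)  # 18.75
```

Under this assumption the cut-off of 18.75 (J = 0.51) is preferred over 40 (J = 0.44).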

2.6 How would the MIC value change if the category ‘little improvement’ was considered ‘clinical improvement’?


If the category ‘little improvement’ is counted as ‘clinically relevant improvement’, some of the people who were classified as not importantly changed in the right-hand curve in the anchor-based MIC distribution will move to the left-hand curve. Specifically: the upper part of the right-hand curve moves to the lowest part of the left-hand curve. The MIC based on optimal sensitivity and specificity will become smaller.

Appendix 1. The Shoulder Disability Questionnaire