Group work - reproducibility answers

Assignment

Lars is involved in a study on grip strength in patients with rheumatoid arthritis, and he is interested in finding out more about the reproducibility of the procedures he uses.

Therefore, he has designed a study in which grip strength (Nm2) is measured in 20 patients with rheumatoid arthritis at two time points, one week apart. He obtains the results outlined in Table 1.

Table 1. Test-retest data on grip strength in 20 rheumatoid arthritis patients

Patient no.   Measurement 1   Measurement 2
1             86              92
2             40              47
3             50              55
4             52              57
5             69              74
6             62              64
7             84              84
8             68              72
9             58              62
10            71              74
11            92              97
12            76              74
13            77              81
14            77              83
15            64              67
16            35              34
17            88              94
18            76              74
19            103             110
20            117             125

1. Pearson’s correlation coefficient

Lars calculates Pearson’s correlation coefficient to be 0.991 (p < 0.0001).

Questions

1.1 Discuss what the correlation coefficient tells us about the reliability of the two measurements.

Answer

The Pearson’s correlation coefficient:

  1. Tells us if there is a linear association between the two measurements
  2. Depends on the variance of the data: if the SD between patients is large, we get a higher correlation than if the SD is small
  3. Does not take systematic error or variance differences between the measurements into account

The range of the data is wide: Measurement 1 [35 - 117], Measurement 2 [34 - 125]

Therefore, the correlation is high, but this does not mean that the two measurements find exactly the same values for grip strength: a systematic difference between them would go unnoticed.
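The points above can be checked directly on the Table 1 data. The following is a minimal sketch in Python; the function and variable names are illustrative and not part of the assignment, and the extra offset of 20 and the restriction to scores between 60 and 80 are hypothetical scenarios chosen only for illustration.

```python
from math import sqrt
from statistics import mean

# Table 1 data (grip strength, measurement 1 and measurement 2)
score1 = [86, 40, 50, 52, 69, 62, 84, 68, 58, 71,
          92, 76, 77, 77, 64, 35, 88, 76, 103, 117]
score2 = [92, 47, 55, 57, 74, 64, 84, 72, 62, 74,
          97, 74, 81, 83, 67, 34, 94, 74, 110, 125]


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Correlation for all 20 patients
r_all = pearson_r(score1, score2)

# Hypothetical retest that reads 20 units higher for every patient:
# the correlation is blind to this systematic error
r_shifted = pearson_r(score1, [b + 20 for b in score2])

# Hypothetical, more homogeneous sample: only patients with a first
# score between 60 and 80, which lowers the between-patient variance
narrow = [(a, b) for a, b in zip(score1, score2) if 60 <= a <= 80]
r_narrow = pearson_r([a for a, _ in narrow], [b for _, b in narrow])

print(f"all patients:        r = {r_all:.3f}")
print(f"retest shifted +20:  r = {r_shifted:.3f}")
print(f"scores 60-80 only:   r = {r_narrow:.3f}")
```

The shifted retest gives the same coefficient as the original data, while the narrower sample gives a lower one, even though the measurement procedure is the same in all three cases.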

1.2 Explain what the p-value means.

Answer

The p-value tells us that the correlation coefficient is statistically significantly different from 0. This is of little value here, as we would like to know how close the coefficient is to 1.

2. Intraclass correlation coefficients

The numbers in Table 2 below are an extract from an SPSS analysis (General Linear Models, variance components, restricted maximum likelihood) of the data from Table 1.

Table 2. Variance components

Variance component   Estimate
Var(patients)        424.579
Var(measurements)    6.811
Var(error)           4.414

Questions

2.1 Using the variance estimates, please calculate ICC-consistency (model 2.1), ICC-agreement (model 2.1), SEM-consistency and SEM-agreement.

Answer
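
The four quantities follow directly from the variance components in Table 2. A minimal sketch in Python, assuming the usual definitions (ICC-consistency ignores Var(measurements), ICC-agreement includes it, and each SEM is the square root of the corresponding error variance) and reading the Table 2 estimates with decimal points; the variable names are illustrative.

```python
from math import sqrt

# Variance components from Table 2 (decimal commas read as decimal points)
var_patients = 424.579
var_measurements = 6.811
var_error = 4.414

# Consistency: systematic differences between measurements are ignored
icc_consistency = var_patients / (var_patients + var_error)
sem_consistency = sqrt(var_error)

# Agreement: systematic differences count as measurement error
icc_agreement = var_patients / (var_patients + var_measurements + var_error)
sem_agreement = sqrt(var_measurements + var_error)

print(f"ICC-consistency = {icc_consistency:.3f}")
print(f"ICC-agreement   = {icc_agreement:.3f}")
print(f"SEM-consistency = {sem_consistency:.2f}")
print(f"SEM-agreement   = {sem_agreement:.2f}")
```

This gives ICC-consistency of about 0.990, ICC-agreement of about 0.974, SEM-consistency of about 2.10 and SEM-agreement of about 3.35.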

2.2 What do you think about the reliability and the measurement error?

Answer

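Both ICCs are high (approximately 0.99 for consistency and 0.97 for agreement, calculated from the variance components in Table 2), so the reliability is good.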

SEM-consistency = 2.10 and SEM-agreement = 3.35, therefore the measurement error is 2.10 Nm2, or 3.35 Nm2 if the systematic error is included.

3. Limits of agreement and systematic differences

A paired t-test of measurement 1 (score1) and measurement 2 (score2) is shown in Table 3.

Table 3. Paired t-test

. ttest score1 = score2

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  score1 |      20       72.25    4.508982    20.16478    62.81259    81.68741
  score2 |      20          76    4.750623    21.24543    66.05683    85.94317
---------+--------------------------------------------------------------------
    diff |      20       -3.75    .6644151    2.971354   -5.140637   -2.359363
------------------------------------------------------------------------------
     mean(diff) = mean(score1 - score2)                           t =  -5.6441
 Ho: mean(diff) = 0                              degrees of freedom =       19

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

Questions

3.1 What is the systematic difference between measurement 1 (score1) and measurement 2 (score2)? Which measurement has the highest average score?

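Answer

The systematic difference is the mean of the paired differences (score1 - score2) in Table 3: -3.75 Nm2 (95% CI -5.14 to -2.36), which is statistically significant (t = -5.64, df = 19, p < 0.0001). Measurement 2 has the highest average score (76.00 versus 72.25).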

3.2 What is LOA (limits of agreement)?

Answer
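
Limits of agreement are commonly calculated (Bland-Altman) as the mean difference ± 1.96 × the SD of the differences. A minimal sketch in Python on the Table 1 data, assuming this definition; the variable names are illustrative.

```python
from statistics import mean, stdev

# Table 1 data (grip strength, measurement 1 and measurement 2)
score1 = [86, 40, 50, 52, 69, 62, 84, 68, 58, 71,
          92, 76, 77, 77, 64, 35, 88, 76, 103, 117]
score2 = [92, 47, 55, 57, 74, 64, 84, 72, 62, 74,
          97, 74, 81, 83, 67, 34, 94, 74, 110, 125]

# Paired differences, in the same direction as Table 3 (score1 - score2)
diffs = [a - b for a, b in zip(score1, score2)]

mean_diff = mean(diffs)   # systematic difference
sd_diff = stdev(diffs)    # sample SD of the differences

# 95% limits of agreement
loa_lower = mean_diff - 1.96 * sd_diff
loa_upper = mean_diff + 1.96 * sd_diff

print(f"mean difference = {mean_diff:.2f}, SD = {sd_diff:.2f}")
print(f"LOA = {loa_lower:.2f} to {loa_upper:.2f}")
```

With the mean difference of -3.75 and SD of 2.97 from Table 3, the limits of agreement are -3.75 ± 1.96 × 2.97, that is, about -9.57 to 2.07 Nm2.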

3.3 What will happen with ICC-consistency and ICC-agreement if measurement 2 (score2) always measures 20 Nm2 lower compared to measurement 1 (score1)?

Answer

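If measurement 2 consistently measures 20 Nm2 lower than measurement 1, then ICC-consistency is unchanged, because it ignores systematic differences between the measurements, whereas ICC-agreement decreases, because the systematic difference increases Var(measurements).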
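
The effect of such a systematic difference can be illustrated on the Table 1 data. The sketch below is an assumption-laden illustration, not the SPSS analysis itself: it estimates the variance components with the ANOVA (method of moments) formulas for two measurements, which for these data reproduce the REML estimates in Table 2, and it builds a hypothetical second measurement by shifting score2 so that it lies 20 below score1 on average while the random error stays the same. All names are illustrative.

```python
from statistics import mean, variance

score1 = [86, 40, 50, 52, 69, 62, 84, 68, 58, 71,
          92, 76, 77, 77, 64, 35, 88, 76, 103, 117]
score2 = [92, 47, 55, 57, 74, 64, 84, 72, 62, 74,
          97, 74, 81, 83, 67, 34, 94, 74, 110, 125]


def variance_components(x, y):
    """ANOVA estimates for a design with two measurements per patient."""
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    var_error = variance(diffs) / 2
    # Systematic difference between the measurements (not below zero)
    var_meas = max(0.0, mean(diffs) ** 2 / 2 - var_error / n)
    # Between-patient variance equals the covariance of the two scores
    mx, my = mean(x), mean(y)
    var_pat = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return var_pat, var_meas, var_error


def iccs(x, y):
    """Return (ICC-consistency, ICC-agreement)."""
    var_pat, var_meas, var_error = variance_components(x, y)
    consistency = var_pat / (var_pat + var_error)
    agreement = var_pat / (var_pat + var_meas + var_error)
    return consistency, agreement


# Hypothetical retest: shift score2 so that mean(score1 - score2) = 20
offset = 20 - mean(a - b for a, b in zip(score1, score2))
score2_low = [b - offset for b in score2]

icc_c, icc_a = iccs(score1, score2)
icc_c_low, icc_a_low = iccs(score1, score2_low)

print(f"original: consistency = {icc_c:.3f}, agreement = {icc_a:.3f}")
print(f"shifted:  consistency = {icc_c_low:.3f}, agreement = {icc_a_low:.3f}")
```

Under these assumptions ICC-consistency stays at about 0.99, while ICC-agreement drops from about 0.97 to roughly 0.68.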

3.4 What will happen to the ‘limits of agreement’?

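Answer

The limits of agreement move with the systematic difference, but their width stays the same as long as the SD of the differences is unchanged. With a mean difference (score1 - score2) of 20 Nm2 and the SD of 2.97 from Table 3, they would be 20 ± 1.96 × 2.97, that is, about 14.18 to 25.82 Nm2.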

4. Reproducibility of shoulder measurements

In a study by De Winter et al. (2004), two physiotherapists measured shoulder abduction (Figure 1) in 155 patients with pain in one shoulder. They used an electronic inclinometer, which shows the abduction angle in degrees (Figure 2).

Figure 1. Shoulder abduction

Figure 2. Inclinometer measuring the abduction angle

Table 4 shows the results in degrees as well as other measures of reproducibility for both shoulders.

Table 4. Results

Questions

4.1 Explain the large difference between the two ICCs in light of the exact same results for the 5 and 10% agreement.

Answer

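Look at the variance of the results (SD): the ICC relates the measurement error to the variance between patients, so a larger SD gives a higher ICC even when the 5 and 10% agreement is exactly the same.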
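
The dependence of the ICC on the between-patient SD can be shown with a small sketch in Python. The numbers are hypothetical and are not taken from Table 4: the same measurement error (SEM of 5 degrees) is combined with a small and a large between-patient SD, using ICC = Var(patients) / (Var(patients) + Var(error)).

```python
def icc(sd_between, sem):
    """ICC from the between-patient SD and the measurement error (SEM)."""
    var_between = sd_between ** 2
    var_error = sem ** 2
    return var_between / (var_between + var_error)


sem = 5.0  # hypothetical measurement error in degrees, same in both cases

icc_small_sd = icc(10.0, sem)  # hypothetical homogeneous group, SD = 10
icc_large_sd = icc(30.0, sem)  # hypothetical heterogeneous group, SD = 30

print(f"SD = 10: ICC = {icc_small_sd:.3f}")
print(f"SD = 30: ICC = {icc_large_sd:.3f}")
```

With identical measurement error, the ICC rises from 0.80 to about 0.97 purely because the patients differ more from each other.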

4.2 Discuss which parameter is preferable.

Answer

Whether one prefers agreement in degrees or an ICC from 0 to 1 is a personal preference. Personally, I prefer degrees, as they are more meaningful clinically and because ICCs are more difficult to interpret.

5. Design of a reproducibility study

A researcher is designing a reproducibility study. In a course on questionnaire technique he has heard that the reliability will improve if the study population is more heterogeneous. He therefore designs the study to include very different patients.

Questions

5.1 Discuss this strategy.

Answer

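The reliability of a questionnaire must be tested in the same patient population as the one in which it will be used. A more heterogeneous population increases the between-patient variance and thereby the reliability coefficient, without reducing the measurement error. Therefore, this design is flawed.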