5 Reliability and Validity

As noted earlier, because there is no underlying reality that we are measuring, there is no way for us to ensure that what we are doing is correct. In other words, everything we do is probably wrong in one way or another, but it might still be useful. So how do we ensure that it is? To begin with, we should ensure that everything we do is both reliable and valid. Reliability here refers to consistency, that is, the degree to which we get similar results whenever we apply a measuring instrument to measure a given concept. This is similar to the concept of Replicability. Validity, on the other hand, refers to unbiasedness, that is, the degree to which our measure really measures the concept that we intend to measure. In other words, validity asks whether the measuring instrument that we are using is objective.

Carmines & Zeller (1979) distinguish among three types of validity: Content Validity, which refers to whether our measure represents all facets of the construct of interest; Criterion Validity, which looks at whether our measure correlates with other measures of the same concept; and Construct Validity, which looks at whether our measure behaves as expected within a given theoretical context. We should also note here that these three types of validity are not interchangeable. In the ideal case, one has to show that their results pass all three validity tests. In the words of Grimmer & Stewart (2013): “Validate, validate, validate”!
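
To make the idea of Criterion Validity more concrete, the sketch below shows one simple way such a check could look in Python: correlating our own measure with an established measure of the same concept applied to the same texts. The scores and the choice of a Pearson correlation are purely illustrative assumptions, not a prescribed procedure.

```python
# A minimal sketch of a criterion validity check: compare our new measure
# against an existing measure of the same concept. All scores below are
# hypothetical; in practice they would come from two different instruments
# applied to the same set of texts.
from scipy.stats import pearsonr

our_measure = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3, 0.8, 0.6, 0.2]
established_measure = [0.3, 0.6, 0.2, 0.8, 0.5, 0.6, 0.2, 0.9, 0.7, 0.1]

# A high correlation suggests our measure tracks the established criterion.
r, p_value = pearsonr(our_measure, established_measure)
print(f"Correlation with the criterion: r = {r:.2f} (p = {p_value:.3f})")
```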

Krippendorff (2018) distinguishes among three types of reliability. The first is Stability, which he considers the weakest form of coding reliability. This we can measure by having our text coded more than once by the same coder: the larger the differences between the codings, the lower our reliability. The second is Reproducibility, which reflects the agreement among independent coders coding the same text. Finally, the third is Accuracy, which he considers the strongest form of coding reliability. Here, we again look at the agreement between coders, as with reproducibility, but now against a given standard. Yet, as such benchmarks are rare, reproducibility is often the highest form we can aim for. The agreement between coders that we need for this is also known as inter-coder agreement, which we will look at next.
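
As an illustration of how these three forms of reliability could be computed, the sketch below uses Cohen's kappa as the agreement coefficient; the codings, the coders, and the choice of kappa (rather than, say, Krippendorff's alpha) are our own assumptions for the example, not part of Krippendorff's account.

```python
# A minimal sketch of checking Stability, Reproducibility, and Accuracy,
# assuming nominal category codes and using Cohen's kappa as the agreement
# measure. All codings below are hypothetical.
from sklearn.metrics import cohen_kappa_score

coder_a_first  = [1, 0, 2, 1, 1, 0, 2, 2, 1, 0]  # coder A, first pass
coder_a_second = [1, 0, 2, 1, 0, 0, 2, 2, 1, 0]  # coder A, second pass
coder_b        = [1, 0, 2, 0, 1, 0, 2, 1, 1, 0]  # independent coder B
gold_standard  = [1, 0, 2, 1, 1, 0, 2, 2, 1, 1]  # (rarely available) benchmark

# Stability: the same coder coding the same texts more than once.
print("Stability:      ", cohen_kappa_score(coder_a_first, coder_a_second))
# Reproducibility: agreement among independent coders.
print("Reproducibility:", cohen_kappa_score(coder_a_first, coder_b))
# Accuracy: agreement with a given standard.
print("Accuracy:       ", cohen_kappa_score(coder_a_first, gold_standard))
```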

References

Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Sage. https://doi.org/10.4135/9781412985642
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
Krippendorff, K. (2018). Content analysis: An introduction to its methodology (4th ed.). Sage.