Data Quality

Data quality is a critical concern for AI projects. Poor-quality data can bias model predictions and mislead users. Data quality is commonly assessed along six dimensions:

Accuracy: how well does the data reflect reality?

Completeness: is the data comprehensive, with no values missing where they are expected?

Consistency: does information stored in one place match the information stored in another place?

Timeliness: is the data available when it is needed?

Validity: is the data in the expected format, and is that format usable? Does it follow business rules?

Uniqueness: is each data instance recorded only once in the sample, with no duplicates?
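
Several of these dimensions can be expressed as simple numeric checks. The following is a minimal Python sketch, not a definitive implementation. The sample records, the field names (id, email, signup), the email pattern, and the reference lookup are all hypothetical and chosen only for illustration. Accuracy and timeliness are left out because they depend on external ground truth and delivery requirements that are not described here.

```python
import re
from datetime import date

# Hypothetical sample records; field names are illustrative assumptions.
records = [
    {"id": 1, "email": "ana@example.com", "signup": "2021-05-12"},
    {"id": 2, "email": None, "signup": "2021-05-13"},
    {"id": 3, "email": "not-an-email", "signup": "2021-13-01"},
    {"id": 1, "email": "ana@example.com", "signup": "2021-05-12"},
]

# Hypothetical second source used for the consistency check (id -> email).
reference = {1: "ana@example.com", 2: "bo@example.com"}

# Deliberately loose pattern; real email validation rules are an assumption.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Fraction of rows where `field` is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def is_valid_date(text):
    """True if `text` parses as an ISO date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(text)
        return True
    except (TypeError, ValueError):
        return False


def validity(rows):
    """Fraction of rows whose email and signup date pass format checks."""
    ok = sum(
        1
        for r in rows
        if r.get("email") is not None
        and EMAIL_RE.match(r["email"])
        and is_valid_date(r.get("signup"))
    )
    return ok / len(rows)


def uniqueness(rows, key):
    """Fraction of rows that carry a distinct `key` value."""
    return len({r[key] for r in rows}) / len(rows)


def consistency(rows, lookup, key, field):
    """Fraction of rows found in `lookup` whose `field` matches it."""
    shared = [r for r in rows if r[key] in lookup]
    if not shared:
        return 1.0  # nothing to compare, so nothing disagrees
    matches = sum(1 for r in shared if r[field] == lookup[r[key]])
    return matches / len(shared)


print(completeness(records, "email"))  # 0.75
print(validity(records))               # 0.5
print(uniqueness(records, "id"))       # 0.75
print(consistency(records, reference, "id", "email"))
```

Each function returns a score between 0 and 1, so the results can be tracked over time or compared against a threshold. Which threshold counts as acceptable depends on the project.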

Attribution:

Sarfin, R. L., & Editor, P. (2021, May 12). Data Quality Dimensions: How do you measure up? (+ free scorecard). Precisely. Retrieved from https://www.precisely.com/blog/data-quality/data-quality-dimensions-measure