Computer Science

Data Quality

Data quality refers to the accuracy, completeness, consistency, and reliability of data. In computer science, it is crucial for ensuring that data is suitable for its intended use, analysis, and decision-making. High data quality is essential for effective data-driven applications and systems.

Written by Perlego with AI-assistance

5 Key excerpts on "Data Quality"

  • Information Quality

    The Potential of Data and Analytics to Generate Knowledge

    • Ron S. Kenett, Galit Shmueli (Authors)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    …data preprocessing. They can be grouped into “source quality” and “data quality” criteria (Kaynak and Herbig, 2014). Obviously, source quality affects data quality:
    It is almost impossible to know too much about the data collection process because it can influence the quality of the data in many ways, some of them not obvious.
    Boslaugh (2007, p. 5) further considers availability, completeness, and data format:
    A secondary data set should be examined carefully to confirm that it includes the necessary data, that the data are defined and coded in a manner that allows for the desired analysis, and that the researcher will be allowed to access the data required.
    We again note that the questions and criteria mentioned relate to the data and goal, but not to an analysis method or utility; the InfoQ definition, however, requires all four components.

    3.1.3 Operationalizing “Data Quality” in management information systems

    In the field of management information systems (MIS), data quality is defined as the level of conformance to specifications or standards. Wang et al. (1993) define data quality as “conformance to requirements.” They operationalize this construct by defining quality indicators that are based on objective measures, such as data source, creation time, and collection method, as well as subjective measures, such as the credibility level of the data at hand, as determined by the researcher.
    As mentioned in Chapter 2, Lee et al. (2002) propose a methodology for assessment and benchmarking of the InfoQ of IT systems called AIMQ. They collate 15 dimensions from academic papers in MIS: accessibility, appropriate amount, believability, completeness, concise representation, consistent representation, ease of operation, free of error, interpretability, objectivity, relevancy, reputation, security, timeliness, and understandability.
  • Healthcare Business Intelligence

    A Guide to Empowering Successful Data Reporting and Analytics

    • Laura Madsen (Author)
    • 2012 (Publication Date)
    • Wiley (Publisher)
    Obviously I believe that data quality is important, but I would encourage a pragmatic focus on what is reasonable to accomplish. It's important to know that data quality has to be part of your BI effort before you start. First, we should all understand how data quality is defined. Perhaps the most famous definition is by Joseph Juran, a well-known management consultant and quality advocate: “Data are of high quality if they are fit for their intended uses in operations, decision making and planning.” Another definition outlines the attributes of data quality:

    Definition

    1. Accuracy: The extent to which there are no errors.
    2. Scope: The extent to which the breadth and depth of the data provide sufficient coverage of the event(s) of interest.
    3. Timeliness: The extent to which data is received on time to take suitable actions and decisions.
    4. Recency: The extent to which data is up to date relative to the event(s) of interest.
    (Barua, 2011)
    Simply put, data of high quality in your data warehouse should be error-free (no identification numbers with letters in them), include salient data points (such as diagnosis code for billing), and provide the data within hours or days (not weeks or months) to allow your business users to make decisions.
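The three checks just described can be made concrete. Below is a minimal sketch of a per-record validator; the field names (`patient_id`, `diagnosis_code`, `received_at`) and the seven-day freshness window are illustrative assumptions, not from the book.

```python
# Hedged sketch of the three checks above: error-free identifiers,
# presence of salient fields, and timely arrival. Field names and the
# max_age_days threshold are hypothetical examples.
from datetime import datetime, timedelta

def check_record(record, now=None, max_age_days=7):
    """Return a list of data-quality problems found in one record."""
    now = now or datetime.now()
    problems = []
    # Error-free: identification numbers should contain digits only.
    if not str(record.get("patient_id", "")).isdigit():
        problems.append("patient_id contains non-digit characters")
    # Salient data points: billing needs a diagnosis code.
    if not record.get("diagnosis_code"):
        problems.append("missing diagnosis_code")
    # Timely: data should arrive within days, not weeks or months.
    received = record.get("received_at")
    if received is None or now - received > timedelta(days=max_age_days):
        problems.append("data older than %d days" % max_age_days)
    return problems

record = {"patient_id": "12A45", "diagnosis_code": "E11.9",
          "received_at": datetime.now() - timedelta(days=2)}
print(check_record(record))  # ['patient_id contains non-digit characters']
```

Checks like these are cheap to run at load time, which is one way of making data quality "part of your BI effort before you start" rather than a cleanup project afterwards.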

    Data Quality Implications for Healthcare

    The importance of good data quality cannot be overstated; in a recent study on the value of data quality to organizations, even a 10 percent increase in the quality of data was associated with $2 billion in additional annual revenue for a Fortune 1000 organization (Barua, 2011). That is money directly attributable to the improved usability of the data. So the importance of data quality is undeniable, but that doesn't mean that we drop everything and spend the next two years rebuilding the processes that create bad data. But it does…
  • Information Technology and Data in Healthcare
    Chapter 4

    Data Quality

    DOI: 10.4324/9780429061219-4  

    What Is Data Quality?

    The ISO 9000:2015* definition of data quality would be: “Data quality can be defined as the degree to which a set of characteristics of data fulfills requirements.” Remember that data means facts that are verifiable, so how does this definition align with the idea of quality? Let’s explore the idea of quality first.
    *
    https://en.wikipedia.org/wiki/ISO_9000
    https://en.wikipedia.org/wiki/Data_quality
    There are many attempts to determine the one list of data characteristics that should be used in this definition of quality. The one I have found to work best for healthcare data, and that I am currently using, is from Cai and Zhu. They propose five dimensions of data quality and characteristics for each dimension. The dimensions they describe are availability, usability, reliability, relevance, and presentation quality. For instance, if we look at the dimension of reliability, we see that the proposed characteristics are: accuracy, integrity, consistency, completeness, and auditability. These meet our sniff test (or at least mine) of how reliability might be characterized. If we then had definitions for how to measure each of these characteristics, we could produce an integrated measure of reliability for a specific data set. (See Figure 4.1.)
    L. Cai and Y. Zhu, 2015, The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Science Journal, 14(2): 1–10. doi:10.5334/dsj-2015-002
    Figure 4.1 Data Quality framework.
    I deliberately chose reliability as this first example because each of the characteristics proposed can be measured (with the possible exception of integrity); that is, each can produce facts as a consequence of examination. Not all of Cai and Zhu’s dimensions are so clean in this manner, but I’ll take that up shortly. The question now becomes, “How are we to produce and evaluate these facts?” Let’s start with reliability…
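To illustrate how such characteristics can "produce facts", here is a sketch that turns three of the reliability characteristics (completeness, accuracy, consistency) into numbers for a toy tabular data set. The column names and validity rules are invented for illustration; they are not part of Cai and Zhu's framework.

```python
# Hedged sketch: measuring three reliability characteristics on a toy
# data set. Columns and validity rules are illustrative assumptions.

rows = [
    {"id": "001", "age": 42,   "state": "NY"},
    {"id": "002", "age": None, "state": "NY"},   # missing age
    {"id": "003", "age": 230,  "state": "ny"},   # implausible age, odd casing
]

def completeness(rows, field):
    """Fraction of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def accuracy(rows, field, is_valid):
    """Fraction of populated values that pass a validity rule."""
    vals = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in vals) / len(vals)

def consistency(rows, field, canonical):
    """Fraction of values that follow one agreed representation."""
    return sum(canonical(r[field]) == r[field] for r in rows) / len(rows)

print(completeness(rows, "age"))                       # 2/3
print(accuracy(rows, "age", lambda a: 0 <= a <= 120))  # 1/2
print(consistency(rows, "state", str.upper))           # 2/3
```

Each function returns a verifiable fact about the data set; an integrated reliability measure could then combine such scores for a specific data set.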
  • Measuring Data Quality for Ongoing Improvement

    A Data Quality Assessment Framework

    • Laura Sebastian-Coleman (Author)
    • 2012 (Publication Date)
    • Morgan Kaufmann (Publisher)
    Data quality standards are assertions about the expected condition of the data that relate directly to quality dimensions: how complete the data is, how well it conforms to defined rules for validity, integrity, and consistency, as well as how it adheres to defined expectations for presentation. In other words, standards often pertain directly to the conventions of representation that we expect data to follow.
    General data standards and data quality standards address common situations. They are put in place largely to ensure a consistent approach to common problems (e.g., defaulting, use of the NULL or empty value in a data field, criteria for establishing data types, naming conventions, processes for maintaining historical data). Standards for a given system must be cohesive; they should not contradict each other. They should also be enforceable and measurable, and they should be assessed using similar criteria for measurement. Standards differ from requirements: they generally apply to how requirements are met, rather than being requirements themselves. That said, some requirements may be expressed in terms of standards. For example, a national standard for performance within the health care system is the Healthcare Effectiveness Data and Information Set (HEDIS), developed by the National Committee for Quality Assurance (NCQA).
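The requirement that standards be enforceable and measurable suggests expressing them as executable rules. The sketch below encodes two common conventions (no empty strings standing in for NULL; snake_case column names) as checks; both example standards are chosen for illustration and are not taken from the book.

```python
# Sketch of standards as enforceable, measurable rules. The two example
# standards are common conventions chosen for illustration.
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def violates_null_standard(value):
    """Standard: missing data is represented as None, never as ''."""
    return value == ""

def violates_naming_standard(column_name):
    """Standard: column names use snake_case."""
    return SNAKE_CASE.match(column_name) is None

columns = ["member_id", "HedisMeasure", "visit_date"]
bad_names = [c for c in columns if violates_naming_standard(c)]
print(bad_names)  # ['HedisMeasure']
```

Because each rule returns a yes/no answer per value or column, conformance can be counted and reported, which is exactly what makes a standard measurable rather than aspirational.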
  • The Practitioner's Guide to Data Quality Improvement
    Although this enumeration is a basic set of aspects of how data quality can be measured, there are many other aspects that may be specific to an industry (e.g., conformance to industry data standards), a corporation (associated with internal information policies), or even a line of business. Not-for-profit organizations may have different constraints and different productivity measures. Government agencies may have different kinds of collaboration and reporting oversight. No matter what, though, the determination of the critical aspects of data quality should be customized to the organization, since it will feed into the metrics and protocols for assessing and monitoring key data quality performance factors.

    8.8 Summary

    The basis for assessing the quality of data is to create a framework that can be used for asserting expectations, providing a means for quantification, establishing performance objectives, and applying the oversight process to ensure that the participants conform to the policies. This framework is based on dimensions of data quality – those discussed in this chapter, along with any others that are specifically relevant within your industry, organization, or even just between the IT department and its business clients. These measures complete the description of the processes that can be used to collect measurements and report them to management (as introduced in Chapter 7). Before any significant improvement will be manifested across the enterprise, though, the individuals in the organization must understand the virtues of performance-oriented data quality management and be prepared to make the changes needed for quality management.
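The framework described in the summary (dimensions, quantification, performance objectives, oversight) can be sketched as a weighted scorecard. All dimension names, weights, and targets below are invented examples; a real deployment would customize them to the organization, as the text advises.

```python
# Minimal sketch of a dimension-based assessment framework: per-dimension
# scores in [0, 1] are weighted into one index and compared against
# performance objectives. All numbers here are invented examples.

scores  = {"completeness": 0.96, "validity": 0.88, "consistency": 0.91}
weights = {"completeness": 0.5,  "validity": 0.3,  "consistency": 0.2}
targets = {"completeness": 0.95, "validity": 0.90, "consistency": 0.90}

overall = sum(scores[d] * weights[d] for d in scores)
misses  = [d for d in scores if scores[d] < targets[d]]

print(round(overall, 3))  # 0.926
print(misses)             # ['validity']
```

The list of missed targets is what feeds the oversight process: it names the dimensions where the organization is not yet conforming to its own policies.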