On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

Le, Van Tran Bao

doi:10.26686/wgtn.17007619

thesis_access.pdf (2.98 MB)

On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

thesis

posted on 2021-11-14, 05:56 authored by Le, Van Tran Bao

A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Therefore, Armstrong databases are concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd’s Turing Award winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings. This research establishes a toolbox of Armstrong databases for SQL data. This is challenging as SQL data can contain null marker occurrences in columns declared NULL, and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the no information interpretation used in SQL, and the exists but unknown interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data. As a first contribution a provably-correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic and logical characterizations of the associated implication problem that are also established in this thesis. While the problem to decide whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database. As a second contribution the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints, if the users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before the inspection of an Armstrong database. As a final contribution, measures are introduced that formalize the term “useful” and it is shown by some detailed experiments that Armstrong tables, as computed by the tool, are indeed useful. In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.

History

Copyright Date

2014-01-01

Date of Award

2014-01-01

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Information Systems

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Doctoral

Degree Name

Doctor of Philosophy

ANZSRC Type Of Activity code

970108 Expanding Knowledge in the Information and Computing Sciences

Victoria University of Wellington Item Type

Awarded Doctoral Thesis

Language

en_NZ

Victoria University of Wellington School

School of Information Management

Advisors

Link, Sebastian; Ferrarotti, Flavio

Usage metrics

Keywords

NULL constraint Armstrong table SQL constraint School: School of Information Management 080604 Database Management 080611 Information Systems Theory 970108 Expanding Knowledge in the Information and Computing Sciences Degree Discipline: Information Systems Degree Level: Doctoral Degree Name: Doctor of Philosophy Database Management Information Systems Theory

Licence

Author Retains Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

History

Copyright Date

Date of Award

Publisher

Rights License

Degree Discipline

Degree Grantor

Degree Level

Degree Name

ANZSRC Type Of Activity code

Victoria University of Wellington Item Type

Language

Victoria University of Wellington School

Advisors

Usage metrics

Categories

Keywords

Licence

Exports