A functional dependency in which one or more nonkey attributes are functionally dependent on part

Data Profiling


David Loshin, in Business Intelligence (Second Edition), 2013

Functional Dependency

A functional dependency between two columns, X and Y, means that for any two records R1 and R2 in the table, if field X of record R1 contains value x and field X of record R2 contains the same value x, then if field Y of record R1 contains the value y, then field Y of record R2 must contain the value y. We can say that attribute Y is determined by attribute X. Functional dependencies may exist between multiple source columns. In other words, we can indicate that one set of attributes determines a target set of attributes.

A functional dependency establishes a relationship between two sets of attributes. If the relationship is causal (i.e., the dependent attribute’s value is filled in as a function of the defining attributes), that is an interesting piece of business knowledge that can be added to the growing knowledge base. A simple example is a “total_amount_charged” field that is computed by multiplying the “qty_ordered” field by the “price” field.
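As a minimal sketch of how a profiling tool might test for such a dependency, the function below checks whether every pair of records that agrees on the determining columns also agrees on the dependent columns. The function name `fd_holds` and the sample rows are illustrative assumptions; only the three column names come from the example above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on the `lhs` columns also agree on `rhs`."""
    seen = {}  # maps each lhs value combination to the first rhs combination seen
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault stores y on first sight of x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data for illustration only.
orders = [
    {"qty_ordered": 2, "price": 5.0, "total_amount_charged": 10.0},
    {"qty_ordered": 2, "price": 5.0, "total_amount_charged": 10.0},
    {"qty_ordered": 3, "price": 5.0, "total_amount_charged": 15.0},
]

print(fd_holds(orders, ["qty_ordered", "price"], ["total_amount_charged"]))  # True
print(fd_holds(orders, ["price"], ["total_amount_charged"]))                 # False
```

A check like this can only refute a dependency on the data at hand; a dependency that holds in one sample may still be violated by later records.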

If the relationship is not causal, then that piece of knowledge can be used to infer information about normalization of the data. If a pair of data attribute values is consistently bound together, then those two columns can be extracted from the targeted table and the instance pairs inserted uniquely into a new table and assigned a reference identifier. The dependent attribute pairs (that had been removed) can then be replaced by a reference to the newly created corresponding table entry.
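The extraction step described above can be sketched as follows. This is one possible approach, not a prescribed method; the function name `extract_lookup`, the column names, and the sample rows are all hypothetical.

```python
def extract_lookup(rows, col_a, col_b, ref_col):
    """Move the (col_a, col_b) pairs into a new table and leave a reference behind."""
    ids = {}       # maps each distinct pair to its assigned reference identifier
    slimmed = []   # the original rows with the pair replaced by the reference
    for row in rows:
        pair = (row[col_a], row[col_b])
        ref = ids.setdefault(pair, len(ids) + 1)
        rest = {k: v for k, v in row.items() if k not in (col_a, col_b)}
        rest[ref_col] = ref
        slimmed.append(rest)
    new_table = [{ref_col: ref, col_a: a, col_b: b} for (a, b), ref in ids.items()]
    return slimmed, new_table


# Hypothetical sample data for illustration only.
customers = [
    {"name": "Ann", "city": "Springfield", "state": "IL"},
    {"name": "Bob", "city": "Springfield", "state": "IL"},
    {"name": "Cy", "city": "Salem", "state": "OR"},
]

slimmed, places = extract_lookup(customers, "city", "state", "place_id")
print(slimmed)
print(places)
```

Each distinct pair is stored once in the new table, so a correction to a pair becomes a single-row change.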

URL: https://www.sciencedirect.com/science/article/pii/B9780123858894000107

Normalization

Jan L. Harrington, in Relational Database Design and Implementation (Fourth Edition), 2016

Understanding Functional Dependencies

A functional dependency is a one-way relationship between two attributes, such that at any given time, for each unique value of attribute A, only one value of attribute B is associated with it throughout the relation. For example, assume that A is the customer number from the orders relation. Each customer number is associated with one customer first name, one last name, one street address, one city, one state, one zip code, and one phone number. Although the values of those attributes may change, at any given moment each customer number is associated with only one value of each.

We can therefore say that first name, last name, street, city, state, zip, and phone are functionally dependent upon the customer number. This relationship is often written:

customer number → first name, last name, street, city, state, zip, phone

and read “customer number determines first name, last name, street, city, state, zip, and phone.” In this relationship, customer number is known as the determinant (an attribute that determines the value of other attributes).

Notice that the functional dependency does not necessarily hold in the reverse direction. For example, any given first or last name may be associated with more than one customer number. (It would be unusual to have a customer table of any size without some duplication of names.)

The functional dependencies in the orders table are:

Notice that there is one determinant for each entity in the relation and the determinant is what we have chosen as the entity identifier. Notice also that, when an entity has a concatenated identifier, the determinant is also concatenated. In this example, whether an item has shipped depends on the combination of the item and the order.

URL: https://www.sciencedirect.com/science/article/pii/B9780128043998000077

Algebras

Robert Laurini, Derek Thompson, in Fundamentals of Spatial Information Systems, 1992

13.3.2 Functional dependencies

In order to avoid the kind of anomalies resulting from inconsistent updating, some constraints between attributes must be defined. This topic is approached by looking at some special linkages among attributes, and the associated normalization procedures.

A functional dependency is a constraint between two sets of attributes in the database. Suppose our relational database schema has N attributes {A1, A2, …, AN}. Instead of thinking about the set of tables which undoubtedly exist for the database, let us think for the moment that the whole database is described by a single universal relation schema R = {A1, A2, …, AN} encompassing all attributes concerned. This concept does not mean that we are attempting to store the database as a single relation; it is necessary only to explain the idea of functional dependencies.

A functional dependency, denoted X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation instance r of R. This constraint states that for any two tuples t1 and t2 in r, such that t1(X) = t2(X), we must also have similar parts in Y, that is: t1(Y) = t2(Y). This means that the values of the Y component of a tuple in r depend on the values of the X component, or, alternatively, the values of the X component of a tuple uniquely determine the values of the Y component. For instance, a Social Security Number (SSN) determines the name of a person, or a point longitude and latitude determines a city name:

SSN → Person_name

(LON, LAT) → City_name

Another way to define functional dependencies is to say that in a relation R, X functionally determines Y if and only if, whenever two tuples of r(R) agree on their X-value, they necessarily agree on their Y-value. For example, if two tuples with social security number 123-45-6789 carry the same name, then the dependency of the name (Y) on the number (X) is satisfied. If the names differ, the condition is not met, and we might suspect bad data, because each number is supposed to correspond to exactly one name. The dependency does not hold in the reverse direction: identical names do not imply the same person, so two tuples with the same name may have different social security numbers. Moreover, if R states that there cannot be more than one tuple with a given X-value in any instance r(R), X is called a candidate key for R.

URL: https://www.sciencedirect.com/science/article/pii/B9780080924205500189

Relational Database Systems

Catherine M. Ricardo, in Encyclopedia of Information Systems, 2003

VI.G. Multivalued Dependencies and Fourth Normal Form

Although BCNF eliminates anomalies due to functional dependencies, Fagin, Zaniolo, and Delobel independently identified another type of dependency that can cause problems due to repetition of data. Multivalued dependency may arise out of the process of normalization to achieve first normal form. Recall that first normal form forbids multiple values in a cell of a table. One solution is to repeat the rest of the data along with each of the values of the cell, making the multivalued attribute part of the key. If we have two attributes that are, by nature, multivalued, we must create tuples for every combination of values of one with values of the second. A multivalued dependency exists when there are three attributes A, B, and C in a relation R such that for each value of A the set of B values associated with the A value is independent of the set of C values associated with the A value. We say A multidetermines B and A multidetermines C. By definition, multivalued dependencies occur in pairs. We write

A →→ B and A →→ C

A trivial multivalued dependency A →→ B is one where either B is a subset of A, or A and B together make up all the attributes of R. A relation is in fourth normal form if it is in BCNF and has no nontrivial multivalued dependencies. For example, consider the relation

We will assume an employee can have several skills and several dependents. An unnormalized instance of this relation is shown in Figure 19. Since this is not a valid 1NF table, we need to normalize it by removing the repeating values from the skill and dependents cells. When we “flatten” the table in this way, we have to repeat all combinations of skill and dependent name for each employee, to avoid the appearance of a relationship between the skill and the dependent name. The resulting table instance is shown in Figure 20. In this table, all three attributes form the key, and we have

Figure 19. The Unnormalized Emp Table.

Figure 20. The Emp Table in First Normal Form.

The table is not in 4NF. To make it 4NF, we decompose it into two tables, as shown in Figure 21:

Figure 21. The Emp Database in 4NF.

Each of these tables is in 4NF.
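The flattening and decomposition steps can be sketched in a few lines. The figures are not reproduced here, so the attribute layout (employee, skill, dependent) and every data value below are hypothetical stand-ins chosen only to illustrate the idea.

```python
from itertools import product

# Hypothetical unnormalized data: each employee has a list of skills
# and a list of dependents that are independent of each other.
unnormalized = {
    "E1": {"skills": ["SQL", "Java"], "dependents": ["Kim", "Lee"]},
    "E2": {"skills": ["COBOL"], "dependents": ["Pat"]},
}


def flatten(data):
    """Build the 1NF relation: every skill paired with every dependent."""
    return {
        (emp, skill, dep)
        for emp, rec in data.items()
        for skill, dep in product(rec["skills"], rec["dependents"])
    }


def project(relation, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in positions) for t in relation}


emp_1nf = flatten(unnormalized)
emp_skill = project(emp_1nf, (0, 1))  # (employee, skill)
emp_dep = project(emp_1nf, (0, 2))    # (employee, dependent)

# The natural join on employee rebuilds the flattened table exactly.
rejoined = {(e, s, d) for e, s in emp_skill for e2, d in emp_dep if e == e2}

print(len(emp_1nf), len(emp_skill), len(emp_dep))  # 5 3 3
print(rejoined == emp_1nf)                         # True
```

Because the join reproduces the flattened relation, the two smaller tables carry the same information without repeating every skill for every dependent.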

URL: https://www.sciencedirect.com/science/article/pii/B0122272404001477

Normalization

Jan L. Harrington, in Relational Database Design (Third Edition), 2009

Transitive Dependencies

A transitive dependency exists when you have the following functional dependency pattern:

A→B and B→C; therefore A→C

This is precisely the case with the original items relation. The only reason that the warehouse phone number is functionally dependent on the item number is because the distributor is functionally dependent on the item number and the phone number is functionally dependent on the distributor. The functional dependencies are really:

Item_numb −> distrib_numb

Distrib_numb −> warehouse_phone_number

Note: Transitive dependencies take their name from the transitive property in mathematics, which states that if a > b and b > c, then a > c.

There are two determinants in the original items relation, each of which should be the primary key of its own relation. However, it is not merely the presence of the second determinant that creates the transitive dependency. What really matters is that the second determinant is not a candidate key for the relation.

Consider for example, this relation:

Item (item_numb, UPC, distrib_numb, price)

The item number is an arbitrary number that Antique Opticals assigns to each merchandise item. The UPC is an industry-wide code that is unique to each item as well. The functional dependencies in this relation are:

Item_numb −> UPC, distrib_numb, price

UPC −> item_numb, distrib_numb, price

Is there a transitive dependency here? No, because the second determinant is a candidate key. (Antique Opticals could have just as easily used the UPC as the primary key.) There are no insertion, deletion, or modification anomalies in this relation; it describes only one entity: the merchandise item.

A transitive dependency therefore exists only when the determinant that is not the primary key is not a candidate key for the relation. In the items table we have been using, for example, the distributor is a determinant but not a candidate key for the table. (There can be more than one item coming from a single distributor.)

When you have a transitive dependency in a 2NF relation, you should break the relation into two smaller relations, each of which has one of the determinants in the transitive dependency as its primary key. The attributes determined by the determinant become non-key attributes in each relation. This removes the transitive dependency—and its associated anomalies—and places the relation in third normal form.

Note: A second normal form relation that has no transitive dependencies is, of course, automatically in third normal form.
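The test described in this section, whether a determinant is a candidate key, can be sketched with an attribute closure computation. This is an illustrative sketch rather than the author's procedure: the helper names `closure` and `non_key_determinants` are assumptions, and the check treats any determinant that fails to determine every attribute as a non-key determinant, which matches the transitive case only for relations already in second normal form.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def non_key_determinants(all_attrs, fds):
    """List determinants that do not determine every attribute of the relation."""
    return [lhs for lhs, _ in fds if closure(lhs, fds) != set(all_attrs)]


# The original items relation, reduced to the attributes named in the text.
items_attrs = {"item_numb", "distrib_numb", "warehouse_phone_number"}
items_fds = [
    (("item_numb",), ("distrib_numb",)),
    (("distrib_numb",), ("warehouse_phone_number",)),
]

# The Item relation in which UPC is a second candidate key.
item_attrs = {"item_numb", "UPC", "distrib_numb", "price"}
item_fds = [
    (("item_numb",), ("UPC", "distrib_numb", "price")),
    (("UPC",), ("item_numb", "distrib_numb", "price")),
]

print(sorted(closure(("item_numb",), items_fds)))
print(non_key_determinants(items_attrs, items_fds))  # [('distrib_numb',)]
print(non_key_determinants(item_attrs, item_fds))    # []
```

The first relation reports the distributor number as a determinant that is not a candidate key, while the relation with the UPC reports none, in line with the discussion above.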

URL: https://www.sciencedirect.com/science/article/pii/B9780123747303000061

Normalization

Joe Celko, in Joe Celko's SQL for Smarties (Fifth Edition), 2015

5.6 Boyce-Codd Normal Form (BCNF)

A table is in BCNF when for all nontrivial FDs (X → A), X is a superkey for the whole schema. A superkey is a unique set of columns that identify each row in a table, but you can remove some columns from it and it will still be a key. Informally, a superkey is carrying extra weight.

BCNF is the normal form that actually removes all transitive dependencies. A table is in BCNF if for all (X → Y), X is a key, period. We can go to this normal form just by adding another key with a UNIQUE (room_nbr, time_period) constraint clause to the table Classes.

There are some other interesting and useful “higher” normal forms, but they are outside of the scope of this discussion. In our example, we have removed all of the important anomalies with BCNF.

Third Normal Form was concerned with the relationship between key and nonkey columns. However, a column can often play both roles. Consider a table for computing each salesman’s bonus gifts that has for each salesman his base salary, the number of gift_points he has won in a contest, and the bonus gift awarded for that combination of salary range and gift_points. For example, we might give a fountain pen to a beginning salesman with a base pay rate between $15,000.00 and $20,000.00 and 100 gift_points, but give a car to a master salesman, whose salary is between $30,000.00 and $60,000.00 and who has 200 gift_points. The functional dependencies are, therefore,

(pay_step, gift_points) → gift_name

gift_name → gift_points

Let’s start with a table that has all the data in it and normalize it.

Gifts

salary_amt  gift_points  gift_name
==================================
15000.00    100          'Pencil'
17000.00    100          'Pen'
30000.00    200          'Car'
31000.00    200          'Car'
32000.00    200          'Car'

CREATE TABLE Gifts
(salary_amt DECIMAL(8,2) NOT NULL,
 gift_points INTEGER NOT NULL,
 PRIMARY KEY (salary_amt, gift_points),
 gift_name VARCHAR(10) NOT NULL);

This schema is in 3NF, but it has problems. You cannot insert a new gift into our offerings and points unless we have a salary to go with it. If you remove any sales points, you lose information about the gifts and salaries (e.g., only people in the $30,000.00 to $32,000.00 range can win a car). And, finally, a change in the gifts for a particular point score would have to affect all the rows within the same pay step. This table needs to be broken apart into two tables:

PayGifts

salary_amt  gift_name
=====================
15000.00    'Pencil'
17000.00    'Pen'
30000.00    'Car'
31000.00    'Car'
32000.00    'Car'

CREATE TABLE PayGifts
(salary_amt DECIMAL(8,2) NOT NULL PRIMARY KEY,
 gift_name VARCHAR(10) NOT NULL);

GiftsPoints

gift_name  gift_points
======================
'Pencil'   100
'Pen'      100
'Car'      200

(salary_amt, gift_points) → gift_name

gift_name → gift_points

CREATE TABLE GiftsPoints
(gift_name VARCHAR(10) NOT NULL PRIMARY KEY,
 gift_points INTEGER NOT NULL);
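As a rough check that the two-table design loses nothing, the sketch below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and joins them back together. The PayGifts definition used here, with salary_amt as its primary key and a reference to GiftsPoints, is an assumption inferred from the PayGifts sample rows, and SQLite is only a convenient stand-in for whichever SQL product is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GiftsPoints
(gift_name VARCHAR(10) NOT NULL PRIMARY KEY,
 gift_points INTEGER NOT NULL);

-- Assumed definition, inferred from the PayGifts sample rows.
CREATE TABLE PayGifts
(salary_amt DECIMAL(8,2) NOT NULL PRIMARY KEY,
 gift_name VARCHAR(10) NOT NULL REFERENCES GiftsPoints (gift_name));
""")

conn.executemany(
    "INSERT INTO GiftsPoints VALUES (?, ?)",
    [("Pencil", 100), ("Pen", 100), ("Car", 200)],
)
conn.executemany(
    "INSERT INTO PayGifts VALUES (?, ?)",
    [(15000.00, "Pencil"), (17000.00, "Pen"), (30000.00, "Car"),
     (31000.00, "Car"), (32000.00, "Car")],
)

# Joining on gift_name rebuilds the rows of the original Gifts table.
rows = conn.execute("""
SELECT P.salary_amt, G.gift_points, P.gift_name
  FROM PayGifts AS P
  JOIN GiftsPoints AS G ON G.gift_name = P.gift_name
 ORDER BY P.salary_amt
""").fetchall()

for row in rows:
    print(row)
```

With this layout, changing the points required for a gift touches one row in GiftsPoints instead of every matching row in the original table.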

URL: https://www.sciencedirect.com/science/article/pii/B978012800761700005X

Temporal Databases

Jan Chomicki, David Toman, in Foundations of Artificial Intelligence, 2005

14.4.2 Constraint-generating dependencies

If we consider the first-order formulation of temporal functional dependencies in timestamp databases, we notice that the formulas obtained in this way contain equalities between temporal variables. It is natural to consider a generalization of such dependencies that allows not only equalities but also arbitrary constraints over the given temporal domain. Then we can formulate integrity constraints like “the transaction time of a given tuple should always be greater than or equal to the valid time of this tuple.” Note that the constraints over the temporal domain are not used here to represent infinite sets (as in constraint databases [Kanellakis et al., 1995]) but rather to obtain a more expressive language of integrity constraints.

This idea was first formulated in [Ginsburg and Hull, 1983; Ginsburg and Hull, 1986] and then formalized in [Baudinet et al., 1999] using the notion of a constraint-generating dependency (CGD). Baudinet et al. [Baudinet et al., 1999] described a general reduction of the implication problem for such dependencies to the problem of validity of universal formulas in the appropriate constraint theory. Complexity results for restricted classes of CGDs were also given. A similar idea was studied in the temporal database context in [Wijsen, 1998].

URL: https://www.sciencedirect.com/science/article/pii/S1574652605800161

Normalization

Joe Celko, in Joe Celko's SQL for Smarties (Fourth Edition), 2011

9.1 Functional and Multivalued Dependencies

A normal form is a way of classifying a table based on the functional dependencies (FDs for short) in it. A functional dependency means that if I know the value of one attribute, I can always determine the value of another. The notation used in relational theory is an arrow between the two attributes, for example A → B, which can be read in English as “A determines B.” If I know your employee number, I can determine your name; if I know a part number, I can determine the weight and color of the part; and so forth.

A multivalued dependency (MVD) means that if I know the value of one attribute, I can always determine a set of values of another attribute. The notation used in relational theory is a double-headed arrow between the two attributes, for instance A →→ B, which can be read in English as “A determines many Bs.” If I know a teacher's name, I can determine a list of her students; if I know a part number, I can determine the part numbers of its components; and so forth.

URL: https://www.sciencedirect.com/science/article/pii/B9780123820228000090

Dimensions

Lilian Hobbs, ... Pete Smith, in Oracle 10g Data Warehousing, 2005

8.2.3 Defining a Dimension with Attributes

In a dimension definition, the ATTRIBUTE clause is used to define any functional dependencies between columns within the same table that are not hierarchical in nature.

In the EASYDW schema, in the PRODUCT table we have two columns, PRODUCT_ID and PRODUCT_NAME, such that given a PRODUCT_ID, there is only one PRODUCT_NAME. The following example shows the definition of a dimension with the attribute clause, representing this relationship. Note, however, that this relationship is true only in one direction (i.e., it does not mean that given the PRODUCT_NAME we can determine the PRODUCT_ID).

In the ATTRIBUTE clause, the name on the left side of the DETERMINES keyword should be a level name—for example, PRODUCT_ID. To the right of the DETERMINES keyword are the dependent columns—for example, PRODUCT_NAME. Note that you can either specify multiple dependent columns within the same attribute clause or specify different attribute clauses for each one—both ways convey equivalent semantics.

Note that you can also specify a name for the attribute relationship; however, this is optional. To do this, you need to use the extended clause with the LEVEL keyword. For example, in the preceding example, the relationship between PRODUCT_ID and MANUFACTURER is given a name, PROD_MANUFACTURER.

URL: https://www.sciencedirect.com/science/article/pii/B9781555583224500102

Normalization

Toby Teorey, ... H.V. Jagadish, in Database Modeling and Design (Fifth Edition), 2011

The Design of Normalized Tables: A Simple Example

The example in this section is based on the ER diagram in Figure 6.4 and the following FDs. In general, FDs can be given explicitly, derived from the ER diagram, or derived from intuition—that is, from experience with the problem domain.

Figure 6.4. ER diagram for employee database.

1. emp_id, start_date -> job_title, end_date

2. emp_id -> emp_name, phone_no, office_no, proj_no, proj_name, dept_no

3. phone_no -> office_no

4. proj_no -> proj_name, proj_start_date, proj_end_date

5. dept_no -> dept_name, mgr_id

6. mgr_id -> dept_no

Our objective is to design a relational database schema that is normalized to at least 3NF and, if possible, minimize the number of tables required. Our approach is to apply the definition of 3NF given previously to the FDs given above, and create tables that satisfy the definition.

If we try to put FDs 1–6 into a single table with the composite candidate key (and primary key) (emp_id, start_date) we violate the 3NF definition, because FDs 2–6 involve left sides of FDs that are not superkeys. Consequently, we need to separate FD 1 from the rest of the FDs. If we then try to combine 2–6 we have many transitivities. Intuitively, we know that 2, 3, 4, and 5 must be separated into different tables because of transitive dependencies. We then must decide whether 5 and 6 can be combined without loss of 3NF; this can be done because mgr_id and dept_no are mutually dependent and both attributes are superkeys in a combined table. Thus, we can define the following tables by appropriate projections from 1–6.

emp_hist: emp_id, start_date -> job_title, end_date

employee: emp_id -> emp_name, phone_no, proj_no, dept_no

phone: phone_no -> office_no

project: proj_no -> proj_name, proj_start_date, proj_end_date

department: dept_no -> dept_name, mgr_id

mgr_id -> dept_no

This solution, which is BCNF as well as 3NF, maintains all the original FDs. It is also a minimum set of normalized tables. In the “Determining the Minimum Set of 3NF Tables” section, we will look at a formal method of determining a minimum set that we can apply to much more complex situations.
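The claim that the decomposition maintains all the original FDs can be checked mechanically. The sketch below is one way to do it, not the formal method the chapter refers to: it takes the FDs attached to the five tables and tests whether each original FD follows from them by attribute closure. The helper names `closure` and `fds_preserved` are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fds_preserved(original, decomposed):
    """True if every original FD can be derived from the decomposed FDs."""
    return all(set(rhs) <= closure(lhs, decomposed) for lhs, rhs in original)


original_fds = [
    (("emp_id", "start_date"), ("job_title", "end_date")),
    (("emp_id",), ("emp_name", "phone_no", "office_no",
                   "proj_no", "proj_name", "dept_no")),
    (("phone_no",), ("office_no",)),
    (("proj_no",), ("proj_name", "proj_start_date", "proj_end_date")),
    (("dept_no",), ("dept_name", "mgr_id")),
    (("mgr_id",), ("dept_no",)),
]

table_fds = {
    "emp_hist": [(("emp_id", "start_date"), ("job_title", "end_date"))],
    "employee": [(("emp_id",), ("emp_name", "phone_no", "proj_no", "dept_no"))],
    "phone": [(("phone_no",), ("office_no",))],
    "project": [(("proj_no",), ("proj_name", "proj_start_date", "proj_end_date"))],
    "department": [(("dept_no",), ("dept_name", "mgr_id")),
                   (("mgr_id",), ("dept_no",))],
}

all_fds = [fd for fds in table_fds.values() for fd in fds]
without_phone = [fd for name, fds in table_fds.items()
                 if name != "phone" for fd in fds]

print(fds_preserved(original_fds, all_fds))        # True
print(fds_preserved(original_fds, without_phone))  # False
```

The second call shows why the phone table cannot be dropped: without it, office_no can no longer be derived from phone_no or from emp_id.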

Alternative designs may involve splitting tables into partitions for volatile (frequently updated) and passive (rarely updated) data, consolidating tables to get better query performance, or duplicating data in different tables to get better query performance without losing integrity. In summary, the measures we use to assess the trade-offs in our design are:

• Query performance (time).
• Update performance (time).
• Storage performance (space).
• Integrity (avoidance of delete anomalies).

URL: https://www.sciencedirect.com/science/article/pii/B9780123820204000100

Is a functional dependency in which one or more Nonkey attributes are functionally dependent on part but not all of the primary key?

A partial functional dependency is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key. A transitive dependency is a functional dependency between two or more nonkey attributes.
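A partial dependency can be spotted from a list of functional dependencies by looking for determinants that are proper subsets of the primary key. The sketch below illustrates this; the function name `partial_dependencies` and the sample attributes (including `quantity` and `title`) are hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose determinant is only part of the primary key."""
    pk = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        determinant = frozenset(lhs)
        nonkey = sorted(set(rhs) - pk)  # dependent attributes outside the key
        # `<` on frozensets tests for a proper subset
        if determinant < pk and nonkey:
            found.append((tuple(sorted(determinant)), tuple(nonkey)))
    return found


# Hypothetical relation with a concatenated primary key.
pk = ("order_numb", "item_numb")
fds = [
    (("order_numb", "item_numb"), ("quantity", "shipped")),
    (("item_numb",), ("title",)),  # depends on only part of the key
]

partial = partial_dependencies(pk, fds)
print(partial)  # [(('item_numb',), ('title',))]
```

A relation with a single-attribute primary key can never produce a result here, because a one-attribute key has no nonempty proper subset to act as a determinant.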

Is a functional dependency in which one or more nonkey attributes are functionally dependent on part of the primary key?

A transitive dependency is a functional dependency between the primary key and one or more nonkey attributes that are dependent on the primary key via another nonkey attribute. A recursive foreign key is a foreign key in a relation that references the primary key values of the same relation.

What is a functional dependency between non

A functional dependency (FD) is a relationship between two attributes, typically between the PK and other non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK), if for every valid instance of X, that value of X uniquely determines the value of Y.