Session E-CASE

Database Maintenance
and Integrity

Elie Muyal
RESolution Ltd.


Introduction

In the past, the issue of database design in the PC environment has been very much neglected by both the developers of database applications and by the providers of development tools. There are many reasons for this. Due to the hardware limitations in processing power as well as disk space, developers developed database applications which were necessarily small and simple for use by a single user. Well aware of the hardware limitations, customers were neither demanding nor sophisticated. They were prepared to computerize only a small segment of their business without going into every detail and without really knowing for what to ask. As they could not have an ambitious information system containing 100% of all relevant information, they settled for very small databases containing few tables.

Because the databases were small and simple, providers of PC development tools focused on the programming language, compilers, linkers, and tools to enhance the user interface rather than on providing development tools for the design and maintenance of the database. As a result, databases were not only not given the proper attention by providers of development tools, but databases were not given the proper importance by developers who understood that their task was to develop applications for their clients, disregarding the fact that data is the raison d’être of their applications.

Background

Although the importance of good database design was recognized by developers of databases in other platforms, such as mini-systems and mainframes, there was no communication between developers of large system environments and of those in the PC world. Because the existing data modeling tools primarily addressed the large system databases and because their costs were so high, PC developers that may have been aware of the importance of good database design could not justify the high cost of purchasing data modeling tools.

Until only recently, there has been a general lack of awareness of the importance of good database design, and of the methodology and terminology of database design in the PC world, despite the very rapid and dramatic changes in the industry which have resulted in an "explosion" of data. In terms of speed and space, hardware has become more powerful and affordable. The mushrooming of networks have made departmental applications for Multi-User use not only possible, but essential. Providers have begun to develop more automated tools in order to increase productivity. And finally, clients have become more sophisticated and more demanding, well aware of the ever increasing power of computerization that can handle in detail more aspects of their businesses. Applications today no longer consist of just a few tables, and you, the developers of database applications, cannot afford to ignore the issue of good database design and maintenance when data is simply no longer easily manageable.

Current Situation

As more books and articles are being written on the subject and as more affordable data modeling tools are appearing on the market, data modeling is fast becoming a very mature technology. With the advent of Client/Server, the lack of communication between large systems and PC programs has come to an end as these two environments merge.

It is also clear to us now that data and applications are separate, that applications serve data and that the raison d’être of database applications is to manage and maintain data. The clear separation between data and database applications has been promoted by the emergence of Client/Server environments where there is physical separation between the Client, which works with the application, and the Server, which handles the data. The emergence of Object-Oriented concepts, another major revolution in the computer industry, also advocates a clear distinction between the data and the methods for manipulating the data.

Database Integrity and Maintenance

We have always known that information is the most important asset of a business. In order to reflect the business as accurately as possible, developers of database applications must focus on database design and maintenance. Because unreliable data is worse than no data, database integrity and maintenance is crucial.

Database Design: Business Rules and Referential Integrity

A good database design should reflect the real world, translating the customers’ business world, with all of its intricacies, into a set of related tables which must be both exact and exhaustive. However, a database is more than just a set of tables as it must include the implementation of the Business Rules which govern the accuracy and integrity of the data.

There are many types of Business Rules, depending on the nature of the business. For example, in a system which handles sales to customers, we may have the following Business Rules:

If we could classify Business Rules, we could re-use them in other applications, compile and market Libraries of Business Rules, and achieve greater automation and efficiency. Although we are not yet able to automate and standardize the kinds of Business Rules in our example, there is a sub-set of the Business Rules which does allow classification and systemization. This sub-set is the set of rules which governs Referential Integrity.

In a Relational Database, data is organized in tables which are interrelated. The relationships between tables are governed by two basic rules which are the essence of Referential Integrity:

Rule #1: Ensuring Unique Primary Keys

Each record in a table must be uniquely identified.

This means that the Primary key must be unique and therefore, the field or fields composing the Primary key should hold a unique value. This rule is quite straightforward and easily enforced. VFP implements Rule #1 by allowing you to define a Primary Index which ensures that no keys are duplicated in your table.

Rule #2: Ensuring the Validity of Foreign Keys

In each Child table, the Foreign key of the related table must point to the correct Parent table.

For example, if we have a Customer (Parent) table and an Orders (Child) table, there is a Foreign key in the Orders table. In the Customer table, the Primary key consists of a single field, Customer_ID. In the Orders table, we should have a Foreign key which consists of a single field, Orders.Customer_ID. Referential Integrity Rule #2 requires that each Order points to the correct Customer. This rule is more complex and more difficult to enforce.

Let us now analyze in greater depth how and when Referential Integrity Rule #2 should be enforced.

Ensuring the Viability of Foreign Keys (Rule #2)

There are three types of events that occur in a database for a given table: Deleting, Inserting, and Updating.

Delete

When we delete a record in the Customer (Parent) table, we must ask ourselves immediately: what happens to the related records in the Orders (Child) table, if there were any? There are four possible responses:

Ignore

In this case, it will be possible to delete a Customer even when Orders exist for the Customer. As a result, we will violate the Referential Integrity Rule because the Foreign key will point to a non-existent Customer.

Nullify

By nullifying, we change the value of the Foreign key in the related Child tables to a value which indicates that the record does not have a Parent, or to a value of a special record in the Parent table.

Cascade

The change in the value of the Customer will be reflected in the related fields in the Orders table.

Restrict

In this case, you will not be allowed to delete a Customer if he has any Orders.

Whatever action is taken will be the result of a business decision according to the needs and/or policy of the organization, and not as result of any theoretical considerations.

Insert

When a record is inserted in a table, you must verify that the Primary key is unique as dictated by Rule #1. You must then check that each Foreign key in the Child table points to the correct record in the Parent table to ensure that Rule #2 is respected for all of the Parent tables of the new record. When you insert a new Order, you must check that a Customer for this Order exists. Two possible actions can be taken:

Ignore

In this case, you will be allowed to add an Order to a non-existing Customer which will violate Referential Integrity Rule #2.

Restrict

You will not be allowed to insert an Order unless the Foreign key in Order points to an existing Customer.

Update

This is the most complex procedure. When you update a record, you must check if the Primary key was modified and if any of the Foreign keys were modified.

The Primary key was modified

First, you must make sure that if you modify the Primary key of the record, it is unique in accordance with Rule #1. Then you must check whether the Parent record has Children and if so, what action should be taken? For example, by changing Customer_ID, what happens to the Orders of the Customer? There are three possible responses:

Ignore

In this case, you will create Orphans as the Foreign key in the Child record will not point to a Parent. That is, the Orders will refer to a non-existing or incorrect Customer which is in violation of Referential Integrity Rule #2.

Cascade

By cascading, the modification of the Primary key will be carried to the Foreign key in the Child record. If you change the Customer ID, the Customer ID will be changed in the related records in the Order table.

Restrict

If there are children, you will not be allowed to modify the Primary key. For example, you will not be allowed to change Customer ID if the Customer has Orders.

One or more of the Foreign keys was modified

In the second instance, when updating a record, if any Foreign key or keys are modified, there are two possible responses with respect to the enforcement of Referential Integrity Rule #2:

Ignore

You will be allowed to update a record in the Child table that points to a non-existing record in the Parent table which violates Referential Integrity Rule #2.

Restrict

You will not be allowed to update an Order unless the Foreign key points to an existing Customer.

Triggers

Triggers serve to enforce the Business Rules reflected in the Referential Integrity Rules. VFP has implemented a powerful Trigger mechanism which greatly facilitates the implementation of Referential Integrity Rule #2 because the mechanism ensures that the relevant code is fired each time one of the three basic events occurs in the table. Furthermore, VFP provides the capacity to automatically generate the Referential Integrity Code which these Triggers must activate.

VFP Generated Code

The VFP RI Builder generates Referential Integrity Code, but does not allow access to the Builder itself in order to customize the code it will generate. That VFP does not allow you to modify the generation of the Referential Integrity Code is a serious shortcoming for two reasons:

Non-Modifiable Builder

If, for example, you want to implement the Nullify rule in Delete, VFP does not support this option. To do so, you are required to manually the generated code. If you regenerate the code because of changes in the physical data structure or because you want to modify a rule, VFP will regenerate the code completely disregarding the manual modifications made, and you will be required to rewrite all of your modifications. Needless to say, this is a maintenance nightmare.

Incorrect Referential Integrity Code

An even more serious issue is that the code generated by the VFP RI Builder will be correct in all cases ONLY under one condition. That is, all of the Primary keys of the tables must be single fields in order to achieve the desired results, or simply not to generate a bug. By using Surrogate keys, it is possible to ensure that the generated code is correct.

Surrogate Keys

A Surrogate key is a key which by definition requires that it not have "business" meaning and is used to uniquely identify a record in a table. While not required to be built on a single field, in practice it always is. In order for RI code to be correct, it requires that the Primary keys of the tables be built on a single key, but does not require that they have no "business" meaning.

Many "Gurus" advocate the use of Surrogate keys because of the following advantages:

Advantages

Using surrogate keys also has some disadvantages:

We may conclude that Surrogate keys are not always required. In most cases, it is appropriate to combine both surrogate and complex keys in an application.

Why VFP can generate correct RI code only with non-composite keys

As you may know, VFP allows you to establish a relation between tables by linking a Primary index in the Parent table to a Foreign index in the Child table. Although VFP does not refer to it as a Foreign index, that is what it is because it is an index whose key is built on the Foreign key for the relation. VFP basically links two expressions together (the keys of the indexes) which does not provide sufficient information in order to build correct RI code.

It is clear that in order to build correct RI code you must be able to:

The last point is the most important because if we know the precise matching between elements of the Primary key and elements of the Foreign key as well as the Primary and Foreign Index keys, it is possible to compute the keys to access. Keys to access are inverted keys. That is, the Primary Index key is expressed in terms of the Child table and Foreign Index is expressed in terms of the Parent table.

VFP does not capture and store this crucial information and therefore, the RI Code can be correct only when there are non-composite keys. When the keys are non-composite, there is no problem with matching since there is a one to one relation between the keys, and the inverted keys are identical to the index keys (except for the field names).

Additional Consequences

When you do not know the precise matching between the key elements, your Foreign key and Foreign Key index may be incorrect from the start. For example, the Primary key of Customer may be Numeric 6 while the Foreign key in Orders is Numeric 5. VFP will allow you to make this link which violates Referential Integrity. When the Foreign key and the Foreign Key index are correct, VFP does not automatically maintain them nor does it signal any problems that may arise when performing modifications to the physical database or changing the components of the Primary key. If you modify Customer ID to be of Type Character, VFP requires that you manually modify the Foreign key and the Foreign Key index in the Orders. Furthermore, VFP will not automatically join tables while designing a View when the key is composite.

xCase for Fox

xCase for Fox is a companion data modeling and maintenance tool for Fox 2.x and VFP which overcomes the shortcomings which are described in this article.

When you design a table, you begin by selecting the fields composing the Primary key. When you establish a relation into a Child table, the Primary key will automatically migrate as a Foreign key. There are many immediate advantages to this approach.

Although these are extremely important enhancements to Fox , they represent only about 5% of the powerful features xCase has to offer users of Fox 2.x and VFP. A more detailed presentation of all of the features and enhancements of xCase for Fox are beyond the scope of this presentation.

Example of Instances in which Visual Fox Pro Generated RI Code Fails

The following diagram and code illustrate these points:

Sample Database Model 1

The indexes of our sample Database are:

The key of the Primary Index of Orders is ORDER_ID

The key of the Primary Index of OrdItems is Str(ORDER_ID,6)+Str(LINE,3)

The Primary index of OrdItems is also used to establish the relation between Orders and Orditems .

That is, it also serves as a Foreign Index for the relation. Using this index, we can retrieve, for a given Order, its lines in OrdItems. In this sense it is a perfectly legitimate and valid Foreign Index.

1. Update a Parent table which has a Composite Primary Key
(Rule is "Cascade")

Example: Update of the EMPLOYEE table.

Given the above definitions, here is how VFP generates the code to handle updates to the EMPLOYEE table:

procedure __ri_update_employee:
...
llRetVal=riupdate("STR(DEPARTMENT,2)+STR(EMPLOYEE_N,3)",lcParentID)

The first parameter of riupdate is supposed to be a field name; however, the generated code used the key expression which the RI Builder assumed to be equivalent to the field name, when in fact it is composite, and wrote it as is into the stored procedure. The syntax is therefore incorrect and this code will produce a BUG !

There are 2 possible solutions:

Update the Code Manually

You can manually update the code to the following:

llRetVal=riupdate("DEPARTMENT",EMPLOYEE.DEPARTMENT)
llRetVal=llRetVal .AND. riupdate("EMPLOYEE_N",EMPLOYEE.EMPLOYEE_N)

Automatically generate the correct Referential Integrity code
using xCase for Fox

2. Update of a Child table which has a Foreign Index not compatible with the Primary Index of its Parent table (Rule is "Restrict")

Example: Update Orditems

VFP will generate the following code lines:

Procedure __RI_UPDATE_ORDITEMS:
lcChildID=STR(ORDER_ID,6)+ STR(LINE,3)
...
lcParentWkArea=riopen("orders","order_id")
...
llRetVal=SEEK(lcChildID,lcParentWkArea)

In this last line, lcChildId represents the concatenated string STR(ORDER_ID,6)+ STR(LINE,3). But the key of the order_id index is order_id which is Numeric:6

This SEEK will therefore produce a BUG!

There are 3 possible solutions:

Update the Code Manually

You can manually update the code to the following:

llRetVal=SEEK(ORDER_ID,lcParentWkArea)

Create an index on ORDER_ID for ORDITEMS and establish the relation between ORDERS and ORDITEMS using that index

This index will be REDUNDANT with the already existing and necessary Primary Index whose key is STR(ORDER_ID,6)+STR(LINE,3), causing an unnecessary overhead.

Automatically generate the correct Referential Integrity code
using xCase for Fox

3. Insert a record in a Child table which has a Foreign Index not compatible with the Primary Index of its Parent table (Rule is "Restrict")

All that was said in section 2 (above) also applies here.

Two types of problems can appear in sections 2 and 3:

The seek is performed with a value of a mismatching type as illustrated above.

The seek is performed with the correct type, but with a value which is "longer" than the Parent table Primary index key. This will result in the Parent record not being found, and the trigger will fail, although the Parent does exist.

4. Delete a record in a Parent table when its Primary Index is not compatible with the Foreign Index of the Child table (Rule is "Cascade")

Example: Delete Orders

VFP will generate the following code:

Procedure __ri_delete_orders:
STORE ORDER_ID TO lcParentID,pcParentID
...
SCAN WHILE STR(ORDER_ID,6)+ STR(LINE,3)=lcParentID AND llRetVal

As ORDER_ID is Numeric:6 the comparison is type mismatched and SCAN will result in a BUG.

There are 3 possible solutions:

Update the Code Manually

You can manually update the code to the following:

SCAN WHILE ORDER_ID=lcParentID AND llRetVal

Create an index on ORDER_ID for ORDITEMS and establish the relation between ORDERS and ORDITEMS using that index

This index will be REDUNDANT with the already existing and necessary Primary Index whose key is STR(ORDER_ID,6)+STR(LINE,3) causing an unnecessary overhead.

Automatically generate the correct Referential Integrity code
using xCase for Fox