dacs.doc electric

 

Joins, The Old Way

By Richard DiFranco

 

ANYONE who has done any database development has come across joins. The join seems to be unique to database systems but we developers use them in other processing, as you will see.

My introduction to joins came in a casual explanation of some other point by an instructor who said, "You are all familiar with joins, Right?" Wrong.

The usual "Step 3 before steps 1 and 2". Joins are a variation on sets which is how database queries work. In putting this together, I have assumed the reader is familiar with SQL(Structured query language). The SQL statement in which joins are relevant looks something like this:

Select Name
From PersFile
Where IDNum = '10203'

The join is used in the FROM clause.

Select PersFile.Name,SaleFile.saleamount
From PersFile join persFile.EmpId SaleFile.Empid
Where PersFile.IDNum = '10203'

Access and SQL

My latest contact with joins has been in trying to learn Access and SQL. It occurred to me that one way of understanding joins would be to fall back to my third generation language experience and try to duplicate the actions of a join. A sort of "how was it done back then".

Definition

The best definition of join that I have found is "Joins bring tables together". Joins show how tables are related. Or, even better, joins relate tables. Any two given tables have a relationship, even if it isn't readily apparent. We usually talk about a common field. That is, there is a field in each table which contains the same value. This field is used in matching records. Sometimes, there are no fields with matching values, but there is still a relationship. This relationship results in matching every record in the first table to every record in the second. A situation known to the scientists as the Cartesian product. We must specify which fields to use in joining the tables.

Join Types

There are two basic types of joins, inner and outer. A query using an inner join will give a result set containing only those rows from both tables which had matching values in the fields on which the tables were joined. That is, records in either table that have no matching value in the other table will Not be included in the result set. Another way to say it, they will not be part of the answer.

Right and Left Outer Joins

The other type of join is the outer join. Outer joins have two sub types, right and left. Again there is a matching of values in a "joining" field, but with outer joins all of the records from one of the tables will be included regardless of matches in the other table. When a left join is specified, all of the records in the first (left-hand) table is included even if there is no match in the other table. When a right join is specified, all of the records in the second (right hand) table are included even if there are no matches. Which table is right or left is determined by which table comes first in the "From" clause.

3GL Explanations

(Or "What is Access doing for me?")When running a batch program, at least one file is read sequentially from first record to last. This happens whether we are using a database or a third-generation language.

Inner Join

Finding a third generation language example of an inner join is difficult. Let's take the case in which we have two personnel files. The first file contains certain general information and the second has a record of classes. We need a report with information from both files. Our program reads the general personnel file from first record to last. We know that one field in the first file contains the employee id. We know that the second file also has a field containing employee id. The employee id is used as a key to the records in the second file. Thus we can use this key to directly read records without reading all records. This, of course, is known as random access and how it works could be the subject of many papers. As each record in the first file is read, the program extracts the employee ID and uses that to retrieve records from the second file. If there is no record in the second file with the current employee Id, the record from the first file is ignored or discarded. That sounds like a long explanation. What it means is, if there's no match then don't use it, don't even report it, go on to the next record.

You can see from the above discussion that some records from the first file and some records from the second file will not be used. All of the records in the first (I call it the driver) will be read but only those with a match will be used. Some of the records in the second file won't even be read.

Outer Joins

An outer Join, in the third generation world, reads every record in the first file and tries to match records in the second file in the same way as inner joins. As in the inner join case, the employee Id is used to randomly read the second file. However, with outer joins, All the records from the first file will be used in the output file, even records with no match in the second file. Not all records in the second file will be used, and as with the inner join example, not all records will be read.

Left or Right?

In the third generation world there is no specific concept of left file or right. The decision as to which file is to be the driver (read from first to last) is made by the programmer.

Other Join Terms

You may hear of other joins like equi-joins or non-equi joins. The typical inner join is an equi-join in which the records that match on the "common" key are used in the result. A non-equi join uses those records that have no match. The one I like is the double outer join in which a key is taken from the first file and used to read a second file to get a key field from the second file to use as a key in reading a third file.

Sybase has syntax for a "Natural" join. which builds a join for you in the Where statement based on the fields which have the same name and characteristics.

Have We Learned Anything?

The expressed object of this writing was to illustrate joins in terms of Third generation languages in order to gain a better understanding of what is happening in a database query. Many books on the subject tell us we don't have to worry about the join because, whether or not we explicitly enter the join, the system will handle it for us. I have found that knowing what is happening under the hood can help to prevent gross errors and bolster confidence.


Richard F. DiFranco is a new contributor to dacs.doc. You may e-mail Richard at Feo247@aol.com.

BackHomeNext