Search Mechanics
A typical search looks like this:
| Set | Records | Search Statement |
| #1 | 3 | Consumption AND Fruit |
This search retrieves all the records in the database containing both the words consumption
and fruit. The records retrieved by the search statement are held in a set.
The same search could be entered term by term and the resulting sets
combined in the last step. The result is the same:
| Set | Records | Search Statement |
| #1 | 31530 | Consumption |
| #2 | 3597 | Fruit |
| #3 | 3 | #1 AND #2 |
This second search provides a better illustration of the mechanics of a search,
although the process is the same in both searches. When the computer conducted the first
search, it temporarily created the equivalent of sets #1 and #2. In the first search these
sets were never displayed and were erased when the search was complete.
Question:
What happens when you ask the computer to find all the records containing both the words fruit and consumption? Does the computer actually scan each record to see if both words are present?

Answer:
No, the computer never actually searches the records themselves. To find out what actually happens, read on ...
Recall that:
Databases are made up of files,
and files are made up of records,
and records are made up of fields.
Bibliographic databases have at least two files:
- A file of records
- An index file.
The index file contains an alphabetic list of every "word"
that occurs in the records.
Associated with each word in the index is a list of every record number in
which that word occurs. Every record in the database is assigned a unique record number.
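As a rough illustration of the index file just described, the sketch below builds a word-to-record-number index from a tiny records file. The three records, the `build_index` name, and the rule for splitting text into "words" are all assumptions made for this example; a real bibliographic system may define words differently.

```python
import re
from collections import defaultdict

# Hypothetical records file: record number -> text of the record.
# These titles are made up purely for illustration.
records = {
    1: "Apple consumption in schools",
    2: "Fruit storage methods",
    3: "Consumption of fruit by children",
}

def build_index(records):
    """Map each word to the ascending list of record numbers containing it."""
    index = defaultdict(set)
    for record_number, text in records.items():
        # Assumption: a "word" is any run of letters or digits, lowercased.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(record_number)
    # Words in alphabetical order, each with its sorted record numbers.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}

index = build_index(records)
print(index["consumption"])  # [1, 3]
print(index["fruit"])        # [2, 3]
```

The index is built once, ahead of time, so a later search only has to look words up in it.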
This picture illustrates the two files in a database:
When a search for all records containing both the words FRUIT AND CONSUMPTION is submitted, the following steps occur:
- The computer looks up fruit in the index. When it finds fruit, it retrieves all the associated record numbers and holds them as a set.
- The computer then looks up consumption in the index. When it finds consumption, it retrieves all the associated record numbers and holds them as a set.
- It compares the record numbers in the two sets. Any record number which occurs in both sets is a hit, and that number is put into a third set: the retrieval set.
Watch this illustration of how two sets of numbers are processed in an 'AND' operation.
When the records are displayed from the retrieval set, the records are
fetched by record number from the database's records file.
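The whole sequence (two index lookups, a comparison of the two sets, then fetching the hits by record number) can be sketched as follows. The index contents, the record texts, and the function names `search_and` and `display` are hypothetical and exist only to make the example runnable.

```python
# Hypothetical index file: word -> record numbers (made-up data).
index = {
    "consumption": [2, 5, 9],
    "fruit": [5, 7, 9],
}

# Hypothetical records file: record number -> record (made-up titles).
records_file = {
    2: "Energy consumption in households",
    5: "Fruit consumption among adults",
    7: "Fruit tree diseases",
    9: "Trends in fruit juice consumption",
}

def search_and(index, first_word, second_word):
    """Return the retrieval set for `first_word AND second_word`."""
    first_set = set(index.get(first_word, []))    # look up the first word
    second_set = set(index.get(second_word, []))  # look up the second word
    return sorted(first_set & second_set)         # keep numbers found in both

def display(records_file, retrieval_set):
    """Fetch each hit from the records file by its record number."""
    return [records_file[number] for number in retrieval_set]

hits = search_and(index, "fruit", "consumption")
print(hits)  # [5, 9]
for record in display(records_file, hits):
    print(record)
```

Note that `search_and` never reads the record texts; the records file is touched only when the hits are displayed.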
This chart shows the sets and record numbers for this process:
| Set | Word | Hits | Record Numbers |
| A | CONSUMPTION | 31530 | 70, 256, 311, 467, 829, 1625, 2841, 3527, 4173, 4431, 4918, 5081, ... |
| B | FRUIT | 3597 | 54, 256, 467, 829, 898, 2412, 4137, 4173, 5081, 6041, 7959, 8166, ... |
| C | A AND B | 5 | 256, 467, 829, 4173, 5081 |
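The comparison in the chart can be reproduced with a short sketch. It uses only the record numbers visible in the chart (the lists there are truncated with "...") and assumes the lists are kept in ascending order, which lets both be walked in a single pass. Whether an actual retrieval system compares its sets this way is not stated here; the function name `intersect_sorted` is made up for the example.

```python
def intersect_sorted(a, b):
    """AND two ascending lists of record numbers in a single pass."""
    hits = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # The number occurs in both lists: it is a hit.
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur later in b, so skip it
        else:
            j += 1  # b[j] cannot occur later in a, so skip it
    return hits

# Only the record numbers shown in the chart, not the full lists.
consumption = [70, 256, 311, 467, 829, 1625, 2841, 3527, 4173, 4431, 4918, 5081]
fruit = [54, 256, 467, 829, 898, 2412, 4137, 4173, 5081, 6041, 7959, 8166]

print(intersect_sorted(consumption, fruit))  # [256, 467, 829, 4173, 5081]
```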
This Venn diagram shows the outcome of the search in yellow:
Now that you understand the mechanics of searching, you should
see that Venn diagrams provide a simple but accurate picture of the outcome of a search.
This is one of the records retrieved by the "fruit and
consumption" search:
| RN: | 256 |
| TI: | Fruit and vegetable consumption in later life. |
| SO: | Age-Ageing. 1998 Nov; 27(6): 723-8 |
The following record was not retrieved because it doesn't contain the word fruit (singular). If you look at the sample index again, you will see that record number 311 is in the list for both consumption and fruits (plural), but it is not in the list for fruit.
| RN: | 311 |
| TI: | Factors affecting consumption of fruits and vegetables by low-income families. |
| SO: | J-Am-Diet-Assoc. 1994 Nov; 94(11): 1309-11 |
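The two sample records above make the point concrete. The sketch below indexes only their titles and runs both forms of the search. The word-splitting rule and the `search_and` helper are assumptions for this example; the key behaviour it illustrates is that an index lookup matches a word exactly, so fruit and fruits are separate entries.

```python
import re

# Titles of the two sample records, keyed by record number.
titles = {
    256: "Fruit and vegetable consumption in later life.",
    311: "Factors affecting consumption of fruits and vegetables "
         "by low-income families.",
}

# Build a small index: word -> set of record numbers.
# Assumption: a "word" is any run of letters, lowercased.
index = {}
for number, title in titles.items():
    for word in re.findall(r"[a-z]+", title.lower()):
        index.setdefault(word, set()).add(number)

def search_and(*terms):
    """Return record numbers listed under every term (exact word match)."""
    sets = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*sets))

print(search_and("fruit", "consumption"))   # [256]
print(search_and("fruits", "consumption"))  # [311]
```

Neither search alone retrieves both records, so it is up to the searcher to account for variant word forms.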
Moral:
What the computer does is very mechanical. You have to provide the strategy for
a successful search.
If you keep in mind that a successful search is "just" a matter of
finding the right term combinations, you are on your way to becoming a good searcher.