Search Mechanics

A typical search looks like this:

Set Records   Search Statement
#1 3   Consumption AND Fruit

This search retrieves all the records in the database containing both the words consumption and fruit. The records retrieved by the search statement are held in a set.

The same search could be entered term by term and the resulting sets combined in the last step. The result is the same:

Set Records   Search Statement
#1 31530   Consumption
#2 3597   Fruit
#3 3   #1 AND #2

This second search provides a better illustration of the mechanics of a search, although the process is the same in both searches. When the computer conducted the first search, it temporarily created the equivalent of sets #1 and #2. In the first search these sets were never displayed and were erased when the search was complete.


Question:
What happens when you ask the computer to find all the records containing both the words fruit and consumption - does the computer actually scan each record to see if both words are present?
 
Answer:
No, the computer never actually searches the records themselves. To find out what actually happens, read on ...

Recall that:

Databases are made up of files,
     and files are made up of records,
          and records are made up of fields.

 

Bibliographic databases have at least two files:

  1. A file of records
  2. An index file.

The index file contains an alphabetic list of every "word" that occurs in the records.

Associated with each word in the index is a list of every record number in which that word occurs. Every record in the database is assigned a unique record number.
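To make the idea of an index file concrete, here is a minimal sketch in Python. The three records, the function name build_index, and the rule for what counts as a "word" (runs of letters and digits, lowercased) are assumptions made for illustration; a real database system decides these details for itself.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lowercased word to the sorted list of record numbers containing it."""
    index = defaultdict(set)
    for record_number, text in records.items():
        # Assumption: a "word" is any run of letters or digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(record_number)
    # Sort the words alphabetically, as in the index file described above.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}

# Hypothetical records file: each record has a unique record number.
records = {
    1: "Fruit consumption in adults",
    2: "Vegetable consumption trends",
    3: "Fruit storage methods",
}

index = build_index(records)
for word, numbers in index.items():
    print(word, numbers)
```

Each printed line pairs one word with the record numbers in which it occurs, which is the only information the index needs to hold.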

This picture illustrates the two files in a database:

When a search for all records containing both the words FRUIT AND CONSUMPTION is submitted, the following steps occur:

  1. The computer looks up fruit in the index.

    1. When it finds fruit it retrieves all the associated record numbers and holds them as a set.

  2. The computer then looks up consumption in the index.

    1. When it finds consumption it retrieves all the associated record numbers and holds them as a set.
       

  3. It compares the record numbers in the two sets.

    1. Any record number which occurs in both sets is a hit and that number is put into a third set - the retrieval set.

    2. Watch this illustration of how two sets of numbers are processed in an 'AND' operation.
       

  4. When the records are displayed from the retrieval set, the records are fetched by record number from the database's records file.
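The four steps above can be sketched in a few lines of Python. This is only an illustration: the function name search_and, the small hand-made index, and the four record titles are all hypothetical, and a real system stores both files on disk rather than in memory.

```python
def search_and(index, records_file, first_word, second_word):
    # Steps 1 and 2: look up each word in the index and hold its
    # record numbers as a set. A word not in the index gives an empty set.
    first_set = set(index.get(first_word.lower(), []))
    second_set = set(index.get(second_word.lower(), []))

    # Step 3: any record number that occurs in both sets is a hit.
    retrieval_set = sorted(first_set & second_set)

    # Step 4: only now is the records file touched, and only to fetch
    # the hits by record number for display.
    return [(number, records_file[number]) for number in retrieval_set]

# Hypothetical index file and records file.
index = {
    "consumption": [10, 20, 30],
    "fruit": [20, 30, 40],
}
records_file = {
    10: "Energy consumption in households",
    20: "Fruit consumption in schools",
    30: "Consumption of fruit juice",
    40: "Fruit tree pruning",
}

for number, title in search_and(index, records_file, "Fruit", "Consumption"):
    print(number, title)
```

Notice that the text of the records is never scanned. The comparison in step 3 works on record numbers alone.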

This chart shows the sets and record numbers for this process:

Set  Word         Hits   Record Numbers
A    CONSUMPTION  31530  70, 256, 311, 467, 829, 1625, 2841, 3527, 4173, 4431, 4918, 5081, ...
B    FRUIT        3597   54, 256, 467, 829, 898, 2412, 4137, 4173, 5081, 6041, 7959, 8166, ...
C    A AND B      5      256, 467, 829, 4173, 5081
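Because the record numbers in each list are in ascending order, the comparison can be done in a single pass through both lists. The sketch below shows one common way to do this, a two-pointer merge; the text does not say which method any particular system uses, and the function name intersect_sorted is an assumption. It uses only the record numbers actually listed in the chart for sets A and B, not the full lists that the "..." stands for.

```python
def intersect_sorted(first, second):
    """Return the numbers that occur in both ascending lists."""
    hits = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            hits.append(first[i])   # in both lists: a hit
            i += 1
            j += 1
        elif first[i] < second[j]:
            i += 1                  # first list is behind, advance it
        else:
            j += 1                  # second list is behind, advance it
    return hits

# Only the record numbers shown in the chart (the lists are truncated there).
consumption = [70, 256, 311, 467, 829, 1625, 2841, 3527, 4173, 4431, 4918, 5081]
fruit = [54, 256, 467, 829, 898, 2412, 4137, 4173, 5081, 6041, 7959, 8166]

print(intersect_sorted(consumption, fruit))  # [256, 467, 829, 4173, 5081]
```

The result matches set C in the chart.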

This Venn diagram shows the outcome of the search in yellow:

Fruit and Consumption Diagram

Now that you understand the mechanics of searching, you should see that Venn diagrams provide a simple but accurate picture of the outcome of a search.

 

This is one of the records retrieved by the "fruit and consumption" search:


RN: 256
TI: Fruit and vegetable consumption in later life.
SO: Age-Ageing. 1998 Nov; 27(6): 723-8


The following record was not retrieved because it doesn't contain the word fruit (singular). If you look at the sample index again, you will see that record number 311 is in the list for both consumption and fruits (plural), but it is not in the list for fruit.


RN: 311
TI: Factors affecting consumption of fruits and vegetables by low-income families.
SO: J-Am-Diet-Assoc. 1994 Nov; 94(11): 1309-11
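The difference between fruit and fruits can be reproduced with a small Python sketch that indexes only the two titles shown above. The word-splitting rule (runs of letters, lowercased) and the function name search_and are assumptions for illustration.

```python
import re

# The two titles quoted above, keyed by record number.
records = {
    256: "Fruit and vegetable consumption in later life.",
    311: "Factors affecting consumption of fruits and vegetables by low-income families.",
}

# Build the index: each distinct word gets its own list of record numbers.
index = {}
for record_number, title in records.items():
    for word in set(re.findall(r"[a-z]+", title.lower())):
        index.setdefault(word, set()).add(record_number)

def search_and(first_word, second_word):
    """Record numbers listed under both words, matched exactly as typed."""
    return sorted(index.get(first_word, set()) & index.get(second_word, set()))

print(sorted(index["fruit"]))                # [256]
print(sorted(index["fruits"]))               # [311]
print(search_and("fruit", "consumption"))    # [256]
print(search_and("fruits", "consumption"))   # [311]
```

To the index, fruit and fruits are two unrelated entries, so a search on one form never sees records listed under the other.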

 

Moral:

What the computer does is very mechanical. You have to provide the strategy for a successful search.

If you keep in mind that a successful search is "just" a matter of finding the right term combinations, you are on your way to becoming a good searcher.