This topic describes the essential concepts and terms that VSAM is built on. It helps to know them before learning more about VSAM. They are -
- Logical record
- Physical record
- Control Interval
- Spanned records
- Control area
- Alternate Indexes
Logical record -
A logical record is the unit of information that an application stores in, or retrieves from, a VSAM data set (or a data set of any other type). The application program sees and uses logical records to access and process the data through I/O operations.
A logical record is a set of fields (including any key fields) described by the application. The application programmer designs the logical record layout.
A logical record can be fixed-length or variable-length. VSAM supports both, depending on the data set organization (file type).
For example, the logical record of a VSAM KSDS file with a 47-byte record is defined as follows -
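```cobol
       DATA DIVISION.
       FILE SECTION.
      * The SELECT for KSDS-1 (not shown) would specify
      * ORGANIZATION IS INDEXED and RECORD KEY IS KSDS-KEY.
      * BLOCK CONTAINS and RECORDING MODE have no effect on VSAM files.
       FD  KSDS-1
           RECORD CONTAINS 47 CHARACTERS
           BLOCK CONTAINS 470 CHARACTERS
           DATA RECORD IS KSDS-RECORD-1
           RECORDING MODE IS F.
       01  KSDS-RECORD-1.
           05 KSDS-KEY          PIC X(03).
           05 KSDS-DATA         PIC X(44).
```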
In the above example, KSDS-RECORD-1 is the logical record used to process the physical VSAM file.
Ways to identify logical records -
VSAM identifies logical records in the following ways -
- Key field - a field whose contents uniquely identify the logical record (for example, the primary key of a KSDS).
- Relative byte address (RBA) - the offset, in bytes, of the logical record's first byte from the beginning of the data set. For example, if records are stored back to back and the first record's RBA is zero, the second record's RBA is the length of the first record, the third record's RBA is the combined length of the first two records, and so on.
- Relative record number (RRN) - the relative position (slot number) of the logical record in the data set. Numbering starts at 1, so RRN 3 identifies the third record.
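The RBA rule can be illustrated with a minimal sketch. It assumes records are stored back to back with no free space or CI control information; this is a simplification, because real VSAM CIs also hold a CIDF, RDFs, and free space. The helper `record_rbas` is hypothetical and simply takes running totals of the record lengths:

```python
from itertools import accumulate


def record_rbas(record_lengths):
    """Return the RBA of each logical record.

    Hypothetical helper. Assumes the first record starts at RBA 0 and that
    records are stored back to back (no free space, no CI control information).
    """
    # Running totals shifted by one: each RBA is the sum of all earlier lengths.
    return list(accumulate(record_lengths, initial=0))[:-1]


print(record_rbas([47, 47, 47]))  # [0, 47, 94]
print(record_rbas([10, 20, 5]))   # [0, 10, 30]
```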
Physical record -
A physical record is the actual record written to the storage device. A physical record may contain several logical records. A physical record is also called a physical block or simply a block.
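The sketch below shows how several logical records are grouped into one physical record. It reuses the 47-byte record and the 470-byte BLOCK CONTAINS value from the COBOL example. The helper `block_records` is hypothetical and only illustrates blocking in general; for VSAM itself, the block size is derived from the CI size, as described under Control Interval.

```python
def block_records(records, block_size):
    """Group logical records (bytes objects) into physical blocks.

    Hypothetical helper: a record is never split across blocks, and a new
    block starts whenever the next record would not fit in the current one.
    """
    blocks, current = [], b""
    for rec in records:
        if len(rec) > block_size:
            raise ValueError("logical record is longer than the block size")
        if len(current) + len(rec) > block_size:
            blocks.append(current)  # current block is full: write it out
            current = b""
        current += rec
    if current:
        blocks.append(current)      # last, partially filled block
    return blocks


records = [b"K%02d" % i + b" " * 44 for i in range(25)]  # 25 records of 47 bytes
blocks = block_records(records, 470)
print([len(b) // 47 for b in blocks])  # [10, 10, 5]
```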
Control Interval -
A control interval (CI) is the fundamental building block of every VSAM data set. A CI is a contiguous area of DASD storage that holds logical records together with control information about the records in that CI. A CI consists of one or more physical blocks and is the unit of data that VSAM transfers during an I/O operation.
The CI size can be from 512 bytes to 32 KB. Up to 8 KB, it must be a multiple of 512 bytes; above 8 KB, it must be a multiple of 2 KB. The CI size can differ from one data set to another.
However, all CIs within the data component of a cluster have the same size.
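As a small sketch, the data-component CI size rule can be written as a check. The rule is 512 bytes to 8 KB in multiples of 512, then up to 32 KB in multiples of 2 KB. The function name `is_valid_data_ci_size` is hypothetical, and the check does not model how VSAM adjusts a size that breaks the rule.

```python
def is_valid_data_ci_size(size):
    """Check a data-component CI size against the VSAM rule:
    512 bytes to 8 KB in multiples of 512, or 8 KB to 32 KB in multiples of 2 KB."""
    if 512 <= size <= 8192:
        return size % 512 == 0
    if 8192 < size <= 32768:
        return size % 2048 == 0
    return False


valid = [s for s in range(512, 32768 + 1, 512) if is_valid_data_ci_size(s)]
print(len(valid), valid[:3], valid[-3:])  # 28 [512, 1024, 1536] [28672, 30720, 32768]
```

Above 8 KB, sizes such as 9216 bytes are multiples of 512 but not of 2048, so they are rejected.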
VSAM determines the physical record (block) size from the CI size, choosing a size that makes good use of the 3390 track.
A small CI suits random access because each read brings fewer unneeded logical records into memory. A large CI suits sequential access because each read brings in more records, which reduces the total number of reads.
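The random-versus-sequential trade-off can be put into rough numbers with a sketch. It makes deliberately simple assumptions: fixed-length records, no control information or free space in the CI, and exactly one CI transferred per read. The function `io_profile` and its output keys are hypothetical.

```python
import math


def io_profile(ci_size, record_length, total_records):
    """Rough random-vs-sequential trade-off for fixed-length records.

    Simplifications: ignores CI control information and free space, and
    assumes one CI is transferred per read.
    """
    if record_length > ci_size:
        raise ValueError("record does not fit in one CI")
    records_per_ci = ci_size // record_length
    return {
        "records_per_ci": records_per_ci,
        # Sequential: reading the whole data set takes one read per CI.
        "sequential_reads": math.ceil(total_records / records_per_ci),
        # Random: one record is wanted; the rest of the CI comes along unneeded.
        "unneeded_records_per_random_read": records_per_ci - 1,
    }


for size in (512, 4096):
    print(size, io_profile(size, 47, 100_000))
```

Take 100,000 records of 47 bytes. A 512-byte CI holds 10 records and needs 10,000 sequential reads. A 4,096-byte CI holds 87 records and needs only 1,150 reads, but each random read then brings 86 unneeded records into memory.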
The following CI components influence the choice of CI size -
- The logical records stored in the CI.
- Free space reserved for inserting records.
- Control information - made up of two kinds of fields -
- Control Interval Definition Field (CIDF) - a CI has exactly one CIDF. It is a 4-byte field that contains information about the location and amount of free space in the CI.
- Record Definition Fields (RDFs) - a CI can have several RDFs. Each RDF is a 3-byte field that describes record length. For fixed-length records, there are two RDFs: one gives the record length and the other gives the number of records of that length. For variable-length records, there is one RDF for each logical record (adjacent records of the same length can share a pair of RDFs).
The CI components and properties vary with the data set type. For example, the CIs of a VSAM LDS contain no CIDF or RDFs.
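The sketch below shows how the control information reduces the space left for records in a CI. It assumes one 4-byte CIDF and 3-byte RDFs, as described above, and it ignores reserved free space. The function names are hypothetical. For variable-length records, it also ignores RDF sharing by adjacent equal-length records.

```python
CIDF_BYTES = 4  # one Control Interval Definition Field per CI
RDF_BYTES = 3   # each Record Definition Field


def fixed_records_per_ci(ci_size, record_length):
    """Fixed-length records: the CI holds one CIDF and two RDFs
    (record length and record count); the rest is available for records."""
    usable = ci_size - CIDF_BYTES - 2 * RDF_BYTES
    return max(usable // record_length, 0)


def variable_records_fit(ci_size, record_lengths):
    """Variable-length records, simplified to one RDF per record
    (ignores the sharing of RDFs by adjacent equal-length records)."""
    needed = CIDF_BYTES + sum(record_lengths) + RDF_BYTES * len(record_lengths)
    return needed <= ci_size


print(fixed_records_per_ci(4096, 47))              # 86
print(variable_records_fit(512, [100, 200, 190]))  # True  (503 bytes needed)
print(variable_records_fit(512, [100, 200, 200]))  # False (513 bytes needed)
```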
The CI size can be decided in three ways -
- Specified by the user with the CONTROLINTERVALSIZE (CISZ) parameter of the IDCAMS DEFINE command.
- Determined by VSAM.
- Set by the storage administrator (for example, through an SMS data class).
Control area -
A control area (CA) is a concept unique to VSAM. A CA groups two or more CIs into a fixed-length contiguous area of DASD storage.
Generally, a CA is one 3390 cylinder (15 tracks), and the minimum size is one track. The maximum size of a CA is 16 tracks when the data set is striped.
VSAM derives the CA size from the primary and secondary space allocations specified when the data set is defined (the smaller of the two, up to one cylinder). A VSAM data set is made up of one or more CAs.
A VSAM data set is extended in units of whole CAs. The maximum size of a spanned record is the CA size.
During sequential processing, a single I/O operation never crosses a CA boundary.
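CA sizing and growth in whole CAs can be sketched as follows. The sketch uses a simplified model: the CA is the smaller of the primary and secondary allocation in tracks, at least one track and at most one 3390 cylinder (15 tracks). It does not model the 16-track striped case. The example assumes 12 CIs of 4 KB per 3390 track, which gives the familiar 180 CIs per one-cylinder CA. Both function names are hypothetical.

```python
import math

TRACKS_PER_CYLINDER = 15  # IBM 3390


def ca_size_tracks(primary_tracks, secondary_tracks=0):
    """Simplified CA sizing: the smaller of the primary and secondary
    allocation (secondary ignored when 0), at least one track and at
    most one cylinder."""
    amounts = [primary_tracks] + ([secondary_tracks] if secondary_tracks else [])
    return max(1, min(min(amounts), TRACKS_PER_CYLINDER))


def cas_needed(total_cis, ca_tracks, cis_per_track):
    """Data sets grow in whole CAs, so round the CI count up to full CAs."""
    cis_per_ca = ca_tracks * cis_per_track
    return math.ceil(total_cis / cis_per_ca)


ca = ca_size_tracks(primary_tracks=30, secondary_tracks=15)  # 15 tracks
print(ca, ca * 12, cas_needed(1000, ca, 12))  # 15 180 6
```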
Spanned records -
Spanned records are logical records that are larger than the CI size. To use spanned records, specify the SPANNED attribute on DEFINE CLUSTER when the data set is defined. A spanned record is stored across multiple control intervals (CIs).
The RDFs indicate whether a record is a spanned record.
- A spanned record always begins on a control interval (CI) boundary and fills one or more CIs within a single CA.
- A spanned record cannot share a CI with any other record.
- The free space at the end of the last CI of a spanned record is not filled with any other record.
- That free space is used only to extend the spanned record.
- The maximum size of a spanned record is the control area (CA) size.
Spanned records are required when an application uses logical records longer than the CI size.
Spanned records can also be used in the data component of an alternate index (AIX) cluster.
If spanned records are used in a KSDS, the primary key must be within the first CI.
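The spanned-record rules above can be checked with a sketch: a record spans whole CIs within one CA, the tail of the last CI stays unused, and a KSDS key must lie in the first CI. The sketch assumes that every CI holding a segment carries a 4-byte CIDF and two 3-byte RDFs. That per-segment overhead is an assumption made for illustration, and the function `spanned_layout` is hypothetical.

```python
import math

# Assumption: each CI that holds a segment of a spanned record also carries
# a 4-byte CIDF and two 3-byte RDFs.
CONTROL_INFO_BYTES = 4 + 2 * 3


def spanned_layout(record_length, ci_size, cis_per_ca, key_offset=0, key_length=0):
    """Return (CIs used, unused bytes in the last CI) for one spanned record."""
    usable = ci_size - CONTROL_INFO_BYTES
    cis_used = math.ceil(record_length / usable)
    if cis_used > cis_per_ca:
        raise ValueError("a spanned record cannot be larger than one CA")
    if key_offset + key_length > usable:
        raise ValueError("a KSDS primary key must lie within the first CI")
    # The tail of the last CI stays empty: no other record may share it.
    unused = cis_used * usable - record_length
    return cis_used, unused


print(spanned_layout(10_000, 4096, 180, key_offset=0, key_length=8))  # (3, 2258)
```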
Alternate Indexes -
Alternate indexes (AIXs) allow logical records to be accessed sequentially or directly by a key field other than the primary key. Each alternate index is itself a KSDS cluster with an index component and a data component.
An AIX removes the need to store the same data in different orders in multiple clusters for different applications.
For each alternate key value, the AIX data component contains the associated primary keys (or RBAs, when the base cluster is an ESDS).
Important Notes -
- Any field in the base cluster other than the primary key can be used as an alternate key.
- An alternate key can overlap any other key (the primary key of a KSDS or another alternate key).
- Alternate key values can repeat (duplicates are allowed when the AIX is defined with NONUNIQUEKEY).
- One alternate key value can correspond to more than one primary key value. For example, if the primary key is the student number and the alternate key is the class, one class maps to several students.
- Each record in the AIX data component contains an alternate key value and all of its corresponding primary keys, which serve as pointers to the data component of the base cluster.
- Within an alternate key value, the primary keys are in ascending order, and the AIX records themselves are in ascending alternate key order.
- If an alternate key value has many primary keys, consider defining the AIX as spanned and compressed.
An AIX is defined and made usable in three steps -
- Define (create) the AIX
- Build the AIX
- Define a path
The IDCAMS utility performs all three steps. The DEFINE ALTERNATEINDEX command defines the AIX, the BLDINDEX command builds (loads) it, and the DEFINE PATH command defines and names the path.
The BLDINDEX command reads the associated base cluster sequentially. It extracts each alternate key value together with the corresponding primary key (for a KSDS) or record RBA (for an ESDS) to form alternate index records. It then sorts these records in ascending alternate key order and writes them to the alternate index cluster.
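The BLDINDEX steps can be modelled for a KSDS base cluster using the student/class example. This is a toy model, not IDCAMS: records are Python dicts, and the function `build_aix` and field names `student_no` and `class` are hypothetical. It scans the base in primary key order, extracts (alternate key, primary key) pairs, sorts them, and groups the pointers into one AIX record per alternate key value.

```python
from itertools import groupby
from operator import itemgetter


def build_aix(base_records, primary_key, alternate_key):
    """Toy model of BLDINDEX for a KSDS base cluster.

    base_records: list of dicts; primary_key/alternate_key: field names.
    Returns AIX records as (alternate key value, [primary keys]) tuples.
    """
    # 1. Read the base cluster sequentially (KSDS: primary key order).
    in_key_order = sorted(base_records, key=itemgetter(primary_key))
    # 2. Extract (alternate key, primary key) pairs.
    pairs = [(rec[alternate_key], rec[primary_key]) for rec in in_key_order]
    # 3. Sort by alternate key (primary keys ascending within each value).
    pairs.sort()
    # 4. One AIX record per alternate key value, holding all its primary keys.
    return [(alt, [pk for _, pk in group])
            for alt, group in groupby(pairs, key=itemgetter(0))]


students = [
    {"student_no": "S03", "class": "B"},
    {"student_no": "S01", "class": "A"},
    {"student_no": "S02", "class": "B"},
    {"student_no": "S04", "class": "A"},
]
print(build_aix(students, "student_no", "class"))
# [('A', ['S01', 'S04']), ('B', ['S02', 'S03'])]
```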
To access a KSDS or ESDS through an alternate index, a path must be defined in the catalog. A path maps an alternate index to its base cluster, and the path name refers to that base cluster and alternate index pair.
A sphere is a base cluster together with all of its associated alternate indexes (AIXs).
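A path and a sphere can be modelled as plain data structures. The `Sphere` holds a base cluster and its AIXs. A `Path` names one AIX and reads base records through it: it looks up the alternate key in the AIX, then follows each primary key pointer into the base cluster. The class names, the AIX name `CLASS.AIX`, and the record fields are all hypothetical and do not represent a real VSAM interface.

```python
from dataclasses import dataclass, field


@dataclass
class Sphere:
    """A base cluster plus all of its alternate indexes."""
    base: dict                                 # primary key -> record
    aixes: dict = field(default_factory=dict)  # AIX name -> {alt key -> [primary keys]}


@dataclass
class Path:
    """Pairs one AIX with its base cluster, so records can be read by alternate key."""
    sphere: Sphere
    aix_name: str

    def read(self, alternate_value):
        # Look up the pointers in the AIX, then fetch each record from the base cluster.
        pointers = self.sphere.aixes[self.aix_name].get(alternate_value, [])
        return [self.sphere.base[pk] for pk in pointers]


sphere = Sphere(
    base={"S01": {"name": "Ann", "class": "A"},
          "S02": {"name": "Bob", "class": "B"},
          "S04": {"name": "Dee", "class": "A"}},
    aixes={"CLASS.AIX": {"A": ["S01", "S04"], "B": ["S02"]}},
)
path = Path(sphere, "CLASS.AIX")
print([r["name"] for r in path.read("A")])  # ['Ann', 'Dee']
```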
CI and CA splits occur in KSDS and VRRDS data sets. A split happens when a new record is inserted, or an existing record is lengthened, and its CI does not have enough free space. In a CI split, approximately half of the records in the full CI move to a free CI. Both CIs belong to the same CA.
Similarly, if a CI split is needed but the CA has no free CI left, a CA split occurs. Approximately half of the data in the full CA moves to free CIs in another CA.
Splits degrade performance. However, a split leaves free space in the CIs (or CAs) involved, which lowers the probability of another split there.
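CI splitting inside one CA can be shown with a toy simulation. It assumes a CA with a fixed number of CIs, each holding at most `ci_capacity` keys, kept here in logical key order. When a full CI receives a key, about half of its records move to a free CI. When no free CI is left, the model only reports that a CA split would be needed and does not insert the key. The class `ControlArea` is hypothetical and does not reflect VSAM's internal algorithm.

```python
from bisect import insort


class ControlArea:
    """Toy model of one CA in a KSDS: a fixed number of CIs, each holding
    up to ci_capacity keys. CIs are kept in logical (key) order."""

    def __init__(self, num_cis, ci_capacity):
        self.ci_capacity = ci_capacity
        self.cis = [[]]              # CIs in use
        self.free_cis = num_cis - 1  # unused CIs left in this CA
        self.ci_splits = 0

    def insert(self, key):
        # Choose the last CI whose lowest key is <= key (the first CI otherwise).
        idx = 0
        for i, ci in enumerate(self.cis):
            if ci and ci[0] <= key:
                idx = i
        ci = self.cis[idx]
        if len(ci) < self.ci_capacity:
            insort(ci, key)
            return "inserted"
        if self.free_cis == 0:
            # No free CI left in this CA: VSAM would now split the CA.
            return "CA split needed"
        # CI split: about half of the records move to a free CI in the same CA.
        insort(ci, key)
        mid = len(ci) // 2
        self.cis[idx:idx + 1] = [ci[:mid], ci[mid:]]
        self.free_cis -= 1
        self.ci_splits += 1
        return "CI split"


ca = ControlArea(num_cis=3, ci_capacity=4)
results = [ca.insert(k) for k in (10, 20, 30, 40, 25, 50, 60, 70, 80)]
print(results)
print(ca.cis)
```

After each CI split, both resulting CIs have free slots, so the next few inserts into that key range succeed without another split. This matches the note above about splits lowering the chance of further splits.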