The Functionality we expect of a Storage Management Layer
There are 6 major functional areas. They are described in more detail
and broken down further below.
-
Cataloging and
Database Functions for Files and Tape Volumes
-
Specification
and Control of Tape Volume Storage Locations
-
Control of various
parameters which govern the functional behavior and performance of the
system
-
Management of
the robot resources (including error recovery and tracking)
-
Movement of files
between the user's machine/local disk and tape in the robot
-
Operational procedures
to run and manage the robot, the data stored in the system, the tape drives,
and the "databases"
1) Cataloging and Database Functions
1.1) Maintenance of the primary "database" of file
to tape volume information
-
reliable and backed-up
"database" of each volume and file, recording the volume location of each file
and the position within each volume for each file
-
assurance that every movement
or deletion of files is correctly reflected in the "database"
-
notation of volume/file
status (e.g. if unreadable or errors)
-
tracking of the tape volume
format
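The requirements in 1.1 can be illustrated with a minimal in-memory sketch. It is not a proposed implementation: the class names, field names and status strings below are all hypothetical, and a real system would need a persistent, backed-up store. It only shows the mapping from file to volume and position, the status notation, and how a deletion is kept consistent in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileRecord:
    name: str
    volume: str          # label of the tape volume holding the file
    position: int        # position of the file within that volume
    status: str = "ok"   # hypothetical status values, e.g. "ok", "unreadable"


@dataclass
class VolumeRecord:
    label: str
    tape_format: str     # the tape volume format being tracked
    status: str = "ok"
    files: List[str] = field(default_factory=list)


class Catalog:
    """In-memory stand-in for the file-to-volume "database"."""

    def __init__(self) -> None:
        self.files: Dict[str, FileRecord] = {}
        self.volumes: Dict[str, VolumeRecord] = {}

    def add_volume(self, label: str, tape_format: str) -> None:
        self.volumes[label] = VolumeRecord(label, tape_format)

    def add_file(self, name: str, volume: str, position: int) -> None:
        # Look the volume up first so an unknown label fails before
        # anything is recorded.
        vol = self.volumes[volume]
        self.files[name] = FileRecord(name, volume, position)
        vol.files.append(name)

    def delete_file(self, name: str) -> None:
        # Remove the file from both views of the catalog.
        record = self.files.pop(name)
        self.volumes[record.volume].files.remove(name)

    def mark_file(self, name: str, status: str) -> None:
        self.files[name].status = status

    def files_on_volume(self, label: str) -> List[str]:
        return list(self.volumes[label].files)
```

The same `files_on_volume` lookup is the kind of query the viewing tools in 1.2 would need.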
1.2) Provide access to File namespace
and Volume Information
-
tools for users to easily
and intuitively view all files in the system along with other commonly
needed information about the file - such as owner, 'grouping', date written,
etc.
-
tools for users to easily
and intuitively view all volumes in the system
-
tools for viewing file/volume
related information such as all files on a volume
-
tools for exporting all,
or recently changed parts, of the file and volume database to the
data access layer (for performance reasons) or to remote institutions
2) Specification and Control of Tape Volume Storage Locations
-
ability
to handle several distinct robot storage locations
-
ability
to treat a single physical robot as multiple logical storage locations
-
ability
to handle various physical 'shelves' as possible storage locations
-
ability
to migrate tape volumes between storage locations
-
ability
to import volumes into storage locations (given sufficient meta-data in
an acceptable format)
-
ability
to export volumes (with their associated meta-data)
-
possibly
implementation of a quota system for a particular user/group within a storage
location
3) Control of parameters which govern the functional
behavior of the system
3.1) Control of parameters which govern allocation
and use of tape drives
-
possibly specification of preference
or affinity between certain access modes, users or groups, and certain
subsets or classes of physical tape drives
3.2) Control of parameters which govern how files
are written to tape
-
specification of "groupings" or File
Families for files
-
specification of "width" for a grouping,
i.e. the number of tapes allowed to be written in parallel
-
possibly specification of a list of
files to be treated logically as one 'work unit'
-
specification of "append to tapes"
policy
-
specification of file wrapper format
-
association of tape volumes with a particular
file family and tape library
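One possible reading of the "width" parameter is sketched below, assuming that width caps the number of volumes of a file family that may be written at the same time. The function name and its arguments are hypothetical and serve only to make the rule concrete.

```python
from typing import Iterable, Optional, Set


def pick_volume(family_volumes: Iterable[str],
                busy: Set[str],
                width: int) -> Optional[str]:
    """Return a volume of the file family to write to, or None.

    `busy` holds the labels of volumes currently being written.
    No new volume is handed out once `width` volumes of the family
    are already busy.
    """
    volumes = list(family_volumes)
    active = [v for v in volumes if v in busy]
    if len(active) >= width:
        return None  # the family is already at its allowed width
    for v in volumes:
        if v not in busy:
            return v
    return None  # every volume of the family is busy
```

With a width of 2 and one volume busy, a second volume is returned; with two volumes busy, the request has to wait.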
3.3) Control of parameters which govern how files
are read from tape
-
specification of error/retry behavior
3.4) Control of parameters which govern access
to files and volumes
-
access control based on user/group
for each file and each file family
3.5) Control of parameters which govern network
routing between storage system movers and client machines
-
ability to choose optimal path to load
balance in the case of multiple network interfaces on a single machine
3.6) Ability to set defaults for many/most of
the above parameters
-
storing of default values to be used
for all transfers/work done for
-
a particular user/group
-
a particular file family
-
a particular storage location
-
possibly others
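The layering of defaults in 3.6 could be resolved as in the sketch below. The order of precedence (request, then user/group, then file family, then storage location, then system-wide) is an assumption made for illustration, as are the function name and the parameter names used in the example.

```python
from collections import ChainMap
from typing import Any, Mapping


def effective_settings(system: Mapping[str, Any],
                       location: Mapping[str, Any],
                       family: Mapping[str, Any],
                       user: Mapping[str, Any],
                       request: Mapping[str, Any]) -> ChainMap:
    """Combine the default layers into one lookup.

    ChainMap searches its maps from left to right, so a value given
    on the request overrides the user/group default, which overrides
    the file family default, and so on down to the system default.
    """
    return ChainMap(dict(request), dict(user), dict(family),
                    dict(location), dict(system))
```

For example, a user default of `priority = 5` would take precedence over a storage location default of `priority = 2`, while parameters the user has not set fall through to the lower layers.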
4) Management of the robot resources (including error
recovery and tracking)
-
Maintenance of a queue of work to do
in case of excess demand on the robot or on the tape drives
-
Ability to specify policies governing
the ordering and manipulation of that queue of work, and therefore the
delay seen by the user, including (but not limited to)
-
specification of a priority for all
work requested
-
specification of a priority increment
and delta time in order to implement a priority boost/aging algorithm (or
equivalent mechanisms)
-
specification of policy for dismounting
of tapes after work completed
-
other possible parameters to be decided
based on tests/tuning of system
-
Cleanup of work queue in case of errors
and cancelled requesting processes
-
Allocation of tape drives to units
of work in the queue
-
Retry of failed file reads/writes
up to a specified maximum
-
Repeat attempts at failed work using
alternate tape drive resources
-
Notation and tracking of all work done,
all errors encountered, all retries performed
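The priority increment and delta time mentioned above suggest an aging rule of roughly the following shape. This is only a sketch of one such rule: the `WorkItem` type, the function names and the tie-breaking choice are assumptions, and an equivalent mechanism would serve just as well.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WorkItem:
    name: str
    base_priority: int
    submitted_at: float  # submission time in seconds


def effective_priority(item: WorkItem, now: float,
                       increment: int, delta_time: float) -> int:
    """Base priority plus `increment` for every full `delta_time` waited."""
    waited = max(0.0, now - item.submitted_at)
    return item.base_priority + increment * int(waited // delta_time)


def next_item(queue: Sequence[WorkItem], now: float,
              increment: int, delta_time: float) -> WorkItem:
    # Highest effective priority first; ties go to the oldest request.
    return max(
        queue,
        key=lambda w: (effective_priority(w, now, increment, delta_time),
                       -w.submitted_at),
    )
```

With an increment of 1 and a delta time of 600 seconds, a low-priority request that has waited long enough eventually overtakes a newer high-priority one, which bounds the delay seen by any user.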
5) Movement of Files
between the user's machine/local disk and tape in the robot
-
ability
to transfer files between any network-connected machine and a tape drive
in the robot
-
ability
to transfer files reliably and with error detection, and correction by
retry
-
ability
to transfer files at more than N% of raw tape bandwidth, where N is probably about 50
-
nothing
done to exclude the possibility of adding an intermediate disk cache layer
to rate-adjust movement of data from tape drive to end-user data sink,
should that become necessary
-
nothing
done to exclude the possibility of cooperation with the data access layer
as a distributed disk cache of recently requested data
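"Error detection, and correction by retry" could look like the following sketch. It assumes a hypothetical `send` callable that moves the data and returns what actually arrived, and it uses a SHA-256 checksum for detection; both are illustrative choices, not a statement of how the mover would work.

```python
import hashlib
from typing import Callable


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def transfer_with_retry(data: bytes,
                        send: Callable[[bytes], bytes],
                        max_attempts: int = 3) -> int:
    """Send `data`, verify it by checksum, and retry on a mismatch.

    Returns the number of attempts used. Raises IOError once
    `max_attempts` attempts have all failed verification.
    """
    expected = checksum(data)
    for attempt in range(1, max_attempts + 1):
        received = send(data)
        if checksum(received) == expected:
            return attempt
    raise IOError(f"transfer failed after {max_attempts} attempts")
```

A transfer that is corrupted once and then succeeds returns 2; one that never verifies raises an error after the configured maximum.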
6) Operational procedures to run and manage the robot,
tape drives and "databases"
6.1) Robot and Tape Drive Hardware
-
Well-defined and safe procedures for
dealing with repair and maintenance of the robot itself
-
Procedures for monitoring the status
of tape drives and for replacing faulty drives with drives which
have been checked and tested through another well-defined process.
6.2) Operator procedures for import/export of batches
of tapes
-
Interface between Storage Management
system and Operator work/console system
-
Definition of policies for executing
batch imports/exports of tape volumes
6.3) Quality assurance procedures to assure integrity
of the data and metadata
-
Routine backup of "database"
-
Maintenance of a water-tight redo log
of all transactions, to be used in case of errors
-
Recovery procedures in place, tested
and executed in case of failures
-
Ability to recover files and data on
tape in case of complete and catastrophic loss of all "databases"
-
Routine checks on readability of a sample
of tapes, with maintenance of statistics
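The combination of routine backup and a redo log implies a recovery step along the following lines: restore the last backup, then replay every transaction logged since. The sketch assumes a hypothetical log format of `(operation, file name, location)` tuples with the operations "write" and "delete"; the real transaction set would be richer.

```python
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[str, int]  # (volume label, position within the volume)
LogEntry = Tuple[str, str, Optional[Location]]


def replay(backup: Dict[str, Location],
           redo_log: Iterable[LogEntry]) -> Dict[str, Location]:
    """Rebuild the file "database" from a backup plus the redo log."""
    catalog = dict(backup)  # copy so the backup itself is untouched
    for op, name, location in redo_log:
        if op == "write":
            catalog[name] = location
        elif op == "delete":
            catalog.pop(name, None)
        else:
            # An unknown entry means the log cannot be trusted as-is.
            raise ValueError(f"unknown redo-log operation: {op!r}")
    return catalog
```

Recovery procedures of this kind are only useful if they are exercised regularly against real backups, which is the point of the "tested and executed" requirement above.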