Adding user data to the SAM system is straightforward, but it requires
easily enforced rules and well-understood procedures if the data is to be
useful and manageable and is not to interfere with the official
datasets. There are several problems associated with adding user data to
the system. Unless the user is trusted, there is no way of knowing that
what the user is putting in is really what is described by the metadata.
Users tend to do processing steps which are not officially sanctioned by
the collaboration, like using untested versions of applications,
or unofficial input parameters, et cetera. User data needs to be
directed to particular output tapes in such a way that it steps on
neither official data nor other users' data. Users can have voracious
appetites for storage, especially if it is easy to use and apparently free.
Users tend not to be very cognizant of issues like file size and can fill
up tapes with thousands of tiny files that are painful for the storage
system to store and retrieve. Users may not employ "standardized" file
naming conventions, so it is important that the metadata they provide has
meaning, and in any case the names must be unique. In order for the data
to be useful for others,
it is important to maintain a complete history of the processing chain
in SAM, and this may be difficult to enforce. Finally, cleaning up obsolete
user data will be very difficult, especially if it is not organized well
in the beginning.
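
Two of the problems above, tiny files and non-unique file names, lend
themselves to an automated check before anything is written to tape. The
following is only a minimal sketch of such a check, not part of SAM: the
function name, the size threshold, and the idea of passing in the set of
already registered names are all assumptions made for illustration.

```python
import os

# Hypothetical threshold; a real minimum would be set by collaboration policy.
MIN_FILE_SIZE_BYTES = 100 * 1024 * 1024


def check_user_files(paths, registered_names, min_size=MIN_FILE_SIZE_BYTES):
    """Return (path, reason) pairs for files that should not be stored as-is.

    registered_names is assumed to be the set of file names already known
    to the catalog; how that set is obtained is outside this sketch.
    """
    problems = []
    seen = set(registered_names)
    for path in paths:
        name = os.path.basename(path)
        # A name clashes if it is already registered or repeated in this batch.
        if name in seen:
            problems.append((path, "file name is not unique"))
        seen.add(name)
        # Flag files small enough to be painful for the storage system.
        if os.path.getsize(path) < min_size:
            problems.append((path, "file is smaller than the minimum size"))
    return problems
```

A check of this kind only addresses size and naming. It cannot verify that
the file contents match the metadata, which remains a matter of trust.
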

Procedures for adding official data to SAM have been in place for many
years and are well understood. They cover two basic kinds of data:
(1) import and (2) project. Import data refers to data moved into the SAM
system that was created outside of the control of SAM, that is, without
using SAM analysis projects. An example of this is a Monte Carlo file that
has been created at a remote processing center without using a SAM station
and with no SAM protocol. Each data file to be imported is accompanied by a
description (metadata) file and a parameter file further describing its
contents. Data that has been processed within a SAM station, in a project,
already has process and project information in the SAM system, and its
description files are slightly more streamlined, since much of the metadata
regarding the application and other vital information has already been
entered into SAM.

A special category of file has been created for storing online information
like epics, luminosity and other files. This has a data type called "online_archive".
This is convenient because the user has complete freedom to store data
as needed and to tar groups of files or directories together before storing
them. However, we fear that a similar "offline_archive" definition
might be abused, as every manner of data would be stored, including backups
of project areas, temp areas, theses, downloads from the internet,
family photo albums, et cetera. Of course, it is possible to abuse any
of the user storage mechanisms in this way; it is just not quite as
convenient.
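
The practice of tarring groups of files or directories together before
storing them can be sketched with Python's standard tarfile module. This is
an illustration only and not a SAM tool: the function name and its arguments
are hypothetical, and the step that would declare the resulting archive to
SAM is deliberately left out.

```python
import os
import tarfile


def bundle_directory(source_dir, archive_path):
    """Pack every file under source_dir into one uncompressed tar archive.

    archive_path should lie outside source_dir, otherwise the archive
    would try to include itself. Returns the number of files added.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to source_dir so the archive unpacks cleanly.
                tar.add(full, arcname=os.path.relpath(full, source_dir))
                count += 1
    return count
```

Bundling in this way turns many small files into a single larger one, which
is easier on the storage system, but it also hides the individual files from
the metadata, so the description of the archive has to carry that
information.
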
In order to address these issues, the following policies need to
be established: