When you store a file with SAM, you also store metadata with it so that you can find it later. A filename must be unique across all of SAM, for all time, so a generic name such as "test" will very likely produce an error because someone has already used it. Choose a filename that is unique, usually by appending the Unix timestamp and your station name or location. Never search for files by matching filenames against wildcards: that is what the metadata are for.
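One way to follow this advice is to build the name from a base, a station name and the current Unix timestamp. The sketch below is a hypothetical helper, not part of SAM; the `<base>-<station>-<timestamp><ext>` pattern and the station name `cdf-caf` are assumptions chosen for illustration.

```python
import time
from typing import Optional


def make_unique_filename(base: str, station: str, ext: str = ".root",
                         timestamp: Optional[int] = None) -> str:
    """Append a station name and a Unix timestamp to a base filename.

    The '<base>-<station>-<timestamp><ext>' pattern is a hypothetical
    convention for illustration; SAM only requires that the name be unique.
    """
    if timestamp is None:
        timestamp = int(time.time())  # whole seconds since the Unix epoch
    # Spaces and slashes would make awkward filenames, so replace them.
    safe_station = station.replace(" ", "_").replace("/", "_")
    return f"{base}-{safe_station}-{timestamp}{ext}"


# Example with a fixed timestamp so the result is reproducible.
print(make_unique_filename("mytest", "cdf-caf", timestamp=1068150600))
# prints mytest-cdf-caf-1068150600.root
```

Two files written by the same station within the same second would still collide; adding a job or process number to the name is one way around that.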
The minimal metadata for a file contain the program name and version, the number of events, the time the file was produced, your name, where the file was produced, the type of run, the group, the stream, and some descriptive text, which may be the same for every file in your private dataset. You can also include a reference to a web page that holds additional information about the dataset.
An example is shown below. Although it says "Monte Carlo" and "generator", this is only a temporary kludge that you must use for all files: use the same values even for a real data file or an ntuple. They are left here to illustrate that the parameter values can be anything. More documentation about metadata can be found at: http://cdfdb-prd.fnal.gov/sam_user_pyapi/examples/
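```python
# Minimal metadata example for the SAM Python API.
from import_classes import *

# Arguments match AppFamily, AppVersion and AppName in the 'Generated' block.
appfamily = AppFamily('generator', '1.00', 'generator')

# A unique filename, as recommended above.
filename = 'rs-1ev-test-031106-1910.root'

t = SAMMCFile(
    filename,
    Events(1, 2, 2),  # event values, matching those in the 'Generated' block
    "generated",
    appfamily,
    "01/21/2003 10:59:09",
    "01/21/2003 11:20:08",
    18,
    {
        # Parameters common to all files: who, where, run type, group, stream.
        'Global': {
            'ProducedByName': 'mrenna',
            'OriginName': 'fermilab',
            'Phase': 'unspecified',
            'FacilityName': 'fixed-target-farm',
            'ProducedForName': 'mrenna',
            'RunType': 'Monte Carlo',
            'GroupName': 'cdf',
            'Stream': 'm',
            'Description': 'test mc',
        },
        # CDF-specific: the dataset name and a web page describing it.
        'CDF': {
            'DataSet': 'stink2',
            'html': 'http://cepa.fnal.gov/personal/mrenna/',
        },
        # Bookkeeping for the generator job.
        'Generated': {
            'AppFamily': 'generator',
            'FirstEvent': '1',
            'AppVersion': '1.00',
            'LastEvent': '2',
            'NumRecords': '2',
            'AppName': 'generator',
            'TotalEvents': '2',
            'RunNumber': 54321,
        },
    },
)
```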
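Before declaring a file, it can help to check that the 'Global' block carries the minimal fields listed earlier. The sketch below is a hypothetical helper, not part of the SAM Python API. The mapping from the prose (your name, where the file was produced, run type, group, stream, description) to the key names is an assumption based on the example; the program name, version, event count and times are passed as separate arguments, so they are not checked here.

```python
from typing import Dict, List

# Keys from the example's 'Global' block that correspond to the minimal
# fields listed earlier (an assumed mapping, not an official schema).
MINIMAL_GLOBAL_KEYS = (
    "ProducedByName",
    "OriginName",
    "FacilityName",
    "RunType",
    "GroupName",
    "Stream",
    "Description",
)


def missing_global_keys(params: Dict[str, Dict[str, object]]) -> List[str]:
    """Return the minimal 'Global' keys that are absent or blank."""
    global_block = params.get("Global", {})
    return [key for key in MINIMAL_GLOBAL_KEYS
            if not str(global_block.get(key, "")).strip()]


params = {
    "Global": {
        "ProducedByName": "mrenna",
        "OriginName": "fermilab",
        "FacilityName": "fixed-target-farm",
        "RunType": "Monte Carlo",
        "GroupName": "cdf",
        "Stream": "m",
        "Description": "",  # left blank on purpose
    },
}
print(missing_global_keys(params))  # prints ['Description']
```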
A fuller version of the same example also declares parameter blocks for individual generators (Pythia, Herwig, Alpgen and Madgraph):

```python
# Fuller example: the same file, with parameter blocks for several generators.
from import_classes import *

appfamily = AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'

t = SAMMCFile(
    filename,
    Events(1, 2, 2),
    "generated",
    appfamily,
    "01/21/2003 10:59:09",
    "01/21/2003 11:20:08",
    18,
    {
        'Global': {
            'ProducedByName': 'mrenna',
            'OriginName': 'fermilab',
            'Phase': 'unspecified',
            'FacilityName': 'fixed-target-farm',
            'ProducedForName': 'mrenna',
            'RunType': 'Monte Carlo',
            'GroupName': 'cdf',
            'Stream': 'm',
            'Description': 'test mc',
        },
        'CDF': {
            'DataSet': 'stink2',
            'html': 'http://cepa.fnal.gov/personal/mrenna/',
        },
        # Generator-specific blocks; the 'test...' values are placeholders.
        'Pythia': {
            'cdfrelease': 'testpycdfrelease',
            'collider': 'testpycollider',
            'comments': 'testpycomments',
            'decaytable': 'testpydecaytable',
            'energy': 'testpyenergy',
            'et_jet_cut': 'testpyet_jet_cut',
            'fact_scale': 'testpyfact_scale',
            'lamqcd5': 'testpylamqcd5',
            'numrecords': 'testpynumrecords',
            'partons': 'testpypartons',
            'pdf': 'testpypdf',
            'physicsprocess': 'testpyphysicsprocess',
            'picobarns': 'testpypicobarns',
            'qcd_order': 'testpyqcd_order',
            'qcd_power': 'testpyqcd_power',
            'qed_order': 'testpyqed_order',
            'qed_power': 'testpyqed_power',
            'ranseed1': 'testpyranseed1',
            'ranseed2': 'testpyranseed2',
            'renorm_scale': 'testpyrenorm_scale',
            'runnumber': 'testpyrunnumber',
            'useevtgen': 'testpyuseevtgen',
            'useqq': 'testpyuseqq',
            'validated': 'testpyvalidated',
            'version': 'testpyversion',
            'webpage': 'testpywebpage',
        },
        'Herwig': {
            'cdfrelease': 'testhercdfrelease',
            'collider': 'testhercollider',
            'comments': 'testhercomments',
            'decaytable': 'testherdecaytable',
            'energy': 'testherenergy',
            'et_jet_cut': 'testheret_jet_cut',
            'fact_scale': 'testherfact_scale',
            'lamqcd5': 'testherlamqcd5',
            'numrecords': 'testhernumrecords',
            'partons': 'testherpartons',
            'pdf': 'testherpdf',
            'physicsprocess': 'testherphysicsprocess',
            'picobarns': 'testherpicobarns',
            'qcd_order': 'testherqcd_order',
            'qcd_power': 'testherqcd_power',
            'qed_order': 'testherqed_order',
            'qed_power': 'testherqed_power',
            'ranseed1': 'testherranseed1',
            'ranseed2': 'testherranseed2',
            'renorm_scale': 'testherrenorm_scale',
            'runnumber': 'testherrunnumber',
            'validated': 'testhervalidated',
            'version': 'testherversion',
            'webpage': 'testherwebpage',
        },
        'Alpgen': {
            'collider': 'testalpcollider',
            'comments': 'testalpcomments',
            'dr_jj_cut': 'testalpdr_jj_cut',
            'dr_lj_cut': 'testalpdr_lj_cut',
            'energy': 'testalpenergy',
            'et_jet_cut': 'testalpet_jet_cut',
            'et_lep_cut': 'testalpet_lep_cut',
            'fact_scale': 'testalpfact_scale',
            'lamqcd5': 'testalplamqcd5',
            'll_mass_cut': 'testalpll_mass_cut',
            'numrecords': 'testalpnumrecords',
            'partons': 'testalppartons',
            'pdf': 'testalppdf',
            'physicsprocess': 'testalpphysicsprocess',
            'picobarns': 'testalppicobarns',
            'qcd_order': 'testalpqcd_order',
            'qcd_power': 'testalpqcd_power',
            'qed_order': 'testalpqed_order',
            'qed_power': 'testalpqed_power',
            'ranseed1': 'testalpranseed1',
            'ranseed2': 'testalpranseed2',
            'renorm_scale': 'testalprenorm_scale',
            'runnumber': 'testalprunnumber',
            'validated': 'testalpvalidated',
            'version': 'testalpversion',
            'webpage': 'testalpwebpage',
            'weight': 'testalpweight',
        },
        'Madgraph': {
            'collider': 'testmadcollider',
            'comments': 'testmadcomments',
            'dr_jj_cut': 'testmaddr_jj_cut',
            'dr_lj_cut': 'testmaddr_lj_cut',
            'energy': 'testmadenergy',
            'et_jet_cut': 'testmadet_jet_cut',
            'et_lep_cut': 'testmadet_lep_cut',
            'fact_scale': 'testmadfact_scale',
            'lamqcd5': 'testmadlamqcd5',
            'll_mass_cut': 'testmadll_mass_cut',
            'numrecords': 'testmadnumrecords',
            'partons': 'testmadpartons',
            'pdf': 'testmadpdf',
            'physicsprocess': 'testmadphysicsprocess',
            'picobarns': 'testmadpicobarns',
            'qcd_order': 'testmadqcd_order',
            'qcd_power': 'testmadqcd_power',
            'qed_order': 'testmadqed_order',
            'qed_power': 'testmadqed_power',
            'ranseed1': 'testmadranseed1',
            'ranseed2': 'testmadranseed2',
            'renorm_scale': 'testmadrenorm_scale',
            'runnumber': 'testmadrunnumber',
            'validated': 'testmadvalidated',
            'version': 'testmadversion',
            'webpage': 'testmadwebpage',
            'weight': 'testmadweight',
        },
        # Bookkeeping for the generator job.
        'Generated': {
            'AppFamily': 'generator',
            'FirstEvent': '1',
            'AppVersion': '1.00',
            'LastEvent': '2',
            'NumRecords': '2',
            'AppName': 'generator',
            'TotalEvents': '2',
            'RunNumber': 54321,
        },
    },
)
```
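The values in the 'Generated' block repeat the AppFamily arguments and the event numbers, so a small helper can build the block from a single set of values and keep them from disagreeing. This is a hypothetical helper, not part of the SAM Python API. It assumes a contiguous event range (TotalEvents = LastEvent - FirstEvent + 1) and, as in both examples, sets NumRecords equal to TotalEvents.

```python
from typing import Dict, Union


def generated_block(app_family: str, app_version: str, app_name: str,
                    first_event: int, last_event: int,
                    run_number: int) -> Dict[str, Union[str, int]]:
    """Build a 'Generated' metadata block shaped like the examples above.

    Assumes a contiguous event range and NumRecords == TotalEvents.
    """
    if last_event < first_event:
        raise ValueError("last_event must not precede first_event")
    total = last_event - first_event + 1
    return {
        "AppFamily": app_family,
        "AppVersion": app_version,
        "AppName": app_name,
        # The examples store these counts as strings...
        "FirstEvent": str(first_event),
        "LastEvent": str(last_event),
        "NumRecords": str(total),
        "TotalEvents": str(total),
        # ...but the run number as an integer.
        "RunNumber": run_number,
    }


print(generated_block("generator", "1.00", "generator", 1, 2, 54321)["TotalEvents"])
# prints 2
```

Passing the same `app_family`, `app_version` and `app_name` variables to `AppFamily(...)` keeps the constructor arguments and the block in step.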
Once you have stored metadata for a file, you can retrieve it with the command:
sam get metadata --file=<myfile>
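A minimal sketch of running the same command from a Python script, assuming the `sam` client is installed, set up, and on your PATH. The helper names are hypothetical; only the command line itself comes from this page.

```python
import subprocess
from typing import List


def get_metadata_command(filename: str) -> List[str]:
    """Argument list for 'sam get metadata --file=<filename>'."""
    return ["sam", "get", "metadata", f"--file={filename}"]


def get_metadata(filename: str) -> str:
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    and FileNotFoundError if 'sam' is not on PATH.
    """
    result = subprocess.run(get_metadata_command(filename),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argument list rather than a single shell string avoids quoting problems if a filename ever contains unusual characters.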
A CDF dataset corresponds to what is now more commonly called (especially in Grid computing) a "data collection": a group of files that share common properties. In our implementation of SAM for CDF we have kept this as a metadata parameter, although more sophisticated ways of handling collections are still being worked out.
A SAM dataset definition is a selection of files that meet criteria expressed through the parameters declared in their metadata.
A very simple way through the morass is to define a SAM dataset using only the parameter cdf.dataset, and that is the end of the story. Combining several parameters requires care to ensure that they select exactly the intended collection of files. Tools exist that let you inspect the metadata of individual files, but doing this is indeed a complex operation.
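The difference between selecting on the dataset parameter alone and combining other parameters can be illustrated in plain Python. This is only a model of the idea: it is not SAM's query language, and the catalogue contents (including the run type "physics" and the dataset "other") are invented. Keys are written as 'Block.Key', following the block names in the metadata examples.

```python
from typing import Any, Dict, List

Metadata = Dict[str, Dict[str, Any]]

# Toy catalogue with invented values: filename -> declared metadata blocks.
CATALOGUE: Dict[str, Metadata] = {
    "a.root": {"CDF": {"DataSet": "stink2"}, "Global": {"RunType": "Monte Carlo"}},
    "b.root": {"CDF": {"DataSet": "stink2"}, "Global": {"RunType": "physics"}},
    "c.root": {"CDF": {"DataSet": "other"}, "Global": {"RunType": "Monte Carlo"}},
}


def matches(metadata: Metadata, criteria: Dict[str, Any]) -> bool:
    """True if every 'Block.Key' criterion equals the declared value."""
    for dotted_key, wanted in criteria.items():
        block, key = dotted_key.split(".", 1)
        if metadata.get(block, {}).get(key) != wanted:
            return False
    return True


def select(catalogue: Dict[str, Metadata], criteria: Dict[str, Any]) -> List[str]:
    """Sorted names of the files whose metadata match all criteria."""
    return sorted(name for name, meta in catalogue.items()
                  if matches(meta, criteria))


# Selecting on the dataset parameter alone picks out one collection.
print(select(CATALOGUE, {"CDF.DataSet": "stink2"}))          # ['a.root', 'b.root']
# A criterion that omits the dataset also picks up a file from another collection.
print(select(CATALOGUE, {"Global.RunType": "Monte Carlo"}))  # ['a.root', 'c.root']
```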
Once a dataset definition has been made, it can be used to specify the files to be delivered to a project. After the delivery, SAM keeps a permanent record of the project that was run, so it is always possible to go back and find out which files were used. Within SAM this record is called a "dataset" or a "project snapshot".
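The distinction between a definition, which is re-evaluated against whatever files have been declared, and the fixed record of the files a project actually received can be modelled in a few lines. This is a toy illustration under assumed names (`evaluate`, `take_snapshot` and the catalogue contents are hypothetical), not how SAM stores snapshots internally.

```python
from typing import Dict, List, Tuple


def evaluate(catalogue: Dict[str, str], dataset: str) -> List[str]:
    """Apply a definition ('files whose dataset is X') to the current catalogue."""
    return sorted(name for name, ds in catalogue.items() if ds == dataset)


def take_snapshot(catalogue: Dict[str, str], dataset: str) -> Tuple[str, ...]:
    """Record, as an immutable tuple, the files the definition selects right now."""
    return tuple(evaluate(catalogue, dataset))


# Hypothetical catalogue mapping filename -> dataset name.
catalogue = {"a.root": "stink2", "c.root": "other"}
snapshot = take_snapshot(catalogue, "stink2")   # files delivered to a project

catalogue["b.root"] = "stink2"                  # a file declared later
print(evaluate(catalogue, "stink2"))  # ['a.root', 'b.root']: the definition sees it
print(snapshot)                       # ('a.root',): the record does not change
```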
When a dataset definition is made, it is possible to freeze it immediately with "sam create dataset". Once this is done, the definition cannot be changed. This is useful if you want to make sure that your definition is not modified, for example by someone else!