(If you are going to edit this file please be aware of the notes).
| Category of Task | WBS# | Task | SAM packages to be worked on | SAM Release | Estimate of amount of person weeks of work from 01/15/2001 | Comments | Who? | Date task done |
| Planned Major Features | 10.2 | Integration of SAM and batch system | station user |
V3 | 4 weeks | In progress | Igor | |
|   | 12.1 | Resource management of disks | station | V4 or V5 |   |   |   | |
|   | 3.12 | Design of file merge/split/streams/luminosity/trigger and how they all fit together | db | V3 | 5 1 hour meetings in next 3 weeks
+ 5 hours * 5 people thinking;
(+ 2 weeks implementation see 10.11) |
Must get the conceptual basis for file/run/stream/triggers/applications correct before we start the run. Incorporate needs from 22.10 | Matt, Julie, Lee, Vicky, Heidi, Igor |
|
|   | 10.3 | Resource management - Optimizer crude version to order files in groups according to order on tape (also requires data in db to be fixed to store order on tape) | station | V3 | 2 weeks | List. Also requires small changes to fss and db server to store tape location cookie (and to fix tape type). Depends on script to do initial load of location cookies. | Sinisa
(Matt) |
|
|   | 12.2 | Resource management for tape and network resource control and allocation | station user |
V4 or V5 |   | May require changes to sam submit | Sinisa, Igor | |
| 12.2 | Use of job and resource description language for Grid | station
user |
V5 | 4 months | Integration with work done in context of PPDG | Igor,
TBD | ||
|   | 10.4 | Sam_manager - fix to use new fss | manager | V3 | 10 days | work list | Sinisa | |
|   | ? | Sam_manager - further enhancements including ability to ping client for status | manager | ? | Few days | Depends on D0 interactive framework using CORBA | Sinisa | |
|   | 12.3 | Handling of pass-through write cache - using allocation of cache space by station and better routing info in db. | station
db db_server |
V5 | ~ 2+ weeks | Requires supporting db and db server changes | Igor, Steve |
|
|   | 12.4 | Pass-through of file through intermediate station cache to external station - on read of file | station
db db_server |
V5 |   | Will survive without this only because all raw and reconstructed data will be on d0mino cache and therefore accessible from there. If we can use bbftp and somewhat control the network interface used then that will also alleviate the need for WBS |   | |
|   | 11.1 or 12.4 | Pick Events Server | station | V4 or V5 | 2 months |   |   | |
|   | 10.5 | Sam_user command for pickevents - to locate file | user | V3 | 3 days | Wrapper for query only returns file
or files that event(s) are in
Sam -pickevents -evnum= Sam -pickevents -evlist=filname |
Lauri | |
|   | 11.2 or 12.5 | Pick Events access (framework and other user access) | user
manager |
V4 or V5 | 1 month |   |   | |
|   | 10.6 | Station for Farm system - remove stationless mode | station | V3.1 | 2-3 weeks, combined with
10.6 |
Need to work with Farms group to figure out how to deal with disk cache and combine this work with enhancements needed for linux analysis clusters | Igor | |
|   | 10.7 | Initial Station for multi-node linux analysis system | station
user admin |
V3.1 | 2-3 weeks, combined with 10.5 | Move data intra-station always and don't attempt much optimization in terms of where a job runs | Igor | |
|   | 11.3 | Full station for multi-node linux analysis system - including folding in resource management of taking data to job or job to data and also looking at migration of jobs, and collection of results from parallelized analysis job. | station
user admin |
V4 | Many weeks | Combine with resource management and more general studies for the future of how all this works | Igor | |
|   | 10.8 | Design of syntax for remote storage location and implementation of interface to different file transfer programs. Need to identify what MSS file is in, decide on namespaces, also way of identifying transfer tool to be used. Release for bbftp, other? Document | station
db db_server |
V3 | 2 weeks | This work is in progress with Lyon.
For use depends on bbftp infrastructure in xxxx.
Not clear if it needs any changes to db or db server - merely conventions on names? |
Igor, Lyon people |
|
|   | 11.4 | New Standard Information services
prototype |
infoserver | V4 | 1 month | Demonstrate initial prototype of new infoserver and framework | Sinisa | |
| 12.7 | Design and implement distributed information services and integrate into all components of SAM | infoserver
all |
V5 | 3 months | Full information about whole system | Sinisa/Lauri | ||
|   |   | Export of database (meta-data) to remote institutions + re-synch of metadata |   | V7+ |   | only do this if really needed |   | |
|   |   | Prompt reconstruction pipeline - ? open-ended datasets? |   | V7+ |   |   |   | |
Online: Development + Testing and Operations |
Plan, Feb 01 | Implement logging of data from online twice - to mass store and then directly to d0mino private disk area or to 2 tapes in parallel. |   | V3 | 2-3 weeks | Will use rcp for local transfer plus changes to online to use -copy=2, setup of autodestination map and testing | Lee,ODS D0online |
|
|   | 23.8
and Online |
Event catalog - partitioned and filled by online with unique Run/Event number | db | V3 | Few days | Need to drop events table, add constraints
for uniqueness and partition by something simple - like a manually controlled
Event Partition number ??
Online need to exercise this and ensure that the new constraints on uniqueness of run+ event number do not break something! |
Julie, Diana, Lee |
|
|   | 22.10 | Upload of Run and Run conditions information from online | user
db_server |
V3 | 3 days | Any needed changes to run table must be implemented in 3.12. Must be tested in good and error/delay conditions. | Jeremy, Carmenita |
|
| Tests with database server down |
db_server |
V3 | 2 days | Julie Carmenita, ODS D0 online |
||||
|   | 1.3 | Python V2 for Online; | user
db_server |
V3 | ? | . | ODS D0 Online | |
|   | Run with 5 streams | V3 | 1day | ODS D0 online | ||||
|   | Run with 15 streams | ODS D0 online | ||||||
| Run at expected Rate | ||||||||
Small Developments |
11.4 | Thumbnail data design, file format, access strategy and implementation of any indicated changes to sam_manager or sam databases | db
manager |
V4 | ??? |   |   | |
|   | 11.5
12.6 |
ROOT objects and file formats - interface to ROOT? | ? | V4 or V5 |   | D0 version of ROOT that gets its data from sam? What about ROOT trees that span a set of files. |   | |
|   |   |   |   |   |   |   |   | |
| Bugs and potential bugs in SAM | 10.1.1 | Db server - nail all unknown exceptions | db_server | V3 |   |   | Steve | |
|   | 10.1.2 | Sam_bootstrap -claim that stagers don't always restart on Farms | bootstrap | V3 |   | Need to reproduce this or catch it somehow before it can be investigated | Lauri | False claim - sam bootstrap improperly used |
|   | 10.1.3 | Zombie projects -deal with/eliminate? | station | V3 |   | Station cleanup code | Igor | |
|   | 10.1.4 | Work around the fnorb enum bug if it cannot be fixed by a new version of fnorb | user
station |
V3 |   | Need to ping fnorb developers again | Carmenita | |
|   | 10.1.5 | Ensure all servers handle db server restart gracefully - online claimed fss did not one time | test
_harness |
V3 |   | Do explicit tests. Set test harness event to kill db server | Sinisa | |
|   | 10.1.6 | Error messages from sam_user must be fixed to give a meaningful error always |   | V3 |   | While working in sam_user - catch as much as poss. | Carmenita | |
|   | 10.1.7 | Impatient-end feature no longer working | station | V3 |   | Reported by Heidi on farms. | Igor | |
|   | 10.1.8 | The sam find file command gives the wrong instructions. Not a big deal
but not very user-friendly either. d0bbin> sam find file --filename=%psim01%ttbar
|
user | V3 |   |   | Carmenita | |
|   | 10.1.9 | Sam_user - store command looks for parameter file in wrong directory | user | V3 |   |   | Carmenita | Fixed |
|   |   |   |   |   |   |   |   | |
| Dataset/ Project-editor |
10.9 | Effort to ensure robustness and clarity | project_
editor |
V3 | Ongoing work | Fix bugs as they arise - asap | Matt | |
|   | 10.9.1 | On
the farms we often submit a job and then want to pick up the files that
have arrived since then. sam create dataset definition --dim="__set__ bigproject
and minus __snap__
|
project_ editor |
V3 |   | Requested by Heidi
This will be a very common mode of operation and we must ensure it works and is well documented |
Matt | |
|   | 10.9.2 | Manipulation of actual datasets rather than dataset definitions | project_ editor |
V3 |   | Mail on this from Igor - doesn't this actually work? | Matt | |
|   | 10.9.3 | On
the farms we often have an undifferentiated project with 800 files, we'd
like to split this up into smaller chunks. General users may wish to do so as well. The ability to do sam split project snapshot --num_files=200 which returns
a set of snapshots of
|
project_ editor |
V3 ? |   | Requested by Heidi | Matt | |
|   | 10.9.4 | Formal grammar for dataset definition language | project_ editor |
V3? | Few days |   | Matt | |
|   |   |   |   |   |   |   |   | |
| Sam_user enhancements |
11.6.1 | Samlock/unlock dataset | user | V4 |   |   | Steve | |
|   | 10.10.1 | Fully test modes of file_client needed for online where run info and process info are passed in, not run id or process id. | user | V3 |   |   | Carmenita | |
|   | 10.10.2 | Samstore - review command and metadata file format in general and deal with all file formats, module for Farms | user | V3 |   |   | Carmenita | |
|   |   | Sam store - add a switch to allow metadata only, no file store. Also perhaps designate in metadata file itself? | user | V3 or V4 |   | Done?? | Carmenita | |
|   | 10.10.3 | Review and restructure package as necessary and remove all junk unused files | user | V3 |   |   | Carmenita | |
|   | 11.6.2 | Sam command to give statistics on your analysis project - eventually to return luminosity also | user | V4 |   |   |   | |
|   | 10.10.4 | Use of standard command/parameters XML file to validate sam commands | user | V3 |   |   | Vicky | |
|   | 10.10.5 | Much improved test suite needed | user | V3 |   |   | Carmenita, Vicky |
|
|   | 10.10.6 | Python module for file_client needs documentation | user | V3 |   |   | Carmenita | |
|   | 10.10.7 | Different levels of printout of status block - improved formatting | user | V3 |   |   | Carmenita | |
|   |   | Review C++ code and see if it can go away. Compile all python code before distributing product | user | V3 or V4 |   |   | Carmenita | |
|   |   |   |   |   |   |   |   | |
| Database V3.0 | 10.11.1 | Filesplit/merge/trigger/luminosity interfaces | Db
Db_server User manager |
V3 | ~ 2 weeks |   | Julie, Matt |
|
|   | 11.7.1 | Data structures for disk and network pipe resources | db | V4 |   |   |   | |
|   | 11.7.2 | Data structures for routing of files through station caches | db | V4 |   |   |   | |
|   | 10.11.2 | Additional attributes on files | db
db_server user |
V3 |   | Implement what is needed | Julie, Matt | |
|   | 10.11.3 | MC production tables get into use | db
db_server |
V3 |   | Greg's code + ? | Julie, Matt, ?? |
|
| 10.11.4 | Data structures for batch integration/resource benefits/weights | db
db_server |
V3 | Matt | ||||
|   | 10.11.5 | Partitioning of several tables - ready for run | db | V3 |   | Have to give it our best guess to start with | Julie, diana |
|
|   | 10.11.6 | Drop Events table and partition prior to putting event cataloging into production use. | db | V3 |   |   | diana | Done - may be done again?? |
| Sam_admin tools and Diagnostics pages |
  | MC import file store scripts - improve and generalize for use offsite | admin | ? |   | Lee thinks these are too specific for d0mino and not generalizable. Now it is more verification. So users will have to write their own sam store scripts |   | |
|   | 10.12.1 | Commands to add some of the supporting data - such as new application version, instead of Forms interface | admin | V3 | 2 days |   | Lauri | |
|   | 10.12.2. | Statistics on disk cache usage | admin
db_server? |
V3 | 3 days to 3 weeks depending on what we want to see |   | Lauri | |
|   | 10.12.3 | Tool to update file entries in database with their sortable location cookie on tape (required for Optimizer) | admin | V3 | 1 day |   | Sinisa | |
|   | 10.12.4 | Scripts to find all files that are in SAM and not in Enstore | admin | V3 | 2 days | We have these scripts in test harness -> admin | Sinisa, Dehong |
|
|   | 10.12.5 | Tool and cron job to run to synchronize Tape status between Enstore and SAM and to create reports on actions taken- web page on bad tapes, noaccess tapes. | admin | V3 | 1 day |   | Sinisa, Lauri |
|
|   | 10.12.6 | Mark volumes as NOACCESS or NOTALLOWED or REMOVED (for when WE recycle or lose a volume) | admin
db_server |
V3 | 1 week | Db server needs to deal with and understand this | Lauri, Steve |
|
|   |   | Better way to view log files of a large number of db servers |   |   |   | -> in db server work | Steve, Lauri |
|
|   |   | Dump of infoserver statistics into individual web pages and line mode commands? | admin | ? |   | If we get time .... | Lauri, Sinisa |
|
|   |   | Easier way of locating and viewing the main SAM log and info file - from the
web and documentation of where the archived log files are - zipped or
whatever.
Automated process to zip and unzip archived log files |
admin | V4 | Few days |   | Lauri | |
|   |   | Make name server web page show actual alive servers - make sure all servers have base ping method in - cleanup web page periodically. | admin | V4 |   | Can we clean up naming service itself too? |   | |
|   | 10.12.7 | Enstore Statistics | admin | V3 | 2 weeks | Maintenance when encp/sam/stdout change | Sinisa | Done |
|   |   |   |   |   |   |   |   | |
| Sam_station servers enhancements |
10.8.1 | Use Enstore header/text file of errors (and their retry profile) in eworker | Station | V3 |   |   | Igor | |
|   |   | All servers to use new exceptions |   | V4 |   |   |   | |
|   |   | Think about how to implement a backup naming service and register all servers with both |   | V4 |   |   |   | |
|   | 10.13 | Sam_bootstrap - figure out how to designate different enstore system | bootstrap | V3 |   |   | Lauri | Done |
|   |   |   |   |   |   |   |   | |
| Db server enhancements |
10.14.1 | Reconnect to database if lose connection | db_server | V3 | 1 day |   | Steve | |
|   |   | Use new exceptions in all places | db_server | V4 |   | Can only do this partially in V3 - until all station servers use new exceptions | Steve | |
|   | 10.14.2 | Support multiple independent processes with independent conditions and a way to pass the connection params | db_server | V3.1 | 2 weeks | Figure out cleanup of processes | Steve | |
|   | 10.14.3 | Support different db servers for datasets/file queries and for other server transactions, and for online event catalog and make a proper framework for this - involves sam_bootstrap probably | db_server | V3.1 | 1 week | In general, in conjunction with 10.14.2 display multiple log files for multiple servers | Steve/Matt | |
|   | 10.14.4 | Support for `cookie' or equivalent for secure connections using user's own name and pw | db_server | V3 | 1 week |   | Steve | |
|   | 10.14.4 | Add above support to db server gen | db_server | V3 | 1 week |   | Steve/Lauri | |
|   | 10.14.5 | Load testing with db servers | db_server | V3 | 1 day |   | Steve | |
|   |   |   |   |   |   |   |   | |
| Infrastructure | 1.3 | Python V2 |   | V3.1 |   | Figure out what it means Looks like we will move forward. What are the tasks and how long will they take? | Maciej,Matt,
Steve,Carmenita |
|
|   | 1.3 | Bbftp |   | V3 | 2 weeks | Package servers and clients for linux, irix and osf1. Work with Lyon developers. Ensure that IP address
and port are configurable.
Put in kits with test package. |
Mike | |
|   | 1.3 | Fnorb - chase bug and get new version ? |   | V3 |   |   | Carmenita | |
|   | 1.3 | New version of orbacus ? |   | V4 |   | We are very behind! We need a cookbook for this - the kits product has only the libraries and executables in - no input files ? | Carmenita, Steve |
|
|   | 1.3 | Install updated LSF on d0test and d0mino |   | V3 |   | DONE by Dave Fagan |   | Still waiting for V4.2? licence on D0test |
|   | 19.6 | Get extra GB Ethernets installed on d0test |   | V3 |   | DONE |   | |
|   | 1.5.5 | Write up something about fnidl - what? Why? |   | V4 |   |   | Lauri | |
|   | 1.8.5 | Code reviews - - of all major servers and sam_user |   |   |   |   |   | |
|   | 1.5.6 | Get $ID in all files; | all | all |   |   | all | |
|   | 1.5.7 | Remove redundant files left over from sam_doc, sam_talks, sam split and in all packages where possible | all | all |   |   | all | |
|   | 1.9 | Figure out how users are to use helpdesk tracking interface. Separate lists of bugs/minor enhancements | ? | V3 |   | Operational issue + might need some work for usability. | Matt, Vicky, Lee, Lauri |
|
|   |   |   |   |   |   |   |   | |
| Test Harness and testing | 22.16 | Continued testing - and logging of results -work down our list and extend list | Test
_harness |
V3 |   | Lists of work for testing and for test harness | Dehong, Sinisa |
|
|   | 22.16 | Test harness that simulates online/Farm/central_analysis + 2 or more user
linux cluster all running with event input rate at > 40 Hz.
Most important is that we can get deterministic behavior and that we have enough statistics and measurements to a) know that what we see is what we expect, b) modify input params and observe changes |
Test
_harness |
V3 | 2 weeks | We should have this able to run continuously by now - even if not at full rate | Dehong, Sinisa |
|
|   | 22.16 | Stress testing of db server(s) | Test
_harness |
V3 | Need to understand effects and limitations - in cases of out-of-control queries. -> db server | Steve | ||
|   |   | Track down files getting rm'd from pnfs space and if necessary change file protections etc. to track down | Test
_harness |
V3 |   | Might require encp change? | Gerry | No evidence so far |
|   |   |   |   |   |   |   |   | |
| Sam_manager |   | Track any changes in servers for exception handling | manager | V4 |   |   | Sinisa, Lauri |
|
|   |   | Name expander for file merge + metadata for this case | manager | V4 |   |   | Sinisa, Lauri |
|
|   | 10.4.1 | Any issues that arise as a result of rcp database and calibration databases in same executable | manager | V3 |   |   | Sinisa | |
|   |   |   |   |   |   |   |   | |
| Documentation | 1.9.1 | XML definition of commands and parameters -> docs and web page. Gives sam commands quick look | doc | V3 |   |   | Vicky | |
|   | 1.9.2 | Consolidated sam users guide | doc | V3 |   |   | Vicky, Matt, Lauri |
|
|   | 1.9.3 | Enhanced sam shifters documentation | doc | V3 |   |   | Lee, all |
|
|   | 1.9.4 | Sam system reference guide | doc |   |   |   |   | |
|   | 1.9.5 | Sam operations and administration guide | doc |   |   |   |   | |
|   | 1.9.6 | Installing sam at a remote site - improve guide | doc | V3 |   | Largely done | Lauri | |
|   | 1.9.7 | `Live' tutorial on web | doc | V4 |   |   |   | |
|   | 1.9.8 | Flesh out FAQ page | doc | V4 |   |   |   | |
|   | 2.2 | Update glossary of terms | doc | V4 |   |   |   | |
|   |   |   |   |   |   |   |   | |
| Web pages and web servers |
1.2.2 | Go over all sam browsing pages, add tapes, other fields, refurbish | data
_browsing |
V3 |   |   | Matt | |
|   | 1.2.3 | Re-organize for new documents, add registration, etc. | doc | V3 |   |   | Lauri, Vicky |
|
|   | 1.2.4 | Understand why wbs.py is not working in devel for the umpteenth time!!! | doc |   |   |   | Lauri | Done |
|   | 1.2.6 | Commission d0pilio |   | After V3 |   |   |   | |
|   | 1.2.7 | Web pages to view all active sam stations - using db and infoserver? | admin
data _browsing |
V3 or V4 |   |   |   | |
|   | 1.2.8 | Production web server stats don't work |   | V3 |   |   | Lauri Steve Diana |
Done |
| Robustness, 24X7 | 3.27 | Start thinking about how/if could use data warehouse/ other databases if DB is down as part of 24X7 and failover? |   | V4 |   |   |   | |
|   | 3.27 | Think about how to deal with backup naming service and register all servants with both. Which servers currently register themselves again with the name service if it goes down? | Many packages | V4 |   |   |   | |
|   | 3.27 | Change station cache file protections so that files not normally visible to users | station | V3 or V4 |   |   |   | |
| Operations and Support | 21.x | Ongoing user support and routine data maintenance tasks | all | |||||
| 21.x | Helpdesk - bug tracking? | |||||||
You may edit this file only in something that preserves the "Simple HTML". Please view the source before and after you edit to check this. Use xemacs on Unix, Frontpage on NT or another simple editor please.
$Author: lauri $
$Date: 2001/03/07 15:25:57 $
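The dataset editor requests in the table (WBS 10.9.1, picking up files that arrived since a snapshot, and WBS 10.9.3, splitting a large project into smaller chunks) both reduce to simple list operations. The sketch below is only an illustration of the intended behavior under that assumption; the function names are hypothetical and it is not the project_editor implementation or the sam command-line syntax.

```python
def files_since_snapshot(current_files, snapshot_files):
    """Return files in the dataset that are not in the earlier snapshot (cf. 10.9.1).

    Hypothetical helper: keeps the original order of current_files.
    """
    seen = set(snapshot_files)
    return [name for name in current_files if name not in seen]


def split_snapshot(file_names, num_files):
    """Split a snapshot into consecutive chunks of at most num_files files (cf. 10.9.3).

    Hypothetical helper: the last chunk may be shorter than num_files.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    return [file_names[i:i + num_files]
            for i in range(0, len(file_names), num_files)]
```

For the farm case quoted in the table, an 800-file project split with num_files=200 would give four chunks of 200 files each.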
Future Work on SAM Manager
==========================
In addition to getting SAM Manager to work with new fss, there are also
many small details that have to be finished and/or corrected. Here is the
list as I remember it right now:
1. Release new versions of all IDL products sam_manager uses.
2. Clean up makefiles. Note that one will have to solve many small
problems in order to get things to compile again: new IDL products
now have a different directory structure than before, many IDL
structs have moved to different files, some files are gone, etc.
3. Fix sam_manager to use new fss in the same way as before, i.e.
restore its ability to store files in a synchronous fashion.
4. Add the ability to store files in an asynchronous fashion. Extra
RCP parameter will be needed here.
5. Add the ability to use environment variables SAM_STATION and
SAM_PROJECT instead of rcp parameters.
6. Write output file metadata into a file.
7. Try to use autodest instead of RCP parameter for getting a valid
location for storing files. I do not know yet whether that can be done
at this time.
8. Add ability to handle project restarts. At the moment we have that
only for the command line interface.
9. Update and clean up documentation. There have been several new RCP
parameters added, probably by Steve, which were never documented.
10. Look into getting rcpID from framework. This will be used in the future
for identifying consumers, along with application name and version.
11. Get demo scripts working with Igor's example for submitting a job
to the batch system.
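Item 5 above could behave roughly as follows. This is a sketch only: the RCP parameter names ("station", "project") and the rule that an environment variable wins over the RCP value are assumptions made for illustration, not the actual sam_manager behavior.

```python
import os


def resolve_station_and_project(rcp_params, environ=None):
    """Return (station, project), preferring SAM_STATION / SAM_PROJECT.

    rcp_params is a plain dict standing in for the RCP parameter set;
    the keys "station" and "project" are hypothetical names.
    """
    environ = os.environ if environ is None else environ
    resolved = {}
    for env_name, rcp_name in (("SAM_STATION", "station"),
                               ("SAM_PROJECT", "project")):
        # Use the environment variable if set, otherwise fall back to RCP.
        value = environ.get(env_name) or rcp_params.get(rcp_name)
        if not value:
            raise ValueError(
                "set %s or the RCP parameter '%s'" % (env_name, rcp_name))
        resolved[rcp_name] = value
    return resolved["station"], resolved["project"]
```

Keeping the RCP value as a fallback would let existing RCP files continue to work while jobs submitted through the batch system pass the station and project in the environment.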
Future Work on Optimizer
========================
The work on optimizer for SAM V3 involves removing random order for
authorizing files. This can however be accomplished only after we
insert location cookies for all files in the SAM database, as well
as making appropriate changes in the dbserver, fss, station master and
eworker.
One also needs to worry about not breaking the existing code and
preserving backward compatibility. Here is the list of things I
think need to be done:
1. Introduce new IDL struct which will contain location cookie in addition
to the file tape location, as well as new dbserver method that will
accept that struct.
2. Get the new dbserver method working.
3. Make appropriate changes in the eworker code and parse encp output
for location cookie.
4. Make appropriate changes in the fss code and use new method for
adding new location and location cookie to the file. Make sure
that call to "pnfs xref" also retrieves location cookie upon file
store resubmission.
5. Develop scripts which insert and/or verify location cookies in the
database.
6. Touch the station master code to use the new structs with location
cookies and pass that to the optimizer.
7. Finally, use location cookies instead of random numbers for sorting
files that have to be authorized.
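The effect of step 7 can be sketched as below: files are grouped by tape volume and ordered within each volume by location cookie, instead of by a random number. The TapeFile record and the function name are hypothetical, and the sketch assumes the cookie is stored as a string that sorts in the same order as the files sit on tape; it is not the actual optimizer code.

```python
from collections import defaultdict
from typing import NamedTuple


class TapeFile(NamedTuple):
    name: str
    volume: str            # tape volume label
    location_cookie: str   # assumed to sort in on-tape order


def order_for_authorization(files):
    """Order files so that each tape volume is read in a single pass."""
    by_volume = defaultdict(list)
    for f in files:
        by_volume[f.volume].append(f)

    ordered = []
    for volume in sorted(by_volume):
        # Within one volume, follow the on-tape order given by the cookie.
        ordered.extend(sorted(by_volume[volume],
                              key=lambda f: f.location_cookie))
    return ordered
```

Files whose cookie has not yet been loaded into the database would need separate handling (for example, keeping the old random order for them), which is why the load and verification scripts in step 5 come first.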