Short term task list (Months 1-4)
Art 0.5 Randy 0.5 Rob 0.5 Dmitri 0.2 Sinisa 0.75 Lauri 0.75 Andrew 0.75 John 0.5 Steve 0.2 Diana 0.4 Jeff 0.7 Lee 0.7 Martin 0.2 Magherita 0.2 Stefan 1.0 Rick 0.3 Dave 0.3 Rod 0.1 Alan 1.0 Todd Dane Taehyun Carmenita Fagan Lin Stan Gabriele Glenn Tom Heidi Lancaster-hire Igor Timur Matt

Management
Task list administration management Task lists, assignments, meetings, etc. 1 4 Lee Jeff ongoing 7/26/02
Task list formatter management 2 0.2 0.5 Jeff prototype 8/4/02

D0 support
D0 operations support support Support by experts to shifters. 1 Lee Lauri Sinisa Martin ongoing 7/26/02
Helpdesk followup support Tickets assigned to SAM, resolve and closeout. 0.4 Lauri Carmenita ongoing 7/26/02
d0mino-SAM integration Add ability for remotely initiated transfer to use d0mino-SAM dedicated interface on d0mino. Fagan does not know how to solve this yet. If it cannot be solved, then will need to set up an additional SAM server dedicated to serving files to remote sites. Need full routing to take full advantage of this, especially to get files in the d0mino cache. 1 0.5 1 Lee Fagan 7/26/02
Project report monitoring Provide physicists with a comprehensive report of files delivered and not delivered. 2 0.2 0.5 8/4/02
D0 SAM station deployment and systematic testing initiative integration For identified sites, benchmark network performance, establish and test working SAM stations, systematically move data to these sites and measure performance and bottlenecks. Work with DCD to get started with tools developed by IEPM (SLAC). Send list of station nodes to Frank Nagy so we can monitor network activity to them. 3 0.2 4 Lee 7/26/02
Farm batch code extraction from station integration Rewrite of getfile info. Waiting for tests to be defined to make sure everything still works after the break is done. 1 0.25 Lauri Steve waiting 8/13/02

CDF support
CDF operations support support Partly hands-on training for future CDF experts. 1 4 Rob Art Jeff Rick Dave Stefan Todd Alan Lin Stan ongoing 7/26/02
Developer, shifter, and operations support training support 2 0.2 2 Lauri Gabriele Rick 7/26/02
Load CRC values into CDF database integration 2 1 0.5 Art 8/21/02
Current testing and initial deployment for CDF integration Completion. 2 1 2 Rick Dmitri 7/26/02
Run section range efficiency integration Compare performance of run section ranges, as currently implemented, with many-to-many table. 2 0.3 1 Randy 8/21/02
More update information from DFC integration For example, deleted files, new file or dataset associations, luminosity recalculations when fileset or dataset associations change. (How often should it be done?) Could be based on Predator. 4 0.3 1 Randy 8/21/02
User interface maintenance And other additions to support new experiment environment(s) 3 0.2 4 Rick Randy Dmitri 7/26/02
sam store development Features needed for storing user output into SAM 3 1 1 Dave 7/26/02
SAM Helper development Redesign parts of SAM Helper and maintain with respect to SAM IDLs 4 0.3 4 Dave 7/26/02
SAM station of fcdfsgi2 integration 4 0.5 1 Jeff Glenn 7/26/02

Documentation
General documentation maintenance Documentation 3 0.1 4 7/26/02

Diagnostics
dev/int/prd scheme in diagnostics page integration Break up so it is consistent. This makes it easier to maintain the db server and create new installations. 1 Lauri 7/26/02
Local test script in sam_user maintenance Put a script in sam_user or a similarly easy location to test the local setup. This will help SAM shifters help local users diagnose problems. (Rick already has a prototype script.) 3 requested 7/26/02
File transfer monitor maintenance Include research using XML format. Possible changes to sam_admin tools for mining log files; break log files daily. Include intra-station transfers along with extra-station. 3 1 1 John 7/26/02
Review and update SAM dump output maintenance Formats and information. 3 8/4/02
Archive log files maintenance (waiting on SAM on Sun) Need stager and encp only, could have station running on other node. Sinisa will try to build stager on Sun, Lee will get encp for Sun (seems to be available). Lauri will do final set up. Try on Ora3. Use central analysis station with only stager running on ora1 and ora3. Just need to come up with the metadata and tier for this. 4 1 1 Lee Lauri John 8/4/02
SC2002 Display and Demo monitoring 4 0.25 5 John Lee designed 7/26/02
Improved SAM-at-a-glance monitoring Plots, etc., so it runs on ora1 and provides more up-to-date information. May require sam user to run on SunOS, or convert to use the name service status info (just ping the stations instead of sam dump). Need to add additional information to database: (1) known down; (2) monitoring level high, medium, and low availability systems (see Lauri's mail). Lauri suggests turning this into just doing requested dumps. 5 0.1 4 Lauri Lee 8/4/02
Station status development Mark entire station as down; also might want node down, station down, fss down. 5 1 0.5 Lauri Diana 8/4/02
FSS error state maintenance Leave the FSS in well-defined states on error conditions. Current sam store returns either success or failure, where the latter state tends to be undefined. Andrew and Sinisa will characterize return codes (possibly make new ones, such as a timeout). Originates from farm FSS crashes, usually associated with jammed tapes. 0.5 0.5 Sinisa Andrew 8/13/02

Testing
Continued testing of new SAM versions testing Station servers, db servers, and user interface in several configuration environments, under simulated user load. Also needs documentation. 3 1 Art Andrew Tom ongoing 7/26/02
Unit tests for sam_user interface testing Tied to the sam parser task. 4 Lauri 8/4/02

General support and operations
SAM db server support maintenance 0.1 Steve Diana ongoing 7/26/02
Python fnorb bug maintenance Find bug causing crash (or go to omniorb). First needs to be reproduced. 1 0.3 1 Steve Andrew Lauri 8/13/02
Autodest with processed files development Needs to be resolved. Bug in the server in constructing the path, pulling info from the parent that it should not. Load mapfile is very slow. 1 Heidi Lee Lauri beta 8/4/02
File family in SAM autodest development Add code to sam autodest so that the proposed path string uses an optional entry for "file-family=..." appended to the stream field. This has been requested by Gerry for the online direction of files to tape. Still some debate, but will provide flexibility for streaming decisions to be made later. 1 Carmenita 8/4/02
Pick event development D0 design and initial features 2 0.6 1 Lee John Sinisa 7/26/02
Station bug fixes and minor feature additions maintenance Outstanding major issues (Sinisa 5/21/02): (1) small projects overlapping with big ones started first; the small one won't make any progress until files are delivered for the big one. (2) route does not work with --constrained-delivery. (3) unlocked files change group ownership upon station restart; this has been kludged by hardcoding D0 group as the default for orphans. (4) end of stream does not work if there is a file delivery error after consumer has been established. (5) station should not retrieve locations for all files when a project starts, but do it on a need-to-know basis. 2 0.5 Rob Andrew Sinisa 7/26/02
Cache management development Reengineer, debug caching algorithm. Missing group, station revival, db server work. 2 1 1 Rob Andrew Sinisa 7/26/02
sam_user maintenance maintenance Take over sam_user from Carmenita. Clean up commands, error messages, exception handling, adding new commands. Need to understand why it is so slow on d0mino. Also need to check unhealthy mixtures of old and new exception handling. 3 0.2 4 Lauri 8/4/02
Resolve namespace conflicts for DB schema development 3 0.1 4 Randy Diana 7/26/02
Packaging maintenance Standardized build/config in generic environment 3 0.2 4 Art Lauri 7/26/02
gridFTP integration Transition from bbftp as extra-domain transfer protocol. 3 1 1 Stefan Gabriele Rod Dane packaged 8/13/02
sam_start_bbftp problem integration On some Linux machines (e.g., nglas09), "ps -fu sam" gets truncated when piped through "grep". This causes the sam_start_bbftp.sh/sam_stop_bbftp.sh scripts to break (they send mail saying that the daemon is not running, when actually it is, and then try to restart something that is already running). 4 0.2 0.25 7/26/02
sam_bootstrap -sam problem under sh integration Luciano ran into this when restarting one of the db servers on fndaut1. See his mail around June 4. This is a problem in sh handling of "-sam"; bash works all right. 4 0.2 0.25 7/26/02
kinit in sam_kerberos_rcp script integration sam_kerberos_rcp script needs to be updated so that it doesn't try to kinit if it doesn't need to (especially for Linux, according to Fagan). See mail from Lauri (after 5/22/02). 4 0.2 0.25 7/26/02
Project restart after station crash development Need to be able to recover projects after a station crash. (1) application must be restartable. (2) batch system must coordinate with projects. (3) projects are restarted. Restart project known to be broken. User needs to close output file at last file boundary so work is not lost. 4 1 2 7/26/02
Common parts of SAM manager maintenance Break out and also make CDF and D0 specific parts. D0 specific clients should have same functionality as now provided by sam_manager and sam_root interfaces. 4 1 1 Gabriele Sinisa Dave 7/26/02
SAM admin and bootstrap changes development Requirements and design (and general distribution plan) 1 1.5 Lauri Sinisa Art 7/26/02
Copy quantity in SAM db development Get number of copies for each file from the SAM db. Need to decide where this is kept in SAM. Carmenita 8/4/02
gcc3 build maintenance Build SAM code with gcc3. First step is to build Orbacus 3. Art 8/13/02

CRC
CRC file transfer development Needs verification and FSS test. 1 1 1 Andrew Sinisa beta 7/26/02
cron file test development Test files from remote sites for corruption. Use dump event to read events. Get CRC from enstore. 1 1 1 8/4/02

File status, life cycles
File parameters maintenance Additional arbitrary attributes (without strictly requiring db schema changes). From Predator work. 1 0.2 1 Randy prototype 8/21/02
File status and other file attributes development Needed to track life cycle of data files. Use file parameters, which now include strings, integers, reals, and timestamps. Accommodate new ways of processing data (production vs group vs individual). Start with a proposal from January. 1 1 3 Sinisa Lauri Steve Diana Randy Dmitri 7/26/02
Interim file status development Add new column and needed changes to use for file status. 1 1 0.5 Lauri Diana 7/26/02
Life cycle for projects development Changes to Data Set Editor and associated tools. Use file parameters. 3 1 2 Lauri Diana Randy 7/26/02

Distributed cluster computing
ClueD0 and farms integration d0mino backend and clued0, farms. Needs dedicated test environment. Change to project since home areas for jobs in the queue will be on Linux. 1 0.5 1 Sinisa Andrew 7/26/02
Friday meltdown integration sam_submit starts project and then submits processing jobs. This behavior sometimes overloads d0mino as users submit lots of jobs. 1 0.5 1 Stefan 7/26/02
Design and implementation of SAM station(s) on CAF (CAB) development 1 1 2 Rob Stefan Taehyun 7/26/02
sam submit parallel processing development For CAF and general application 1 0.5 1 Stefan 7/26/02
Batch adapter for FBSng development 1 0.5 1 Stefan Gabriele 7/26/02
SAM operation on private network maintenance Needs documentation and tweaks. 2 0.3 1 Sinisa 7/26/02
URL file access development Create interface for a project to get a URL for a file (e.g., get_next_url) 3 1 1 Rob Andrew 7/26/02
in2p3 Computing Center in Lyon integration Requirements for interfaces (possible CDF need too for network file) 3 0 0 Lee waiting 7/26/02
Farm CP for intra-station transfers development Package, incorporate in sam_bootstrap. Tom will test when it's packaged (Sinisa will send mail). 3 0.2 0.25 Sinisa written 8/13/02
NFS disk as sam_cache development 3 1 1 Rob 7/26/02

Decentralized computing
Decentralize naming service integration Set up secondary NS. Includes enhancements to diagnostics and keeping information local. Not writing a new information and monitoring server. 2 0.3 1 Sinisa 7/26/02
Dynamic station installation development Ability of SAM to be deployed, set up, and dismantled dynamically. Needs to install and add configuration to database, and run. Needs all libraries, orbacus, etc. 3 Igor 8/4/02
Station site autonomy development Additional infrastructure to decentralize station operation. A station would be more self-sufficient and could operate for some amount of time without access to the outside world. Also, failover to db alternatives to FNAL central databases. Sinisa Andrew 7/26/02

Monte Carlo support
Additional MC metadata functionality development Explain to CDF and D0 people how to use file parameters, and make sure they don't go overboard. 1 0.2 4 Randy 8/21/02
Request system maintenance Physics parameters (use file parameters aside from MC system itself); MC generator, random number seeds, generation cross section, validation status, physics type, for example. Add dimensions for these parameters. Also data run to which the MC run corresponds. 2 0.5 1 Lancaster-hire Randy 8/21/02

Recent requests (needs discussion)
SAM script wrapper for batch submission maintenance Check that the user script is executable before trying to execute it, and give an easily recognizable error message. 2 Stefan requested 7/26/02
sam_admin privileges maintenance Use group IDs instead of database information to authenticate. 3 requested 7/26/02
Disk space check before writing to local cache maintenance How can a station master check df information on a remote node? 4 requested 7/26/02
Parameter fetch from CLOB integration Program to fetch parameters from CLOB into parameters for a file. requested 8/4/02

Longer term items (Months 5-24)
Andrew 0.5 Martin 0.5 experts 0.5

General
sam_manager work development Pingable client; check restart option working with --CPID on command line. Also desire to reuse Gabriele's API for ROOT. Gabriele might be able to do this. 5 1 1 Sinisa Gabriele 7/26/02
dCache integration development Need sam_dcache product 2 1 1 Rob Timur 7/26/02
Site autonomy development Global data replica work, consolidation of station servers 3 2 3 Sinisa Andrew Rob 7/26/02
Site optimizer development Governs file transfers and tape usage. Closely related to autonomy, etc. 2 1 3 Sinisa design 7/26/02
Information service for local station development 5 1 3 7/26/02
bootstrap and configuration development Redesign and implementation 2 3 Lauri Art 7/26/02
D0 pick event development Completion. 1 1 3 Lee Steve 7/26/02
D0 pick event for non-raw data development Fix dimensions to enable. Need to fix parentage for files. Involves schema changes if we want to use denormalized approach. 2 Matt Randy 8/4/02
SAM batch wrapper development 4 1 2 Stefan Lauri 7/26/02
Ongoing experiment support support 2 experiments 1 2 experts 7/26/02
Ongoing feature additions, fixes, improve web interfaces, etc. maintenance 2 2 7/26/02
Improved monitoring development 2 2 3 7/26/02
Miscellaneous unforeseen needs development 2 2 3 7/26/02
Testing testing 2 1 7/26/02
Documentation support Need SAM quick reference page to replace obsolete quick start guide. sam get metadata, list definition, --keywords, sam create dataset --keyword???, sam run project, sam submit, mc runjob new metadata, auto dest "sam store --descrip=...", add new phase; luminosity and archive files, sam batch commands, psusp, files not delivered, python api, new dimensions and examples. Status block to sam tool translation. SAM station starting options through sam_bootstrap, new flags. Questions about groups. 4 0.1 20 7/26/02
Improved SAM-at-a-glance. monitoring Plots, etc. 5 0.1 Lauri Lee 7/26/02
SAM-Grid integration integration 4 1 6 Gabriele Igor Rick 7/26/02
Grid (at large) integration integration 4 1 6 Gabriele Igor Rick 7/26/02
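
Note on the requested "SAM script wrapper for batch submission" item: the check it asks for (confirm the user script is executable before trying to run it, and fail with an easily recognizable message) could look roughly like the sketch below. This is only an illustration, not existing SAM code. The names check_user_script, ScriptNotExecutable, and main, the "SAM-WRAPPER-ERROR" message prefix, and the exit code 2 are all hypothetical; the real wrapper would choose its own.

```python
import os
import sys

# Hypothetical prefix so shifters can grep batch logs for wrapper failures.
ERROR_PREFIX = "SAM-WRAPPER-ERROR"


class ScriptNotExecutable(Exception):
    """Raised when the user script cannot be run as given."""


def check_user_script(path):
    """Return path unchanged if it is a runnable regular file, else raise."""
    if not os.path.exists(path):
        raise ScriptNotExecutable(
            "%s: user script not found: %s" % (ERROR_PREFIX, path))
    if not os.path.isfile(path):
        raise ScriptNotExecutable(
            "%s: user script is not a regular file: %s" % (ERROR_PREFIX, path))
    if not os.access(path, os.X_OK):
        raise ScriptNotExecutable(
            "%s: user script is not executable (try chmod +x): %s"
            % (ERROR_PREFIX, path))
    return path


def main(argv):
    """Validate argv[1] only; the actual job submission is out of scope here."""
    if len(argv) < 2:
        sys.stderr.write("%s: no user script given\n" % ERROR_PREFIX)
        return 2
    try:
        check_user_script(argv[1])
    except ScriptNotExecutable as err:
        sys.stderr.write("%s\n" % err)
        return 2
    return 0
```

A wrapper would call main(sys.argv) (or check_user_script directly) before handing the script to the batch system, so a missing execute bit shows up as one clear line at submission time instead of an obscure failure on the worker node.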