The overall plan:
Goal 1: Demonstrate the entire chain:
Define Project --> Initiate Project --> Start multiple consumers --> Display state of system
Goal 2: Move to the actual sam testbed and away from the precarious situation with RedHat 5.02 on Igor and Rich's machine, with the bleeding-edge egcs and Orbacus. This means finishing validation of the 5.1 system and installing it on at least 2, and preferably more, of the sam testbed machines.
Goal 3: Set up a stable fake Enstore server system that uses disk instead of tape for storage. We propose that this be one of the sam test systems, one with two external disks, where the data written to the 'mass store' can be kept. It could equally well be one of John's systems, if such a system were available, had enough disk space, and could be held stable during SC.
Goal 4: Write several sets of files into this fake Enstore 'mass storage' system and catalog them in the SAM database. Attempt to run multiple projects on multiple nodes (stations), each with multiple consumers nominally representing a mix of access modes. The statistics display is based on the log file and should be able to show totals across all stations, projects and consumers.
These are goals. We will have to be realistic, of course, given that many unexpected hurdles have already appeared with Linux, the compilers, etc. We could certainly run more than one project on a single node, but the bandwidth and number of process slots would be extremely limited. Having 3, 4 or more nodes on which to launch almost identical scripts would be much better, and would also move us toward the next step: testing with Enstore and demonstrating the distributed and scalable nature of SAM.
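Goal 4 depends on the statistics display producing totals per station, per project and per consumer from a single log file. The following is a minimal Python sketch of that aggregation, not the actual display code. It assumes a hypothetical log format with one event per line, a leading timestamp, and key=value fields; the field names (station, project, consumer, event, bytes) and the event name file_delivered are illustrative assumptions, not the real sam_logger format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ASSUMED (hypothetical) log line format, not the real sam_logger output:
#   <timestamp> station=<name> project=<name> consumer=<id> event=<type> bytes=<n>


def parse_line(line: str) -> Dict[str, str]:
    """Split the key=value fields of one log line into a dict (timestamp skipped)."""
    fields = {}
    for token in line.split()[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def totals(lines: Iterable[str], event: str = "file_delivered") -> Tuple[dict, dict]:
    """Count matching events and sum their bytes at station, project and consumer level.

    Keys are tuples so that project and consumer totals stay distinct per station:
    ("sammy",), ("sammy", "p1"), ("sammy", "p1", "c1").
    """
    files = {"station": Counter(), "project": Counter(), "consumer": Counter()}
    nbytes = {"station": Counter(), "project": Counter(), "consumer": Counter()}
    for line in lines:
        f = parse_line(line)
        if f.get("event") != event:
            continue  # only the (assumed) delivery events contribute to totals
        keys = {
            "station": (f["station"],),
            "project": (f["station"], f["project"]),
            "consumer": (f["station"], f["project"], f["consumer"]),
        }
        size = int(f.get("bytes", "0"))
        for level, key in keys.items():
            files[level][key] += 1
            nbytes[level][key] += size
    return files, nbytes
```

Keying each level by a tuple that includes its parent station (and project) keeps two consumers with the same ID on different stations from being merged, which matters once the same scripts are launched on several nodes.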
The table below lists the testbed nodes, their characteristics and their roles. Unless otherwise stated, they will run Fermilab 5.1 test Linux with various sam additions and mods applied. We will use as many machines as we can bring up, or as many as can run demo scripts simultaneously without crashing the whole system. The dual-CPU machines will be used in single-CPU mode. One machine will have a Farm system environment.
| Node name | Physical characteristics | Purpose |
|---|---|---|
| samson | dual CPU, 2 external drives, runs Fermi Linux 5.02 | Enstore fake 'mass store' system with storage on the 2 disks (up to 36 GB) |
| delilah | dual CPU, 2 external drives | Freight Train Station, to cycle large datasets through with multiple consumers. Can run >1 suite. |
| sameggs | single CPU | Orbacus naming service, sam_logger, sam_info_server |
| samham | single CPU | sam_optimizer |
| samiam | single CPU | Console server; also able to run project(s)/consumers, limited by disk cache of 2 GB? |
| sammy | dual CPU | Runs project(s)/consumers, limited by disk cache of 2 GB |
| samadams | dual CPU | Runs project(s)/consumers, limited by disk cache of 2 GB |
| samting | dual CPU, Fermi Linux 5.02 | Farm System environment machine for farm consumer tests |
| fncduh | Sun/Solaris DB machine | sam_db_server |
TODO:
| Task # | Area | Task | Details | Responsible party(s) | Status/Date done |
|---|---|---|---|---|---|
| 1.1.1 | SAM testbed | Finish hardware installation and network hookup. Fill in table above. | ? network | LL -> h/w group | 10/29/98 |
| 1.2.1 | | Finish validation of initial Fermi Linux 5.1 system, with SAM infrastructure added | Finish rebuild of sam servers and sam_user; test | IT | 10/29/98 |
| 1.2.2 | | | Rebuild ACE/TAO (with the version currently in use) and sam_infoserver; test | RW | 10/29/98 |
| 1.2.3 | | | Document sam additions/mods to base system | IT | |
| 1.2.4 | | | Agree that the system is good and can be 'cloned' | all | |
| 1.2.5 | | | Get SAM user GID and UID from Yolanda. Add to users on 'master' system. | VAW -> IT | 10/29/98 |
| 1.3.1 | | Do system and SAM infrastructure install on other sam nodes. Same users on all, different file systems on 2 of them. | Clone system to 5 other nodes, including user accounts/areas | See Don's note (now being done by IT/DJH) | |
| 1.3.2 | | | Install Enstore on samson; prepare startup processes for Enstore node | See Don's note | |
| 1.3.3 | | | Create file system for Enstore fake 'mass store' on samson | See Don's note | |
| 1.3.4 | | | Create file systems for 'Freight Train' clients on delilah, divided into 4 separate /samdisk-n areas. On all other nodes create a /samdisk file system of 2 GB. | IT | |
| 1.4.1 | | Label machines and take photo of rack | | LL | |
| 2.1.1 | Startup of servers | Script(s) to start up all basic servers on sam testbed machines | Enstore servers startup | See Don's note | |
| 2.1.2 | | | Naming service, logger, info_server, db_server, station_masters (dummy), and optimizer | IT/IM | |
| 3.1.1 | Startup of projects/consumers | Script for individual test suite of 1 project and its consumers (and encp clients) (extract of Igor's current script) | Make sure it is parameterized by 'station', project name and disk cache (on the command line?). Set window names appropriately and uniquely, ready for multiple launches on multiple machines. | IT/IM | Some improvements made to run_suite.sh |
| 3.1.2 | | Script for startup of multiple project/consumer suites on multiple nodes | Understand issues with multiple windows/possible hangs, and permit a 'quiet' mode of operation logging only to file? | IM | |
| 4.1.1 | Store and catalog files | Write files LARGEn (n = 1-20), 1 GB files of random numbers, into Enstore fake store. Write files SMALLn (n = 1-20), 100 MB files of a fixed pattern, into Enstore fake store. | Do this only once, but it is better to script it. Use better names for files. Save list of files written in sam_testharness. | IT/VAW | |
| 4.2.1 | | Catalog files from 4.1.1 using mc_import scripts (or by hand) and using existing Run, Data Tier, Stream #, Process, etc. information in database | Script to do the cataloging | JT | Going to make 3 scripts |
| 4.2.2 | | | Script to delete the above catalog entries | JT | |
| 4.3.1 | | Display of catalogued files on web page | MISWEB page perhaps? Just to display what is in the DB | VAW | |
| 5.1.1 | Project/consumers test suite | Ensure a single set of project master and multiple consumers runs consistently | ? Decide whether multiple encp clients per PM are to be used. If buggy, don't push it; instead run 2 projects on the Freight Train machine and use only 2 file buffers/caches, 1 per disk. | IT | |
| 5.1.2 | | | Ensure no hardcoded parameters prevent movement between nodes | IT | Under test now |
| 5.2.1 | | Add logging of significant events for statistics display | Log an event (or add a parameter to the existing log entry that provides the next file name to the consumer) to distinguish files found in disk buffer/cache from files provided only after a wait (for fetch from Enstore). | IT/VAW | |
| 5.3.1 | | Ensure interaction with sam_db_server is sufficient to create/find consumers and initiate active projects | Debug db_server and ensure it matches database structure (as frozen per 8.1.1) | IT/VAW/MV/SW | |
| 6.1.1 | Statistics display | Test with real log | Test current system with real log file created by test suite, instead of generated log file | RW (IT) | |
| 6.2.1 | | Files opened (tab 1) | Resize graph, relabel x axis for time (not date), rebin by minutes instead of hours. Retitle to "Files Fetched from MS". | RW | Almost done |
| 6.3.1 | | System Throughput (tab 2) | Change unit to KB, rebin by minutes, change x axis to time. Resize graph. | RW | |
| 6.4.1 | | Updates of graphs | Add timer thread. On timeout, refetch data from sam_info_server and redraw. | RW | Done for tab 1 |
| 6.5.1 | | DB Query (tab 3) | Relabel to info_server test; write explanatory text about info_server | | |
| 6.6.1 | | Project Summary (tab 4) | Relabel to System Dataflow. Hook graphic objects to methods supplying data. See details on separate picture. | RW + VAW | |
| | | | Attempt to use new data in log on whether requested file was found in cache immediately or had to wait for fetch | RW + VAW | |
| 6.7.1 | | Project Detail (tab 5) | Relabel to System Overview. | RW | |
| 6.7.2 | | | Parse log file further to extract Station and Project lists as well as Consumer and encp request lists, OR | RW | |
| 6.7.3 | | | Would a query to the db_server -> DB for all active projects be simpler? | RW/MV/SW | |
| 6.7.4 | | | Hook graphic objects to info_server calls. | RW | |
| 7.1.1 | Create/save project | Db_server interaction | Determine and finalize all interfaces and structures required by the sam_admin project interface program. Need to add project name to several calls; this is the key that ties into PM and log file. | RW/LL/MV | |
| 7.1.2 | | | Implement required interfaces in db_server | MV/SW/VAW | |
| 7.2.1 | | Minor mods to GUI? | Modify GUI to present only parameters which can be 'AND'ed together to give an answer. For parameters that are mutually exclusive or interrelated, provide an 'OR' pull-down to select one. | LL | |
| | | | Add project name | LL/RW | |
| 7.3.1 | | Hook Action buttons to calls to db_server | | RW | |
| 7.4.1 | | Write recipes for defining valid projects based on test files imported into the system in 4.1.1 | After writing them, ready for 10.1 below | LL | |
| 7.5.1 | | Write cleanup script to remove consumers, consumer consumption entries, process entries, and all parts of project definition, snapshot and running | | JT | |
| 8.1.1 | Database | Freeze database structure. Make sure supporting data for test suite files exists; if not, create it by hand. | | JT | |
| 8.1.2 | | | Add all sam node names to database | JT | |
| 8.1.3 | | | Add 100 fake 'users' to database; put a flat text file listing the 100 users in sam_testharness | JT | |
| 8.1.4 | | | Make list of valid processes/process families that are in the DB (or put more in the DB), so the Project Master/Consumer test suites can use valid process family/process version data when creating consumer IDs. Save in sam_testharness. | LL/JT | |
| 9.1.1 | Get laptops ready to go for SC | Standard products install | Check with PCS that laptops will be fitted with Exceed, Netscape, IE, FTP, screen snapshot and ghostview, and will be running NT. | VAW | |
| 9.2.1 | | Licences for products? (It looks like we will actually run the Java display code on the laptops at SC, although in principle the Java runs on the Linux boxes. In case of odd bugs it will be easier to have it on the laptop.) | Buy KLG (or download another demo version) | VAW | |
| 9.2.2 | | | Buy another Symantec VC licence | VAW | |
| 9.3.1 | | Arrange for installs of non-standard SAM-specific products | Symantec VC, KLG JChart, JOB with jidl, JTC, sam_admin files | ? /RW | |
| 9.4.1 | | Allow running of Java code on sam testbed systems (in case of problems with laptops) | Make product for Symantec jar files and ensure JChart product is also on 1 system | RW | |
| 10.1.1 | SC demo | Script demo | Write script of what to do. Save in sam_testharness doc area. | Several | |
| 10.1.2 | | | Enhance web pages using snapshots of parts of scripted demo | VAW/LL | |
| 10.2.1 | | Create sam_testharness CVS package | | VAW | |
| 10.3.1 | | Get SAM web pages up under runiicomputing area on fnalu, so that URL can go on the pamphlet available in the booth | | VAW | |
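Task 4.1.1 notes that the LARGEn/SMALLn test files need writing only once but are better scripted. A minimal Python sketch of generating them locally is below, under stated assumptions: the function names write_test_files and write_file_list are hypothetical, the files would still have to be copied into the fake Enstore store (e.g. with encp) as a separate step, and "1 GB"/"100 MB" would be passed as whatever byte counts are agreed (10**9 and 10**8 are one possible reading).

```python
import os

CHUNK = 1 << 20  # write in ~1 MiB pieces so memory use stays flat for 1 GB files


def write_test_files(directory, prefix, count, size_bytes, pattern=None):
    """Write files <prefix>1 .. <prefix><count>, each exactly size_bytes long.

    pattern=None fills each file with random bytes from os.urandom; otherwise the
    given byte string is repeated. Returns the list of paths written.
    (Hypothetical helper for task 4.1.1, not an existing sam_testharness script.)
    """
    if pattern is not None and not pattern:
        raise ValueError("pattern must be non-empty")
    if pattern is None:
        chunk, template = CHUNK, None
    else:
        # Chunk is a multiple of len(pattern), so the pattern never breaks at a boundary.
        chunk = max(len(pattern), (CHUNK // len(pattern)) * len(pattern))
        template = pattern * (chunk // len(pattern))
    paths = []
    for n in range(1, count + 1):
        path = os.path.join(directory, f"{prefix}{n}")
        remaining = size_bytes
        with open(path, "wb") as out:
            while remaining > 0:
                k = min(chunk, remaining)
                out.write(os.urandom(k) if template is None else template[:k])
                remaining -= k
        paths.append(path)
    return paths


def write_file_list(paths, list_path):
    """Record the written files, one path per line, e.g. for saving in sam_testharness."""
    with open(list_path, "w") as f:
        f.writelines(p + "\n" for p in paths)
```

For example, `write_test_files(staging_dir, "LARGE", 20, 10**9)` and `write_test_files(staging_dir, "SMALL", 20, 10**8, pattern=b"\xa5")`, where `staging_dir` is a hypothetical local directory with enough free space. The prefix is a parameter so that the "better names" asked for in 4.1.1 can be substituted without changing the script.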