The tagging process will set table statuses to 'pending delete', a logic delete, in all available tables. There will be tables that have no status, and will be deleted from as well. Once a data file has a 'pending delete' status, it must be segregated from the non deleted data. 'Pending delete' data should not be included in any queries, projects, snapshots, processes, etc. After a time, when it is jointly decided that the pending delete data can be phyiscally deleted, a third and final step will be run. Currently, the following tables have status columns and can be tagged. Others may be needed, but do not current have status columns.
Dbserver methods must be written to modify the statuses of these tables. Possibly 1 method sending a table name and status.
Another possibility for the tagging is to only tag the data_file records as pending delete, and not the other tables. This possibility needs to be more thouroughly researched.
The delete process begins with a query that identifies the file_ids of the files in error. This list must be the basis of also identifying all the children of these files and delete them and all their processes as well. A Dbserver method must also be written that will use as input, the candidate list of file_ids returned from the original query to identify deleteable files, and return a list of all the children. The child list of file_ids must be appended to the original list of file_ids so as to insure deletion of not only the original files, but all the offspring as well. This final list of deletable file_ids must not contain any duplicate file_ids, as can happen if the original query returns both parent and some child file_ids,and then the file_lineage tree walk also returns the same/duplicate child file_ids.
Once the table(s) are tagged pending delete, extensive testing must be done to verify the new status code, 'pending delete' is being used properly in the dbserver code. For example, not creating new analysis_projects with files that are tagged as pending delete. All the misweb reports will need testing as well. Problem...project_snapshots does not have a status, however, any snapshot is a potential delete candidate. Does there need to be dbserver code written that disallows the continued use of files that are tagged as pending delete, but are assigned to existing snapshots? Will the method that processes project_definitions need modification to ignore and/or communicate to the user certain data_files are unavailable do to a pending delete status? What about restarting a project that now contains pending delete files? New file_lineages must not be able to be established while a data_file is in 'pending delete' status. There are many circumstances that must be tested to insure the 'pending delete' data_files are no longer used.
Along with the process to tag rows as 'pending delete' we need another process to revert the status from 'pending delete' to ??????, incase there has been an error in the tagging as 'pending delete' process. What to set the status back to is an issue, as we do not save the original status.
The third step is to physically delete the rows. The query will not only check to see if the data files and other data meet the original logical delete criteria, but will also check that the status = 'pending delete'. This step should be run only by someone on the Sam design team that understand exactly what they are doing. Deleting from the tables must be done in the correct order to insure constraint violations will not be incurred. The only recovery from running this delete job is to restore the tables. The counts from the delete should match the counts from the pending delete run. Dbserver methods must be written to delete from all the tables, based on a provided where clause.
=============================================================================
Project : SAM
Package : sam_doc
$Id: sam_delete.html,v 1.3 2000/06/14 13:45:59 jtrumbo Exp $
This work is part of a development project, called SAM, which consists
of a
number of coordinated packages each named sam_xxxx .
Notice of authorship, copyright status, and terms and conditions,
should
the software eventually become available for use outside Fermilab,
can be
found in the README and LICENCE files in the top level directory of
the main
sam package.
==============================================================================