I am working with a drive with thousands of backup files, many of which are just cluttering space.
The program our employees use creates a backup of a spreadsheet every time they open one for editing. Each backup is a new file named according to the spreadsheet + the date + the time. We have a separate backup directory for each of our clients, and each directory has multiple spreadsheets and multiple backups of each spreadsheet.
In the end, we have a file structure filled with files like this:
Companyx/backup/abssheet_091210_111006.bps
Companyx/backup/abssheet_091210_133335.bps
Companyx/backup/xyzsheet_091210_145223.bps
Companyx/backup/xyzsheet_100803_100332.bps
Companyx/backup/xyzsheet_100812_111244.bps
Companyy/backup/gnu_sheet_081029_110455.bps
Companyy/backup/gnu_sheet_081029_111233.bps
Companyy/backup/gnu_sheet_081029_112355.bps
We only need to keep the most recent 2 backups of any particular sheet. Out of the 8 files listed here, I would want to keep 6. The date and time in the filename are unimportant, as I can use the date and time from the file information. But the filenames cannot end up changed.
I have played around with PowerShell some, and I already used gci to move these to a file location of their own. I can also strip the date and time strings from the filenames. I also found a PowerShell script to remove all but the 2 newest files from a particular directory. But I am at a loss on how to selectively delete only what I want.
So far, I have written/modified the following code:
$newlist = New-Object System.Collections.Generic.List[System.String]
$fulllist = gci . | where { -not $_.PsIsContainer } | sort Name
foreach ($object in $fulllist) {
    $string = $object.Name
    # Strip the trailing _time.ext token, then the trailing _date token
    $psworiginal  = $string.Replace("_" + ($string -split "_")[-1], "")
    $psworiginal2 = $psworiginal.Replace("_" + ($psworiginal -split "_")[-1], "")
    $newlist.Add($psworiginal2)
}
$newlist = $newlist | select -Unique
This gives me a list of the individual spreadsheets. But then I'm not sure how to work from that list to go back through the original list and remove all but the latest 2 backups of each spreadsheet.
Ideally, I would like to put the -Recurse parameter back in the gci call and have it go through a complex directory structure to weed out older backups in every directory.
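Based on the pieces I have so far, I imagine the end result would look something like the untested sketch below: group each file by its directory plus its base name with the date/time suffix stripped, sort each group newest-first by LastWriteTime, and delete everything past the first two. The *.bps filter and the _\d{6}_\d{6} suffix pattern are assumptions about my own naming scheme.

```powershell
# Sketch only: keep the 2 newest backups of each sheet in each directory.
Get-ChildItem -Path . -Recurse -Filter *.bps |
    Where-Object { -not $_.PSIsContainer } |
    Group-Object -Property {
        # Group key: parent directory + base name without the _date_time suffix
        Join-Path $_.DirectoryName ($_.BaseName -replace '_\d{6}_\d{6}$')
    } |
    ForEach-Object {
        $_.Group |
            Sort-Object LastWriteTime -Descending |
            Select-Object -Skip 2 |
            Remove-Item -WhatIf   # drop -WhatIf once the dry-run output looks right
    }
```

The -WhatIf switch makes this a dry run, which seems prudent before deleting anything in bulk.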
As an aside, on PowerShell 3.0 and later you can use

gci -File

and save the extra Where-Object pipeline. If your file names are consistent for the first y chars, you can use

Group-Object -Property { $_.Name.Substring(0, $y) }

to group the files. And finally, if you can trust your file time stamps, all of this becomes much easier.
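For example, a quick illustration of that grouping (8 here is just a stand-in for y; pick whatever prefix length actually identifies a sheet in your names):

```powershell
# Hypothetical: group *.bps files by the first 8 characters of their names
Get-ChildItem -File -Filter *.bps |
    Group-Object -Property { $_.Name.Substring(0, 8) } |
    ForEach-Object { "{0}: {1} backups" -f $_.Name, $_.Count }
```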