I am working with a drive with thousands of backup files, many of which are just cluttering space.

The program our employees use creates a backup of a spreadsheet every time they open one for editing. Each backup is a new file named according to the spreadsheet + the date + the time. We have a separate backup directory for each of our clients, and each directory has multiple spreadsheets and multiple backups of each spreadsheet.

In the end, we have a file structure filled with files like this:

    Companyx/backup/abssheet_091210_111006.bps
    Companyx/backup/abssheet_091210_133335.bps
    Companyx/backup/xyzsheet_091210_145223.bps
    Companyx/backup/xyzsheet_100803_100332.bps
    Companyx/backup/xyzsheet_100812_111244.bps
    Companyy/backup/gnu_sheet_081029_110455.bps
    Companyy/backup/gnu_sheet_081029_111233.bps
    Companyy/backup/gnu_sheet_081029_112355.bps

We only need to keep the 2 most recent backups of any particular sheet. Out of the 8 files I listed here, I would want to keep 6. The date and time in the filename are unimportant, as I can use the date and time from the file information. But the filenames must not be changed.

I have played around with PowerShell some, and I already used gci to move these to a file location of their own. I can also strip the date and time strings from the filenames. I also found a PowerShell script to remove all but the 2 newest files from a particular directory. But I am at a loss on how to selectively delete only the files I want to.

So far, I have written/modified the following code:

    $newlist = New-Object System.Collections.Generic.List[System.String]
    $fulllist = gci . | where {-not $_.PsIsContainer} | sort Name
    $array = @()
    foreach ($object in $fulllist) {
        $string = $object.name
        $psworiginal = $string.Replace("_"+($string -split "_")[-1]," ")
        $psworiginal2 = $psworiginal.Replace("_"+($psworiginal -split "_")[-1]," ")
        $newlist.Add($psworiginal2)
    }
    $newlist = $newlist | select -unique

This gives me a list of the individual spreadsheets. But then I'm not sure how to work from that list to go back through the original list and remove all but the latest 2 backups of each spreadsheet.

Ideally, I would like to put the -Recurse parameter back in the gci call and have it go through a complex directory structure to weed out older backups in every directory.

  • If you have a recent PowerShell (>= 3) you can specify files with gci -File and save the extra pipeline stage. If your file names are consistent for the first y chars, you can Group-Object -Property {$_.Name.Substring(0, y)} to group the files. And finally, if you can trust your file time stamps, all of this becomes much easier. Commented Nov 25, 2015 at 23:00
  • This should point you in the right direction superuser.com/questions/794282/… Commented Nov 25, 2015 at 23:02
  • I've read this post a few times and I'm still not clear on what you're trying to do. Probably just me though. Commented Nov 30, 2015 at 18:31
  • I finally figured out how to do what I wanted. I'm not sure it was the most efficient, but it worked well to use a hashtable to keep track of how many times each file was found. As I found a file, I checked to see if it was in the hashtable or not. If it was not, I added it with a value of 1. If it was there with a value of 1, I changed the value to 2. Otherwise, I deleted the file. If it would be helpful to see some of the specific code, I can post it. Commented Dec 2, 2015 at 20:41
  • @Brian, please consider marking the answer as the solution if it solved your issue. Commented May 27 at 16:52
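
The code behind the hashtable comment above was never posted, so the following is only a guess at what that counting approach might look like. The function name Get-ExcessBackup, the regex, and the choice to visit files newest-first by LastWriteTime are assumptions made for illustration.

```powershell
# Hypothetical helper: emits every file beyond the newest $Keep per spreadsheet.
function Get-ExcessBackup {
    param(
        [object[]] $Files,
        [int] $Keep = 2
    )
    # key: directory + spreadsheet name, value: how many backups were seen so far
    $seen = @{}
    # Visit newest first, so the first $Keep hits per key are the ones to keep.
    foreach ($file in ($Files | Sort-Object -Property LastWriteTime -Descending)) {
        # Assumed naming scheme: <sheet>_<date>_<time>.bps
        if ($file.Name -match '^(?<sheet>.+)_\d+_\d+\.bps$') {
            $key = '{0}|{1}' -f $file.DirectoryName, $Matches['sheet']
            if (-not $seen.ContainsKey($key)) { $seen[$key] = 0 }
            $seen[$key]++
            # Anything past the first $Keep files for this key is an older backup.
            if ($seen[$key] -gt $Keep) { $file }
        }
    }
}
```

With real files this could be previewed as `Get-ExcessBackup -Files (Get-ChildItem -Path . -Recurse -File) | Remove-Item -WhatIf`; removing -WhatIf would make the deletion real, so the preview is worth reading first.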

1 Answer

This removes every file except the two newest for each 'key' in each company's backup folder. Which files count as newest is decided by LastWriteTimeUtc.

In this code, $local:allFiles = @{} in the Filter-BackupFiles function is a key-value hashtable. THE KEY is the part of the file name before the _12345_12345.bps suffix (the named group fileKey in the regex; in my test override it is everything before the last _ symbol), and THE VALUE is an array of file objects (not file names). To each file I add (using Add-Member) an attribute for sorting (in my case it is LastWriteTimeUtc; you can use something else in your case).

Then, for each key (that is, each file name prefix), I sort the list of file objects by date descending and add all but the first two ($Script:KeepMostRecent = 2) to the removal list. Because of the sort order, the first two are the two newest files.
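
To see what the key looks like for the file names from the question, here is a small standalone check of the .bps pattern. Get-BackupKey is a hypothetical helper name used only for this illustration; the regex itself is the first $Script:RegexFilter value from the script.

```powershell
# Hypothetical helper: extract the grouping key (the spreadsheet name) from a backup file name.
function Get-BackupKey {
    param([string] $FileName)
    # key, followed by _<digits>_<digits>.bps at the end of the name
    if ($FileName -match '^(?<fileKey>.*)_\d+_\d+\.bps$') {
        return $Matches['fileKey']
    }
    return $null   # the name does not follow the backup naming scheme
}

Get-BackupKey 'abssheet_091210_111006.bps'    # abssheet
Get-BackupKey 'gnu_sheet_081029_110455.bps'   # gnu_sheet
Get-BackupKey 'notes.txt'                     # no output ($null)
```

Because the date and time parts must be digits, an underscore inside the spreadsheet name (as in gnu_sheet) stays part of the key.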

    $Script:StartPath = 'D:\Test_1'
    $Script:KeepMostRecent = 2;
    $Script:BackupSubfolder = 'backup'
    $Script:RegexFilter = '^(?<fileKey>.*)_\d+_\d+\.bps$'
    $Script:SimulatingMode = $true

    #In my case regex is overridden because of file names
    $Script:RegexFilter = '^(?<fileKey>.+)_[^_]+\.txt$' #THIS IS DEV OVERRIDE

    Function Log-Error {
        Param (
            [Parameter(Mandatory=$true)]
            [String]$LogMessage
        )
        Write-Host -ForegroundColor Yellow "Error: $($LogMessage)";
    } #End of Log-Error

    Function Get-Companies {
        Param (
            [Parameter(Mandatory=$true)]
            [String]$SearchBase
        )
        $local:companyDirectoies = @()
        try {
            $local:companyDirectoies = @(Get-ChildItem -Path $SearchBase -Recurse:$false -ErrorAction Stop |
                Where-Object {$_.psIsContainer -eq $true} -ErrorAction Stop |
                ForEach-Object {return $_.FullName} -ErrorAction Stop )
        } catch {
            Log-Error -LogMessage $([String]::Format("Error while getting companies list: {0}", $_.Exception.Message))
            return $null
        }
        return $local:companyDirectoies
    }

    Function Get-BackupFiles {
        Param (
            [Parameter(Mandatory=$true)]
            [String]$CompanyDirectoryPath
        )
        $local:files = @();
        try {
            $local:files = @( Get-ChildItem -Path $CompanyDirectoryPath -Recurse:$false -ErrorAction Stop |
                Where-Object {$_.psIsContainer -eq $false} -ErrorAction Stop)
        } catch {
            Log-Error -LogMessage $([String]::Format("Error while getting Backup file list for path {0}: {1}", $CompanyDirectoryPath ,$_.Exception.Message))
            return $null
        }
        return $local:files
    }

    Function Filter-BackupFiles {
        Param (
            [Parameter(Mandatory=$true)]
            [Object[]]$CompanyBackupFiles
        )
        $local:allFiles = @{};
        $local:filesToRemove = @()
        foreach ($local:f in $CompanyBackupFiles) {
            $local:lastFileDate = $local:f.LastWriteTimeUtc
            $local:f | Add-Member -MemberType NoteProperty -Name 'LastDate' -Value $( $local:lastFileDate )
            $local:fileName = $local:f.Name
            if ($local:fileName -match $Script:RegexFilter) {
                $local:fileKey = $Matches['fileKey']
                #Use -cnotcontains if you need case-sensitive filtering
                if ($local:allFiles.Keys -notcontains $local:fileKey) {
                    $local:allFiles[$local:fileKey] = @()
                }
                $local:allFiles[$local:fileKey] += @($local:f)
            } else {
                Log-Error -LogMessage $([String]::Format( "Error - the file name {0} does not match regEx. None will be processed for this list",$local:fileName))
                return $null
            }
        }
        foreach ($local:k in $local:allFiles.Keys) {
            Write-Host -ForegroundColor White "Checking files for key $($local:k)"
            $local:files = @( $local:allFiles[$local:k] | Sort-Object -Property 'LastDate' -Descending )
            $local:filesToKeep = $Script:KeepMostRecent
            foreach ($local:f in $local:files) {
                $local:filesToKeep--
                Write-Host -ForegroundColor White -NoNewline "$($local:f.FullName)`t$($local:f.LastDate)"
                if ($local:filesToKeep -lt 0) {
                    $local:filesToRemove += @($local:f.FullName)
                    Write-Host -ForegroundColor Red "`tMARKED TO REMOVE"
                } else {
                    Write-Host -ForegroundColor Green "`tMARKED TO LIVE"
                }
            }
        }
        return $local:filesToRemove
    }

    Function _main {
        $local:AllFilesToRemove = @()
        $local:companiesPathList = @(Get-Companies -SearchBase $Script:StartPath)
        if ($local:companiesPathList.Count -le 0) {
            Log-Error -LogMessage "Companies list is empty"
            return
        }
        forEach ($local:comanyPath in $local:companiesPathList) {
            Write-Host -ForegroundColor White "`r`n`r`nProcessing company on path $($local:comanyPath)"
            $local:companyBackupFolder = ""
            try {
                $local:companyBackupFolder = $( Join-Path -Path $local:comanyPath -ChildPath $Script:BackupSubfolder -ErrorAction Stop )
                $local:allCompanyFiles = Get-BackupFiles -CompanyDirectoryPath $local:companyBackupFolder -ErrorAction Stop
            } catch {
                Log-Error -LogMessage "Error getting backup files for company $($local:comanyPath) : $($_.Exception.Message)"
            }
            if (($local:allCompanyFiles.Count -le 0) -or ($local:allCompanyFiles -eq $null)) {
                Log-Error -LogMessage "Company $($local:companyBackupFolder) does not have files in backup. Will ignore it."
                continue
            }
            $local:companyFilesToRemove = Filter-BackupFiles -CompanyBackupFiles $local:allCompanyFiles
            if (($local:companyFilesToRemove.Count -le 0) -or ($local:companyFilesToRemove -eq $null)) {
                Write-Host -ForegroundColor Cyan "Company $($local:comanyPath) does not have files to remove. Will ignore it."
                continue
            }
            Write-Host -ForegroundColor White "Company $($local:comanyPath) have $($local:companyFilesToRemove.Count) file to remove"
            $local:AllFilesToRemove += @( $local:companyFilesToRemove )
        }
        Write-Host -ForegroundColor White "Totally we have $($local:AllFilesToRemove.Count) files to remove"
        foreach ($local:f in $local:AllFilesToRemove) {
            Write-Host -ForegroundColor White "Removing $($local:f)"
            try {
                Remove-Item -Path $local:f -Force -Confirm:$false -WhatIf:$Script:SimulatingMode -ErrorAction Stop
            } catch {
                Log-Error -LogMessage "Error removing file $($local:f) : $($_.Exception.Message)"
            }
        }
    }

    _main

So the output will be

    Processing company on path D:\Test_1\Company1
    Checking files for key File1_Custom_Name
    D:\Test_1\Company1\backup\File1_Custom_Name_bak1.txt    02/28/2016 07:07:38    MARKED TO LIVE
    D:\Test_1\Company1\backup\File1_Custom_Name_Bak2.txt    02/28/2016 07:06:38    MARKED TO LIVE
    D:\Test_1\Company1\backup\File1_Custom_Name_Bak3.txt    02/28/2016 07:05:38    MARKED TO REMOVE
    Checking files for key File2
    D:\Test_1\Company1\backup\File2_Bak1.txt    02/28/2016 07:07:38    MARKED TO LIVE
    D:\Test_1\Company1\backup\File2_Bak2.txt    02/28/2016 07:06:38    MARKED TO LIVE
    D:\Test_1\Company1\backup\File2_Bak3.txt    02/28/2016 07:05:38    MARKED TO REMOVE
    Company D:\Test_1\Company1 have 2 file to remove

    Processing company on path D:\Test_1\Company2
    Checking files for key File2
    D:\Test_1\Company2\backup\File2_Bak3.txt    02/28/2016 07:58:34    MARKED TO LIVE
    D:\Test_1\Company2\backup\File2_Bak2.txt    02/28/2016 07:58:31    MARKED TO LIVE
    D:\Test_1\Company2\backup\File2_Bak1.txt    02/28/2016 07:58:28    MARKED TO REMOVE
    Checking files for key File4
    D:\Test_1\Company2\backup\File4_Bak1.txt    02/28/2016 07:59:43    MARKED TO LIVE
    D:\Test_1\Company2\backup\File4_Bak3.txt    02/28/2016 07:58:42    MARKED TO LIVE
    D:\Test_1\Company2\backup\File4_Bak2.txt    02/28/2016 07:58:39    MARKED TO REMOVE
    Checking files for key File1
    D:\Test_1\Company2\backup\File1_Bak3.txt    02/28/2016 07:58:25    MARKED TO LIVE
    D:\Test_1\Company2\backup\File1_Bak2.txt    02/28/2016 07:58:22    MARKED TO LIVE
    D:\Test_1\Company2\backup\File1_bak1.txt    02/28/2016 07:58:17    MARKED TO REMOVE
    Company D:\Test_1\Company2 have 3 file to remove

    Processing company on path D:\Test_1\Company3
    Checking files for key File2
    D:\Test_1\Company3\backup\File2_Bak1.txt    02/28/2016 07:07:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File2_Bak2.txt    02/28/2016 07:06:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File2_Bak3.txt    02/28/2016 07:05:38    MARKED TO REMOVE
    Checking files for key File4
    D:\Test_1\Company3\backup\File4_Bak1.txt    02/28/2016 07:07:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File4_Bak2.txt    02/28/2016 07:06:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File4_Bak3.txt    02/28/2016 07:05:38    MARKED TO REMOVE
    Checking files for key File1
    D:\Test_1\Company3\backup\File1_bak1.txt    02/28/2016 07:07:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File1_Bak2.txt    02/28/2016 07:06:38    MARKED TO LIVE
    D:\Test_1\Company3\backup\File1_Bak3.txt    02/28/2016 07:05:38    MARKED TO REMOVE
    Company D:\Test_1\Company3 have 3 file to remove
    Totally we have 8 files to remove
    Removing D:\Test_1\Company1\backup\File1_Custom_Name_Bak3.txt
    Removing D:\Test_1\Company1\backup\File2_Bak3.txt
    Removing D:\Test_1\Company2\backup\File2_Bak1.txt
    Removing D:\Test_1\Company2\backup\File4_Bak2.txt
    Removing D:\Test_1\Company2\backup\File1_bak1.txt
    Removing D:\Test_1\Company3\backup\File2_Bak3.txt
    Removing D:\Test_1\Company3\backup\File4_Bak3.txt
    Removing D:\Test_1\Company3\backup\File1_Bak3.txt
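
For comparison, the same keep-the-newest-two idea can probably be expressed more compactly with Group-Object, which also covers the -Recurse case the question asks about because the directory becomes part of the grouping key. This is a sketch, assuming PowerShell 3 or later, the .bps naming scheme from the question, and trustworthy LastWriteTimeUtc values; Get-StaleBackup and its parameters are names invented here, not part of the script above.

```powershell
# Hypothetical filter: takes file objects from the pipeline and emits the ones to delete.
function Get-StaleBackup {
    param(
        [Parameter(ValueFromPipeline = $true)] $File,
        [int] $Keep = 2
    )
    begin { $all = New-Object 'System.Collections.Generic.List[object]' }
    process {
        # Collect only names that follow the <sheet>_<date>_<time>.bps scheme.
        if ($File.Name -match '_\d+_\d+\.bps$') { $all.Add($File) }
    }
    end {
        $all |
            # One group per directory and spreadsheet name (date/time suffix stripped).
            Group-Object -Property { '{0}|{1}' -f $_.DirectoryName, ($_.Name -replace '_\d+_\d+\.bps$', '') } |
            ForEach-Object {
                $_.Group |
                    Sort-Object -Property LastWriteTimeUtc -Descending |
                    Select-Object -Skip $Keep   # everything after the newest $Keep
            }
    }
}
```

A dry run over the test tree would then be `Get-ChildItem -Path 'D:\Test_1' -Recurse -File | Get-StaleBackup -Keep 2 | Remove-Item -WhatIf`. Unlike the longer script, this variant silently skips files that do not match the naming scheme instead of aborting the whole folder, which may or may not be what you want.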
