Geekzone: technology news, blogs, forums


JimsonWeed

#207511 30-Dec-2016 11:55

Greetings;


Over the years, I've collected a metric tonne of data. Naturally, as new computers entered the household, I would migrate archives of files from one computer to the next, and before long I ended up with an enormous amount of duplication. Since I'm a *nix fan, the platform and scripting are all Linux-based, so I'll be describing everything in those terms.


I ran a little utility called FSlint Janitor. One of its features is that it recursively searches your directory structures for duplicates, using MD5 values as its means of detecting them. Well... it found 90GB worth of duplicates (~43,900 files). You'd have to know what I do to understand why there is so much :) Anyhow, the output looks like this:


#2 x 41,747,941 (41,750,528)    bytes wasted
/home/APPS/Temp-A/Visual Studio 2008 Dev/msdn/cab35.cab
/home/APPS/VS8/Visual Studio 2008 Dev/msdn/cab35.cab
#5 x 10,363,392 (41,467,904)    bytes wasted
/home/APPS/Temp-A/Visual Studio 2008 Dev/TFC/WCU/DExplore/DExplore.exe
/home/APPS/Temp-A/Visual Studio 2008 Dev/WCU/DExplore/DExplore.exe
/home/APPS/VS8/Visual Studio 2008 Dev/TFC/WCU/DExplore/DExplore.exe
/home/APPS/VS8/Visual Studio 2008 Dev/WCU/DExplore/DExplore.exe
/home/APPS/VS8/Visual Studio 2008 Dev/msdn/WCU/DExplore/DExplore.exe
#2 x 40,891,792 (40,894,464)    bytes wasted
/home/Download/KindleForPC-installer.exe
/home/FROMGENERIC/Downloads/KindleForPC-installer.exe
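Each group in the report above starts with a header line of the form `#N x size (wasted) bytes wasted`, followed by N paths. Here is a minimal sketch of pulling those fields apart with `regexp`, assuming the layout is exactly as in that sample (other FSlint versions may format it differently). `parseHeader` and `totalWasted` are hypothetical helper names. Matching `\d+` after the `#`, rather than taking a single character, also copes with groups of ten or more copies.

```tcl
# Sketch: pull the fields out of an FSlint group header such as
#   #5 x 10,363,392 (41,467,904)    bytes wasted
# The layout is assumed from the sample report; other versions may differ.

# Return a list {count size wasted} with commas removed,
# or an empty list if the line is not a group header.
proc parseHeader {line} {
  set re {^#(\d+) x ([\d,]+)\s+\(([\d,]+)\)\s+bytes wasted}
  if {![regexp $re $line -> count size wasted]} {
    return {}
  }
  # Strip thousands separators so the numbers can be used in expr and incr.
  return [list $count [string map {, {}} $size] [string map {, {}} $wasted]]
}

# Add up the "bytes wasted" figure over every header in a list of lines.
proc totalWasted {lines} {
  set total 0
  foreach line $lines {
    set h [parseHeader $line]
    if {[llength $h] == 3} {
      incr total [lindex $h 2]
    }
  }
  return $total
}
```

Path lines simply come back as an empty list, so `totalWasted` can be handed every line of dupes.txt (for example `[split [read $fh] \n]`) without filtering first.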


So now I have a list of all the dupes and where they are located. Obviously, it's fairly easy to script this to remove the excess and keep one, or so one would think. I wrote a little programme to read the output file and, wherever there are 5 dupes, mark 4 for deletion and keep 1. Since I used to know Tcl fairly well, I chose to write a tclsh script:


------------------------


#!/usr/bin/tclsh

#set words [exec /usr/bin/md5sum $line]

proc rFile {_inFile} {
  global cmp iFile
  set iFile $_inFile
  set marker "#"
  set iFile [ open $_inFile r]
  while {[gets $iFile line] >= 0} {
    set cmp [string first $marker $line]
    if {$cmp == 0} {
       puts "MARK: $line"
       set kk "[string range $line 1 1]"
       set xx [expr $kk-1]
    } else {
       set stop [expr $xx-1]
         if {$stop == 0} {
           set words $line
           puts "DELETE: $stop $words"
           set stop [expr $xx-1]
         } else {
           set words $line
           puts "KEEP: $stop $words"
         }
    }
  }
  close $iFile
}
rFile {dupes.txt}


------------------------


Please don't laugh at my shoddy coding; it works right up until the decrement logic in the else branch. The current output looks something like this:


MARK: #3 x 1,142,274,512    (2,284,560,384)    bytes wasted
KEEP: 1 /home/PICTURES/ACamera/Paraparaumu/20131226_154036.mp4
KEEP: 1 /home/PICTURES/GALLERY/Paraparaumu/20131226_154036.mp4
KEEP: 1 /home/VAR/html/Movies/20131226_154036.mp4
MARK: #3 x 1,068,809,035    (2,137,628,672)    bytes wasted
KEEP: 1 /home/PICTURES/ACamera/Wellington Zealandia/20131228_125608.mp4
KEEP: 1 /home/PICTURES/GALLERY/WellingtonZealandia/20131228_125608.mp4
KEEP: 1 /home/VAR/html/Movies/20131228_125608.mp4
MARK: #3 x 936,830,839    (1,873,674,240)    bytes wasted
KEEP: 1 /home/PICTURES/ACamera/Akatarawa Valley/20131228_165043.mp4
KEEP: 1 /home/PICTURES/GALLERY/AkatarawaValley/20131228_165043.mp4
KEEP: 1 /home/VAR/html/Movies/20131228_165043.mp4


I can't remember how to structure the decrements so that it marks 4 and keeps 1 (etc.). Maybe I've structured it wrong altogether and simply coded myself into a corner. It's just been so freaking long, but I think a coder will see what I'm trying to do. Right now, I'm simply printing DELETE or KEEP for debugging. Ultimately that will be replaced with set var [exec rm -f $fName] or something similar.
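One common way to structure this is to set a counter once, when a `#N` header is read, and decrement it once per path. The loop above instead recomputes `stop` from `xx` on every path line, so it never changes and every file comes out as KEEP. Below is a minimal sketch of the counter approach that returns a list of DELETE/KEEP decisions instead of printing them. `planActions` is a hypothetical name, and, like the later solution in this thread, it keeps the last path of each group.

```tcl
# Sketch: turn FSlint report lines into DELETE/KEEP decisions.
# At each "#N ..." header the counter is reset to N-1, the number of
# copies to delete. Each path then decrements it: DELETE while it is
# still positive, KEEP for the final path of the group.
proc planActions {lines} {
  set actions {}
  set toDelete 0
  foreach line $lines {
    if {[regexp {^#(\d+)} $line -> count]} {
      # New group: reset the counter, once per header.
      set toDelete [expr {$count - 1}]
    } elseif {$line ne ""} {
      if {$toDelete > 0} {
        lappend actions [list DELETE $line]
      } else {
        lappend actions [list KEEP $line]
      }
      # Decrement once per path line.
      incr toDelete -1
    }
  }
  return $actions
}
```

Fed the first MARK group shown above (a `#3` header and three paths), it yields DELETE, DELETE, KEEP. Swapping each `lappend` for a `puts` reproduces the DELETE/KEEP debug output.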


Any thoughts or advice will be greatly welcomed and appreciated.


Cheers


JimsonWeed

  #1696522 30-Dec-2016 16:29

Well, I guess we can disregard this... I solved it :) I'll tweak it for its intended purpose, but if you use FSlint to find duplicates on a Linux box and want to delete them quickly, here you go. Save the FSlint output as dupes.txt (the filename hard-coded in the last line of the script), create /var/archive for the surviving copies, then run the script with its output redirected to another file, make that file executable, and off you go:

./dupeKill.tcl > kDupe.sh
chmod +x kDupe.sh
./kDupe.sh

 

---------------------------------------

 

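#!/usr/bin/tclsh

# Read an FSlint duplicate report and print a bash script that deletes
# every copy in a group except the last, which is moved to /var/archive.

# Wrap a string in single quotes for bash, so spaces, $ and backticks in
# filenames are passed through literally.
proc shQuote {s} {
  return "'[string map [list ' {'\''}] $s]'"
}

proc rFile {_inFile} {
  set iFile [open $_inFile r]
  puts "#!/bin/bash"

  set jj 0
  while {[gets $iFile line] >= 0} {
    # Group header, e.g. "#5 x 10,363,392 (41,467,904) bytes wasted".
    # The number after "#" (any number of digits) is the copy count.
    if {[regexp {^#(\d+)} $line -> count]} {
      set jj [expr {$count - 1}]
      continue
    }
    # Skip blank lines between groups.
    if {$line eq ""} continue

    if {$jj > 0} {
      # Plain files only, so -r is not needed.
      puts "rm -f -- [shQuote $line]"
    } else {
      # -n: do not clobber a same-named file already in the archive.
      # The trailing slash makes mv fail if /var/archive is not a directory.
      puts "mv -n -- [shQuote $line] /var/archive/"
    }
    incr jj -1
  }
  close $iFile
}
rFile {dupes.txt}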

 

---------------------------------------
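The same keep-one rule can also be applied directly from Tcl rather than by generating a bash script. Tcl's built-in `file delete` and `file rename` never go through a shell, so filenames with spaces, quotes or `$` need no escaping. This is a minimal sketch only. `applyReport` and its `dryRun` parameter are hypothetical, dry-run is the default so nothing is touched unless you pass 0, and the archive directory is assumed to exist already.

```tcl
# Sketch: apply the keep-one rule in-process. Deletes all but the last
# path of each group and moves the survivor into archiveDir.
# dryRun defaults to 1: it only prints what it would do.
proc applyReport {reportFile archiveDir {dryRun 1}} {
  set fh [open $reportFile r]
  set toDelete 0
  while {[gets $fh line] >= 0} {
    if {[regexp {^#(\d+)} $line -> count]} {
      set toDelete [expr {$count - 1}]
      continue
    }
    if {$line eq ""} continue
    if {$toDelete > 0} {
      puts "DELETE: $line"
      if {!$dryRun} { file delete -- $line }
    } else {
      set dest [file join $archiveDir [file tail $line]]
      puts "KEEP: $line -> $dest"
      if {!$dryRun} {
        # Refuse to overwrite a same-named file already in the archive.
        if {[file exists $dest]} {
          puts "SKIP (exists): $dest"
        } else {
          file rename -- $line $dest
        }
      }
    }
    incr toDelete -1
  }
  close $fh
}
```

Running `applyReport dupes.txt /var/archive` first gives a DELETE/KEEP listing to check. Only `applyReport dupes.txt /var/archive 0` actually removes and moves files.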

