After Using PhotoRec
Sorting the files recovered by PhotoRec can be hard. Here are some ideas to help with this process.
Sort files by extension
Using a PowerShell script under Windows
https://github.com/lconte/Copy-PhotoRecFilesbyExtension.ps1
Using a Python script
- You can use this Python script to sort the recovered files by extension.
- Save the following code as recovery.py, then run it with the source and destination directories as parameters.
Example: $ python3 recovery.py /home/me/recovered_files /home/me/sorted_files
#!/usr/bin/env python3
import os
import os.path
import shutil
import sys

source = sys.argv[1]
destination = sys.argv[2]

# Ask again until both directories exist.
while not os.path.exists(source):
    source = input('Enter a valid source directory\n')
while not os.path.exists(destination):
    destination = input('Enter a valid destination directory\n')

for root, dirs, files in os.walk(source, topdown=False):
    for file in files:
        # The folder name is the upper-cased extension without the dot.
        extension = os.path.splitext(file)[1][1:].upper()
        destinationPath = os.path.join(destination, extension)
        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        # Never overwrite a file that is already in the destination.
        if os.path.exists(os.path.join(destinationPath, file)):
            print('WARNING: this file was not copied: ' + os.path.join(root, file))
        else:
            shutil.copy2(os.path.join(root, file), destinationPath)
Using a more complex Python script
There are also more extensive Python programs that do the following with your recovered data:
- Sort all files by extension into separate folders.
- Limit the number of files per folder by creating subfolders once a certain number is exceeded. The limit can be customized.
- For all JPEGs: put them into one folder per year of creation (taken from the Exif data). Within each year, a folder is created for every event, e.g. all photos taken during one weekend or vacation are sorted into one folder.
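The event grouping described in the last item can be sketched as follows. This is only an illustration, not the code of any particular program: the function name `group_into_events` and the two-day gap threshold are assumptions, and the timestamps are assumed to have been read from the Exif data already.

```python
from datetime import datetime, timedelta


def group_into_events(timestamps, max_gap=timedelta(days=2)):
    """Split timestamps into events wherever two consecutive shots
    are more than max_gap apart (hypothetical rule)."""
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)   # close enough: same event
        else:
            events.append([ts])     # gap too large: start a new event
    return events


shots = [
    datetime(2017, 7, 1, 10, 0),
    datetime(2017, 7, 2, 18, 30),
    datetime(2017, 9, 15, 9, 0),
]
events = group_into_events(shots)
for event in events:
    # One folder per event, named after the year and the first day.
    print(event[0].strftime("%Y/%Y-%m-%d"), len(event), "photo(s)")
```

A fixed gap is the simplest possible rule; a real tool may well use a different one.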
Using a shell script for Mac OS X and Linux
Here is an alternative implementation which "copies" much more quickly (by creating "hard links"):
#!/bin/bash
recup_dir="${1%/}"
[ -d "$recup_dir" ] || {
  echo "Usage: $0 recup_dir"
  echo "Mirror files from recup_dir into recup_dir.by_ext, organized by extension"
  exit 1
}
# read -r keeps backslashes in file names intact
find "$recup_dir" -type f | while IFS= read -r k; do
  ext="${k##*.}"
  ext_dir="$recup_dir.by_ext/$ext"
  [ -d "$ext_dir" ] || mkdir -p "$ext_dir"
  echo "${k%/*}"
  ln "$k" "$ext_dir"
done
Save it as photorec-sort-by-ext and run
$ bash photorec-sort-by-ext /home/me/recovered_files
This will create a folder called /home/me/recovered_files.by_ext
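For comparison, the same hard-link idea can be sketched in Python. This is not a port of the shell script: the function name `mirror_by_extension` and the `none` folder for files without an extension are assumptions made for this example. Hard links only work when source and target are on the same file system.

```python
import os


def mirror_by_extension(recup_dir):
    """Hard-link every file under recup_dir into recup_dir.by_ext/<ext>/."""
    recup_dir = os.path.normpath(recup_dir)  # drops a trailing slash
    target_root = recup_dir + ".by_ext"
    for root, _dirs, files in os.walk(recup_dir):
        for name in files:
            # Extension without the dot; "none" is a hypothetical
            # bucket for files that have no extension at all.
            ext = os.path.splitext(name)[1][1:] or "none"
            ext_dir = os.path.join(target_root, ext)
            os.makedirs(ext_dir, exist_ok=True)
            link_path = os.path.join(ext_dir, name)
            if not os.path.exists(link_path):
                # A hard link is a second name for the same data,
                # so nothing is copied.
                os.link(os.path.join(root, name), link_path)
    return target_root
```

Calling `mirror_by_extension("/home/me/recovered_files")` would create `/home/me/recovered_files.by_ext`, as the shell script does.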
If you are only interested in files with a specific extension (e.g. only .jpg) you can use the following *nix command to find all files in the recovered directories and copy them to a new location:
$ find /path/to/recovered/directories -name \*.jpg -exec cp {} /path/to/new/folder/ \;
- https://github.com/danthem/PRECsort is another shell script that moves files based on extension, removes duplicate files, renames JPEGs...
JPEG
- JPEG file sorting using Exif meta-data. (Archived version)
- Canon PowerShot models store their image sequence numbers in the Exif data, so using a program that can dump Exif data to text like jhead, and the following Perl script, you can essentially restore all the JPG files to their original names. --Vees 01:59, 8 January 2007 (CET)
- You may have to install jhead first, e.g.: sudo apt install jhead --UlfZibis 16:00, 17 January 2018 (CET)
#!/usr/bin/perl -w

# read optional working directory from the command line:
$dir = (@ARGV > 0) ? $ARGV[0] : '.';
$dir =~ s/\/*$//; # truncate trailing '/'s

foreach $file (glob "$dir/*") {
  chomp $file;
  open(EXIF, '-|', 'jhead', '-v', $file) or die "Not found jhead $!";
  if (defined(<EXIF>)) {
    foreach $line (<EXIF>) {
      if ($line =~ /Canon maker tag 0008 Value = [1-9]\d\d(\d{1,8})$/) {
        rename($file, "$dir/IMG_$1.JPG");
        print "$dir/IMG_$1.JPG from $file\n";
        last;
      }
    }
    close EXIF or die "EXIF: $! $?";
  }
}
Additionally, restore the files into their original folder tree: --UlfZibis 16:10, 17 January 2018 (CET)
#!/usr/bin/perl -w

# read optional working directory from the command line:
$dir = (@ARGV > 0) ? $ARGV[0] : '.';
$dir =~ s/\/*$//; # truncate trailing '/'s

foreach $file (glob "$dir/*") {
  chomp $file;
  open(EXIF, '-|', 'jhead', '-v', $file) or die "Not found jhead $!";
  if (defined(<EXIF>)) {
    $folder_no = '000';
    $picture_no = '0000';
    $date = '0000';
    foreach $line (<EXIF>) {
      if ($line =~ /Canon maker tag 0008 Value = ([1-9]\d\d)(\d{1,8})$/) {
        $folder_no = $1;
        $picture_no = $2;
      }
      if ($line =~ /Time\s*:\s*\d{4}:(\d\d):(\d\d)\s[\d:]{8}$/) {
        $date = "$1$2";
        # $date = "$2$1"; # if camera language setting = german
        last;
      }
    }
    print "$dir/$folder_no\_$date/IMG_$picture_no.JPG from $file\n";
    mkdir("$dir/$folder_no\_$date");
    rename($file, "$dir/$folder_no\_$date/IMG_$picture_no.JPG");
    close EXIF or die "EXIF: $! $?";
  }
}
Or use this script to list all directories, search for files of a certain size, and place them in a date-based directory:
#!/usr/bin/perl -w

$working_dir = '/home/myhome/';
$result_dir = '/home/myhome/photos/';
$jhead_bin = '/usr/bin/jhead';

@rec_dirs = `ls ${working_dir} | grep recup_dir`;
foreach $recup_dir (@rec_dirs) {
  print "Scanning ${recup_dir}...";
  chomp $recup_dir;
  @photos_in_recup = `find ${working_dir}${recup_dir}/*jpg -type f -size +800k`;
  foreach $photo_file (@photos_in_recup) {
    chomp $photo_file;
    #print "IMG $photo_file in $recup_dir\n";
    @exif = `$jhead_bin -v $photo_file`;
    #print "$jhead_bin -v $photo_file\n";
    foreach $line (@exif) {
      if ($line =~ /Time\s*:\s*([0-9]{4}):([0-9]{2}):([0-9]{2})\s[0-9:]{8}$/) {
        print "IMG $photo_file $1-$2-$3\n";
        system("mkdir ${result_dir}$1-$2-$3");
        # system("mv $photo_file $result_dir/$1-$2-$3/");
        last;
      }
    }
  }
}
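The way the first two Perl scripts above split the Canon maker tag into a folder number and a picture number can be illustrated in Python. This sketch reuses the same regular expression; the function name `parse_canon_file_number` and the sample line are made up for the example, and real jhead output may be laid out differently.

```python
import re

# Same pattern as the Perl scripts: three folder digits starting with
# a non-zero digit, followed by the picture number.
CANON_TAG = re.compile(r"Canon maker tag 0008 Value = ([1-9]\d\d)(\d{1,8})$")


def parse_canon_file_number(line):
    """Return (folder_no, picture_no) from one line of `jhead -v`
    output, or None if the line does not carry the tag."""
    match = CANON_TAG.search(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up sample line in the format the pattern expects.
sample = "    Canon maker tag 0008 Value = 1001234"
folder_no, picture_no = parse_canon_file_number(sample)
print(f"{folder_no}/IMG_{picture_no}.JPG")
```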
- The following command recreates the original directory layout and file names present on the card (for Canon cameras, tested with numerous photos from an EOS 20D), using the file number EXIF info. ExifTool works under both Windows and Linux.
exiftool -r '-FileName<IMG_${FileIndex}%c.%e' DIR
It uses the FileIndex value from the file's EXIF information to restore the original file name; %c appends a digit to the name when the new name would otherwise duplicate an existing one. The -r option makes it work recursively.
- Issue the following command using Exiv2 to rename all JPEGs to their respective date (the program will ask what to do if conflicts occur):
$ exiv2 -t rename *.jpg
- When using the above exiv2 renaming and you have multiple thousands of files to rename, some shells might issue an error like "Argument list too long". In that case, use the following workaround:
$ find ./ -exec exiv2 -t rename {} \;
When the number of files is very large, it seems advisable to specify a default action, e.g. always rename duplicates (-F):
$ find ./ -exec exiv2 -F -t rename {} \;
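The idea behind date-based renaming with automatic handling of duplicates can be sketched in Python. This is not how Exiv2 is implemented: the function name `timestamp_name`, the name format and the `_1`, `_2` suffix scheme are assumptions chosen for illustration, and the timestamp is assumed to have been read from the Exif data already.

```python
from datetime import datetime


def timestamp_name(exif_datetime, extension, taken):
    """Build a date-based file name that does not clash with `taken`.

    exif_datetime uses the Exif text form "YYYY:MM:DD HH:MM:SS".
    """
    stamp = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    base = stamp.strftime("%Y%m%d_%H%M%S")
    candidate = base + extension
    counter = 1
    while candidate in taken:
        # Same second already used: append _1, _2, ...
        candidate = f"{base}_{counter}{extension}"
        counter += 1
    taken.add(candidate)
    return candidate


used = set()
first = timestamp_name("2007:01:08 01:59:00", ".jpg", used)
second = timestamp_name("2007:01:08 01:59:00", ".jpg", used)
print(first, second)
```

Photos taken in a burst often share the same second, which is why some rule for duplicates is needed when thousands of files are renamed unattended.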
Finding duplicates
- FSlint Duplicate file finder for Linux (very simple to handle, includes a GUI)
- Under Linux or Mac OS X (or with perl and 'sum'), you can find duplicates in a hierarchy using find_dup.
- Under Linux or Mac OS X, md5sum can be used to find duplicate files (perhaps hashing only the first x bytes).
In this example, we checksum the first 80k of each file matching recup_dir.*/*.sib:
$ for file in recup_dir.*/*.sib; do MD5=`dd count=20 bs=4k if="$file" 2> /dev/null|md5sum`; echo "$MD5 $file"; done|sort
1a07198de3486ff2ecab7859612fe7ba - Box Clever.sib
33105f4a7997b2e2681e404b3ac895f2 - Random, Matching - 2 bars.sib
376e0c53e78e56ba6f2858d9680f8c6b - 01aIdentifyCommonInst.sib
b0b40a516a1e26660748a0a09cdf3207 - 01ArticulationFlashcards.sib
Each checksum is unique - there are no duplicates.
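The same partial-checksum approach can be sketched in Python, which also groups the matching files instead of leaving the comparison to the reader. The function name `find_partial_duplicates` and its `head_bytes` parameter are assumptions made for this example. Files whose first bytes match are only candidates; a full comparison is still needed to confirm them.

```python
import hashlib
from collections import defaultdict


def find_partial_duplicates(paths, head_bytes=80 * 1024):
    """Group files whose first head_bytes bytes have the same MD5."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.md5(handle.read(head_bytes)).hexdigest()
        groups[digest].append(path)
    # Keep only checksums shared by two or more files.
    return [group for group in groups.values() if len(group) > 1]
```

It could be called with, for instance, `find_partial_duplicates(glob.glob("recup_dir.*/*.sib"))` after `import glob`.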
- On Windows you can use the fc utility to find duplicates - the following batch file (does not work on Win9x/ME) might help: --Joey 08:36, 17 July 2008 (CEST)
@echo off
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILELIST=
FOR %%i IN (*) DO (
  FOR %%j IN (!FILELIST!) DO (
    IF %%~zi EQU %%~zj (
      fc /b "%%~i" "%%~j">NUL && echo "%%~i" = "%%~j"
    )
  )
  SET FILELIST=!FILELIST! "%%~i"
)
ENDLOCAL
- On Windows, to include subdirectories as well, you may add a "/r" (without the quotes) after both "for"s in the above batch file.
- On Unix machines, you can use fdupes and the following script to generate a shell script with rm statements to remove all duplicate files:
#!/bin/sh
OUTF='rm-dups.sh'
if [ -e $OUTF ]; then
  echo "File $OUTF already exists."
  exit 1;
fi
echo "#!/bin/sh" > $OUTF
fdupes -r -f . | sed -r 's/(.+)/rm \"&\"/' >> $OUTF
chmod +x $OUTF
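The sed expression above wraps each file name in double quotes, which can go wrong for names that contain a double quote, a dollar sign or a backtick. A hedged alternative is to build the rm lines in Python with `shlex.quote`. The function name `rm_script_lines` is made up for this sketch, and the input is assumed to be a list of paths with empty strings between groups, as produced by splitting the fdupes output into lines.

```python
import shlex


def rm_script_lines(duplicate_paths):
    """Turn a list of paths into the lines of a shell script that
    removes them, quoting each path safely."""
    lines = ["#!/bin/sh"]
    for path in duplicate_paths:
        if path:  # skip the empty separator lines between groups
            # "--" stops option parsing for names that start with "-".
            lines.append("rm -- " + shlex.quote(path))
    return lines


for line in rm_script_lines(["./a b.jpg", "", './it"s.jpg']):
    print(line)
```

As with the shell version, the generated script should be reviewed before it is run.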
MP3, MP4, Ogg Vorbis...
Most MP3, MP4 and Ogg files have embedded information about title, album and author. To automatically rename the recovered audio and video files using this information, you can use:
exiftool -r -ext mp3 "-filename<D:\NewDirPath\${artist;} - ${album;}\${title;}%-c.%e" "D:\recup_dir.1"
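The target name built by this command has the form "artist - album\title.extension". How such a name can be assembled from tag values is sketched below in Python. This is an illustration only, not ExifTool's own logic: the function name `tagged_path` and the set of characters removed are assumptions, and the tag values are assumed to have been read from the file already.

```python
import re


def tagged_path(artist, album, title, extension):
    """Build "<artist> - <album>/<title>.<ext>" from tag values."""
    def clean(value):
        # Drop characters that are not allowed in Windows file names.
        return re.sub(r'[\\/:*?"<>|]', "", value).strip()

    folder = f"{clean(artist)} - {clean(album)}"
    return f"{folder}/{clean(title)}.{extension}"


example = tagged_path("AC/DC", "Back in Black", "Hells Bells", "mp3")
print(example)
```

Cleaning the tag values matters because a slash or colon in an artist or title would otherwise produce an invalid or unintended path.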
Microsoft Office
- To read a broken Microsoft Office document (doc/xls/ppt/...) that MS Office could not read, you can try LibreOffice.
LibreOffice is a multiplatform and multilingual office suite and an open-source project. Compatible with all other major office suites, the product is free to download, use, and distribute.
- Some Microsoft Office documents (xls/ppt/...) may be recovered with a Word .doc extension - you may need to rename these files.
MS Outlook
- To recover a broken Outlook PST file, try Microsoft Scanpst
Messenger Log File
- To recover a broken Messenger log file, try Vital. Here is a description of the Messenger Log file format.