# Importing Duplicates?



## Rick (Mar 23, 2016)

Does anyone know what criteria LR 6 uses to identify "Suspected Duplicates"?

I am finding that it will re-import identical files if they are being imported from outside the Parent Folder where my catalogued photos are actually stored. I normally dump my jpgs and RAW files from the camera memory cards into a folder called Importing Folder. If I try to import images from this folder that are already in the catalog (true Duplicates), LR does NOT recognize them as Duplicates. It will allow me to import the files and it will attach a "-2" to the end of the filename, even though I have chosen to Not Import Suspected Duplicates in the Import dialog. Even if I choose to import only New Photos, rather than All Photos in the Import dialog, LR doesn't recognize them as duplicates. I have tried Copying, Moving and Adding, without understanding what is going on here.

As a test, I tried choosing to import images from a sub-folder where I store the pictures already in my catalog. (I store the catalogued files by Date.) When I did this, LR correctly identified the pictures that were "suspected duplicates" and greyed them out. So this leaves me wondering what the criteria are that are used to recognize "Suspected Duplicates". Is it the full extended path name (including containing folders)?

I have tried Optimizing the Catalog and re-setting preferences. Could it be that LR or my catalog is corrupted and this is causing the problem? Do I need to re-install LR 6 or create a new catalog?

Any clarification of this puzzling issue would be much appreciated. 

Rick


----------



## Hal P Anderson (Mar 23, 2016)

Rick,
Welcome to the forum.

To be considered a duplicate, the file must match on all of the following:

- Original file name (as it was when first imported)
- File size
- EXIF capture date and time

If all those conditions are met, the images are considered duplicates and should not be imported if you have the "Don't import suspected duplicates" box ticked.
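
For anyone who finds it easier to read as code, here is a minimal sketch of that three-part rule. It is only an illustration of the criteria listed above, not Lightroom's actual implementation; the names `ImportKey` and `is_suspected_duplicate` are made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ImportKey:
    """Hypothetical record of the three properties compared on import."""
    original_name: str       # file name as it was when first imported
    size_bytes: int          # file size
    capture_time: datetime   # EXIF capture date and time


def is_suspected_duplicate(candidate: ImportKey, known: set) -> bool:
    # All three fields must match exactly; a difference in any one of
    # them means the file is treated as new.
    return candidate in known


known = {ImportKey("IMG_0001.JPG", 4_200_000, datetime(2016, 3, 20, 14, 5, 9))}

same = ImportKey("IMG_0001.JPG", 4_200_000, datetime(2016, 3, 20, 14, 5, 9))
resized = ImportKey("IMG_0001.JPG", 4_200_512, datetime(2016, 3, 20, 14, 5, 9))

print(is_suspected_duplicate(same, known))     # True
print(is_suspected_duplicate(resized, known))  # False
```

The second case shows why a file whose size has changed, even slightly, would slip past the check while the name and capture time still match.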


----------



## Rick (Mar 23, 2016)

By gosh, Hal, I think you have found the answer. When I checked two of the files that were duplicates, but were being "re-imported", the file sizes were different. That is undoubtedly why the files were not recognized as duplicates. I think that may be because the file was originally saved to a MacBook Air and some of these files have been moved around a bit before being imported into a Lightroom Catalog on a Windows 7 desktop PC. I wouldn't be surprised if the file sizes are different on the different platforms and file systems.

The next question is: Is there a way to relax the File Size criterion in LR so that it would look only at the file name and the EXIF data to determine a duplicate? I am sure that I am not the first (and won't be the last) to move images across platforms.

Thank you Hal. 

Understanding the problem puts me on the road to a solution.

Rick


----------



## Hal P Anderson (Mar 23, 2016)

Rick,
I'm not convinced that merely moving a file between platforms would cause it to change size, and I can't test it. Do the files that still have the same sizes show as duplicates?


----------



## Rick (Mar 24, 2016)

I think that the differences in the file systems and the type of attributes that they store with the file result in file sizes that are different enough for LR to see them as different files. This article seems to support that line of thought:

(Apple link) In particular, see the last paragraph in Kurt Lang's comments. I knew that the old Mac OS had additional attributes, but I wasn't sure about OS X.

The files that I have left all seem to be a different size from the ones in the catalog. I have been at this a few days (weeks?, months?) now and it seems that I have deleted most of the "Duplicates" that have the same file size as the catalogued files, leaving only the identical files with different file sizes (which do not get picked up by LR). I have tested a few Duplicate File Finder programs and they seem to exhibit the same behaviour, unless there is an option to allow matches when the file size differs by a user-specified maximum (e.g. CloneSpy). I will keep looking for an identical file to one in the catalog that LR DOES pick up as a Suspected Duplicate to further test this hypothesis. 
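
One way to test the hypothesis outside of LR is to compare a catalogued file with its suspected twin byte for byte. The following is a rough sketch using only the Python standard library; `compare_files` and `sha256_of` are names invented for this example and the paths in the usage comment are placeholders. It reports the size of each file and whether the contents hash to the same value, which would show whether the data itself differs or only what the operating system reports.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large RAW files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_files(path_a, path_b):
    """Return sizes and whether the two files are byte-for-byte identical."""
    a, b = Path(path_a), Path(path_b)
    size_a, size_b = a.stat().st_size, b.stat().st_size
    same_size = size_a == size_b
    # Files of different length cannot be identical, so skip hashing then.
    same_content = same_size and sha256_of(a) == sha256_of(b)
    return {
        "size_a": size_a,
        "size_b": size_b,
        "same_size": same_size,
        "same_content": same_content,
    }


# Usage (placeholder paths, substitute your own):
# print(compare_files("catalog/2016-03-20/IMG_0001.JPG",
#                     "Importing Folder/IMG_0001.JPG"))
```

If `same_size` comes back `False`, the two files really do contain different bytes (for example because some program rewrote metadata inside one of them), which is a different situation from the file system merely storing extra attributes alongside the file.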

Thanks for your help.

Rick


----------



## Sprocket (May 30, 2016)

I have had endless trouble with the way LR handles the duplicates issue. Working with a friend's archive of almost 400,000 images, I keep running into situations where duplicates slip through. Here's a current one: an old external backup drive was found in a camera bag, and I need to check whether its files (3400) are already in the catalog and saved on the NAS. LR reports that they are all new, but I seem to have seen some of the images before. A check shows that, sure enough, I had imported (at least that one and its neighbors) a couple of years back. The only difference I can see is that the one in the catalog is a DNG and the one from the newly found drive is the original CR2. Does LR really look at the file extension literally in that way, and if so, how do I do this import and not bring in duplicates?


----------



## airhawk10 (May 30, 2016)

I believe LR looks at the file name. If the file name is in any way different from what is already in your catalog, LR assumes it is a new file. If you suspect your CR2 files are duplicates, you can show all these files by using the filter tool in the Library module. In the text attribute, type in .CR2. This will pull up all the CR2 files. If you decide these are all duplicates, you can quickly flag them all as rejected. Hope this helps.


----------



## Victoria Bampton (May 30, 2016)

LR looks at the entire filename and file size and capture time of the file on the card and compares them against the same metadata of the file AT THE TIME OF ITS IMPORT.

So if the photo was a CR2 when it was originally imported into THIS catalog and then converted to DNG (even as part of the import process), then the same CR2 on the card should be recognized as a duplicate. But if the photo was initially imported into a different catalog, or was converted to DNG first, then it won't know the difference.
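
To make the "at the time of its import" point concrete, here is a toy model of the behaviour described above. It is a sketch under stated assumptions, not how LR is actually written: `ToyCatalog` and `Fingerprint` are hypothetical names, and the only thing modelled is that the catalog remembers the name, size and capture time each file had when it was first imported, regardless of later conversion.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePath


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical: what a file looked like at the moment of import."""
    name: str
    size_bytes: int
    capture_time: datetime


class ToyCatalog:
    def __init__(self):
        self._import_fingerprints = set()  # never changed after import
        self.current_names = []            # what the catalog shows today

    def is_suspected_duplicate(self, fp):
        return fp in self._import_fingerprints

    def import_file(self, fp, convert_to_dng=False):
        """Return True if imported, False if skipped as a duplicate."""
        if self.is_suspected_duplicate(fp):
            return False
        self._import_fingerprints.add(fp)
        name = fp.name
        if convert_to_dng:
            # The stored fingerprint keeps the original CR2 name and size.
            name = str(PurePath(name).with_suffix(".dng"))
        self.current_names.append(name)
        return True


shot = datetime(2016, 5, 30, 12, 0, 0)
cr2 = Fingerprint("IMG_0001.CR2", 25_000_000, shot)

catalog = ToyCatalog()
print(catalog.import_file(cr2, convert_to_dng=True))  # True
print(catalog.current_names)                          # ['IMG_0001.dng']
print(catalog.import_file(cr2))                       # False

other_catalog = ToyCatalog()
print(other_catalog.import_file(cr2))                 # True
```

The last line is the case where the photo was first imported into a different catalog: this catalog has no record of the CR2, so it has nothing to match against.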

If I had to import the suspected duplicates, I might mark the new import with a purple label immediately after import so I could easily spot them when scrolling through and remove any duplicates, or use the Duplicate Finder plug-in to identify them once they're in the catalog.


----------



## PhilBurton (May 30, 2016)

Rick said:


> I think that the differences in the file systems and the type of attributes that they store with the file result in file sizes that are different enough for LR to see them as different files. This article seems to support that line of thought:
> 
> I have tested a few Duplicate File Finder programs and they seem to exhibit the same behaviour, unless there is an option to allow matches when the file size differs by a user-specified maximum (e.g. CloneSpy). I will keep looking for an identical file to one in the catalog that LR DOES pick up as a Suspected Duplicate to further test this hypothesis.
> 
> ...


Rick,

Reply to an old post.  Which Duplicate File Finder programs, free or paid, have you tested?

Phil


----------



## Hiace_Drifter (May 31, 2016)

Hmmm .... I am having a problem with duplicates which I believe should be spotted but aren't:

1. Import Fuji raw files (batch "A"), these are converted to DNG in LR
2. A few days later import new Fuji RAW files (batch "B") from the same card, which still contains batch A
3. LR imports all of the files, even though Batch A have been imported before (but converted of course)

...surely it knows the files are dupes, even though the extension is .RAF on the memory card and .DNG in LR?


----------



## clee01l (May 31, 2016)

neilp2000 said:


> Hmmm .... I am having a problem with duplicates which I believe should be spotted but aren't:
> 
> 1. Import Fuji raw files (batch "A"), these are converted to DNG in LR
> 2. A few days later import new Fuji RAW files (batch "B") from the same card, which still contains batch A
> ...


This was covered by Victoria above + Quote. Though generally this process works well, I had a similar experience last week in the Trans-Pecos with my Nikon. I imported NEFs from the camera card (no conversion) – batch A. Later I imported batch B from the same card, which still contained batch A. Batch A was identified as duplicate, as expected. When I imported batch C, batch A was identified as a duplicate but batch B was not. On importing batch D, batches A and C were ID'd as duplicates but not batch B. Batch B may have contained file numbers that included sequence 9999 followed by sequence 0001 (i.e., the camera odometer tripped 10000 images).


----------



## Sprocket (Jun 1, 2016)

Sorry for having been out of the discussion for a couple of days - I thought I'd opted to watch the thread but somehow that didn't happen.  Anyway thanks for the replies.

Victoria, the files in the catalog were made from exactly the same CR2 images, but it's possible that they were not from the drive that I was given to check, i.e., they were transferred, imported and converted to DNG from a different backup. Nevertheless, the name, created by date and filesize have not been changed in any way as far as I know: only the fact that the ones in the catalog are DNG seems to make them different. I read somewhere that copying a file to different devices can result in a tiny change to the size; one hopes that LR is smart enough to have some fuzziness built in to take care of something like that. Judging by some of the above posts, the duplicate identification is not 100% reliable, and whereas that might be tolerable in a consumer program like Elements, it's not acceptable in LR, which many professionals rely on.

I do see how it can be got round but (having done this kind of thing before) checking even colour flagged images for dupes can be a long, tiresome (and therefore expensive) process.  I haven't yet found a routine (paid or otherwise) that will check for and identify duplicates reliably enough to allow automatic deletion without my having to manually confirm each set, but if anyone has a candidate I'd be pleased to give it a try!

What LR sorely needs is a list of rules on which to base the "Do Not Import Suspected Duplicates" decision so the user can select the right set for any import situation, including those where there is a different filename, for example if an OS  has suffixed a copy number as a result of the same file having been put into a backup folder more than once.
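
To show the sort of thing I mean, here is a rough sketch of user-selectable matching rules. This is purely hypothetical and is not a feature of LR: `MatchRules`, `Photo`, `name_key` and `matches` are invented names, and the copy-suffix pattern is a crude assumption that only handles endings like `-2` or ` (2)`.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePath

# Crude, assumed pattern for suffixes added to copies: "-2" or " (2)".
COPY_SUFFIX = re.compile(r"(?:-\d| \(\d+\))$")


@dataclass(frozen=True)
class Photo:
    name: str
    size_bytes: int
    capture_time: datetime


@dataclass(frozen=True)
class MatchRules:
    ignore_extension: bool = False   # treat IMG_1.CR2 and IMG_1.dng alike
    strip_copy_suffix: bool = False  # treat IMG_1-2.CR2 like IMG_1.CR2
    compare_size: bool = True        # set False to ignore file size
    size_tolerance_bytes: int = 0    # allowed size difference when comparing


def name_key(name, rules):
    path = PurePath(name)
    stem, ext = path.stem, path.suffix
    if rules.strip_copy_suffix:
        stem = COPY_SUFFIX.sub("", stem)
    return stem if rules.ignore_extension else stem + ext


def matches(a, b, rules):
    if name_key(a.name, rules) != name_key(b.name, rules):
        return False
    if a.capture_time != b.capture_time:
        return False
    if rules.compare_size:
        return abs(a.size_bytes - b.size_bytes) <= rules.size_tolerance_bytes
    return True


shot = datetime(2016, 6, 1, 9, 30, 0)
original = Photo("IMG_0001.CR2", 25_000_000, shot)
os_copy = Photo("IMG_0001-2.CR2", 25_000_000, shot)
converted = Photo("IMG_0001.dng", 21_000_000, shot)

print(matches(original, os_copy, MatchRules()))                        # False
print(matches(original, os_copy, MatchRules(strip_copy_suffix=True)))  # True
print(matches(original, converted,
              MatchRules(ignore_extension=True, compare_size=False)))  # True
```

The default `MatchRules()` is the strict name, size and capture time comparison; each relaxed rule trades some safety for catching more of the cases discussed in this thread.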

I'd love to be told that my problems are caused by something I'm doing wrong, but based on my import woes to date I would think more than twice about using LR to manage a large and critical archive.


----------



## Victoria Bampton (Jun 1, 2016)

Sprocket said:


> I haven't yet found a routine (paid or otherwise) that will check for and identify duplicates reliably enough to allow automatic deletion without my having to manually confirm each set, but if anyone has a candidate I'd be pleased to give it a try!



The plug-in I linked to above allows you to select a combination of EXIF metadata criteria, so it's as close as you'll get to identifying accurately.


----------



## Hiace_Drifter (Jun 2, 2016)

clee01l said:


> This was covered by Victoria above + Quote. Though generally this process works well, I had a similar experience last week in the Trans-Pecos with my Nikon. I imported NEFs from the camera card (no conversion) – batch A. Later I imported batch B from the same card, which still contained batch A. Batch A was identified as duplicate, as expected. When I imported batch C, batch A was identified as a duplicate but batch B was not. On importing batch D, batches A and C were ID'd as duplicates but not batch B. Batch B may have contained file numbers that included sequence 9999 followed by sequence 0001 (i.e., the camera odometer tripped 10000 images).



Yes, the reason I asked is because what Victoria said should happen isn't happening, i.e.:


> "So if the photo was a CR2 when it was originally imported into THIS catalog and then converted to DNG (even as part of the import process), then the same CR2 on the card should be recognized as a duplicate."


----------



## Sprocket (Jun 2, 2016)

Thank you, Victoria.  I'm pretty sure I tried that one before but downloaded the manual anyway.  Unfortunately, it's still going to require manually checking the flagged files whereas what I'm looking for is a way of not importing them or at worst an explanation for this behavior in LR so that I can avoid a similar issue in the future.  Was hoping someone here knew why!


----------

