Application Overview
Nablet Duplicates Search works with a single folder or a disk where the content is located. The library might be well organized into subfolders, or the files might be chaotically named in a single folder. Nablet Duplicates Search scans all the files in the specified folder, including subfolders, and compares them to find copies. The files may differ in content, for example because of cut scenes; in this case, Duplicates Search reports the percentage of duplicated content.
When you start the application, the main window opens.
To start the analysis, complete two steps:
- Set the search folder. To do so, use the Browse button.
- Set a path to the internal indexing (fingerprinting) files. To do so, go to File -> Preferences, or click the Preferences button. There, set the folder for the fingerprinting files and, optionally, specify a folder for the reports. Make sure that the disk where the folder for the indexing files is located has enough free space.
Once done, the application is ready to start the search.
For advanced settings, you can open the Preferences menu with File -> Preferences.
Preferences
The Preferences menu includes all the parameters that tweak the application behavior:
- Path to Fingerprinting Files – the folder where indexing files are stored. The indexing files might take up a lot of disk space if you process many files or long files, so be sure to use a disk with enough free space.
- Path to Reports – a folder where the resulting reports are stored. This folder is used to restore the results when you restart the application.
- Normalize Content Videos – whatever the original file resolution or frame rate is, normalize the file to an internal format for better search accuracy.
- Re-check Scanned Files – when you run the search multiple times, defines whether the software compares two files again if it already compared them during a previous run.
- Target Tolerance – an interval for the target (from video archive) files where the detected segments are considered as a single fragment. For example, if detected segments are 1 second away from each other, and the tolerance is 2.0 then both segments are included in the resulting report as a single fragment.
- Source Tolerance – like the target tolerance, but for the content files.
- Frames Tolerance – an interval, in frames, between the segments that is used to split the detected segments.
- Duplicates Threshold – a threshold for the detected duplicates. If the detected duplicate duration is longer than the specified part of the “original” file, the detected file is considered a “duplicate” and is displayed in the resulting report.
- Accuracy – sets the precision of the results. Possible values are “Similar”, “Accurate”, and “Precise”.
- Parallel Processing instances – sets how many files can be processed at the same time.
To apply the updated preferences, click the Save Preferences button. When you change the accuracy settings (tolerance values, duration, or accuracy level), the existing reports are refreshed once you save the updated preferences.
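The effect of a tolerance value can be illustrated with a minimal Python sketch. It is not the application's actual implementation: the `(start, end)` representation of a segment, the use of seconds as the unit, and the "less than or equal" comparison are all assumptions made for the illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a detected segment as (start, end) in seconds.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], tolerance: float) -> List[Segment]:
    """Join segments whose gap is not larger than `tolerance` (assumed rule)."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= tolerance:
            # The gap to the previous fragment is within the tolerance:
            # extend the previous fragment instead of starting a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two segments 1 second apart are joined with a tolerance of 2.0;
# the third one is 10 seconds away and stays separate.
detected = [(10.0, 15.0), (16.0, 20.0), (30.0, 35.0)]
print(merge_segments(detected, tolerance=2.0))
# prints [(10.0, 20.0), (30.0, 35.0)]
```

With a larger tolerance, more segments are reported as one fragment; with a tolerance of 0, only touching or overlapping segments are joined.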
Workflow
When the preferences are set, and the content and the archive folders are selected, you can start the search process with the Start button. Once started, the application performs normalization (if enabled), scans the video files, and fills the internal database with the content information. When a file from the archive has been scanned, the software searches for its fragments in the database.
When a match is detected, you can see the name of the content file and the name of the archive file in the list of Duplicates.
If there are no matches detected for a file, it is added to the “Unique” list.
If there is a problem with scanning a file, the file is displayed in the “Failed” list, where you can open the file in File Explorer or force another attempt to scan it.
For big archives, the processing might take time, and the scanning may consume system resources (especially if multiple instances are used). If you need the system for other operations, you can pause the process with the Pause button.
When all the files have been processed, the Status of the application becomes Pending. It means that the software is waiting for new files to be added to the library folder.
To stop the process, press the Stop button.
In the Detected Duplicates list, you can find a file by typing its name in the search field, or you can sort the list by file name or by the number of detected duplicates.
With the eye button, you can open a report for the selected file on the right side of the application. The folder button shows you the content file in the file explorer.
Reports
When you open a report, you see a list with the following information:
- The name of the duplicate file,
- The percentage of the duplicated content duration,
- Basic information about both the source and the duplicate file (duration, video codec, resolution, bitrate),
- As a tooltip of each duplicate segment, the duration of the segment and the segment’s percentage.
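The percentage figures in a report, and their relation to the Duplicates Threshold preference, can be sketched as follows. This is an illustration under stated assumptions rather than the application's algorithm: segments are assumed to be non-overlapping `(start, end)` pairs in seconds, the percentage is assumed to be taken relative to the duration of the “original” file, and the threshold is assumed to be expressed in percent.

```python
from typing import List, Tuple

# Hypothetical representation: a duplicate segment as (start, end) in seconds.
Segment = Tuple[float, float]


def duplicated_percentage(segments: List[Segment], file_duration: float) -> float:
    """Share of the file duration covered by non-overlapping duplicate segments."""
    if file_duration <= 0:
        raise ValueError("file_duration must be positive")
    duplicated = sum(end - start for start, end in segments)
    return 100.0 * duplicated / file_duration


def is_duplicate(segments: List[Segment], file_duration: float,
                 threshold_percent: float) -> bool:
    """Assumed rule: report the file if the duplicated share exceeds the threshold."""
    return duplicated_percentage(segments, file_duration) > threshold_percent


# Two 30-second segments in a 120-second file cover half of it.
segments = [(0.0, 30.0), (60.0, 90.0)]
print(duplicated_percentage(segments, 120.0))   # prints 50.0
print(is_duplicate(segments, 120.0, 40.0))      # prints True
```

The per-segment percentage shown in a tooltip would correspond to calling `duplicated_percentage` with a single segment.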
You can preview the files with the player controls in the report area. To initialize the duplicate player, click the “Play” button next to the “Duplicate” information. Or you can click any of the duplicate fragments on a timeline.
You can get a quick summary of the report in the Summary area, which lists the total duration of detected content in each of the video archive files.
You can save a content report as a CSV file with the Export Report button. You can open the resulting file in spreadsheet software, for example, MS Excel or Google Sheets.
Tips and Tricks
If the application runs out of disk space, it stops the processing automatically. Make sure that you have enough free space on the disk for the indexing files. As a rough guide, the free space should be at least half of the total size of the folder being scanned. For example, an indexing file for 2.5 hours of video takes about 1.1 GB of disk space.
You can add new files to the media library folders while the search is running. Once the current search operation is over, the newly added files are included in the overall processing.
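Before a long run, the free space can be checked with a short Python sketch like the one below. It is only a rough helper built on the two figures from this section (about 1.1 GB per 2.5 hours of video, and free space of at least half the library size); the function names and the folder arguments are hypothetical, and the real index size may vary with the content.

```python
import os
import shutil

# Rate derived from the example figure: about 1.1 GB per 2.5 hours of video.
GB_PER_HOUR = 1.1 / 2.5


def estimated_index_size_gb(video_hours: float) -> float:
    """Rough index size estimate for the given total video duration."""
    return video_hours * GB_PER_HOUR


def folder_size_bytes(folder: str) -> int:
    """Total size of all regular files in `folder`, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def has_enough_space(library_folder: str, index_folder: str) -> bool:
    """Rule of thumb: free space should be at least half of the library size."""
    required = folder_size_bytes(library_folder) / 2
    return shutil.disk_usage(index_folder).free >= required


print(f"{estimated_index_size_gb(10):.1f} GB")  # prints 4.4 GB
```

For example, `has_enough_space("D:/library", "E:/index")` (both paths are placeholders) would compare the free space on the index disk with half the size of the library folder.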