Find duplicate files
S-01640
As a Tullow Administrator, I want to find and remove duplicate files in order to free up storage capacity. Proprietary software that achieves a similar result costs about 20,000 euro.
Pre-Dev Notes:
Use a two-stage process.
- Stage one: compare the file size, the text header, the binary header and about a thousand randomly selected traces, then print a list of possible matches.
- Stage two: provide a utility that takes this list and verifies whether each possible match is a real match.
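The first stage could be sketched roughly as follows. This is only an illustration under stated assumptions: the files are taken to have a fixed 3600-byte header block (3200-byte text header plus 400-byte binary header, as in SEG-Y) followed by fixed-length traces, and the names fingerprint, possible_matches and trace_bytes are hypothetical, not part of any existing API.

```python
import hashlib
import os
import random
from collections import defaultdict

# Assumption: 3200-byte text header + 400-byte binary header.
HEADER_BYTES = 3600


def fingerprint(path, trace_bytes, samples=1000, seed=0):
    """Cheap fingerprint: file size plus a hash of the headers and sampled traces."""
    size = os.path.getsize(path)
    n_traces = max(0, (size - HEADER_BYTES) // trace_bytes)
    # A fixed seed means files of equal size are sampled at the same positions.
    rng = random.Random(seed)
    picks = sorted(rng.sample(range(n_traces), min(samples, n_traces)))
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        digest.update(f.read(HEADER_BYTES))
        for i in picks:
            f.seek(HEADER_BYTES + i * trace_bytes)
            digest.update(f.read(trace_bytes))
    return size, digest.hexdigest()


def possible_matches(paths, trace_bytes):
    """Group paths whose fingerprints agree; only groups of two or more are returned."""
    groups = defaultdict(list)
    for p in paths:
        groups[fingerprint(p, trace_bytes)].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Note that sampling by position only pairs up files whose traces are in the same order. Files holding the same traces in a different order would need an order-independent comparison of the sampled traces instead.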
Two files might contain the same data in a different sort order. These files should be matched too. One possible approach is to take a hash of each trace, sort the hashes, and use the sorted list as the matching criterion. Possibly a single hash of the sorted hashes would be enough.
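The hash-of-sorted-hashes idea can be sketched as below. It is a minimal illustration, assuming each trace is already available as a bytes object; the function name order_independent_digest and the choice of SHA-256 are assumptions, not a decided design.

```python
import hashlib


def order_independent_digest(traces):
    """Return a digest that is identical for any ordering of the same traces."""
    # Hash each trace, then sort the hashes so the input order no longer matters.
    trace_hashes = sorted(hashlib.sha256(t).digest() for t in traces)
    # Hash the sorted hashes to get one short value per file.
    combined = hashlib.sha256()
    for h in trace_hashes:
        combined.update(h)
    return combined.hexdigest()
```

Sorting the hashes, rather than combining them with something like XOR, keeps repeated traces significant: a file containing a trace twice gets a different digest from a file containing it once.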
Once two files match in all heuristic tests, their traces should be sorted and compared in full.
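A full verification step might look like the sketch below. It assumes a fixed 3600-byte header block followed by fixed-length traces, compares the header blocks byte for byte (an assumption, since the notes do not say how headers should be treated), and sorts the raw trace bytes in memory. The names read_traces, read_header and same_data are hypothetical.

```python
# Assumption: 3200-byte text header + 400-byte binary header.
HEADER_BYTES = 3600


def read_header(path):
    """Return the header block of the file."""
    with open(path, "rb") as f:
        return f.read(HEADER_BYTES)


def read_traces(path, trace_bytes):
    """Yield each fixed-length trace (as bytes) that follows the header block."""
    with open(path, "rb") as f:
        f.seek(HEADER_BYTES)
        while True:
            trace = f.read(trace_bytes)
            if not trace:
                return
            yield trace


def same_data(path_a, path_b, trace_bytes):
    """True if both files have equal headers and the same traces in any order."""
    if read_header(path_a) != read_header(path_b):
        return False
    traces_a = sorted(read_traces(path_a, trace_bytes))
    traces_b = sorted(read_traces(path_b, trace_bytes))
    return traces_a == traces_b
```

This version holds every trace of both files in memory at once, so for large files an external sort, or a sort of per-trace hashes followed by a trace-by-trace comparison, would likely be needed.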
Implementation Notes:
ADDME
System Test Changes:
ADDME
Bug Fixes:
ADDME
C++ API Changes:
ADDME
C API Changes:
ADDME
Success Criteria:
Given a list of files, the program should list all files that contain the same data, regardless of sort order.
CREATED ON - 05/05/2017