1
mirror of https://github.com/rclone/rclone synced 2024-12-21 11:45:56 +01:00
rclone/docs/content/commands/rclone_dedupe.md

367 lines
28 KiB
Markdown
Raw Normal View History

---
2018-09-01 13:58:00 +02:00
date: 2018-09-01T12:54:54+01:00
title: "rclone dedupe"
slug: rclone_dedupe
url: /commands/rclone_dedupe/
---
## rclone dedupe
2017-12-23 14:07:45 +01:00
Interactively find duplicate files and delete/rename them.
### Synopsis
2017-09-30 15:19:47 +02:00
By default `dedupe` interactively finds duplicate files and offers to
delete all but one or rename them to be different. Only useful with
Google Drive which can have duplicate file names.
2017-09-30 15:19:47 +02:00
In the first pass it will merge directories with the same name. It
will do this iteratively until all the identical directories have been
merged.
The `dedupe` command will delete all but one of any identical (same
md5sum) files it finds without confirmation. This means that for most
duplicated files the `dedupe` command will not be interactive. You
can use `--dry-run` to see what would happen without doing anything.
Here is an example run.
Before - with duplicates
$ rclone lsl drive:dupes
6048320 2016-03-05 16:23:16.798000000 one.txt
6048320 2016-03-05 16:23:11.775000000 one.txt
564374 2016-03-05 16:23:06.731000000 one.txt
6048320 2016-03-05 16:18:26.092000000 one.txt
6048320 2016-03-05 16:22:46.185000000 two.txt
1744073 2016-03-05 16:22:38.104000000 two.txt
564374 2016-03-05 16:22:52.118000000 two.txt
Now the `dedupe` session
$ rclone dedupe drive:dupes
2016/03/05 16:24:37 Google drive root 'dupes': Looking for duplicates using interactive mode.
one.txt: Found 4 duplicates - deleting identical copies
one.txt: Deleting 2/3 identical duplicates (md5sum "1eedaa9fe86fd4b8632e2ac549403b36")
one.txt: 2 duplicates remain
1: 6048320 bytes, 2016-03-05 16:23:16.798000000, md5sum 1eedaa9fe86fd4b8632e2ac549403b36
2: 564374 bytes, 2016-03-05 16:23:06.731000000, md5sum 7594e7dc9fc28f727c42ee3e0749de81
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r> k
Enter the number of the file to keep> 1
one.txt: Deleted 1 extra copies
two.txt: Found 3 duplicates - deleting identical copies
two.txt: 3 duplicates remain
1: 564374 bytes, 2016-03-05 16:22:52.118000000, md5sum 7594e7dc9fc28f727c42ee3e0749de81
2: 6048320 bytes, 2016-03-05 16:22:46.185000000, md5sum 1eedaa9fe86fd4b8632e2ac549403b36
3: 1744073 bytes, 2016-03-05 16:22:38.104000000, md5sum 851957f7fb6f0bc4ce76be966d336802
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r> r
two-1.txt: renamed from: two.txt
two-2.txt: renamed from: two.txt
two-3.txt: renamed from: two.txt
The result being
$ rclone lsl drive:dupes
6048320 2016-03-05 16:23:16.798000000 one.txt
564374 2016-03-05 16:22:52.118000000 two-1.txt
6048320 2016-03-05 16:22:46.185000000 two-2.txt
1744073 2016-03-05 16:22:38.104000000 two-3.txt
Dedupe can be run non interactively using the `--dedupe-mode` flag or by using an extra parameter with the same value
* `--dedupe-mode interactive` - interactive as above.
* `--dedupe-mode skip` - removes identical files then skips anything left.
* `--dedupe-mode first` - removes identical files then keeps the first one.
* `--dedupe-mode newest` - removes identical files then keeps the newest one.
* `--dedupe-mode oldest` - removes identical files then keeps the oldest one.
2018-04-28 12:46:27 +02:00
* `--dedupe-mode largest` - removes identical files then keeps the largest one.
* `--dedupe-mode rename` - removes identical files then renames the rest to be different.
For example to rename all the identically named photos in your Google Photos directory, do
rclone dedupe --dedupe-mode rename "drive:Google Photos"
Or
rclone dedupe rename "drive:Google Photos"
```
rclone dedupe [mode] remote:path [flags]
```
### Options
```
2016-11-06 11:17:52 +01:00
--dedupe-mode string Dedupe mode interactive|skip|first|newest|oldest|rename. (default "interactive")
2017-09-30 15:19:47 +02:00
-h, --help help for dedupe
```
### Options inherited from parent commands
```
2018-09-01 13:58:00 +02:00
--acd-auth-url string Auth server URL.
--acd-client-id string Amazon Application Client ID.
--acd-client-secret string Amazon Application Client Secret.
--acd-templink-threshold SizeSuffix Files >= this size will be downloaded via their tempLink. (default 9G)
--acd-token-url string Token server url.
--acd-upload-wait-per-gb Duration Additional time per GB to wait after a failed complete upload to see if it appears. (default 3m0s)
--alias-remote string Remote or path to alias.
--ask-password Allow prompt for password for encrypted configuration. (default true)
--auto-confirm If enabled, do not request console confirmation.
--azureblob-access-tier string Access tier of blob, supports hot, cool and archive tiers.
--azureblob-account string Storage Account Name (leave blank to use connection string or SAS URL)
--azureblob-chunk-size SizeSuffix Upload chunk size. Must fit in memory. (default 4M)
--azureblob-endpoint string Endpoint for the service
--azureblob-key string Storage Account Key (leave blank to use connection string or SAS URL)
--azureblob-sas-url string SAS URL for container level access only
--azureblob-upload-cutoff SizeSuffix Cutoff for switching to chunked upload. (default 256M)
--b2-account string Account ID or Application Key ID
--b2-chunk-size SizeSuffix Upload chunk size. Must fit in memory. (default 96M)
--b2-endpoint string Endpoint for the service.
--b2-hard-delete Permanently delete files on remote removal, otherwise hide files.
--b2-key string Application Key
--b2-test-mode string A flag string for X-Bz-Test-Mode header for debugging.
--b2-upload-cutoff SizeSuffix Cutoff for switching to chunked upload. (default 190.735M)
--b2-versions Include old versions in directory listings.
--backup-dir string Make backups into hierarchy based in DIR.
--bind string Local address to bind to for outgoing connections, IPv4, IPv6 or name.
--box-client-id string Box App Client Id.
--box-client-secret string Box App Client Secret
--box-commit-retries int Max number of times to try committing a multipart file. (default 100)
--box-upload-cutoff SizeSuffix Cutoff for switching to multipart upload. (default 50M)
--buffer-size int In memory buffer size when reading files for each --transfer. (default 16M)
--bwlimit BwTimetable Bandwidth limit in kBytes/s, or use suffix b|k|M|G or a full timetable.
--cache-chunk-clean-interval Duration Interval at which chunk cleanup runs (default 1m0s)
--cache-chunk-no-memory Disable the in-memory cache for storing chunks during streaming
--cache-chunk-path string Directory to cache chunk files (default "/home/ncw/.cache/rclone/cache-backend")
--cache-chunk-size SizeSuffix The size of a chunk. Lower value good for slow connections but can affect seamless reading. (default 5M)
--cache-chunk-total-size SizeSuffix The maximum size of stored chunks. When the storage grows beyond this size, the oldest chunks will be deleted. (default 10G)
--cache-db-path string Directory to cache DB (default "/home/ncw/.cache/rclone/cache-backend")
--cache-db-purge Purge the cache DB before
--cache-db-wait-time Duration How long to wait for the DB to be available - 0 is unlimited (default 1s)
--cache-dir string Directory rclone will use for caching. (default "/home/ncw/.cache/rclone")
--cache-info-age Duration How much time should object info (file size, file hashes etc) be stored in cache. (default 6h0m0s)
--cache-plex-password string The password of the Plex user
--cache-plex-url string The URL of the Plex server
--cache-plex-username string The username of the Plex user
--cache-read-retries int How many times to retry a read from a cache storage (default 10)
--cache-remote string Remote to cache.
--cache-rps int Limits the number of requests per second to the source FS. -1 disables the rate limiter (default -1)
--cache-tmp-upload-path string Directory to keep temporary files until they are uploaded to the cloud storage
--cache-tmp-wait-time Duration How long should files be stored in local cache before being uploaded (default 15s)
--cache-workers int How many workers should run in parallel to download chunks (default 4)
--cache-writes Will cache file data on writes through the FS
--checkers int Number of checkers to run in parallel. (default 8)
-c, --checksum Skip based on checksum & size, not mod-time & size
--config string Config file. (default "/home/ncw/.rclone.conf")
--contimeout duration Connect timeout (default 1m0s)
-L, --copy-links Follow symlinks and copy the pointed to item.
--cpuprofile string Write cpu profile to file
--crypt-directory-name-encryption Option to either encrypt directory names or leave them intact. (default true)
--crypt-filename-encryption string How to encrypt the filenames. (default "standard")
--crypt-password string Password or pass phrase for encryption.
--crypt-password2 string Password or pass phrase for salt. Optional but recommended.
--crypt-remote string Remote to encrypt/decrypt.
--crypt-show-mapping For all files listed show how the names encrypt.
--delete-after When synchronizing, delete files on destination after transfering (default)
--delete-before When synchronizing, delete files on destination before transfering
--delete-during When synchronizing, delete files during transfer
--delete-excluded Delete files on dest excluded from sync
--disable string Disable a comma separated list of features. Use help to see a list.
--drive-acknowledge-abuse Set to allow files which return cannotDownloadAbusiveFile to be downloaded.
--drive-alternate-export Use alternate export URLs for google documents export.
--drive-auth-owner-only Only consider files owned by the authenticated user.
--drive-chunk-size SizeSuffix Upload chunk size. Must a power of 2 >= 256k. (default 8M)
--drive-client-id string Google Application Client Id
--drive-client-secret string Google Application Client Secret
--drive-formats string Comma separated list of preferred formats for downloading Google docs. (default "docx,xlsx,pptx,svg")
--drive-impersonate string Impersonate this user when using a service account.
--drive-keep-revision-forever Keep new head revision forever.
--drive-list-chunk int Size of listing chunk 100-1000. 0 to disable. (default 1000)
--drive-root-folder-id string ID of the root folder
--drive-scope string Scope that rclone should use when requesting access from drive.
--drive-service-account-file string Service Account Credentials JSON file path
--drive-shared-with-me Only show files that are shared with me
--drive-skip-gdocs Skip google documents in all listings.
--drive-trashed-only Only show files that are in the trash
--drive-upload-cutoff SizeSuffix Cutoff for switching to chunked upload (default 8M)
--drive-use-created-date Use created date instead of modified date.
--drive-use-trash Send files to the trash instead of deleting permanently. (default true)
--dropbox-chunk-size SizeSuffix Upload chunk size. Max 150M. (default 48M)
--dropbox-client-id string Dropbox App Client Id
--dropbox-client-secret string Dropbox App Client Secret
-n, --dry-run Do a trial run with no permanent changes
--dump string List of items to dump from: headers,bodies,requests,responses,auth,filters,goroutines,openfiles
--dump-bodies Dump HTTP headers and bodies - may contain sensitive info
--dump-headers Dump HTTP bodies - may contain sensitive info
--exclude stringArray Exclude files matching pattern
--exclude-from stringArray Read exclude patterns from file
--exclude-if-present string Exclude directories if filename is present
--fast-list Use recursive list if available. Uses more memory but fewer transactions.
--files-from stringArray Read list of source-file names from file
-f, --filter stringArray Add a file-filtering rule
--filter-from stringArray Read filtering patterns from a file
--ftp-host string FTP host to connect to
--ftp-pass string FTP password
--ftp-port string FTP port, leave blank to use default (21)
--ftp-user string FTP username, leave blank for current username, ncw
--gcs-bucket-acl string Access Control List for new buckets.
--gcs-client-id string Google Application Client Id
--gcs-client-secret string Google Application Client Secret
--gcs-location string Location for the newly created buckets.
--gcs-object-acl string Access Control List for new objects.
--gcs-project-number string Project number.
--gcs-service-account-file string Service Account Credentials JSON file path
--gcs-storage-class string The storage class to use when storing objects in Google Cloud Storage.
--http-url string URL of http host to connect to
--hubic-client-id string Hubic Client Id
--hubic-client-secret string Hubic Client Secret
--ignore-checksum Skip post copy check of checksums.
--ignore-errors delete even if there are I/O errors
--ignore-existing Skip all files that exist on destination
--ignore-size Ignore size when skipping use mod-time or checksum.
-I, --ignore-times Don't skip files that match size and time - transfer all files
--immutable Do not modify files. Fail if existing files have been modified.
--include stringArray Include files matching pattern
--include-from stringArray Read include patterns from file
--jottacloud-md5-memory-limit SizeSuffix Files bigger than this will be cached on disk to calculate the MD5 if required. (default 10M)
--jottacloud-mountpoint string The mountpoint to use.
--jottacloud-pass string Password.
--jottacloud-user string User Name
--local-no-check-updated Don't check to see if the files change during upload
--local-no-unicode-normalization Don't apply unicode normalization to paths and filenames
--local-nounc string Disable UNC (long path names) conversion on Windows
--log-file string Log everything to this file
--log-level string Log level DEBUG|INFO|NOTICE|ERROR (default "NOTICE")
--low-level-retries int Number of low level retries to do. (default 10)
--max-age duration Only transfer files younger than this in s or suffix ms|s|m|h|d|w|M|y (default off)
--max-backlog int Maximum number of objects in sync or check backlog. (default 10000)
--max-delete int When synchronizing, limit the number of deletes (default -1)
--max-depth int If set limits the recursion depth to this. (default -1)
--max-size int Only transfer files smaller than this in k or suffix b|k|M|G (default off)
--max-transfer int Maximum size of data to transfer. (default off)
--mega-debug Output more debug from Mega.
--mega-hard-delete Delete files permanently rather than putting them into the trash.
--mega-pass string Password.
--mega-user string User name
--memprofile string Write memory profile to file
--min-age duration Only transfer files older than this in s or suffix ms|s|m|h|d|w|M|y (default off)
--min-size int Only transfer files bigger than this in k or suffix b|k|M|G (default off)
--modify-window duration Max time diff to be considered the same (default 1ns)
--no-check-certificate Do not verify the server SSL certificate. Insecure.
--no-gzip-encoding Don't set Accept-Encoding: gzip.
--no-traverse Obsolete - does nothing.
--no-update-modtime Don't update destination mod-time if files identical.
-x, --one-file-system Don't cross filesystem boundaries (unix/macOS only).
--onedrive-chunk-size SizeSuffix Chunk size to upload files with - must be multiple of 320k. (default 10M)
--onedrive-client-id string Microsoft App Client Id
--onedrive-client-secret string Microsoft App Client Secret
--opendrive-password string Password.
--opendrive-username string Username
--pcloud-client-id string Pcloud App Client Id
--pcloud-client-secret string Pcloud App Client Secret
-P, --progress Show progress during transfer.
--qingstor-access-key-id string QingStor Access Key ID
--qingstor-connection-retries int Number of connnection retries. (default 3)
--qingstor-endpoint string Enter a endpoint URL to connection QingStor API.
--qingstor-env-auth Get QingStor credentials from runtime. Only applies if access_key_id and secret_access_key is blank.
--qingstor-secret-access-key string QingStor Secret Access Key (password)
--qingstor-zone string Zone to connect to.
-q, --quiet Print as little stuff as possible
--rc Enable the remote control server.
--rc-addr string IPaddress:Port or :Port to bind server to. (default "localhost:5572")
--rc-cert string SSL PEM key (concatenation of certificate and CA certificate)
--rc-client-ca string Client certificate authority to verify clients with
--rc-htpasswd string htpasswd file - if not provided no authentication is done
--rc-key string SSL PEM Private key
--rc-max-header-bytes int Maximum size of request header (default 4096)
--rc-pass string Password for authentication.
--rc-realm string realm for authentication (default "rclone")
--rc-server-read-timeout duration Timeout for server reading data (default 1h0m0s)
--rc-server-write-timeout duration Timeout for server writing data (default 1h0m0s)
--rc-user string User name for authentication.
--retries int Retry operations this many times if they fail (default 3)
--retries-sleep duration Interval between retrying operations if they fail, e.g 500ms, 60s, 5m. (0 to disable)
--s3-access-key-id string AWS Access Key ID.
--s3-acl string Canned ACL used when creating buckets and/or storing objects in S3.
--s3-chunk-size SizeSuffix Chunk size to use for uploading (default 5M)
--s3-disable-checksum Don't store MD5 checksum with object metadata
--s3-endpoint string Endpoint for S3 API.
--s3-env-auth Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
--s3-force-path-style If true use path style access if false use virtual hosted style. (default true)
--s3-location-constraint string Location constraint - must be set to match the Region.
--s3-provider string Choose your S3 provider.
--s3-region string Region to connect to.
--s3-secret-access-key string AWS Secret Access Key (password)
--s3-server-side-encryption string The server-side encryption algorithm used when storing this object in S3.
--s3-sse-kms-key-id string If using KMS ID you must provide the ARN of Key.
--s3-storage-class string The storage class to use when storing objects in S3.
--s3-upload-concurrency int Concurrency for multipart uploads. (default 2)
--sftp-ask-password Allow asking for SFTP password when needed.
--sftp-disable-hashcheck Disable the execution of SSH commands to determine if remote file hashing is available.
--sftp-host string SSH host to connect to
--sftp-key-file string Path to unencrypted PEM-encoded private key file, leave blank to use ssh-agent.
--sftp-pass string SSH password, leave blank to use ssh-agent.
--sftp-path-override string Override path used by SSH connection.
--sftp-port string SSH port, leave blank to use default (22)
--sftp-set-modtime Set the modified time on the remote if set. (default true)
--sftp-use-insecure-cipher Enable the use of the aes128-cbc cipher. This cipher is insecure and may allow plaintext data to be recovered by an attacker.
--sftp-user string SSH username, leave blank for current username, ncw
--size-only Skip based on size only, not mod-time or checksum
--skip-links Don't warn about skipped symlinks.
--stats duration Interval between printing stats, e.g 500ms, 60s, 5m. (0 to disable) (default 1m0s)
--stats-file-name-length int Max file name length in stats. 0 for no limit (default 40)
--stats-log-level string Log level to show --stats output DEBUG|INFO|NOTICE|ERROR (default "INFO")
--stats-one-line Make the stats fit on one line.
--stats-unit string Show data rate in stats as either 'bits' or 'bytes'/s (default "bytes")
--streaming-upload-cutoff int Cutoff for switching to chunked upload if file size is unknown. Upload starts after reaching cutoff or when file ends. (default 100k)
--suffix string Suffix for use with --backup-dir.
--swift-auth string Authentication URL for server (OS_AUTH_URL).
--swift-auth-token string Auth Token from alternate authentication - optional (OS_AUTH_TOKEN)
--swift-auth-version int AuthVersion - optional - set to (1,2,3) if your auth URL has no version (ST_AUTH_VERSION)
--swift-chunk-size SizeSuffix Above this size files will be chunked into a _segments container. (default 5G)
--swift-domain string User domain - optional (v3 auth) (OS_USER_DOMAIN_NAME)
--swift-endpoint-type string Endpoint type to choose from the service catalogue (OS_ENDPOINT_TYPE) (default "public")
--swift-env-auth Get swift credentials from environment variables in standard OpenStack form.
--swift-key string API key or password (OS_PASSWORD).
--swift-region string Region name - optional (OS_REGION_NAME)
--swift-storage-policy string The storage policy to use when creating a new container
--swift-storage-url string Storage URL - optional (OS_STORAGE_URL)
--swift-tenant string Tenant name - optional for v1 auth, this or tenant_id required otherwise (OS_TENANT_NAME or OS_PROJECT_NAME)
--swift-tenant-domain string Tenant domain - optional (v3 auth) (OS_PROJECT_DOMAIN_NAME)
--swift-tenant-id string Tenant ID - optional for v1 auth, this or tenant required otherwise (OS_TENANT_ID)
--swift-user string User name to log in (OS_USERNAME).
--swift-user-id string User ID to log in - optional - most swift systems use user and leave this blank (v3 auth) (OS_USER_ID).
--syslog Use Syslog for logging
--syslog-facility string Facility for syslog, eg KERN,USER,... (default "DAEMON")
--timeout duration IO idle timeout (default 5m0s)
--tpslimit float Limit HTTP transactions per second to this.
--tpslimit-burst int Max burst of transactions for --tpslimit. (default 1)
--track-renames When synchronizing, track file renames and do a server side move if possible
--transfers int Number of file transfers to run in parallel. (default 4)
-u, --update Skip files that are newer on the destination.
--use-server-modtime Use server modified time instead of object metadata
--user-agent string Set the user-agent to a specified string. The default is rclone/ version (default "rclone/v1.43")
-v, --verbose count Print lots more stuff (repeat for more)
--webdav-bearer-token string Bearer token instead of user/pass (eg a Macaroon)
--webdav-pass string Password.
--webdav-url string URL of http host to connect to
--webdav-user string User name
--webdav-vendor string Name of the Webdav site/service/software you are using
--yandex-client-id string Yandex Client Id
--yandex-client-secret string Yandex Client Secret
```
### SEE ALSO
2018-09-01 13:58:00 +02:00
* [rclone](/commands/rclone/) - Sync files and directories to and from local and remote object stores - v1.43
2018-03-19 11:06:13 +01:00
2018-09-01 13:58:00 +02:00
###### Auto generated by spf13/cobra on 1-Sep-2018