1
mirror of https://github.com/rclone/rclone synced 2024-11-21 22:50:16 +01:00
rclone/fs/dirtree
Nick Craig-Wood 07133b892d dirtree: fix performance with large directories of directories and --fast-list
Before this change if using --fast-list on a directory with more than
a few thousand directories in it DirTree.CheckParents became very slow
taking up to 24 hours for a directory with 1,000,000 directories in
it.

This is because it becomes an O(N²) operation as DirTree.Find has to
search each directory in a linear fashion as it is stored as a slice.

This patch fixes the problem by scanning the DirTree for directories
before starting the CheckParents process so it never has to call
DirTree.Find.

After the fix calling DirTree.CheckParents on a directory with
1,000,000 directories in it will take about 1 second.

Anything which calls DirTree.Find can potentially have bad performance
so in the future we should redesign the DirTree to use a different
underlying datastructure or have an index.

https://forum.rclone.org/t/almost-24-hours-cpu-compute-time-during-sync-between-two-large-s3-buckets/39375/
2023-07-03 14:09:21 +01:00
..
dirtree_test.go dirtree: fix performance with large directories of directories and --fast-list 2023-07-03 14:09:21 +01:00
dirtree.go dirtree: fix performance with large directories of directories and --fast-list 2023-07-03 14:09:21 +01:00