
Always checking MD5?

Cfaucheux
Joined: 2010-01-11 10:10

Hi everybody.

In Sync/Mirror mode, does Toucan always compare files by computing their hashes?

I am following this procedure as a test:

I have a folder containing one big file that I want to sync (mirror) to a NAS.
The first sync runs fine: the contents of the source folder are copied to the destination.

Second sync, with no modification to the contents of the source folder: the timestamps and sizes of the two files are identical, but the MD5 is still computed (I know because the whole file is transferred over my network).

In real life, I want to use Toucan to synchronize hundreds of GB to a small NAS, so I can't afford an MD5 comparison for every file.

So, am I doing something wrong?
If not, is there a way to disable the MD5 comparison (or could this be added as a feature)?

Cyrille

Steve Lamerton
Developer
Joined: 2005-12-10 15:22
Currently Toucan doesn't use MD5; it simply does a chunk-by-chunk comparison, which, as you noticed, really slows it down over a network. I am working on a better method in time for version 3.0. Until then, if you look in the Sync section of the help file, there is an option for disabling streams; this should improve your performance dramatically. Hope that helps!
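For readers unfamiliar with the approach, a chunk-by-chunk comparison can be sketched roughly as below. This is not Toucan's code: the function name FilesHaveSameContents and the 64 KiB chunk size are assumptions for illustration. It shows why two identical files still cost a full read of both: the loop can only stop early when it finds a difference.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not Toucan's API): true if both files have identical contents.
// Both files are read in fixed-size chunks and the loop stops at the first difference,
// so identical files are always read completely, which is slow over a network.
bool FilesHaveSameContents(const std::string& a, const std::string& b,
                           std::size_t chunkSize = 64 * 1024) {
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    if (!fa || !fb) return false;  // treat unreadable or missing files as different

    std::vector<char> bufA(chunkSize), bufB(chunkSize);
    while (true) {
        fa.read(bufA.data(), static_cast<std::streamsize>(chunkSize));
        fb.read(bufB.data(), static_cast<std::streamsize>(chunkSize));
        const std::streamsize na = fa.gcount();
        const std::streamsize nb = fb.gcount();
        if (na != nb) return false;  // one file ended before the other
        if (na == 0) return true;    // both ended together with no difference found
        if (!std::equal(bufA.begin(), bufA.begin() + na, bufB.begin())) {
            return false;            // this chunk differs
        }
    }
}
```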

alexl_ru
Joined: 2010-02-19 10:17
md5

In 2.1.0.0 (unfortunately, 2.2.1 is not available on SourceForge) it actually does use MD5, so it has to read the whole files to compare them, like this:

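// Toucan 2.1.0.0: the destination is overwritten only when the MD5 digests differ,
// but computing both digests means reading both files in full.
if (md5->GetFileMD5(source) != md5->GetFileMD5(dest)) {
    CopyFile(source, dest);
}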

P.S. Oops. Anyway, for two identical files it has to read the whole contents of both files in either case (MD5 or chunk comparison). It would be much better not to read the file contents at all if the file sizes and dates/times match.
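A rough sketch of the metadata-only check suggested here, assuming C++17 std::filesystem; FileInfo, ReadInfo and NeedsCopy are hypothetical names, not part of Toucan. Only the size and last-write time are queried, so no file contents cross the network.

```cpp
#include <chrono>
#include <cstdint>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Metadata only (hypothetical type): size and last-write time, no contents.
struct FileInfo {
    std::uintmax_t size;
    fs::file_time_type mtime;
};

// Returns std::nullopt if the file does not exist or cannot be queried.
std::optional<FileInfo> ReadInfo(const fs::path& p) {
    std::error_code ec;
    const auto size = fs::file_size(p, ec);
    if (ec) return std::nullopt;
    const auto mtime = fs::last_write_time(p, ec);
    if (ec) return std::nullopt;
    return FileInfo{size, mtime};
}

// Hypothetical rule: copy unless the destination exists with the same size
// and a last-write time within `tolerance` of the source's.
bool NeedsCopy(const FileInfo& src, const std::optional<FileInfo>& dst,
               std::chrono::seconds tolerance = std::chrono::seconds(0)) {
    if (!dst) return true;                   // missing at the destination
    if (dst->size != src.size) return true;  // size changed
    const auto diff = src.mtime > dst->mtime ? src.mtime - dst->mtime
                                             : dst->mtime - src.mtime;
    return diff > tolerance;                 // timestamp changed
}
```

The optional tolerance is there because some NAS file systems store timestamps more coarsely than the source disk (FAT, for example, records modification times in 2-second steps), so an exact comparison could flag every file as changed.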

I will try that DisableStream feature; let's see...

P.P.S. I just tried the DisableStream trick. Now it simply copies all files from source to destination. So even if only 1 MB out of 600 GB was updated, it copies the whole 600 GB archive! That is 600,000 times slower than copying the changes by hand.

Steve Lamerton
Developer
Joined: 2005-12-10 15:22
But the problem is that even if your file times and sizes match, they are not always the same file; there really is no easy way around that fact! The code you quote is also very, very old. If you take a look in the Mercurial repository, you will notice that MD5 is not used at all and hasn't been for some time.
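The point about matching times and sizes can be shown with a small sketch (C++17 std::filesystem; MakeLookalikePair, ReadAll and the file contents are made up for illustration): two files end up with the same size and the same last-write time while their contents differ, so a size-and-timestamp check alone would treat them as identical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical demo: writes two files of equal size (16 bytes) but different
// contents, then copies a's last-write time onto b.
void MakeLookalikePair(const fs::path& a, const fs::path& b) {
    // Each temporary stream is flushed and closed at the end of its statement.
    std::ofstream(a, std::ios::binary) << "holiday-photo-v1";
    std::ofstream(b, std::ios::binary) << "holiday-photo-v2";
    fs::last_write_time(b, fs::last_write_time(a));
}

// Reads a whole file into a string (fine for tiny demo files).
std::string ReadAll(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), {});
}
```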

alexl_ru
Joined: 2010-02-19 10:17

> the problem is that even if your file times and sizes match they are not always the same file

Well, it's at least reasonable to assume that such files are very, very likely to match. For the paranoid, there could be an option like "ignore file date/time/size when comparing".

> if you take a look in the mercurial repository you will notice that md5 is not used at all and hasn't been for some time.

Could you post a link to that repository?

Steve Lamerton
Developer
Joined: 2005-12-10 15:22
Sure, it is linked from the project page on SourceForge:

http://portableapps.hg.sourceforge.net/hgweb/portableapps/toucan/

And you are right: it is very likely that they are the same. There are definitely some changes similar to what you suggest already planned for version 3 :)

alexl_ru
Joined: 2010-02-19 10:17

Thanks!
I will take a look.

Off-topic: BTW, I'm thinking about using hard links (NTFS supports them) for identical files. Currently, if I have several copies of the same file in the source path (even if they are hard links), all of them get copied to the destination as separate files, which wastes disk space. I'm dealing with a huge photo archive, so I often have many hard-linked copies of the same files (e.g. different subsets of the same photo set, each subset for a particular person).
So I could add this functionality to Toucan when I have some time for it.

alexl_ru
Joined: 2010-02-19 10:17
ITaskbarList3

It looks like ITaskbarList3 (from frmprogress.h/cpp) is a Windows 7-only feature.

alexl_ru
Joined: 2010-02-19 10:17
bitmaps missing

Well... compiling this thing (unlike 2.1.0) was a hell of a task! But I finally did it.

FYI, these image files are missing:
drive-harddisk.png
drive-optical.png
drive-removable-media.png
file-exe.png
file.png
folder.png

Steve Lamerton
Developer
Joined: 2005-12-10 15:22
Yeah, sorry, I forgot to add those to Mercurial; I will push them soon. Please note that the Mercurial repository is a little in flux at the moment as I make some big changes.

The key to getting it to compile at the moment is to use CMake. After that you should be able to work out most of the dependencies, but you are right that I haven't got it checking for the required version of the Platform SDK yet. I hope to add proper instructions to the user manual soon.

If you have any questions about compiling, would you mind putting them in a new thread? That is probably easier for other people to follow. Thanks :)

alexl_ru
Joined: 2010-02-19 10:17

I'm having the same problem with 2.2.1.
Somehow it doesn't take equal timestamps into account, so it reads hundreds of gigabytes on every sync.

alexl_ru
Joined: 2010-02-19 10:17
Solution to your problem

Do not use the "DisableStream" feature; it's absolutely useless here, because it copies every file, so the sync takes just as long as before.

A much better trick is to use the "Update" sync mode. At least it won't read or write the whole archive, only the newer files. However, it keeps files that were removed from the source.
So the solution is to sync in two steps:
1. Sync in "Update" mode - it will copy only newer files from the source.
2. Sync in "Clean" mode - it will remove obsolete files from destination.

That's the correct way to mirror!
