linux-kernel.vger.kernel.org archive mirror
From: Pavel Machek <pavel@ucw.cz>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: bhalevy@panasas.com, arjan@infradead.org,
	mikulas@artax.karlin.mff.cuni.cz, jaharkes@cs.cmu.edu,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	nfsv4@ietf.org
Subject: Re: Finding hardlinks
Date: Wed, 3 Jan 2007 13:42:11 +0100
Message-ID: <20070103124211.GF3062@elf.ucw.cz>
In-Reply-To: <E1H25JD-0003SN-00@dorka.pomaz.szeredi.hu>

Hi!

> > > > > the use of a good hash function.  The chance of an accidental
> > > > > collision is infinitesimally small.  For a set of 
> > > > > 
> > > > >          100 files: 0.00000000000003%
> > > > >    1,000,000 files: 0.000003%
> > > > 
> > > > I do not think we want to play with probability like this. I mean...
> > > > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > > > unreasonable, and collision probability is going to be ~100% due to
> > > > birthday paradox.
> > > > 
> > > > You'll still want to back up your 4TB server...
> > > 
> > > Certainly, but tar isn't going to remember all the inode numbers.
> > > Even if you solve the storage requirements (not impossible) it would
> > > have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> > > enough CPU power just yet.
> > 
> > Storage requirements would be 16GB of RAM... that's small enough. If
> > you sort, you'll only need 32*2^32 comparisons, and that's doable.
> > 
> > I do not claim it is _likely_. You'd need hardlinks, as you
> > noticed. But the system should work, not "work with high probability",
> > and I believe we should solve this in the long term.
> 
> High probability is all you have.  Cosmic radiation hitting your
> computer is more likely to cause problems than colliding 64-bit inode
> numbers ;)

As I have shown... no, that's not right. 32*2^32 operations is small
enough not to have problems with cosmic radiation.
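
The sort-based approach mentioned above can be sketched in user space as
follows. This is only an illustration, not what tar or any other tool
actually does: the function name find_hardlink_groups is made up for
this example, and it treats the (st_dev, st_ino) pair as a unique file
identity, which is exactly the assumption this thread questions. Sorting
the collected pairs costs on the order of n*log2(n) comparisons, instead
of the n^2/2 needed to compare every pair of files.

```python
import os
import stat
from itertools import groupby


def find_hardlink_groups(paths):
    """Return lists of paths that share one (st_dev, st_ino) pair.

    Assumes inode numbers are unique per device, which is the very
    property under discussion; treat the result as a sketch only.
    """
    records = []
    for path in paths:
        st = os.lstat(path)  # lstat: do not follow symlinks
        # Only regular files with more than one link can have another name.
        if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
            records.append(((st.st_dev, st.st_ino), path))

    # After sorting, entries with equal identity are adjacent, so a
    # single linear pass finds every group.
    records.sort()
    groups = []
    for _, group in groupby(records, key=lambda rec: rec[0]):
        names = [path for _, path in group]
        if len(names) > 1:
            groups.append(names)
    return groups
```

Memory use is one record per file with a link count above one, so files
without hardlinks cost nothing beyond the lstat() call.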

> But you could add a new interface for the extra paranoid.  The
> proposed 'samefile(fd1, fd2)' syscall is severely limited by the heavy
> weight of file descriptors.

I guess that is the way to go. samefile(path1, path2) is unfortunately
inherently racy.
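
To make the difference concrete, here is a user-space approximation of a
descriptor-based comparison. It is not the proposed samefile() syscall,
and the helper names are invented for this example. Holding both
descriptors open keeps both inodes alive, so neither can be freed and
its number reused between the two fstat() calls; a path-based check has
no such guarantee, because either name can be renamed or unlinked
between lookups. The comparison itself still relies on (st_dev, st_ino)
being unique.

```python
import os


def same_file_by_fd(fd1, fd2):
    """Compare two open descriptors by (st_dev, st_ino)."""
    st1 = os.fstat(fd1)
    st2 = os.fstat(fd2)
    return (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino)


def same_file(path1, path2):
    """Open both paths first, then compare the descriptors.

    The paths are only resolved once each; the comparison is made on
    the open files, not on the names.
    """
    fd1 = os.open(path1, os.O_RDONLY)
    try:
        fd2 = os.open(path2, os.O_RDONLY)
        try:
            return same_file_by_fd(fd1, fd2)
        finally:
            os.close(fd2)
    finally:
        os.close(fd1)
```

Python's os.path.sameopenfile() performs the same fstat()-based
comparison. The cost is two open descriptors per comparison, which is
the "heavy weight" objection quoted above.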

> Another idea is to export the filesystem internal ID as an
> arbitrary-length cookie through the extended attribute interface.  That
> could be stored/compared by the filesystem quite efficiently.

How will that work for FAT?

Or maybe we can relax that "inode may not change over rename" and
"zero length files need unique inode numbers"...

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
