* regressions due to 64-bit ext4 directory cookies
@ 2013-02-12 20:28 J. Bruce Fields
  2013-02-12 20:56 ` Bernd Schubert
                   ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-12 20:28 UTC (permalink / raw)
  To: linux-ext4, sandeen, Theodore Ts'o, Bernd Schubert, gluster-devel

06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
and previous patches solved problems with hash collisions in large
directories by using 64- instead of 32- bit directory hashes in some
cases.  But it caused problems for users who assume directory offsets
are "small".  Two cases we've run across:

	- older NFS clients: 64-bit cookies cause applications on many
	  older clients to fail.
	- gluster: gluster assumed that it could take the top bits of
	  the offset for its own use.

In both cases we could argue we're in the right: the nfs protocol
defines cookies to be 64 bits, so clients should be prepared to handle
them (remapping to smaller integers if necessary to placate applications
using older system interfaces).  And gluster was incorrect to assume
that the "offset" was really an "offset" as opposed to just an opaque
value.

But in practice things that worked fine for a long time break on a
kernel upgrade.

So at a minimum I think we owe people a workaround, and turning off
dir_index may not be practical for everyone.

A "no_64bit_cookies" export option would provide a workaround for NFS 
servers with older NFS clients, but not for applications like gluster.

For that reason I'd rather have a way to turn this off on a given ext4 
filesystem.  Is that practical?

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
@ 2013-02-12 20:56 ` Bernd Schubert
  2013-02-12 21:00   ` J. Bruce Fields
  2013-02-13  4:00 ` Theodore Ts'o
  2013-02-13  6:56 ` Andreas Dilger
  2 siblings, 1 reply; 65+ messages in thread
From: Bernd Schubert @ 2013-02-12 20:56 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-ext4, sandeen, Theodore Ts'o, gluster-devel, Andreas Dilger

On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> and previous patches solved problems with hash collisions in large
> directories by using 64- instead of 32- bit directory hashes in some
> cases.  But it caused problems for users who assume directory offsets
> are "small".  Two cases we've run across:
> 
> 	- older NFS clients: 64-bit cookies cause applications on many
> 	  older clients to fail.
> 	- gluster: gluster assumed that it could take the top bits of
> 	  the offset for its own use.
> 
> In both cases we could argue we're in the right: the nfs protocol
> defines cookies to be 64 bits, so clients should be prepared to handle
> them (remapping to smaller integers if necessary to placate applications
> using older system interfaces).  And gluster was incorrect to assume
> that the "offset" was really an "offset" as opposed to just an opaque
> value.
> 
> But in practice things that worked fine for a long time break on a
> kernel upgrade.
> 
> So at a minimum I think we owe people a workaround, and turning off
> dir_index may not be practical for everyone.
> 
> A "no_64bit_cookies" export option would provide a workaround for NFS 
> servers with older NFS clients, but not for applications like gluster.
> 
> For that reason I'd rather have a way to turn this off on a given ext4 
> filesystem.  Is that practical?

I think Ted needs to answer if he would accept another mount option. But
before we go this way, what is gluster doing if there are hash
collisions?

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-12 20:56 ` Bernd Schubert
@ 2013-02-12 21:00   ` J. Bruce Fields
  2013-02-13  8:17     ` Bernd Schubert
  2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
  0 siblings, 2 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-12 21:00 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-ext4, sandeen, Theodore Ts'o, gluster-devel, Andreas Dilger

On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
> On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
> > 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> > and previous patches solved problems with hash collisions in large
> > directories by using 64- instead of 32- bit directory hashes in some
> > cases.  But it caused problems for users who assume directory offsets
> > are "small".  Two cases we've run across:
> > 
> > 	- older NFS clients: 64-bit cookies cause applications on many
> > 	  older clients to fail.
> > 	- gluster: gluster assumed that it could take the top bits of
> > 	  the offset for its own use.
> > 
> > In both cases we could argue we're in the right: the nfs protocol
> > defines cookies to be 64 bits, so clients should be prepared to handle
> > them (remapping to smaller integers if necessary to placate applications
> > using older system interfaces).  And gluster was incorrect to assume
> > that the "offset" was really an "offset" as opposed to just an opaque
> > value.
> > 
> > But in practice things that worked fine for a long time break on a
> > kernel upgrade.
> > 
> > So at a minimum I think we owe people a workaround, and turning off
> > dir_index may not be practical for everyone.
> > 
> > A "no_64bit_cookies" export option would provide a workaround for NFS 
> > servers with older NFS clients, but not for applications like gluster.
> > 
> > For that reason I'd rather have a way to turn this off on a given ext4 
> > filesystem.  Is that practical?
> 
> I think Ted needs to answer if he would accept another mount option. But
> before we go this way, what is gluster doing if there are hash
> collisions?

They probably just haven't tested NFS with large enough directories.
The birthday paradox says you'd need about 2^16 entries to have a 50-50
chance of hitting the problem.
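
As a back-of-the-envelope check (not from any kernel code, just the usual
birthday approximation for n random 32-bit hashes):

#include <math.h>
#include <stdio.h>

/* Rough birthday-paradox estimate: probability of at least one
 * collision among n random 32-bit directory hash values.
 */
static double collision_probability(double n)
{
	return 1.0 - exp(-n * (n - 1.0) / (2.0 * 4294967296.0));
}

int main(void)
{
	double n;

	for (n = 1024; n <= 1048576; n *= 4)
		printf("%8.0f entries: p(collision) ~ %.3f\n",
		       n, collision_probability(n));
	return 0;
}

That crosses ~50% somewhere around 2^16-2^17 entries.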

I don't know enough about ext4 directory performance.  But unfortunately
I suspect there's a range of directory sizes that are too small to have
a significant chance of having directory collisions, but still large
enough to need dir_index?

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
  2013-02-12 20:56 ` Bernd Schubert
@ 2013-02-13  4:00 ` Theodore Ts'o
  2013-02-13 13:31   ` J. Bruce Fields
  2013-02-13  6:56 ` Andreas Dilger
  2 siblings, 1 reply; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13  4:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-ext4, sandeen, Bernd Schubert, gluster-devel

On Tue, Feb 12, 2013 at 03:28:41PM -0500, J. Bruce Fields wrote:
> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> and previous patches solved problems with hash collisions in large
> directories by using 64- instead of 32- bit directory hashes in some
> cases.  But it caused problems for users who assume directory offsets
> are "small".  Two cases we've run across:
> 
> 	- older NFS clients: 64-bit cookies cause applications on many
> 	  older clients to fail.

Is there a list of clients (and version numbers) which are having
problems?

> A "no_64bit_cookies" export option would provide a workaround for NFS 
> servers with older NFS clients, but not for applications like gluster.

Why isn't it sufficient for gluster?  Are they doing something
horrible such as assuming that telldir() cookies accessed from
userspace are identical to NFS cookies?  Or is it some other horrible
abstraction violation?

						- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
  2013-02-12 20:56 ` Bernd Schubert
  2013-02-13  4:00 ` Theodore Ts'o
@ 2013-02-13  6:56 ` Andreas Dilger
  2013-02-13 13:40   ` J. Bruce Fields
  2 siblings, 1 reply; 65+ messages in thread
From: Andreas Dilger @ 2013-02-13  6:56 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-ext4, sandeen, Theodore Ts'o, Bernd Schubert, gluster-devel

On 2013-02-12, at 12:28 PM, J. Bruce Fields wrote:
> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> and previous patches solved problems with hash collisions in large
> directories by using 64- instead of 32- bit directory hashes in some
> cases.  But it caused problems for users who assume directory offsets
> are "small".  Two cases we've run across:
> 
> 	- older NFS clients: 64-bit cookies cause applications on
>           many older clients to fail.
> 	- gluster: gluster assumed that it could take the top bits of
> 	  the offset for its own use.
> 
> In both cases we could argue we're in the right: the nfs protocol
> defines cookies to be 64 bits, so clients should be prepared to handle them (remapping to smaller integers if necessary to placate 
> applications using older system interfaces).

There appears to already be support for handling this for NFSv2
clients, so it should be possible to have an NFS server mount
option to set this for all clients:

        /* NFSv2 only supports 32 bit cookies */
        if (rqstp->rq_vers > 2)
                may_flags |= NFSD_MAY_64BIT_COOKIE;
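
For what it's worth, a per-export knob could hook in right there; a rough
sketch (NFSEXP_NO64BITCOOKIES is made up for illustration, it is not an
existing nfsd export flag):

        /* hypothetical follow-on to the check above: a per-export
         * "no_64bit_cookies" flag masks the capability off again,
         * fhp being the file handle passed to nfsd_readdir()
         */
        if (fhp->fh_export->ex_flags & NFSEXP_NO64BITCOOKIES)
                may_flags &= ~NFSD_MAY_64BIT_COOKIE;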

Alternately, this might be detected on a per-client basis by
whitelist or blacklist if there is some way for the server to
identify the client?

> And gluster was incorrect to assume that the "offset" was really
> an "offset" as opposed to just an opaque value.

Hmm, userspace already can't use the top bit of the cookie,
since the offset is a signed value, so gluster could continue
to use that bit for itself.  It could, in theory, also downshift
the cookie by one bit for 64-bit cookies and shift it back
before use, but I'm not sure that is kosher for all filesystems.
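
A rough sketch of that encoding (illustrative only -- the helpers are made
up, and as noted, dropping the low bit is only safe if the filesystem never
relies on it):

#include <stdint.h>

/* Steal the top bit of a 64-bit directory cookie by shifting the
 * filesystem's cookie down one bit, and undo it before handing the
 * cookie back to the filesystem.  The low bit of the original cookie
 * is lost in the round trip.
 */
static inline uint64_t encode_cookie(uint64_t fs_cookie, unsigned int flag)
{
	return (fs_cookie >> 1) | ((uint64_t)(flag & 1) << 63);
}

static inline uint64_t decode_cookie(uint64_t wire_cookie, unsigned int *flag)
{
	*flag = wire_cookie >> 63;
	return (wire_cookie & ~(1ULL << 63)) << 1;
}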

> But in practice things that worked fine for a long time break on a
> kernel upgrade.
> 
> So at a minimum I think we owe people a workaround, and turning off
> dir_index may not be practical for everyone.
> 
> A "no_64bit_cookies" export option would provide a workaround for NFS 
> servers with older NFS clients, but not for applications like gluster.

We added a "32bitapi" mount option to Lustre to handle the case
where it is re-exporting via NFS to 32-bit clients, which is like
your proposed "no_64bit_cookies" and "nfs.enable_ino64=0" together.

> For that reason I'd rather have a way to turn this off on a given ext4 filesystem.  Is that practical?

It wouldn't be impossible - pos2maj_hash() and pos2min_hash()
could get a per-superblock and/or kernel option to force 32-bit
hash values.
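
Roughly like this (a sketch of the idea only, not actual ext4 code; the
force_32bit flag would come from the hypothetical superblock or mount
option):

#include <stdint.h>

/* Split a 64-bit directory position back into the major/minor hash
 * pair, with a flag that forces the old 32-bit layout regardless of
 * what the caller supports.
 */
static void pos_to_hashes(uint64_t pos, int force_32bit,
			  uint32_t *major, uint32_t *minor)
{
	if (force_32bit) {
		*major = (uint32_t)(pos << 1);	/* whole hash fits in pos */
		*minor = 0;
	} else {
		*major = (uint32_t)((pos >> 32) << 1);
		*minor = (uint32_t)pos;
	}
}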

Cheers, Andreas






^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-12 21:00   ` J. Bruce Fields
@ 2013-02-13  8:17     ` Bernd Schubert
  2013-02-13 22:18       ` J. Bruce Fields
  2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
  1 sibling, 1 reply; 65+ messages in thread
From: Bernd Schubert @ 2013-02-13  8:17 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-ext4, sandeen, Theodore Ts'o, gluster-devel, Andreas Dilger

On 02/12/2013 10:00 PM, J. Bruce Fields wrote:
> On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
>> On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
>>> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
>>> and previous patches solved problems with hash collisions in large
>>> directories by using 64- instead of 32- bit directory hashes in some
>>> cases.  But it caused problems for users who assume directory offsets
>>> are "small".  Two cases we've run across:
>>>
>>> 	- older NFS clients: 64-bit cookies cause applications on many
>>> 	  older clients to fail.
>>> 	- gluster: gluster assumed that it could take the top bits of
>>> 	  the offset for its own use.
>>>
>>> In both cases we could argue we're in the right: the nfs protocol
>>> defines cookies to be 64 bits, so clients should be prepared to handle
>>> them (remapping to smaller integers if necessary to placate applications
>>> using older system interfaces).  And gluster was incorrect to assume
>>> that the "offset" was really an "offset" as opposed to just an opaque
>>> value.
>>>
>>> But in practice things that worked fine for a long time break on a
>>> kernel upgrade.
>>>
>>> So at a minimum I think we owe people a workaround, and turning off
>>> dir_index may not be practical for everyone.
>>>
>>> A "no_64bit_cookies" export option would provide a workaround for NFS
>>> servers with older NFS clients, but not for applications like gluster.
>>>
>>> For that reason I'd rather have a way to turn this off on a given ext4
>>> filesystem.  Is that practical?
>>
>> I think Ted needs to answer if he would accept another mount option. But
>> before we go this way, what is gluster doing if there are hash
>> collisions?
>
> They probably just haven't tested NFS with large enough directories.

Is it only related to NFS or generic readdir over gluster?

> The birthday paradox says you'd need about 2^16 entries to have a 50-50
> chance of hitting the problem.

We are frequently running into it with 50000 files per directory.

>
> I don't know enough about ext4 directory performance.  But unfortunately
> I suspect there's a range of directory sizes that are too small to have
> a significant chance of having directory collisions, but still large
> enough to need dir_index?

Here is a link to the initial benchmark:
http://search.luky.org/linux-kernel.2001/msg00117.html


Cheers,
Bernd

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13  4:00 ` Theodore Ts'o
@ 2013-02-13 13:31   ` J. Bruce Fields
  2013-02-13 15:14     ` Theodore Ts'o
  0 siblings, 1 reply; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 13:31 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, sandeen, Bernd Schubert, gluster-devel

On Tue, Feb 12, 2013 at 11:00:03PM -0500, Theodore Ts'o wrote:
> On Tue, Feb 12, 2013 at 03:28:41PM -0500, J. Bruce Fields wrote:
> > 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> > and previous patches solved problems with hash collisions in large
> > directories by using 64- instead of 32- bit directory hashes in some
> > cases.  But it caused problems for users who assume directory offsets
> > are "small".  Two cases we've run across:
> > 
> > 	- older NFS clients: 64-bit cookies cause applications on many
> > 	  older clients to fail.
> 
> Is there a list of clients (and version numbers) which are having
> problems?

I've seen complaints about Solaris, AIX, and HP-UX clients.  I don't
have version numbers.  It's possible that this is a problem with their
latest versions, so I probably shouldn't have said "older" above.

> > A "no_64bit_cookies" export option would provide a workaround for NFS 
> > servers with older NFS clients, but not for applications like gluster.
> 
> Why isn't it sufficient for gluster?  Are they doing something
> horrible such as assuming that telldir() cookies accessed from
> userspace are identical to NFS cookies?  Or is it some other horrible
> abstraction violation?

They're assuming they can take the high bits of the cookie for their own
use.

(In more detail: they're spreading a single directory across multiple
nodes, and encoding a node ID into the cookie they return, so they can
tell which node the cookie came from when they get it back.)

That works if you assume the cookie is an "offset" bounded above by some
measure of the directory size, hence unlikely to ever use the high
bits....

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-12 21:00   ` J. Bruce Fields
  2013-02-13  8:17     ` Bernd Schubert
@ 2013-02-13 13:31     ` Niels de Vos
  2013-02-13 15:40       ` Bernd Schubert
  1 sibling, 1 reply; 65+ messages in thread
From: Niels de Vos @ 2013-02-13 13:31 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Bernd Schubert, sandeen, Andreas Dilger, linux-ext4,
	Theodore Ts'o, gluster-devel

On Tue, Feb 12, 2013 at 04:00:54PM -0500, J. Bruce Fields wrote:
> On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
> > On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
> > > 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> > > and previous patches solved problems with hash collisions in large
> > > directories by using 64- instead of 32- bit directory hashes in some
> > > cases.  But it caused problems for users who assume directory offsets
> > > are "small".  Two cases we've run across:
> > > 
> > > 	- older NFS clients: 64-bit cookies cause applications on many
> > > 	  older clients to fail.
> > > 	- gluster: gluster assumed that it could take the top bits of
> > > 	  the offset for its own use.
> > > 
> > > In both cases we could argue we're in the right: the nfs protocol
> > > defines cookies to be 64 bits, so clients should be prepared to handle
> > > them (remapping to smaller integers if necessary to placate applications
> > > using older system interfaces).  And gluster was incorrect to assume
> > > that the "offset" was really an "offset" as opposed to just an opaque
> > > value.
> > > 
> > > But in practice things that worked fine for a long time break on a
> > > kernel upgrade.
> > > 
> > > So at a minimum I think we owe people a workaround, and turning off
> > > dir_index may not be practical for everyone.
> > > 
> > > A "no_64bit_cookies" export option would provide a workaround for NFS 
> > > servers with older NFS clients, but not for applications like gluster.
> > > 
> > > For that reason I'd rather have a way to turn this off on a given ext4 
> > > filesystem.  Is that practical?
> > 
> > I think Ted needs to answer if he would accept another mount option. But
> > before we go this way, what is gluster doing if there are hash
> > collisions?
> 
> They probably just haven't tested NFS with large enough directories.
> The birthday paradox says you'd need about 2^16 entries to have a 50-50
> chance of hitting the problem.

The Gluster NFS-server gets into an infinite loop:
- https://bugzilla.redhat.com/show_bug.cgi?id=838784

The general advice (even before this Bug) is that XFS should be used,
which is not affected by this problem (yet?).

Cheers,
Niels

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13  6:56 ` Andreas Dilger
@ 2013-02-13 13:40   ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 13:40 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: linux-ext4, sandeen, Theodore Ts'o, Bernd Schubert, gluster-devel

On Tue, Feb 12, 2013 at 10:56:36PM -0800, Andreas Dilger wrote:
> On 2013-02-12, at 12:28 PM, J. Bruce Fields wrote:
> > 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> > and previous patches solved problems with hash collisions in large
> > directories by using 64- instead of 32- bit directory hashes in some
> > cases.  But it caused problems for users who assume directory offsets
> > are "small".  Two cases we've run across:
> > 
> > 	- older NFS clients: 64-bit cookies cause applications on
> >           many older clients to fail.
> > 	- gluster: gluster assumed that it could take the top bits of
> > 	  the offset for its own use.
> > 
> > In both cases we could argue we're in the right: the nfs protocol
> > defines cookies to be 64 bits, so clients should be prepared to handle them (remapping to smaller integers if necessary to placate 
> > applications using older system interfaces).
> 
> There appears to already be support for handling this for NFSv2
> clients, so it should be possible to have an NFS server mount
> option to set this for all clients:
> 
>         /* NFSv2 only supports 32 bit cookies */
>         if (rqstp->rq_vers > 2)
>                 may_flags |= NFSD_MAY_64BIT_COOKIE;
> 
> Alternately, this might be detected on a per-client basis by
> whitelist or blacklist if there is some way for the server to
> identify the client?

No, there isn't.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 13:31   ` J. Bruce Fields
@ 2013-02-13 15:14     ` Theodore Ts'o
  2013-02-13 15:19       ` J. Bruce Fields
  0 siblings, 1 reply; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13 15:14 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-ext4, sandeen, Bernd Schubert, gluster-devel

On Wed, Feb 13, 2013 at 08:31:31AM -0500, J. Bruce Fields wrote:
> They're assuming they can take the high bits of the cookie for their own
> use.
> 
> (In more detail: they're spreading a single directory across multiple
> nodes, and encoding a node ID into the cookie they return, so they can
> tell which node the cookie came from when they get it back.)
> 
> That works if you assume the cookie is an "offset" bounded above by some
> measure of the directory size, hence unlikely to ever use the high
> bits....

Right, but why wouldn't an nfs export option solve the problem for
gluster?

Basically, it would be nice if we did not have to degrade locally
running userspace applications by globally turning off 64-bit telldir
cookies just because there are some broken cluster file systems and
nfsv3 clients out there.  And if we are only turning off 64-bit
cookies for NFS, wouldn't it make sense to make this an NFS export
option, as opposed to a mount option?

Regards,

					- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 15:14     ` Theodore Ts'o
@ 2013-02-13 15:19       ` J. Bruce Fields
  2013-02-13 15:36         ` Theodore Ts'o
  0 siblings, 1 reply; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 15:19 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, sandeen, Bernd Schubert, gluster-devel

On Wed, Feb 13, 2013 at 10:14:55AM -0500, Theodore Ts'o wrote:
> On Wed, Feb 13, 2013 at 08:31:31AM -0500, J. Bruce Fields wrote:
> > They're assuming they can take the high bits of the cookie for their own
> > use.
> > 
> > (In more detail: they're spreading a single directory across multiple
> > nodes, and encoding a node ID into the cookie they return, so they can
> > tell which node the cookie came from when they get it back.)
> > 
> > That works if you assume the cookie is an "offset" bounded above by some
> > measure of the directory size, hence unlikely to ever use the high
> > bits....
> 
> Right, but why wouldn't an nfs export option solve the problem for
> gluster?

No, gluster is running on ext4 directly.

> Basically, it would be nice if we did not have to degrade locally
> running userspace applications by globally turning off 64-bit telldir
> cookies just because there are some broken cluster file systems and
> nfsv3 clients out there.  And if we are only turning off 64-bit
> cookies for NFS, wouldn't it make sense to make this an NFS export
> option, as opposed to a mount option?

Right, the problem is that from ext4's point of view gluster is just
another userspace application.

(And my worry of course is that there may be others.  Samba would be
another one to check.)

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 15:19       ` J. Bruce Fields
@ 2013-02-13 15:36         ` Theodore Ts'o
       [not found]           ` <20130213153654.GC17431-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13 15:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-ext4, sandeen, Bernd Schubert, gluster-devel

On Wed, Feb 13, 2013 at 10:19:53AM -0500, J. Bruce Fields wrote:
> > > (In more detail: they're spreading a single directory across multiple
> > > nodes, and encoding a node ID into the cookie they return, so they can
> > > tell which node the cookie came from when they get it back.)
> > > 
> > > That works if you assume the cookie is an "offset" bounded above by some
> > > measure of the directory size, hence unlikely to ever use the high
> > > bits....
> > 
> > Right, but why wouldn't an nfs export option solve the problem for
> > gluster?
> 
> No, gluster is running on ext4 directly.

OK, so let me see if I can get this straight.  Each local gluster node
is running a userspace NFS server, right?  Because if it were running
a kernel-side NFS server, it would be sufficient to use an nfs export
option.

A client which mounts a "gluster file system" is also doing this via
NFSv3, right?  Or are they using their own protocol?  If they are
using their own protocol, why can't they encode the node ID somewhere
else?

So this a correct picture of what is going on:

                                                  /------ GFS Storage
                                                 /        Server #1
  GFS Cluster     NFS V3      GFS Cluster      -- NFS v3
  Client        <--------->   Frontend Server  ---------- GFS Storage
                                               --         Server #2
                                                 \
                                                  \------ GFS Storage
                                                          Server #3


And the reason why it needs to use the high bits is because when it
needs to coalesce the results from each GFS Storage Server to the GFS
Cluster client?


The other thing that I'd note is that the readdir cookie has been
64-bit since NFSv3, which was released in June ***1995***.  And the
explicit, stated purpose of making it be a 64-bit value (as stated in
RFC 1813) was to reduce interoperability problems.  If that were the
case, are you telling me that Sun (who has traditionally been pretty
good worrying about interoperability concerns, and in fact employed
the editors of RFC 1813) didn't get this right?  This seems
quite.... surprising to me.

I thought this was the whole point of the various NFS interoperability
testing done at Connectathon, for which Sun was a major sponsor?!?  No
one noticed?!?

	     		      	    	- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
@ 2013-02-13 15:40       ` Bernd Schubert
  2013-02-14  5:32         ` Dave Chinner
  0 siblings, 1 reply; 65+ messages in thread
From: Bernd Schubert @ 2013-02-13 15:40 UTC (permalink / raw)
  To: Niels de Vos
  Cc: J. Bruce Fields, sandeen, Andreas Dilger, linux-ext4,
	Theodore Ts'o, gluster-devel

On 02/13/2013 02:31 PM, Niels de Vos wrote:
> On Tue, Feb 12, 2013 at 04:00:54PM -0500, J. Bruce Fields wrote:
>> On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
>>> On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
>>>> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
>>>> and previous patches solved problems with hash collisions in large
>>>> directories by using 64- instead of 32- bit directory hashes in some
>>>> cases.  But it caused problems for users who assume directory offsets
>>>> are "small".  Two cases we've run across:
>>>>
>>>> 	- older NFS clients: 64-bit cookies cause applications on many
>>>> 	  older clients to fail.
>>>> 	- gluster: gluster assumed that it could take the top bits of
>>>> 	  the offset for its own use.
>>>>
>>>> In both cases we could argue we're in the right: the nfs protocol
>>>> defines cookies to be 64 bits, so clients should be prepared to handle
>>>> them (remapping to smaller integers if necessary to placate applications
>>>> using older system interfaces).  And gluster was incorrect to assume
>>>> that the "offset" was really an "offset" as opposed to just an opaque
>>>> value.
>>>>
>>>> But in practice things that worked fine for a long time break on a
>>>> kernel upgrade.
>>>>
>>>> So at a minimum I think we owe people a workaround, and turning off
>>>> dir_index may not be practical for everyone.
>>>>
>>>> A "no_64bit_cookies" export option would provide a workaround for NFS
>>>> servers with older NFS clients, but not for applications like gluster.
>>>>
>>>> For that reason I'd rather have a way to turn this off on a given ext4
>>>> filesystem.  Is that practical?
>>>
>>> I think Ted needs to answer if he would accept another mount option. But
>>> before we go this way, what is gluster doing if there are hash
>>> collisions?
>>
>> They probably just haven't tested NFS with large enough directories.
>> The birthday paradox says you'd need about 2^16 entries to have a 50-50
>> chance of hitting the problem.
>
> The Gluster NFS-server gets into an infinite loop:
> - https://bugzilla.redhat.com/show_bug.cgi?id=838784

Hmm, this bugzilla is not entirely what I meant, as it refers to 64-bit 
hashes.
My question actually was: what is gluster going to do if there is a
32-bit hash collision and ext4 seeks back to a random entry?
That might end in an endless loop, but it might also simply list entries
multiple times on readdir().
Of course, something that only happens rarely is better than something 
that happens all the time, but it still would be better to properly fix 
it, wouldn't it?

> The general advice (even before this Bug) is that XFS should be used,
> which is not affected by this problem (yet?).

Hmm, well, always depends on the workload.


Cheers,
Bernd

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 15:36         ` Theodore Ts'o
@ 2013-02-13 16:20               ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 16:20 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Bernd Schubert, sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A

Oops, probably should have cc'd linux-nfs.

On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> On Wed, Feb 13, 2013 at 10:19:53AM -0500, J. Bruce Fields wrote:
> > > > (In more detail: they're spreading a single directory across multiple
> > > > nodes, and encoding a node ID into the cookie they return, so they can
> > > > tell which node the cookie came from when they get it back.)
> > > > 
> > > > That works if you assume the cookie is an "offset" bounded above by some
> > > > measure of the directory size, hence unlikely to ever use the high
> > > > bits....
> > > 
> > > Right, but why wouldn't an nfs export option solve the problem for
> > > gluster?
> > 
> > No, gluster is running on ext4 directly.
> 
> OK, so let me see if I can get this straight.  Each local gluster node
> is running a userspace NFS server, right?

My understanding is that only one frontend server is running the server.
So in your picture below, "NFS v3" should be some internal gluster
protocol:


                                                   /------ GFS Storage
                                                  /        Server #1
   GFS Cluster     NFS V3      GFS Cluster      -- gluster protocol
   Client        <--------->   Frontend Server  ---------- GFS Storage
                                                --         Server #2
                                                  \
                                                   \------ GFS Storage
                                                           Server #3
 

That frontend server gets a readdir request for a directory which is
stored across several of the storage servers.  It has to return a
cookie.  It will get that cookie back from the client at some unknown
later time (possibly after the server has rebooted).  So their solution
is to return a cookie from one of the storage servers, plus some kind of
node id in the top bits so they can remember which server it came from.

(I don't know much about gluster, but I think that's the basic idea.)

I've assumed that users of directory cookies should treat them as
opaque, so I don't think what gluster is doing is correct.  But on the
other hand they are defined as integers and described as offsets here
and there.  And I can't actually think of anything else that would work,
short of gluster generating and storing its own cookies.

> Because if it were running
> a kernel-side NFS server, it would be sufficient to use an nfs export
> option.
> 
> A client which mounts a "gluster file system" is also doing this via
> NFSv3, right?  Or are they using their own protocol?  If they are
> using their own protocol, why can't they encode the node ID somewhere
> else?
> 
> So this a correct picture of what is going on:
> 
>                                                   /------ GFS Storage
>                                                  /        Server #1
>   GFS Cluster     NFS V3      GFS Cluster      -- NFS v3
>   Client        <--------->   Frontend Server  ---------- GFS Storage
>                                                --         Server #2
>                                                  \
>                                                   \------ GFS Storage
>                                                           Server #3
> 
> 
> And the reason why it needs to use the high bits is because when it
> needs to coalesce the results from each GFS Storage Server to the GFS
> Cluster client?
> 
> The other thing that I'd note is that the readdir cookie has been
> 64-bit since NFSv3, which was released in June ***1995***.  And the
> explicit, stated purpose of making it be a 64-bit value (as stated in
> RFC 1813) was to reduce interoperability problems.  If that were the
> case, are you telling me that Sun (who has traditionally been pretty
> good worrying about interoperability concerns, and in fact employed
> the editors of RFC 1813) didn't get this right?  This seems
> quite.... surprising to me.
> 
> I thought this was the whole point of the various NFS interoperability
> testing done at Connectathon, for which Sun was a major sponsor?!?  No
> one noticed?!?

Beats me.  But it's not necessarily easy to replace clients running
legacy applications, so we're stuck working with the clients we have....

The linux client does remap the server-provided cookies to small
integers, I believe exactly because older applications had trouble with
servers returning "large" cookies.  So presumably ext4-exporting-Linux
servers aren't the first to do this.
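
Illustrative only, not the actual client code, but the idea is simply a
per-directory table that hands out small ordinals and remembers the real
cookie behind each one:

#include <stdint.h>

#define MAX_COOKIES 1024		/* arbitrary for the example */

struct cookie_map {
	uint64_t real[MAX_COOKIES];	/* server-provided cookies */
	unsigned int used;
};

/* Return a small 1-based ordinal for a server cookie, remembering the
 * mapping so the real cookie can be recovered on a later seek/resume.
 */
static long cookie_to_index(struct cookie_map *map, uint64_t cookie)
{
	unsigned int i;

	for (i = 0; i < map->used; i++)
		if (map->real[i] == cookie)
			return i + 1;
	if (map->used == MAX_COOKIES)
		return -1;
	map->real[map->used++] = cookie;
	return map->used;
}

static uint64_t index_to_cookie(const struct cookie_map *map, long index)
{
	return map->real[index - 1];
}

The application only ever sees the small index, whatever the server put in
the cookie.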

I don't know which client versions are affected--Connectathon's next
week and I'll talk to people and make sure there's an ext4 export with
this turned on to test against.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 16:20               ` J. Bruce Fields
@ 2013-02-13 16:43                   ` Myklebust, Trond
  -1 siblings, 0 replies; 65+ messages in thread
From: Myklebust, Trond @ 2013-02-13 16:43 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Ts'o, linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	sandeen-H+wXaHxf7aLQT0dZR+AlfA, Bernd Schubert,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-02-13 at 11:20 -0500, J. Bruce Fields wrote:
> Oops, probably should have cc'd linux-nfs.
> 
> On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> > On Wed, Feb 13, 2013 at 10:19:53AM -0500, J. Bruce Fields wrote:
> > > > > (In more detail: they're spreading a single directory across multiple
> > > > > nodes, and encoding a node ID into the cookie they return, so they can
> > > > > tell which node the cookie came from when they get it back.)
> > > > > 
> > > > > That works if you assume the cookie is an "offset" bounded above by some
> > > > > measure of the directory size, hence unlikely to ever use the high
> > > > > bits....
> > > > 
> > > > Right, but why wouldn't an nfs export option solve the problem for
> > > > gluster?
> > > 
> > > No, gluster is running on ext4 directly.
> > 
> > OK, so let me see if I can get this straight.  Each local gluster node
> > is running a userspace NFS server, right?
> 
> My understanding is that only one frontend server is running the server.
> So in your picture below, "NFS v3" should be some internal gluster
> protocol:
> 
> 
>                                                    /------ GFS Storage
>                                                   /        Server #1
>    GFS Cluster     NFS V3      GFS Cluster      -- gluster protocol
>    Client        <--------->   Frontend Server  ---------- GFS Storage
>                                                 --         Server #2
>                                                   \
>                                                    \------ GFS Storage
>                                                            Server #3
>  
> 
> That frontend server gets a readdir request for a directory which is
> stored across several of the storage servers.  It has to return a
> cookie.  It will get that cookie back from the client at some unknown
> later time (possibly after the server has rebooted).  So their solution
> is to return a cookie from one of the storage servers, plus some kind of
> node id in the top bits so they can remember which server it came from.
> 
> (I don't know much about gluster, but I think that's the basic idea.)
> 
> I've assumed that users of directory cookies should treat them as
> opaque, so I don't think what gluster is doing is correct.  But on the
> other hand they are defined as integers and described as offsets here
> and there.  And I can't actually think of anything else that would work,
> short of gluster generating and storing its own cookies.
> 
> > Because if it were running
> > a kernel-side NFS server, it would be sufficient to use an nfs export
> > option.
> > 
> > A client which mounts a "gluster file system" is also doing this via
> > NFSv3, right?  Or are they using their own protocol?  If they are
> > using their own protocol, why can't they encode the node ID somewhere
> > else?
> > 
> > So this a correct picture of what is going on:
> > 
> >                                                   /------ GFS Storage
> >                                                  /        Server #1
> >   GFS Cluster     NFS V3      GFS Cluster      -- NFS v3
> >   Client        <--------->   Frontend Server  ---------- GFS Storage
> >                                                --         Server #2
> >                                                  \
> >                                                   \------ GFS Storage
> >                                                           Server #3
> > 
> > 
> > And the reason why it needs to use the high bits is because when it
> > needs to coalesce the results from each GFS Storage Server to the GFS
> > Cluster client?
> > 
> > The other thing that I'd note is that the readdir cookie has been
> > 64-bit since NFSv3, which was released in June ***1995***.  And the
> > explicit, stated purpose of making it be a 64-bit value (as stated in
> > RFC 1813) was to reduce interoperability problems.  If that were the
> > case, are you telling me that Sun (who has traditionally been pretty
> > good worrying about interoperability concerns, and in fact employed
> > the editors of RFC 1813) didn't get this right?  This seems
> > quite.... surprising to me.
> > 
> > I thought this was the whole point of the various NFS interoperability
> > testing done at Connectathon, for which Sun was a major sponsor?!?  No
> > one noticed?!?
> 
> Beats me.  But it's not necessarily easy to replace clients running
> legacy applications, so we're stuck working with the clients we have....
> 
> The linux client does remap the server-provided cookies to small
> integers, I believe exactly because older applications had trouble with
> servers returning "large" cookies.  So presumably ext4-exporting-Linux
> servers aren't the first to do this.
> 
> I don't know which client versions are affected--Connectathon's next
> week and I'll talk to people and make sure there's an ext4 export with
> this turned on to test against.

Actually, one of the main reasons the Linux client does not export raw
readdir cookies is that the glibc-2 folks in their infinite wisdom
declared that telldir()/seekdir() use an off_t. They then went one step
further and decided to declare negative offsets to be illegal so that
they could use the negative values internally in their syscall wrappers.
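
Concretely (just to illustrate the constraint, not glibc or kernel code):
any cookie with the top bit set goes negative as soon as it lands in an
off_t, so half the 64-bit cookie space is unusable there:

#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	uint64_t cookie = 1ULL << 63;	/* a 64-bit hash with the top bit set */
	off_t off = (off_t)cookie;	/* what the directory offset becomes */

	printf("cookie=%llu  off_t=%lld  negative=%d\n",
	       (unsigned long long)cookie, (long long)off, off < 0);
	return 0;
}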

The POSIX definition has none of the above rubbish
(http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html)
and so glibc brilliantly saddled Linux with a crippled readdir
implementation that is _not_ POSIX compatible.

No, I'm not at all bitter...

Trond

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 16:20               ` J. Bruce Fields
@ 2013-02-13 21:21                   ` Anand Avati
  -1 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-02-13 21:21 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: sandeen-H+wXaHxf7aLQT0dZR+AlfA, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	Theodore Ts'o, Bernd Schubert,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A


[-- Attachment #1.1: Type: text/plain, Size: 3581 bytes --]

>
> My understanding is that only one frontend server is running the server.
> So in your picture below, "NFS v3" should be some internal gluster
> protocol:
>
>
>                                                    /------ GFS Storage
>                                                   /        Server #1
>    GFS Cluster     NFS V3      GFS Cluster      -- gluster protocol
>    Client        <--------->   Frontend Server  ---------- GFS Storage
>                                                 --         Server #2
>                                                   \
>                                                    \------ GFS Storage
>                                                            Server #3
>
>
> That frontend server gets a readdir request for a directory which is
> stored across several of the storage servers.  It has to return a
> cookie.  It will get that cookie back from the client at some unknown
> later time (possibly after the server has rebooted).  So their solution
> is to return a cookie from one of the storage servers, plus some kind of
> node id in the top bits so they can remember which server it came from.
>
> (I don't know much about gluster, but I think that's the basic idea.)
>
> I've assumed that users of directory cookies should treat them as
> opaque, so I don't think what gluster is doing is correct.


NFS uses the term cookies, while the man pages of readdir/seekdir/telldir
call them "offsets". RFC 1813 only talks about communication between an NFS
server and an NFS client. While knfsd performs a trivial 1:1 mapping between
d_off "offsets" and these "opaque cookies", the "gluster" issue at hand is
that it made assumptions about the nature of these "offsets" (that they
represent some kind of true distance/offset and therefore fall within some
kind of bounded magnitude -- somewhat like the inode numbering), and
performs a transformation (instead of a trivial 1:1 mapping) like this:

  final_d_off = (ext4_d_off * MAX_SERVERS) + server_idx

thereby utilizing a few more top bits, while retaining the ability to
perform a reverse transformation to "continue" from a previous location.
As you can see, final_d_off now overflows for very large values of
ext4_d_off. This final_d_off is used both as a cookie in the gluster-NFS
(userspace) server and as the d_off entry parameter in the FUSE readdir
reply. The gluster / ext4 d_off issue is therefore not limited to
gluster-NFS, but also exists in the FUSE client, where NFS is completely
out of the picture.
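
A minimal sketch of that transformation and where it breaks (MAX_SERVERS
and the helper names here are purely illustrative):

#include <stdint.h>

#define MAX_SERVERS 16			/* illustrative brick count */

/* Encode which storage server a cookie came from into the cookie itself. */
static uint64_t gluster_encode(uint64_t ext4_d_off, unsigned int server_idx)
{
	return ext4_d_off * MAX_SERVERS + server_idx;
}

static uint64_t gluster_decode(uint64_t final_d_off, unsigned int *server_idx)
{
	*server_idx = final_d_off % MAX_SERVERS;
	return final_d_off / MAX_SERVERS;
}

/* With 32-bit hashes ext4_d_off fits in 32 bits and the multiply is
 * harmless.  With 64-bit hashes the top bits are already in use, so
 * ext4_d_off * MAX_SERVERS wraps and the decode no longer returns the
 * offset ext4 actually handed out.
 */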

You are probably right in that gluster has made different assumptions about
the "nature" of values filled in d_off fields. But the language used in all
man pages makes you believe they were supposed to be numbers representing
some kind of distance/offset (with bounded magnitude), and not a "random"
number.

This had worked (accidentally, you may call it) on all filesystems
including ext4, as expected. But after a kernel upgrade, only ext4-backed
deployments started giving problems, and we have been advising our users to
either downgrade their kernel or use a different filesystem (we really do
not want to force them into making a choice of one backend filesystem vs
another).

You can always say "this is your fault" for interpreting the man pages
differently and punish us by leaving things as they are (unfortunately
leaving a big chunk of users who want both ext4 and gluster jeopardized).
Or you can be kind, generous and considerate to the legacy apps and users
(of which gluster is only a subset) and only provide a mount option to
control the large d_off behavior.

Thanks!
Avati

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13 16:43                   ` Myklebust, Trond
  (?)
@ 2013-02-13 21:33                   ` J. Bruce Fields
  2013-02-14  3:59                     ` Myklebust, Trond
  -1 siblings, 1 reply; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 21:33 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Theodore Ts'o, linux-ext4, sandeen, Bernd Schubert,
	gluster-devel, linux-nfs

On Wed, Feb 13, 2013 at 04:43:05PM +0000, Myklebust, Trond wrote:
> On Wed, 2013-02-13 at 11:20 -0500, J. Bruce Fields wrote:
> > Oops, probably should have cc'd linux-nfs.
> > 
> > On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> > > The other thing that I'd note is that the readdir cookie has been
> > > 64-bit since NFSv3, which was released in June ***1995***.  And the
> > > explicit, stated purpose of making it be a 64-bit value (as stated in
> > > RFC 1813) was to reduce interoperability problems.  If that were the
> > > case, are you telling me that Sun (who has traditionally been pretty
> > > good worrying about interoperability concerns, and in fact employed
> > > the editors of RFC 1813) didn't get this right?  This seems
> > > quite.... surprising to me.
> > > 
> > > I thought this was the whole point of the various NFS interoperability
> > > testing done at Connectathon, for which Sun was a major sponsor?!?  No
> > > one noticed?!?
> > 
> > Beats me.  But it's not necessarily easy to replace clients running
> > legacy applications, so we're stuck working with the clients we have....
> > 
> > The linux client does remap the server-provided cookies to small
> > integers, I believe exactly because older applications had trouble with
> > servers returning "large" cookies.  So presumably ext4-exporting-Linux
> > servers aren't the first to do this.
> > 
> > I don't know which client versions are affected--Connectathon's next
> > week and I'll talk to people and make sure there's an ext4 export with
> > this turned on to test against.
> 
> Actually, one of the main reasons for the Linux client not exporting raw
> readdir cookies is because the glibc-2 folks in their infinite wisdom
> declared that telldir()/seekdir() use an off_t. They then went yet one
> further and decided to declare negative offsets to be illegal so that
> they could use the negative values internally in their syscall wrappers.
> 
> The POSIX definition has none of the above rubbish
> (http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html)
> and so glibc brilliantly saddled Linux with a crippled readdir
> implementation that is _not_ POSIX compatible.
> 
> No, I'm not at all bitter...

Oh, right, I knew I'd forgotten part of the story....

But then you must have actually been testing against servers that were
using that 32nd bit?

I think ext4 actually only uses 31 bits even in the 32-bit case.  And
for a server that was literally using an offset inside a directory file,
that would be a colossal directory.

So I'm wondering how you ran across it.

Partly just pure curiosity.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-13  8:17     ` Bernd Schubert
@ 2013-02-13 22:18       ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 22:18 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-ext4, sandeen, Theodore Ts'o, gluster-devel, Andreas Dilger

On Wed, Feb 13, 2013 at 09:17:28AM +0100, Bernd Schubert wrote:
> On 02/12/2013 10:00 PM, J. Bruce Fields wrote:
> >On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
> >>On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
> >>>06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> >>>and previous patches solved problems with hash collisions in large
> >>>directories by using 64- instead of 32- bit directory hashes in some
> >>>cases.  But it caused problems for users who assume directory offsets
> >>>are "small".  Two cases we've run across:
> >>>
> >>>	- older NFS clients: 64-bit cookies cause applications on many
> >>>	  older clients to fail.
> >>>	- gluster: gluster assumed that it could take the top bits of
> >>>	  the offset for its own use.
> >>>
> >>>In both cases we could argue we're in the right: the nfs protocol
> >>>defines cookies to be 64 bits, so clients should be prepared to handle
> >>>them (remapping to smaller integers if necessary to placate applications
> >>>using older system interfaces).  And gluster was incorrect to assume
> >>>that the "offset" was really an "offset" as opposed to just an opaque
> >>>value.
> >>>
> >>>But in practice things that worked fine for a long time break on a
> >>>kernel upgrade.
> >>>
> >>>So at a minimum I think we owe people a workaround, and turning off
> >>>dir_index may not be practical for everyone.
> >>>
> >>>A "no_64bit_cookies" export option would provide a workaround for NFS
> >>>servers with older NFS clients, but not for applications like gluster.
> >>>
> >>>For that reason I'd rather have a way to turn this off on a given ext4
> >>>filesystem.  Is that practical?
> >>
> >>I think Ted needs to answer if he would accept another mount option. But
> >>before we are going this way, what is gluster doing if there are hash
> >>collions?
> >
> >They probably just haven't tested NFS with large enough directories.
> 
> Is it only related to NFS or generic readdir over gluster?
> 
> >The birthday paradox says you'd need about 2^16 entries to have a 50-50
> >chance of hitting the problem.
> 
> We are frequently running into it with 50000 files per directory.
> 
> >
> >I don't know enough about ext4 directory performance.  But unfortunately
> >I suspect there's a range of directory sizes that are too small to have
> >a significant chance of having directory collisions, but still large
> >enough to need dir_index?
> 
> Here is a link to the initial benchmark:
> http://search.luky.org/linux-kernel.2001/msg00117.html

Hm, so I still don't have a good feeling for when dir_index is likely to
start winning.

For comparison, assuming the probability of seeing a failure due to hash
collisions in an n-entry directory is the probability of a collision
among n numbers chosen uniformly at random from 2^31, that's about:

	 0.0002% for n=  100
	 0.006 % for n=  500
	 0.02  % for n= 1000
	 0.6   % for n= 5000
	 2     % for n=10000
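
These figures follow from the usual birthday approximation
P ~= 1 - exp(-n(n-1)/2^32) for n cookies drawn from a space of 2^31
values; a small self-contained sketch (purely illustrative) that
reproduces them:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double space = 2147483648.0;   /* 2^31 possible 31-bit cookies */
        const int sizes[] = { 100, 500, 1000, 5000, 10000, 50000 };

        for (int i = 0; i < 6; i++) {
            double n = sizes[i];
            /* probability of at least one collision among n random cookies */
            double p = 1.0 - exp(-n * (n - 1.0) / (2.0 * space));
            printf("n=%6d  p=%.4f%%\n", sizes[i], 100.0 * p);
        }
        return 0;
    }

(For the 50,000-entry directories mentioned above, the same formula gives
roughly 44%.)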

So if we could tell anyone with directories smaller than 10,000 entries:
"hey, you don't need dir_index anyway, just turn it off"--good, the only
people still forced to deal with 64-bit cookies will be the ones that
have probably already found that ext4 isn't reliable for their purposes.

If there are people with only a few hundred entries who still need
dir_index--well, we may be making them unhappy as we're making them
suffer to fix a bug that they've never actually seen.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 21:21                   ` Anand Avati
@ 2013-02-13 22:20                       ` Theodore Ts'o
  -1 siblings, 0 replies; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13 22:20 UTC (permalink / raw)
  To: Anand Avati
  Cc: J. Bruce Fields, Bernd Schubert, sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A

On Wed, Feb 13, 2013 at 01:21:06PM -0800, Anand Avati wrote:
> 
> NFS uses the term cookies, while man pages of readdir/seekdir/telldir calls
> them "offsets". 

Unfortunately, telldir and seekdir are part of the "unspeakable Unix
design horrors" which have been with us for 25+ years.  To quote from
the rationale section of the Single Unix Specification v3 (there is
similar language in the POSIX spec):

    The original standard developers perceived that there were
    restrictions on the use of the seekdir() and telldir() functions
    related to implementation details, and for that reason these
    functions need not be supported on all POSIX-conforming
    systems. They are required on implementations supporting the XSI
    extension.

    One of the perceived problems of implementation is that returning
    to a given point in a directory is quite difficult to describe
    formally, in spite of its intuitive appeal, when systems that use
    B-trees, hashing functions, or other similar mechanisms to order
    their directories are considered. The definition of seekdir() and
    telldir() does not specify whether, when using these interfaces, a
    given directory entry will be seen at all, or more than once.

    On systems not supporting these functions, their capability can
    sometimes be accomplished by saving a filename found by readdir()
    and later using rewinddir() and a loop on readdir() to relocate
    the position from which the filename was saved.
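
(The relocation fallback described in that last paragraph looks roughly
like this -- a sketch only, with an illustrative helper name, and assuming
the saved entry still exists:)

    #include <dirent.h>
    #include <string.h>

    /* "seek" back to a remembered entry by rewinding and rescanning */
    static int seek_to_name(DIR *dir, const char *saved_name)
    {
        struct dirent *de;

        rewinddir(dir);
        while ((de = readdir(dir)) != NULL) {
            if (strcmp(de->d_name, saved_name) == 0)
                return 0;        /* now positioned just past saved_name */
        }
        return -1;               /* entry no longer present */
    }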


Telldir() and seekdir() are basically implementation horrors for any
file system that is using anything other than a simple array of
directory entries a la the V7 Unix file system or the BSD FFS.  For any
file system which is using a more advanced data structure, like
b-trees, hash trees, etc., there **can't** possibly be an "offset" into a
readdir stream.  This is why ext3/ext4 uses a telldir cookie, and it's
why the NFS specifications refer to it as a cookie.  If you are using
a modern file system, it can't possibly be an offset.
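
(For illustration, a hash-tree directory forms its "position" from hashes
rather than byte offsets, roughly along these lines; the exact ext3/4 bit
layout is simplified here and should be treated as an assumption:)

    typedef unsigned long long u64;
    typedef unsigned int u32;

    /* 64-bit cookie: major hash in the high word, minor hash in the low word */
    static u64 hash_to_cookie64(u32 major_hash, u32 minor_hash)
    {
        return ((u64)major_hash << 32) | minor_hash;
    }

    /* legacy cookie: only ~31 bits of major hash survive */
    static u32 hash_to_cookie32(u32 major_hash)
    {
        return major_hash >> 1;
    }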

> You can always say "this is your fault" for interpreting the man pages
> differently and punish us by leaving things as they are (and unfortunately
> a big chunk of users who want both ext4 and gluster jeapordized). Or you
> can be kind, generous and be considerate to the legacy apps and users (of
> which gluster is only a subset) and only provide a mount option to control
> the large d_off behavior.

The problem is that we made this change to fix real problems that take
place when you have hash collisions.  And if you are using a 31-bit
cookie, the birthday paradox means that by the time you have a
directory with 2**16 entries, the chances of hash collisions are very
real.  This could result in NFS readdir getting stuck in loops where
it constantly gets the file "foo.c", and then when it passes the
31-bit cookie for "bar.c", since there is a hash collision, it gets
"foo.c" again, and the readdir never terminates.

So the problem is that you are effectively asking me to penalize
well-behaved programs that don't try to steal bits from the top of the
telldir cookie, just for the benefit of gluster.

What if we have an ioctl or a process personality flag where a broken
application can tell the file system "I'm broken, please give me a
degraded telldir/seekdir cookie"?  That way we don't penalize programs
that are doing the right thing, while providing some accommodation for
programs that are abusing the telldir cookie.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 22:20                       ` Theodore Ts'o
  (?)
@ 2013-02-13 22:41                       ` J. Bruce Fields
       [not found]                         ` <20130213224141.GU14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  -1 siblings, 1 reply; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 22:41 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Anand Avati, Bernd Schubert, sandeen, linux-nfs, linux-ext4,
	gluster-devel

On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> On Wed, Feb 13, 2013 at 01:21:06PM -0800, Anand Avati wrote:
> > 
> > NFS uses the term cookies, while man pages of readdir/seekdir/telldir calls
> > them "offsets". 
> 
> Unfortunately, telldir and seekdir are part of the "unspeakable Unix
> design horrors" which has been with us for 25+ years.  To quote from
> the rationale section from the Single Unix Specification v3 (there is
> similar language in the Posix spec).
> 
>     The original standard developers perceived that there were
>     restrictions on the use of the seekdir() and telldir() functions
>     related to implementation details, and for that reason these
>     functions need not be supported on all POSIX-conforming
>     systems. They are required on implementations supporting the XSI
>     extension.
> 
>     One of the perceived problems of implementation is that returning
>     to a given point in a directory is quite difficult to describe
>     formally, in spite of its intuitive appeal, when systems that use
>     B-trees, hashing functions, or other similar mechanisms to order
>     their directories are considered. The definition of seekdir() and
>     telldir() does not specify whether, when using these interfaces, a
>     given directory entry will be seen at all, or more than once.
> 
>     On systems not supporting these functions, their capability can
>     sometimes be accomplished by saving a filename found by readdir()
>     and later using rewinddir() and a loop on readdir() to relocate
>     the position from which the filename was saved.
> 
> 
> Telldir() and seekdir() are basically implementation horrors for any
> file system that is using anything other than a simple array of
> directory entries ala the V7 Unix file system or the BSD FFS.  For any
> file system which is using a more advanced data structure, like
> b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> readdir stream.  This is why ext3/ext4 uses a telldir cookie, and it's
> why the NFS specifications refer to it as a cookie.  If you are using
> a modern file system, it can't possibly be an offset.
> 
> > You can always say "this is your fault" for interpreting the man pages
> > differently and punish us by leaving things as they are (and unfortunately
> > a big chunk of users who want both ext4 and gluster jeapordized). Or you
> > can be kind, generous and be considerate to the legacy apps and users (of
> > which gluster is only a subset) and only provide a mount option to control
> > the large d_off behavior.
> 
> The problem is that we made this change to fix real problems that take
> place when you have hash collisions.  And if you are using a 31-bit
> cookie, the birthday paradox means that by the time you have a
> directory with 2**16 entries, the chances of hash collisions are very
> real.  This could result in NFS readdir getting stuck in loops where
> it constantly gets the file "foo.c", and then when it passes the
> 31-bit cookie for "bar.c", since there is a hash collision, it gets
> "foo.c" again, and the readdir never terminates.
> 
> So the problem is that you are effectively asking me to penalize
> well-behaved programs that don't try to steel bits from the top of the
> telldir cookie, just for the benefit of gluster.
> 
> What if we have an ioctl or a process personality flag where a broken
> application can tell the file system "I'm broken, please give me a
> degraded telldir/seekdir cookie"?  That way we don't penalize programs
> that are doing the right thing, while providing some accomodation for
> programs who are abusing the telldir cookie.

Yeah, if there's a simple way to do that, maybe it would be worth it.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 22:41                       ` J. Bruce Fields
@ 2013-02-13 22:47                             ` Theodore Ts'o
  0 siblings, 0 replies; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13 22:47 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Anand Avati, Bernd Schubert, sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A

On Wed, Feb 13, 2013 at 05:41:41PM -0500, J. Bruce Fields wrote:
> > What if we have an ioctl or a process personality flag where a broken
> > application can tell the file system "I'm broken, please give me a
> > degraded telldir/seekdir cookie"?  That way we don't penalize programs
> > that are doing the right thing, while providing some accomodation for
> > programs who are abusing the telldir cookie.
> 
> Yeah, if there's a simple way to do that, maybe it would be worth it.

Doing this as an ioctl which gets called right after opendir, i.e.
(ignoring error checking):

      DIR *dir = opendir("/foo/bar/baz");
      ioctl(dirfd(dir), EXT4_IOC_DEGRADED_READDIR, 1);
      ...

should be quite easy.  It would be a very ext3/4 specific thing,
though.

It would be more work to get something in as a process personality
flag, mostly due to the politics of assigning a bit out of the
bitfield.

						- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
       [not found]                             ` <20130213224720.GE5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
@ 2013-02-13 22:57                                 ` Anand Avati
  0 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-02-13 22:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Bernd Schubert, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A


[-- Attachment #1.1: Type: text/plain, Size: 1183 bytes --]

On Wed, Feb 13, 2013 at 2:47 PM, Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org> wrote:

> On Wed, Feb 13, 2013 at 05:41:41PM -0500, J. Bruce Fields wrote:
> > > What if we have an ioctl or a process personality flag where a broken
> > > application can tell the file system "I'm broken, please give me a
> > > degraded telldir/seekdir cookie"?  That way we don't penalize programs
> > > that are doing the right thing, while providing some accomodation for
> > > programs who are abusing the telldir cookie.
> >
> > Yeah, if there's a simple way to do that, maybe it would be worth it.
>
> Doing this as an ioctl which gets called right after opendir, i.e
> (ignoring error checking):
>
>       DIR *dir = opendir("/foo/bar/baz");
>       ioctl(dirfd(dir), EXT4_IOC_DEGRADED_READDIR, 1);
>       ...
>
> should be quite easy.  It would be a very ext3/4 specific thing,
> though.


That would work, even though it would be ext3/4 specific. What is the
recommended programmatic way to detect whether the directory is on ext3/4?
We would not want to attempt that blindly on a non-ext3/4 FS, as the
numerical value of EXT4_IOC_DEGRADED_READDIR might get interpreted in
dangerous ways.
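
(One way to guard such a call -- a sketch, not something settled in this
thread -- would be to check the filesystem type of the open directory
first with fstatfs(), so the ioctl is never issued elsewhere:)

    #include <sys/vfs.h>
    #include <linux/magic.h>        /* EXT4_SUPER_MAGIC, 0xEF53 */

    /* ext2/3/4 all report the same superblock magic */
    static int dir_is_extfs(int dirfd)
    {
        struct statfs sfs;

        if (fstatfs(dirfd, &sfs) != 0)
            return 0;
        return sfs.f_type == EXT4_SUPER_MAGIC;
    }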

Avati

[-- Attachment #1.2: Type: text/html, Size: 1638 bytes --]

[-- Attachment #2: Type: text/plain, Size: 185 bytes --]

_______________________________________________
Gluster-devel mailing list
Gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 22:57                                 ` Anand Avati
@ 2013-02-13 23:05                                     ` J. Bruce Fields
  -1 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-13 23:05 UTC (permalink / raw)
  To: Anand Avati
  Cc: Theodore Ts'o, Bernd Schubert,
	sandeen-H+wXaHxf7aLQT0dZR+AlfA, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A

On Wed, Feb 13, 2013 at 02:57:13PM -0800, Anand Avati wrote:
> On Wed, Feb 13, 2013 at 2:47 PM, Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org> wrote:
> 
> > On Wed, Feb 13, 2013 at 05:41:41PM -0500, J. Bruce Fields wrote:
> > > > What if we have an ioctl or a process personality flag where a broken
> > > > application can tell the file system "I'm broken, please give me a
> > > > degraded telldir/seekdir cookie"?  That way we don't penalize programs
> > > > that are doing the right thing, while providing some accomodation for
> > > > programs who are abusing the telldir cookie.
> > >
> > > Yeah, if there's a simple way to do that, maybe it would be worth it.
> >
> > Doing this as an ioctl which gets called right after opendir, i.e
> > (ignoring error checking):
> >
> >       DIR *dir = opendir("/foo/bar/baz");
> >       ioctl(dirfd(dir), EXT4_IOC_DEGRADED_READDIR, 1);
> >       ...
> >
> > should be quite easy.  It would be a very ext3/4 specific thing,
> > though.
> 
> 
> That would work, even though it would be ext3/4 specific. What is the
> recommended programmatic way to detect if the file is on ext3/4 -- we would
> not want to attempt that blindly on a non-ext3/4 FS as the numerical value
> of EXT4_IOC_DEGRADED_READDIR might get interpreted in dangerous ways?

We must have been through this before, but: is the only way to generate
a collision-free readdir cookie really to use a larger hash?

Would it be possible to make something work like, for example, a 31-bit
hash plus an offset into a hash bucket?

I have trouble thinking about this, partly because I can't remember
where to find the requirements for readdir on concurrently modified
directories....
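
(One way to read that suggestion -- purely a sketch, not a design anyone
has committed to here -- is a cookie that keeps the 31-bit hash and
disambiguates colliding entries by their ordinal within the hash bucket;
whether such an ordinal can be kept stable, and squeezed into few enough
bits, is exactly the open question:)

    typedef unsigned long long u64;
    typedef unsigned int u32;

    static u64 make_cookie(u32 hash31, u32 nth_with_same_hash)
    {
        return ((u64)(hash31 & 0x7fffffffU) << 32) | nth_with_same_hash;
    }

    static u32 cookie_hash(u64 cookie)    { return (u32)(cookie >> 32); }
    static u32 cookie_ordinal(u64 cookie) { return (u32)cookie; }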

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 23:05                                     ` J. Bruce Fields
@ 2013-02-13 23:44                                         ` Theodore Ts'o
  -1 siblings, 0 replies; 65+ messages in thread
From: Theodore Ts'o @ 2013-02-13 23:44 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Anand Avati, Bernd Schubert, sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A

On Wed, Feb 13, 2013 at 06:05:11PM -0500, J. Bruce Fields wrote:
> 
> Would it be possible to make something work like, for example, a 31-bit
> hash plus an offset into a hash bucket?
> 
> I have trouble thinking about this, partly because I can't remember
> where to find the requirements for readdir on concurrently modified
> directories....

The requirements are that for a directory entry which has not been
modified since the last opendir() or rewinddir(), readdir() must return
that directory entry exactly once.

For a directory entry which has been added or removed since the last
opendir() or rewinddir() call, it is undefined whether the directory
entry is returned once or not at all.  And a rename is defined as an
add/remove, so it's OK for the old filename and the new filename to
appear in the readdir() stream; it would also be OK if neither
appeared in the readdir() stream.

The SUSv3 definition of readdir() can be found here:

   http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html

Note also that if you look at the SuSv3 definition of seekdir(), it
explicitly states that the value returned by telldir() is not
guaranteed to be valid after a rewinddir() or across another opendir():

   If the value of loc was not obtained from an earlier call to
   telldir(), or if a call to rewinddir() occurred between the call to
   telldir() and the call to seekdir(), the results of subsequent
   calls to readdir() are unspecified.

Hence, it would be legal, and arguably more correct, if we created an
internal array of pointers into the directory structure, where the
first call to telldir() returned 1, the second call to telldir()
returned 2, and the third call to telldir() returned 3, regardless of
the position in the directory, and this number was used by seekdir()
to index into the array of pointers to return the exact location in
the b-tree.  This would completely eliminate the possibility of hash
collisions, and guarantee that readdir() would never drop a directory
entry or return one multiple times after seekdir().
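
(A sketch of that scheme -- hypothetical, not an existing kernel
interface; the fixed table size is just for illustration:)

    #include <stddef.h>

    #define MAX_SAVED_POS 1024          /* arbitrary bound for the sketch */

    struct dir_pos {                    /* fs-specific position, e.g. hash + block */
        unsigned long long where;
    };

    struct telldir_table {
        struct dir_pos slots[MAX_SAVED_POS];
        size_t nr;
    };

    /* telldir(): remember the current position, hand out a small 1-based index */
    static long save_pos(struct telldir_table *t, struct dir_pos cur)
    {
        if (t->nr == MAX_SAVED_POS)
            return -1;                  /* a real version would grow the table */
        t->slots[t->nr] = cur;
        return (long)++t->nr;
    }

    /* seekdir(): map the small cookie back to the exact saved position */
    static struct dir_pos restore_pos(const struct telldir_table *t, long loc)
    {
        return t->slots[loc - 1];
    }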

This implementation approach would have denial-of-service potential,
since each call to telldir() would potentially be allocating kernel
memory, but as long as we make sure the OOM killer kills the nasty
process which is calling telldir() a lot, this would probably be OK.

It would also be legal to throw away this array after a call to
rewinddir() or closedir(), since telldir() cookies are not guaranteed
to be valid indefinitely.  See:

   http://pubs.opengroup.org/onlinepubs/009695399/functions/seekdir.html

I suspect this would seriously screw over Gluster, though, and this
wouldn't be a solution for NFSv3, since NFS needs long-lived directory
cookies, and not the short-lived cookies which is all POSIX/SuSv3 guarantees.

Regards,

					- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
       [not found]                                         ` <20130213234430.GF5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
@ 2013-02-14  0:05                                             ` Anand Avati
  2013-02-14 21:46                                             ` J. Bruce Fields
  1 sibling, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-02-14  0:05 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Bernd Schubert, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A


[-- Attachment #1.1: Type: text/plain, Size: 975 bytes --]

On Wed, Feb 13, 2013 at 3:44 PM, Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org> wrote:
>
> I suspect this would seriously screw over Gluster, though, and this
> wouldn't be a solution for NFSv3, since NFS needs long-lived directory
> cookies, and not the short-lived cookies which is all POSIX/SuSv3
> guarantees.
>

Actually, this would work just fine with Gluster, except in the case of
gluster-NFS. The native client is only acting as a router/proxy of syscalls
to the backend system: a directory opened by an application will have a
matching directory fd opened on ext4, and a readdir from an app will be
translated into a readdir on the matching fd on ext4. So app-on-glusterfs
and glusterfsd-on-ext4 are essentially "moving in tandem". As long as the
offs^H^H^H^H cookies do not overflow in the transformation, Gluster would
not have a problem.

However, gluster-NFS (and NFS in general, too) will break, as we
potentially opendir/closedir on every request.

Avati

[-- Attachment #1.2: Type: text/html, Size: 1332 bytes --]

[-- Attachment #2: Type: text/plain, Size: 185 bytes --]

_______________________________________________
Gluster-devel mailing list
Gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: regressions due to 64-bit ext4 directory cookies
  2013-02-13 21:33                   ` J. Bruce Fields
@ 2013-02-14  3:59                     ` Myklebust, Trond
       [not found]                       ` <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Myklebust, Trond @ 2013-02-14  3:59 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Ts'o, linux-ext4, sandeen, Bernd Schubert,
	gluster-devel, linux-nfs

> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Wednesday, February 13, 2013 4:34 PM
> To: Myklebust, Trond
> Cc: Theodore Ts'o; linux-ext4@vger.kernel.org; sandeen@redhat.com;
> Bernd Schubert; gluster-devel@nongnu.org; linux-nfs@vger.kernel.org
> Subject: Re: regressions due to 64-bit ext4 directory cookies
> 
> On Wed, Feb 13, 2013 at 04:43:05PM +0000, Myklebust, Trond wrote:
> > On Wed, 2013-02-13 at 11:20 -0500, J. Bruce Fields wrote:
> > > Oops, probably should have cc'd linux-nfs.
> > >
> > > On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> > > > The other thing that I'd note is that the readdir cookie has been
> > > > 64-bit since NFSv3, which was released in June ***1995***.  And
> > > > the explicit, stated purpose of making it be a 64-bit value (as
> > > > stated in RFC 1813) was to reduce interoperability problems.  If
> > > > that were the case, are you telling me that Sun (who has
> > > > traditionally been pretty good worrying about interoperability
> > > > concerns, and in fact employed the editors of RFC 1813) didn't get
> > > > this right?  This seems quite.... surprising to me.
> > > >
> > > > I thought this was the whole point of the various NFS
> > > > interoperability testing done at Connectathon, for which Sun was a
> > > > major sponsor?!?  No one noticed?!?
> > >
> > > Beats me.  But it's not necessarily easy to replace clients running
> > > legacy applications, so we're stuck working with the clients we have....
> > >
> > > The linux client does remap the server-provided cookies to small
> > > integers, I believe exactly because older applications had trouble
> > > with servers returning "large" cookies.  So presumably
> > > ext4-exporting-Linux servers aren't the first to do this.
> > >
> > > I don't know which client versions are affected--Connectathon's next
> > > week and I'll talk to people and make sure there's an ext4 export
> > > with this turned on to test against.
> >
> > Actually, one of the main reasons for the Linux client not exporting
> > raw readdir cookies is because the glibc-2 folks in their infinite
> > wisdom declared that telldir()/seekdir() use an off_t. They then went
> > yet one further and decided to declare negative offsets to be illegal
> > so that they could use the negative values internally in their syscall
> wrappers.
> >
> > The POSIX definition has none of the above rubbish
> > (http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html
> > ) and so glibc brilliantly saddled Linux with a crippled readdir
> > implementation that is _not_ POSIX compatible.
> >
> > No, I'm not at all bitter...
> 
> Oh, right, I knew I'd forgotten part of the story....
> 
> But then you must have actually been testing against servers that were using
> that 32nd bit?
> 
> I think ext4 actually only uses 31 bits even in the 32-bit case.  And for a server
> that was literally using an offset inside a directory file, that would be a
> colossal directory.
> 
> So I'm wondering how you ran across it.
> 
> Partly just pure curiosity.

IIRC, XFS on IRIX used 0xFFFFF as the readdir eof marker, which caused us to generate an EIO...

Cheers
  Trond

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-13 15:40       ` Bernd Schubert
@ 2013-02-14  5:32         ` Dave Chinner
  0 siblings, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2013-02-14  5:32 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Niels de Vos, J. Bruce Fields, sandeen, Andreas Dilger,
	linux-ext4, Theodore Ts'o, gluster-devel

On Wed, Feb 13, 2013 at 04:40:35PM +0100, Bernd Schubert wrote:
> >The general advise (even before this Bug) is that XFS should be used,
> >which is not affected with this problem (yet?).
> 
> Hmm, well, always depends on the workload.

XFS won't suffer from this collision bug, for 2 reasons. The first
is that XFS uses a virtual mapping for directory data and uses an
encoded index into that virtual mapping as the cookie data. You
can't have 2 entries at the same index, so you cannot get cookie
collisions.

The second is that the virtual mapping is for a 32GB (2^35 byte) data
segment and, like so much of XFS, the cookie is made up of bitfields
that encode a specific location. The high bits are the virtual block
offset into the directory data segment, the low bits the offset into
the directory block. Given that directory entries are aligned to 8
bytes, the offset into the directory block can have 3 bits compressed
out, and hence we end up with only 32 bits being needed to address the
entire 32GB directory data segment.

So, there are no collisions or 32/64 bit issues with XFS directory
cookies regardless of the workload.
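
(A purely illustrative sketch of that encoding -- the field widths and
the 4k block size are assumptions for the sketch, not the exact XFS
layout:)

    typedef unsigned long long u64;
    typedef unsigned int u32;

    #define DIRBLK_SHIFT 12             /* assume 4k directory blocks */

    /* locate an entry within the 32GB (2^35-byte) virtual data segment */
    static u32 xfs_style_cookie(u64 byte_off_in_segment)
    {
        u64 block  = byte_off_in_segment >> DIRBLK_SHIFT;
        u64 offset = byte_off_in_segment & ((1ULL << DIRBLK_SHIFT) - 1);

        /* entries are 8-byte aligned, so the low 3 offset bits are always
         * zero and can be compressed out: 23 + 9 = 32 bits in total */
        return (u32)((block << (DIRBLK_SHIFT - 3)) | (offset >> 3));
    }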

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
  2013-02-14  3:59                     ` Myklebust, Trond
@ 2013-02-14  5:45                           ` Dave Chinner
  0 siblings, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2013-02-14  5:45 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: J. Bruce Fields, Theodore Ts'o,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	sandeen-H+wXaHxf7aLQT0dZR+AlfA, Bernd Schubert,
	gluster-devel-qX2TKyscuCcdnm+yROfE0A,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 14, 2013 at 03:59:17AM +0000, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org]
> > Sent: Wednesday, February 13, 2013 4:34 PM
> > To: Myklebust, Trond
> > Cc: Theodore Ts'o; linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > Bernd Schubert; gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: regressions due to 64-bit ext4 directory cookies
> > 
> > On Wed, Feb 13, 2013 at 04:43:05PM +0000, Myklebust, Trond wrote:
> > > On Wed, 2013-02-13 at 11:20 -0500, J. Bruce Fields wrote:
> > > > Oops, probably should have cc'd linux-nfs.
> > > >
> > > > On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> > > > > The other thing that I'd note is that the readdir cookie has been
> > > > > 64-bit since NFSv3, which was released in June ***1995***.  And
> > > > > the explicit, stated purpose of making it be a 64-bit value (as
> > > > > stated in RFC 1813) was to reduce interoperability problems.  If
> > > > > that were the case, are you telling me that Sun (who has
> > > > > traditionally been pretty good worrying about interoperability
> > > > > concerns, and in fact employed the editors of RFC 1813) didn't get
> > > > > this right?  This seems quite.... surprising to me.
> > > > >
> > > > > I thought this was the whole point of the various NFS
> > > > > interoperability testing done at Connectathon, for which Sun was a
> > > > > major sponsor?!?  No one noticed?!?
> > > >
> > > > Beats me.  But it's not necessarily easy to replace clients running
> > > > legacy applications, so we're stuck working with the clients we have....
> > > >
> > > > The linux client does remap the server-provided cookies to small
> > > > integers, I believe exactly because older applications had trouble
> > > > with servers returning "large" cookies.  So presumably
> > > > ext4-exporting-Linux servers aren't the first to do this.
> > > >
> > > > I don't know which client versions are affected--Connectathon's next
> > > > week and I'll talk to people and make sure there's an ext4 export
> > > > with this turned on to test against.
> > >
> > > Actually, one of the main reasons for the Linux client not exporting
> > > raw readdir cookies is because the glibc-2 folks in their infinite
> > > wisdom declared that telldir()/seekdir() use an off_t. They then went
> > > yet one further and decided to declare negative offsets to be illegal
> > > so that they could use the negative values internally in their syscall
> > wrappers.
> > >
> > > The POSIX definition has none of the above rubbish
> > > (http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html
> > > ) and so glibc brilliantly saddled Linux with a crippled readdir
> > > implementation that is _not_ POSIX compatible.
> > >
> > > No, I'm not at all bitter...
> > 
> > Oh, right, I knew I'd forgotten part of the story....
> > 
> > But then you must have actually been testing against servers that were using
> > that 32nd bit?
> > 
> > I think ext4 actually only uses 31 bits even in the 32-bit case.  And for a server
> > that was literally using an offset inside a directory file, that would be a
> > colossal directory.

That's exactly what XFS directory cookies are - a direct encoding of
the dirent offset into the directory file. Which means an overflow
would occur at 16GB of directory data for XFS. That is in the realm
of several hundred million files in a single directory, which I have
seen done before....

> > So I'm wondering how you ran across it.
> > 
> > Partly just pure curiosity.
> 
> IIRC, XFS on IRIX used 0xFFFFF as the readdir eof marker, which caused us to generate an EIO...

And this discussion explains the magic 0x7fffffff offset mask in the
Linux XFS readdir code. I've been trying to find out for years
exactly why that was necessary, and now I know.

I probably should write a patch that makes it a "non-magic" number
and removes it completely for 64-bit platforms, before I forget again...

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-02-14  6:10                           ` Dave Chinner
  0 siblings, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2013-02-14  6:10 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Anand Avati, J. Bruce Fields, Bernd Schubert, sandeen, linux-nfs,
	linux-ext4, gluster-devel

On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> Telldir() and seekdir() are basically implementation horrors for any
> file system that is using anything other than a simple array of
> directory entries ala the V7 Unix file system or the BSD FFS.  For any
> file system which is using a more advanced data structure, like
> b-trees, hash trees, etc, there **can't** possibly be an "offset" into a
> readdir stream. 

I'll just point you to this:

http://marc.info/?l=linux-ext4&m=136081996316453&w=2

so you can see that XFS implements what you say can't possibly be
done. ;)

FWIW, that post only talked about the data segment. I didn't mention
that XFS has 2 other segments in the directory file (both beyond
EOF) for the directory data indexes. One contains the name-hash btree
index used for name based lookups and the other contains a freespace
index for tracking free space in the data segment.

IOWs persistent, deterministic, low cost telldir/seekdir behaviour
was a problem solved in the 1990s. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-02-14 21:46                                             ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-14 21:46 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Anand Avati, Bernd Schubert, sandeen, linux-nfs, linux-ext4,
	gluster-devel

On Wed, Feb 13, 2013 at 06:44:30PM -0500, Theodore Ts'o wrote:
> On Wed, Feb 13, 2013 at 06:05:11PM -0500, J. Bruce Fields wrote:
> > 
> > Would it be possible to make something work like, for example, a 31-bit
> > hash plus an offset into a hash bucket?
> > 
> > I have trouble thinking about this, partly because I can't remember
> > where to find the requirements for readdir on concurrently modified
> > directories....
> 
> The requirements are that for a directory entry which has not been
> modified since the last opendir() or rewinddir(), readdir() must return
> that directory entry exactly once.
> 
> For a directory entry which has been added or removed since the last
> opendir() or rewinddir() call, it is undefined whether the directory
> entry is returned once or not at all.  And a rename is defined as an
> add/remove, so it's OK for the old filename and the new file name to
> appear in the readdir() stream; it would also be OK if neither
> appeared in the readdir() stream.

That's what I couldn't remember, thanks!

--b.

> 
> The SUSv3 definition of readdir() can be found here:
> 
>    http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html
> 
> Note also that if you look at the SuSv3 definition of seekdir(), it
> explicitly states that the value returned by telldir() is not
> guaranteed to be valid after a rewinddir() or across another opendir():
> 
>    If the value of loc was not obtained from an earlier call to
>    telldir(), or if a call to rewinddir() occurred between the call to
>    telldir() and the call to seekdir(), the results of subsequent
>    calls to readdir() are unspecified.
> 
> Hence, it would be legal, and arguably more correct, if we created an
> internal array of pointers into the directory structure, where the
> first call to telldir() returned 1, and the second call to telldir()
> returned 2, and the third call to telldir() returned 3, regardless of
> the position in the directory, and this number was used by seekdir()
> to index into the array of pointers to return the exact location in
> the b-tree.  This would completely eliminate the possibility of hash
> collisions, and guarantee that readdir() would never drop or return a
> directory entry multiple times after seekdir().
> 
> This implementation approach would have denial-of-service potential,
> since each call to telldir() would potentially be allocating
> kernel memory, but as long as we make sure the OOM killer kills the
> nasty process which is calling telldir() a lot, this would probably be
> OK.
> 
> It would also be legal to throw away this array after a call to
> rewinddir() and closedir(), since telldir() cookies are not guaranteed
> to be valid indefinitely.  See:
> 
>    http://pubs.opengroup.org/onlinepubs/009695399/functions/seekdir.html
> 
> I suspect this would seriously screw over Gluster, though, and this
> wouldn't be a solution for NFSv3, since NFS needs long-lived directory
> cookies, and not the short-lived cookies which is all POSIX/SuSv3 guarantees.
> 
> Regards,
> 
> 					- Ted
> 
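
A minimal user-space sketch of the bookkeeping Ted describes above (purely
illustrative, with hypothetical names; this is not ext4 or VFS code): each
open directory keeps a growable array of saved tree positions, telldir()
hands out the array index as the cookie, and seekdir() maps it back, so hash
collisions never enter into it:

    /* illustrative sketch only, not kernel code */
    #include <stdlib.h>

    struct dirpos {                 /* stand-in for an exact b-tree position */
            unsigned long long hash;
            unsigned int minor;
    };

    struct open_dir {
            struct dirpos cur;      /* current readdir position */
            struct dirpos *saved;   /* positions handed out by telldir() */
            size_t nsaved, cap;
    };

    /* telldir(): remember the current position, return a small dense cookie */
    long dir_telldir(struct open_dir *d)
    {
            if (d->nsaved == d->cap) {
                    size_t ncap = d->cap ? 2 * d->cap : 16;
                    struct dirpos *p = realloc(d->saved, ncap * sizeof(*p));
                    if (!p)
                            return -1;
                    d->saved = p;
                    d->cap = ncap;
            }
            d->saved[d->nsaved] = d->cur;
            return (long)d->nsaved++;       /* 0, 1, 2, ... regardless of hash */
    }

    /* seekdir(): map the cookie back to the saved position */
    int dir_seekdir(struct open_dir *d, long loc)
    {
            if (loc < 0 || (size_t)loc >= d->nsaved)
                    return -1;              /* not a cookie we handed out */
            d->cur = d->saved[loc];
            return 0;
    }

    /* rewinddir()/closedir(): the cookies may be discarded, per SUSv3 */
    void dir_forget_cookies(struct open_dir *d)
    {
            free(d->saved);
            d->saved = NULL;
            d->nsaved = d->cap = 0;
    }

As Ted says, cookies handed out this way only live as long as the open
directory, which is fine for POSIX telldir()/seekdir() but not for NFS.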

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-02-14 21:47                                                 ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-14 21:47 UTC (permalink / raw)
  To: Anand Avati
  Cc: Theodore Ts'o, Bernd Schubert, sandeen, linux-nfs,
	linux-ext4, gluster-devel

On Wed, Feb 13, 2013 at 04:05:01PM -0800, Anand Avati wrote:
> On Wed, Feb 13, 2013 at 3:44 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > I suspect this would seriously screw over Gluster, though, and this
> > wouldn't be a solution for NFSv3, since NFS needs long-lived directory
> > cookies, and not the short-lived cookies which is all POSIX/SuSv3
> > guarantees.
> >
> 
> Actually this would work just fine with Gluster. Except in the case of
> gluster-NFS, the native client is only acting like a router/proxy of
> syscalls to the backend system. A directory opened by an application will
> have a matching directory fd opened on ext4, and readdir from an app will
> be translated into readdir on the matching fd on ext4. So the
> app-on-glusterfs and glusterfsd-on-ext4 are essentially "moving in tandem".
> As long as the offs^H^H^H^H cookies do not overflow in the transformation,
> Gluster would not have a problem.
> 
> However Gluster-NFS (and NFS in general, too) will break, as we
> opendir/closedir potentially on every request.

Yes.  And, of course, NFS cookies live forever--we have no idea when a
client will hand one back to us and expect us to do something with it.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-14  6:10                           ` Dave Chinner
  (?)
@ 2013-02-14 22:01                           ` J. Bruce Fields
  2013-02-15  2:27                             ` Dave Chinner
  -1 siblings, 1 reply; 65+ messages in thread
From: J. Bruce Fields @ 2013-02-14 22:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Ts'o, Anand Avati, Bernd Schubert, sandeen,
	linux-nfs, linux-ext4, gluster-devel

On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > Telldir() and seekdir() are basically implementation horrors for any
> > file system that is using anything other than a simple array of
> > directory entries ala the V7 Unix file system or the BSD FFS.  For any
> > file system which is using a more advanced data structure, like
> > b-trees, hash trees, etc, there **can't** possibly be an "offset" into a
> > readdir stream. 
> 
> I'll just point you to this:
> 
> http://marc.info/?l=linux-ext4&m=136081996316453&w=2
> 
> so you can see that XFS implements what you say can't possibly be
> done. ;)
> 
> FWIW, that post only talked about the data segment. I didn't mention
> that XFS has 2 other segments in the directory file (both beyond
> EOF) for the directory data indexes. One contains the name-hash btree
> index used for name based lookups and the other contains a freespace
> index for tracking free space in the data segment.

OK, so in some sense that reduces the problem to that of implementing
readdir cookies for directories that are stored in a simple linear
array.

Which I should know how to do but I don't: I guess all you need is a
provision for making holes on remove (so that you aren't required to move
existing entries, messing up offsets for concurrent readers)?

Purely out of curiosity: is there a more detailed writeup of XFS's
directory format?  (Or a pointer to a piece of the code a person could
understand without losing a month to it?)

--b.

> 
> IOWs persistent, deterministic, low cost telldir/seekdir behaviour
> was a problem solved in the 1990s. :)


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-02-14 22:01                           ` J. Bruce Fields
@ 2013-02-15  2:27                             ` Dave Chinner
  0 siblings, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2013-02-15  2:27 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Ts'o, Anand Avati, Bernd Schubert, sandeen,
	linux-nfs, linux-ext4, gluster-devel

On Thu, Feb 14, 2013 at 05:01:10PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > > Telldir() and seekdir() are basically implementation horrors for any
> > > file system that is using anything other than a simple array of
> > > directory entries ala the V7 Unix file system or the BSD FFS.  For any
> > > file system which is using a more advanced data structure, like
> > > b-trees, hash trees, etc, there **can't** possibly be an "offset" into a
> > > readdir stream. 
> > 
> > I'll just point you to this:
> > 
> > http://marc.info/?l=linux-ext4&m=136081996316453&w=2
> > 
> > so you can see that XFS implements what you say can't possibly be
> > done. ;)
> > 
> > FWIW, that post only talked about the data segment. I didn't mention
> > that XFS has 2 other segments in the directory file (both beyond
> > EOF) for the directory data indexes. One contains the name-hash btree
> > index used for name based lookups and the other contains a freespace
> > index for tracking free space in the data segment.
> 
> OK, so in some sense that reduces the problem to that of implementing
> readdir cookies for directories that are stored in a simple linear
> array.

*nod*

> Which I should know how to do but I don't: I guess all you need is a
> provision for making holes on remove (so that you aren't required move
> existing entries, messing up offsets for concurrent readers)?

Exactly.

The data segment is a virtual mapping that is maintained by the
extent tree, so we can simply punch holes in it for directory blocks
that are empty and no longer referenced. i.e. the data segment
really is just a sparse file.

The result of doing block mapping this way is that the freespace
tracking segment actually only needs to track space in partially
used blocks. Hence we only need to allocate new blocks when the
freespace map empties, and we work out where to allocate the new
block in the virtual map by doing an extent tree lookup to find the
first hole....
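
A toy model of that scheme (purely illustrative; real XFS encodes block and
offset pairs and keeps its freespace index on disk): the directory is a
sparse array of fixed-size slots, the readdir cookie is just the slot index,
and unlink leaves a hole instead of shuffling entries, so cookies handed out
earlier keep meaning the same position:

    /* toy model of "cookie = stable offset into a sparse directory" */
    #include <stdio.h>

    #define NSLOTS 16

    static struct { int in_use; const char *name; } dir[NSLOTS];

    static int toy_create(const char *name)   /* reuse the first hole, else append */
    {
            for (int i = 0; i < NSLOTS; i++)
                    if (!dir[i].in_use) {
                            dir[i].in_use = 1;
                            dir[i].name = name;
                            return i;
                    }
            return -1;
    }

    static void toy_unlink(int slot)          /* punch a hole, move nothing */
    {
            dir[slot].in_use = 0;
    }

    /* return the next name at or after *cookie, and the cookie to resume with */
    static const char *toy_readdir(int *cookie)
    {
            for (int i = *cookie; i < NSLOTS; i++)
                    if (dir[i].in_use) {
                            *cookie = i + 1;
                            return dir[i].name;
                    }
            return NULL;                      /* EOF */
    }

    int main(void)
    {
            toy_create("a"); toy_create("b"); toy_create("c");

            int cookie = 0;
            printf("%s\n", toy_readdir(&cookie));  /* "a", cookie is now 1 */

            toy_unlink(1);                    /* drop "b" mid-scan...          */
            toy_create("d");                  /* ...and let "d" reuse the hole */

            /* "c" is still returned exactly once; the new "d" may or may not
             * show up, which is all POSIX asks for */
            for (const char *n; (n = toy_readdir(&cookie)); )
                    printf("%s\n", n);
            return 0;
    }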

> Purely out of curiosity: is there a more detailed writeup of XFS's
> directory format?  (Or a pointer to a piece of the code a person could
> understand without losing a month to it?)

Not really. There's documentation of the on-disk structures, but
it's a massive leap from there to understanding the structure and
how it all ties together.  I've been spending the past couple of
months deep in the depths of the XFS directory code so how it all
works is front-and-center in my brain right now...

That said, the thought had crossed my mind that there's a couple
of LWN articles/conference talks I could put together as a brain
dump. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-03-26 15:23                                                 ` Bernd Schubert
  0 siblings, 0 replies; 65+ messages in thread
From: Bernd Schubert @ 2013-03-26 15:23 UTC (permalink / raw)
  To: Anand Avati
  Cc: Theodore Ts'o, J. Bruce Fields, sandeen, linux-nfs,
	linux-ext4, gluster-devel

Sorry for my late reply, I had been rather busy.

On 02/14/2013 01:05 AM, Anand Avati wrote:
> On Wed, Feb 13, 2013 at 3:44 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>
>> I suspect this would seriously screw over Gluster, though, and this
>> wouldn't be a solution for NFSv3, since NFS needs long-lived directory
>> cookies, and not the short-lived cookies which is all POSIX/SuSv3
>> guarantees.
>>
>
> Actually this would work just fine with Gluster. Except in the case of

Would it really work perfectly? What about a server reboot in the middle 
of a readdir of a client?

> gluster-NFS, the native client is only acting like a router/proxy of
> syscalls to the backend system. A directory opened by an application will
> have a matching directory fd opened on ext4, and readdir from an app will
> be translated into readdir on the matching fd on ext4. So the
> app-on-glusterfs and glusterfsd-on-ext4 are essentially "moving in tandem".
> As long as the offs^H^H^H^H cookies do not overflow in the transformation,
> Gluster would not have a problem.
>
> However Gluster-NFS (and NFS in general, too) will break, as we
> opendir/closedir potentially on every request.

We haven't reached a conclusion so far, have we? What about the ioctl
approach, but a bit differently? Would it work to specify the allowed
upper bits for ext4 (for example 16 additional bits) and the remaining
part for gluster? One of the mails had the calculation formula:

final_d_off = (ext4_d_off * MAX_SERVERS) + server_idx

But what is the value of MAX_SERVERS?
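
For illustration (hypothetical helper names, not the actual gluster code),
the transformation and its inverse would look something like this, with
MAX_SERVERS being whatever fixed upper bound on bricks gluster picks; every
bit spent on MAX_SERVERS is a high bit the backend d_off may no longer use:

    /* illustrative only: multiplex a server index into a backend d_off */
    #include <stdint.h>
    #include <assert.h>

    #define MAX_SERVERS 16ULL    /* example value: costs log2(16) = 4 bits */

    static uint64_t mux_d_off(uint64_t ext4_d_off, uint64_t server_idx)
    {
            assert(server_idx < MAX_SERVERS);
            /* only safe while ext4_d_off < 2^63 / MAX_SERVERS, i.e. while
             * the top log2(MAX_SERVERS) bits of the backend cookie are 0 */
            return ext4_d_off * MAX_SERVERS + server_idx;
    }

    static void demux_d_off(uint64_t final_d_off,
                            uint64_t *ext4_d_off, uint64_t *server_idx)
    {
            *server_idx = final_d_off % MAX_SERVERS;
            *ext4_d_off = final_d_off / MAX_SERVERS;
    }

With ext4 free to use all 63 bits of the hash, any MAX_SERVERS > 1 can
already overflow, which is exactly why the question of reserving some upper
bits on the ext4 side comes up.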


Cheers,
Bernd



^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-03-26 15:48                                                     ` Eric Sandeen
  0 siblings, 0 replies; 65+ messages in thread
From: Eric Sandeen @ 2013-03-26 15:48 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Anand Avati, Theodore Ts'o, J. Bruce Fields, linux-nfs,
	linux-ext4, gluster-devel

On 3/26/13 10:23 AM, Bernd Schubert wrote:
> Sorry for my late reply, I had been rather busy.
> 
> On 02/14/2013 01:05 AM, Anand Avati wrote:
>> On Wed, Feb 13, 2013 at 3:44 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>>
>>> I suspect this would seriously screw over Gluster, though, and this
>>> wouldn't be a solution for NFSv3, since NFS needs long-lived directory
>>> cookies, and not the short-lived cookies which is all POSIX/SuSv3
>>> guarantees.
>>>
>>
>> Actually this would work just fine with Gluster. Except in the case of
> 
> Would it really work perfectly? What about a server reboot in the middle of a readdir of a client?
> 
>> gluster-NFS, the native client is only acting like a router/proxy of
>> syscalls to the backend system. A directory opened by an application will
>> have a matching directory fd opened on ext4, and readdir from an app will
>> be translated into readdir on the matching fd on ext4. So the
>> app-on-glusterfs and glusterfsd-on-ext4 are essentially "moving in tandem".
>> As long as the offs^H^H^H^H cookies do not overflow in the transformation,
>> Gluster would not have a problem.
>>
>> However Gluster-NFS (and NFS in general, too) will break, as we
>> opendir/closedir potentially on every request.
> 
> We haven't reached a conclusion so far, have we? What about the
> ioctl approach, but a bit differently? Would it work to specify the
> allowed upper bits for ext4 (for example 16 additional bits) and the
> remaining part for gluster? One of the mails had the calculation
> formula:

I did throw together an ioctl patch last week, but I think Anand has a new
approach he's trying out which won't require ext4 code changes.  I'll let
him reply when he has a moment.  :)

-Eric

> final_d_off = (ext4_d_off * MAX_SERVERS) + server_idx
> 
> But what is the value of MAX_SERVERS?
> 
> 
> Cheers,
> Bernd
> 
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-03-26 15:48                                                     ` Eric Sandeen
  (?)
@ 2013-03-28 14:07                                                     ` Theodore Ts'o
  2013-03-28 16:26                                                       ` Eric Sandeen
  2013-03-28 17:52                                                       ` Zach Brown
  -1 siblings, 2 replies; 65+ messages in thread
From: Theodore Ts'o @ 2013-03-28 14:07 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Bernd Schubert, Anand Avati, J. Bruce Fields, linux-nfs,
	linux-ext4, gluster-devel

On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > We haven't reached a conclusion so far, have we? What about the
> > ioctl approach, but a bit differently? Would it work to specify the
> > allowed upper bits for ext4 (for example 16 additional bits) and the
> > remaining part for gluster? One of the mails had the calculation
> > formula:
> 
> I did throw together an ioctl patch last week, but I think Anand has a new
> approach he's trying out which won't require ext4 code changes.  I'll let
> him reply when he has a moment.  :)

Any update about whether Gluster can address this without needing the
ioctl patch?  Or should we push the ioctl patch into ext4 for the next
merge window?

Thanks,

						- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-03-28 14:07                                                     ` Theodore Ts'o
@ 2013-03-28 16:26                                                       ` Eric Sandeen
  2013-03-28 17:52                                                       ` Zach Brown
  1 sibling, 0 replies; 65+ messages in thread
From: Eric Sandeen @ 2013-03-28 16:26 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Bernd Schubert, Anand Avati, J. Bruce Fields, linux-nfs,
	linux-ext4, gluster-devel

On 3/28/13 9:07 AM, Theodore Ts'o wrote:
> On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
>>> We haven't reached a conclusion so far, have we? What about the
>>> ioctl approach, but a bit differently? Would it work to specify the
>>> allowed upper bits for ext4 (for example 16 additional bits) and the
>>> remaining part for gluster? One of the mails had the calculation
>>> formula:
>>
>> I did throw together an ioctl patch last week, but I think Anand has a new
>> approach he's trying out which won't require ext4 code changes.  I'll let
>> him reply when he has a moment.  :)
> 
> Any update about whether Gluster can address this without needing the
> ioctl patch?  Or should we push the ioctl patch into ext4 for the next
> merge window?

I went ahead & sent the ioctl patches to the ext4 list; they are lightly
tested, and not tested at all w/ gluster AFAIK.  Wanted to get them
out just in case we decide we want them.

Thanks,
-Eric

> Thanks,
> 
> 						- Ted
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
  2013-03-28 14:07                                                     ` Theodore Ts'o
  2013-03-28 16:26                                                       ` Eric Sandeen
@ 2013-03-28 17:52                                                       ` Zach Brown
       [not found]                                                         ` <20130328175205.GD16651-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Zach Brown @ 2013-03-28 17:52 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Eric Sandeen, Bernd Schubert, Anand Avati, J. Bruce Fields,
	linux-nfs, linux-ext4, gluster-devel

On Thu, Mar 28, 2013 at 10:07:44AM -0400, Theodore Ts'o wrote:
> On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > > We haven't reached a conclusion so far, have we? What about the
> > > ioctl approach, but a bit differently? Would it work to specify the
> > > allowed upper bits for ext4 (for example 16 additional bits) and the
> > > remaining part for gluster? One of the mails had the calculation
> > > formula:
> > 
> > I did throw together an ioctl patch last week, but I think Anand has a new
> > approach he's trying out which won't require ext4 code changes.  I'll let
> > him reply when he has a moment.  :)
> 
> Any update about whether Gluster can address this without needing the
> ioctl patch?  Or should we push the ioctl patch into ext4 for the next
> merge window?

They're testing a work-around:

  http://review.gluster.org/#change,4711

I'm not sure if they've decided that they're going to go with it, or
not.

- z

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 18:05                                                             ` Anand Avati
  0 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-03-28 18:05 UTC (permalink / raw)
  To: Zach Brown
  Cc: Eric Sandeen, linux-nfs, Theodore Ts'o, Bernd Schubert,
	linux-ext4, gluster-devel


On Thu, Mar 28, 2013 at 10:52 AM, Zach Brown <zab@redhat.com> wrote:

> On Thu, Mar 28, 2013 at 10:07:44AM -0400, Theodore Ts'o wrote:
> > On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > > > We haven't reached a conclusion so far, have we? What about the
> > > > ioctl approach, but a bit differently? Would it work to specify the
> > > > allowed upper bits for ext4 (for example 16 additional bits) and the
> > > > remaining part for gluster? One of the mails had the calculation
> > > > formula:
> > >
> > > I did throw together an ioctl patch last week, but I think Anand has a new
> > > approach he's trying out which won't require ext4 code changes.  I'll let
> > > him reply when he has a moment.  :)
> >
> > Any update about whether Gluster can address this without needing the
> > ioctl patch?  Or should we push the ioctl patch into ext4 for the next
> > merge window?
>
> They're testing a work-around:
>
>   http://review.gluster.org/#change,4711
>
> I'm not sure if they've decided that they're going to go with it, or
> not.
>

Jeff reported that the approach did not work in his testing. I haven't had
a chance to look into the failure yet. Independent of the fix, it would
certainly be good to have the ioctl() support - Samba could use it too, if it
wanted.

Avati


^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 18:31                                                                 ` J. Bruce Fields
  0 siblings, 0 replies; 65+ messages in thread
From: J. Bruce Fields @ 2013-03-28 18:31 UTC (permalink / raw)
  To: Anand Avati
  Cc: Zach Brown, Theodore Ts'o, Eric Sandeen, Bernd Schubert,
	linux-nfs, linux-ext4, gluster-devel

On Thu, Mar 28, 2013 at 11:05:41AM -0700, Anand Avati wrote:
> On Thu, Mar 28, 2013 at 10:52 AM, Zach Brown <zab@redhat.com> wrote:
> 
> > On Thu, Mar 28, 2013 at 10:07:44AM -0400, Theodore Ts'o wrote:
> > > On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > > > > We haven't reached a conclusion so far, have we? What about the
> > > > > ioctl approach, but a bit differently? Would it work to specify the
> > > > > allowed upper bits for ext4 (for example 16 additional bits) and the
> > > > > remaining part for gluster? One of the mails had the calculation
> > > > > formula:
> > > >
> > > > I did throw together an ioctl patch last week, but I think Anand has a new
> > > > approach he's trying out which won't require ext4 code changes.  I'll let
> > > > him reply when he has a moment.  :)
> > >
> > > Any update about whether Gluster can address this without needing the
> > > ioctl patch?  Or should we push the ioctl patch into ext4 for the next
> > > merge window?
> >
> > They're testing a work-around:
> >
> >   http://review.gluster.org/#change,4711
> >
> > I'm not sure if they've decided that they're going to go with it, or
> > not.
> >
> 
> Jeff reported that the approach did not work in his testing. I haven't had
> a chance to look into the failure yet. Independent of the fix, it would
> certainly be good to have the ioctl() support

The one advantage of your scheme is that it keeps more of the hash bits;
the chance of 31-bit cookie collisions is much higher.
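
(Rough birthday-bound arithmetic, assuming uniformly distributed hashes:
with ~2^31 possible cookie values, a directory of about 55,000 entries
already has roughly even odds of at least one collision, since
p ~= 1 - exp(-n^2 / 2^32); with 63-bit cookies you would need on the order
of a few billion entries to reach the same odds.)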

> Samba could use it too, if it wanted.

It'd be useful to understand their situation.

--b.

^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 18:49                                                                     ` Anand Avati
  0 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-03-28 18:49 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric Sandeen, linux-nfs, Theodore Ts'o, Zach Brown,
	Bernd Schubert, linux-ext4, gluster-devel


On Thu, Mar 28, 2013 at 11:31 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> > Jeff reported that the approach did not work in his testing. I haven't had
> > a chance to look into the failure yet. Independent of the fix, it would
> > certainly be good to have the ioctl() support
>
> The one advantage of your scheme is that it keeps more of the hash bits;
> the chance of 31-bit cookie collisions is much higher.


Yes, it should, based on the theory of how ext4 was generating the 63 bits.
But Jeff's test finds that the experiment is not matching the theory. I
intend to debug this, but currently drowned in a different issue. It would
be good if the ext developers can have a look at
http://review.gluster.org/4711 and see if there are obvious holes in the
approach or code.

Avati


^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 19:43                                                                         ` Jeff Darcy
  0 siblings, 0 replies; 65+ messages in thread
From: Jeff Darcy @ 2013-03-28 19:43 UTC (permalink / raw)
  To: Anand Avati
  Cc: J. Bruce Fields, Eric Sandeen, linux-nfs, Theodore Ts'o,
	Zach Brown, Bernd Schubert, linux-ext4, gluster-devel

On 03/28/2013 02:49 PM, Anand Avati wrote:
> Yes, it should, based on the theory of how ext4 was generating the
> 63bits. But Jeff's test finds that the experiment is not matching the
> theory.

FWIW, I was able to re-run my test in between stuff related to That
Other Problem.  What seems to be happening is that we read correctly
until just after d_off 0x4000000000000000, then we suddenly wrap around
- not to the very first d_off we saw, but to a pretty early one (e.g.
0x0041b6340689a32e).  This is all on a single brick, BTW, so it's pretty
easy to line up the back-end and front-end d_off values which match
perfectly up to this point.
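
(For what it's worth, 0x4000000000000000 is 2^62, so the wrap shows up
exactly when the transformed d_off would need the top bit of the 63-bit
range, which is consistent with the transformation consuming the
high-order bits.)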

I haven't had a chance to ponder what this all means and debug it
further.  Hopefully I'll be able to do so soon, but I figured I'd
mention it in case something about those numbers rang a bell.



^ permalink raw reply	[flat|nested] 65+ messages in thread


* Re: regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 22:14                                                                             ` Anand Avati
  0 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-03-28 22:14 UTC (permalink / raw)
  To: Jeff Darcy
  Cc: Eric Sandeen, linux-nfs, Theodore Ts'o, Zach Brown,
	Bernd Schubert, linux-ext4, gluster-devel


On Thu, Mar 28, 2013 at 12:43 PM, Jeff Darcy <jdarcy@redhat.com> wrote:

> On 03/28/2013 02:49 PM, Anand Avati wrote:
> > Yes, it should, based on the theory of how ext4 was generating the
> > 63bits. But Jeff's test finds that the experiment is not matching the
> > theory.
>
> FWIW, I was able to re-run my test in between stuff related to That
> Other Problem.  What seems to be happening is that we read correctly
> until just after d_off 0x4000000000000000, then we suddenly wrap around
> - not to the very first d_off we saw, but to a pretty early one (e.g.
> 0x0041b6340689a32e).  This is all on a single brick, BTW, so it's pretty
> easy to line up the back-end and front-end d_off values which match
> perfectly up to this point.
>
> I haven't had a chance to ponder what this all means and debug it
> further.  Hopefully I'll be able to do so soon, but I figured I'd
> mention it in case something about those numbers rang a bell.
>

Of course, the unit tests (with artificial offsets) were only run with a
brick count >= 2. You tested with a DHT subvolume count of 1, a case the
unit tests did not cover, and sure enough the code doesn't handle it well.
I just verified with the unit tests that the brick count = 1 case fails to
round-trip to the same d_off; a sketch of the check is below.
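
(A minimal, self-contained sketch of that round-trip check, using
hypothetical encode/decode helpers rather than the actual gluster DHT
functions; the point is that decode(encode(d_off, brick)) must return the
original d_off for every brick count, including 1:)

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical helpers, not the gluster DHT functions */
static int bits_for(int cnt)            /* low bits reserved for the brick index */
{
	int bits = 0;
	while ((1 << bits) < cnt)
		bits++;
	return bits;                    /* 0 when cnt == 1 */
}

static uint64_t encode(uint64_t d_off, int brick, int cnt)
{
	return (d_off << bits_for(cnt)) | (uint64_t)brick;
}

static uint64_t decode(uint64_t off, int *brick, int cnt)
{
	int bits = bits_for(cnt);

	*brick = (int)(off & ((1ULL << bits) - 1));
	return off >> bits;
}

int main(void)
{
	uint64_t d_off = 0x0041b6340689a32eULL;   /* sample back-end cookie */

	for (int cnt = 1; cnt <= 8; cnt++) {
		for (int brick = 0; brick < cnt; brick++) {
			int b;
			uint64_t back = decode(encode(d_off, brick, cnt), &b, cnt);

			/* the round trip must hold for every count, including 1 */
			assert(back == d_off && b == brick);
		}
	}
	printf("round-trip ok for brick counts 1..8\n");
	return 0;
}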

Posting a fixed version. Thanks for the catch!

Avati


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: regressions due to 64-bit ext4 directory cookies
@ 2013-03-28 22:20                                                                                 ` Anand Avati
  0 siblings, 0 replies; 65+ messages in thread
From: Anand Avati @ 2013-03-28 22:20 UTC (permalink / raw)
  To: Jeff Darcy
  Cc: Eric Sandeen, linux-nfs, Theodore Ts'o, Zach Brown,
	Bernd Schubert, linux-ext4, gluster-devel



On Thu, Mar 28, 2013 at 3:14 PM, Anand Avati <anand.avati@gmail.com> wrote:

> On Thu, Mar 28, 2013 at 12:43 PM, Jeff Darcy <jdarcy@redhat.com> wrote:
>
>> On 03/28/2013 02:49 PM, Anand Avati wrote:
>> > Yes, it should, based on the theory of how ext4 was generating the
>> > 63bits. But Jeff's test finds that the experiment is not matching the
>> > theory.
>>
>> FWIW, I was able to re-run my test in between stuff related to That
>> Other Problem.  What seems to be happening is that we read correctly
>> until just after d_off 0x4000000000000000, then we suddenly wrap around
>> - not to the very first d_off we saw, but to a pretty early one (e.g.
>> 0x0041b6340689a32e).  This is all on a single brick, BTW, so it's pretty
>> easy to line up the back-end and front-end d_off values which match
>> perfectly up to this point.
>>
>> I haven't had a chance to ponder what this all means and debug it
>> further.  Hopefully I'll be able to do so soon, but I figured I'd
>> mention it in case something about those numbers rang a bell.
>>
>
> Of course, the unit tests (with artificial offsets) were done with brick
> count >= 2. You have tested with DHT subvol count=1, which was not tested,
> and sure enough, the code isn't handling it well. Just verified with the
> unit tests that brick count = 1 condition fails to return the same d_off.
>
> Posting a fixed version. Thanks for the catch!
>

Posted an updated version at http://review.gluster.org/4711. This one passes
the unit tests for all brick counts (>= 1). Can you confirm that the
"loop"ing is now gone in your test environment?

Thanks,
Avati


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread

Thread overview: 65+ messages

2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
2013-02-12 20:56 ` Bernd Schubert
2013-02-12 21:00   ` J. Bruce Fields
2013-02-13  8:17     ` Bernd Schubert
2013-02-13 22:18       ` J. Bruce Fields
2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
2013-02-13 15:40       ` Bernd Schubert
2013-02-14  5:32         ` Dave Chinner
2013-02-13  4:00 ` Theodore Ts'o
2013-02-13 13:31   ` J. Bruce Fields
2013-02-13 15:14     ` Theodore Ts'o
2013-02-13 15:19       ` J. Bruce Fields
2013-02-13 15:36         ` Theodore Ts'o
     [not found]           ` <20130213153654.GC17431-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 16:20             ` J. Bruce Fields
2013-02-13 16:20               ` J. Bruce Fields
     [not found]               ` <20130213162059.GL14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 16:43                 ` Myklebust, Trond
2013-02-13 16:43                   ` Myklebust, Trond
2013-02-13 21:33                   ` J. Bruce Fields
2013-02-14  3:59                     ` Myklebust, Trond
     [not found]                       ` <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
2013-02-14  5:45                         ` Dave Chinner
2013-02-14  5:45                           ` Dave Chinner
2013-02-13 21:21                 ` Anand Avati
2013-02-13 21:21                   ` Anand Avati
     [not found]                   ` <CAFboF2wXvP+vttiff8iRE9rAgvV8UWGbFprgVp8p7kE43TU=PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 22:20                     ` [Gluster-devel] " Theodore Ts'o
2013-02-13 22:20                       ` Theodore Ts'o
2013-02-13 22:41                       ` J. Bruce Fields
     [not found]                         ` <20130213224141.GU14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 22:47                           ` Theodore Ts'o
2013-02-13 22:47                             ` Theodore Ts'o
     [not found]                             ` <20130213224720.GE5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 22:57                               ` Anand Avati
2013-02-13 22:57                                 ` Anand Avati
     [not found]                                 ` <CAFboF2z1akN_edrY_fT915xfehfHGioA2M=PSHv0Fp3rD-5v5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 23:05                                   ` [Gluster-devel] " J. Bruce Fields
2013-02-13 23:05                                     ` J. Bruce Fields
     [not found]                                     ` <20130213230511.GW14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 23:44                                       ` Theodore Ts'o
2013-02-13 23:44                                         ` Theodore Ts'o
     [not found]                                         ` <20130213234430.GF5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  0:05                                           ` Anand Avati
2013-02-14  0:05                                             ` Anand Avati
     [not found]                                             ` <CAFboF2zS+YAa0uUxMFUAbqgPh3Kb4xZu40WUjLyGn8qPoP+Oyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-14 21:47                                               ` [Gluster-devel] " J. Bruce Fields
2013-02-14 21:47                                                 ` J. Bruce Fields
2013-03-26 15:23                                               ` Bernd Schubert
2013-03-26 15:23                                                 ` [Gluster-devel] " Bernd Schubert
     [not found]                                                 ` <5151BD5F.30607-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
2013-03-26 15:48                                                   ` Eric Sandeen
2013-03-26 15:48                                                     ` Eric Sandeen
2013-03-28 14:07                                                     ` Theodore Ts'o
2013-03-28 16:26                                                       ` Eric Sandeen
2013-03-28 17:52                                                       ` Zach Brown
     [not found]                                                         ` <20130328175205.GD16651-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
2013-03-28 18:05                                                           ` Anand Avati
2013-03-28 18:05                                                             ` Anand Avati
     [not found]                                                             ` <CAFboF2ztc06G00z8ga35NrxgnT2YgBiDECgU_9kvVA_Go1_Bww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 18:31                                                               ` [Gluster-devel] " J. Bruce Fields
2013-03-28 18:31                                                                 ` J. Bruce Fields
     [not found]                                                                 ` <20130328183153.GG7080-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-03-28 18:49                                                                   ` Anand Avati
2013-03-28 18:49                                                                     ` Anand Avati
     [not found]                                                                     ` <CAFboF2w49Lc0vM0SerbJfL9_RuSHgEU+y_Yk7F4pLxeiqu+KRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 19:43                                                                       ` [Gluster-devel] " Jeff Darcy
2013-03-28 19:43                                                                         ` Jeff Darcy
     [not found]                                                                         ` <51549D74.1060703-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-28 22:14                                                                           ` Anand Avati
2013-03-28 22:14                                                                             ` Anand Avati
     [not found]                                                                             ` <CAFboF2xkvXx9YFYxBXupwg=s=3MaeQYm2KK2m8MFtEBPsxwQ7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 22:20                                                                               ` Anand Avati
2013-03-28 22:20                                                                                 ` Anand Avati
2013-02-14 21:46                                           ` [Gluster-devel] " J. Bruce Fields
2013-02-14 21:46                                             ` J. Bruce Fields
     [not found]                       ` <20130213222052.GD5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  6:10                         ` Dave Chinner
2013-02-14  6:10                           ` Dave Chinner
2013-02-14 22:01                           ` J. Bruce Fields
2013-02-15  2:27                             ` Dave Chinner
2013-02-13  6:56 ` Andreas Dilger
2013-02-13 13:40   ` J. Bruce Fields
