All of lore.kernel.org
 help / color / mirror / Atom feed
* NFS/credentials leak in 2.6.29-rc1
@ 2009-01-20 11:46 Evgeniy Polyakov
  2009-01-20 13:37 ` Trond Myklebust
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-20 11:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-fsdevel

Hi Trond.

I recently observed several OOM crashes on the NFS server machine used
for the heavy benchmarks. It did not show in .28 iirc and I never got
OOM crash with other benches with different filesystems.

That's what I observe right now via slabtop, both constantly grew uring
the test:
1798890 1798672  99%    0.12K  59963       30    239852K cred_jar
1798320 1798307  99%    0.25K 119888       15    479552K size-256

machine then died after quite short time.
Overall it took about an hour or two for the 8gb of ram.

Is this a known issue or should I spent more time investigating the
cause?

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov
@ 2009-01-20 13:37 ` Trond Myklebust
  2009-01-20 13:49   ` Evgeniy Polyakov
  2009-01-20 15:11 ` J. Bruce Fields
  2009-01-20 21:44 ` David Howells
  2 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2009-01-20 13:37 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: linux-fsdevel

On Tue, 2009-01-20 at 14:46 +0300, Evgeniy Polyakov wrote:
> Hi Trond.
> 
> I recently observed several OOM crashes on the NFS server machine used
> for the heavy benchmarks. It did not show in .28 iirc and I never got
> OOM crash with other benches with different filesystems.
> 
> That's what I observe right now via slabtop, both constantly grew uring
> the test:
> 1798890 1798672  99%    0.12K  59963       30    239852K cred_jar
> 1798320 1798307  99%    0.25K 119888       15    479552K size-256
> 
> machine then died after quite short time.
> Overall it took about an hour or two for the 8gb of ram.
> 
> Is this a known issue or should I spent more time investigating the
> cause?

Hi Evgeniy,

This leak is occurring on the server only and not the client, is that
correct?
I don't think this is a known issue but I've forwarded your mail on to
Bruce Fields, who is the current NFS server maintainer, for further
information.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 13:37 ` Trond Myklebust
@ 2009-01-20 13:49   ` Evgeniy Polyakov
  0 siblings, 0 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-20 13:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-fsdevel

On Tue, Jan 20, 2009 at 08:37:03AM -0500, Trond Myklebust (trond.myklebust@fys.uio.no) wrote:
> This leak is occurring on the server only and not the client, is that
> correct?

Yes, I only observed it on the server. I think I can reproduce it here,
at least I already crashed machine 3 times. It looks like it fires with
random IO (can not say if read or write though), since the same overall
amount of data read or written sequentially did not result in OOM
condition, but leak still may be presented there if credential is
attached in single readpages()... Do not know.

> I don't think this is a known issue but I've forwarded your mail on to
> Bruce Fields, who is the current NFS server maintainer, for further
> information.

Thank you.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov
  2009-01-20 13:37 ` Trond Myklebust
@ 2009-01-20 15:11 ` J. Bruce Fields
  2009-01-20 15:23   ` Evgeniy Polyakov
  2009-01-20 21:44 ` David Howells
  2 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-20 15:11 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Trond Myklebust, linux-fsdevel, David Howells

On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote:
> I recently observed several OOM crashes on the NFS server machine used
> for the heavy benchmarks. It did not show in .28 iirc and I never got
> OOM crash with other benches with different filesystems.
> 
> That's what I observe right now via slabtop, both constantly grew uring
> the test:
> 1798890 1798672  99%    0.12K  59963       30    239852K cred_jar
> 1798320 1798307  99%    0.25K 119888       15    479552K size-256
> 
> machine then died after quite short time.
> Overall it took about an hour or two for the 8gb of ram.
> 
> Is this a known issue or should I spent more time investigating the
> cause?

This doesn't look familiar, no; thanks for the report.  I guess we
should take a careful look at the recent changes to fs/nfsd/auth.c?

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 15:11 ` J. Bruce Fields
@ 2009-01-20 15:23   ` Evgeniy Polyakov
  2009-01-20 23:53     ` J. Bruce Fields
  2009-01-21 12:23     ` David Howells
  0 siblings, 2 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-20 15:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, linux-fsdevel, David Howells

On Tue, Jan 20, 2009 at 10:11:25AM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> This doesn't look familiar, no; thanks for the report.  I guess we
> should take a careful look at the recent changes to fs/nfsd/auth.c?

If creds are allocated in nfsd_setuser() and never freed? groups_alloc()
can also explain size-256 slab grew, so this may be the place where
things are allocated, but why they are not freed?
This may also explain why I did not see this for the large sequential
IO, since number of requests to the server was noticebly smaller, than
in random IO test.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov
  2009-01-20 13:37 ` Trond Myklebust
  2009-01-20 15:11 ` J. Bruce Fields
@ 2009-01-20 21:44 ` David Howells
  2009-01-21 22:42   ` J. Bruce Fields
  2 siblings, 1 reply; 27+ messages in thread
From: David Howells @ 2009-01-20 21:44 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

J. Bruce Fields <bfields@fieldses.org> wrote:

> On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote:
> This doesn't look familiar, no; thanks for the report.  I guess we
> should take a careful look at the recent changes to fs/nfsd/auth.c?

I don't see anything obvious in fs/nfsd/auth.c or nfsfh.c, but I proably won't
be able to investigate properly until I get back from LCA in Feb.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 15:23   ` Evgeniy Polyakov
@ 2009-01-20 23:53     ` J. Bruce Fields
  2009-01-21 12:23     ` David Howells
  1 sibling, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-20 23:53 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Trond Myklebust, linux-fsdevel, David Howells

On Tue, Jan 20, 2009 at 06:23:04PM +0300, Evgeniy Polyakov wrote:
> On Tue, Jan 20, 2009 at 10:11:25AM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > This doesn't look familiar, no; thanks for the report.  I guess we
> > should take a careful look at the recent changes to fs/nfsd/auth.c?
> 
> If creds are allocated in nfsd_setuser() and never freed? groups_alloc()
> can also explain size-256 slab grew, so this may be the place where
> things are allocated, but why they are not freed?
> This may also explain why I did not see this for the large sequential
> IO, since number of requests to the server was noticebly smaller, than
> in random IO test.

Looking through nfsd_setuser(), one obvious bug: in the (flags &
NFSEXP_ALLSQUASH) case, we never check the return value from the
groups_alloc(0).  If it returns NULL, we dereference it anyway.

But that's unrelated.  For the cred reference counting:

	- revert_creds(get_cred(current->real_cred)) modifies
	  current->cred, putting the old value and getting the new.  OK.
	- new = prepare_creds() creates a new object with count 1.  All
	  subsequent exits from the function go through the oom or error
	  labels, which put new. OK.
	- Each of the three ALLSQUASH/ROOTSQUASH/else paths create a new
	  reference to a groups_info struct in gi, which is
	  unconditionally assigned to new by set_groups().  set_groups()
	  takes its own reference on gi, then we put the original
	  reference in the following put_group_info().  OK.
	- Finally, we put_cred(override_creds(new)).  That modifies
	  current->cred again, putting the old value and getting the
	  new.

Hm.  But that last part's not OK; aren't we still holding our own
reference to new, in addition to the one that override_creds() just
took?  So I think we need the following?

--b.

diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index c903e04..9966e9e 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -85,6 +85,7 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 		new->cap_effective = cap_raise_nfsd_set(new->cap_effective,
 							new->cap_permitted);
 	put_cred(override_creds(new));
+	put_cred(new);
 	return 0;
 
 oom:

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 15:23   ` Evgeniy Polyakov
  2009-01-20 23:53     ` J. Bruce Fields
@ 2009-01-21 12:23     ` David Howells
  2009-01-21 12:37       ` Evgeniy Polyakov
                         ` (3 more replies)
  1 sibling, 4 replies; 27+ messages in thread
From: David Howells @ 2009-01-21 12:23 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

J. Bruce Fields <bfields@fieldses.org> wrote:

> 	- Finally, we put_cred(override_creds(new)).  That modifies
> 	  current->cred again, putting the old value and getting the
> 	  new.
> 
> Hm.  But that last part's not OK; aren't we still holding our own
> reference to new, in addition to the one that override_creds() just
> took?  So I think we need the following?

Yes, you're right.  override_creds() takes an extra ref on the argument it is
passed, thus leaving the caller with their original reference intact.

So really, you don't want to call override_creds() as that will cost you an
extra atomic_inc() and atomic_dec_and_test().  I recommend you replace:

        put_cred(override_creds(new));

with:

	revert_creds(new);

I think that should do the right thing.  It may look a bit odd, but it'll be
quicker.  If you object to using revert_creds)( because of the name, we can
come up with an alternative name.

> Looking through nfsd_setuser(), one obvious bug: in the (flags &
> NFSEXP_ALLSQUASH) case, we never check the return value from the
> groups_alloc(0).  If it returns NULL, we dereference it anyway.

Since a zero-length groups list must be copied before writing, can I recommend
that we make groups_alloc(0) a special case that returns pointer to a
statically allocated groups list (after inc'ing the refcount) that represents
a zero-length list, thus meaning groups_alloc(0) will never fail?

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:23     ` David Howells
@ 2009-01-21 12:37       ` Evgeniy Polyakov
  2009-01-21 22:39         ` J. Bruce Fields
                           ` (2 more replies)
  2009-01-21 13:17       ` David Howells
                         ` (2 subsequent siblings)
  3 siblings, 3 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-21 12:37 UTC (permalink / raw)
  To: David Howells; +Cc: J. Bruce Fields, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells (dhowells@redhat.com) wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > 	- Finally, we put_cred(override_creds(new)).  That modifies
> > 	  current->cred again, putting the old value and getting the
> > 	  new.
> > 
> > Hm.  But that last part's not OK; aren't we still holding our own
> > reference to new, in addition to the one that override_creds() just
> > took?  So I think we need the following?
> 
> Yes, you're right.  override_creds() takes an extra ref on the argument it is
> passed, thus leaving the caller with their original reference intact.
> 
> So really, you don't want to call override_creds() as that will cost you an
> extra atomic_inc() and atomic_dec_and_test().  I recommend you replace:
> 
>         put_cred(override_creds(new));
> 
> with:
> 
> 	revert_creds(new);
> 
> I think that should do the right thing.  It may look a bit odd, but it'll be
> quicker.  If you object to using revert_creds)( because of the name, we can
> come up with an alternative name.

With additional put_cred, i.e.:

        put_cred(override_creds(new));
        put_cred(new);
        return 0;

I got following fun tcpdump and failed mount (it stuck, but can be interrupted):

15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null
15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358>
15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null
15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463>
15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null
15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null
15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs]
15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs]
15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs]
15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs]

But no corruption in the dmesg (like oops or bug).

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:23     ` David Howells
  2009-01-21 12:37       ` Evgeniy Polyakov
@ 2009-01-21 13:17       ` David Howells
  2009-01-21 13:18         ` Evgeniy Polyakov
  2009-01-21 22:37       ` J. Bruce Fields
  2009-01-22  7:08       ` David Howells
  3 siblings, 1 reply; 27+ messages in thread
From: David Howells @ 2009-01-21 13:17 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: dhowells, J. Bruce Fields, Trond Myklebust, linux-fsdevel

Evgeniy Polyakov <zbr@ioremap.net> wrote:

> I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> ...
> But no corruption in the dmesg (like oops or bug).

Are you running with CONFIG_DEBUG_SLAB=y?

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 13:17       ` David Howells
@ 2009-01-21 13:18         ` Evgeniy Polyakov
  0 siblings, 0 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-21 13:18 UTC (permalink / raw)
  To: David Howells; +Cc: J. Bruce Fields, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 01:17:02PM +0000, David Howells (dhowells@redhat.com) wrote:
> > I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> > ...
> > But no corruption in the dmesg (like oops or bug).
> 
> Are you running with CONFIG_DEBUG_SLAB=y?

Nope: # CONFIG_DEBUG_KERNEL is not set

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:23     ` David Howells
  2009-01-21 12:37       ` Evgeniy Polyakov
  2009-01-21 13:17       ` David Howells
@ 2009-01-21 22:37       ` J. Bruce Fields
  2009-01-22  7:08       ` David Howells
  3 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-21 22:37 UTC (permalink / raw)
  To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > 	- Finally, we put_cred(override_creds(new)).  That modifies
> > 	  current->cred again, putting the old value and getting the
> > 	  new.
> > 
> > Hm.  But that last part's not OK; aren't we still holding our own
> > reference to new, in addition to the one that override_creds() just
> > took?  So I think we need the following?
> 
> Yes, you're right.  override_creds() takes an extra ref on the argument it is
> passed, thus leaving the caller with their original reference intact.
> 
> So really, you don't want to call override_creds() as that will cost you an
> extra atomic_inc() and atomic_dec_and_test().  I recommend you replace:
> 
>         put_cred(override_creds(new));
> 
> with:
> 
> 	revert_creds(new);
> 
> I think that should do the right thing.  It may look a bit odd, but it'll be
> quicker.  If you object to using revert_creds)( because of the name, we can
> come up with an alternative name.

If the only difference is just whether it takes a reference on the
passed-in cred it might be clearest just to write

	set_creds(new);

or
	set_creds(get_creds(new));

depending on which you want?

In any case, yes, the revert_creds()/override_creds() names don't tell
me much.

> > Looking through nfsd_setuser(), one obvious bug: in the (flags &
> > NFSEXP_ALLSQUASH) case, we never check the return value from the
> > groups_alloc(0).  If it returns NULL, we dereference it anyway.
> 
> Since a zero-length groups list must be copied before writing, can I recommend
> that we make groups_alloc(0) a special case that returns pointer to a
> statically allocated groups list (after inc'ing the refcount) that represents
> a zero-length list, thus meaning groups_alloc(0) will never fail?

Is there a really big advantage to that?  On the face of it it strikes
me as a weird corner case that I'll trip over every time I look at this
code.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:37       ` Evgeniy Polyakov
@ 2009-01-21 22:39         ` J. Bruce Fields
  2009-01-21 22:46           ` Evgeniy Polyakov
  2009-01-27  0:49         ` J. Bruce Fields
  2009-02-05 13:22         ` David Howells
  2 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-21 22:39 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 03:37:28PM +0300, Evgeniy Polyakov wrote:
> On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells (dhowells@redhat.com) wrote:
> > J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > > 	- Finally, we put_cred(override_creds(new)).  That modifies
> > > 	  current->cred again, putting the old value and getting the
> > > 	  new.
> > > 
> > > Hm.  But that last part's not OK; aren't we still holding our own
> > > reference to new, in addition to the one that override_creds() just
> > > took?  So I think we need the following?
> > 
> > Yes, you're right.  override_creds() takes an extra ref on the argument it is
> > passed, thus leaving the caller with their original reference intact.
> > 
> > So really, you don't want to call override_creds() as that will cost you an
> > extra atomic_inc() and atomic_dec_and_test().  I recommend you replace:
> > 
> >         put_cred(override_creds(new));
> > 
> > with:
> > 
> > 	revert_creds(new);
> > 
> > I think that should do the right thing.  It may look a bit odd, but it'll be
> > quicker.  If you object to using revert_creds)( because of the name, we can
> > come up with an alternative name.
> 
> With additional put_cred, i.e.:
> 
>         put_cred(override_creds(new));
>         put_cred(new);
>         return 0;
> 
> I got following fun tcpdump and failed mount (it stuck, but can be interrupted):

I'm really bad at reading tcpdump.  Where's the problem here?  It looks
like every call gets a reply.

--b.

> 
> 15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null
> 15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358>
> 15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null
> 15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463>
> 15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null
> 15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null
> 15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs]
> 15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs]
> 15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs]
> 15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs]
> 
> But no corruption in the dmesg (like oops or bug).
> 
> -- 
> 	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-20 21:44 ` David Howells
@ 2009-01-21 22:42   ` J. Bruce Fields
  0 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-21 22:42 UTC (permalink / raw)
  To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

On Tue, Jan 20, 2009 at 09:44:58PM +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote:
> > This doesn't look familiar, no; thanks for the report.  I guess we
> > should take a careful look at the recent changes to fs/nfsd/auth.c?
> 
> I don't see anything obvious in fs/nfsd/auth.c or nfsfh.c, but I proably won't
> be able to investigate properly until I get back from LCA in Feb.

As long as we get this figured out well before the 2.6.29 release....

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 22:39         ` J. Bruce Fields
@ 2009-01-21 22:46           ` Evgeniy Polyakov
  2009-01-21 23:18             ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-21 22:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 05:39:19PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> 
> I'm really bad at reading tcpdump.  Where's the problem here?  It looks
> like every call gets a reply.

Port fields are fun

> > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null
                             ^^^^^^^^^^   vs    ^^^^

It is likely copied skb corruption, ports change, but acks correspond
sometimes.

> > 15:34:41.253916 IP addr2.2049 > addr1.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358>
> > 15:34:41.254103 IP addr2.2049 > addr1.1835336279: reply ok 28 null
> > 15:34:41.254229 IP addr1.728 > addr2.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463>
> > 15:34:41.254238 IP addr1.1852113495 > addr2.2049: 44 null
> > 15:34:41.254271 IP addr2.2049 > addr1.1852113495: reply ok 28 null
> > 15:34:41.254378 IP addr1.1868890711 > addr2.2049: 100 fsinfo [|nfs]
> > 15:34:41.254411 IP addr2.2049 > addr1.1868890711: reply ok 36 fsinfo [|nfs]
> > 15:34:41.254528 IP addr1.1885667927 > addr2.2049: 100 fsinfo [|nfs]
> > 15:34:41.254555 IP addr2.2049 > addr1.1885667927: reply ok 36 fsinfo [|nfs]

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 22:46           ` Evgeniy Polyakov
@ 2009-01-21 23:18             ` J. Bruce Fields
  2009-01-21 23:31               ` Evgeniy Polyakov
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-21 23:18 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Thu, Jan 22, 2009 at 01:46:59AM +0300, Evgeniy Polyakov wrote:
> On Wed, Jan 21, 2009 at 05:39:19PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> > 
> > I'm really bad at reading tcpdump.  Where's the problem here?  It looks
> > like every call gets a reply.
> 
> Port fields are fun
> 
> > > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null
>                              ^^^^^^^^^^   vs    ^^^^
> 
> It is likely copied skb corruption, ports change, but acks correspond
> sometimes.

Obviously it's normal that the client- and server- side ports would
differ, so I assume you meant to point out the variation in the client
port number?  (1835336279, 728, 1852113495, 1868890711, 1885667927).

--b.

> 
> > > 15:34:41.253916 IP addr2.2049 > addr1.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358>
> > > 15:34:41.254103 IP addr2.2049 > addr1.1835336279: reply ok 28 null
> > > 15:34:41.254229 IP addr1.728 > addr2.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463>
> > > 15:34:41.254238 IP addr1.1852113495 > addr2.2049: 44 null
> > > 15:34:41.254271 IP addr2.2049 > addr1.1852113495: reply ok 28 null
> > > 15:34:41.254378 IP addr1.1868890711 > addr2.2049: 100 fsinfo [|nfs]
> > > 15:34:41.254411 IP addr2.2049 > addr1.1868890711: reply ok 36 fsinfo [|nfs]
> > > 15:34:41.254528 IP addr1.1885667927 > addr2.2049: 100 fsinfo [|nfs]
> > > 15:34:41.254555 IP addr2.2049 > addr1.1885667927: reply ok 36 fsinfo [|nfs]
> 
> -- 
> 	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 23:18             ` J. Bruce Fields
@ 2009-01-21 23:31               ` Evgeniy Polyakov
  0 siblings, 0 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-21 23:31 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 06:18:11PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > Port fields are fun
> > 
> > > > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null
> >                              ^^^^^^^^^^   vs    ^^^^
> > 
> > It is likely copied skb corruption, ports change, but acks correspond
> > sometimes.
> 
> Obviously it's normal that the client- and server- side ports would
> differ, so I assume you meant to point out the variation in the client
> port number?  (1835336279, 728, 1852113495, 1868890711, 1885667927).

I meant that ack still comes to the fun port, not only to correct one.
And some ports are obviously incorrect like 4 above.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:23     ` David Howells
                         ` (2 preceding siblings ...)
  2009-01-21 22:37       ` J. Bruce Fields
@ 2009-01-22  7:08       ` David Howells
  2009-01-22 16:43         ` J. Bruce Fields
  3 siblings, 1 reply; 27+ messages in thread
From: David Howells @ 2009-01-22  7:08 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

J. Bruce Fields <bfields@fieldses.org> wrote:

> If the only difference is just whether it takes a reference on the
> passed-in cred it might be clearest just to write
> 
> 	set_creds(new);
> 
> or
> 	set_creds(get_creds(new));
> 
> depending on which you want?

The former would be preferable, if it transfers the reference on the creds to
the task_struct, thus eliminating the need for a put_cred().

> In any case, yes, the revert_creds()/override_creds() names don't tell
> me much.
> 
> > > Looking through nfsd_setuser(), one obvious bug: in the (flags &
> > > NFSEXP_ALLSQUASH) case, we never check the return value from the
> > > groups_alloc(0).  If it returns NULL, we dereference it anyway.
> > 
> > Since a zero-length groups list must be copied before writing, can I recommend
> > that we make groups_alloc(0) a special case that returns pointer to a
> > statically allocated groups list (after inc'ing the refcount) that represents
> > a zero-length list, thus meaning groups_alloc(0) will never fail?
> 
> Is there a really big advantage to that?  On the face of it it strikes
> me as a weird corner case that I'll trip over every time I look at this
> code.

It'll remove a potential OOM condition.  It's a minor optimisation, I think.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-22  7:08       ` David Howells
@ 2009-01-22 16:43         ` J. Bruce Fields
  0 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-22 16:43 UTC (permalink / raw)
  To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

On Thu, Jan 22, 2009 at 07:08:33AM +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > If the only difference is just whether it takes a reference on the
> > passed-in cred it might be clearest just to write
> > 
> > 	set_creds(new);
> > 
> > or
> > 	set_creds(get_creds(new));
> > 
> > depending on which you want?
> 
> The former would be preferable, if it transfers the reference on the creds to
> the task_struct, thus eliminating the need for a put_cred().

I think I was unclear--but I think we're agreeing anyway?  I was
proposing eliminating the two separate revert_creds() and
override_creds() functions and instead using a single set_creds() that
always consumes a reference to its argument, requiring the caller to
explicitly get a reference (as in the second example above) when
necessary.

> > Is there a really big advantage to that?  On the face of it it strikes
> > me as a weird corner case that I'll trip over every time I look at this
> > code.
> 
> It'll remove a potential OOM condition.  It's a minor optimisation, I think.

OK.  Let's keep things simple and set that idea aside for now; we've
lived with the current groups_alloc(0) behavior for a while, and keeping
it another kernel version or two can't be so bad.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:37       ` Evgeniy Polyakov
  2009-01-21 22:39         ` J. Bruce Fields
@ 2009-01-27  0:49         ` J. Bruce Fields
  2009-01-27  9:26           ` Evgeniy Polyakov
  2009-02-05 13:22         ` David Howells
  2 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-27  0:49 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Wed, Jan 21, 2009 at 03:37:28PM +0300, Evgeniy Polyakov wrote:
> With additional put_cred, i.e.:
> 
>         put_cred(override_creds(new));
>         put_cred(new);
>         return 0;
> 
> I got following fun tcpdump and failed mount (it stuck, but can be interrupted):

I'm not able to reproduce this.

Have you been able to reproduce this consistently?  Any further results?

If not, I'm tempted to assume you're seeing some unrelated bug.

David, can you ACK the additional put_cred()?

--b.

> 
> 15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null
> 15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358>
> 15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null
> 15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463>
> 15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null
> 15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null
> 15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs]
> 15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs]
> 15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs]
> 15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs]
> 
> But no corruption in the dmesg (like oops or bug).
> 
> -- 
> 	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-27  0:49         ` J. Bruce Fields
@ 2009-01-27  9:26           ` Evgeniy Polyakov
  2009-01-27 22:07             ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-27  9:26 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Mon, Jan 26, 2009 at 07:49:55PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > With additional put_cred, i.e.:
> > 
> >         put_cred(override_creds(new));
> >         put_cred(new);
> >         return 0;
> > 
> > I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> 
> I'm not able to reproduce this.
> 
> Have you been able to reproduce this consistently?  Any further results?

Yes, I can easily reproduce it with the tree I work with.
But I will update to the latest git soon (likely today) and rerun the
tests, so it may happen that rc1-rc2 frame fixed the issue at least
partially. I can also turn on some debugging options if needed.

> If not, I'm tempted to assume you're seeing some unrelated bug.
> 
> David, can you ACK the additional put_cred()?

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-27  9:26           ` Evgeniy Polyakov
@ 2009-01-27 22:07             ` J. Bruce Fields
  2009-01-29 14:37               ` Evgeniy Polyakov
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-27 22:07 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Tue, Jan 27, 2009 at 12:26:59PM +0300, Evgeniy Polyakov wrote:
> On Mon, Jan 26, 2009 at 07:49:55PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > > With additional put_cred, i.e.:
> > > 
> > >         put_cred(override_creds(new));
> > >         put_cred(new);
> > >         return 0;
> > > 
> > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted):
> > 
> > I'm not able to reproduce this.
> > 
> > Have you been able to reproduce this consistently?  Any further results?
> 
> Yes, I can easily reproduce it with the tree I work with.
> But I will update to the latest git soon (likely today) and rerun the
> tests, so it may happen that rc1-rc2 frame fixed the issue at least
> partially. I can also turn on some debugging options if needed.

I took a closer look at the tcpdump output.  Those numbers aren't port
numbers, they're rpc xid's: see "NFS Requests and Replies" in the
"OUTPUT FORMAT" section of the tcpdump man page.  Or compare the output
of tcpdump with wireshark's output.

That still doesn't explain why you were seeing a mount hang--but some
network or other configuration problem may be the most likely
explanation.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-27 22:07             ` J. Bruce Fields
@ 2009-01-29 14:37               ` Evgeniy Polyakov
  2009-01-29 18:52                 ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-29 14:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Tue, Jan 27, 2009 at 05:07:35PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> I took a closer look at the tcpdump output.  Those numbers aren't port
> numbers, they're rpc xid's: see "NFS Requests and Replies" in the
> "OUTPUT FORMAT" section of the tcpdump man page.  Or compare the output
> of tcpdump with wireshark's output.

Yup, they are definitely not network ports :)

> That still doesn't explain why you were seeing a mount hang--but some
> network or other configuration problem may be the most likely
> explanation.

I've rerun the tests with the latest git to date and I do not observe
neither leak (with credential patch applied) nor mount fail, so please
put the patch into the tree.

Feel free to add my signed/acked/tested/whatevered if needed :)

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-29 14:37               ` Evgeniy Polyakov
@ 2009-01-29 18:52                 ` J. Bruce Fields
  2009-01-29 19:00                   ` Evgeniy Polyakov
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2009-01-29 18:52 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Thu, Jan 29, 2009 at 05:37:53PM +0300, Evgeniy Polyakov wrote:
> On Tue, Jan 27, 2009 at 05:07:35PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > I took a closer look at the tcpdump output.  Those numbers aren't port
> > numbers, they're rpc xid's: see "NFS Requests and Replies" in the
> > "OUTPUT FORMAT" section of the tcpdump man page.  Or compare the output
> > of tcpdump with wireshark's output.
> 
> Yup, they are definitely not network ports :)
> 
> > That still doesn't explain why you were seeing a mount hang--but some
> > network or other configuration problem may be the most likely
> > explanation.
> 
> I've rerun the tests with the latest git to date and I do not observe
> neither leak (with credential patch applied) nor mount fail, so please
> put the patch into the tree.

Thanks for the confirmation--but I jumped the gun and already submitted
them...

> Feel free to add my signed/acked/tested/whatevered if needed :)

... and forgot to credit you--sorry, I should have.  Thanks again for
catching the problem!

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-29 18:52                 ` J. Bruce Fields
@ 2009-01-29 19:00                   ` Evgeniy Polyakov
  0 siblings, 0 replies; 27+ messages in thread
From: Evgeniy Polyakov @ 2009-01-29 19:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel

On Thu, Jan 29, 2009 at 01:52:53PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote:
> > I've rerun the tests with the latest git to date and I do not observe
> > neither leak (with credential patch applied) nor mount fail, so please
> > put the patch into the tree.
> 
> Thanks for the confirmation--but I jumped the gun and already submitted
> them...
> 
> > Feel free to add my signed/acked/tested/whatevered if needed :)
> 
> ... and forgot to credit you--sorry, I should have.  Thanks again for
> catching the problem!

No problem.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-01-21 12:37       ` Evgeniy Polyakov
  2009-01-21 22:39         ` J. Bruce Fields
  2009-01-27  0:49         ` J. Bruce Fields
@ 2009-02-05 13:22         ` David Howells
  2009-02-05 17:21           ` J. Bruce Fields
  2 siblings, 1 reply; 27+ messages in thread
From: David Howells @ 2009-02-05 13:22 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel


Sorry for not replying earlier, but I've been in Australia and my home DSL
line has been down for the last two weeks.

J. Bruce Fields <bfields@fieldses.org> wrote:

> David, can you ACK the additional put_cred()?

Feel free to add my Acked-by.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS/credentials leak in 2.6.29-rc1
  2009-02-05 13:22         ` David Howells
@ 2009-02-05 17:21           ` J. Bruce Fields
  0 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2009-02-05 17:21 UTC (permalink / raw)
  To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel

On Thu, Feb 05, 2009 at 01:22:37PM +0000, David Howells wrote:
> 
> Sorry for not replying earlier, but I've been in Australia and my home DSL
> line has been down for the last two weeks.
> 
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > David, can you ACK the additional put_cred()?
> 
> Feel free to add my Acked-by.

I was impatient and already sent it upstream--but thanks for the
confirmation.--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-02-05 17:21 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov
2009-01-20 13:37 ` Trond Myklebust
2009-01-20 13:49   ` Evgeniy Polyakov
2009-01-20 15:11 ` J. Bruce Fields
2009-01-20 15:23   ` Evgeniy Polyakov
2009-01-20 23:53     ` J. Bruce Fields
2009-01-21 12:23     ` David Howells
2009-01-21 12:37       ` Evgeniy Polyakov
2009-01-21 22:39         ` J. Bruce Fields
2009-01-21 22:46           ` Evgeniy Polyakov
2009-01-21 23:18             ` J. Bruce Fields
2009-01-21 23:31               ` Evgeniy Polyakov
2009-01-27  0:49         ` J. Bruce Fields
2009-01-27  9:26           ` Evgeniy Polyakov
2009-01-27 22:07             ` J. Bruce Fields
2009-01-29 14:37               ` Evgeniy Polyakov
2009-01-29 18:52                 ` J. Bruce Fields
2009-01-29 19:00                   ` Evgeniy Polyakov
2009-02-05 13:22         ` David Howells
2009-02-05 17:21           ` J. Bruce Fields
2009-01-21 13:17       ` David Howells
2009-01-21 13:18         ` Evgeniy Polyakov
2009-01-21 22:37       ` J. Bruce Fields
2009-01-22  7:08       ` David Howells
2009-01-22 16:43         ` J. Bruce Fields
2009-01-20 21:44 ` David Howells
2009-01-21 22:42   ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.