linux-nfs.vger.kernel.org archive mirror
* uncollected nfsd open owners
From: NeilBrown @ 2019-10-25  1:22 UTC
  To: J. Bruce Fields; +Cc: linux-nfs



Hi,
 I have a coredump from a machine that was running as an NFS server.
 nfs4_laundromat was trying to expire a client, and in particular was
 cleaning up the ->cl_openowners.
 As there were 6.5 million of these, it took rather longer than the
 softlockup timer thought was acceptable, and hence the core dump.

 Those open owners that I looked at had empty so_stateids lists, so I
 would normally expect them to be on the close_lru and to be removed
 fairly soon.  But they weren't (only 32 openowners on close_lru).

 The only explanation I can think of for this is that maybe an OPEN
 request successfully got through nfs4_process_open1(), thus creating an
 open owner, but failed to get to or through nfs4_process_open2(), and
 so didn't add a stateid.  I *think* this can leave an openowner that is
 unused but will never be cleaned up (until the client is expired, which
 might be too late).
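
 To make the suspected path concrete, here is a much-simplified sketch
 of the flow I mean (the function names are real but the signatures
 are approximations, not the actual prototypes):

/*
 * Simplified shape of the OPEN path being described; the real
 * functions take more arguments (compound state, rqstp, ...).
 */
static __be32 nfsd4_open_sketch(struct nfsd4_open *open)
{
        __be32 status;

        status = nfs4_process_open1(open);  /* may create and hash an openowner */
        if (status)
                return status;              /* owner visible, but no stateid yet */

        status = nfs4_process_open2(open);  /* attaches the stateid on success */
        /*
         * On failure here the owner has an empty so_stateids list, is
         * not on close_lru, and the client will never send a CLOSE for
         * it, so it lingers until the whole client is expired.
         */
        return status;
}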

 Is this possible?  If so, how should we handle those openowners which
 never had a stateid?
 In 3.0 (which is the kernel where I saw this) I could probably just put
 the openowner on the close_lru when it is created.
 In more recent kernels, it seems to be assumed that openowners are only
 on close_lru if they have an oo_last_closed_stid.  Would we need a
 separate "never used lru", or should they just be destroyed as soon as
 the open fails?

 Also, should we put a cond_resched() in some or all of those loops in
 __destroy_client() ??
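
 Something like this is what I have in mind (a sketch only, not a
 tested patch; I'm assuming no spinlock is held across that particular
 loop):

#include <linux/list.h>
#include <linux/sched.h>

/*
 * Sketch: tear down a per-client openowner list that may hold
 * millions of entries, yielding periodically so the softlockup
 * watchdog stays quiet.
 */
static void destroy_client_openowners(struct nfs4_client *clp)
{
        struct nfs4_openowner *oo;

        while (!list_empty(&clp->cl_openowners)) {
                oo = list_first_entry(&clp->cl_openowners,
                                      struct nfs4_openowner, oo_perclient);
                release_openowner(oo);  /* unhashes and frees oo */
                cond_resched();         /* the addition in question */
        }
}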

Thanks for your help,
NeilBrown



* Re: uncollected nfsd open owners
From: J. Bruce Fields @ 2019-10-25 15:20 UTC
  To: NeilBrown; +Cc: linux-nfs

On Fri, Oct 25, 2019 at 12:22:36PM +1100, NeilBrown wrote:
>  I have a coredump from a machine that was running as an NFS server.
>  nfs4_laundromat was trying to expire a client, and in particular was
>  cleaning up the ->cl_openowners.
>  As there were 6.5 million of these, it took rather longer than the
>  softlockup timer thought was acceptable, and hence the core dump.
> 
>  Those open owners that I looked at had empty so_stateids lists, so I
>  would normally expect them to be on the close_lru and to be removed
>  fairly soon.  But they weren't (only 32 openowners on close_lru).
> 
>  The only explanation I can think of for this is that maybe an OPEN
>  request successfully got through nfs4_process_open1(), thus creating an
>  open owner, but failed to get to or through nfs4_process_open2(), and
>  so didn't add a stateid.  I *think* this can leave an openowner that is
>  unused but will never be cleaned up (until the client is expired, which
>  might be too late).
> 
>  Is this possible?  If so, how should we handle those openowners which
>  never had a stateid?
>  In 3.0 (which is the kernel where I saw this) I could probably just put
>  the openowner on the close_lru when it is created.
>  In more recent kernels, it seems to be assumed that openowners are only
>  on close_lru if they have an oo_last_closed_stid.  Would we need a
>  separate "never used lru", or should they just be destroyed as soon as
>  the open fails?

Hopefully we can just throw the new openowner away when the open fails.

But it looks like the new openowner is visible on global data structures
by then, so we need to be sure somebody else isn't about to use it.
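
Something of this shape, maybe (purely illustrative; "state_lock",
unhash_openowner(), and nfs4_free_openowner() are placeholder names
here, not necessarily the real ones):

/*
 * The owner is already hashed, so the error path has to recheck,
 * under the lock, that nobody found it and attached state to it in
 * the meantime.
 */
static void release_openowner_if_unused(struct nfs4_openowner *oo)
{
        spin_lock(&state_lock);
        if (!list_empty(&oo->oo_owner.so_stateids)) {
                /* raced with another OPEN that is using this owner */
                spin_unlock(&state_lock);
                return;
        }
        unhash_openowner(oo);
        spin_unlock(&state_lock);
        nfs4_free_openowner(&oo->oo_owner);
}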

>  Also, should we put a cond_resched() in some or all of those loops in
>  __destroy_client() ??

Looks like it helped find a bug in this case....

Destroying a client that has a ton of active state should be an unusual
situation.

I don't know, maybe?  I'm sure this isn't the only spinlock-protected
kernel code where we don't have a strict bound on a loop; what's been
the practice elsewhere?  Worst case, the realtime code allows preempting
spinlocks, right?

Might be nice to have some sort of limits on the number of objects (like
stateowners) that can be created.  But it's a pain when we get one of
those limits wrong. (See
git log -L :nfsd4_get_drc_mem:fs/nfsd/nfs4state.c.)

--b.



* Re: uncollected nfsd open owners
From: J. Bruce Fields @ 2019-10-26 21:36 UTC
  To: J. Bruce Fields; +Cc: NeilBrown, linux-nfs

On Fri, Oct 25, 2019 at 11:20:47AM -0400, J. Bruce Fields wrote:
> On Fri, Oct 25, 2019 at 12:22:36PM +1100, NeilBrown wrote:
> >  I have a coredump from a machine that was running as an NFS server.
> >  nfs4_laundromat was trying to expire a client, and in particular was
> >  cleaning up the ->cl_openowners.
> >  As there were 6.5 million of these, it took rather longer than the
> >  softlockup timer thought was acceptable, and hence the core dump.
> > 
> >  Those open owners that I looked at had empty so_stateids lists, so I
> >  would normally expect them to be on the close_lru and to be removed
> >  fairly soon.  But they weren't (only 32 openowners on close_lru).
> > 
> >  The only explanation I can think of for this is that maybe an OPEN
> >  request successfully got through nfs4_process_open1(), thus creating an
> >  open owner, but failed to get to or through nfs4_process_open2(), and
> >  so didn't add a stateid.  I *think* this can leave an openowner that is
> >  unused but will never be cleaned up (until the client is expired, which
> >  might be too late).
> > 
> >  Is this possible?  If so, how should we handle those openowners which
> >  never had a stateid?
> >  In 3.0 (which is the kernel where I saw this) I could probably just put
> >  the openowner on the close_lru when it is created.
> >  In more recent kernels, it seems to be assumed that openowners are only
> >  on close_lru if they have an oo_last_closed_stid.  Would we need a
> >  separate "never used lru", or should they just be destroyed as soon as
> >  the open fails?
> 
> Hopefully we can just throw the new openowner away when the open fails.
> 
> But it looks like the new openowner is visible on global data structures
> by then, so we need to be sure somebody else isn't about to use it.

But, also, if this has only been seen on 3.0, it may have been fixed
already.  It sounds like kind of a familiar problem, but I didn't spot a
relevant commit on a quick look through the logs.

--b.


* Re: uncollected nfsd open owners
From: NeilBrown @ 2019-10-27 23:49 UTC
  To: J. Bruce Fields, J. Bruce Fields; +Cc: linux-nfs


On Sat, Oct 26 2019, J. Bruce Fields wrote:

> On Fri, Oct 25, 2019 at 11:20:47AM -0400, J. Bruce Fields wrote:
>> On Fri, Oct 25, 2019 at 12:22:36PM +1100, NeilBrown wrote:
>> >  I have a coredump from a machine that was running as an NFS server.
>> >  nfs4_laundromat was trying to expire a client, and in particular was
>> >  cleaning up the ->cl_openowners.
>> >  As there were 6.5 million of these, it took rather longer than the
>> >  softlockup timer thought was acceptable, and hence the core dump.
>> > 
>> >  Those open owners that I looked at had empty so_stateids lists, so I
>> >  would normally expect them to be on the close_lru and to be removed
>> >  fairly soon.  But they weren't (only 32 openowners on close_lru).
>> > 
>> >  The only explanation I can think of for this is that maybe an OPEN
>> >  request successfully got through nfs4_process_open1(), thus creating an
>> >  open owner, but failed to get to or through nfs4_process_open2(), and
>> >  so didn't add a stateid.  I *think* this can leave an openowner that is
>> >  unused but will never be cleaned up (until the client is expired, which
>> >  might be too late).
>> > 
>> >  Is this possible?  If so, how should we handle those openowners which
>> >  never had a stateid?
>> >  In 3.0 (which is the kernel where I saw this) I could probably just put
>> >  the openowner on the close_lru when it is created.
>> >  In more recent kernels, it seems to be assumed that openowners are only
>> >  on close_lru if they have an oo_last_closed_stid.  Would we need a
>> >  separate "never used lru", or should they just be destroyed as soon as
>> >  the open fails?
>> 
>> Hopefully we can just throw the new openowner away when the open fails.
>> 
>> But it looks like the new openowner is visible on global data structures
>> by then, so we need to be sure somebody else isn't about to use it.
>
> But, also, if this has only been seen on 3.0, it may have been fixed
> already.  It sounds like kind of a familiar problem, but I didn't spot a
> relevant commit on a quick look through the logs.
>
I guess I shouldn't expect you to remember something from 8 years ago.
This seems a perfect fit for what I see:

commit d29b20cd589128a599e5045d4effc2d7dbc388f5
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Thu Oct 13 15:12:59 2011 -0400

    nfsd4: clean up open owners on OPEN failure
    
    If process_open1() creates a new open owner, but the open later fails,
    the current code will leave the open owner around.  It won't be on the
    close_lru list, and the client isn't expected to send a CLOSE, so it
    will hang around as long as the client does.
    
    Similarly, if process_open1() removes an existing open owner from the
    close lru, anticipating that an open owner that previously had no
    associated stateid's now will, but the open subsequently fails, then
    we'll again be left with the same leak.
    
    Fix both problems.
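
As I read that commit message, the shape of the fix is roughly this
(my sketch, not the actual diff; openowner_is_new() is a hypothetical
test for "process_open1() just created this owner"):

static void cleanup_open_state_on_error(struct nfsd4_open *open)
{
        struct nfs4_openowner *oo = open->op_openowner;

        if (!oo)
                return;
        if (openowner_is_new(oo))
                release_openowner(oo);  /* never used: unhash and free now */
        else if (list_empty(&oo->oo_owner.so_stateids))
                move_to_close_lru(oo);  /* undo what process_open1() did */
}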

I wonder if this is safe to backport ... 3.2 is fairly close to 3.0, but
there have been lots of changes to the code since then ... maybe there
are more bug fixes.
I'll work something out.

Thanks,
NeilBrown



* Re: uncollected nfsd open owners
From: NeilBrown @ 2019-10-28  3:24 UTC
  To: J. Bruce Fields; +Cc: linux-nfs


On Fri, Oct 25 2019, J. Bruce Fields wrote:
>
>>  Also, should we put a cond_resched() in some or all of those loops in
>>  __destroy_client() ??
>
> Looks like it helped find a bug in this case....
>
> Destroying a client that has a ton of active state should be an unusual
> situation.
>
> I don't know, maybe?  I'm sure this isn't the only spinlock-protected
> kernel code where we don't have a strict bound on a loop; what's been
> the practice elsewhere?  Worst case, the realtime code allows preempting
> spinlocks, right?

 git grep cond_resched_lock
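
The generic shape of that pattern, with a made-up item type:

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {                           /* made-up entry type */
        struct list_head link;
};

static void drain_list(spinlock_t *lock, struct list_head *head)
{
        struct item *it;

        spin_lock(lock);
        while (!list_empty(head)) {
                it = list_first_entry(head, struct item, link);
                list_del(&it->link);
                kfree(it);
                /* drops and retakes the lock if a reschedule is due */
                cond_resched_lock(lock);
        }
        spin_unlock(lock);
}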

But most of __destroy_client isn't protected by a spinlock....

I dunno - maybe it doesn't matter.


> Might be nice to have some sort of limits on the number of objects (like
> stateowners) that can be created.  But it's a pain when we get one of
> those limits wrong. (See
> git log -L :nfsd4_get_drc_mem:fs/nfsd/nfs4state.c.)

Grin...

NeilBrown



* Re: uncollected nfsd open owners
From: J. Bruce Fields @ 2019-10-28 14:52 UTC
  To: NeilBrown; +Cc: J. Bruce Fields, linux-nfs

On Mon, Oct 28, 2019 at 10:49:26AM +1100, NeilBrown wrote:
> On Sat, Oct 26 2019, J. Bruce Fields wrote:
> 
> > On Fri, Oct 25, 2019 at 11:20:47AM -0400, J. Bruce Fields wrote:
> >> On Fri, Oct 25, 2019 at 12:22:36PM +1100, NeilBrown wrote:
> >> >  I have a coredump from a machine that was running as an NFS server.
> >> >  nfs4_laundromat was trying to expire a client, and in particular was
> >> >  cleaning up the ->cl_openowners.
> >> >  As there were 6.5 million of these, it took rather longer than the
> >> >  softlockup timer thought was acceptable, and hence the core dump.
> >> > 
> >> >  Those open owners that I looked at had empty so_stateids lists, so I
> >> >  would normally expect them to be on the close_lru and to be removed
> >> >  fairly soon.  But they weren't (only 32 openowners on close_lru).
> >> > 
> >> >  The only explanation I can think of for this is that maybe an OPEN
> >> >  request successfully got through nfs4_process_open1(), thus creating an
> >> >  open owner, but failed to get to or through nfs4_process_open2(), and
> >> >  so didn't add a stateid.  I *think* this can leave an openowner that is
> >> >  unused but will never be cleaned up (until the client is expired, which
> >> >  might be too late).
> >> > 
> >> >  Is this possible?  If so, how should we handle those openowners which
> >> >  never had a stateid?
> >> >  In 3.0 (which is the kernel where I saw this) I could probably just put
> >> >  the openowner on the close_lru when it is created.
> >> >  In more recent kernels, it seems to be assumed that openowners are only
> >> >  on close_lru if they have an oo_last_closed_stid.  Would we need a
> >> >  separate "never used lru", or should they just be destroyed as soon as
> >> >  the open fails?
> >> 
> >> Hopefully we can just throw the new openowner away when the open fails.
> >> 
> >> But it looks like the new openowner is visible on global data structures
> >> by then, so we need to be sure somebody else isn't about to use it.
> >
> > But, also, if this has only been seen on 3.0, it may have been fixed
> > already.  It sounds like kind of a familiar problem, but I didn't spot a
> > relevant commit on a quick look through the logs.
> >
> I guess I shouldn't expect you to remember something from 8 years ago.
> This seems a perfect fit for what I see:
> 
> commit d29b20cd589128a599e5045d4effc2d7dbc388f5
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Thu Oct 13 15:12:59 2011 -0400
> 
>     nfsd4: clean up open owners on OPEN failure
>     
>     If process_open1() creates a new open owner, but the open later fails,
>     the current code will leave the open owner around.  It won't be on the
>     close_lru list, and the client isn't expected to send a CLOSE, so it
>     will hang around as long as the client does.
>     
>     Similarly, if process_open1() removes an existing open owner from the
>     close lru, anticipating that an open owner that previously had no
>     associated stateid's now will, but the open subsequently fails, then
>     we'll again be left with the same leak.
>     
>     Fix both problems.
> 
> I wonder if this is safe to backport ... 3.2 is fairly close to 3.0, but
> there have been lots of changes to the code since then ... maybe there
> are more bug fixes.
> I'll work something out.

This is all before the breakup of the big state lock, so you don't have
that to worry about, at least.

Anyway, not even remembering that commit, I'm afraid I'm no more of an
expert on that era of the history than you are at this point....

--b.

