linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* spurious sillyrename after O_DIRECT writes get ENOSPC
@ 2017-12-08 22:16 J. Bruce Fields
  2017-12-13 17:18 ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-08 22:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: hch

Last year Christoph noticed a bug that could result in a file being
unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:

	http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org

It's reproduceable on upstream, over v3 or v4.

I looked into it some more, and it seems to reproduce whenever a write
system call results in multiple WRITE calls, only some of which receive
ENOSPC.  I think that's resulting in a leak of the wb_kref on some
nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
a "rm" on the client (even after the file is closed) results in a
sillyrename.

I'll keep looking at this, but the relevant code is pretty opaque to me
so far.  Any ideas welcomed.

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: spurious sillyrename after O_DIRECT writes get ENOSPC
  2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields
@ 2017-12-13 17:18 ` J. Bruce Fields
  2017-12-14 13:08   ` Benjamin Coddington
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-13 17:18 UTC (permalink / raw)
  To: linux-nfs; +Cc: hch

On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote:
> Last year Christoph noticed a bug that could result in a file being
> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
> 
> 	http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
> 
> It's reproduceable on upstream, over v3 or v4.
> 
> I looked into it some more, and it seems to reproduce whenever a write
> system call results in multiple WRITE calls, only some of which receive
> ENOSPC.  I think that's resulting in a leak of the wb_kref on some
> nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
> a "rm" on the client (even after the file is closed) results in a
> sillyrename.
> 
> I'll keep looking at this, but the relevant code is pretty opaque to me
> so far.  Any ideas welcomed.

Actually it looks like a leak of dreq->io_count?  That prevents commits
from being sent (which I'm also seeing in network traces--the succesfull
WRITEs are unstable but never get committed), which means
nfs_direct_commit_complete() is never called, and the reference taken on
wb_kref in the request_commit case of nfs_direct_write_completion is
never put.

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: spurious sillyrename after O_DIRECT writes get ENOSPC
  2017-12-13 17:18 ` J. Bruce Fields
@ 2017-12-14 13:08   ` Benjamin Coddington
  2017-12-14 16:36     ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Coddington @ 2017-12-14 13:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, hch


On 13 Dec 2017, at 12:18, J. Bruce Fields wrote:

> On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote:
>> Last year Christoph noticed a bug that could result in a file being
>> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
>>
>> 	http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
>>
>> It's reproduceable on upstream, over v3 or v4.
>>
>> I looked into it some more, and it seems to reproduce whenever a write
>> system call results in multiple WRITE calls, only some of which receive
>> ENOSPC.  I think that's resulting in a leak of the wb_kref on some
>> nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
>> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
>> a "rm" on the client (even after the file is closed) results in a
>> sillyrename.
>>
>> I'll keep looking at this, but the relevant code is pretty opaque to me
>> so far.  Any ideas welcomed.
>
> Actually it looks like a leak of dreq->io_count?  That prevents commits
> from being sent (which I'm also seeing in network traces--the succesfull
> WRITEs are unstable but never get committed), which means
> nfs_direct_commit_complete() is never called, and the reference taken on
> wb_kref in the request_commit case of nfs_direct_write_completion is
> never put.

This sounds to me like the problem Scott's working - he sent a patch
yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the
mds".

I think the the rule should be that once we call
nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion.
However, there are some paths where that is not the case.

The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in
nfs_generic_pg_pgios() for this case might show where we're bailing out
early.

Ben

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: spurious sillyrename after O_DIRECT writes get ENOSPC
  2017-12-14 13:08   ` Benjamin Coddington
@ 2017-12-14 16:36     ` J. Bruce Fields
  2017-12-14 18:55       ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-14 16:36 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: linux-nfs, hch

On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote:
> 
> On 13 Dec 2017, at 12:18, J. Bruce Fields wrote:
> 
> > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote:
> >> Last year Christoph noticed a bug that could result in a file being
> >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
> >>
> >> 	http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
> >>
> >> It's reproduceable on upstream, over v3 or v4.
> >>
> >> I looked into it some more, and it seems to reproduce whenever a write
> >> system call results in multiple WRITE calls, only some of which receive
> >> ENOSPC.  I think that's resulting in a leak of the wb_kref on some
> >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
> >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
> >> a "rm" on the client (even after the file is closed) results in a
> >> sillyrename.
> >>
> >> I'll keep looking at this, but the relevant code is pretty opaque to me
> >> so far.  Any ideas welcomed.
> >
> > Actually it looks like a leak of dreq->io_count?  That prevents commits
> > from being sent (which I'm also seeing in network traces--the succesfull
> > WRITEs are unstable but never get committed), which means
> > nfs_direct_commit_complete() is never called, and the reference taken on
> > wb_kref in the request_commit case of nfs_direct_write_completion is
> > never put.
> 
> This sounds to me like the problem Scott's working - he sent a patch
> yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the
> mds".
> 
> I think the the rule should be that once we call
> nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion.
> However, there are some paths where that is not the case.

Yes, I wondered about that....

Unfortunately after some more tests now I was think I was wrong, the
dreq_get()s and put()s are balanced and the bug is somewhere else--in
the case of my particular reproducer.  Argh.  But yes I can easily
believe there could be a leak there.

--b.

> 
> The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in
> nfs_generic_pg_pgios() for this case might show where we're bailing out
> early.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: spurious sillyrename after O_DIRECT writes get ENOSPC
  2017-12-14 16:36     ` J. Bruce Fields
@ 2017-12-14 18:55       ` J. Bruce Fields
  2017-12-19 16:56         ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-14 18:55 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: linux-nfs, hch

On Thu, Dec 14, 2017 at 11:36:54AM -0500, J. Bruce Fields wrote:
> On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote:
> > 
> > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote:
> > 
> > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote:
> > >> Last year Christoph noticed a bug that could result in a file being
> > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
> > >>
> > >> 	http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
> > >>
> > >> It's reproduceable on upstream, over v3 or v4.
> > >>
> > >> I looked into it some more, and it seems to reproduce whenever a write
> > >> system call results in multiple WRITE calls, only some of which receive
> > >> ENOSPC.  I think that's resulting in a leak of the wb_kref on some
> > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
> > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
> > >> a "rm" on the client (even after the file is closed) results in a
> > >> sillyrename.
> > >>
> > >> I'll keep looking at this, but the relevant code is pretty opaque to me
> > >> so far.  Any ideas welcomed.
> > >
> > > Actually it looks like a leak of dreq->io_count?  That prevents commits
> > > from being sent (which I'm also seeing in network traces--the succesfull
> > > WRITEs are unstable but never get committed), which means
> > > nfs_direct_commit_complete() is never called, and the reference taken on
> > > wb_kref in the request_commit case of nfs_direct_write_completion is
> > > never put.
> > 
> > This sounds to me like the problem Scott's working - he sent a patch
> > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the
> > mds".
> > 
> > I think the the rule should be that once we call
> > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion.
> > However, there are some paths where that is not the case.
> 
> Yes, I wondered about that....
> 
> Unfortunately after some more tests now I was think I was wrong, the
> dreq_get()s and put()s are balanced and the bug is somewhere else--in
> the case of my particular reproducer.  Argh.  But yes I can easily
> believe there could be a leak there.

So actually what happens is if you do a direct io write where some
WRITEs succeed and the one fails, then this:

	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
		dreq->flags = 0;
		dreq->error = hdr->error;
	}

clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete
never scheduels the commit calls.  It looks like that leaves a bunch of
nfs_pages on some to-be-committed list, so we end up leaking a bunch of
stuff, with the most visible symptom being an unnecessarily sillyrename
on close.

I can just remove that clear of dreq_flags and that fixes the problem,
but I doubt that's correct.

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: spurious sillyrename after O_DIRECT writes get ENOSPC
  2017-12-14 18:55       ` J. Bruce Fields
@ 2017-12-19 16:56         ` J. Bruce Fields
  2018-01-16 15:08           ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-19 16:56 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: linux-nfs, hch, Trond Myklebust, Anna Schumaker

On Thu, Dec 14, 2017 at 01:55:14PM -0500, J. Bruce Fields wrote:
> So actually what happens is if you do a direct io write where some
> WRITEs succeed and the one fails, then this:
> 
> 	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
> 		dreq->flags = 0;
> 		dreq->error = hdr->error;
> 	}
> 
> clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete
> never scheduels the commit calls.  It looks like that leaves a bunch of
> nfs_pages on some to-be-committed list, so we end up leaking a bunch of
> stuff, with the most visible symptom being an unnecessarily sillyrename
> on close.
> 
> I can just remove that clear of dreq_flags and that fixes the problem,
> but I doubt that's correct.

Or, maybe it is.  If *any* WRITE(s) involved in this write might need
a commit or reschedule, then surely we should run
nfs_direct_write_schedule_work and let it sort them out?  I'm having
trouble seeing why clearing that field partway through could ever be
correct.

Trond, Anna, does the following look right?

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] NFS: commit direct writes even if they fail partially
  2017-12-19 16:56         ` J. Bruce Fields
@ 2018-01-16 15:08           ` J. Bruce Fields
  0 siblings, 0 replies; 7+ messages in thread
From: J. Bruce Fields @ 2018-01-16 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, hch, Benjamin Coddington

From: "J. Bruce Fields" <bfields@redhat.com>

If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.

We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion().  The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/direct.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index d2972d537469..8c10b0562e75 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -775,10 +775,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
 
 	spin_lock(&dreq->lock);
 
-	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
-		dreq->flags = 0;
+	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags))
 		dreq->error = hdr->error;
-	}
 	if (dreq->error == 0) {
 		nfs_direct_good_bytes(dreq, hdr);
 		if (nfs_write_need_commit(hdr)) {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-01-16 15:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields
2017-12-13 17:18 ` J. Bruce Fields
2017-12-14 13:08   ` Benjamin Coddington
2017-12-14 16:36     ` J. Bruce Fields
2017-12-14 18:55       ` J. Bruce Fields
2017-12-19 16:56         ` J. Bruce Fields
2018-01-16 15:08           ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).