From: NeilBrown <neilb@suse.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Jeff Layton <jlayton@redhat.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 3/6] NFSv4: change nfs4_do_setattr to take an open_context instead of a nfs4_state.
Date: Thu, 03 Nov 2016 10:34:56 +1100
Message-ID: <877f8lqwv3.fsf@notabene.neil.brown.name>
In-Reply-To: <0773ADA2-C1DE-4D01-8B3C-5883A6A62C2E@redhat.com>

On Thu, Nov 03 2016, Benjamin Coddington wrote:

> On 13 Oct 2016, at 0:26, NeilBrown wrote:

>> @@ -3694,20 +3695,17 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>>
>>  	/* Search for an existing open(O_WRITE) file */
>>  	if (sattr->ia_valid & ATTR_FILE) {
>> -		struct nfs_open_context *ctx;
>>
>>  		ctx = nfs_file_open_context(sattr->ia_file);
>> -		if (ctx) {
>> +		if (ctx)
>>  			cred = ctx->cred;
>> -			state = ctx->state;
>> -		}
>>  	}
>
>
> Does this need a get_nfs_open_context() there to make sure the open
> context doesn't drop away?

I can't see why you would.  The ia_file must hold a reference to the
ctx, and this code doesn't keep any reference to the ctx after
nfs4_proc_setattr completes - does it?

> I'm getting this on generic/089:
>
> [  651.855291] run fstests generic/089 at 2016-11-01 11:15:57
> [  652.645828] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> [  653.166259] NFSD: client ::1 testing state ID with incorrect client ID
> [  653.167218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

I think this BUG is happening in nfs41_check_expired_locks.
This:
	list_for_each_entry(lsp, &state->lock_states, ls_locks) {
walks off the end of a list, finding a NULL on a list which should never
have a NULL pointer.  That does suggest a use-after-free of an
nfs4_lock_state, or possibly of an nfs4_state.

I can't see it in the code yet though.
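
Why a NULL in that loop is suspicious follows from how kernel lists are built: a `struct list_head` list is circular, so `list_for_each_entry()` stops when it arrives back at the head and never meets a NULL link in a well-formed list. The sketch below is a small userspace reimplementation of that idea, not `<linux/list.h>` itself; `struct lock_state` and `sum_lock_ids()` are hypothetical stand-ins for `struct nfs4_lock_state` and the walk over `state->lock_states`. A NULL can only appear if a node's link was cleared or its memory was freed and reused, which fits the use-after-free theory.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;            /* an empty list points at itself */
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct nfs4_lock_state. */
struct lock_state {
	int id;
	struct list_head ls_locks;
};

/*
 * Walk the list the way list_for_each_entry() does: start at head->next
 * and stop on returning to head. Returns the sum of the entries' ids, or
 * -1 on meeting a NULL link, which a well-formed list never contains.
 */
static int sum_lock_ids(struct list_head *head)
{
	int sum = 0;
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos == NULL)
			return -1;    /* the kernel loop would dereference NULL */
		sum += container_of(pos, struct lock_state, ls_locks)->id;
	}
	return sum;
}
```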

>
> Something else is also wrong there.. wrapping that with
> get_nfs_open_context() makes the crash go away, but there are still
> several "NFS: nfs4_reclaim_open_state: Lock reclaim failed!" in the log.
> Why would we be doing reclaim at all?  I'll look at a network capture next.

The
> [  653.166259] NFSD: client ::1 testing state ID with incorrect client ID

error suggests that the client is sending a stateid that doesn't match
the client ID, so the server reports an error and the client enters
state recovery.
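
The server-side check behind that message can be sketched as follows, assuming (as in Linux nfsd) that the opaque part of a stateid embeds the client ID it was issued to. `struct clientid_model`, `struct stateid_model`, `same_clientid()` and `test_stateid_model()` are hypothetical simplifications, not nfsd's definitions; only the status code values come from the NFSv4 specifications. A stateid carrying another client's ID is rejected without being looked up.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes as defined by the NFSv4 protocol (RFC 7530, RFC 5661). */
#define NFS4_OK              0
#define NFS4ERR_BAD_STATEID  10025

/* Hypothetical client ID: a server boot time paired with a counter. */
struct clientid_model {
	uint32_t boot;
	uint32_t id;
};

/*
 * Hypothetical stateid: a sequence number plus an opaque part that, in
 * this model, embeds the ID of the client it was issued to.
 */
struct stateid_model {
	uint32_t seqid;
	struct clientid_model clid;
	uint32_t state_id;
};

static int same_clientid(const struct clientid_model *a,
			 const struct clientid_model *b)
{
	return a->boot == b->boot && a->id == b->id;
}

/*
 * Sketch of the check made when a client tests a stateid: one minted
 * for a different client ID is rejected outright.
 */
static int test_stateid_model(const struct clientid_model *caller,
			      const struct stateid_model *sid)
{
	if (!same_clientid(&sid->clid, caller))
		return NFS4ERR_BAD_STATEID;
	return NFS4_OK;               /* a real server would now look it up */
}
```
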
Maybe one thread is dropping a flock lock while another thread is using
it for some IO and they race?  I think there are refcounts in place to
protect that but something might be missing.
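
The refcount protection mentioned above usually takes a "pin before use" form: the I/O path holds its own reference on the lock state while it uses the stateid and drops it afterwards, so a concurrent unlock can only drop the lock list's reference and cannot free the state underneath the I/O. The sketch below models that with C11 atomics and POSIX threads; `struct lock_state_model`, `ls_get()`, `ls_put()`, `run_race()` and the thread functions are hypothetical and are not taken from the NFS client.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lock state shared by the unlock and I/O paths. */
struct lock_state_model {
	atomic_int refcount;
	int stateid;                  /* stands in for the lock stateid */
};

static atomic_int frees;          /* objects freed so far */
static atomic_int io_seen;        /* stateid observed by the I/O thread */

static struct lock_state_model *ls_alloc(int stateid)
{
	struct lock_state_model *ls = malloc(sizeof(*ls));

	if (!ls)
		return NULL;
	atomic_init(&ls->refcount, 1); /* reference owned by the lock list */
	ls->stateid = stateid;
	return ls;
}

static void ls_get(struct lock_state_model *ls)
{
	atomic_fetch_add(&ls->refcount, 1);
}

static void ls_put(struct lock_state_model *ls)
{
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&ls->refcount, 1) == 1) {
		free(ls);
		atomic_fetch_add(&frees, 1);
	}
}

/* I/O path: runs with a reference already taken on its behalf. */
static void *io_thread(void *arg)
{
	struct lock_state_model *ls = arg;

	atomic_store(&io_seen, ls->stateid); /* safe: pinned by our ref */
	ls_put(ls);
	return NULL;
}

/* Unlock path: drops the lock list's reference. */
static void *unlock_thread(void *arg)
{
	ls_put(arg);
	return NULL;
}

/*
 * Pin the state for the I/O path before starting either thread, so it is
 * freed by whichever thread drops the last reference and the I/O never
 * reads freed memory.
 */
static int run_race(int stateid)
{
	struct lock_state_model *ls = ls_alloc(stateid);
	pthread_t io, unlock;

	if (!ls)
		return -1;
	ls_get(ls);                   /* the reference the I/O path relies on */
	if (pthread_create(&io, NULL, io_thread, ls) != 0)
		abort();
	if (pthread_create(&unlock, NULL, unlock_thread, ls) != 0)
		abort();
	pthread_join(io, NULL);
	pthread_join(unlock, NULL);
	return atomic_load(&io_seen);
}
```

Leaving out the `ls_get()` in `run_race()` would let the unlock path free the state while the I/O thread may still be reading it, which is the sort of missing reference suspected here.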

I look forward to seeing the network capture.

Thanks,
NeilBrown

Thread overview: 19+ messages
2016-10-13  4:26 [PATCH 0/6] NFSv4: Fix stateid used when flock locks in use. - V2 NeilBrown
2016-10-13  4:26 ` [PATCH 6/6] NFS: discard nfs_lockowner structure NeilBrown
2016-10-13  4:26 ` [PATCH 3/6] NFSv4: change nfs4_do_setattr to take an open_context instead of a nfs4_state NeilBrown
2016-11-02 15:49   ` Benjamin Coddington
2016-11-02 23:34     ` NeilBrown [this message]
2016-11-03 16:38       ` Benjamin Coddington
2016-11-03 23:12         ` Benjamin Coddington
2016-10-13  4:26 ` [PATCH 2/6] NFSv4: add flock_owner to open context NeilBrown
2016-10-13  4:26 ` [PATCH 4/6] NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner NeilBrown
2016-10-20  0:57   ` NeilBrown
2016-10-13  4:26 ` [PATCH 1/6] NFS: remove l_pid field from nfs_lockowner NeilBrown
2016-10-13  4:26 ` [PATCH 5/6] NFSv4: enhance nfs4_copy_lock_stateid to use a flock stateid if there is one NeilBrown
2016-10-13 15:22   ` Jeff Layton
2016-10-14  0:22     ` NeilBrown
2016-10-14 10:49       ` Jeff Layton
2016-12-19  0:33         ` [PATCH] NFSv4: ensure __nfs4_find_lock_state returns consistent result NeilBrown
2016-10-13 15:31 ` [PATCH 0/6] NFSv4: Fix stateid used when flock locks in use. - V2 Jeff Layton
2016-10-18 21:52   ` NeilBrown
2016-11-18  4:59     ` NeilBrown
