Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "aglo@umich.edu" <aglo@umich.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>
Subject: Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
Date: Thu, 19 Sep 2019 23:42:09 +0000
Message-ID: <8720be3295e3b0035396b9bec70231a628231c93.camel@hammerspace.com> (raw)
In-Reply-To: <CAN-5tyGX_Mb-wGTREtSWRFFSNK0qjgqLbm8SFPG=DPM7M2OWoQ@mail.gmail.com>

Hi Olga

On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > Hi Olga
> > 
> > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > Hi Trond,
> > > 
> > > These set of patches do not address the locking problem. It's
> > > actually
> > > not the locking patch (which I thought it was as I reverted it
> > > and
> > > still had the issue). Without the whole patch series the unlock
> > > works
> > > fine so something in these new patches. Something is up with the
> > > 2
> > > patches:
> > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > 
> > > If I remove either one separately, unlock fails but if I remove
> > > both
> > > unlock works.
> > 
> > Can you describe how you are testing this, and perhaps provide
> > wireshark traces that show how we're triggering these problems?
> 
> I'm triggering by running "nfstest_lock --nfsversion 4.1 --runtest
> btest01" against either linux or ontap servers (while the test
> doesn't
> fail but on the network trace you can see unlock failing with
> bad_stateid). Network trace attached.
> 
> But actually a simple test open, lock, unlock does the trick (network
> trace attached).
> fd1 = open(RDWR)
> fctl(fd1) (lock /unlock)


These traces really do not mesh with what I'm seeing using a simple
Connectathon lock test run. When I look at the wireshark output from
that, I see exadtly two cases where the stateid arguments are both
zero, and those are both SETATTR, so expected.

All the LOCKU are showing up as non-zero stateids, and so I'm seeing no
BAD_STATEID or OLD_STATEID errors at all.

Is there something special about how your test is running?

Cheers
  Trond

PS: Note: I do think I need a v3 of the LOCKU patch in order to add a
spinlock around the new copy of the lock stateid in
nfs4_alloc_unlockdata(). However I don't see how the missing spinlocks
could cause you to consistently be seeing an all-zero stateid argument.


> 
> > Thanks!
> >   Trond
> > 
> > > On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <
> > > trondmy@gmail.com>
> > > wrote:
> > > > Various NFSv4 fixes to ensure we handle state errors correctly.
> > > > In
> > > > particular, we need to ensure that for COMPOUNDs like CLOSE and
> > > > DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle
> > > > the
> > > > layout state errors so that a retry of either the LAYOUTRETURN,
> > > > or
> > > > the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> > > > reply.
> > > > 
> > > > Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do
> > > > our
> > > > best to still try to destroy the state on the server, in order
> > > > to
> > > > avoid causing state leakage.
> > > > 
> > > > v2: Fix bug reports from Olga
> > > >  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE
> > > > when
> > > >    doing fully serialised NFSv4.0.
> > > >  - Ensure LOCKU initialises the stateid correctly.
> > > > 
> > > > Trond Myklebust (9):
> > > >   pNFS: Ensure we do clear the return-on-close layout stateid
> > > > on
> > > > fatal
> > > >     errors
> > > >   NFSv4: Clean up pNFS return-on-close error handling
> > > >   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
> > > >   NFSv4: Handle RPC level errors in LAYOUTRETURN
> > > >   NFSv4: Add a helper to increment stateid seqids
> > > >   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping
> > > > the
> > > > state
> > > >     seqid
> > > >   NFSv4: Fix OPEN_DOWNGRADE error handling
> > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > 
> > > >  fs/nfs/nfs4_fs.h   |  11 ++-
> > > >  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++-------
> > > > ----
> > > > ----
> > > >  fs/nfs/nfs4state.c |  16 ----
> > > >  fs/nfs/pnfs.c      |  71 ++++++++++++++--
> > > >  fs/nfs/pnfs.h      |  17 +++-
> > > >  5 files changed, 229 insertions(+), 90 deletions(-)
> > > > 
> > > > --
> > > > 2.21.0
> > > > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> > 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  parent reply index

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-16 20:44 Trond Myklebust
2019-09-16 20:44 ` [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors Trond Myklebust
2019-09-16 20:44   ` [PATCH v2 2/9] NFSv4: Clean up pNFS return-on-close error handling Trond Myklebust
2019-09-16 20:44     ` [PATCH v2 3/9] NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close Trond Myklebust
2019-09-16 20:44       ` [PATCH v2 4/9] NFSv4: Handle RPC level errors in LAYOUTRETURN Trond Myklebust
2019-09-16 20:44         ` [PATCH v2 5/9] NFSv4: Add a helper to increment stateid seqids Trond Myklebust
2019-09-16 20:44           ` [PATCH v2 6/9] pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid Trond Myklebust
2019-09-16 20:44             ` [PATCH v2 7/9] NFSv4: Fix OPEN_DOWNGRADE error handling Trond Myklebust
2019-09-16 20:44               ` [PATCH v2 8/9] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE Trond Myklebust
2019-09-16 20:44                 ` [PATCH v2 9/9] NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU Trond Myklebust
2019-09-18 19:38 ` [PATCH v2 0/9] Various NFSv4 state error handling fixes Olga Kornievskaia
2019-09-19  1:49   ` Trond Myklebust
     [not found]     ` <CAN-5tyGX_Mb-wGTREtSWRFFSNK0qjgqLbm8SFPG=DPM7M2OWoQ@mail.gmail.com>
2019-09-19 23:42       ` Trond Myklebust [this message]
2019-09-20 14:25         ` Olga Kornievskaia
2019-09-20 14:54           ` Trond Myklebust

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8720be3295e3b0035396b9bec70231a628231c93.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=Anna.Schumaker@netapp.com \
    --cc=aglo@umich.edu \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git