From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Anna Schumaker
	<Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
	Allen Andrews
	<allen.andrews-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH V3 00/17] NFS/RDMA client-side patches
Date: Fri, 2 May 2014 18:34:20 -0400 (EDT)
Message-ID: <8781.92528985$1399070079@news.gmane.org>
In-Reply-To: <45067B04-660C-4971-B12F-AEC9F7D32785-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

----- Original Message -----
> 
> On May 2, 2014, at 3:27 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > I tested nfsv3 in both IB and RoCE modes with rsize=32768 and
> > wsize=32768 -> not DOA, reliable, did data verification and passed
> > 
> > I tested nfsv3 in both IB and RoCE modes with rsize=65536 and
> > wsize=65536 -> not DOA, but not reliable either, data transfers
> > will stop after a certain amount has been transferred and the
> > mount will have a soft hang
> 
> Can you clarify what you mean by “soft hang?” Are you seeing a
> problem when mounting with the “soft” mount option, or does this
> mean “CPU soft lockup?” (INFO: task hung for 120 seconds)

Neither, actually.  I'm mounting with hard,intr, and by soft hang I
mean that the application copying data comes to a stop and never makes
any progress again.  When that happens, you can usually interrupt the
process and get back to the command line, but the kernel doesn't clean
up internally: from that point on, attempts to unmount the NFS
filesystem return EBUSY.
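
For anyone who wants to repeat the copy-and-verify part of this test,
here is a minimal sketch of the idea, not the tool actually used for
the results above.  The function names and the 65536-byte chunk size
are illustrative assumptions; the chunk size only mirrors the
rsize/wsize under test and does not control what goes on the wire.

```python
import hashlib
import os

CHUNK = 65536  # assumption: chosen to mirror the rsize/wsize being tested


def sha256_of(path, chunk=CHUNK):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src, dst, chunk=CHUNK):
    """Copy src to dst chunk by chunk, then compare the two digests."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for block in iter(lambda: fin.read(chunk), b""):
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())  # flush dirty data before reading it back
    return sha256_of(src, chunk) == sha256_of(dst, chunk)
```

With dst on the NFS/RDMA mount, a transfer that stalls would show up
as copy_and_verify() never returning.  Note that the read-back may be
served from the client's page cache, so remounting (or dropping
caches) before verifying gives a stronger check.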


> > ToDo items that I see:
> > 
> > Write NFSv4 rdma protocol mount support
> 
> NFSv4 does not use the MNT protocol. If NFSv4 is not working for you,
> there’s something else going on. For me NFSv4 works as well as NFSv3.
> Let me know if you need help troubleshooting.

OK, I'll see if I'm doing something wrong.  I can do NFSv4 TCP mounts
just fine, but NFSv4 RDMA mounts fail on the client with "operation
not permitted".  NFSv3 mounts over RDMA work as expected.  This is all
with the same server, same client, same mount point, etc.

> > Fix client soft mount hangs when rsize/wsize > 32768
> 
> Does that problem occur with unpatched v3.15-rc3 on the client?

Probably.  I've been able to reproduce this for a while.  I originally
thought it was a problem between Mellanox <-> QLogic/Intel operation
because it reproduces faster in that environment, but I can get it to
reproduce in Mellanox <-> Mellanox situations too.

> HCAs/RNICs that support MTHCAFMR and FRMR should be working up to the
> largest rsize and wsize supported by the client and server.
> 
> When I use ALLPHYSICAL with large wsize, typically the server starts
> dropping NFS WRITE requests. The client retries them forever, and
> that looks like a mount point hang.
> 
> Something like https://bugzilla.linux-nfs.org/show_bug.cgi?id=248

This sounds like what I'm seeing here too.

> > Fix DOA of ocrdma driver
> 
> Does that problem occur with unpatched v3.15-rc3 on the client?

Haven't tried.  I'll queue that up for next week.

> Emulex has reported some problems when reconnecting, but
> I haven’t heard of issues that occur right at mount time.
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


