From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Anna Schumaker <Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
    Linux NFS Mailing List <linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
    Allen Andrews <allen.andrews-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH V3 00/17] NFS/RDMA client-side patches
Date: Fri, 2 May 2014 18:34:20 -0400 (EDT)
Message-ID: <8781.92528985$1399070079@news.gmane.org>
In-Reply-To: <45067B04-660C-4971-B12F-AEC9F7D32785-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

----- Original Message -----
>
> On May 2, 2014, at 3:27 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> > I tested nfsv3 in both IB and RoCE modes with rsize=32768 and
> > wsize=32768 -> not DOA, reliable, did data verification and passed
> >
> > I tested nfsv3 in both IB and RoCE modes with rsize=65536 and
> > wsize=65536 -> not DOA, but not reliable either, data transfers
> > will stop after a certain amount has been transferred and the
> > mount will have a soft hang
>
> Can you clarify what you mean by “soft hang?” Are you seeing a
> problem when mounting with the “soft” mount option, or does this
> mean “CPU soft lockup?” (INFO: task hung for 120 seconds)

Neither of those options actually. I'm using hard,intr on the mount
flags, and by soft hang I mean that the application copying data will
come to a stop and never make any progress again. When that happens,
you can usually interrupt the process and get back to the command
line, but it doesn't clean up internally in the kernel because from
that point on, attempts to unmount the nfs filesystem return EBUSY.

> > ToDo items that I see:
> >
> > Write NFSv4 rdma protocol mount support
>
> NFSv4 does not use the MNT protocol. If NFSv4 is not working for you,
> there’s something else going on. For me NFSv4 works as well as NFSv3.
> Let me know if you need help troubleshooting.

OK, I'll see if I'm doing something wrong. I can do nfs4 tcp mounts
just fine, but trying to do nfs4 rdma mounts results in operation not
permitted returns on the client. And nfs3 mounts using rdma work as
expected. This is all with the same server, same client, same mount
point, etc.

> > Fix client soft mount hangs when rsize/wsize > 32768
>
> Does that problem occur with unpatched v3.15-rc3 on the client?

Probably. I've been able to reproduce this for a while. I originally
thought it was a problem between Mellanox <-> QLogic/Intel operation
because it reproduces faster in that environment, but I can get it to
reproduce in Mellanox <-> Mellanox situations too.

> HCAs/RNICs that support MTHCAFMR and FRMR should be working up to the
> largest rsize and wsize supported by the client and server.
>
> When I use ALLPHYSICAL with large wsize, typically the server starts
> dropping NFS WRITE requests. The client retries them forever, and
> that looks like a mount point hang.
>
> Something like https://bugzilla.linux-nfs.org/show_bug.cgi?id=248

This sounds like what I'm seeing here too.

> > Fix DOA of ocrdma driver
>
> Does that problem occur with unpatched v3.15-rc3 on the client?

Haven't tried. I'll queue that up for next week.

> Emulex has reported some problems when reconnecting, but
> I haven’t heard of issues that occur right at mount time.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford