From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: Devesh Sharma <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org>,
	Linux NFS Mailing List <linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
Date: Sun, 27 Apr 2014 08:37:09 -0400
Message-ID: <4ACED3B0-CC8B-4F1F-8DB6-6C272AB17C99@oracle.com>
In-Reply-To: <535CD819.3050508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

On Apr 27, 2014, at 6:12 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> On 4/24/2014 6:01 PM, Chuck Lever wrote:
>> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>
>>> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>>>> Hi Chuck,
>>>>
>>>> Following is the complete call trace of a typical NFS-RDMA transaction while mounting a share.
>>>> We cannot avoid skipping post-send when the QP has not been created. Therefore, checks against
>>>> the connection state are a must while registering/deregistering FRMRs on the fly. An
>>>> unconnected QP implies: don't call post_send/post_recv from any context.
>>>>
>>> Long thread... didn't follow it all.
>> I think you got the gist of it.
>>
>>> If I understand correctly, this race comes up only for *cleanup* (LINV) of an FRMR registration
>>> while the teardown flow has destroyed the QP. I think it might disappear if for each
>>> registration you post LINV+FRMR. This assumes that a situation where we try to post a fastreg
>>> on a "bad" QP can never happen (usually because the teardown flow suspends outgoing commands).
>> That’s typically true for “hard” NFS mounts.
>> But “soft” NFS mounts wake RPCs after a timeout while the transport
>> is disconnected, in order to kill them. At that point, deregistration
>> still needs to succeed somehow.
>
> Not sure I understand. Can you please explain why deregistration will not succeed?
> AFAIK you are allowed to register an FRMR and then deregister it without having to
> invalidate it.
>
> Can you please explain why you logically connected LINV with deregistration?

Confusion. Sorry.

>
>> IMO there are three related problems.
>>
>> 1. rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>>    there is no QP at all (->qp is NULL). The woken RPC tasks are
>>    trying to deregister buffers that may include page cache pages,
>>    and it’s oopsing because ->qp is NULL.
>>
>>    That’s a logic bug in rpcrdma_ep_connect(), and I have an idea
>>    how to address it.
>
> Why not first create a new id+qp and assign them - and then destroy the old id+qp?
> See the SRP related section: ib_srp.c:srp_create_target_ib()
>
> Anyway it is indeed important to guarantee that no xmit flows happen concurrently
> with that, and that cleanups are processed synchronously and in order.

I posted a patch on Friday that should provide that serialization.

>
>> 2. If a QP is present but disconnected, posting LOCAL_INV won’t work.
>>    That leaves buffers (and page cache pages, potentially) registered.
>>    That could be addressed with LINV+FRMR. But...
>>
>> 3. The client should not leave page cache pages registered indefinitely.
>>    Both LINV+FRMR and our current approach depend on having a working
>>    QP _at_ _some_ _point_ … but the client simply can’t depend on that.
>>    What happens if an NFS server is, say, destroyed by fire while there
>>    are active client mount points? What if the HCA’s firmware is
>>    permanently not allowing QP creation?
>
> Again, I don't understand why you can't dereg_mr().
>
> How about allocating the FRMR pool *after* the QP is successfully created/connected
> (which makes sense, as the FRMRs are not usable until then), and destroying/cleaning up
> the pool before the QP is disconnected/destroyed? It also makes sense as they must
> match PDs.

It’s not about deregistration, but rather about invalidation. I was confused.

xprt_rdma_free() invalidates and then frees the chunks on RPC chunk lists. We just need
to see that those invalidations can be successful while the transport is disconnected. I
understand that even in the error state, a QP should allow consumers to post send WRs to
invalidate FRMRs…?

The other case is whether the consumer can _replace_ a QP with a fresh one, and still
have invalidations succeed, even if the transport remains disconnected, once waiting
RPCs are awoken.

An alternative would be to invalidate all waiting RPC chunk lists on a transport as soon
as the QP goes to the error state, but before it is destroyed, and fastreg the chunks
again when waiting RPCs are remarshalled.

I think the goals are:

1. Avoid fastreg on an FRMR that is already valid

2. Avoid leaving FRMRs valid indefinitely (preferably just long enough
   to execute the RPC request, and no longer)

>
>> Here's a relevant comment in rpcrdma_ep_connect():
>>
>> 815 	/* TEMP TEMP TEMP - fail if new device:
>> 816 	 * Deregister/remarshal *all* requests!
>> 817 	 * Close and recreate adapter, pd, etc!
>> 818 	 * Re-determine all attributes still sane!
>> 819 	 * More stuff I haven't thought of!
>> 820 	 * Rrrgh!
>> 821 	 */
>>
>> xprtrdma does not do this today.
>>
>> When a new device is created, all existing RPC requests could be
>> deregistered and re-marshalled. As far as I can tell,
>> rpcrdma_ep_connect() is executing in a synchronous context (the connect
>> worker) and we can simply use dereg_mr, as long as later, when the RPCs
>> are re-driven, they know they need to re-marshal.
>
> Agree.
>
> Sagi.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com