* blktest/rxe almost working @ 2021-09-02 21:41 Bob Pearson 2021-09-02 23:38 ` Jason Gunthorpe 0 siblings, 1 reply; 10+ messages in thread From: Bob Pearson @ 2021-09-02 21:41 UTC (permalink / raw) To: Bart Van Assche, Jason Gunthorpe, linux-rdma Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to working for rxe but there is still one error. After adding MW support I added a test to local invalidate to check and see if the l/rkey matched the key actually contained in the MR/MW when local invalidate is called. This is failing for srp/002 with the key portion of the rkey off by one. Looking at ib_srp.c I see code that does in fact increment the rkey by one and also has code that posts a local invalidate. This was never checked before and is now failing to match. If I mask off the key portion in the test the whole test case passes so the other problems appear to have been fixed. If the increment and invalidate are out of sync this could result in the error. I suspect this may be a bug in srp. Worst case I can remove this test but I would rather not. Bob ^ permalink raw reply [flat|nested] 10+ messages in thread
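The off-by-one described in this message can be illustrated with a small userspace model. This is only a sketch, not rxe or ib_srp code: `inc_rkey()` mirrors the arithmetic of the kernel's `ib_inc_rkey()` as I understand it (only the low 8 bits, the "key portion", are incremented), while `rkey_matches_strict()`, `rkey_matches_index()` and `rkey_example()` are hypothetical names standing in for the new local-invalidate check and for the "mask off the key portion" variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 8 bits of an rkey: the consumer-owned "key portion" (assumed layout). */
#define KEY_MASK 0x000000ffu

/* Bump only the key byte and leave the upper 24 index bits untouched. */
uint32_t inc_rkey(uint32_t rkey)
{
	return ((rkey + 1) & KEY_MASK) | (rkey & ~KEY_MASK);
}

/* Strict check: index and key byte must both match. */
bool rkey_matches_strict(uint32_t mr_rkey, uint32_t wr_rkey)
{
	return mr_rkey == wr_rkey;
}

/* Relaxed check: the key byte is masked off, only the index is compared. */
bool rkey_matches_index(uint32_t mr_rkey, uint32_t wr_rkey)
{
	return (mr_rkey & ~KEY_MASK) == (wr_rkey & ~KEY_MASK);
}

/* Hypothetical walk-through of the failure reported for srp/002. */
void rkey_example(void)
{
	uint32_t posted = 0x00001200u;       /* key carried by the invalidate WR */
	uint32_t current = inc_rkey(posted); /* MR key after the ULP bumped it */

	assert(!rkey_matches_strict(current, posted)); /* off by one: check fails */
	assert(rkey_matches_index(current, posted));   /* masked check passes */
}
```

Calling `rkey_example()` exercises both comparisons: the strict one rejects the invalidate because the key byte has already moved on, and the masked one accepts it, which matches the behavior reported above.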
* Re: blktest/rxe almost working 2021-09-02 21:41 blktest/rxe almost working Bob Pearson @ 2021-09-02 23:38 ` Jason Gunthorpe 2021-09-03 22:18 ` Bob Pearson 0 siblings, 1 reply; 10+ messages in thread From: Jason Gunthorpe @ 2021-09-02 23:38 UTC (permalink / raw) To: Bob Pearson; +Cc: Bart Van Assche, linux-rdma On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: > Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to > working for rxe but there is still one error. After adding MW > support I added a test to local invalidate to check and see if the > l/rkey matched the key actually contained in the MR/MW when local > invalidate is called. This is failing for srp/002 with the key > portion of the rkey off by one. Looking at ib_srp.c I see code that > does in fact increment the rkey by one and also has code that posts > a local invalidate. This was never checked before and is now failing > to match. If I mask off the key portion in the test the whole test > case passes so the other problems appear to have been fixed. If the > increment and invalidate are out of sync this could result in the > error. I suspect this may be a bug in srp. Worst case I can remove > this test but I would rather not. I didn't check the spec, but since SRP works with HW devices I wonder if invalidation is supposed to ignore the variant bits in the mkey? Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-02 23:38 ` Jason Gunthorpe @ 2021-09-03 22:18 ` Bob Pearson 2021-09-03 23:13 ` Bart Van Assche 0 siblings, 1 reply; 10+ messages in thread From: Bob Pearson @ 2021-09-03 22:18 UTC (permalink / raw) To: Jason Gunthorpe, Bart Van Assche; +Cc: linux-rdma On 9/2/21 6:38 PM, Jason Gunthorpe wrote: > On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: >> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to >> working for rxe but there is still one error. After adding MW >> support I added a test to local invalidate to check and see if the >> l/rkey matched the key actually contained in the MR/MW when local >> invalidate is called. This is failing for srp/002 with the key >> portion of the rkey off by one. Looking at ib_srp.c I see code that >> does in fact increment the rkey by one and also has code that posts >> a local invalidate. This was never checked before and is now failing >> to match. If I mask off the key portion in the test the whole test >> case passes so the other problems appear to have been fixed. If the >> increment and invalidate are out of sync this could result in the >> error. I suspect this may be a bug in srp. Worst case I can remove >> this test but I would rather not. > > I didn't check the spec, but since SRP works with HW devices I wonder > if invalidation is supposed to ignore the variant bits in the mkey? > > Jason > I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses to the old version of the MR from e.g. replicated packets. It posts these WRs to a send queue but I don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate and the subsequent map_sg call. 
The invalidate marks the MR as not usable so this must all happen before the MR is turned on again. Bob ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-03 22:18 ` Bob Pearson @ 2021-09-03 23:13 ` Bart Van Assche 2021-09-04 22:30 ` Jason Gunthorpe 0 siblings, 1 reply; 10+ messages in thread From: Bart Van Assche @ 2021-09-03 23:13 UTC (permalink / raw) To: Bob Pearson, Jason Gunthorpe; +Cc: linux-rdma On 9/3/21 3:18 PM, Bob Pearson wrote: > On 9/2/21 6:38 PM, Jason Gunthorpe wrote: >> On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: >>> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to >>> working for rxe but there is still one error. After adding MW >>> support I added a test to local invalidate to check and see if the >>> l/rkey matched the key actually contained in the MR/MW when local >>> invalidate is called. This is failing for srp/002 with the key >>> portion of the rkey off by one. Looking at ib_srp.c I see code that >>> does in fact increment the rkey by one and also has code that posts >>> a local invalidate. This was never checked before and is now failing >>> to match. If I mask off the key portion in the test the whole test >>> case passes so the other problems appear to have been fixed. If the >>> increment and invalidate are out of sync this could result in the >>> error. I suspect this may be a bug in srp. Worst case I can remove >>> this test but I would rather not. >> >> I didn't check the spec, but since SRP works with HW devices I wonder >> if invalidation is supposed to ignore the variant bits in the mkey? > > I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of > MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before > that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses > to the old version of the MR from e.g. replicated packets. 
It posts these WRs to a send queue but I
> don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate
> and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen
> before the MR is turned on again.

Hi Bob,

If any code in the SRP driver is not compliant with the IBTA specification, I can fix it.

Regarding the invalidate work requests submitted by the ib_srp driver: these are submitted before srp_fr_pool_put() is called. A new registration request is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA channel and there is one QP per RDMA channel. In other words, (re)registration requests are submitted to the same QP as, and after, the local invalidate (unregistration) requests. I think the IBTA specification does not allow a local invalidate followed by a fast registration request to be reordered.

Bart.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-03 23:13 ` Bart Van Assche @ 2021-09-04 22:30 ` Jason Gunthorpe 2021-09-05 18:02 ` Bob Pearson 0 siblings, 1 reply; 10+ messages in thread From: Jason Gunthorpe @ 2021-09-04 22:30 UTC (permalink / raw) To: Bart Van Assche; +Cc: Bob Pearson, linux-rdma On Fri, Sep 03, 2021 at 04:13:22PM -0700, Bart Van Assche wrote: > On 9/3/21 3:18 PM, Bob Pearson wrote: > > On 9/2/21 6:38 PM, Jason Gunthorpe wrote: > > > On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: > > > > Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to > > > > working for rxe but there is still one error. After adding MW > > > > support I added a test to local invalidate to check and see if the > > > > l/rkey matched the key actually contained in the MR/MW when local > > > > invalidate is called. This is failing for srp/002 with the key > > > > portion of the rkey off by one. Looking at ib_srp.c I see code that > > > > does in fact increment the rkey by one and also has code that posts > > > > a local invalidate. This was never checked before and is now failing > > > > to match. If I mask off the key portion in the test the whole test > > > > case passes so the other problems appear to have been fixed. If the > > > > increment and invalidate are out of sync this could result in the > > > > error. I suspect this may be a bug in srp. Worst case I can remove > > > > this test but I would rather not. > > > > > > I didn't check the spec, but since SRP works with HW devices I wonder > > > if invalidation is supposed to ignore the variant bits in the mkey? > > > > I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of > > MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before > > that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses > > to the old version of the MR from e.g. replicated packets. 
It posts these WRs to a send queue but I > > don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate > > and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen > > before the MR is turned on again. > > Hi Bob, > > If there would be any code in the SRP driver that is not compliant with the > IBTA specification then I can fix it. > > Regarding the invalidate work requests submitted by the ib_srp driver: these > are submitted before srp_fr_pool_put() is called. A new registration request > is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA > channel and there is one QP per RDMA channel. In other words, > (re)registration requests are submitted to the same QP as unregistration > requests after local invalidate requests. I think the IBTA requires does not > allow to reorder a local invalidate followed by a fast registration request. Right Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-04 22:30 ` Jason Gunthorpe @ 2021-09-05 18:02 ` Bob Pearson 2021-09-07 12:01 ` Jason Gunthorpe 0 siblings, 1 reply; 10+ messages in thread From: Bob Pearson @ 2021-09-05 18:02 UTC (permalink / raw) To: Jason Gunthorpe, Bart Van Assche; +Cc: linux-rdma On 9/4/21 5:30 PM, Jason Gunthorpe wrote: > On Fri, Sep 03, 2021 at 04:13:22PM -0700, Bart Van Assche wrote: >> On 9/3/21 3:18 PM, Bob Pearson wrote: >>> On 9/2/21 6:38 PM, Jason Gunthorpe wrote: >>>> On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: >>>>> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to >>>>> working for rxe but there is still one error. After adding MW >>>>> support I added a test to local invalidate to check and see if the >>>>> l/rkey matched the key actually contained in the MR/MW when local >>>>> invalidate is called. This is failing for srp/002 with the key >>>>> portion of the rkey off by one. Looking at ib_srp.c I see code that >>>>> does in fact increment the rkey by one and also has code that posts >>>>> a local invalidate. This was never checked before and is now failing >>>>> to match. If I mask off the key portion in the test the whole test >>>>> case passes so the other problems appear to have been fixed. If the >>>>> increment and invalidate are out of sync this could result in the >>>>> error. I suspect this may be a bug in srp. Worst case I can remove >>>>> this test but I would rather not. >>>> >>>> I didn't check the spec, but since SRP works with HW devices I wonder >>>> if invalidation is supposed to ignore the variant bits in the mkey? >>> >>> I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of >>> MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before >>> that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses >>> to the old version of the MR from e.g. replicated packets. 
It posts these WRs to a send queue but I
>>> don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate
>>> and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen
>>> before the MR is turned on again.
>>
>> Hi Bob,
>>
>> If there would be any code in the SRP driver that is not compliant with the
>> IBTA specification then I can fix it.
>>
>> Regarding the invalidate work requests submitted by the ib_srp driver: these
>> are submitted before srp_fr_pool_put() is called. A new registration request
>> is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA
>> channel and there is one QP per RDMA channel. In other words,
>> (re)registration requests are submitted to the same QP as unregistration
>> requests after local invalidate requests. I think the IBTA requires does not
>> allow to reorder a local invalidate followed by a fast registration request.
>
> Right
>
> Jason
>

srp_inv_rkey()
    wr = ...                    builds local invalidate WR
    wr.send_flags = 0           i.e. not signaled
    ib_post_send()              posts the WR for delayed execution

srp_unmap_data()
    srp_inv_rkey()              schedules invalidate of each rkey in req
    srp_fr_pool_put()           puts each desc entry on free list

srp_map_finish_fr()
    ... misc checks not relevant
    desc = srp_fr_pool_get()    returns desc from free list
    rkey = ib_inc_rkey()        gets a new rkey one larger than the last one
    ib_update_fast_reg_key()    immediately changes mr->rkey to new value
    ib_map_mr_sg()              immediately updates buffer list in MR to new values
    wr = ...                    set WR to REG_MR work request not signaled
    wr.key = new rkey
    ib_post_send()              wr is posted for delayed execution

So as soon as a WR has been posted to invalidate the MR, the code adds it to the free list. Then, as soon as an MR is taken from the free list, the rkey and mappings are changed and a WR is posted to 'register' the MR, which marks it as valid again. The register WR *also* resets the rkey, which is redundant with the ib_update_fast_reg_key() call.

All the work except for setting the state valid is done immediately, regardless of the completion status of the previous invalidate, and can complete before the MR is marked FREE. Because the WR is not signaled, no one is checking the WC for these operations unless there is an error.

The old code worked because the key part of the rkey wasn't checked for the invalidate. Because the rkey is changed before the mappings, random stray old RDMA operations will fail because the rkey does not match, not because the MR is not VALID. There is a theoretical risk here: the MR could be accessed through the new rkey with the new mappings, the old mappings, or a mixture of both, while the MR is still VALID on the old mapping and the invalidate has not yet succeeded.

Many years ago when I first learned IB verbs, the fast registration was actually done as a WR which posted an IO operation to update the mappings. The new API changed all that but still has a little bit left in the WRs.

Bob

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-05 18:02 ` Bob Pearson @ 2021-09-07 12:01 ` Jason Gunthorpe 2021-09-07 16:35 ` Bob Pearson 0 siblings, 1 reply; 10+ messages in thread From: Jason Gunthorpe @ 2021-09-07 12:01 UTC (permalink / raw) To: Bob Pearson; +Cc: Bart Van Assche, linux-rdma On Sun, Sep 05, 2021 at 01:02:45PM -0500, Bob Pearson wrote: > On 9/4/21 5:30 PM, Jason Gunthorpe wrote: > > On Fri, Sep 03, 2021 at 04:13:22PM -0700, Bart Van Assche wrote: > >> On 9/3/21 3:18 PM, Bob Pearson wrote: > >>> On 9/2/21 6:38 PM, Jason Gunthorpe wrote: > >>>> On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: > >>>>> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to > >>>>> working for rxe but there is still one error. After adding MW > >>>>> support I added a test to local invalidate to check and see if the > >>>>> l/rkey matched the key actually contained in the MR/MW when local > >>>>> invalidate is called. This is failing for srp/002 with the key > >>>>> portion of the rkey off by one. Looking at ib_srp.c I see code that > >>>>> does in fact increment the rkey by one and also has code that posts > >>>>> a local invalidate. This was never checked before and is now failing > >>>>> to match. If I mask off the key portion in the test the whole test > >>>>> case passes so the other problems appear to have been fixed. If the > >>>>> increment and invalidate are out of sync this could result in the > >>>>> error. I suspect this may be a bug in srp. Worst case I can remove > >>>>> this test but I would rather not. > >>>> > >>>> I didn't check the spec, but since SRP works with HW devices I wonder > >>>> if invalidation is supposed to ignore the variant bits in the mkey? > >>> > >>> I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of > >>> MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. 
Before > >>> that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses > >>> to the old version of the MR from e.g. replicated packets. It posts these WRs to a send queue but I > >>> don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate > >>> and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen > >>> before the MR is turned on again. > >> > >> Hi Bob, > >> > >> If there would be any code in the SRP driver that is not compliant with the > >> IBTA specification then I can fix it. > >> > >> Regarding the invalidate work requests submitted by the ib_srp driver: these > >> are submitted before srp_fr_pool_put() is called. A new registration request > >> is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA > >> channel and there is one QP per RDMA channel. In other words, > >> (re)registration requests are submitted to the same QP as unregistration > >> requests after local invalidate requests. I think the IBTA requires does not > >> allow to reorder a local invalidate followed by a fast registration request. > > > > Right > > > > Jason > > > > srp_inv_rkey() > wr = ... builds local invalidate WR > wr.send_flags = 0 i.e. not signaled > ib_post_send() posts the WR for delayed execution > > srp_unmap_data() > srp_inv_rkey() schedules invalidate of each rkey in req > srp_fr_pool_put() puts each desc entry on free list > > srp_map_finish_fr() > ... misc checks not relevant > desc = srp_fr_pool_get() returns desc from free list > rkey = ib_inc_rkey() gets a new rkey one larger than the last one > ib_update_fast_reg_key() immediately changes mr->rkey to new value > ib_map_mr_sg() immediately updates buffer list in MR to new values > wr = ... 
set WR to REG_MR work request not signaled > wr.key = new rkey > ib_post_send() wr is posted for delayed execution > > So as soon as the MR has had a WR posted to invalidate it the code goes ahead and adds it to the > free list and then as soon as a new MR is gotten from the free list the rkey and mappings are > changed and then a WR is posted to 'register' the MR which marks it as valid again. The register > WR *also* resets the rkey which is redundant with the ib_update_fast_reg_key() call. > > All the work except for setting the state valid is done immediately regardless of the status of the > completion of the previous invalidate and can complete before the MR is marked FREE. Because the WR > is not signaled no one is checking the WC for these operations unless there is an error. "HW" is not supposed to look at mr->rkey. "HW" has a hidden cache of mr->rkey which is manipulated through WQEs, and is then synchronous with the WQE stream as Bart said. So it sounds like the problem is rxe is crossing the HW and SW layers and checking the mr->rkey from HW logic instead of holding a 2nd HW specific value for HW to use. Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
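The point about a device-private copy of the rkey can be sketched as a small userspace model. Everything here is an assumption made for illustration and none of it is rxe code: `struct model_mr` keeps a software-visible `sw_rkey` (changed immediately, like `mr->rkey`) and a separate `hw_rkey` that only changes when a WQE executes, `model_recycle()` replays the invalidate / bump / register sequence from the call trace, and the `use_sw_key` flag selects which copy the local invalidate is checked against. The rule that REG_MR only succeeds on a FREE MR is also a modeling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mr_state { MR_FREE, MR_VALID };
enum wqe_op { WQE_LOCAL_INV, WQE_REG_MR };

struct model_mr {
	uint32_t sw_rkey;    /* what the ULP sees; updated immediately */
	uint32_t hw_rkey;    /* device-private copy; updated only by WQEs */
	enum mr_state state;
};

struct model_wqe {
	enum wqe_op op;
	uint32_t key;
};

/* Execute one WQE; use_sw_key selects which key copy the invalidate checks. */
bool model_exec_wqe(struct model_mr *mr, const struct model_wqe *wqe,
		    bool use_sw_key)
{
	assert(mr && wqe);

	switch (wqe->op) {
	case WQE_LOCAL_INV: {
		uint32_t ref = use_sw_key ? mr->sw_rkey : mr->hw_rkey;

		if (mr->state != MR_VALID || wqe->key != ref)
			return false;
		mr->state = MR_FREE;
		return true;
	}
	case WQE_REG_MR:
		if (mr->state != MR_FREE)
			return false;
		mr->hw_rkey = wqe->key;  /* key becomes visible to "HW" here */
		mr->state = MR_VALID;
		return true;
	}
	return false;
}

/* Replay the recycle sequence; returns how many WQEs executed successfully. */
int model_recycle(struct model_mr *mr, bool use_sw_key)
{
	struct model_wqe sq[2];
	int ok = 0;

	assert(mr);
	/* Post the invalidate with the current key; nothing executes yet. */
	sq[0] = (struct model_wqe){ WQE_LOCAL_INV, mr->sw_rkey };
	/* The ULP bumps the key byte immediately, before any WQE runs. */
	mr->sw_rkey = (mr->sw_rkey & ~0xffu) | ((mr->sw_rkey + 1) & 0xffu);
	/* Post the registration carrying the new key. */
	sq[1] = (struct model_wqe){ WQE_REG_MR, mr->sw_rkey };

	/* The send queue is drained strictly in order. */
	for (int i = 0; i < 2; i++)
		if (model_exec_wqe(mr, &sq[i], use_sw_key))
			ok++;
	return ok;
}
```

In this model, checking the invalidate against `hw_rkey` lets both WQEs succeed because the device-side key only moves in WQE order, whereas checking against `sw_rkey` rejects the invalidate (the key byte was already bumped) and the registration then fails as well because the MR never became FREE.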
* Re: blktest/rxe almost working 2021-09-07 12:01 ` Jason Gunthorpe @ 2021-09-07 16:35 ` Bob Pearson 2021-09-07 16:39 ` Jason Gunthorpe 0 siblings, 1 reply; 10+ messages in thread From: Bob Pearson @ 2021-09-07 16:35 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Bart Van Assche, linux-rdma On 9/7/21 7:01 AM, Jason Gunthorpe wrote: > On Sun, Sep 05, 2021 at 01:02:45PM -0500, Bob Pearson wrote: >> On 9/4/21 5:30 PM, Jason Gunthorpe wrote: >>> On Fri, Sep 03, 2021 at 04:13:22PM -0700, Bart Van Assche wrote: >>>> On 9/3/21 3:18 PM, Bob Pearson wrote: >>>>> On 9/2/21 6:38 PM, Jason Gunthorpe wrote: >>>>>> On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: >>>>>>> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to >>>>>>> working for rxe but there is still one error. After adding MW >>>>>>> support I added a test to local invalidate to check and see if the >>>>>>> l/rkey matched the key actually contained in the MR/MW when local >>>>>>> invalidate is called. This is failing for srp/002 with the key >>>>>>> portion of the rkey off by one. Looking at ib_srp.c I see code that >>>>>>> does in fact increment the rkey by one and also has code that posts >>>>>>> a local invalidate. This was never checked before and is now failing >>>>>>> to match. If I mask off the key portion in the test the whole test >>>>>>> case passes so the other problems appear to have been fixed. If the >>>>>>> increment and invalidate are out of sync this could result in the >>>>>>> error. I suspect this may be a bug in srp. Worst case I can remove >>>>>>> this test but I would rather not. >>>>>> >>>>>> I didn't check the spec, but since SRP works with HW devices I wonder >>>>>> if invalidation is supposed to ignore the variant bits in the mkey? >>>>> >>>>> I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of >>>>> MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. 
Before >>>>> that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses >>>>> to the old version of the MR from e.g. replicated packets. It posts these WRs to a send queue but I >>>>> don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate >>>>> and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen >>>>> before the MR is turned on again. >>>> >>>> Hi Bob, >>>> >>>> If there would be any code in the SRP driver that is not compliant with the >>>> IBTA specification then I can fix it. >>>> >>>> Regarding the invalidate work requests submitted by the ib_srp driver: these >>>> are submitted before srp_fr_pool_put() is called. A new registration request >>>> is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA >>>> channel and there is one QP per RDMA channel. In other words, >>>> (re)registration requests are submitted to the same QP as unregistration >>>> requests after local invalidate requests. I think the IBTA requires does not >>>> allow to reorder a local invalidate followed by a fast registration request. >>> >>> Right >>> >>> Jason >>> >> >> srp_inv_rkey() >> wr = ... builds local invalidate WR >> wr.send_flags = 0 i.e. not signaled >> ib_post_send() posts the WR for delayed execution >> >> srp_unmap_data() >> srp_inv_rkey() schedules invalidate of each rkey in req >> srp_fr_pool_put() puts each desc entry on free list >> >> srp_map_finish_fr() >> ... misc checks not relevant >> desc = srp_fr_pool_get() returns desc from free list >> rkey = ib_inc_rkey() gets a new rkey one larger than the last one >> ib_update_fast_reg_key() immediately changes mr->rkey to new value >> ib_map_mr_sg() immediately updates buffer list in MR to new values >> wr = ... 
set WR to REG_MR work request not signaled
>>     wr.key = new rkey
>>     ib_post_send()              wr is posted for delayed execution
>>
>> So as soon as the MR has had a WR posted to invalidate it the code goes ahead and adds it to the
>> free list and then as soon as a new MR is gotten from the free list the rkey and mappings are
>> changed and then a WR is posted to 'register' the MR which marks it as valid again. The register
>> WR *also* resets the rkey which is redundant with the ib_update_fast_reg_key() call.
>>
>> All the work except for setting the state valid is done immediately regardless of the status of the
>> completion of the previous invalidate and can complete before the MR is marked FREE. Because the WR
>> is not signaled no one is checking the WC for these operations unless there is an error.
>
> "HW" is not supposed to look at mr->rkey.
>
> "HW" has a hidden cache of mr->rkey which is manipulated through
> WQEs, and is then synchronous with the WQE stream as Bart said.
>
> So it sounds like the problem is rxe is crossing the HW and SW layers
> and checking the mr->rkey from HW logic instead of holding a 2nd HW
> specific value for HW to use.
>
> Jason
>

Interesting. But if that is the case, the bigger problem is the ib_map_mr_sg() call which updates the mapping. rxe definitely does look at the mr->rkey value but we could fix that. It also looks at the mapping which is updated by ib_map_mr_sg(). My impression is that HW also uses this mapping, or does HW also copy all the FMRs into SRAM?

By not closing the loop on the invalidate (i.e. not looking at the CQE), the srp driver exposes the MR while its mappings are changing to the new values, through either the old or the new rkey depending on whether the rkey is cached.

There is a suggestive comment in ib_verbs.h

	/*
	 * Kernel users should universally support relaxed ordering (RO), as
	 * they are designed to read data only after observing the CQE and use
	 * the DMA API correctly.
	 *
	 * Some drivers implicitly enable RO if platform supports it.
	 */
	int (*map_mr_sg)(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
			 unsigned int *sg_offset);

There seems to be an assumption that users will be looking at CQE.

Bob

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-07 16:35 ` Bob Pearson @ 2021-09-07 16:39 ` Jason Gunthorpe 2021-09-07 16:47 ` Bob Pearson 0 siblings, 1 reply; 10+ messages in thread From: Jason Gunthorpe @ 2021-09-07 16:39 UTC (permalink / raw) To: Bob Pearson; +Cc: Bart Van Assche, linux-rdma On Tue, Sep 07, 2021 at 11:35:17AM -0500, Bob Pearson wrote: > Interesting. But if that is the case the bigger problem is the ib_map_mr_sg() call which updates the > mapping. rxe definitely does look at the mr->rkey value but we could fix that. It also looks at the > mapping which is updated by ib_map_mr_sg(). My impression is that HW also uses this mapping or does > HW also copy all the FMRs into SRAM? Yes, real HW has a copy of the DMA list. The sg in the mr struct is for CPU use only. It is not OK to use the CPU SG list inside the MR for DMA by HW, it has to be synchronized with the WR. > There seems to be an assumption that users will be looking at CQE. Yes, the kernel has to be driven by CQE, not only for data transfer but the DMA unmap of the SGL cannot be until after the invalidation CQE is observed. Ie the CPU should have two DMA lists active during the invalidation cycle. Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: blktest/rxe almost working 2021-09-07 16:39 ` Jason Gunthorpe @ 2021-09-07 16:47 ` Bob Pearson 0 siblings, 0 replies; 10+ messages in thread From: Bob Pearson @ 2021-09-07 16:47 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Bart Van Assche, linux-rdma On 9/7/21 11:39 AM, Jason Gunthorpe wrote: > On Tue, Sep 07, 2021 at 11:35:17AM -0500, Bob Pearson wrote: > >> Interesting. But if that is the case the bigger problem is the ib_map_mr_sg() call which updates the >> mapping. rxe definitely does look at the mr->rkey value but we could fix that. It also looks at the >> mapping which is updated by ib_map_mr_sg(). My impression is that HW also uses this mapping or does >> HW also copy all the FMRs into SRAM? > > Yes, real HW has a copy of the DMA list. The sg in the mr struct is > for CPU use only. > > It is not OK to use the CPU SG list inside the MR for DMA by HW, it > has to be synchronized with the WR. > >> There seems to be an assumption that users will be looking at CQE. > > Yes, the kernel has to be driven by CQE, not only for data transfer > but the DMA unmap of the SGL cannot be until after the invalidation > CQE is observed. > > Ie the CPU should have two DMA lists active during the invalidation > cycle. > > Jason > OK. Not 100% sure what that implies for SRP. SRP does *not* look at the CQE for invalidate and register WQEs. I can fix the rkey and DMA list semantics, making a copy of the list which is installed by the register WQE. ^ permalink raw reply [flat|nested] 10+ messages in thread
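The fix proposed here, a copy of the DMA list that is installed by the register WQE, could look roughly like the following userspace model. This is a sketch under stated assumptions and not the actual rxe change: `struct model_mr2`, `model_map_sg()`, `model_exec_reg_mr()` and `model_lookup()` are hypothetical names, with `pending` standing in for the CPU-side list written at ib_map_mr_sg() time and `active` for the copy the data path is allowed to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_PAGES 4

struct model_map {
	uint64_t addr[MODEL_MAX_PAGES];
	size_t npages;
};

struct model_mr2 {
	struct model_map pending; /* CPU side: written when the SG list is mapped */
	struct model_map active;  /* "HW" side: used by the data path */
};

/* CPU-side mapping step: record the new list, leave the active one alone. */
void model_map_sg(struct model_mr2 *mr, const uint64_t *addr, size_t n)
{
	assert(mr && addr && n <= MODEL_MAX_PAGES);
	memcpy(mr->pending.addr, addr, n * sizeof(*addr));
	mr->pending.npages = n;
}

/* Execution of the REG_MR WQE: install a snapshot of the pending list. */
void model_exec_reg_mr(struct model_mr2 *mr)
{
	assert(mr);
	mr->active = mr->pending;
}

/* Address the data path would use for page i, or 0 if i is out of range. */
uint64_t model_lookup(const struct model_mr2 *mr, size_t i)
{
	assert(mr);
	return i < mr->active.npages ? mr->active.addr[i] : 0;
}
```

With this split, remapping the MR on the CPU side does not change what the data path sees until the register WQE executes, so a stray access arriving between the two steps still resolves against the old list rather than a mixture of old and new entries.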