* Mellanox target workaround in SRP
@ 2011-01-07 22:35 David Dillow
       [not found] ` <1294439717.6219.54.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: David Dillow @ 2011-01-07 22:35 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
  Cc: mst-VPRAkNaXOzVS1MOuV/RT9w, ishai-VPRAkNaXOzVS1MOuV/RT9w, Roland Dreier

I have a question regarding the workaround introduced in commit 559ce8f1 of
the mainline tree:

    IB/srp: Work around data corruption bug on Mellanox targets
    
    Data corruption has been seen with Mellanox SRP targets when FMRs
    create a memory region with I/O virtual address != 0.  Add a
    workaround that disables FMR merging for Mellanox targets (OUI 0002c9).
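To make concrete what keying the workaround off the target's OUI means, here is a minimal user-space sketch. It assumes the target's IOC GUID is available as 8 bytes in network byte order, with the IEEE OUI in the first three bytes. guid_has_mellanox_oui() and allow_fmr_merging() are hypothetical names, not the ib_srp functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mellanox IEEE OUI, as quoted in the commit message (0002c9). */
static const uint8_t mellanox_oui[3] = { 0x00, 0x02, 0xc9 };

/*
 * Hypothetical helper: true if an EUI-64 style GUID, given in network
 * byte order, carries the Mellanox OUI in its first three bytes.
 */
static bool guid_has_mellanox_oui(const uint8_t guid[8])
{
	assert(guid != NULL);
	return memcmp(guid, mellanox_oui, sizeof(mellanox_oui)) == 0;
}

/*
 * Hypothetical policy mirroring the commit message: FMR merging stays
 * allowed unless the workaround is enabled and the target is Mellanox.
 * Note that only the *target's* GUID is consulted, never the local HCA's.
 */
static bool allow_fmr_merging(const uint8_t target_guid[8],
			      bool workaround_enabled)
{
	return !(workaround_enabled && guid_has_mellanox_oui(target_guid));
}
```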

I don't see how this can make a difference to the target -- it sees an
address and length, and there should be no visible difference to it when
it gets an FMR versus a direct-mapped region of the same space, right?
And how is it different than getting a direct or indirect descriptor
with a similar offset?
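From the target's point of view, each transfer is described only by a memory descriptor. The sketch below assumes the 16-byte direct data buffer descriptor of the SRP specification (64-bit virtual address, 32-bit memory handle, 32-bit length, all big-endian), which struct srp_direct_buf in include/scsi/srp.h mirrors. struct direct_desc and encode_direct_desc() are hypothetical. Nothing in this wire format says whether the key names an FMR or a direct-mapped region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order view of an SRP direct data buffer descriptor. */
struct direct_desc {
	uint64_t va;   /* virtual address the target will RDMA to/from */
	uint32_t key;  /* R_Key of the registered region (FMR or not) */
	uint32_t len;  /* transfer length in bytes */
};

/* Store the low n bytes of v big-endian starting at p. */
static void put_be(uint8_t *p, uint64_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Encode the descriptor into its 16-byte wire form. */
static void encode_direct_desc(const struct direct_desc *d, uint8_t out[16])
{
	assert(d != NULL && out != NULL);
	put_be(out, d->va, 8);
	put_be(out + 8, d->key, 4);
	put_be(out + 12, d->len, 4);
}
```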

I could see there being a bug on the initiator HCA not liking such FMR
mappings, but then it should be keyed off of the vendor of our HCA and
not the target.

I'm sure this was tested and shown to fix the problem; I'm just confused
as to what the problem really was and if this is still relevant. Can
someone please enlighten me?
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Mellanox target workaround in SRP
       [not found] ` <1294439717.6219.54.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-01-08  4:05   ` Roland Dreier
       [not found]     ` <adaipy09h0i.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  2011-01-10 18:51   ` Roland Dreier
  1 sibling, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2011-01-08  4:05 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, mst-VPRAkNaXOzVS1MOuV/RT9w,
	ishai-VPRAkNaXOzVS1MOuV/RT9w

 > I'm sure this was tested and shown to fix the problem; I'm just confused
 > as to what the problem really was and if this is still relevant. Can
 > someone please enlighten me?

At this point I'm afraid it's all lost in the mists of time, but the
original patch seems to have come from

http://lists.openfabrics.org/pipermail/general/2006-July/024322.html

looking at the patch, I would guess that the corruption occurred when
the target got an IO request that started at a non-page-aligned address
but that spanned more than one page.
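The case guessed at above can be stated as a simple predicate. This is a minimal sketch, assuming a power-of-two page size; unaligned_multi_page() is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: an I/O that starts at a non-page-aligned
 * address and crosses at least one page boundary.
 * page_size must be a power of two.
 */
static bool unaligned_multi_page(uint64_t addr, uint64_t len,
				 uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	uint64_t offset = addr & (page_size - 1);	/* offset into first page */

	return offset != 0 && offset + len > page_size;
}
```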

I don't know if the target was ever fixed, or whether that target code
has any relevance today.

 - R.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Mellanox target workaround in SRP
       [not found]     ` <adaipy09h0i.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2011-01-08 18:13       ` David Dillow
       [not found]         ` <1294510396.7914.82.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: David Dillow @ 2011-01-08 18:13 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5,
	ishai-VPRAkNaXOzVS1MOuV/RT9w

On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>  > I'm sure this was tested and shown to fix the problem; I'm just confused
>  > as to what the problem really was and if this is still relevant. Can
>  > someone please enlighten me?
> 
> At this point I'm afraid it's all lost in the mists of time,

Yep, that's my fear. And since it is a corruption bug, I've got to tread
lightly in this area. :/

> looking at the patch, I would guess that the corruption occurred when
> the target got an IO request that started at a non-page-aligned address
> but that spanned more than one page.

That's my thought as well, but then I'm not sure this really solved
their problem. It may be more likely to occur in the FMR case, but the
initiator enables clustering, so blk_rq_map_sg() could generate the same
kinds of requests for both direct and indirect descriptors, even without
FMR. This looks to have been true since the initiator was added to the
kernel, though it is possible I'm misreading the code.
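A simplified illustration of why clustering matters here: merging physically contiguous pieces into one scatter/gather entry can produce an entry that starts mid-page and crosses a page boundary, with or without FMRs. This is a hypothetical sketch (struct seg, cluster_segs()), not blk_rq_map_sg() itself, which also honours queue limits such as the maximum segment size and the segment boundary mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg {
	uint64_t addr;	/* bus/physical address */
	uint64_t len;	/* length in bytes */
};

/*
 * Hypothetical, simplified clustering pass: merge each segment into the
 * previous one when it starts exactly where the previous one ends.
 * Works in place and returns the new segment count.
 */
static size_t cluster_segs(struct seg *s, size_t n)
{
	size_t out = 0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (out > 0 && s[out - 1].addr + s[out - 1].len == s[i].addr)
			s[out - 1].len += s[i].len;	/* contiguous: extend */
		else
			s[out++] = s[i];		/* gap: start a new entry */
	}
	return out;
}
```

In the test case, a 0xe00-byte piece at offset 0x200 of one page followed by the whole next page collapses into a single 0x1e00-byte entry that begins mid-page.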

> I don't know if the target was ever fixed, or whether that target code
> has any relevance today.

Here's hoping someone from Mellanox can shed some light.

Thanks!
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Mellanox target workaround in SRP
       [not found]         ` <1294510396.7914.82.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-01-10 18:21           ` Vu Pham
       [not found]             ` <4D2B4E13.6070903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vu Pham @ 2011-01-10 18:21 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz



David Dillow wrote:
> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>>  > I'm sure this was tested and shown to fix the problem; I'm just confused
>>  > as to what the problem really was and if this is still relevant. Can
>>  > someone please enlighten me?
>>
>> At this point I'm afraid it's all lost in the mists of time,
> 
> Yep, that's my fear. And since it is a corruption bug, I've got to tread
> lightly in this area. :/
>

I don't recall discussing or reviewing this patch with Michael Tsirkin when he submitted it.


>> looking at the patch, I would guess that the corruption occurred when
>> the target got an IO request that started at a non-page-aligned address
>> but that spanned more than one page.
> 
> That's my thought as well, but then I'm not sure this really solved
> their problem. It may be more likely to occur in the FMR case, but the
> initiator enables clustering, so blk_rq_map_sg() could generate the same
> kinds of requests for both direct and indirect descriptors, even without
> FMR. This looks to have been true since the initiator was added to the
> kernel, though it is possible I'm misreading the code.
> 
>> I don't know if the target was ever fixed, or whether that target code
>> has any relevance today.
> 
> Here's hoping someone from Mellanox can shed some light.


I think that the patch is specific to the SRP initiator using Mellanox FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having first_byte_offset != 0, since the low-level implementation of mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR with first_byte_offset != 0. The corruption can happen with any target.

-vu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]             ` <4D2B4E13.6070903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-01-10 18:49               ` Roland Dreier
       [not found]                 ` <adapqs48uhm.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  2011-01-10 19:02               ` David Dillow
  1 sibling, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2011-01-10 18:49 UTC (permalink / raw)
  To: Vu Pham
  Cc: David Dillow, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz

 > I think that the patch is specific to the SRP initiator using Mellanox
 > FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having
 > first_byte_offset != 0, since the low-level implementation of
 > mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR
 > with first_byte_offset != 0. The corruption can happen with any target.

I don't think this could be right -- right now the workaround only
triggers if the target has a Mellanox OUI, so if what you say is true,
presumably everyone who is using the SRP initiator with mlx4 would be
seeing this problem.

Also, the SRP initiator code that uses ib_fmr_pool_map_phys does not
pass in any non-aligned addresses -- it doesn't try to use any first
byte offset, it just uses the virtual address it passes to the target to
handle the offset.

 - R.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Mellanox target workaround in SRP
       [not found] ` <1294439717.6219.54.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  2011-01-08  4:05   ` Roland Dreier
@ 2011-01-10 18:51   ` Roland Dreier
       [not found]     ` <adaipxw8ue6.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2011-01-10 18:51 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, mst-H+wXaHxf7aLQT0dZR+AlfA,
	ishai-VPRAkNaXOzVS1MOuV/RT9w

Maybe we can use MST's current email to ask him... Michael, do you have
any memory of the issue we worked around here?

 > I have a question regarding the workaround introduced in commit 559ce8f1 of
 > the mainline tree:
 > 
 >     IB/srp: Work around data corruption bug on Mellanox targets
 >     
 >     Data corruption has been seen with Mellanox SRP targets when FMRs
 >     create a memory region with I/O virtual address != 0.  Add a
 >     workaround that disables FMR merging for Mellanox targets (OUI 0002c9).
 > 
 > I don't see how this can make a difference to the target -- it sees an
 > address and length, and there should be no visible difference to it when
 > it gets an FMR versus a direct-mapped region of the same space, right?
 > And how is it different than getting a direct or indirect descriptor
 > with a similar offset?
 > 
 > I could see there being a bug on the initiator HCA not liking such FMR
 > mappings, but then it should be keyed off of the vendor of our HCA and
 > not the target.
 > 
 > I'm sure this was tested and shown to fix the problem; I'm just confused
 > as to what the problem really was and if this is still relevant. Can
 > someone please enlighten me?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]             ` <4D2B4E13.6070903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2011-01-10 18:49               ` [ewg] " Roland Dreier
@ 2011-01-10 19:02               ` David Dillow
       [not found]                 ` <1294686163.3038.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: David Dillow @ 2011-01-10 19:02 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz

On Mon, 2011-01-10 at 10:21 -0800, Vu Pham wrote:
> David Dillow wrote:
> > On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
> >> looking at the patch, I would guess that the corruption occurred when
> >> the target got an IO request that started at a non-page-aligned address
> >> but that spanned more than one page.
[snip]
> > Here's hoping someone from Mellanox can shed some light.
> 
> 
> I think that the patch is specific to the SRP initiator using Mellanox
> FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having
> first_byte_offset != 0, since the low-level implementation of
> mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR
> with first_byte_offset != 0. The corruption can happen with any target.

Thanks for taking a look Vu -- but I'm not sure that is the problem,
either. The SRP FMR mapping code is careful to mask the SG address with
the FMR page mask, so we should never ask the HCA to map a page with the
first_byte_offset != 0. Instead, we tell the target to request an IO
virtual address appropriately offset into the first page of the FMR.
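The masking scheme described above, as a minimal sketch assuming a power-of-two FMR page size and a page-aligned FMR I/O virtual address (fmr_iova). The page-aligned part of the SG address is what gets mapped, so the first byte offset stays 0, and the sub-page offset travels in the virtual address advertised to the target. struct fmr_plan and plan_fmr() are hypothetical names, not the ib_srp code.

```c
#include <assert.h>
#include <stdint.h>

struct fmr_plan {
	uint64_t page_addr;	/* page-aligned address handed to the FMR map */
	uint64_t target_va;	/* virtual address advertised to the target */
};

static struct fmr_plan plan_fmr(uint64_t sg_addr, uint64_t fmr_iova,
				uint64_t fmr_page_size)
{
	assert(fmr_page_size != 0 &&
	       (fmr_page_size & (fmr_page_size - 1)) == 0);
	uint64_t fmr_page_mask = ~(fmr_page_size - 1);
	struct fmr_plan p;

	assert((fmr_iova & ~fmr_page_mask) == 0);	/* assumed page-aligned */
	p.page_addr = sg_addr & fmr_page_mask;		/* first_byte_offset stays 0 */
	p.target_va = fmr_iova + (sg_addr & ~fmr_page_mask); /* offset into first page */
	return p;
}
```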

Or perhaps I misunderstood you, and it's the non-zero first byte offset
in the RDMA command on the wire that is the issue, and not the FMR setup
in the initiator? And it only affects FMR-mapped memory, not the
kernel's MR?

Thanks again,
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                 ` <adapqs48uhm.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2011-01-10 19:05                   ` David Dillow
  2011-01-10 19:49                   ` Vu Pham
  1 sibling, 0 replies; 14+ messages in thread
From: David Dillow @ 2011-01-10 19:05 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vu Pham, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz

On Mon, 2011-01-10 at 10:49 -0800, Roland Dreier wrote:
>  > I think that the patch is specific to the SRP initiator using Mellanox
>  > FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having
>  > first_byte_offset != 0, since the low-level implementation of
>  > mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR
>  > with first_byte_offset != 0. The corruption can happen with any target.
> 
> I don't think this could be right -- right now the workaround only
> triggers if the target has a Mellanox OUI, so if what you say is true,
> presumably everyone who is using the SRP initiator with mlx4 would be
> seeing this problem.

It seems only if they are using logical blocks smaller than 4 KB, or
perhaps certain sg3_utils commands that send SCSI commands directly. We
should be getting page-aligned I/O for 4 KB filesystems, so we probably
wouldn't hit this.
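The alignment argument can be sketched as follows, under a simplifying assumption: buffers are aligned only to the logical block size (the block layer's real DMA alignment rules are more involved). unaligned_starts_per_page() is hypothetical; it counts the block-aligned start offsets within one page that are not themselves page-aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of logical-block-aligned offsets inside one page that are not
 * page-aligned. Both sizes are assumed to be powers of two.
 */
static unsigned unaligned_starts_per_page(uint32_t block_size,
					  uint32_t page_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (block_size >= page_size)
		return 0;	/* every block-aligned start is page-aligned */
	return page_size / block_size - 1;	/* every offset except 0 */
}
```

With 512-byte blocks and 4 KB pages, 7 of the 8 possible start offsets fall mid-page; with 4 KB blocks, none do.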

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                 ` <adapqs48uhm.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  2011-01-10 19:05                   ` David Dillow
@ 2011-01-10 19:49                   ` Vu Pham
  1 sibling, 0 replies; 14+ messages in thread
From: Vu Pham @ 2011-01-10 19:49 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz



Roland Dreier wrote:
>  > I think that the patch is specific to the SRP initiator using Mellanox
>  > FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having
>  > first_byte_offset != 0, since the low-level implementation of
>  > mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR
>  > with first_byte_offset != 0. The corruption can happen with any target.
> 
> I don't think this could be right -- right now the workaround only
> triggers if the target has a Mellanox OUI, so if what you say is true,
> presumably everyone who is using the SRP initiator with mlx4 would be
> seeing this problem.

Yes, I'm afraid targets without a Mellanox OUI would be seeing this problem.

> 
> Also, the SRP initiator code that uses ib_fmr_pool_map_phys does not
> pass in any non-aligned addresses -- it doesn't try to use any first
> byte offset, it just uses the virtual address it passes to the target to
> handle the offset.
> 


Yes, and I suspect that the corruption happens with a Mellanox FMR/MPT set up without fbo (first byte offset) and the target doing RDMA to an offset vaddr.

Let me ask around and confirm.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                 ` <1294686163.3038.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-01-10 19:58                   ` Vu Pham
       [not found]                     ` <4D2B64CA.6040609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vu Pham @ 2011-01-10 19:58 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz



David Dillow wrote:
> On Mon, 2011-01-10 at 10:21 -0800, Vu Pham wrote:
>> David Dillow wrote:
>>> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>>>> looking at the patch, I would guess that the corruption occurred when
>>>> the target got an IO request that started at a non-page-aligned address
>>>> but that spanned more than one page.
> [snip]
>>> Here's hoping someone from Mellanox can shed some light.
>>
>> I think that the patch is specific to the SRP initiator using Mellanox
>> FMRs. It tried to avoid indirect descriptors with Mellanox FMRs having
>> first_byte_offset != 0, since the low-level implementation of
>> mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR
>> with first_byte_offset != 0. The corruption can happen with any target.
> 
> Thanks for taking a look Vu --

Thanks for taking ownership of SRP :)

> but I'm not sure that is the problem,
> either. The SRP FMR mapping code is careful to mask the SG address with
> the FMR page mask, so we should never ask the HCA to map a page with the
> first_byte_offset != 0. Instead, we tell the target to request an IO
> virtual address appropriately offset into the first page of the FMR.
> 
> Or perhaps I misunderstood you, and it's the non-zero first byte offset
> in the RDMA command on the wire that is the issue, and not the FMR setup
> in the initiator? And it only affects FMR-mapped memory, not the
> kernel's MR?
> 

It's not the kernel's MR.

I suspect that the corruption happens *only* with a Mellanox FMR + MPT set up without fbo and the target doing RDMA to an offset vaddr.

I need to ask the internal HW/FW guys to confirm whether it's true.

-vu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Mellanox target workaround in SRP
       [not found]     ` <adaipxw8ue6.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2011-01-12 13:40       ` Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-01-12 13:40 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5,
	ishai-VPRAkNaXOzVS1MOuV/RT9w

On Mon, Jan 10, 2011 at 10:51:13AM -0800, Roland Dreier wrote:
> Maybe we can use MST's current email to ask him... Michael, do you have
> any memory of the issue we worked around here?
> 
>  > I have a question regarding the workaround introduced in commit 559ce8f1 of
>  > the mainline tree:
>  > 
>  >     IB/srp: Work around data corruption bug on Mellanox targets
>  >     
>  >     Data corruption has been seen with Mellanox SRP targets when FMRs
>  >     create a memory region with I/O virtual address != 0.  Add a
>  >     workaround that disables FMR merging for Mellanox targets (OUI 0002c9).
>  > 
>  > I don't see how this can make a difference to the target -- it sees an
>  > address and length, and there should be no visible difference to it when
>  > it gets an FMR versus a direct-mapped region of the same space, right?
>  > And how is it different than getting a direct or indirect descriptor
>  > with a similar offset?
>  > 
>  > I could see there being a bug on the initiator HCA not liking such FMR
>  > mappings, but then it should be keyed off of the vendor of our HCA and
>  > not the target.
>  > 
>  > I'm sure this was tested and shown to fix the problem; I'm just confused
>  > as to what the problem really was and if this is still relevant. Can
>  > someone please enlighten me?


I don't recall unfortunately. Sorry.

-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                     ` <4D2B64CA.6040609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-01-17  4:50                       ` David Dillow
       [not found]                         ` <1295239821.3051.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: David Dillow @ 2011-01-17  4:50 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz

On Mon, 2011-01-10 at 11:58 -0800, Vu Pham wrote:
> David Dillow wrote:
> > either. The SRP FMR mapping code is careful to mask the SG address with
> > the FMR page mask, so we should never ask the HCA to map a page with the
> > first_byte_offset != 0. Instead, we tell the target to request an IO
> > virtual address appropriately offset into the first page of the FMR.
> > 
> > Or perhaps I misunderstood you, and it's the non-zero first byte offset
> > in the RDMA command on the wire that is the issue, and not the FMR setup
> > in the initiator? And it only affects FMR-mapped memory, not the
> > kernel's MR?
> 
> It's not the kernel's MR.
> 
> I suspect that the corruption happens *only* with a Mellanox FMR + MPT
> set up without fbo and the target doing RDMA to an offset vaddr.
> 
> I need to ask the internal HW/FW guys to confirm whether it's true.

Have you had any response from the HW/FW guys?

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                         ` <1295239821.3051.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-01-18 19:53                           ` Vu Pham
       [not found]                             ` <4D35EF9C.3050609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vu Pham @ 2011-01-18 19:53 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz



David Dillow wrote:
> On Mon, 2011-01-10 at 11:58 -0800, Vu Pham wrote:
>> David Dillow wrote:
>> > either. The SRP FMR mapping code is careful to mask the SG address with
>> > the FMR page mask, so we should never ask the HCA to map a page with the
>> > first_byte_offset != 0. Instead, we tell the target to request an IO
>> > virtual address appropriately offset into the first page of the FMR.
>> >
>> > Or perhaps I misunderstood you, and it's the non-zero first byte offset
>> > in the RDMA command on the wire that is the issue, and not the FMR setup
>> > in the initiator? And it only affects FMR-mapped memory, not the
>> > kernel's MR?
>>
>> It's not the kernel's MR.
>>
>> I suspect that the corruption happens *only* with a Mellanox FMR + MPT
>> set up without fbo and the target doing RDMA to an offset vaddr.
>>
>> I need to ask the internal HW/FW guys to confirm whether it's true.
> 
> Have you had any response from the HW/FW guys?
> 

Sorry for the late response.

Our HW/FW guys confirm that there is no problem; my suspicion was wrong.

To explain more clearly, this is how the HW translates a remote RDMA address to a physical address through the FMR's MTT:

X = rdma_va - MPT.start + MPT.fbo    (rdma_va: the requested RDMA virtual address)
MTT index = X / MPT.blocksize
MTT offset = X % MPT.blocksize
PA = MTT[index] + MTT offset

MPT - memory protection table
MTT - memory translation table
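The translation above, written out as a minimal C model. struct mpt and fmr_translate() are hypothetical host-side stand-ins for the hardware tables, assuming MPT.blocksize is non-zero and the MTT holds one physical address per block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the MPT fields named above. */
struct mpt {
	uint64_t start;		/* MPT.start: I/O virtual address of the region */
	uint64_t fbo;		/* MPT.fbo: first byte offset */
	uint64_t blocksize;	/* MPT.blocksize: bytes covered by each MTT entry */
	const uint64_t *mtt;	/* MTT: physical address of each block */
	uint64_t nblocks;	/* number of MTT entries */
};

/* Translate a remote RDMA virtual address to a physical address. */
static uint64_t fmr_translate(const struct mpt *m, uint64_t rdma_va)
{
	assert(m->blocksize != 0 && rdma_va >= m->start);
	uint64_t x = rdma_va - m->start + m->fbo;
	uint64_t index = x / m->blocksize;
	uint64_t offset = x % m->blocksize;

	assert(index < m->nblocks);	/* outside the region: not modelled */
	return m->mtt[index] + offset;
}
```

With fbo = 0, an rdma_va 0x200 bytes into the first block resolves to the same physical byte as an MPT with fbo = 0x200 and rdma_va equal to MPT.start, which is consistent with the conclusion that an offset vaddr on a zero-fbo FMR is translated correctly.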

-vu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ewg] Mellanox target workaround in SRP
       [not found]                             ` <4D35EF9C.3050609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-01-18 20:05                               ` David Dillow
  0 siblings, 0 replies; 14+ messages in thread
From: David Dillow @ 2011-01-18 20:05 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Ishai Rabinovitz

On Tue, 2011-01-18 at 11:53 -0800, Vu Pham wrote:
> Our HW/FW guys confirm that there is no problem; my suspicion was wrong.
> 
> To explain more clearly, this is how the HW translates a remote RDMA address to a physical address through the FMR's MTT:
> 
> X = rdma_va - MPT.start + MPT.fbo    (rdma_va: the requested RDMA virtual address)
> MTT index = X / MPT.blocksize
> MTT offset = X % MPT.blocksize
> PA = MTT[index] + MTT offset
> 
> MPT - memory protection table
> MTT - memory translation table

Thanks for following up... unfortunately, we're still in the dark as to
the bug this was intended to solve. :/

And thanks for the information about how the mapping works, that's good
to know.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-01-18 20:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-07 22:35 Mellanox target workaround in SRP David Dillow
     [not found] ` <1294439717.6219.54.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-01-08  4:05   ` Roland Dreier
     [not found]     ` <adaipy09h0i.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2011-01-08 18:13       ` David Dillow
     [not found]         ` <1294510396.7914.82.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-01-10 18:21           ` Vu Pham
     [not found]             ` <4D2B4E13.6070903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2011-01-10 18:49               ` [ewg] " Roland Dreier
     [not found]                 ` <adapqs48uhm.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2011-01-10 19:05                   ` David Dillow
2011-01-10 19:49                   ` Vu Pham
2011-01-10 19:02               ` David Dillow
     [not found]                 ` <1294686163.3038.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-01-10 19:58                   ` Vu Pham
     [not found]                     ` <4D2B64CA.6040609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2011-01-17  4:50                       ` David Dillow
     [not found]                         ` <1295239821.3051.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-01-18 19:53                           ` Vu Pham
     [not found]                             ` <4D35EF9C.3050609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2011-01-18 20:05                               ` David Dillow
2011-01-10 18:51   ` Roland Dreier
     [not found]     ` <adaipxw8ue6.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2011-01-12 13:40       ` Michael S. Tsirkin
