linux-kernel.vger.kernel.org archive mirror
* [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Johannes Thumshirn @ 2017-04-06 12:49 UTC
  To: Moni Shoua, Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Linux Kernel Mailinglist, linux-rdma, Johannes Thumshirn,
	Hannes Reinecke, Sagi Grimberg, Max Gurtovoy

When reading an RDMA WRITE FIRST packet we copy the DMA length from the RDMA
header into the qp->resp.resid variable for later use. Later in check_rkey()
we clamp it to the MTU if the packet is an RDMA WRITE packet and has a
residual length bigger than the MTU. Later in write_data_in() we subtract the
payload of the packet from the residual length. If the packet happens to have a
payload of exactly the MTU size we end up with a residual length of 0 despite
the packet not being the last in the conversation. When the next packet in the
conversation arrives, we don't have any residual length left and thus set the QP
into an error state.

This broke NVMe over Fabrics functionality over rdma_rxe.ko.
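
To make the accounting above concrete, here is a minimal user-space sketch of
the residual-length bookkeeping. It is a simplified model and not the driver
code: struct resp_model, recv_write_pkt() and simulate_write() are hypothetical
names, the pad check done by the real check_rkey() is left out, and the mtu is
assumed to be non-zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the responder state; not the kernel's struct. */
struct resp_model {
	unsigned int resid;	/* bytes still expected for the current RDMA WRITE */
};

/*
 * Simplified model of the length check in check_rkey() followed by the
 * subtraction in write_data_in(). Returns 0 if the packet is accepted and
 * -1 if the length check fails, which is where the responder would move
 * the QP into an error state.
 */
static int recv_write_pkt(struct resp_model *r, unsigned int pktlen,
			  unsigned int mtu, bool clamp)
{
	if (r->resid > mtu) {
		if (pktlen != mtu)
			return -1;
		if (clamp)
			r->resid = mtu;	/* the assignment this patch removes */
	} else {
		if (pktlen != r->resid)
			return -1;
	}

	r->resid -= pktlen;
	return 0;
}

/*
 * Feed one RDMA WRITE of dma_len bytes, split into MTU-sized packets, and
 * return how many packets were accepted. Assumes mtu > 0.
 */
static int simulate_write(unsigned int dma_len, unsigned int mtu, bool clamp)
{
	struct resp_model r = { .resid = dma_len };
	unsigned int left = dma_len;
	int accepted = 0;

	while (left > 0) {
		unsigned int pktlen = left > mtu ? mtu : left;

		if (recv_write_pkt(&r, pktlen, mtu, clamp))
			break;
		left -= pktlen;
		accepted++;
	}

	return accepted;
}
```

In this model a 2560 byte write with a 1024 byte MTU needs three packets.
simulate_write(2560, 1024, true) stops after the first packet because the
clamped residual length drops to 0, while simulate_write(2560, 1024, false)
accepts all three.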

The patch was verified using the following test.

 # echo eth0 > /sys/module/rdma_rxe/parameters/add
 # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
 # mkfs.xfs -fK /dev/nvme0n1
 meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=0        finobt=0, sparse=0
 data     =                       bsize=4096   blocks=262144, imaxpct=25
          =                       sunit=0      swidth=0 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 # mount /dev/nvme0n1 /tmp/
 [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
 [  148.961196] XFS (nvme0n1): Ending clean mount
 # dd if=/dev/urandom of=test.bin bs=1M count=128
 128+0 records in
 128+0 records out
 134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
 # sha256sum test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
 # cp test.bin /tmp/
 # sha256sum /tmp/test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index c9dd385..58764df 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -478,8 +478,6 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 				state = RESPST_ERR_LENGTH;
 				goto err;
 			}
-
-			qp->resp.resid = mtu;
 		} else {
 			if (pktlen != resid) {
 				state = RESPST_ERR_LENGTH;
-- 
2.10.2

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Johannes Thumshirn @ 2017-04-13 12:00 UTC
  To: Moni Shoua, Sagi Grimberg, Max Gurtovoy
  Cc: Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke

On Thu, Apr 06, 2017 at 02:49:44PM +0200, Johannes Thumshirn wrote:
> When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> header into the qp->resp.resid variable for later use. Later in check_rkey()
> we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> residual length bigger than the MTU. Later in write_data_in() we subtract the
> payload of the packet from the residual length. If the packet happens to have a
> payload of exactly the MTU size we end up with a residual length of 0 despite
> the packet not being the last in the conversation. When the next packet in the
> conversation arrives, we don't have any residual length left and thus set the QP

Hi Moni, Sagi and Max,

Any comments on this?

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Leon Romanovsky @ 2017-04-13 12:22 UTC
  To: Johannes Thumshirn
  Cc: Moni Shoua, Sagi Grimberg, Max Gurtovoy,
	Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke

On Thu, Apr 13, 2017 at 02:00:00PM +0200, Johannes Thumshirn wrote:
> On Thu, Apr 06, 2017 at 02:49:44PM +0200, Johannes Thumshirn wrote:
> > When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> > header into the qp->resp.resid variable for later use. Later in check_rkey()
> > we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> > residual length bigger than the MTU. Later in write_data_in() we subtract the
> > payload of the packet from the residual length. If the packet happens to have a
> > payload of exactly the MTU size we end up with a residual length of 0 despite
> > the packet not being the last in the conversation. When the next packet in the
> > conversation arrives, we don't have any residual length left and thus set the QP
>
> Hi Moni, Sagi and Max,
>
> Any comments on this?

Hi,

I'm sorry, it looks like this patch fell on the floor. We have
Passover vacation these days and many people are OOO now.

But anyway, I'll remind Moni.

Thanks

>
> Thanks,
> 	Johannes
>
> --
> Johannes Thumshirn                                          Storage
> jthumshirn@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Johannes Thumshirn @ 2017-04-13 12:29 UTC
  To: Leon Romanovsky
  Cc: Moni Shoua, Sagi Grimberg, Max Gurtovoy,
	Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke

On Thu, Apr 13, 2017 at 03:22:00PM +0300, Leon Romanovsky wrote:
> On Thu, Apr 13, 2017 at 02:00:00PM +0200, Johannes Thumshirn wrote:
> > On Thu, Apr 06, 2017 at 02:49:44PM +0200, Johannes Thumshirn wrote:
> > > When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> > > header into the qp->resp.resid variable for later use. Later in check_rkey()
> > > we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> > > residual length bigger than the MTU. Later in write_data_in() we subtract the
> > > payload of the packet from the residual length. If the packet happens to have a
> > > payload of exactly the MTU size we end up with a residual length of 0 despite
> > > the packet not being the last in the conversation. When the next packet in the
> > > conversation arrives, we don't have any residual length left and thus set the QP
> >
> > Hi Moni, Sagi and Max,
> >
> > Any comments on this?
> 
> Hi,
> 
> I'm sorry, it looks like this patch fell on the floor. We have
> Passover vacation these days and many people are OOO now.
>
> But anyway, I'll remind Moni.

Ok, thanks and happy Passover then.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Moni Shoua @ 2017-04-13 14:12 UTC
  To: Johannes Thumshirn
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock,
	Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke,
	Sagi Grimberg, Max Gurtovoy

On Thu, Apr 6, 2017 at 3:49 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> header into the qp->resp.resid variable for later use. Later in check_rkey()
> we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> residual length bigger than the MTU. Later in write_data_in() we subtract the
> payload of the packet from the residual length. If the packet happens to have a
> payload of exactly the MTU size we end up with a residual length of 0 despite
> the packet not being the last in the conversation. When the next packet in the
> conversation arrives, we don't have any residual length left and thus set the QP
> into an error state.
>
> This broke NVMe over Fabrics functionality over rdma_rxe.ko
>
> The patch was verified using the following test.
>
>  # echo eth0 > /sys/module/rdma_rxe/parameters/add
>  # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
>  # mkfs.xfs -fK /dev/nvme0n1
>  meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
>           =                       sectsz=4096  attr=2, projid32bit=1
>           =                       crc=0        finobt=0, sparse=0
>  data     =                       bsize=4096   blocks=262144, imaxpct=25
>           =                       sunit=0      swidth=0 blks
>  naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
>  log      =internal log           bsize=4096   blocks=2560, version=2
>           =                       sectsz=4096  sunit=1 blks, lazy-count=1
>  realtime =none                   extsz=4096   blocks=0, rtextents=0
>  # mount /dev/nvme0n1 /tmp/
>  [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
>  [  148.961196] XFS (nvme0n1): Ending clean mount
>  # dd if=/dev/urandom of=test.bin bs=1M count=128
>  128+0 records in
>  128+0 records out
>  134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
>  # sha256sum test.bin
>  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
>  # cp test.bin /tmp/
>  sha256sum /tmp/test.bin
>  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin
>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Max Gurtovoy <maxg@mellanox.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index c9dd385..58764df 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -478,8 +478,6 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>                                 state = RESPST_ERR_LENGTH;
>                                 goto err;
>                         }
> -
> -                       qp->resp.resid = mtu;
>                 } else {
>                         if (pktlen != resid) {
>                                 state = RESPST_ERR_LENGTH;
> --
> 2.10.2
>
> --
Thanks Johannes

Acked-by: Moni Shoua <monis@mellanox.com>

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Johannes Thumshirn @ 2017-04-25  7:29 UTC
  To: Doug Ledford
  Cc: Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke,
	Sagi Grimberg, Max Gurtovoy, Moni Shoua, Sean Hefty,
	Hal Rosenstock

On Thu, Apr 06, 2017 at 02:49:44PM +0200, Johannes Thumshirn wrote:
> When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> header into the qp->resp.resid variable for later use. Later in check_rkey()
> we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> residual length bigger than the MTU. Later in write_data_in() we subtract the
> payload of the packet from the residual length. If the packet happens to have a
> payload of exactly the MTU size we end up with a residual length of 0 despite
> the packet not being the last in the conversation. When the next packet in the
> conversation arrives, we don't have any residual length left and thus set the QP
> into an error state.
> 
> This broke NVMe over Fabrics functionality over rdma_rxe.ko
> 
> The patch was verified using the following test.
> 
>  # echo eth0 > /sys/module/rdma_rxe/parameters/add
>  # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
>  # mkfs.xfs -fK /dev/nvme0n1
>  meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
>           =                       sectsz=4096  attr=2, projid32bit=1
>           =                       crc=0        finobt=0, sparse=0
>  data     =                       bsize=4096   blocks=262144, imaxpct=25
>           =                       sunit=0      swidth=0 blks
>  naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
>  log      =internal log           bsize=4096   blocks=2560, version=2
>           =                       sectsz=4096  sunit=1 blks, lazy-count=1
>  realtime =none                   extsz=4096   blocks=0, rtextents=0
>  # mount /dev/nvme0n1 /tmp/
>  [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
>  [  148.961196] XFS (nvme0n1): Ending clean mount
>  # dd if=/dev/urandom of=test.bin bs=1M count=128
>  128+0 records in
>  128+0 records out
>  134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
>  # sha256sum test.bin
>  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
>  # cp test.bin /tmp/
>  sha256sum /tmp/test.bin
>  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Max Gurtovoy <maxg@mellanox.com>
> ---

Doug, is there anything left to do here? I already have an Ack from Moni. This
patch is needed to get NVMe over Fabrics working on rxe, so I'd like to see it
in v4.12.

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH] IB/rxe: Don't clamp residual length to mtu
From: Doug Ledford @ 2017-05-01 18:44 UTC
  To: Johannes Thumshirn
  Cc: Linux Kernel Mailinglist, linux-rdma, Hannes Reinecke,
	Sagi Grimberg, Max Gurtovoy, Moni Shoua, Sean Hefty,
	Hal Rosenstock

On Tue, 2017-04-25 at 09:29 +0200, Johannes Thumshirn wrote:
> On Thu, Apr 06, 2017 at 02:49:44PM +0200, Johannes Thumshirn wrote:
> > 
> > When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
> > header into the qp->resp.resid variable for later use. Later in check_rkey()
> > we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
> > residual length bigger than the MTU. Later in write_data_in() we subtract the
> > payload of the packet from the residual length. If the packet happens to have a
> > payload of exactly the MTU size we end up with a residual length of 0 despite
> > the packet not being the last in the conversation. When the next packet in the
> > conversation arrives, we don't have any residual length left and thus set the QP
> > into an error state.
> >
> > This broke NVMe over Fabrics functionality over rdma_rxe.ko
> >
> > The patch was verified using the following test.
> >
> >  # echo eth0 > /sys/module/rdma_rxe/parameters/add
> >  # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
> >  # mkfs.xfs -fK /dev/nvme0n1
> >  meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
> >           =                       sectsz=4096  attr=2, projid32bit=1
> >           =                       crc=0        finobt=0, sparse=0
> >  data     =                       bsize=4096   blocks=262144, imaxpct=25
> >           =                       sunit=0      swidth=0 blks
> >  naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> >  log      =internal log           bsize=4096   blocks=2560, version=2
> >           =                       sectsz=4096  sunit=1 blks, lazy-count=1
> >  realtime =none                   extsz=4096   blocks=0, rtextents=0
> >  # mount /dev/nvme0n1 /tmp/
> >  [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
> >  [  148.961196] XFS (nvme0n1): Ending clean mount
> >  # dd if=/dev/urandom of=test.bin bs=1M count=128
> >  128+0 records in
> >  128+0 records out
> >  134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
> >  # sha256sum test.bin
> >  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
> >  # cp test.bin /tmp/
> >  sha256sum /tmp/test.bin
> >  cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin
> >
> > Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Max Gurtovoy <maxg@mellanox.com>
> > ---
>
> Doug anything left here? I already have an Ack from Moni. This patch is needed
> to get NVMe over Fabrics working on rxe so I'd like to see it in v4.12.

Nope, it's all good.  I applied it today.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
