* nfsrdma fails to write big file,
@ 2010-02-22 18:41 Vu Pham
       [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-22 18:41 UTC (permalink / raw)
  To: Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Setup: 
1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
QDR HCAs, fw 2.7.8-6, RHEL 5.2.
2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.


Running vdbench on a 10G file or *dd if=/dev/zero of=10g_file bs=1M
count=10000*, the operation fails, the connection gets dropped, and the
client cannot re-establish the connection to the server.
After rebooting only the client, I can mount again.

It happens with both Solaris and Linux nfsrdma servers.

For the Linux client/server I run memreg=5 (FRMR); I don't see the
problem with memreg=6 (global DMA key).

On the Solaris server (snv 130), we see a problem decoding a 32K write
request. The client sends two read chunks (32K and 16 bytes); the server
fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e.
IB_WC_REM_ACCESS_ERR) and therefore terminates the connection. We don't
see this problem with NFS version 3 on Solaris. The Solaris server runs
the normal memory registration mode.

On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.

I added these notes in bug #1919 (bugs.openfabrics.org) to track the
issue.

thanks,
-vu


* Re: [ewg] nfsrdma fails to write big file,
       [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-02-22 18:49   ` Tom Tucker
  2010-02-22 20:22     ` Vu Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-22 18:49 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu Pham wrote:
> Setup: 
> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
> QDR HCAs fw 2.7.8-6, RHEL 5.2.
> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>
>
> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> count=10000*, operation fail, connection get drop, client cannot
> re-establish connection to server.
> After rebooting only the client, I can mount again.
>
> It happens with both solaris and linux nfsrdma servers.
>
> For linux client/server, I run memreg=5 (FRMR), I don't see problem with
> memreg=6 (global dma key)
>
>   

Awesome. This is the key I think.

Thanks for the info Vu,
Tom


> On Solaris server snv 130, we see problem decoding write request of 32K.
> The client send two read chunks (32K & 16-byte), the server fail to do
> rdma read on the 16-byte chunk (cqe.status = 10 ie.
> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the connection. We
> don't see this problem on nfs version 3 on Solaris. Solaris server run
> normal memory registration mode.
>
> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>
> I added these notes in bug #1919 (bugs.openfabrics.org) to track the
> issue.
>
> thanks,
> -vu
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-22 18:49   ` [ewg] " Tom Tucker
@ 2010-02-22 20:22     ` Vu Pham
  2010-02-24 18:56       ` Vu Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-22 20:22 UTC (permalink / raw)
  To: Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR), I cannot reproduce the problem.
2. I also see a different error on the client:

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody'
does not map into domain 'localdomain' 
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
13.20.1.9:20049 closed (-103)

-vu

> -----Original Message-----
> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> Sent: Monday, February 22, 2010 10:49 AM
> To: Vu Pham
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Vu Pham wrote:
> > Setup:
> > 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
> ConnectX2
> > QDR HCAs fw 2.7.8-6, RHEL 5.2.
> > 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
> >
> >
> > Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> > count=10000*, operation fail, connection get drop, client cannot
> > re-establish connection to server.
> > After rebooting only the client, I can mount again.
> >
> > It happens with both solaris and linux nfsrdma servers.
> >
> > For linux client/server, I run memreg=5 (FRMR), I don't see problem
> with
> > memreg=6 (global dma key)
> >
> >
> 
> Awesome. This is the key I think.
> 
> Thanks for the info Vu,
> Tom
> 
> 
> > On Solaris server snv 130, we see problem decoding write request of
> 32K.
> > The client send two read chunks (32K & 16-byte), the server fail to
> do
> > rdma read on the 16-byte chunk (cqe.status = 10 ie.
> > IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
connection.
> We
> > don't see this problem on nfs version 3 on Solaris. Solaris server
> run
> > normal memory registration mode.
> >
> > On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
> >
> > I added these notes in bug #1919 (bugs.openfabrics.org) to track the
> > issue.
> >
> > thanks,
> > -vu
> > _______________________________________________
> > ewg mailing list
> > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-22 20:22     ` Vu Pham
@ 2010-02-24 18:56       ` Vu Pham
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-24 18:56 UTC (permalink / raw)
  To: Vu Pham, Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Tom,

Did you make any changes to get bonnie++, dd of a 10G file, and vdbench
to run concurrently and finish?

I keep hitting the WQE overflow error below.
I saw that most of the requests have two chunks (a 32K chunk and a
chunk of a few bytes); each chunk requires an frmr register +
invalidate WR pair. However, you set ep->rep_attr.cap.max_send_wr =
cdata->max_requests and then, for the frmr case, you do
ep->rep_attr.cap.max_send_wr *= 3, which is not enough. Moreover, you
also set ep->rep_cqinit = max_send_wr/2 for the send completion signal,
which makes the WQE overflow happen sooner.
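
To spell out the accounting (a rough sketch of my own, not part of the
patch below; the helper name and the two-chunk assumption are only for
illustration):

/*
 * Sketch of the send-queue cost described above: under FRMR each chunk
 * costs one fast-register WR plus one local-invalidate WR, and the RPC
 * itself costs one SEND.
 */
static unsigned int send_wrs_per_request(unsigned int nchunks)
{
	return nchunks * 2 + 1;	/* two chunks -> 5 WRs, vs. the 3 provisioned */
}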

After applying the following patch, I have had vdbench, dd, and a copy of
the 10G file running overnight.

-vu


--- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
10:41:22.000000000 -0800
+++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
10:03:18.000000000 -0800
@@ -649,8 +654,15 @@
        ep->rep_attr.cap.max_send_wr = cdata->max_requests;
        switch (ia->ri_memreg_strategy) {
        case RPCRDMA_FRMR:
-               /* Add room for frmr register and invalidate WRs */
-               ep->rep_attr.cap.max_send_wr *= 3;
+               /* 
+                * Add room for frmr register and invalidate WRs
+                * Requests sometimes have two chunks, each chunk
+                * requires to have different frmr. The safest
+                * WRs required are max_send_wr * 6; however, we
+                * get send completions and poll fast enough, it
+                * is pretty safe to have max_send_wr * 4. 
+                */
+               ep->rep_attr.cap.max_send_wr *= 4;
                if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
                        return -EINVAL;
                break;
@@ -682,7 +694,8 @@
                ep->rep_attr.cap.max_recv_sge);

        /* set trigger for requesting send completion */
-       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
+       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
+       
        switch (ia->ri_memreg_strategy) {
        case RPCRDMA_MEMWINDOWS_ASYNC:
        case RPCRDMA_MEMWINDOWS:





> -----Original Message-----
> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
> Sent: Monday, February 22, 2010 12:23 PM
> To: Tom Tucker
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Tom,
> 
> Some more info on the problem:
> 1. Running with memreg=4 (FMR) I can not reproduce the problem
> 2. I also see different error on client
> 
> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
> 'nobody'
> does not map into domain 'localdomain'
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
> returned -12 cq_init 48 cq_count 32
> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
> send WC status 5, vend_err F5
> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
> 13.20.1.9:20049 closed (-103)
> 
> -vu
> 
> > -----Original Message-----
> > From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> > Sent: Monday, February 22, 2010 10:49 AM
> > To: Vu Pham
> > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > Subject: Re: [ewg] nfsrdma fails to write big file,
> >
> > Vu Pham wrote:
> > > Setup:
> > > 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
> > ConnectX2
> > > QDR HCAs fw 2.7.8-6, RHEL 5.2.
> > > 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
> > >
> > >
> > > Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> > > count=10000*, operation fail, connection get drop, client cannot
> > > re-establish connection to server.
> > > After rebooting only the client, I can mount again.
> > >
> > > It happens with both solaris and linux nfsrdma servers.
> > >
> > > For linux client/server, I run memreg=5 (FRMR), I don't see
problem
> > with
> > > memreg=6 (global dma key)
> > >
> > >
> >
> > Awesome. This is the key I think.
> >
> > Thanks for the info Vu,
> > Tom
> >
> >
> > > On Solaris server snv 130, we see problem decoding write request
of
> > 32K.
> > > The client send two read chunks (32K & 16-byte), the server fail
to
> > do
> > > rdma read on the 16-byte chunk (cqe.status = 10 ie.
> > > IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
> connection.
> > We
> > > don't see this problem on nfs version 3 on Solaris. Solaris server
> > run
> > > normal memory registration mode.
> > >
> > > On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
> > >
> > > I added these notes in bug #1919 (bugs.openfabrics.org) to track
> the
> > > issue.
> > >
> > > thanks,
> > > -vu
> > > _______________________________________________
> > > ewg mailing list
> > > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> > >
> 
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-02-24 19:06           ` Roland Dreier
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-02-24 22:07           ` Tom Tucker
  2010-02-24 22:48           ` Tom Tucker
  2 siblings, 1 reply; 16+ messages in thread
From: Roland Dreier @ 2010-02-24 19:06 UTC (permalink / raw)
  To: Vu Pham
  Cc: Tom Tucker, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

 > +               /* 
 > +                * Add room for frmr register and invalidate WRs
 > +                * Requests sometimes have two chunks, each chunk
 > +                * requires to have different frmr. The safest
 > +                * WRs required are max_send_wr * 6; however, we
 > +                * get send completions and poll fast enough, it
 > +                * is pretty safe to have max_send_wr * 4. 
 > +                */
 > +               ep->rep_attr.cap.max_send_wr *= 4;

Seems like a bad design if there is a possibility of work queue
overflow; if you're counting on events occurring in a particular order
or completions being handled "fast enough", then your design is going to
fail in some high load situations, which I don't think you want.

 - R.
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  2010-02-24 19:06           ` Roland Dreier
@ 2010-02-24 22:07           ` Tom Tucker
  2010-02-24 22:48           ` Tom Tucker
  2 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:07 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu Pham wrote:
> Tom,
>
> Did you make any change to have bonnie++, dd of a 10G file and vdbench
> concurrently run & finish?
>
>   

No, I did not, but my disk subsystem is pretty slow, so it might be that I
just don't have fast enough storage.

> I keep hitting the WQE overflow error below.
> I saw that most of the requests have two chunks (32K chunk and
> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
> then for frmr case you do
> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
> causes the wqe overflow happened faster.
>
>   


> After applying the following patch, I have thing vdbench, dd, and copy
> 10g_file running overnight
>
> -vu
>
>
> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
> 10:41:22.000000000 -0800
> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
> 10:03:18.000000000 -0800
> @@ -649,8 +654,15 @@
>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_FRMR:
> -               /* Add room for frmr register and invalidate WRs */
> -               ep->rep_attr.cap.max_send_wr *= 3;
> +               /* 
> +                * Add room for frmr register and invalidate WRs
> +                * Requests sometimes have two chunks, each chunk
> +                * requires to have different frmr. The safest
> +                * WRs required are max_send_wr * 6; however, we
> +                * get send completions and poll fast enough, it
> +                * is pretty safe to have max_send_wr * 4. 
> +                */
> +               ep->rep_attr.cap.max_send_wr *= 4;
>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>                         return -EINVAL;
>                 break;
> @@ -682,7 +694,8 @@
>                 ep->rep_attr.cap.max_recv_sge);
>
>         /* set trigger for requesting send completion */
> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
> +       
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_MEMWINDOWS_ASYNC:
>         case RPCRDMA_MEMWINDOWS:
>
>
>   
Erf. This is client code. I'll take a look at this and see if I can 
understand what Talpey was up to.

Tom
>   


>
>
>   
>> -----Original Message-----
>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>> Sent: Monday, February 22, 2010 12:23 PM
>> To: Tom Tucker
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Tom,
>>
>> Some more info on the problem:
>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>> 2. I also see different error on client
>>
>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>> 'nobody'
>> does not map into domain 'localdomain'
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>> returned -12 cq_init 48 cq_count 32
>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>> send WC status 5, vend_err F5
>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>> 13.20.1.9:20049 closed (-103)
>>
>> -vu
>>
>>     
>>> -----Original Message-----
>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>> Sent: Monday, February 22, 2010 10:49 AM
>>> To: Vu Pham
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Vu Pham wrote:
>>>       
>>>> Setup:
>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>         
>>> ConnectX2
>>>       
>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>
>>>>
>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>> count=10000*, operation fail, connection get drop, client cannot
>>>> re-establish connection to server.
>>>> After rebooting only the client, I can mount again.
>>>>
>>>> It happens with both solaris and linux nfsrdma servers.
>>>>
>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>         
> problem
>   
>>> with
>>>       
>>>> memreg=6 (global dma key)
>>>>
>>>>
>>>>         
>>> Awesome. This is the key I think.
>>>
>>> Thanks for the info Vu,
>>> Tom
>>>
>>>
>>>       
>>>> On Solaris server snv 130, we see problem decoding write request
>>>>         
> of
>   
>>> 32K.
>>>       
>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>         
> to
>   
>>> do
>>>       
>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>         
>> connection.
>>     
>>> We
>>>       
>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>         
>>> run
>>>       
>>>> normal memory registration mode.
>>>>
>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>
>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>         
>> the
>>     
>>>> issue.
>>>>
>>>> thanks,
>>>> -vu
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>
>>>>         
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>     
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-02-24 22:13               ` Tom Tucker
  2010-02-28  4:22               ` Tom Tucker
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:13 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vu Pham, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland Dreier wrote:
>  > +               /* 
>  > +                * Add room for frmr register and invalidate WRs
>  > +                * Requests sometimes have two chunks, each chunk
>  > +                * requires to have different frmr. The safest
>  > +                * WRs required are max_send_wr * 6; however, we
>  > +                * get send completions and poll fast enough, it
>  > +                * is pretty safe to have max_send_wr * 4. 
>  > +                */
>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>
> Seems like a bad design if there is a possibility of work queue
> overflow; if you're counting on events occurring in a particular order
> or completions being handled "fast enough", then your design is going to
> fail in some high load situations, which I don't think you want.
>
>   

I agree. It's basically a time bomb. A bump in the workload and you'll
overflow the CQ.

Thanks for finding the bug, though, Vu.
>  - R.
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  2010-02-24 19:06           ` Roland Dreier
  2010-02-24 22:07           ` Tom Tucker
@ 2010-02-24 22:48           ` Tom Tucker
       [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:48 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu,

Are you changing any of the default settings, for example rsize/wsize?
I'd like to reproduce this problem if I can.

Thanks,

Tom

Vu Pham wrote:
> Tom,
>
> Did you make any change to have bonnie++, dd of a 10G file and vdbench
> concurrently run & finish?
>
> I keep hitting the WQE overflow error below.
> I saw that most of the requests have two chunks (32K chunk and
> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
> then for frmr case you do
> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
> causes the wqe overflow happened faster.
>
> After applying the following patch, I have thing vdbench, dd, and copy
> 10g_file running overnight
>
> -vu
>
>
> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
> 10:41:22.000000000 -0800
> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
> 10:03:18.000000000 -0800
> @@ -649,8 +654,15 @@
>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_FRMR:
> -               /* Add room for frmr register and invalidate WRs */
> -               ep->rep_attr.cap.max_send_wr *= 3;
> +               /* 
> +                * Add room for frmr register and invalidate WRs
> +                * Requests sometimes have two chunks, each chunk
> +                * requires to have different frmr. The safest
> +                * WRs required are max_send_wr * 6; however, we
> +                * get send completions and poll fast enough, it
> +                * is pretty safe to have max_send_wr * 4. 
> +                */
> +               ep->rep_attr.cap.max_send_wr *= 4;
>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>                         return -EINVAL;
>                 break;
> @@ -682,7 +694,8 @@
>                 ep->rep_attr.cap.max_recv_sge);
>
>         /* set trigger for requesting send completion */
> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
> +       
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_MEMWINDOWS_ASYNC:
>         case RPCRDMA_MEMWINDOWS:
>
>
>
>
>
>   
>> -----Original Message-----
>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>> Sent: Monday, February 22, 2010 12:23 PM
>> To: Tom Tucker
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Tom,
>>
>> Some more info on the problem:
>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>> 2. I also see different error on client
>>
>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>> 'nobody'
>> does not map into domain 'localdomain'
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>> returned -12 cq_init 48 cq_count 32
>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>> send WC status 5, vend_err F5
>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>> 13.20.1.9:20049 closed (-103)
>>
>> -vu
>>
>>     
>>> -----Original Message-----
>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>> Sent: Monday, February 22, 2010 10:49 AM
>>> To: Vu Pham
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Vu Pham wrote:
>>>       
>>>> Setup:
>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>         
>>> ConnectX2
>>>       
>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>
>>>>
>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>> count=10000*, operation fail, connection get drop, client cannot
>>>> re-establish connection to server.
>>>> After rebooting only the client, I can mount again.
>>>>
>>>> It happens with both solaris and linux nfsrdma servers.
>>>>
>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>         
> problem
>   
>>> with
>>>       
>>>> memreg=6 (global dma key)
>>>>
>>>>
>>>>         
>>> Awesome. This is the key I think.
>>>
>>> Thanks for the info Vu,
>>> Tom
>>>
>>>
>>>       
>>>> On Solaris server snv 130, we see problem decoding write request
>>>>         
> of
>   
>>> 32K.
>>>       
>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>         
> to
>   
>>> do
>>>       
>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>         
>> connection.
>>     
>>> We
>>>       
>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>         
>>> run
>>>       
>>>> normal memory registration mode.
>>>>
>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>
>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>         
>> the
>>     
>>>> issue.
>>>>
>>>> thanks,
>>>> -vu
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>
>>>>         
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>     
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-25  0:02               ` Tom Tucker
       [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-25  0:02 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu,

Based on the mapping code, it looks to me like the worst case is
RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier.
However, I think in practice, due to the way the iovs are built, the
actual max is 5 (an frmr for the head and one for the page list, plus
invalidates for those, plus one for the send itself). Why did you think
the max was 6?
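
Spelled out as a tally (my sketch, matching the reasoning above rather
than any code in the tree):

/*
 * Practical per-request worst case under FRMR, per the breakdown above:
 * fast-register the head iov, fast-register the page list, invalidate
 * each of those when the reply comes back, plus the SEND carrying the
 * RPC call itself.
 */
enum {
	FRMR_SEND_WRS_PER_REQ = 2 /* FAST_REG */ + 2 /* LOCAL_INV */ + 1 /* SEND */	/* = 5 */
};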

Thanks,
Tom

Tom Tucker wrote:
> Vu,
>
> Are you changing any of the default settings? For example rsize/wsize, 
> etc... I'd like to reproduce this problem if I can.
>
> Thanks,
>
> Tom
>
> Vu Pham wrote:
>   
>> Tom,
>>
>> Did you make any change to have bonnie++, dd of a 10G file and vdbench
>> concurrently run & finish?
>>
>> I keep hitting the WQE overflow error below.
>> I saw that most of the requests have two chunks (32K chunk and
>> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
>> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
>> then for frmr case you do
>> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
>> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
>> causes the wqe overflow happened faster.
>>
>> After applying the following patch, I have thing vdbench, dd, and copy
>> 10g_file running overnight
>>
>> -vu
>>
>>
>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
>> 10:41:22.000000000 -0800
>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
>> 10:03:18.000000000 -0800
>> @@ -649,8 +654,15 @@
>>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>         switch (ia->ri_memreg_strategy) {
>>         case RPCRDMA_FRMR:
>> -               /* Add room for frmr register and invalidate WRs */
>> -               ep->rep_attr.cap.max_send_wr *= 3;
>> +               /* 
>> +                * Add room for frmr register and invalidate WRs
>> +                * Requests sometimes have two chunks, each chunk
>> +                * requires to have different frmr. The safest
>> +                * WRs required are max_send_wr * 6; however, we
>> +                * get send completions and poll fast enough, it
>> +                * is pretty safe to have max_send_wr * 4. 
>> +                */
>> +               ep->rep_attr.cap.max_send_wr *= 4;
>>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>                         return -EINVAL;
>>                 break;
>> @@ -682,7 +694,8 @@
>>                 ep->rep_attr.cap.max_recv_sge);
>>
>>         /* set trigger for requesting send completion */
>> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>> +       
>>         switch (ia->ri_memreg_strategy) {
>>         case RPCRDMA_MEMWINDOWS_ASYNC:
>>         case RPCRDMA_MEMWINDOWS:
>>
>>
>>
>>
>>
>>   
>>     
>>> -----Original Message-----
>>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>>> Sent: Monday, February 22, 2010 12:23 PM
>>> To: Tom Tucker
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Tom,
>>>
>>> Some more info on the problem:
>>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>>> 2. I also see different error on client
>>>
>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>>> 'nobody'
>>> does not map into domain 'localdomain'
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>>> returned -12 cq_init 48 cq_count 32
>>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>>> send WC status 5, vend_err F5
>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>>> 13.20.1.9:20049 closed (-103)
>>>
>>> -vu
>>>
>>>     
>>>       
>>>> -----Original Message-----
>>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>> To: Vu Pham
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Vu Pham wrote:
>>>>       
>>>>         
>>>>> Setup:
>>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>>         
>>>>>           
>>>> ConnectX2
>>>>       
>>>>         
>>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>>
>>>>>
>>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>> count=10000*, operation fail, connection get drop, client cannot
>>>>> re-establish connection to server.
>>>>> After rebooting only the client, I can mount again.
>>>>>
>>>>> It happens with both solaris and linux nfsrdma servers.
>>>>>
>>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>>         
>>>>>           
>> problem
>>   
>>     
>>>> with
>>>>       
>>>>         
>>>>> memreg=6 (global dma key)
>>>>>
>>>>>
>>>>>         
>>>>>           
>>>> Awesome. This is the key I think.
>>>>
>>>> Thanks for the info Vu,
>>>> Tom
>>>>
>>>>
>>>>       
>>>>         
>>>>> On Solaris server snv 130, we see problem decoding write request
>>>>>         
>>>>>           
>> of
>>   
>>     
>>>> 32K.
>>>>       
>>>>         
>>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>>         
>>>>>           
>> to
>>   
>>     
>>>> do
>>>>       
>>>>         
>>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>>         
>>>>>           
>>> connection.
>>>     
>>>       
>>>> We
>>>>       
>>>>         
>>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>>         
>>>>>           
>>>> run
>>>>       
>>>>         
>>>>> normal memory registration mode.
>>>>>
>>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>>
>>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>>         
>>>>>           
>>> the
>>>     
>>>       
>>>>> issue.
>>>>>
>>>>> thanks,
>>>>> -vu
>>>>> _______________________________________________
>>>>> ewg mailing list
>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>
>>>>>         
>>>>>           
>>> _______________________________________________
>>> ewg mailing list
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>     
>>>       
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>   
>>     
>
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-25  0:51                   ` Tom Tucker
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-25  0:51 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Roland Dreier

Vu,

I ran the number of slots down to 8 (echo 8 > rdma_slot_table_entries)
and I can now reproduce the issue. I'm going to try setting the
allocation multiplier to 5 and see if I can prove to myself and Roland
that we've accurately computed the correct factor.

I think a better solution overall might be a different credit system;
however, that's a much more substantial change than we can tackle at
this point.

Tom


Tom Tucker wrote:
> Vu,
>
> Based on the mapping code, it looks to me like the worst case is 
> RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier. 
> However, I think in practice, due to the way that iov are built, the 
> actual max is 5 (frmr for head + pagelist plus invalidates for same plus 
> one for the send itself). Why did you think the max was 6?
>
> Thanks,
> Tom
>
> Tom Tucker wrote:
>   
>> Vu,
>>
>> Are you changing any of the default settings? For example rsize/wsize, 
>> etc... I'd like to reproduce this problem if I can.
>>
>> Thanks,
>>
>> Tom
>>
>> Vu Pham wrote:
>>   
>>     
>>> Tom,
>>>
>>> Did you make any change to have bonnie++, dd of a 10G file and vdbench
>>> concurrently run & finish?
>>>
>>> I keep hitting the WQE overflow error below.
>>> I saw that most of the requests have two chunks (32K chunk and
>>> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
>>> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
>>> then for frmr case you do
>>> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
>>> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
>>> causes the wqe overflow happened faster.
>>>
>>> After applying the following patch, I have thing vdbench, dd, and copy
>>> 10g_file running overnight
>>>
>>> -vu
>>>
>>>
>>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
>>> 10:41:22.000000000 -0800
>>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
>>> 10:03:18.000000000 -0800
>>> @@ -649,8 +654,15 @@
>>>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_FRMR:
>>> -               /* Add room for frmr register and invalidate WRs */
>>> -               ep->rep_attr.cap.max_send_wr *= 3;
>>> +               /* 
>>> +                * Add room for frmr register and invalidate WRs
>>> +                * Requests sometimes have two chunks, each chunk
>>> +                * requires to have different frmr. The safest
>>> +                * WRs required are max_send_wr * 6; however, we
>>> +                * get send completions and poll fast enough, it
>>> +                * is pretty safe to have max_send_wr * 4. 
>>> +                */
>>> +               ep->rep_attr.cap.max_send_wr *= 4;
>>>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>>                         return -EINVAL;
>>>                 break;
>>> @@ -682,7 +694,8 @@
>>>                 ep->rep_attr.cap.max_recv_sge);
>>>
>>>         /* set trigger for requesting send completion */
>>> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>>> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>>> +       
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_MEMWINDOWS_ASYNC:
>>>         case RPCRDMA_MEMWINDOWS:
>>>
>>>
>>>
>>>
>>>
>>>   
>>>     
>>>       
>>>> -----Original Message-----
>>>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>>>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>>>> Sent: Monday, February 22, 2010 12:23 PM
>>>> To: Tom Tucker
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Tom,
>>>>
>>>> Some more info on the problem:
>>>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>>>> 2. I also see different error on client
>>>>
>>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>>>> 'nobody'
>>>> does not map into domain 'localdomain'
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>>>> returned -12 cq_init 48 cq_count 32
>>>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>>>> send WC status 5, vend_err F5
>>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>>>> 13.20.1.9:20049 closed (-103)
>>>>
>>>> -vu
>>>>
>>>>     
>>>>       
>>>>         
>>>>> -----Original Message-----
>>>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>>> To: Vu Pham
>>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>>
>>>>> Vu Pham wrote:
>>>>>       
>>>>>         
>>>>>           
>>>>>> Setup:
>>>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> ConnectX2
>>>>>       
>>>>>         
>>>>>           
>>>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>>>
>>>>>>
>>>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>>> count=10000*, operation fail, connection get drop, client cannot
>>>>>> re-establish connection to server.
>>>>>> After rebooting only the client, I can mount again.
>>>>>>
>>>>>> It happens with both solaris and linux nfsrdma servers.
>>>>>>
>>>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>>>         
>>>>>>           
>>>>>>             
>>> problem
>>>   
>>>     
>>>       
>>>>> with
>>>>>       
>>>>>         
>>>>>           
>>>>>> memreg=6 (global dma key)
>>>>>>
>>>>>>
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> Awesome. This is the key I think.
>>>>>
>>>>> Thanks for the info Vu,
>>>>> Tom
>>>>>
>>>>>
>>>>>       
>>>>>         
>>>>>           
>>>>>> On Solaris server snv 130, we see problem decoding write request
>>>>>>         
>>>>>>           
>>>>>>             
>>> of
>>>   
>>>     
>>>       
>>>>> 32K.
>>>>>       
>>>>>         
>>>>>           
>>>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>>>         
>>>>>>           
>>>>>>             
>>> to
>>>   
>>>     
>>>       
>>>>> do
>>>>>       
>>>>>         
>>>>>           
>>>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>>>         
>>>>>>           
>>>>>>             
>>>> connection.
>>>>     
>>>>       
>>>>         
>>>>> We
>>>>>       
>>>>>         
>>>>>           
>>>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> run
>>>>>       
>>>>>         
>>>>>           
>>>>>> normal memory registration mode.
>>>>>>
>>>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>>>
>>>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>>>         
>>>>>>           
>>>>>>             
>>>> the
>>>>     
>>>>       
>>>>         
>>>>>> issue.
>>>>>>
>>>>>> thanks,
>>>>>> -vu
>>>>>> _______________________________________________
>>>>>> ewg mailing list
>>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>>
>>>>>>         
>>>>>>           
>>>>>>             
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>     
>>>>       
>>>>         
>>> _______________________________________________
>>> ewg mailing list
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>   
>>>     
>>>       
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>   
>>     
>
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-02-24 22:13               ` Tom Tucker
@ 2010-02-28  4:22               ` Tom Tucker
  2010-03-02  0:19                 ` Vu Pham
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  1 sibling, 2 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-28  4:22 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland Dreier wrote:
>  > +               /* 
>  > +                * Add room for frmr register and invalidate WRs
>  > +                * Requests sometimes have two chunks, each chunk
>  > +                * requires to have different frmr. The safest
>  > +                * WRs required are max_send_wr * 6; however, we
>  > +                * get send completions and poll fast enough, it
>  > +                * is pretty safe to have max_send_wr * 4. 
>  > +                */
>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>
> Seems like a bad design if there is a possibility of work queue
> overflow; if you're counting on events occurring in a particular order
> or completions being handled "fast enough", then your design is going to
> fail in some high load situations, which I don't think you want.
>
>   

Vu,

Would you please try the following:

- Set the multiplier to 5 (see the sketch below)
- Set the number of buffer credits small, as follows: "echo 4 >
  /proc/sys/sunrpc/rdma_slot_table_entries"
- Rerun your test and see if you can reproduce the problem.

I did the above and was unable to reproduce it, but I would like to see if
you can, so we can convince ourselves that 5 is the right number.
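
For reference, the multiplier change would look roughly like this against
the OFED-1.5.1 verbs.c hunk quoted earlier in the thread (an untested
sketch; the comment wording is mine):

-               ep->rep_attr.cap.max_send_wr *= 3;
+               /* fast-register WRs for the head and the page list,
+                * their invalidates, plus the SEND itself: 5 WRs per
+                * request worst case */
+               ep->rep_attr.cap.max_send_wr *= 5;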

Thanks,
Tom

>  - R.
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-28  4:22               ` Tom Tucker
@ 2010-03-02  0:19                 ` Vu Pham
       [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-03-02  0:19 UTC (permalink / raw)
  To: Tom Tucker
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5



> -----Original Message-----
> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> Sent: Saturday, February 27, 2010 8:23 PM
> To: Vu Pham
> Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Roland Dreier wrote:
> >  > +               /*
> >  > +                * Add room for frmr register and invalidate WRs
> >  > +                * Requests sometimes have two chunks, each chunk
> >  > +                * requires to have different frmr. The safest
> >  > +                * WRs required are max_send_wr * 6; however, we
> >  > +                * get send completions and poll fast enough, it
> >  > +                * is pretty safe to have max_send_wr * 4.
> >  > +                */
> >  > +               ep->rep_attr.cap.max_send_wr *= 4;
> >
> > Seems like a bad design if there is a possibility of work queue
> > overflow; if you're counting on events occurring in a particular
> order
> > or completions being handled "fast enough", then your design is
going
> to
> > fail in some high load situations, which I don't think you want.
> >
> >
> 
> Vu,
> 
> Would you please try the following:
> 
> - Set the multiplier to 5
> - Set the number of buffer credits small as follows "echo 4 >
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
> 
> I did the above and was unable to reproduce, but I would like to see
if
> you can to convince ourselves that 5 is the right number.
> 
> 

Tom,

I did the above and cannot reproduce it either.

I think 5 is the right number; however, we should optimize it later.

-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-03-02  3:17                     ` Tom Tucker
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-03-02  3:17 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vu Pham, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland:

I'll put together a patch based on 5, with a comment that explains why I
think 5 is the right number. Since Vu has verified this behaviorally as
well, I'm comfortable that our understanding of the code is sound. I'm on
the road right now, so it won't be until tomorrow, though.

Thanks,
Tom


Vu Pham wrote:
>   
>> -----Original Message-----
>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>> Sent: Saturday, February 27, 2010 8:23 PM
>> To: Vu Pham
>> Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Roland Dreier wrote:
>>     
>>>  > +               /*
>>>  > +                * Add room for frmr register and invalidate WRs
>>>  > +                * Requests sometimes have two chunks, each chunk
>>>  > +                * requires to have different frmr. The safest
>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>  > +                * get send completions and poll fast enough, it
>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>  > +                */
>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular
>>>       
>> order
>>     
>>> or completions being handled "fast enough", then your design is
>>>       
> going
>   
>> to
>>     
>>> fail in some high load situations, which I don't think you want.
>>>
>>>
>>>       
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
>> - Set the number of buffer credits small as follows "echo 4 >
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see if you can reproduce the problem?
>>
>> I did the above and was unable to reproduce, but I would like to see
>>     
> if
>   
>> you can to convince ourselves that 5 is the right number.
>>
>>
>>     
>
> Tom,
>
> I did the above and can not reproduce either.
>
> I think 5 is the right number; however, we should optimize it later.
>
> -vu
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: nfsrdma fails to write big file,
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-03 20:26                   ` Mahesh Siddheshwar
       [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Mahesh Siddheshwar @ 2010-03-03 20:26 UTC (permalink / raw)
  To: Tom Tucker, Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	ewg-G2znmakfqn7U1rindQTSdQ

Hi Tom, Vu,

Tom Tucker wrote:
> Roland Dreier wrote:
>>  > +               /* 
>>  > +                * Add room for frmr register and invalidate WRs
>>  > +                * Requests sometimes have two chunks, each chunk
>>  > +                * requires to have different frmr. The safest
>>  > +                * WRs required are max_send_wr * 6; however, we
>>  > +                * get send completions and poll fast enough, it
>>  > +                * is pretty safe to have max_send_wr * 4.  > 
>> +                */
>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>
>> Seems like a bad design if there is a possibility of work queue
>> overflow; if you're counting on events occurring in a particular order
>> or completions being handled "fast enough", then your design is going to
>> fail in some high load situations, which I don't think you want.   
>
> Vu,
>
> Would you please try the following:
>
> - Set the multiplier to 5
While trying to test this between a Linux client and Solaris server,
I made the following changes in:
/usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

diff verbs.c.org verbs.c
653c653
<               ep->rep_attr.cap.max_send_wr *= 3;
---
 >               ep->rep_attr.cap.max_send_wr *= 8;
685c685
<       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
---
 >       ep->rep_cqinit = ep->rep_attr.cap.max

(I bumped it to 8)

did make install. 
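
For context on why rep_cqinit is derived from max_send_wr: as I read the
1.5-era verbs.c, rep_cqinit is the threshold at which the transport asks
for a signalled send completion, so it has to scale with the send queue
depth. Roughly (macro and field names from memory, shown only for
illustration, not verbatim):

#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)

	/* in rpcrdma_ep_post(): most SENDs go out unsignalled; every
	 * rep_cqinit-th one is signalled so the send queue can drain */
	if (DECR_CQCOUNT(ep) > 0)
		send_wr.send_flags = 0;
	else {
		INIT_CQCOUNT(ep);
		send_wr.send_flags = IB_SEND_SIGNALED;
	}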

On reboot I see the errors on NFS READs as opposed to WRITEs
as seen before, when I try to read a 10G file from the server.

The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
The server is running Solaris based on snv_128.

rpcdebug output from the client:

==
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is connected
RPC:    85 call_transmit (status 0)
RPC:    85 xprt_prepare_transmit
RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
RPC:    85 rpc_xdr_encode (status 0)
RPC:    85 marshaling UNIX cred eddb4dc0
RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
RPC:    85 xprt_transmit(164)
RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 
segments
RPC:       rpcrdma_create_chunks: write chunk elem 
16384@0x38536d000:0xa601 (more)
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 
segments
RPC:       rpcrdma_create_chunks: write chunk elem 108@0x31dd153c:0xaa01 
(last)
RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 
0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
RPC:    85 xmit complete
RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       wake_up_next(ec78d944 "xprt_resend")
RPC:       wake_up_next(ec78d8f4 "xprt_sending")
RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep 
ec78db40
RPC:    85 __rpc_wake_up_task (now 4683110)
RPC:    85 disabling timer
RPC:    85 removed from queue ec78d994 "xprt_pending"
RPC:       __rpc_wake_up_task done
RPC:    85 __rpc_execute flags=0x1
RPC:    85 call_status (status -107)
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is not connected
RPC:    85 xprt_connect xprt ec78d800 is not connected
RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 80 
length 2493606
RPC:       rpcrdma_event_process: recv WC status 5, connection lost
RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
0xec78db40 event 0xa)
RPC:       rpcrdma_conn_upcall: disconnected
rpcrdma: connection to ec78dbccI4:20049 closed (-103)
RPC:       xprt_rdma_connect_worker: reconnect
==

On the server I see:

Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
hermon0: Device Error: CQE remote access error
Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
bad sendreply
Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
hermon0: Device Error: CQE remote access error
Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
bad sendreply

The remote access error is actually seen on RDMA_WRITE.
Doing some more debugging on the server with DTrace, I see that
the destination address and length match the write chunk
element in the Linux debug output above.


  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
hdl a601
  0   9358         rib_init_sendwait:return ffffff44a715d308
  1   9296       rib_svc_scq_handler:return 1f7
  1   9356              rib_sendwait:return 14
  1   9386                 rib_write:return 14

^^^ that is RDMA_FAILED in 

  1  63295    xdrrdma_send_read_data:return 0
  1   5969              xdr_READ3res:return
  1   5969              xdr_READ3res:return 0
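
To tie the two traces together: for an NFS READ over RPC/RDMA the server
RDMA-WRITEs the reply data into the write chunks the client advertised,
and each chunk element in the client trace is printed as
length@offset:handle. The on-the-wire segment, as I understand it from
the Linux xprtrdma headers (shown here only for reference), looks
roughly like:

	/* RPC/RDMA chunk segment (cf. xprt_rdma.h); the rpcdebug line
	 * "write chunk elem 16384@0x38536d000:0xa601" decodes to
	 * rs_length = 16384, rs_offset = 0x38536d000, rs_handle = 0xa601 */
	struct xdr_rdma_segment {
		__be32 rs_handle;	/* rkey the server must use for its RDMA_WRITE */
		__be32 rs_length;	/* length of the chunk in bytes */
		__be64 rs_offset;	/* remote (client) virtual address */
	};

So the failing rib_write above (daddr 0x38536d000, len 0x4000 = 16384
bytes, hdl a601) targets exactly the first write chunk element the client
sent, yet the write is rejected with a remote access error.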

Is this a variation of the previously discussed issue or something new?

Thanks,
Mahesh

> - Set the number of buffer credits small as follows "echo 4 > 
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
>
> I did the above and was unable to reproduce, but I would like to see 
> if you can to convince ourselves that 5 is the right number.
>
> Thanks,
> Tom
>
>>  - R.
>>   
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [ewg] nfsrdma fails to write big file,
       [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
@ 2010-03-03 22:52                       ` Tom Tucker
       [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-03-03 22:52 UTC (permalink / raw)
  To: Mahesh Siddheshwar
  Cc: Vu Pham, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-G2znmakfqn7U1rindQTSdQ

Mahesh Siddheshwar wrote:
> Hi Tom, Vu,
>
> Tom Tucker wrote:
>> Roland Dreier wrote:
>>>  > +               /*
>>>  > +                * Add room for frmr register and invalidate WRs
>>>  > +                * Requests sometimes have two chunks, each chunk
>>>  > +                * requires to have different frmr. The safest
>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>  > +                * get send completions and poll fast enough, it
>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>  > +                */
>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular order
>>> or completions being handled "fast enough", then your design is 
>>> going to
>>> fail in some high load situations, which I don't think you want.   
>>
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
> While trying to test this between a Linux client and Solaris server,
> I made the following changes in:
> /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
>
> diff verbs.c.org verbs.c
> 653c653
> <               ep->rep_attr.cap.max_send_wr *= 3;
> ---
> >               ep->rep_attr.cap.max_send_wr *= 8;
> 685c685
> <       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> ---
> >       ep->rep_cqinit = ep->rep_attr.cap.max
>
> (I bumped it to 8)
>
> did make install.
> On reboot I see the errors on NFS READs as opposed to WRITEs
> as seen before, when I try to read a 10G file from the server.
>
> The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
> OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
> HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
> The server is running Solaris based on snv_128.
>
> rpcdebug output from the client:
>
> ==
> RPC:    85 call_bind (status 0)
> RPC:    85 call_connect xprt ec78d800 is connected
> RPC:    85 call_transmit (status 0)
> RPC:    85 xprt_prepare_transmit
> RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
> RPC:    85 rpc_xdr_encode (status 0)
> RPC:    85 marshaling UNIX cred eddb4dc0
> RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
> RPC:    85 xprt_transmit(164)
> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 
> hdrlen 164
> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 
> 4 segments
> RPC:       rpcrdma_create_chunks: write chunk elem 
> 16384@0x38536d000:0xa601 (more)
> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 
> 1 segments
> RPC:       rpcrdma_create_chunks: write chunk elem 
> 108@0x31dd153c:0xaa01 (last)
> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 
> padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
> RPC:    85 xmit complete
> RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
> RPC:    85 added to queue ec78d994 "xprt_pending"
> RPC:    85 setting alarm for 60000 ms
> RPC:       wake_up_next(ec78d944 "xprt_resend")
> RPC:       wake_up_next(ec78d8f4 "xprt_sending")
> RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 
> ep ec78db40
> RPC:    85 __rpc_wake_up_task (now 4683110)
> RPC:    85 disabling timer
> RPC:    85 removed from queue ec78d994 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:    85 __rpc_execute flags=0x1
> RPC:    85 call_status (status -107)
> RPC:    85 call_bind (status 0)
> RPC:    85 call_connect xprt ec78d800 is not connected
> RPC:    85 xprt_connect xprt ec78d800 is not connected
> RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
> RPC:    85 added to queue ec78d994 "xprt_pending"
> RPC:    85 setting alarm for 60000 ms
> RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 
> 80 length 2493606
> RPC:       rpcrdma_event_process: recv WC status 5, connection lost
> RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
> 0xec78db40 event 0xa)
> RPC:       rpcrdma_conn_upcall: disconnected
> rpcrdma: connection to ec78dbccI4:20049 closed (-103)
> RPC:       xprt_rdma_connect_worker: reconnect
> ==
>
> On the server I see:
>
> Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
> hermon0: Device Error: CQE remote access error
> Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
> bad sendreply
> Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
> hermon0: Device Error: CQE remote access error
> Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
> bad sendreply
>
> The remote access error is actually seen on RDMA_WRITE.
> Doing some more debugging on the server with DTrace, I see that
> the destination address and length match the write chunk
> element in the Linux debug output above.
>
>
>  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
> hdl a601
>  0   9358         rib_init_sendwait:return ffffff44a715d308
>  1   9296       rib_svc_scq_handler:return 1f7
>  1   9356              rib_sendwait:return 14
>  1   9386                 rib_write:return 14
>
> ^^^ that is RDMA_FAILED in
>  1  63295    xdrrdma_send_read_data:return 0
>  1   5969              xdr_READ3res:return
>  1   5969              xdr_READ3res:return 0
>
> Is this a variation of the previously discussed issue or something new?
>

I think this is new. This seems to be some kind of base/bounds or access 
violation or perhaps an invalid rkey.
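
The usual suspects for a remote access error on the peer's RDMA_WRITE
are: the rkey was never made valid (the FAST_REG WR failed or has not
completed yet), the rkey was already invalidated, the write falls outside
the registered range, or the region was registered without remote write
permission. For a write chunk the client-side registration has to end up
roughly like the following; this is paraphrased from memory of
rpcrdma_register_frmr_external() and the local variable names are made
up for illustration:

	struct ib_send_wr frmr_wr;

	memset(&frmr_wr, 0, sizeof(frmr_wr));
	frmr_wr.opcode = IB_WR_FAST_REG_MR;
	frmr_wr.wr.fast_reg.iova_start    = seg->mr_dma;       /* chunk base address */
	frmr_wr.wr.fast_reg.page_list     = frmr->fr_pgl;      /* DMA-mapped page list */
	frmr_wr.wr.fast_reg.page_list_len = nsegs;
	frmr_wr.wr.fast_reg.page_shift    = PAGE_SHIFT;
	frmr_wr.wr.fast_reg.length        = nsegs << PAGE_SHIFT;
	frmr_wr.wr.fast_reg.access_flags  = IB_ACCESS_LOCAL_WRITE |
					    IB_ACCESS_REMOTE_WRITE; /* needed for write chunks */
	frmr_wr.wr.fast_reg.rkey          = frmr->fr_mr->rkey; /* handle put in the chunk list */

If a LOCAL_INV for that rkey is posted (or the QP drops into error and
flushes) before the server's RDMA_WRITE arrives, it could produce exactly
this signature. The client-side "QP error 3" in the rpcdebug trace is
also consistent with an access-type QP event, assuming the usual event
numbering.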

> Thanks,
> Mahesh
>
>> - Set the number of buffer credits small as follows "echo 4 > 
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see if you can reproduce the problem?
>>
>> I did the above and was unable to reproduce, but I would like to see 
>> if you can to convince ourselves that 5 is the right number.
>>
>> Thanks,
>> Tom
>>
>>>  - R.
>>>   
>>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [ewg] nfsrdma fails to write big file,
       [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-04 16:43                           ` Mahesh Siddheshwar
  0 siblings, 0 replies; 16+ messages in thread
From: Mahesh Siddheshwar @ 2010-03-04 16:43 UTC (permalink / raw)
  To: Tom Tucker
  Cc: Vu Pham, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-G2znmakfqn7U1rindQTSdQ

Tom Tucker wrote:
> Mahesh Siddheshwar wrote:
>> Hi Tom, Vu,
>>
>> Tom Tucker wrote:
>>> Roland Dreier wrote:
>>>>  > +               /*
>>>>  > +                * Add room for frmr register and invalidate WRs
>>>>  > +                * Requests sometimes have two chunks, each chunk
>>>>  > +                * requires to have different frmr. The safest
>>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>>  > +                * get send completions and poll fast enough, it
>>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>>  > +                */
>>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>>
>>>> Seems like a bad design if there is a possibility of work queue
>>>> overflow; if you're counting on events occurring in a particular order
>>>> or completions being handled "fast enough", then your design is 
>>>> going to
>>>> fail in some high load situations, which I don't think you want.   
>>>
>>> Vu,
>>>
>>> Would you please try the following:
>>>
>>> - Set the multiplier to 5
>> While trying to test this between a Linux client and Solaris server,
>> I made the following changes in:
>> /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
>>
>> diff verbs.c.org verbs.c
>> 653c653
>> <               ep->rep_attr.cap.max_send_wr *= 3;
>> ---
>> >               ep->rep_attr.cap.max_send_wr *= 8;
>> 685c685
>> <       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>> ---
>> >       ep->rep_cqinit = ep->rep_attr.cap.max
>>
>> (I bumped it to 8)
>>
>> did make install.
>> On reboot I see the errors on NFS READs as opposed to WRITEs
>> as seen before, when I try to read a 10G file from the server.
>>
>> The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
>> OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
>> HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
>> The server is running Solaris based on snv_128.
>>
>> rpcdebug output from the client:
>>
>> ==
>> RPC:    85 call_bind (status 0)
>> RPC:    85 call_connect xprt ec78d800 is connected
>> RPC:    85 call_transmit (status 0)
>> RPC:    85 xprt_prepare_transmit
>> RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
>> RPC:    85 rpc_xdr_encode (status 0)
>> RPC:    85 marshaling UNIX cred eddb4dc0
>> RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
>> RPC:    85 xprt_transmit(164)
>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 
>> hdrlen 164
>> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 
>> 4 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem 
>> 16384@0x38536d000:0xa601 (more)
>> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 
>> 1 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem 
>> 108@0x31dd153c:0xaa01 (last)
>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 
>> padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
>> RPC:    85 xmit complete
>> RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
>> RPC:    85 added to queue ec78d994 "xprt_pending"
>> RPC:    85 setting alarm for 60000 ms
>> RPC:       wake_up_next(ec78d944 "xprt_resend")
>> RPC:       wake_up_next(ec78d8f4 "xprt_sending")
>> RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 
>> ep ec78db40
>> RPC:    85 __rpc_wake_up_task (now 4683110)
>> RPC:    85 disabling timer
>> RPC:    85 removed from queue ec78d994 "xprt_pending"
>> RPC:       __rpc_wake_up_task done
>> RPC:    85 __rpc_execute flags=0x1
>> RPC:    85 call_status (status -107)
>> RPC:    85 call_bind (status 0)
>> RPC:    85 call_connect xprt ec78d800 is not connected
>> RPC:    85 xprt_connect xprt ec78d800 is not connected
>> RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
>> RPC:    85 added to queue ec78d994 "xprt_pending"
>> RPC:    85 setting alarm for 60000 ms
>> RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 
>> 80 length 2493606
>> RPC:       rpcrdma_event_process: recv WC status 5, connection lost
>> RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
>> 0xec78db40 event 0xa)
>> RPC:       rpcrdma_conn_upcall: disconnected
>> rpcrdma: connection to ec78dbccI4:20049 closed (-103)
>> RPC:       xprt_rdma_connect_worker: reconnect
>> ==
>>
>> On the server I see:
>>
>> Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
>> hermon0: Device Error: CQE remote access error
>> Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
>> bad sendreply
>> Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
>> hermon0: Device Error: CQE remote access error
>> Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
>> bad sendreply
>>
>> The remote access error is actually seen on RDMA_WRITE.
>> Doing some more debugging on the server with DTrace, I see that
>> the destination address and length match the write chunk
>> element in the Linux debug output above.
>>
>>
>>  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
>> hdl a601
>>  0   9358         rib_init_sendwait:return ffffff44a715d308
>>  1   9296       rib_svc_scq_handler:return 1f7
>>  1   9356              rib_sendwait:return 14
>>  1   9386                 rib_write:return 14
>>
>> ^^^ that is RDMA_FAILED in
>>  1  63295    xdrrdma_send_read_data:return 0
>>  1   5969              xdr_READ3res:return
>>  1   5969              xdr_READ3res:return 0
>>
>> Is this a variation of the previously discussed issue or something new?
>>
>
> I think this is new. This seems to be some kind of base/bounds or 
> access violation or perhaps an invalid rkey.
>
Thanks for checking, Tom. I can file a new bug against this. The
test setup is a DDR HCA (client) connected to a DDR Voltaire Switch,
connected to a QDR HCA (server, but limited to PCI-gen1). I have
not seen this on a similar setup with both client/server configured with
QDR HCAs.

What type of debug info would you need to debug this further?

Thanks,
Mahesh
>> Thanks,
>> Mahesh
>>
>>> - Set the number of buffer credits small as follows "echo 4 > 
>>> /proc/sys/sunrpc/rdma_slot_table_entries"
>>> - Rerun your test and see if you can reproduce the problem?
>>>
>>> I did the above and was unable to reproduce, but I would like to see 
>>> if you can to convince ourselves that 5 is the right number.
>>>
>>> Thanks,
>>> Tom
>>>
>>>>  - R.
>>>>   
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread


Thread overview: 16+ messages
2010-02-22 18:41 nfsrdma fails to write big file, Vu Pham
     [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-02-22 18:49   ` [ewg] " Tom Tucker
2010-02-22 20:22     ` Vu Pham
2010-02-24 18:56       ` Vu Pham
     [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-02-24 19:06           ` Roland Dreier
     [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-24 22:13               ` Tom Tucker
2010-02-28  4:22               ` Tom Tucker
2010-03-02  0:19                 ` Vu Pham
     [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-03-02  3:17                     ` Tom Tucker
     [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-03 20:26                   ` Mahesh Siddheshwar
     [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
2010-03-03 22:52                       ` [ewg] " Tom Tucker
     [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-04 16:43                           ` Mahesh Siddheshwar
2010-02-24 22:07           ` Tom Tucker
2010-02-24 22:48           ` Tom Tucker
     [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-25  0:02               ` Tom Tucker
     [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-25  0:51                   ` Tom Tucker
