* NFS over RDMA stops working for data larger than 812 Bytes
       [not found] <527CECD8.2000208@ims.co.at>
@ 2013-11-08 14:14 ` Klemens Senn
  2013-11-13  2:18   ` Harry Edmon
  0 siblings, 1 reply; 8+ messages in thread
From: Klemens Senn @ 2013-11-08 14:14 UTC (permalink / raw)
  To: linux-nfs

Hi,

I started setting up NFS over RDMA on an OpenSUSE 12.3 machine using the
3.7.10-1.16-default kernel.
Mounting the NFS share works, and reading a file of 812 bytes succeeds.
Reading 813 bytes raises an Input/output error.

The export on the server side is done with
/data
172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)

The following command is used to mount the NFSv4 share:
  mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/

Testing the client side with older kernels showed that the mount works
properly with the 3.4.67 kernel. I continued testing with the 3.10.17
kernel, enabled debug logging, and noticed the message
  "server cheating in read reply: count 813 > recvd 0"

Here is the syslog output:
NFS: read(//813_bytes, 262144@0)
NFS: nfs_readpage (ffffea001db1e898 4096@0)
NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
RPC:       new task initialized, procpid 9881
RPC:       allocated task ffff88105dacabf8
RPC:  1631 __rpc_execute flags=0x1
<-- nfs4_setup_sequence status=0
RPC:  1631 call_start nfs4 proc READ (async)
RPC:  1631 call_reserve (status 0)
RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
RPC:  1631 call_reserveresult (status 0)
RPC:  1631 call_refresh (status 0)
RPC:  1631 refreshing UNIX cred ffff88087c2108c0
RPC:  1631 call_refreshresult (status 0)
RPC:  1631 call_allocate (status 0)
RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
RPC:  1631 call_bind (status 0)
RPC:  1631 call_connect xprt ffff88085f297000 is connected
RPC:  1631 call_transmit (status 0)
RPC:  1631 xprt_prepare_transmit
RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
RPC:  1631 rpc_xdr_encode (status 0)
RPC:  1631 marshaling UNIX cred ffff88087c2108c0
RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
RPC:  1631 xprt_transmit(144)
RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
hdrlen 144
RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
to map 1 segments
RPC:       rpcrdma_create_chunks: write chunk elem
813@0x87bf95000:0x8a905 (more)
RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
to map 1 segments
RPC:       rpcrdma_create_chunks: write chunk elem
152@0x84a343974:0x8aa01 (last)
RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
opcode 8 length 4294936584
RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
opcode 8 length 4294936584
RPC:  1631 xmit complete
RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
RPC:  1631 setting alarm for 90000 ms
RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
opcode 80 length 128
RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
request 0xffff88084a342000
                   RPC request 0xffff88085e641c00 xid 0xa6896a3f
RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
0xffff88085e641c00, 876)
RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
RPC:  1631 __rpc_wake_up_task (now 4296115067)
RPC:  1631 disabling timer
RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
RPC:       __rpc_wake_up_task done
RPC:  1631 __rpc_execute flags=0x801
RPC:  1631 call_status (status 876)
RPC:  1631 call_decode (status 876)
RPC:  1631 validating UNIX cred ffff88087c2108c0
RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
NFS: server cheating in read reply: count 813 > recvd 0
RPC:  1631 call_decode result 0
NFS: nfs_readpage_result:  1631, (status 0)
--> nfs4_read_done

Adding further debug prints to the kernel module showed that
buf->page_len is 0 when xdr_align_pages() runs (in net/sunrpc/xdr.c).
Comparing this piece of code with older kernel versions showed that the
comparison with buf->page_len was introduced in kernel 3.7.
With this check disabled, the file transfer seems to work.
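
For reference, a debug print along these lines (a sketch of the local
instrumentation, not an exact copy of what I added) is enough to see
the zero page_len:

	/* hypothetical local debug print inside xdr_align_pages() */
	printk(KERN_DEBUG "xdr_align_pages: page_len=%u len=%u nwords=%u\n",
	       buf->page_len, len, xdr->nwords);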

Here is the patch for kernel 3.10.17:

--- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
+++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
@@ -747,7 +747,7 @@
 		nwords = xdr->nwords;
 		len = nwords << 2;
 	}
-	if (buf->page_len <= len)
+	if (buf->page_len && buf->page_len <= len)
 		len = buf->page_len;
 	else if (nwords < xdr->nwords) {
 		/* Truncate page data and move it into the tail */
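
To make the effect of that check concrete, here is a minimal
stand-alone model of the clamp in the hunk above (not the kernel code
itself; the function names are invented, only the page_len/nwords
logic is taken from the hunk):

#include <stdio.h>

/* 3.7+ behaviour as quoted above: len is clamped to page_len */
static unsigned int align_len_37(unsigned int page_len, unsigned int nwords)
{
	unsigned int len = nwords << 2;

	if (page_len <= len)	/* with page_len == 0 this always hits */
		len = page_len;
	return len;
}

/* behaviour with the one-line change applied */
static unsigned int align_len_fixed(unsigned int page_len, unsigned int nwords)
{
	unsigned int len = nwords << 2;

	if (page_len && page_len <= len)
		len = page_len;
	return len;
}

int main(void)
{
	/* an 813-byte READ reply rounds up to 204 XDR words, page_len is 0 */
	printf("3.7+ : %u\n", align_len_37(0, 204));	/* prints 0   */
	printf("fixed: %u\n", align_len_fixed(0, 204));	/* prints 816 */
	return 0;
}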


Can you please tell me which entity sets (or fails to set) the total
length (buf->page_len) after the data from the server's RDMA_SEND
operation has been received?

Please advise

Klemens


* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-08 14:14 ` NFS over RDMA stops working for data larger than 812 Bytes Klemens Senn
@ 2013-11-13  2:18   ` Harry Edmon
  0 siblings, 0 replies; 8+ messages in thread
From: Harry Edmon @ 2013-11-13  2:18 UTC (permalink / raw)
  To: linux-nfs

Klemens Senn <klemens.senn@...> writes:

> 
> Hi,
> 
> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
> 3.7.10-1.16-default kernel.
> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
> Reading 813 Bytes raises a Input/output error.
> 
> The export on the server side is done with
> /data
> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
> 
> Following command is used for mounting the NFSv4 share:
>   mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
> 
> Testing the client side with older kernels stated that the mount is
> working properly with the 3.4.67 kernel. I continued testing with the
> 3.10.17 kernel and enabled the debug level and noticed the message
>   "server cheating in read reply: count 813 > recvd 0"


I am seeing the same thing on a Debian 7.2 RDMA client running the 3.12
kernel.




* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-16  0:29       ` Tom Talpey
@ 2013-11-16  6:04         ` Myklebust, Trond
  0 siblings, 0 replies; 8+ messages in thread
From: Myklebust, Trond @ 2013-11-16  6:04 UTC (permalink / raw)
  To: Tom Talpey; +Cc: Charles Edward Lever, Klemens Senn, Linux NFS Mailing List


On Nov 15, 2013, at 19:29, Tom Talpey <tom@talpey.com> wrote:

> On 11/15/2013 5:53 PM, Myklebust, Trond wrote:
>> 
>> On Nov 15, 2013, at 17:31, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>>> 
>>> On Nov 15, 2013, at 5:29 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
>>> 
>>>> 
>>>> On Nov 15, 2013, at 16:39, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>> 
>>>>> Hi Trond-
>>>>> 
>>>>> Klemens Senn <klemens.senn@ims.co.at> says:
>>>>>> Hi,
>>>>>> 
>>>>>> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
>>>>>> 3.7.10-1.16-default kernel.
>>>>>> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
>>>>>> Reading 813 Bytes raises a Input/output error.
>>>>>> 
>>>>>> The export on the server side is done with
>>>>>> /data
>>>>>> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
>>>>>> 
>>>>>> Following command is used for mounting the NFSv4 share:
>>>>>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
>>>>>> 
>>>>>> Testing the client side with older kernels stated that the mount is
>>>>>> working properly with the 3.4.67 kernel. I continued testing with the
>>>>>> 3.10.17 kernel and enabled the debug level and noticed the message
>>>>>> "server cheating in read reply: count 813 > recvd 0"
>>>>>> 
>>>>>> Here is the syslog output:
>>>>>> NFS: read(//813_bytes, 262144@0)
>>>>>> NFS: nfs_readpage (ffffea001db1e898 4096@0)
>>>>>> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
>>>>>> RPC:       new task initialized, procpid 9881
>>>>>> RPC:       allocated task ffff88105dacabf8
>>>>>> RPC:  1631 __rpc_execute flags=0x1
>>>>>> <-- nfs4_setup_sequence status=0
>>>>>> RPC:  1631 call_start nfs4 proc READ (async)
>>>>>> RPC:  1631 call_reserve (status 0)
>>>>>> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
>>>>>> RPC:  1631 call_reserveresult (status 0)
>>>>>> RPC:  1631 call_refresh (status 0)
>>>>>> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
>>>>>> RPC:  1631 call_refreshresult (status 0)
>>>>>> RPC:  1631 call_allocate (status 0)
>>>>>> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
>>>>>> RPC:  1631 call_bind (status 0)
>>>>>> RPC:  1631 call_connect xprt ffff88085f297000 is connected
>>>>>> RPC:  1631 call_transmit (status 0)
>>>>>> RPC:  1631 xprt_prepare_transmit
>>>>>> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
>>>>>> RPC:  1631 rpc_xdr_encode (status 0)
>>>>>> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
>>>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
>>>>>> RPC:  1631 xprt_transmit(144)
>>>>>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
>>>>>> hdrlen 144
>>>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
>>>>>> to map 1 segments
>>>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>>>> 813@0x87bf95000:0x8a905 (more)
>>>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
>>>>>> to map 1 segments
>>>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>>>> 152@0x84a343974:0x8aa01 (last)
>>>>>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
>>>>>> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
>>>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
>>>>>> opcode 8 length 4294936584
>>>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
>>>>>> opcode 8 length 4294936584
>>>>>> RPC:  1631 xmit complete
>>>>>> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
>>>>>> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
>>>>>> RPC:  1631 setting alarm for 90000 ms
>>>>>> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
>>>>>> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
>>>>>> opcode 80 length 128
>>>>>> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
>>>>>> request 0xffff88084a342000
>>>>>>                RPC request 0xffff88085e641c00 xid 0xa6896a3f
>>>>>> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
>>>>>> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
>>>>>> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
>>>>>> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
>>>>>> 0xffff88085e641c00, 876)
>>>>>> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
>>>>>> RPC:  1631 __rpc_wake_up_task (now 4296115067)
>>>>>> RPC:  1631 disabling timer
>>>>>> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
>>>>>> RPC:       __rpc_wake_up_task done
>>>>>> RPC:  1631 __rpc_execute flags=0x801
>>>>>> RPC:  1631 call_status (status 876)
>>>>>> RPC:  1631 call_decode (status 876)
>>>>>> RPC:  1631 validating UNIX cred ffff88087c2108c0
>>>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
>>>>>> NFS: server cheating in read reply: count 813 > recvd 0
>>>>>> RPC:  1631 call_decode result 0
>>>>>> NFS: nfs_readpage_result:  1631, (status 0)
>>>>>> --> nfs4_read_done
>>>>>> 
>>>>>> Adding other debug prints to the kernel module told me that
>>>>>> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
>>>>>> Comparing this piece of code with older kernel versions showed that the
>>>>>> comparison with buf->page_len was introduces in kernel 3.7.
>>>>>> By disabling this check the file transfer seems to work.
>>>>>> 
>>>>>> Here is the patch for kernel 3.10.17:
>>>>>> 
>>>>>> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
>>>>>> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
>>>>>> @@ -747,7 +747,7 @@
>>>>>> 		nwords = xdr->nwords;
>>>>>> 		len = nwords << 2;
>>>>>> 	}
>>>>>> -	if (buf->page_len <= len)
>>>>>> +	if (buf->page_len && buf->page_len <= len)
>>>>>> 		len = buf->page_len;
>>>>>> 	else if (nwords < xdr->nwords) {
>>>>>> 		/* Truncate page data and move it into the tail */
>>>>>> 
>>>>>> 
>>>>>> Can you please tell me which entity sets (or does not set) the total
>>>>>> length (buf->page_len) after receiving the data from the RDMA_SEND
>>>>>> operation from the server?
>>>>>> 
>>>>>> Please advise
>>>>>> 
>>>>>> Klemens
>>>>> 
>>>>> Comparing xdr_align_pages() before commit a11a2bf4 and afterwords, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.
>>>> 
>>>> Why would the buf->page_len == 0 in a READ call? Has RDMA been relying on some kind of undocumented behaviour?
>>> 
>>> See rpcrdma_inline_fixup().
>> 
>> I see it, but I don’t get it. What is it trying to do?
>> 
>> Firstly, why is it playing with rqst->rq_rcv_buf instead of using rq_private_buf? Secondly, why is it mucking with the page buffer length?
>> 
> 
> Speaking from memory, that routine was there originally to deliver
> the RDMA data into the iov list expected by the upper layer via
> RPC. When receiving via TCP, there were data copies that put it
> into the receive buffers, and the code tried to avoid a similar
> data copy - which would negate a big benefit of RDMA.
> 
> That code was first written for the 2.4 kernel, and it appears to
> be working correctly today, except for the assertion stopping it.
> Has rq_private_buf been there since then?

Hi Tom,

I had to go back to the 'old-2.6-bkcvs' repository in order to find the commit that introduced rq_private_buf. As far as I can tell, it debuted in the Linux 2.6.0-test4 kernel, which was released on August 22, 2003. That definitely predates the RDMA code, which was merged in September 2007.

Cheers
  Trond


* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-15 22:53     ` Myklebust, Trond
@ 2013-11-16  0:29       ` Tom Talpey
  2013-11-16  6:04         ` Myklebust, Trond
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Talpey @ 2013-11-16  0:29 UTC (permalink / raw)
  To: Myklebust, Trond, Charles Edward Lever
  Cc: Klemens Senn, Linux NFS Mailing List

On 11/15/2013 5:53 PM, Myklebust, Trond wrote:
>
> On Nov 15, 2013, at 17:31, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>>
>> On Nov 15, 2013, at 5:29 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
>>
>>>
>>> On Nov 15, 2013, at 16:39, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>
>>>> Hi Trond-
>>>>
>>>> Klemens Senn <klemens.senn@ims.co.at> says:
>>>>> Hi,
>>>>>
>>>>> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
>>>>> 3.7.10-1.16-default kernel.
>>>>> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
>>>>> Reading 813 Bytes raises a Input/output error.
>>>>>
>>>>> The export on the server side is done with
>>>>> /data
>>>>> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
>>>>>
>>>>> Following command is used for mounting the NFSv4 share:
>>>>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
>>>>>
>>>>> Testing the client side with older kernels stated that the mount is
>>>>> working properly with the 3.4.67 kernel. I continued testing with the
>>>>> 3.10.17 kernel and enabled the debug level and noticed the message
>>>>> "server cheating in read reply: count 813 > recvd 0"
>>>>>
>>>>> Here is the syslog output:
>>>>> NFS: read(//813_bytes, 262144@0)
>>>>> NFS: nfs_readpage (ffffea001db1e898 4096@0)
>>>>> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
>>>>> RPC:       new task initialized, procpid 9881
>>>>> RPC:       allocated task ffff88105dacabf8
>>>>> RPC:  1631 __rpc_execute flags=0x1
>>>>> <-- nfs4_setup_sequence status=0
>>>>> RPC:  1631 call_start nfs4 proc READ (async)
>>>>> RPC:  1631 call_reserve (status 0)
>>>>> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
>>>>> RPC:  1631 call_reserveresult (status 0)
>>>>> RPC:  1631 call_refresh (status 0)
>>>>> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
>>>>> RPC:  1631 call_refreshresult (status 0)
>>>>> RPC:  1631 call_allocate (status 0)
>>>>> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
>>>>> RPC:  1631 call_bind (status 0)
>>>>> RPC:  1631 call_connect xprt ffff88085f297000 is connected
>>>>> RPC:  1631 call_transmit (status 0)
>>>>> RPC:  1631 xprt_prepare_transmit
>>>>> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
>>>>> RPC:  1631 rpc_xdr_encode (status 0)
>>>>> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
>>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
>>>>> RPC:  1631 xprt_transmit(144)
>>>>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
>>>>> hdrlen 144
>>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
>>>>> to map 1 segments
>>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>>> 813@0x87bf95000:0x8a905 (more)
>>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
>>>>> to map 1 segments
>>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>>> 152@0x84a343974:0x8aa01 (last)
>>>>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
>>>>> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
>>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
>>>>> opcode 8 length 4294936584
>>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
>>>>> opcode 8 length 4294936584
>>>>> RPC:  1631 xmit complete
>>>>> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
>>>>> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
>>>>> RPC:  1631 setting alarm for 90000 ms
>>>>> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
>>>>> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
>>>>> opcode 80 length 128
>>>>> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
>>>>> request 0xffff88084a342000
>>>>>                 RPC request 0xffff88085e641c00 xid 0xa6896a3f
>>>>> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
>>>>> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
>>>>> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
>>>>> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
>>>>> 0xffff88085e641c00, 876)
>>>>> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
>>>>> RPC:  1631 __rpc_wake_up_task (now 4296115067)
>>>>> RPC:  1631 disabling timer
>>>>> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
>>>>> RPC:       __rpc_wake_up_task done
>>>>> RPC:  1631 __rpc_execute flags=0x801
>>>>> RPC:  1631 call_status (status 876)
>>>>> RPC:  1631 call_decode (status 876)
>>>>> RPC:  1631 validating UNIX cred ffff88087c2108c0
>>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
>>>>> NFS: server cheating in read reply: count 813 > recvd 0
>>>>> RPC:  1631 call_decode result 0
>>>>> NFS: nfs_readpage_result:  1631, (status 0)
>>>>> --> nfs4_read_done
>>>>>
>>>>> Adding other debug prints to the kernel module told me that
>>>>> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
>>>>> Comparing this piece of code with older kernel versions showed that the
>>>>> comparison with buf->page_len was introduces in kernel 3.7.
>>>>> By disabling this check the file transfer seems to work.
>>>>>
>>>>> Here is the patch for kernel 3.10.17:
>>>>>
>>>>> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
>>>>> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
>>>>> @@ -747,7 +747,7 @@
>>>>> 		nwords = xdr->nwords;
>>>>> 		len = nwords << 2;
>>>>> 	}
>>>>> -	if (buf->page_len <= len)
>>>>> +	if (buf->page_len && buf->page_len <= len)
>>>>> 		len = buf->page_len;
>>>>> 	else if (nwords < xdr->nwords) {
>>>>> 		/* Truncate page data and move it into the tail */
>>>>>
>>>>>
>>>>> Can you please tell me which entity sets (or does not set) the total
>>>>> length (buf->page_len) after receiving the data from the RDMA_SEND
>>>>> operation from the server?
>>>>>
>>>>> Please advise
>>>>>
>>>>> Klemens
>>>>
>>>> Comparing xdr_align_pages() before commit a11a2bf4 and afterwords, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.
>>>
>>> Why would the buf->page_len == 0 in a READ call? Has RDMA been relying on some kind of undocumented behaviour?
>>
>> See rpcrdma_inline_fixup().
>
> I see it, but I don’t get it. What is it trying to do?
>
> Firstly, why is it playing with rqst->rq_rcv_buf instead of using rq_private_buf? Secondly, why is it mucking with the page buffer length?
>

Speaking from memory, that routine was there originally to deliver
the RDMA data into the iov list expected by the upper layer via
RPC. When receiving via TCP, there were data copies that put it
into the receive buffers, and the code tried to avoid a similar
data copy - which would negate a big benefit of RDMA.

That code was first written for the 2.4 kernel, and it appears to
be working correctly today, except for the assertion stopping it.
Has rq_private_buf been there since then?
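
For context, the buffer involved is the usual three-part RPC receive
buffer; a simplified sketch (a model only, not the real
include/linux/sunrpc/xdr.h definition) looks like this:

#include <stddef.h>

struct page;	/* kernel page descriptor, opaque for this sketch */

/*
 * Model of the receive buffer handed down by the upper layer: an
 * inline head for the RPC header, a page list for the bulk READ
 * data, and a small tail.  With RDMA the server writes the payload
 * straight into the page list, avoiding the copy the TCP receive
 * path has to do.
 */
struct rcv_buf_model {
	struct { void *base; size_t len; } head;  /* inline RPC header    */
	struct page **pages;                       /* bulk data lands here */
	unsigned int page_base;                    /* offset in first page */
	unsigned int page_len;                     /* bytes of page data   */
	struct { void *base; size_t len; } tail;   /* trailing XDR bytes   */
};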

Tom.


* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-15 22:31   ` Chuck Lever
@ 2013-11-15 22:53     ` Myklebust, Trond
  2013-11-16  0:29       ` Tom Talpey
  0 siblings, 1 reply; 8+ messages in thread
From: Myklebust, Trond @ 2013-11-15 22:53 UTC (permalink / raw)
  To: Charles Edward Lever; +Cc: Klemens Senn, Linux NFS Mailing List


On Nov 15, 2013, at 17:31, Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Nov 15, 2013, at 5:29 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
>> 
>> On Nov 15, 2013, at 16:39, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>>> Hi Trond-
>>> 
>>> Klemens Senn <klemens.senn@ims.co.at> says:
>>>> Hi,
>>>> 
>>>> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
>>>> 3.7.10-1.16-default kernel.
>>>> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
>>>> Reading 813 Bytes raises a Input/output error.
>>>> 
>>>> The export on the server side is done with
>>>> /data
>>>> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
>>>> 
>>>> Following command is used for mounting the NFSv4 share:
>>>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
>>>> 
>>>> Testing the client side with older kernels stated that the mount is
>>>> working properly with the 3.4.67 kernel. I continued testing with the
>>>> 3.10.17 kernel and enabled the debug level and noticed the message
>>>> "server cheating in read reply: count 813 > recvd 0"
>>>> 
>>>> Here is the syslog output:
>>>> NFS: read(//813_bytes, 262144@0)
>>>> NFS: nfs_readpage (ffffea001db1e898 4096@0)
>>>> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
>>>> RPC:       new task initialized, procpid 9881
>>>> RPC:       allocated task ffff88105dacabf8
>>>> RPC:  1631 __rpc_execute flags=0x1
>>>> <-- nfs4_setup_sequence status=0
>>>> RPC:  1631 call_start nfs4 proc READ (async)
>>>> RPC:  1631 call_reserve (status 0)
>>>> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
>>>> RPC:  1631 call_reserveresult (status 0)
>>>> RPC:  1631 call_refresh (status 0)
>>>> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
>>>> RPC:  1631 call_refreshresult (status 0)
>>>> RPC:  1631 call_allocate (status 0)
>>>> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
>>>> RPC:  1631 call_bind (status 0)
>>>> RPC:  1631 call_connect xprt ffff88085f297000 is connected
>>>> RPC:  1631 call_transmit (status 0)
>>>> RPC:  1631 xprt_prepare_transmit
>>>> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
>>>> RPC:  1631 rpc_xdr_encode (status 0)
>>>> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
>>>> RPC:  1631 xprt_transmit(144)
>>>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
>>>> hdrlen 144
>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
>>>> to map 1 segments
>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>> 813@0x87bf95000:0x8a905 (more)
>>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
>>>> to map 1 segments
>>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>>> 152@0x84a343974:0x8aa01 (last)
>>>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
>>>> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
>>>> opcode 8 length 4294936584
>>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
>>>> opcode 8 length 4294936584
>>>> RPC:  1631 xmit complete
>>>> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
>>>> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
>>>> RPC:  1631 setting alarm for 90000 ms
>>>> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
>>>> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
>>>> opcode 80 length 128
>>>> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
>>>> request 0xffff88084a342000
>>>>                RPC request 0xffff88085e641c00 xid 0xa6896a3f
>>>> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
>>>> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
>>>> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
>>>> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
>>>> 0xffff88085e641c00, 876)
>>>> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
>>>> RPC:  1631 __rpc_wake_up_task (now 4296115067)
>>>> RPC:  1631 disabling timer
>>>> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
>>>> RPC:       __rpc_wake_up_task done
>>>> RPC:  1631 __rpc_execute flags=0x801
>>>> RPC:  1631 call_status (status 876)
>>>> RPC:  1631 call_decode (status 876)
>>>> RPC:  1631 validating UNIX cred ffff88087c2108c0
>>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
>>>> NFS: server cheating in read reply: count 813 > recvd 0
>>>> RPC:  1631 call_decode result 0
>>>> NFS: nfs_readpage_result:  1631, (status 0)
>>>> --> nfs4_read_done
>>>> 
>>>> Adding other debug prints to the kernel module told me that
>>>> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
>>>> Comparing this piece of code with older kernel versions showed that the
>>>> comparison with buf->page_len was introduces in kernel 3.7.
>>>> By disabling this check the file transfer seems to work.
>>>> 
>>>> Here is the patch for kernel 3.10.17:
>>>> 
>>>> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
>>>> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
>>>> @@ -747,7 +747,7 @@
>>>> 		nwords = xdr->nwords;
>>>> 		len = nwords << 2;
>>>> 	}
>>>> -	if (buf->page_len <= len)
>>>> +	if (buf->page_len && buf->page_len <= len)
>>>> 		len = buf->page_len;
>>>> 	else if (nwords < xdr->nwords) {
>>>> 		/* Truncate page data and move it into the tail */
>>>> 
>>>> 
>>>> Can you please tell me which entity sets (or does not set) the total
>>>> length (buf->page_len) after receiving the data from the RDMA_SEND
>>>> operation from the server?
>>>> 
>>>> Please advise
>>>> 
>>>> Klemens
>>> 
>>> Comparing xdr_align_pages() before commit a11a2bf4 and afterwords, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.
>> 
>> Why would the buf->page_len == 0 in a READ call? Has RDMA been relying on some kind of undocumented behaviour?
> 
> See rpcrdma_inline_fixup().

I see it, but I don’t get it. What is it trying to do?

Firstly, why is it playing with rqst->rq_rcv_buf instead of using rq_private_buf? Secondly, why is it mucking with the page buffer length?





* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-15 22:29 ` Myklebust, Trond
@ 2013-11-15 22:31   ` Chuck Lever
  2013-11-15 22:53     ` Myklebust, Trond
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2013-11-15 22:31 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Klemens Senn, Linux NFS Mailing List


On Nov 15, 2013, at 5:29 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> 
> On Nov 15, 2013, at 16:39, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> Hi Trond-
>> 
>> Klemens Senn <klemens.senn@ims.co.at> says:
>>> Hi,
>>> 
>>> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
>>> 3.7.10-1.16-default kernel.
>>> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
>>> Reading 813 Bytes raises a Input/output error.
>>> 
>>> The export on the server side is done with
>>> /data
>>> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
>>> 
>>> Following command is used for mounting the NFSv4 share:
>>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
>>> 
>>> Testing the client side with older kernels stated that the mount is
>>> working properly with the 3.4.67 kernel. I continued testing with the
>>> 3.10.17 kernel and enabled the debug level and noticed the message
>>> "server cheating in read reply: count 813 > recvd 0"
>>> 
>>> Here is the syslog output:
>>> NFS: read(//813_bytes, 262144@0)
>>> NFS: nfs_readpage (ffffea001db1e898 4096@0)
>>> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
>>> RPC:       new task initialized, procpid 9881
>>> RPC:       allocated task ffff88105dacabf8
>>> RPC:  1631 __rpc_execute flags=0x1
>>> <-- nfs4_setup_sequence status=0
>>> RPC:  1631 call_start nfs4 proc READ (async)
>>> RPC:  1631 call_reserve (status 0)
>>> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
>>> RPC:  1631 call_reserveresult (status 0)
>>> RPC:  1631 call_refresh (status 0)
>>> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
>>> RPC:  1631 call_refreshresult (status 0)
>>> RPC:  1631 call_allocate (status 0)
>>> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
>>> RPC:  1631 call_bind (status 0)
>>> RPC:  1631 call_connect xprt ffff88085f297000 is connected
>>> RPC:  1631 call_transmit (status 0)
>>> RPC:  1631 xprt_prepare_transmit
>>> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
>>> RPC:  1631 rpc_xdr_encode (status 0)
>>> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
>>> RPC:  1631 xprt_transmit(144)
>>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
>>> hdrlen 144
>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
>>> to map 1 segments
>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>> 813@0x87bf95000:0x8a905 (more)
>>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
>>> to map 1 segments
>>> RPC:       rpcrdma_create_chunks: write chunk elem
>>> 152@0x84a343974:0x8aa01 (last)
>>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
>>> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
>>> opcode 8 length 4294936584
>>> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
>>> opcode 8 length 4294936584
>>> RPC:  1631 xmit complete
>>> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
>>> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
>>> RPC:  1631 setting alarm for 90000 ms
>>> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
>>> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
>>> opcode 80 length 128
>>> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
>>> request 0xffff88084a342000
>>>                 RPC request 0xffff88085e641c00 xid 0xa6896a3f
>>> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
>>> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
>>> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
>>> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
>>> 0xffff88085e641c00, 876)
>>> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
>>> RPC:  1631 __rpc_wake_up_task (now 4296115067)
>>> RPC:  1631 disabling timer
>>> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
>>> RPC:       __rpc_wake_up_task done
>>> RPC:  1631 __rpc_execute flags=0x801
>>> RPC:  1631 call_status (status 876)
>>> RPC:  1631 call_decode (status 876)
>>> RPC:  1631 validating UNIX cred ffff88087c2108c0
>>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
>>> NFS: server cheating in read reply: count 813 > recvd 0
>>> RPC:  1631 call_decode result 0
>>> NFS: nfs_readpage_result:  1631, (status 0)
>>> --> nfs4_read_done
>>> 
>>> Adding other debug prints to the kernel module told me that
>>> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
>>> Comparing this piece of code with older kernel versions showed that the
>>> comparison with buf->page_len was introduces in kernel 3.7.
>>> By disabling this check the file transfer seems to work.
>>> 
>>> Here is the patch for kernel 3.10.17:
>>> 
>>> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
>>> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
>>> @@ -747,7 +747,7 @@
>>> 		nwords = xdr->nwords;
>>> 		len = nwords << 2;
>>> 	}
>>> -	if (buf->page_len <= len)
>>> +	if (buf->page_len && buf->page_len <= len)
>>> 		len = buf->page_len;
>>> 	else if (nwords < xdr->nwords) {
>>> 		/* Truncate page data and move it into the tail */
>>> 
>>> 
>>> Can you please tell me which entity sets (or does not set) the total
>>> length (buf->page_len) after receiving the data from the RDMA_SEND
>>> operation from the server?
>>> 
>>> Please advise
>>> 
>>> Klemens
>> 
>> Comparing xdr_align_pages() before commit a11a2bf4 and afterwords, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.
> 
> Why would the buf->page_len == 0 in a READ call? Has RDMA been relying on some kind of undocumented behaviour?

See rpcrdma_inline_fixup().

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: NFS over RDMA stops working for data larger than 812 Bytes
  2013-11-15 21:39 Chuck Lever
@ 2013-11-15 22:29 ` Myklebust, Trond
  2013-11-15 22:31   ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Myklebust, Trond @ 2013-11-15 22:29 UTC (permalink / raw)
  To: Charles Edward Lever; +Cc: Klemens Senn, Linux NFS Mailing List


On Nov 15, 2013, at 16:39, Chuck Lever <chuck.lever@oracle.com> wrote:

> Hi Trond-
> 
> Klemens Senn <klemens.senn@ims.co.at> says:
>> Hi,
>> 
>> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
>> 3.7.10-1.16-default kernel.
>> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
>> Reading 813 Bytes raises a Input/output error.
>> 
>> The export on the server side is done with
>> /data
>> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
>> 
>> Following command is used for mounting the NFSv4 share:
>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
>> 
>> Testing the client side with older kernels stated that the mount is
>> working properly with the 3.4.67 kernel. I continued testing with the
>> 3.10.17 kernel and enabled the debug level and noticed the message
>> "server cheating in read reply: count 813 > recvd 0"
>> 
>> Here is the syslog output:
>> NFS: read(//813_bytes, 262144@0)
>> NFS: nfs_readpage (ffffea001db1e898 4096@0)
>> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
>> RPC:       new task initialized, procpid 9881
>> RPC:       allocated task ffff88105dacabf8
>> RPC:  1631 __rpc_execute flags=0x1
>> <-- nfs4_setup_sequence status=0
>> RPC:  1631 call_start nfs4 proc READ (async)
>> RPC:  1631 call_reserve (status 0)
>> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
>> RPC:  1631 call_reserveresult (status 0)
>> RPC:  1631 call_refresh (status 0)
>> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
>> RPC:  1631 call_refreshresult (status 0)
>> RPC:  1631 call_allocate (status 0)
>> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
>> RPC:  1631 call_bind (status 0)
>> RPC:  1631 call_connect xprt ffff88085f297000 is connected
>> RPC:  1631 call_transmit (status 0)
>> RPC:  1631 xprt_prepare_transmit
>> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
>> RPC:  1631 rpc_xdr_encode (status 0)
>> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
>> RPC:  1631 xprt_transmit(144)
>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
>> hdrlen 144
>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
>> to map 1 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem
>> 813@0x87bf95000:0x8a905 (more)
>> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
>> to map 1 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem
>> 152@0x84a343974:0x8aa01 (last)
>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
>> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
>> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
>> opcode 8 length 4294936584
>> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
>> opcode 8 length 4294936584
>> RPC:  1631 xmit complete
>> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
>> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
>> RPC:  1631 setting alarm for 90000 ms
>> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
>> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
>> opcode 80 length 128
>> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
>> request 0xffff88084a342000
>>                  RPC request 0xffff88085e641c00 xid 0xa6896a3f
>> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
>> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
>> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
>> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
>> 0xffff88085e641c00, 876)
>> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
>> RPC:  1631 __rpc_wake_up_task (now 4296115067)
>> RPC:  1631 disabling timer
>> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
>> RPC:       __rpc_wake_up_task done
>> RPC:  1631 __rpc_execute flags=0x801
>> RPC:  1631 call_status (status 876)
>> RPC:  1631 call_decode (status 876)
>> RPC:  1631 validating UNIX cred ffff88087c2108c0
>> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
>> NFS: server cheating in read reply: count 813 > recvd 0
>> RPC:  1631 call_decode result 0
>> NFS: nfs_readpage_result:  1631, (status 0)
>> --> nfs4_read_done
>> 
>> Adding other debug prints to the kernel module told me that
>> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
>> Comparing this piece of code with older kernel versions showed that the
>> comparison with buf->page_len was introduces in kernel 3.7.
>> By disabling this check the file transfer seems to work.
>> 
>> Here is the patch for kernel 3.10.17:
>> 
>> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
>> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
>> @@ -747,7 +747,7 @@
>> 		nwords = xdr->nwords;
>> 		len = nwords << 2;
>> 	}
>> -	if (buf->page_len <= len)
>> +	if (buf->page_len && buf->page_len <= len)
>> 		len = buf->page_len;
>> 	else if (nwords < xdr->nwords) {
>> 		/* Truncate page data and move it into the tail */
>> 
>> 
>> Can you please tell me which entity sets (or does not set) the total
>> length (buf->page_len) after receiving the data from the RDMA_SEND
>> operation from the server?
>> 
>> Please advise
>> 
>> Klemens
> 
> Comparing xdr_align_pages() before commit a11a2bf4 and afterwords, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.

Why would buf->page_len be 0 in a READ call? Has RDMA been relying on some kind of undocumented behaviour?

Trond


* Re: NFS over RDMA stops working for data larger than 812 Bytes
@ 2013-11-15 21:39 Chuck Lever
  2013-11-15 22:29 ` Myklebust, Trond
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2013-11-15 21:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Klemens Senn, Linux NFS Mailing List

Hi Trond-

Klemens Senn <klemens.senn@ims.co.at> says:
> Hi,
> 
> I started setting up NFS over RDMA on a OpenSUSE 12.3 machine using the
> 3.7.10-1.16-default kernel.
> Mounting the NFS share works and getting a file with 812 Bytes succeeds.
> Reading 813 Bytes raises a Input/output error.
> 
> The export on the server side is done with
> /data
> 172.16.100.0/24(rw,wdelay,crossmnt,insecure,no_root_squash,no_subtree_check,fsid=0,mountpoint)
> 
> Following command is used for mounting the NFSv4 share:
>  mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.2:/ /mnt/
> 
> Testing the client side with older kernels stated that the mount is
> working properly with the 3.4.67 kernel. I continued testing with the
> 3.10.17 kernel and enabled the debug level and noticed the message
>  "server cheating in read reply: count 813 > recvd 0"
> 
> Here is the syslog output:
> NFS: read(//813_bytes, 262144@0)
> NFS: nfs_readpage (ffffea001db1e898 4096@0)
> NFS:     0 initiated read call (req 0:33/70, 813 bytes @ offset 0)
> RPC:       new task initialized, procpid 9881
> RPC:       allocated task ffff88105dacabf8
> RPC:  1631 __rpc_execute flags=0x1
> <-- nfs4_setup_sequence status=0
> RPC:  1631 call_start nfs4 proc READ (async)
> RPC:  1631 call_reserve (status 0)
> RPC:  1631 reserved req ffff88085e641c00 xid 3f6a89a6
> RPC:  1631 call_reserveresult (status 0)
> RPC:  1631 call_refresh (status 0)
> RPC:  1631 refreshing UNIX cred ffff88087c2108c0
> RPC:  1631 call_refreshresult (status 0)
> RPC:  1631 call_allocate (status 0)
> RPC:       xprt_rdma_allocate: size 684, request 0xffff88084a342000
> RPC:  1631 call_bind (status 0)
> RPC:  1631 call_connect xprt ffff88085f297000 is connected
> RPC:  1631 call_transmit (status 0)
> RPC:  1631 xprt_prepare_transmit
> RPC:  1631 xprt_cwnd_limited cong = 0 cwnd = 4096
> RPC:  1631 rpc_xdr_encode (status 0)
> RPC:  1631 marshaling UNIX cred ffff88087c2108c0
> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to wrap rpc data
> RPC:  1631 xprt_transmit(144)
> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88084a3437f0 len 144
> hdrlen 144
> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d10
> to map 1 segments
> RPC:       rpcrdma_create_chunks: write chunk elem
> 813@0x87bf95000:0x8a905 (more)
> RPC:       rpcrdma_register_frmr_external: Using frmr ffff88084b612d38
> to map 1 segments
> RPC:       rpcrdma_create_chunks: write chunk elem
> 152@0x84a343974:0x8aa01 (last)
> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 144 padlen
> 0 headerp 0xffff88084a343100 base 0xffff88084a343760 lkey 0x8000
> RPC:       rpcrdma_event_process: event rep ffff88084b612d10 status 0
> opcode 8 length 4294936584
> RPC:       rpcrdma_event_process: event rep ffff88084b612d38 status 0
> opcode 8 length 4294936584
> RPC:  1631 xmit complete
> RPC:  1631 sleep_on(queue "xprt_pending" time 4296115067)
> RPC:  1631 added to queue ffff88085f297258 "xprt_pending"
> RPC:  1631 setting alarm for 90000 ms
> RPC:       wake_up_first(ffff88085f297190 "xprt_sending")
> RPC:       rpcrdma_event_process: event rep ffff88085e8bf000 status 0
> opcode 80 length 128
> RPC:       rpcrdma_reply_handler: reply 0xffff88085e8bf000 completes
> request 0xffff88084a342000
>                   RPC request 0xffff88085e641c00 xid 0xa6896a3f
> RPC:       rpcrdma_count_chunks: chunk 813@0x87bf95000:0x8a905
> RPC:       rpcrdma_count_chunks: chunk 3@0x84a343974:0x8aa01
> RPC:       rpcrdma_inline_fixup: srcp 0xffff88085e8bf0a4 len 60 hdrlen 60
> RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff88085f297000,
> 0xffff88085e641c00, 876)
> RPC:  1631 xid 3f6a89a6 complete (876 bytes received)
> RPC:  1631 __rpc_wake_up_task (now 4296115067)
> RPC:  1631 disabling timer
> RPC:  1631 removed from queue ffff88085f297258 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  1631 __rpc_execute flags=0x801
> RPC:  1631 call_status (status 876)
> RPC:  1631 call_decode (status 876)
> RPC:  1631 validating UNIX cred ffff88087c2108c0
> RPC:  1631 using AUTH_UNIX cred ffff88087c2108c0 to unwrap rpc data
> NFS: server cheating in read reply: count 813 > recvd 0
> RPC:  1631 call_decode result 0
> NFS: nfs_readpage_result:  1631, (status 0)
> --> nfs4_read_done
> 
> Adding other debug prints to the kernel module told me that
> buf->page_len is set to 0 in xdr_align_pages (in file net/sunrpc/xdr.c).
> Comparing this piece of code with older kernel versions showed that the
> comparison with buf->page_len was introduces in kernel 3.7.
> By disabling this check the file transfer seems to work.
> 
> Here is the patch for kernel 3.10.17:
> 
> --- net/sunrpc/xdr.c	2013-10-18 19:44:19.000000000 +0200
> +++ net/sunrpc/xdr_new.c	2013-11-08 14:06:00.241217773 +0100
> @@ -747,7 +747,7 @@
> 		nwords = xdr->nwords;
> 		len = nwords << 2;
> 	}
> -	if (buf->page_len <= len)
> +	if (buf->page_len && buf->page_len <= len)
> 		len = buf->page_len;
> 	else if (nwords < xdr->nwords) {
> 		/* Truncate page data and move it into the tail */
> 
> 
> Can you please tell me which entity sets (or does not set) the total
> length (buf->page_len) after receiving the data from the RDMA_SEND
> operation from the server?
> 
> Please advise
> 
> Klemens

Comparing xdr_align_pages() before commit a11a2bf4 and afterwards, it appears that it no longer handles cases where buf->page_len == 0 the same way it used to.  xdr_align_pages() would never return zero unless xdr->nwords was zero.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com







Thread overview: 8+ messages
     [not found] <527CECD8.2000208@ims.co.at>
2013-11-08 14:14 ` NFS over RDMA stops working for data larger than 812 Bytes Klemens Senn
2013-11-13  2:18   ` Harry Edmon
2013-11-15 21:39 Chuck Lever
2013-11-15 22:29 ` Myklebust, Trond
2013-11-15 22:31   ` Chuck Lever
2013-11-15 22:53     ` Myklebust, Trond
2013-11-16  0:29       ` Tom Talpey
2013-11-16  6:04         ` Myklebust, Trond
