* Helps to Decode rpc_debug Output
@ 2013-08-15  0:14 ` Wendy Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Wendy Cheng @ 2013-08-15  0:14 UTC (permalink / raw)
  To: linux-nfs, linux-rdma

The I/O on top of an NFS-mounted directory was hanging, so I forced a
(client-side) "rpc_debug" output from the proc entry.

<6>[ 4311.590317]  1676 0001    -11 ffff8801e0bde400 ffff8801d18b1248
      0 ffffffff81420d40 nfsv3 WRITE a:call_status q:xprt_resend
... (similar lines) ...
<6>[ 4311.590435]  1682 0001    -11 ffff8801e0bde400 ffff8801d18b0e10
      0 ffffffff81420d40 nfsv3 WRITE a:call_connect_status
q:xprt_sending
... (similar lines) ...

Could someone give me an educational statement on what the above two
lines mean? More specifically, what does "call_connect_status" do, and
what does "xprt_sending" mean? Is there any way to force (i.e. hack
the code to get) a re-connect (i.e. invoke "connect" from
rpc_xprt_ops)?
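
[For background, a condensed C sketch of the client state-machine step
involved, paraphrased from the 2.6.3x-era net/sunrpc/clnt.c; the
function names are real, but the body is shortened and may differ from
any given tree. In the task dump above, "a:" is the task's next action
(tk_action) and "q:" is the rpc_wait_queue the task is sleeping on.]

/* Condensed sketch -- not verbatim kernel source. */
static void call_connect(struct rpc_task *task)
{
	struct rpc_xprt *xprt = task->tk_xprt;

	if (xprt_connected(xprt)) {
		task->tk_action = call_transmit;   /* transport ready, go send */
		return;
	}
	task->tk_action = call_connect_status;     /* seen as "a:call_connect_status" */
	if (task->tk_status < 0)
		return;
	xprt_connect(task);  /* invokes xprt->ops->connect() if needed; the task
	                      * may then sleep on a queue such as xprt->sending,
	                      * which is what "q:xprt_sending" refers to */
}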

Longer version of the question:
I'm trying to enable NFS-RDMA on an embedded system (based on a 2.6.38
kernel) as a client. The IB stack is taken from OFED 1.5.4. The NFS
server is a RHEL 6.3 Xeon box. The connection uses the Mellanox mlx4
driver. Memory registration is "RPCRDMA_ALLPHYSICAL". There have been
many issues so far, but I did manage to get the NFS mount working.
Simple file operations (such as "ls", file read/write, "scp", etc.)
seem to work as well. While trying to run iozone to see whether the
performance gain justifies the development effort, the program runs
until it reaches a 2MB file size - at that point, RDMA CM sends out a
"TIMEWAIT_EXIT" event, the xprt is disconnected, and all I/O on that
share hangs. IPoIB still works, though. Not sure what would be the
best way to debug this.

Thanks,
Wendy

* Re: Helps to Decode rpc_debug Output
  2013-08-15  0:14 ` Wendy Cheng
@ 2013-08-15 12:46     ` Tom Talpey
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Talpey @ 2013-08-15 12:46 UTC (permalink / raw)
  To: Wendy Cheng
  Cc: linux-nfs, linux-rdma

On 8/14/2013 8:14 PM, Wendy Cheng wrote:
> Longer version of the question:
> I'm trying to enable NFS-RDMA on an embedded system (based on 2.6.38
> kernel) as a client. The IB stacks are taken from OFED 1.5.4. NFS
> server is a RHEL 6.3 Xeon box. The connection uses mellox-4 driver.
> Memory registration is "RPCRDMA_ALLPHYSICAL". There are many issues so
> far but I do manage to get nfs mount working. Simple file operations
> (such as "ls", file read/write, "scp", etc) seem to work as well.

You're probably seeing connection loss from bad RDMA handles, which
come into play when you send large r/w traffic. The fact that small
I/O such as a simple ls works, and that non-NFS traffic such as scp
works, means the network itself is ok.

 > While trying to run iozone to see whether the performance gain can be
 > justified for the development efforts, the program runs until it
 > reaches 2MB file size - at that point, RDMA CM sends out
 > "TIMEWAIT_EXIT" event, the xprt is disconnected, and all IOs on that
 > share hang. IPOIB still works though. Not sure what would be the best
 > way to debug this.

I would suggest enabling RPC "transport" debugging, and any tracing
in the IB stack itself, looking to see if you can find any patterns.
You may want to look at the server side, too.

Unfortunately, because you are using Infiniband, packet capture is
going to be next to impossible. You might try using an iWARP adapter,
or one which can be sniffed, if you suspect a traffic issue.

Why did you replace the Linux IB stack with OFED? Did you also take the
NFS/RDMA from that package, and if so are you sure that it all is
working properly? Doesn't 2.6.38 already have all this?

Tom.

* Re: Helps to Decode rpc_debug Output
  2013-08-15 12:46     ` Tom Talpey
@ 2013-08-15 18:08         ` Wendy Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Wendy Cheng @ 2013-08-15 18:08 UTC (permalink / raw)
  To: Tom Talpey
  Cc: linux-nfs, linux-rdma

On Thu, Aug 15, 2013 at 5:46 AM, Tom Talpey <tom@talpey.com> wrote:
> On 8/14/2013 8:14 PM, Wendy Cheng wrote:
>>
>> Longer version of the question:
>> I'm trying to enable NFS-RDMA on an embedded system (based on 2.6.38
>> kernel) as a client. The IB stacks are taken from OFED 1.5.4. NFS
>> server is a RHEL 6.3 Xeon box. The connection uses mellox-4 driver.
>> Memory registration is "RPCRDMA_ALLPHYSICAL". There are many issues so
>> far but I do manage to get nfs mount working. Simple file operations
>> (such as "ls", file read/write, "scp", etc) seem to work as well.
>
[snip]

> Why did you replace the Linux IB stack with OFED? Did you also take the
> NFS/RDMA from that package, and if so are you sure that it all is
> is working properly? Doesn't 2.6.38 already have all this?
>

The other part of the cluster runs OFED 1.5.4 on top of RHEL 6.3 - it
was a product decision. Ditto for the 2.6.38-based uOS.

The OFED 1.5.4-based NFS/RDMA (i.e. xprtrdma) did not run on either
platform at first. It took a while to understand the setup. I believe
the issues with the RHEL boxes have been fixed - at least iozone runs
through (as client and server) without trouble. Now the issue is with
this 2.6.38 uOS (as client) talking to RHEL 6.3 (as server). I don't
know much about NFS v4, so the focus is on v3.

-- Wendy

* Re: Helps to Decode rpc_debug Output
  2013-08-15 18:08         ` Wendy Cheng
@ 2013-08-21 15:55             ` Wendy Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Wendy Cheng @ 2013-08-21 15:55 UTC (permalink / raw)
  To: Tom Talpey
  Cc: linux-nfs, linux-rdma

On Thu, Aug 15, 2013 at 11:08 AM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
> On Thu, Aug 15, 2013 at 5:46 AM, Tom Talpey <tom@talpey.com> wrote:
>> On 8/14/2013 8:14 PM, Wendy Cheng wrote:
>>>
>>> Longer version of the question:
>>> I'm trying to enable NFS-RDMA on an embedded system (based on 2.6.38
>>> kernel) as a client. The IB stacks are taken from OFED 1.5.4. NFS
>>> server is a RHEL 6.3 Xeon box. The connection uses mellox-4 driver.
>>> Memory registration is "RPCRDMA_ALLPHYSICAL". There are many issues so
>>> far but I do manage to get nfs mount working. Simple file operations
>>> (such as "ls", file read/write, "scp", etc) seem to work as well.
>>

Yay ... got this up .. amazingly on a uOS that does not have much of
the conventional kernel debug facilities.

The hang was caused by auto-disconnect, triggered by xprt->timer. The
task was carried out by xprt_init_autodisconnect(), which silently
disconnects the xprt without any visible warning. The uOS runs on
small-core (slower) hardware. Instead of a hard-coded number, this
timeout value needs to be at least a "proc" tunable. I will check newer
kernels to see whether it has been improved and/or draft a patch later.
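
[For reference, a condensed sketch of that mechanism, paraphrased from
the 2.6.3x-era net/sunrpc/xprt.c; not verbatim, and the constant name
and exact flow may differ by tree. The idle timer is re-armed roughly
as mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout) when
the last request is released, and xprtrdma sets xprt->idle_timeout to
a compile-time constant (on the order of five minutes) - the "hard
number" mentioned above.]

/* Condensed sketch -- not verbatim kernel source. */
static void xprt_init_autodisconnect(unsigned long data)
{
	struct rpc_xprt *xprt = (struct rpc_xprt *)data;

	spin_lock(&xprt->transport_lock);
	if (!list_empty(&xprt->recv))		/* requests outstanding? do nothing */
		goto out_abort;
	if (test_and_set_bit(XPRT_LOCKED, &xprt->state))
		goto out_abort;
	spin_unlock(&xprt->transport_lock);
	/* quietly schedule a close of the idle transport; nothing is logged */
	set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
	queue_work(rpciod_workqueue, &xprt->task_cleanup);
	return;
out_abort:
	spin_unlock(&xprt->transport_lock);
}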

One thing I'm still scratching my head over is that, looking at the
raw IOPS, I don't see a dramatic difference between NFS-RDMA and NFS
over IPoIB (TCP). However, the total run time differs greatly. NFS
over RDMA seems to take a much longer time to finish (vs. NFS over
IPoIB). Not sure why that is .... Maybe the constant
connect/disconnect triggered by reestablish_timeout? Connection
re-establishment is known to be expensive on this uOS. Why do we need
two sets of timeouts, where
1. xprt->timer disconnects (without reconnecting), and
2. reestablish_timeout constantly disconnects/re-connects?
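
[And, for contrast, a rough sketch of how the RDMA transport's connect
op uses reestablish_timeout, paraphrased from memory of that era's
net/sunrpc/xprtrdma/transport.c; the member names and back-off bounds
here are approximations, not verbatim. The two timers do different
jobs: xprt->timer is a pure idle disconnect with no reconnect, while
reestablish_timeout only delays, and exponentially backs off, the next
connect attempt after a disconnect.]

/* Rough sketch -- member names and bounds are approximations. */
static void xprt_rdma_connect(struct rpc_task *task)
{
	struct rpc_xprt *xprt = task->tk_xprt;
	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);

	if (r_xprt->rx_ep.rep_connected != 0) {
		/* was connected before: wait reestablish_timeout before the
		 * reconnect attempt, then double the delay (up to a cap) */
		schedule_delayed_work(&r_xprt->rdma_connect,
				      xprt->reestablish_timeout);
		xprt->reestablish_timeout <<= 1;
	} else {
		/* first connect: no delay */
		schedule_delayed_work(&r_xprt->rdma_connect, 0);
	}
}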

-- Wendy

* Re: Helps to Decode rpc_debug Output
  2013-08-21 15:55             ` Wendy Cheng
@ 2013-08-26 13:22                 ` Tom Talpey
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Talpey @ 2013-08-26 13:22 UTC (permalink / raw)
  To: Wendy Cheng
  Cc: linux-nfs, linux-rdma

On 8/21/2013 11:55 AM, Wendy Cheng wrote:
> On Thu, Aug 15, 2013 at 11:08 AM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
>> On Thu, Aug 15, 2013 at 5:46 AM, Tom Talpey <tom@talpey.com> wrote:
>>> On 8/14/2013 8:14 PM, Wendy Cheng wrote:
>>>>
>>>> Longer version of the question:
>>>> I'm trying to enable NFS-RDMA on an embedded system (based on 2.6.38
>>>> kernel) as a client. The IB stacks are taken from OFED 1.5.4. NFS
>>>> server is a RHEL 6.3 Xeon box. The connection uses mellox-4 driver.
>>>> Memory registration is "RPCRDMA_ALLPHYSICAL". There are many issues so
>>>> far but I do manage to get nfs mount working. Simple file operations
>>>> (such as "ls", file read/write, "scp", etc) seem to work as well.
>>>
>
> Yay ... got this up .. amazingly on a uOS that does not have much of
> the conventional kernel debug facilities.

Congrats!

> One thing I'm still scratching my head is that ... by looking at the
> raw IOPS, I don't see dramatic difference between NFS-RDMA vs. NFS
> over IPOIB (TCP).

Sounds like your bottleneck lies in some other component. What's the
storage, for example? RDMA won't do a thing to improve a slow disk.
Or, what kind of IOPS rate are you seeing? If these systems aren't
generating enough load to push a CPU limit, then shifting the protocol
on the same link might not yield much.

> However, the total run time differs greatly. NFS
> over RDMA seems to take a much longer time to finish (vs. NFS over
> IPOIB). Not sure why is that .... Maybe by the constant
> connect/disconnect triggered by reestablish_timeout ? The connection
> re-establish is known to be expensive on this uOS.

Um, yes, of course. Fix that before drawing any conclusions.

Tom.

* Re: Helps to Decode rpc_debug Output
  2013-08-26 13:22                 ` Tom Talpey
@ 2013-08-26 17:08                     ` Wendy Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Wendy Cheng @ 2013-08-26 17:08 UTC (permalink / raw)
  To: Tom Talpey
  Cc: linux-nfs, linux-rdma

On Mon, Aug 26, 2013 at 6:22 AM, Tom Talpey <tom@talpey.com> wrote:
> On 8/21/2013 11:55 AM, Wendy Cheng wrote:
>>
>> On Thu, Aug 15, 2013 at 11:08 AM, Wendy Cheng <s.wendy.cheng@gmail.com>
>> wrote:
>>>
>>> On Thu, Aug 15, 2013 at 5:46 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>> On 8/14/2013 8:14 PM, Wendy Cheng wrote:
>>>>>
>>>>>
>>>>> Longer version of the question:
>>>>> I'm trying to enable NFS-RDMA on an embedded system (based on 2.6.38
>>>>> kernel) as a client. The IB stacks are taken from OFED 1.5.4. NFS
>>>>> server is a RHEL 6.3 Xeon box. The connection uses mellox-4 driver.
>>>>> Memory registration is "RPCRDMA_ALLPHYSICAL". There are many issues so
>>>>> far but I do manage to get nfs mount working. Simple file operations
>>>>> (such as "ls", file read/write, "scp", etc) seem to work as well.
>>>>

>> One thing I'm still scratching my head is that ... by looking at the
>> raw IOPS, I don't see dramatic difference between NFS-RDMA vs. NFS
>> over IPOIB (TCP).
>
>
> Sounds like your bottleneck lies in some other component. What's the
> storage, for example? RDMA won't do a thing to improve a slow disk.
> Or, what kind of IOPS rate are you seeing? If these systems aren't
> generating enough load to push a CPU limit, then shifting the protocol
> on the same link might not yield much.

There is no kernel profiling tool with this uOS (yet), so it is hard
to identify the bottleneck. On the surface, the slowdown seems to come
from SUNRPC's Van Jacobson congestion control
(xprt_reserve_xprt_cong()), which either creates a race condition that
lets the transmissions (write/commit) miss their wake-up(s), or is
simply not the right algorithm for this client system, which consists
of many (244 on my system) slower CPU cores.
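
[For reference, a condensed sketch of that reserve path, paraphrased
from the 2.6.3x-era net/sunrpc/xprt.c; not verbatim. A task only gets
to transmit if it can both take the transport lock and get a
congestion-window slot; otherwise it is put to sleep on xprt->sending
or xprt->resend - which is exactly where the task dump at the start of
this thread showed the WRITEs waiting (q:xprt_sending / q:xprt_resend).]

/* Condensed sketch -- not verbatim kernel source. */
int xprt_reserve_xprt_cong(struct rpc_task *task)
{
	struct rpc_xprt *xprt = task->tk_xprt;
	struct rpc_rqst *req = task->tk_rqstp;

	if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) {
		if (task == xprt->snd_task)
			return 1;
		goto out_sleep;
	}
	if (__xprt_get_cong(xprt, task)) {	/* room in the congestion window? */
		xprt->snd_task = task;
		return 1;
	}
	xprt_clear_locked(xprt);
out_sleep:
	/* no slot: park the task until a reply frees congestion-window space */
	task->tk_timeout = 0;
	task->tk_status = -EAGAIN;
	if (req != NULL && req->rq_ntrans)
		rpc_sleep_on(&xprt->resend, task, NULL);	/* "q:xprt_resend" */
	else
		rpc_sleep_on(&xprt->sending, task, NULL);	/* "q:xprt_sending" */
	return 0;
}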

Solid state drives are used on the RHEL server.

-- Wendy
