* Diagnosing and resolving bottleneck with NFS I/O
From: Jeff Wright @ 2012-06-21 19:40 UTC (permalink / raw)
  To: linux-nfs, Jeff Wright; +Cc: Chuck Lever, Craig Flaskerud, Andy Adamson

Team,

Earlier this month I posted a note about diagnosing RPC bottlenecks on 
Linux when sorting out throughput problems with NFS.  I took advantage 
of the mountstats and nfsiostat commands to watch the RPC backlog and 
server response time (RTT) in a sequence of write tests.  We also used 
netstat to watch the TCP send/receive queues on both the client and 
the server, DTrace to watch NFS read/write response time on the 
server, and iostat to watch the physical media response time on the 
server.

To simplify analysis we ran an application that timed the execution of 
a single stable pwrite64() (O_DIRECT) I/O.  We observed a large 
increase in response time at the application (time to run pwrite64()) 
compared to the response time reported by nfsiostat.  We also found 
that the response time nfsiostat measured on the client closely 
matched the back-end media response time on the server, and that the 
RPC backlog was 0.  netstat showed some data in the TCP send queue on 
the client, but nothing near overload, and very little data in the TCP 
receive queue on the server.  The I/O rates and response times in this 
test were as follows:
* Application I/O rate: 102 IOPS
* Application I/O response time: 9.9 ms
* nfsstat I/O rate: 102 IOPS
* nfsstat I/O response time (RTT): 3.8 ms
* NFS server I/O rate: 104 IOPS
* NFS server I/O response time: 1.5 ms
* Media I/O rate: 105 IOPS
* Media I/O response time: 3.2 ms
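
For reference, here is a minimal sketch of the kind of timing harness 
we used (not the exact program - the path, buffer size, and open flags 
are placeholders; O_SYNC is one way to ask for a stable write, and the 
real test may have used different flags):

/* pwtime.c - minimal sketch, not the exact test program.
 * Build: gcc -O2 -o pwtime pwtime.c   (add -lrt on older glibc) */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/mnt/nfs/testfile";
    struct timespec t0, t1;
    void *buf;
    ssize_t n;
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (posix_memalign(&buf, 4096, 4096) != 0) {   /* O_DIRECT needs alignment */
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(buf, 0xab, 4096);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = pwrite64(fd, buf, 4096, 0);                /* the single timed I/O */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (n != 4096) {
        perror("pwrite64");
        return 1;
    }
    printf("pwrite64 latency: %.3f ms\n",
           (t1.tv_sec - t0.tv_sec) * 1e3 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6);

    close(fd);
    free(buf);
    return 0;
}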

In this use case I think we have unstable writes from the NFS client 
followed by commits, and this would explain the short NFS server write 
response time, because the commit is not included.  The nfsstat 
response time matching the media response time would make sense if the 
nfsstat response time included the commit time for the I/O.  Could 
anyone verify whether the reported RTT includes the total time to 
commit the writes, or whether it covers only the write and not the 
subsequent commit?
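
Rough arithmetic on the numbers above, to frame the two cases: if the 
3.8 ms RTT already includes the commit, then about 9.9 - 3.8 ≈ 6 ms 
per I/O is going somewhere on the client between pwrite64() and the 
RPC layer; if it does not, a separate COMMIT round trip on the order 
of the 3.2 ms media time would account for part of that, still leaving 
roughly 3 ms unexplained on the client.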

If the nfsstat RTT includes the commit, then the bottleneck for this 
test appears to be between the NFS client and the pwrite64() call, 
given the large difference in response time between these two 
operations.  I don't know this part of the stack well, so I'd like to 
ask the list whether anyone has seen behavior like this before, and 
whether there is a way to instrument the code path between pwrite64() 
and the NFS write.
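
In the meantime, one thing we can do ourselves on the client is 
snapshot the per-op WRITE and COMMIT counters that mountstats and 
nfsiostat read from /proc/self/mountstats before and after a run, and 
compare the cumulative queue and RTT times against the pwrite64() 
latency.  A rough sketch of pulling out just those lines (treat my 
reading of the individual columns as an assumption - the layout here 
follows what the mountstats utility parses):

/* mountstats_ops.c - print the raw per-op WRITE and COMMIT counter
 * lines from /proc/self/mountstats.  Run before and after a test and
 * diff the cumulative counters. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/mountstats", "r");
    char line[512];

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        const char *p = line;

        while (*p == ' ' || *p == '\t')     /* skip leading whitespace */
            p++;
        if (strncmp(p, "WRITE:", 6) == 0 || strncmp(p, "COMMIT:", 7) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}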

Next, we doubled the amount of pending I/O (2 separate pwrite64() 
commands).  System throughput doubled in this case, and all of the 
response times remained about the same:
* Application I/O rate: 200 IOPS
* Application I/O response time: 10 ms
* nfsstat I/O rate: 200 IOPS
* nfsstat I/O response time (RTT): 3.7 ms
* NFS server I/O rate: 190 IOPS
* NFS server I/O response time: 1.5 ms
* Media I/O rate: 200 IOPS
* Media I/O response time: 3.5 ms

I think this shows that the bottleneck affecting the application 
response time is per-process, and that the underlying NFS subsystem is 
still not the bottleneck in the system.  Does this sound reasonable, or 
is there more interaction between the client application and the NFS 
subsystem?
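
As a rough sanity check on the per-process idea: if each process keeps 
exactly one write outstanding, Little's law gives throughput ≈ 
processes / latency, so 1 / 9.9 ms ≈ 101 IOPS for the first test and 
2 / 10 ms = 200 IOPS here - both close to what we measured, which is 
consistent with each process simply waiting out its own ~10 ms per I/O.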

We continued to scale the number of concurrent pwrite64() processes.  A 
possibly important detail is that the separate pwrite64() processes 
could update the same file.  We started to see another bottleneck with 4 
concurrent pwrite64() processes, and the bottleneck was in full force 
with 8 processes.  In both cases, the response time reported by 
nfsiostat held constant at about 3.7 ms, and this nearly matched the 
media I/O response time on the server, about 4.0 ms.  The application 
response time measured 12.6 ms with 4 pending requests and 18.6 ms 
with 8 pending requests.  I think this suggests that the NFS 
client-server interaction is clean, and that something is amiss between 
the pwrite64() call and the NFS client - does this sound right, or is 
there something else we should be looking at to sort out the bottleneck?
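
Applying the same arithmetic, 4 outstanding writes at 12.6 ms each 
would work out to roughly 4 / 0.0126 ≈ 320 IOPS and 8 at 18.6 ms to 
roughly 430 IOPS, so aggregate throughput should still climb while the 
extra latency accumulates on the client side even though the RPC RTT 
stays flat at ~3.7 ms.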

Any pointers on where you think we should check next to sort out the 
bottleneck would be appreciated.

Thanks,

Jeff





* Re: Diagnosing and resolving bottleneck with NFS I/O
From: Chuck Lever @ 2012-06-21 21:17 UTC (permalink / raw)
  To: Jeff Wright; +Cc: linux-nfs, Craig Flaskerud, Andy Adamson

Hi-

On Jun 21, 2012, at 3:40 PM, Jeff Wright wrote:

> To simplify analysis we ran an application that would time the execution of stable pwrite64() (O_DIRECT) for a single I/O.  We observed a large increase in response time at the application (time to run pwrite64()) compared to the reported response time from nfsiostat.  We also measured that the response time measured by nfsiostat on the client closely matched the back-end media response time in the server, and that the RPC backlog was 0.  netstat showed that we had some data in the TCP send queue on the client, but not overloaded, and that we had very little data in the TCP receive queue on the server.  The values of the I/O rate and response time in this test were as follows:
> * Application I/O rate: 102 IOPS
> * Application I/O response time: 9.9 ms
> * nfsstat I/O rate: 102 IOPS
> * nfsstat I/O response time (RTT): 3.8 ms
> * NFS server I/O rate: 104
> * NFS server I/O response time: 1.5 ms
> * Media I/O rate: 105 IOPS
> * Media I/O response  time: 3.2 ms
> 
> In this use case I think we have unstable writes from the NFS client followed by commits, and this would lead to the short NFS server write response time because the commit is not included.  The nfsstat response time matching the media response time would be correct if the nfsstat response time included the commit time for the I/O.  Could anyone verify if the RTT reported includes the total time to commit the writes, or if it only includes the write and not the subsequent commit?

It depends on whether the server performs a FILE_SYNC or UNSTABLE write.  nfsiostat doesn't distinguish between them, and the server is free to promote an UNSTABLE write request to a FILE_SYNC write.

If a write is UNSTABLE, then no, the "commit" time is not included in the WRITE RTT statistic.  If a write is FILE_SYNC, then the "commit" time is built into the WRITE request, and the WRITE RTT statistic includes it.

Servers generally don't promote UNSTABLE writes, but NetApp always does.  Clients seldom request FILE_SYNC writes, though they will in some circumstances.
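
For reference, the three stability levels in play are the NFSv3 
stable_how values from RFC 1813 (shown here as the XDR enum):

enum stable_how {
    UNSTABLE  = 0,  /* server may reply before data reaches stable
                       storage; the client must follow up with a COMMIT */
    DATA_SYNC = 1,  /* file data (and enough metadata to retrieve it)
                       is on stable storage before the reply            */
    FILE_SYNC = 2   /* data and all metadata are on stable storage;
                       no COMMIT is needed                               */
};

The server's WRITE reply carries a "committed" field with the same 
values, which is how an UNSTABLE request can come back promoted to 
FILE_SYNC.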

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: Diagnosing and resolving bottleneck with NFS I/O
From: Jeff Wright @ 2012-06-22 14:42 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Craig Flaskerud, Andy Adamson

On 06/21/12 15:17, Chuck Lever wrote:
> Hi-
>
> On Jun 21, 2012, at 3:40 PM, Jeff Wright wrote:
>
>> To simplify analysis we ran an application that would time the execution of stable pwrite64() (O_DIRECT) for a single I/O.  We observed a large increase in response time at the application (time to run pwrite64()) compared to the reported response time from nfsiostat.  We also measured that the response time measured by nfsiostat on the client closely matched the back-end media response time in the server, and that the RPC backlog was 0.  netstat showed that we had some data in the TCP send queue on the client, but not overloaded, and that we had very little data in the TCP receive queue on the server.  The values of the I/O rate and response time in this test were as follows:
>> * Application I/O rate: 102 IOPS
>> * Application I/O response time: 9.9 ms
>> * nfsstat I/O rate: 102 IOPS
>> * nfsstat I/O response time (RTT): 3.8 ms
>> * NFS server I/O rate: 104
>> * NFS server I/O response time: 1.5 ms
>> * Media I/O rate: 105 IOPS
>> * Media I/O response  time: 3.2 ms
>>
>> In this use case I think we have unstable writes from the NFS client followed by commits, and this would lead to the short NFS server write response time because the commit is not included.  The nfsstat response time matching the media response time would be correct if the nfsstat response time included the commit time for the I/O.  Could anyone verify if the RTT reported includes the total time to commit the writes, or if it only includes the write and not the subsequent commit?
> It depends on whether the server performs a FILE_SYNC or UNSTABLE write.  nfsiostat doesn't distinguish between them, and the server is free to promote an UNSTABLE write request to a FILE_SYNC write.
>
> If a write is UNSTABLE, then no, the "commit" time is not included in the WRITE RTT statistic.  If a write is FILE_SYNC, then the "commit" time is built into the WRITE request, and the WRITE RTT statistic includes it.
>
> Servers generally don't promote UNSTABLE writes, but NetApp always does.  Clients seldom request FILE_SYNC writes, though they will in some circumstances.
We will check what the S11 nfsd implementation is doing - thanks for the 
clarification.
>


