* Re: [nfsv4]nfs client bug
       [not found] <BANLkTi=xcQseTx8BTWEzg-1DO=ayJuMLrw@mail.gmail.com>
@ 2011-06-29 16:28 ` Benny Halevy
  2011-06-30  2:32   ` quanli gui
  0 siblings, 1 reply; 11+ messages in thread
From: Benny Halevy @ 2011-06-29 16:28 UTC (permalink / raw)
  To: quanli gui; +Cc: linux-nfs, Mueller, Brian

Hi,

First, please use plain text only when sending to linux-nfs@vger.kernel.org
as multi-part / html messages are automatically blocked by the spam filter.

I'm not so sure that the nfs client is to blame for the performance
you're seeing.  The problem could arise from too small a block size
used by dd / iozone.

I'd try:
a. using a larger block size (e.g. dd bs=4096k)
b. tuning your TCP stack for high bandwidth
c. using jumbo frames all the way, and making sure that the MTU is discovered
automatically and set properly to 9000.
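For concreteness, suggestion (a) might look like the following; the output path and sizes are illustrative (a real run would target a file on the NFS mount):

```shell
# Larger block size: dd issues 4 MB sequential writes instead of the 512-byte default.
# A temporary local file stands in for the NFS-mounted path, purely for illustration.
OUT=$(mktemp)
dd if=/dev/zero of="$OUT" bs=4096k count=16   # 64 MB total in 4 MB blocks
rm -f "$OUT"
```

For suggestion (c), the MTU would be raised with something like `ifconfig eth0 mtu 9000` (the interface name is an assumption).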

Also, what does your network look like?
What switch are you using?
Is it indeed 10 Gbps non-blocking?
Are there any linecard / chip bottlenecks or oversubscription?

Do you see better throughput with iperf?

Brian, you probably have even more tips and tricks :)

Regards,

Benny

On 2011-06-28 11:26, quanli gui wrote:
> Hi,
> Recently I tested the NFSv4 speed and found a problem in the NFS client:
> one NFS client can only deliver about 400 MB/s to the servers.
> My tests are as follows:
> machines: one client, four servers; hardware: all 16-core, 16 GB memory, 5 TB disk;
> OS: all SUSE 11 Enterprise Server, 2.6.31 pNFS kernel; network: client
> 10GbE, servers 2GbE (bonded, 2 x 1GbE);
> test method: on the client, make four independent directories and mount the four
> servers via the NFSv4 protocol, adding one server at a time;
> test tools: iozone, or dd if=/dev/zero of=test count=20K, then cat
> test > /dev/null;
> test results: (focusing on read speed, watching the client/server network
> input/output with the sar command)
> 1 client vs 1 server: 200 MB/s
> 1 client vs 2 servers: 380 MB/s; each server: 190 MB/s
> 1 client vs 3 servers: 380 MB/s; each server: 130 MB/s
> 1 client vs 4 servers: 385 MB/s; each server: 95 MB/s
> 
> From the above, 400 MB/s appears to be the maximum speed for one client. Is this
> a hard limit? How can we increase it?
> 



* Re: [nfsv4]nfs client bug
  2011-06-29 16:28 ` [nfsv4]nfs client bug Benny Halevy
@ 2011-06-30  2:32   ` quanli gui
  2011-06-30 13:36     ` Andy Adamson
  0 siblings, 1 reply; 11+ messages in thread
From: quanli gui @ 2011-06-30  2:32 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, Mueller, Brian

When I use the iperf tool from one client to the 4 data servers, the network
throughput is 890 MB/s. That shows the network is indeed 10GbE and non-blocking.

a. About block size: I use bs=1M with dd.
b. We do use TCP (doesn't NFSv4 use TCP by default?)
c. What are jumbo frames? How do I set the MTU automatically?

Brian, do you have some more tips?


On Thu, Jun 30, 2011 at 12:28 AM, Benny Halevy <bhalevy@tonian.com> wrote:
>
> Hi,
>
> First, please use plain text only when sending to linux-nfs@vger.kernel.org
> as multi-part / html messages are automatically blocked by the spam filter.
>
> I'm not so sure that the nfs client is to blame for the performance
> you're seeing.  The problem could arise from too small of a block size
> by dd / iozone
>
> I'd try:
> a. using a larger block size (e.g.  dd bs=4096k)
> b. tuning your tcp better for high bandwidth
> c. using jumbo frames all the way, and making sure that the mtu is discovered
> automatically and set properly to 9000.
>
> Also, what's you network look like?
> what's the switch you're using
> is it indeed 10 Gbps non-blocking
> are there any linecard / chip bottlenecks or over subscription
>
> Do you see better throughput with iperf?
>
> Brian, you's probably have even more tips and tricks :)
>
> Regards,
>
> Benny
>
> On 2011-06-28 11:26, quanli gui wrote:
> > Hi,
> > Recently I test the nfsv4 speed, I found that there is something wrong in
> > the nfs client, that is the one nfs client can only provide 400MB/S to the
> > server.
> > My tests as follow:
> > machine:one client, four server; hardware: all 16core, 16G memory, 5T disk;
> > os: all suse 11 enterprise server, 2.6.31-pnfs-kernel; network: client,
> > 10GE, server, 2GE(bond, 1GE*2);
> > test method: on the client, mkdir four independent directory, mount the four
> > server via nfsv4 protocol, every time increase one;
> > test tool: iozone, or dd if=/dev/zero of=test count=20K,then cat
> > test>/dev/null
> > test result:(force on read speed, and watch the client/server network
> > input/output by the sar command)
> > 1 client vs 1 server: 200MB/S
> > 1 client vs 2 server: 380MB/S, every server: 190MB/S
> > 1 client vs 3 server: 380MB/S, every server: 130MB/S
> > 1 client vs 4 server: 385MB/S, every server: 95MB/S
> >
> > From above, we found that 400MB/S is the max-speed for one client. This
> > speed is the limition? How to increase this speed?
> >
>


* Re: [nfsv4]nfs client bug
  2011-06-30  2:32   ` quanli gui
@ 2011-06-30 13:36     ` Andy Adamson
  2011-06-30 14:24       ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Adamson @ 2011-06-30 13:36 UTC (permalink / raw)
  To: quanli gui; +Cc: Benny Halevy, linux-nfs, Mueller, Brian


On Jun 29, 2011, at 10:32 PM, quanli gui wrote:

> When I use the iperf tools for one client to 4 ds, the network
> throughput is 890MB/S. It reflect that it is indeed 10GE non-blocking.
> 
> a. about block size, I use bs=1M when I use dd
> b. we indeed use the tcp (doesn't the nfsv4 use the tcp defaultly?)
> c. the jumbo frames is what? how set mtu automatically?
> 
> Brian, do you have some more tips?

1) Set the MTU on both the client and the server 10G interfaces. Sometimes 9000 is too high; my setup uses 8000.
To set the MTU on interface eth0:

% ifconfig eth0 mtu 9000

iperf will report the MTU of the full path between client and server - use it to verify the MTU of the connection.

2) Increase the # of rpc_slots on the client.
% echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries

3) Increase the # of server threads

% echo 128 > /proc/fs/nfsd/threads
% service nfs restart

4) Ensure the TCP buffers on both the client and the server are large enough for the TCP window.
Calculate the required buffer size by pinging the server from the client with the MTU packet size and multiplying the round-trip time by the interface capacity.

% ping -s 9000 server   (say the average round-trip time is 108 ms)

10 Gbit/s = 1,250,000,000 bytes/sec; 1,250,000,000 bytes/sec * 0.108 sec = 135,000,000 bytes

Use this number to set the following:
% sysctl -w net.core.rmem_max=135000000
% sysctl -w net.core.wmem_max=135000000
% sysctl -w net.ipv4.tcp_rmem="<first number unchanged> <second number unchanged> 135000000"
% sysctl -w net.ipv4.tcp_wmem="<first number unchanged> <second number unchanged> 135000000"

5) mount with rsize=131072,wsize=131072
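The buffer-size arithmetic in step 4 can be sketched as a quick shell calculation, using the example numbers above (10 Gbit/s link, 108 ms average RTT):

```shell
# Bandwidth-delay product from step 4: buffer bytes = link rate in bytes/s * RTT in s.
# These are the example values from above; substitute your own measured ping time.
LINK_BITS_PER_SEC=10000000000   # 10 Gbit/s
RTT_MS=108                      # average round-trip time from ping, in milliseconds
# bytes/s = bits/s / 8; multiply by the RTT (converting ms to s) in integer arithmetic
BUF=$(( LINK_BITS_PER_SEC / 8 * RTT_MS / 1000 ))
echo "$BUF"   # 135000000
```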

See if this helps.

-->Andy




* Re: [nfsv4]nfs client bug
  2011-06-30 13:36     ` Andy Adamson
@ 2011-06-30 14:24       ` Trond Myklebust
  2011-06-30 15:13         ` Benny Halevy
  2011-06-30 15:52         ` quanli gui
  0 siblings, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2011-06-30 14:24 UTC (permalink / raw)
  To: Andy Adamson; +Cc: quanli gui, Benny Halevy, linux-nfs, Mueller, Brian

On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote: 
> On Jun 29, 2011, at 10:32 PM, quanli gui wrote:
> 
> > When I use the iperf tools for one client to 4 ds, the network
> > throughput is 890MB/S. It reflect that it is indeed 10GE non-blocking.
> > 
> > a. about block size, I use bs=1M when I use dd
> > b. we indeed use the tcp (doesn't the nfsv4 use the tcp defaultly?)
> > c. the jumbo frames is what? how set mtu automatically?
> > 
> > Brian, do you have some more tips?
> 
> 1) Set the mtu on both the client and the server 10G interface. Sometimes 9000 is too high. My setup uses 8000.
> To set MTU on interface eth0.
> 
> % ifconfig eth0 mtu 9000
> 
> iperf will report the MTU of the full path between client and server - use it to verify the MTU of the connection.
> 
> 2) Increase the # of rpc_slots on the client.
> % echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
> 
> 3) Increase the # of server threads
> 
> % echo 128 > /proc/fs/nfsd/threads
> % service nfs restart
> 
> 4) Ensure the TCP buffers on both the client and the server are large enough for the TCP window.
> Calculate the required buffer size by pinging the server from the client with the MTU packet size and multiply the round trip time by the interface capacity
> 
> % ping -s 9000 server  - say 108 ms average
> 
> 10Gbits/sec = 1,250,000,000 Bytes/sec * .108 sec = 135,000,000 bytes
> 
> Use this number to set the following: 
> sysctl -w net.core.rmem_max = 135000000
> sysctl -w net.core.wmem_max 135000000
> sysctl -w "net.ipv4.tcp_rmem <first number unchaged> <second unchanged> 135000000"
> sysctl net.ipv4.tcp_wmem  <first number unchaged> <second unchanged> 135000000"
> 
> 5) mount with rsize=131072,wsize=131072

6) Note that NFS always guarantees that the file is _on_disk_ after
close(), so if you are using 'dd' to test, then you should be using the
'conv=fsync' flag (i.e. 'dd if=/dev/zero of=test count=20k conv=fsync')
in order to obtain a fair comparison between the NFS and local disk
performance. Otherwise, you are comparing NFS and local _pagecache_
performance.
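A sketch of the comparison: on a local filesystem the first command may time only the pagecache, while conv=fsync forces the data to stable storage before dd exits. A small temporary file is used here purely for illustration; the real test would write to the NFS mount.

```shell
# Without conv=fsync, a local-filesystem run may measure pagecache speed only.
OUT=$(mktemp)
dd if=/dev/zero of="$OUT" bs=1M count=64
# With conv=fsync, dd fsync()s the output file before exiting, so the timing
# includes the flush to stable storage.
dd if=/dev/zero of="$OUT" bs=1M count=64 conv=fsync
rm -f "$OUT"
```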

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [nfsv4]nfs client bug
  2011-06-30 14:24       ` Trond Myklebust
@ 2011-06-30 15:13         ` Benny Halevy
  2011-06-30 15:35           ` Trond Myklebust
  2011-06-30 15:52         ` quanli gui
  1 sibling, 1 reply; 11+ messages in thread
From: Benny Halevy @ 2011-06-30 15:13 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andy Adamson, quanli gui, Benny Halevy, linux-nfs, Mueller, Brian

On 2011-06-30 17:24, Trond Myklebust wrote:
> On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote: 
>> On Jun 29, 2011, at 10:32 PM, quanli gui wrote:
>> [...]
> 
> 6) Note that NFS always guarantees that the file is _on_disk_ after
> close(), so if you are using 'dd' to test, then you should be using the
> 'conv=fsync' flag (i.e 'dd if=/dev/zero of=test count=20k conv=fsync')
> in order to obtain a fair comparison between the NFS and local disk
> performance. Otherwise, you are comparing NFS and local _pagecache_
> performance.

FWIW, modern versions of GNU dd (I'm not sure exactly which version changed this)
calculate and report throughput after close()ing the output file.

Benny

> 
> Trond



* Re: [nfsv4]nfs client bug
  2011-06-30 15:13         ` Benny Halevy
@ 2011-06-30 15:35           ` Trond Myklebust
  2011-06-30 15:42             ` Benny Halevy
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2011-06-30 15:35 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Andy Adamson, quanli gui, Benny Halevy, linux-nfs, Mueller, Brian

On Thu, 2011-06-30 at 18:13 +0300, Benny Halevy wrote: 
> On 2011-06-30 17:24, Trond Myklebust wrote:
> > On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote: 
> >> [...]
> > 
> > 6) Note that NFS always guarantees that the file is _on_disk_ after
> > close(), so if you are using 'dd' to test, then you should be using the
> > 'conv=fsync' flag (i.e 'dd if=/dev/zero of=test count=20k conv=fsync')
> > in order to obtain a fair comparison between the NFS and local disk
> > performance. Otherwise, you are comparing NFS and local _pagecache_
> > performance.
> 
> FWIW, modern versions of gnu dd (not sure exactly which version changed that)
> calculate and report throughput after close()ing the output file.

...but not after syncing it unless you explicitly request that.

On most (all?) local filesystems, close() does not imply fsync().

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [nfsv4]nfs client bug
  2011-06-30 15:35           ` Trond Myklebust
@ 2011-06-30 15:42             ` Benny Halevy
  0 siblings, 0 replies; 11+ messages in thread
From: Benny Halevy @ 2011-06-30 15:42 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andy Adamson, quanli gui, Benny Halevy, linux-nfs, Mueller, Brian

On 2011-06-30 18:35, Trond Myklebust wrote:
> On Thu, 2011-06-30 at 18:13 +0300, Benny Halevy wrote: 
>> On 2011-06-30 17:24, Trond Myklebust wrote:
>>> On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote: 
>>>> [...]
>>>
>>> 6) Note that NFS always guarantees that the file is _on_disk_ after
>>> close(), so if you are using 'dd' to test, then you should be using the
>>> 'conv=fsync' flag (i.e 'dd if=/dev/zero of=test count=20k conv=fsync')
>>> in order to obtain a fair comparison between the NFS and local disk
>>> performance. Otherwise, you are comparing NFS and local _pagecache_
>>> performance.
>>
>> FWIW, modern versions of gnu dd (not sure exactly which version changed that)
>> calculate and report throughput after close()ing the output file.
> 
> ...but not after syncing it unless you explicitly request that.
> 
> On most (all?) local filesystems, close() does not imply fsync().

Right.  My point is that for benchmarking NFS, conv=fsync won't show
any noticeable difference. We're in complete agreement that it's required
for benchmarking local file system performance.

Benny

> 
> Trond



* Re: [nfsv4]nfs client bug
  2011-06-30 14:24       ` Trond Myklebust
  2011-06-30 15:13         ` Benny Halevy
@ 2011-06-30 15:52         ` quanli gui
  2011-06-30 15:57           ` Trond Myklebust
  2011-06-30 16:26           ` Andy Adamson
  1 sibling, 2 replies; 11+ messages in thread
From: quanli gui @ 2011-06-30 15:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andy Adamson, Benny Halevy, linux-nfs, Mueller, Brian

Thanks for your tips; I will test with them.

But I still have a question about NFSv4 performance: is the slowness I
measured caused by the NFSv4 client code itself? Do you have any test
results for NFSv4 performance?

On Thu, Jun 30, 2011 at 10:24 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> [...]


* Re: [nfsv4]nfs client bug
  2011-06-30 15:52         ` quanli gui
@ 2011-06-30 15:57           ` Trond Myklebust
  2011-06-30 16:26           ` Andy Adamson
  1 sibling, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2011-06-30 15:57 UTC (permalink / raw)
  To: quanli gui; +Cc: Andy Adamson, Benny Halevy, linux-nfs, Mueller, Brian

On Thu, 2011-06-30 at 23:52 +0800, quanli gui wrote: 
> Thanks for your tips. I will try to test by using the tips.
> 
> But I have a question about the nfsv4 performace indeed because of the
> nfsv4 code, that is because the nfsv4 client code, the performace I
> tested is slow. Do you have some test result about the nfsv4
> performance?

Define "slow". Do you mean "slow relative to NFSv3" or is there some
other benchmark you are using?

On my setup, NFSv4 performance is roughly equivalent to NFSv3, but my
workloads are probably different to yours.

Trond


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [nfsv4]nfs client bug
  2011-06-30 15:52         ` quanli gui
  2011-06-30 15:57           ` Trond Myklebust
@ 2011-06-30 16:26           ` Andy Adamson
  2011-06-30 16:57             ` Ben Greear
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Adamson @ 2011-06-30 16:26 UTC (permalink / raw)
  To: quanli gui; +Cc: Trond Myklebust, Benny Halevy, linux-nfs, Mueller, Brian


On Jun 30, 2011, at 11:52 AM, quanli gui wrote:

> Thanks for your tips. I will try to test by using the tips.
> 
> But I have a question about the nfsv4 performace indeed because of the
> nfsv4 code, that is because the nfsv4 client code, the performace I
> tested is slow. Do you have some test result about the nfsv4
> performance?


I'm just beginning to test the NFSv4.0 Linux client against a Linux server.  Both are Fedora 13 with the 3.0-rc1 kernel and 10G interfaces.

I'm getting ~5 Gb/s with iperf and ~3.5 Gb/s on NFSv4.0 READs using iozone. Much more testing/tuning to do.
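An iozone read-throughput run of this kind might be invoked along the following lines; the record size, file size, thread count, and paths are illustrative assumptions, not the exact parameters used:

```shell
# Hypothetical iozone invocation: -i 0 writes the test files, -i 1 measures read;
# -r is the record (block) size, -s the per-file size, -t the number of threads,
# and -F lists one file per thread (here, files on the NFS mount).
iozone -i 0 -i 1 -r 128k -s 1g -t 4 \
    -F /mnt/nfs/f1 /mnt/nfs/f2 /mnt/nfs/f3 /mnt/nfs/f4
```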

-->Andy


* Re: [nfsv4]nfs client bug
  2011-06-30 16:26           ` Andy Adamson
@ 2011-06-30 16:57             ` Ben Greear
  0 siblings, 0 replies; 11+ messages in thread
From: Ben Greear @ 2011-06-30 16:57 UTC (permalink / raw)
  To: Andy Adamson
  Cc: quanli gui, Trond Myklebust, Benny Halevy, linux-nfs, Mueller, Brian

On 06/30/2011 09:26 AM, Andy Adamson wrote:
>
> On Jun 30, 2011, at 11:52 AM, quanli gui wrote:
>
>> Thanks for your tips. I will try to test by using the tips.
>>
>> But I have a question about the nfsv4 performace indeed because of the
>> nfsv4 code, that is because the nfsv4 client code, the performace I
>> tested is slow. Do you have some test result about the nfsv4
>> performance?
>
>
> I'm just beginning testing NFSv4.0 Linux client to Linux server.  Both are Fedora 13 with the 3.0-rc1 kernel and 10G interfaces.
>
> I'm getting ~ 5Gb/sec READs with iperf and ~3.5Gb/sec READs with NFSv4.0 using iozone. Much more testing/tuning to do.

We've almost saturated two 10G links (about 17Gbps total) using older (maybe 2.6.34 or so) kernels with Linux clients and
Linux servers.  We use a RAM FS on the server side to make sure disk access isn't a problem,
and fast 10G NICs with TCP offload enabled (Intel 82599, 5GT/s pci-e bus).

We haven't benchmarked this particular setup lately...
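A RAM-backed export of the kind mentioned above can be set up roughly like this (sketch only; paths, sizes, and export options are illustrative, not from the thread):

```shell
# On the server: back the export with tmpfs so disk I/O is out of the picture.
mkdir -p /export/ramfs
mount -t tmpfs -o size=8g tmpfs /export/ramfs
echo '/export/ramfs *(rw,no_root_squash)' >> /etc/exports
exportfs -ra

# On the client: mount it over NFSv4.
mkdir -p /mnt/ram
mount -t nfs4 -o rsize=131072,wsize=131072 server:/export/ramfs /mnt/ram
```

Anything written there disappears on unmount or reboot, which is exactly what you want for a throughput benchmark.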

Thanks,
Ben

>
> -->Andy
>>
>> On Thu, Jun 30, 2011 at 10:24 PM, Trond Myklebust
>> <Trond.Myklebust@netapp.com>  wrote:
>>> On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote:
>>>> On Jun 29, 2011, at 10:32 PM, quanli gui wrote:
>>>>
>>>>> When I use the iperf tool from one client to 4 DSes, the network
>>>>> throughput is 890MB/s. That confirms the link is indeed 10GE
>>>>> non-blocking.
>>>>>
>>>>> a. about block size, I use bs=1M with dd
>>>>> b. we do use TCP (doesn't NFSv4 use TCP by default?)
>>>>> c. what are jumbo frames? how do I set the MTU automatically?
>>>>>
>>>>> Brian, do you have some more tips?
>>>>
>>>> 1) Set the mtu on both the client and the server 10G interface. Sometimes 9000 is too high. My setup uses 8000.
>>>> To set MTU on interface eth0.
>>>>
>>>> % ifconfig eth0 mtu 9000
>>>>
>>>> iperf will report the MTU of the full path between client and server - use it to verify the MTU of the connection.
>>>>
>>>> 2) Increase the # of rpc_slots on the client.
>>>> % echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
>>>>
>>>> 3) Increase the # of server threads
>>>>
>>>> % echo 128 > /proc/fs/nfsd/threads
>>>> % service nfs restart
>>>>
>>>> 4) Ensure the TCP buffers on both the client and the server are large enough for the TCP window.
>>>> Calculate the required buffer size by pinging the server from the client with the MTU packet size, then multiplying the round-trip time by the interface capacity.
>>>>
>>>> % ping -s 9000 server  - say 108 ms average
>>>>
>>>> 10 Gbit/s = 1,250,000,000 bytes/sec; 1,250,000,000 bytes/sec * 0.108 sec = 135,000,000 bytes
>>>>
>>>> Use this number to set the following:
>>>> sysctl -w net.core.rmem_max=135000000
>>>> sysctl -w net.core.wmem_max=135000000
>>>> sysctl -w net.ipv4.tcp_rmem="<first number unchanged> <second number unchanged> 135000000"
>>>> sysctl -w net.ipv4.tcp_wmem="<first number unchanged> <second number unchanged> 135000000"
>>>>
>>>> 5) mount with rsize=131072,wsize=131072
>>>
>>> 6) Note that NFS always guarantees that the file is _on_disk_ after
>>> close(), so if you are using 'dd' to test, then you should be using the
>>> 'conv=fsync' flag (i.e. 'dd if=/dev/zero of=test count=20k conv=fsync')
>>> in order to obtain a fair comparison between the NFS and local disk
>>> performance. Otherwise, you are comparing NFS and local _pagecache_
>>> performance.
>>>
>>> Trond
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer
>>>
>>> NetApp
>>> Trond.Myklebust@netapp.com
>>> www.netapp.com
>>>
>>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
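Taken together, the tuning steps quoted in this thread boil down to roughly the following (interface name, thread count, and buffer sizes are the thread's example values, not verified defaults; size the buffers from your own measured RTT):

```shell
# --- Client side ---
ifconfig eth0 mtu 9000                              # step 1: jumbo frames (try 8000 if 9000 misbehaves)
echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries  # step 2: more in-flight RPC slots
sysctl -w net.core.rmem_max=135000000               # step 4: TCP buffers sized to the
sysctl -w net.core.wmem_max=135000000               #         bandwidth-delay product
mount -t nfs4 -o rsize=131072,wsize=131072 server:/export /mnt  # step 5

# --- Server side ---
ifconfig eth0 mtu 9000                              # step 1 again: MTU must match end to end
echo 128 > /proc/fs/nfsd/threads                    # step 3: more nfsd threads
service nfs restart
```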



end of thread, other threads:[~2011-06-30 16:58 UTC | newest]

Thread overview: 11+ messages
     [not found] <BANLkTi=xcQseTx8BTWEzg-1DO=ayJuMLrw@mail.gmail.com>
2011-06-29 16:28 ` [nfsv4]nfs client bug Benny Halevy
2011-06-30  2:32   ` quanli gui
2011-06-30 13:36     ` Andy Adamson
2011-06-30 14:24       ` Trond Myklebust
2011-06-30 15:13         ` Benny Halevy
2011-06-30 15:35           ` Trond Myklebust
2011-06-30 15:42             ` Benny Halevy
2011-06-30 15:52         ` quanli gui
2011-06-30 15:57           ` Trond Myklebust
2011-06-30 16:26           ` Andy Adamson
2011-06-30 16:57             ` Ben Greear
