* rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults?
@ 2019-09-19  7:29 Alkis Georgopoulos
  2019-09-19 15:08 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-19 7:29 UTC (permalink / raw)
To: linux-nfs

Hi, in any recent distribution that I tried, the default NFS wsize/rsize was 1 MB.

On 10/100 Mbps networks, this causes severe lags and timeouts, and dmesg fills with messages like:

> [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> [ 316.759512] nfs: server 192.168.1.112 OK

Forcing wsize/rsize to 32K makes all the problems disappear and makes NFS access snappier, without sacrificing any speed, at least up to the gigabit networks that I tested with.

I would like to request that the defaults be changed to 32K. But I couldn't find out where these defaults come from, nor where to file the issue together with my test case / benchmarks to support it.

I initially reported it against the klibc nfsmount program that I was using, but that just uses the NFS defaults, which are the ones that should be amended. So the initial test case / benchmarks are there:
https://lists.zytor.com/archives/klibc/2019-September/004234.html

Please Cc me as I'm not on the list.

Thank you,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults?
  2019-09-19  7:29 rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults? Alkis Georgopoulos
@ 2019-09-19 15:08 ` Trond Myklebust
  2019-09-19 15:58 ` rsize,wsize=1M causes severe lags in 10/100 Mbps Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 15:08 UTC (permalink / raw)
To: Alkis Georgopoulos; +Cc: linux-nfs

On Thu, 19 Sep 2019 at 03:44, Alkis Georgopoulos <alkisg@gmail.com> wrote:
>
> Hi, in any recent distribution that I tried, the default NFS wsize/rsize
> was 1 MB.
>
> On 10/100 Mbps networks, this causes severe lags and timeouts, and dmesg
> fills with messages like:
>
> > [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> > [ 316.759512] nfs: server 192.168.1.112 OK
>
> Forcing wsize/rsize to 32K makes all the problems disappear and makes NFS
> access snappier, without sacrificing any speed, at least up to the gigabit
> networks that I tested with.
>
> I would like to request that the defaults be changed to 32K.
> But I couldn't find out where these defaults come from, nor where to file
> the issue together with my test case / benchmarks to support it.
>
> I initially reported it against the klibc nfsmount program that I was
> using, but that just uses the NFS defaults, which are the ones that
> should be amended. So the initial test case / benchmarks are there:
> https://lists.zytor.com/archives/klibc/2019-September/004234.html
>
> Please Cc me as I'm not on the list.
>

The default client behaviour is just to go with whatever recommended value the server specifies. You can change that value yourself on the knfsd server by editing the pseudo-file in /proc/fs/nfsd/max_block_size.

Cheers
Trond

^ permalink raw reply	[flat|nested] 19+ messages in thread
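Concretely, a minimal server-side sketch of the change Trond describes. The unit name varies by distribution (nfs-server on Fedora/RHEL, nfs-kernel-server on Debian/Ubuntu), and the EBUSY behaviour is an assumption based on how the nfsd control files of this era behave; treat it as a sketch, not a reference procedure:

# on the NFS server:
cat /proc/fs/nfsd/max_block_size           # current server-advertised maximum, in bytes
systemctl stop nfs-server                  # nfsd must not be running for the write to succeed;
echo 32768 > /proc/fs/nfsd/max_block_size  # otherwise the kernel rejects it with EBUSY
systemctl start nfs-server
# clients that mount from now on negotiate rsize/wsize <= 32768,
# unless they pass explicit rsize=/wsize= mount options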
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 15:08 ` Trond Myklebust
@ 2019-09-19 15:58 ` Alkis Georgopoulos
  2019-09-19 16:11 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-19 15:58 UTC (permalink / raw)
To: linux-nfs; +Cc: Trond Myklebust

On 9/19/19 6:08 PM, Trond Myklebust wrote:
> The default client behaviour is just to go with whatever recommended
> value the server specifies. You can change that value yourself on the
> knfsd server by editing the pseudo-file in
> /proc/fs/nfsd/max_block_size.

Thank you. I guess I can automate this by running `systemctl edit nfs-kernel-server` and adding:

[Service]
ExecStartPre=/bin/sh -c 'echo 32768 > /proc/fs/nfsd/max_block_size'

But isn't it a problem that the defaults cause errors in dmesg and severe lags in 10/100 Mbps, and even make 1000 Mbps a lot less snappy than with 32K?

In any case, thank you again.
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 15:58 ` rsize,wsize=1M causes severe lags in 10/100 Mbps Alkis Georgopoulos
@ 2019-09-19 16:11 ` Trond Myklebust
  2019-09-19 19:21 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 16:11 UTC (permalink / raw)
To: alkisg, linux-nfs

On Thu, 2019-09-19 at 18:58 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 6:08 PM, Trond Myklebust wrote:
> > The default client behaviour is just to go with whatever recommended
> > value the server specifies. You can change that value yourself on the
> > knfsd server by editing the pseudo-file in
> > /proc/fs/nfsd/max_block_size.
>
> Thank you. I guess I can automate this by running
> `systemctl edit nfs-kernel-server` and adding:
>
> [Service]
> ExecStartPre=/bin/sh -c 'echo 32768 > /proc/fs/nfsd/max_block_size'
>
> But isn't it a problem that the defaults cause errors in dmesg and
> severe lags in 10/100 Mbps, and even make 1000 Mbps a lot less snappy
> than with 32K?
>

No. It is not a problem, because nfs-utils defaults to using TCP mounts. Fragmentation is only a problem with UDP, and we stopped defaulting to that almost 2 decades ago.

However it may well be that klibc is still defaulting to using UDP, in which case it should be fixed. There are major Linux distros out there today that don't even compile in support for NFS over UDP any more.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 16:11 ` Trond Myklebust
@ 2019-09-19 19:21 ` Alkis Georgopoulos
  2019-09-19 19:51 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-19 19:21 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/19/19 7:11 PM, Trond Myklebust wrote:
> No. It is not a problem, because nfs-utils defaults to using TCP
> mounts. Fragmentation is only a problem with UDP, and we stopped
> defaulting to that almost 2 decades ago.
>
> However it may well be that klibc is still defaulting to using UDP, in
> which case it should be fixed. There are major Linux distros out there
> today that don't even compile in support for NFS over UDP any more.

I haven't tested with UDP at all; the problem was with TCP. I saw the problem in klibc nfsmount with TCP + NFS 3, and in `mount -t nfs -o timeo=7 server:/share /mnt` with TCP + NFS 4.2.

Steps to reproduce:

1) Connect server <=> client at 10 or 100 Mbps.
Gigabit is also "less snappy", but it's less obvious there. For reliable results, I made sure that server/client/network didn't have any other load at all.

2) Server:
echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
exportfs -ra
truncate -s 10G /srv/10G.file
The sparse file ensures that disk IO bandwidth isn't an issue.

3) Client:
mount -t nfs -o timeo=7 192.168.1.112:/srv /mnt
dd if=/mnt/10G.file of=/dev/null status=progress

4) Result:
dd starts at 11.2 MB/s, which is fine/expected, then slowly drops to 2 MB/s. After a while it lags, omitting some seconds from its output line, e.g.
507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C
at which point "Ctrl+C" needs 30+ seconds to stop dd, because of IO waiting etc.

In another terminal tab, `dmesg -w` is full of these:
[ 316.404250] nfs: server 192.168.1.112 not responding, still trying
[ 316.759512] nfs: server 192.168.1.112 OK

5) Remarks:
With timeo=600, there are no errors in dmesg. The fact that timeo=7 (the nfsmount default) causes errors proves that some packets need more than 0.7 secs to arrive. Which in turn explains why all the applications open extremely slowly and feel sluggish on netroot = 100 Mbps, NFS, TCP.

Lowering rsize,wsize from 1M to 32K solves all those issues without any negative side effects that I can see. Even on gigabit, 32K makes applications a lot more snappy, so it's better even there. On 10 Mbps, rsize=1M is completely unusable.

So I'm not sure where rsize=1M is a better default. Is it only for 10G+ connections?

Thank you very much,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
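One way to confirm that the dmesg messages above correspond to client-side RPC retransmissions is to watch the retrans counter while the dd runs; a small sketch (nfsstat ships with nfs-utils; the exact column layout varies between versions):

# on the client, in another terminal, while the dd is running:
nfsstat -c     # under the client rpc stats, the "retrans" column should keep growing
# a growing retrans count over TCP means whole requests are timing out and being
# re-sent, which matches the "not responding, still trying" messages above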
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 19:21 ` Alkis Georgopoulos
@ 2019-09-19 19:51 ` Trond Myklebust
  2019-09-19 19:57 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 19:51 UTC (permalink / raw)
To: alkisg, linux-nfs

On Thu, 2019-09-19 at 22:21 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 7:11 PM, Trond Myklebust wrote:
> > No. It is not a problem, because nfs-utils defaults to using TCP
> > mounts. Fragmentation is only a problem with UDP, and we stopped
> > defaulting to that almost 2 decades ago.
> >
> > However it may well be that klibc is still defaulting to using UDP, in
> > which case it should be fixed. There are major Linux distros out there
> > today that don't even compile in support for NFS over UDP any more.
>
> I haven't tested with UDP at all; the problem was with TCP. I saw the
> problem in klibc nfsmount with TCP + NFS 3, and in
> `mount -t nfs -o timeo=7 server:/share /mnt` with TCP + NFS 4.2.
>
> Steps to reproduce:
>
> 1) Connect server <=> client at 10 or 100 Mbps.
> Gigabit is also "less snappy", but it's less obvious there. For reliable
> results, I made sure that server/client/network didn't have any other
> load at all.
>
> 2) Server:
> echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
> exportfs -ra
> truncate -s 10G /srv/10G.file
> The sparse file ensures that disk IO bandwidth isn't an issue.
>
> 3) Client:
> mount -t nfs -o timeo=7 192.168.1.112:/srv /mnt
> dd if=/mnt/10G.file of=/dev/null status=progress
>
> 4) Result:
> dd starts at 11.2 MB/s, which is fine/expected, then slowly drops to
> 2 MB/s. After a while it lags, omitting some seconds from its output
> line, e.g.
> 507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C
> at which point "Ctrl+C" needs 30+ seconds to stop dd, because of IO
> waiting etc.
>
> In another terminal tab, `dmesg -w` is full of these:
> [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> [ 316.759512] nfs: server 192.168.1.112 OK
>
> 5) Remarks:
> With timeo=600, there are no errors in dmesg. The fact that timeo=7
> (the nfsmount default) causes errors proves that some packets need more
> than 0.7 secs to arrive. Which in turn explains why all the applications
> open extremely slowly and feel sluggish on netroot = 100 Mbps, NFS, TCP.
>
> Lowering rsize,wsize from 1M to 32K solves all those issues without any
> negative side effects that I can see. Even on gigabit, 32K makes
> applications a lot more snappy, so it's better even there. On 10 Mbps,
> rsize=1M is completely unusable.
>
> So I'm not sure where rsize=1M is a better default. Is it only for 10G+
> connections?
>

I don't understand why klibc would default to supplying a timeo=7 argument at all. It would be MUCH better if it just let the kernel set the default, which in the case of TCP is timeo=600.

I agree with your argument that replaying requests every 0.7 seconds is just going to cause congestion. TCP provides for reliable delivery of RPC messages to the server, which is why the kernel default is a full minute.

So please ask the klibc developers to change libmount to let the kernel decide the default mount options. Their current setting is just plain wrong.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 19:51 ` Trond Myklebust
@ 2019-09-19 19:57 ` Alkis Georgopoulos
  2019-09-19 20:05 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-19 19:57 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/19/19 10:51 PM, Trond Myklebust wrote:
> I don't understand why klibc would default to supplying a timeo=7
> argument at all. It would be MUCH better if it just let the kernel set
> the default, which in the case of TCP is timeo=600.
>
> I agree with your argument that replaying requests every 0.7 seconds is
> just going to cause congestion. TCP provides for reliable delivery of
> RPC messages to the server, which is why the kernel default is a full
> minute.
>
> So please ask the klibc developers to change libmount to let the kernel
> decide the default mount options. Their current setting is just plain
> wrong.

This was what I asked in my first message to their mailing list:
https://lists.zytor.com/archives/klibc/2019-September/004234.html

Then I realized that timeo=600 just hides the real problem, which is rsize=1M.

NFS defaults: timeo=600,rsize=1M => lag
nfsmount defaults: timeo=7,rsize=1M => lag AND dmesg errors

My proposal: timeo=whatever,rsize=32K => all fine

If more benchmarks are needed from me to document the "NFS defaults: timeo=600,rsize=1M => lag" case, I can surely provide them.

Thanks,
Alkis

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 19:57 ` Alkis Georgopoulos
@ 2019-09-19 20:05 ` Trond Myklebust
  2019-09-19 20:20 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 20:05 UTC (permalink / raw)
To: alkisg, linux-nfs

On Thu, 2019-09-19 at 22:57 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 10:51 PM, Trond Myklebust wrote:
> > I don't understand why klibc would default to supplying a timeo=7
> > argument at all. It would be MUCH better if it just let the kernel set
> > the default, which in the case of TCP is timeo=600.
> >
> > I agree with your argument that replaying requests every 0.7 seconds
> > is just going to cause congestion. TCP provides for reliable delivery
> > of RPC messages to the server, which is why the kernel default is a
> > full minute.
> >
> > So please ask the klibc developers to change libmount to let the
> > kernel decide the default mount options. Their current setting is
> > just plain wrong.
>
> This was what I asked in my first message to their mailing list:
> https://lists.zytor.com/archives/klibc/2019-September/004234.html
>
> Then I realized that timeo=600 just hides the real problem, which is
> rsize=1M.
>
> NFS defaults: timeo=600,rsize=1M => lag
> nfsmount defaults: timeo=7,rsize=1M => lag AND dmesg errors
>
> My proposal: timeo=whatever,rsize=32K => all fine
>
> If more benchmarks are needed from me to document the
> "NFS defaults: timeo=600,rsize=1M => lag" case, I can surely provide
> them.

There are plenty of operations that can take longer than 700 ms to complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS equivalent of fsync()) can often take much longer even though it has no payload.

So the problem is not the size of the WRITE payload. The real problem is the timeout.

The bottom line is that if you want to keep timeo=7 as a mount option for TCP, then you are on your own.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 20:05 ` Trond Myklebust
@ 2019-09-19 20:20 ` Alkis Georgopoulos
  2019-09-19 20:40 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-19 20:20 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/19/19 11:05 PM, Trond Myklebust wrote:
> There are plenty of operations that can take longer than 700 ms to
> complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS
> equivalent of fsync()) can often take much longer even though it has no
> payload.
>
> So the problem is not the size of the WRITE payload. The real problem
> is the timeout.
>
> The bottom line is that if you want to keep timeo=7 as a mount option
> for TCP, then you are on your own.

The problem isn't timeo at all. If I understand it correctly, when I try to launch firefox over nfsroot, NFS will wait until it fills 1M before "replying" to the application. Thus the applications will launch a lot slower, as they get "disk feedback" in larger chunks and not "snappy".

In numbers:
timeo=600,rsize=1M => firefox opens in 30 secs
timeo=600,rsize=32k => firefox opens in 20 secs

Anyway, thank you very much for your time and feedback.

Kind regards,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
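Numbers like these can be reproduced with a many-small-reads workload instead of a single large dd, which is much closer to "launching firefox". A sketch, where the server address and export are the ones from the reproduction earlier in the thread, and /mnt/usr is a hypothetical directory tree on the export:

for rs in 32768 1048576; do
    umount /mnt 2>/dev/null
    sync; echo 3 > /proc/sys/vm/drop_caches    # start each run with a cold client-side cache
    mount -t nfs -o tcp,timeo=600,rsize=$rs,wsize=$rs 192.168.1.112:/srv /mnt
    echo "rsize=$rs:"
    time sh -c 'find /mnt/usr -type f -print0 | head -z -n 500 | xargs -0 cat > /dev/null'
done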
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 20:20 ` Alkis Georgopoulos
@ 2019-09-19 20:40 ` Trond Myklebust
  2019-09-19 21:19 ` Daniel Forrest
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 20:40 UTC (permalink / raw)
To: alkisg, linux-nfs

On Thu, 2019-09-19 at 23:20 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 11:05 PM, Trond Myklebust wrote:
> > There are plenty of operations that can take longer than 700 ms to
> > complete. Synchronous writes to disk are one, but COMMIT (i.e. the
> > NFS equivalent of fsync()) can often take much longer even though it
> > has no payload.
> >
> > So the problem is not the size of the WRITE payload. The real
> > problem is the timeout.
> >
> > The bottom line is that if you want to keep timeo=7 as a mount
> > option for TCP, then you are on your own.
>
> The problem isn't timeo at all. If I understand it correctly, when I
> try to launch firefox over nfsroot, NFS will wait until it fills 1M
> before "replying" to the application. Thus the applications will launch
> a lot slower, as they get "disk feedback" in larger chunks and not
> "snappy".
>
> In numbers:
> timeo=600,rsize=1M => firefox opens in 30 secs
> timeo=600,rsize=32k => firefox opens in 20 secs
>

That's a different problem, and is most likely due to readahead causing your client to read more data than it needs to. It is also true that the maximum readahead size is proportional to the rsize and that maybe it shouldn't be.

However the VM layer is supposed to ensure that the kernel doesn't try to read ahead more than necessary. It is bounded by the maximum we set in the NFS layer, but it isn't supposed to hit that maximum unless the readahead heuristics show that the application may need it.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 20:40 ` Trond Myklebust
@ 2019-09-19 21:19 ` Daniel Forrest
  2019-09-19 21:42 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread

From: Daniel Forrest @ 2019-09-19 21:19 UTC (permalink / raw)
To: Trond Myklebust; +Cc: alkisg, linux-nfs

On Thu, Sep 19, 2019 at 08:40:41PM +0000, Trond Myklebust wrote:
> On Thu, 2019-09-19 at 23:20 +0300, Alkis Georgopoulos wrote:
> > On 9/19/19 11:05 PM, Trond Myklebust wrote:
> > > There are plenty of operations that can take longer than 700 ms to
> > > complete. Synchronous writes to disk are one, but COMMIT (i.e. the
> > > NFS equivalent of fsync()) can often take much longer even though
> > > it has no payload.
> > >
> > > So the problem is not the size of the WRITE payload. The real
> > > problem is the timeout.
> > >
> > > The bottom line is that if you want to keep timeo=7 as a mount
> > > option for TCP, then you are on your own.
> >
> > The problem isn't timeo at all. If I understand it correctly, when I
> > try to launch firefox over nfsroot, NFS will wait until it fills 1M
> > before "replying" to the application. Thus the applications will
> > launch a lot slower, as they get "disk feedback" in larger chunks and
> > not "snappy".
> >
> > In numbers:
> > timeo=600,rsize=1M => firefox opens in 30 secs
> > timeo=600,rsize=32k => firefox opens in 20 secs
>
> That's a different problem, and is most likely due to readahead causing
> your client to read more data than it needs to. It is also true that
> the maximum readahead size is proportional to the rsize and that maybe
> it shouldn't be.
> However the VM layer is supposed to ensure that the kernel doesn't try
> to read ahead more than necessary. It is bounded by the maximum we set
> in the NFS layer, but it isn't supposed to hit that maximum unless the
> readahead heuristics show that the application may need it.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

What may be happening here is something I have noticed with glibc.

- statfs reports the rsize/wsize as the block size of the filesystem.

- glibc uses the block size as the default buffer size for fread/fwrite.

If an application is using fread/fwrite on an NFS mounted file with an rsize/wsize of 1M it will try to fill a 1MB buffer.

I have often changed mounts to use rsize/wsize=64K to alleviate this.

--
Dan Forrest
Space Science and Engineering Center, University of Wisconsin, Madison
dforrest@wisc.edu

^ permalink raw reply	[flat|nested] 19+ messages in thread
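This is easy to observe from a shell against the mount from the reproduction earlier in the thread; a sketch (the strace target is a placeholder for any fread(3)-based program, and the exact source of the hint — glibc's stdio of this era sizing its buffer from the st_blksize returned by fstat(2) — is stated as an assumption):

stat -f -c 'statfs f_bsize: %s' /mnt          # filesystem "optimal transfer size"
stat -c 'stat st_blksize: %o' /mnt/10G.file   # per-file I/O size hint picked up by stdio
# with rsize=1M both report 1048576, so even a single fgetc() on a freshly
# opened FILE* can trigger a 1 MB read(2); observe with e.g.:
strace -e trace=read some-stdio-reader /mnt/10G.file   # "some-stdio-reader" is hypothetical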
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 21:19 ` Daniel Forrest
@ 2019-09-19 21:42 ` Trond Myklebust
  2019-09-19 22:16 ` Daniel Forrest
  0 siblings, 1 reply; 19+ messages in thread

From: Trond Myklebust @ 2019-09-19 21:42 UTC (permalink / raw)
To: Daniel Forrest; +Cc: alkisg, linux-nfs

On Thu, 2019-09-19 at 16:19 -0500, Daniel Forrest wrote:
> On Thu, Sep 19, 2019 at 08:40:41PM +0000, Trond Myklebust wrote:
> > On Thu, 2019-09-19 at 23:20 +0300, Alkis Georgopoulos wrote:
> > > On 9/19/19 11:05 PM, Trond Myklebust wrote:
> > > > There are plenty of operations that can take longer than 700 ms
> > > > to complete. Synchronous writes to disk are one, but COMMIT
> > > > (i.e. the NFS equivalent of fsync()) can often take much longer
> > > > even though it has no payload.
> > > >
> > > > So the problem is not the size of the WRITE payload. The real
> > > > problem is the timeout.
> > > >
> > > > The bottom line is that if you want to keep timeo=7 as a mount
> > > > option for TCP, then you are on your own.
> > >
> > > The problem isn't timeo at all. If I understand it correctly, when
> > > I try to launch firefox over nfsroot, NFS will wait until it fills
> > > 1M before "replying" to the application. Thus the applications will
> > > launch a lot slower, as they get "disk feedback" in larger chunks
> > > and not "snappy".
> > >
> > > In numbers:
> > > timeo=600,rsize=1M => firefox opens in 30 secs
> > > timeo=600,rsize=32k => firefox opens in 20 secs
> >
> > That's a different problem, and is most likely due to readahead
> > causing your client to read more data than it needs to. It is also
> > true that the maximum readahead size is proportional to the rsize
> > and that maybe it shouldn't be.
> > However the VM layer is supposed to ensure that the kernel doesn't
> > try to read ahead more than necessary. It is bounded by the maximum
> > we set in the NFS layer, but it isn't supposed to hit that maximum
> > unless the readahead heuristics show that the application may need
> > it.
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
>
> What may be happening here is something I have noticed with glibc.
>
> - statfs reports the rsize/wsize as the block size of the filesystem.
>
> - glibc uses the block size as the default buffer size for
> fread/fwrite.
>
> If an application is using fread/fwrite on an NFS mounted file with
> an rsize/wsize of 1M it will try to fill a 1MB buffer.
>
> I have often changed mounts to use rsize/wsize=64K to alleviate this.
>

That sounds like an abuse of the filesystem block size. There is nothing in the POSIX definition of either fread() or fwrite() that requires glibc to do this:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fread.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwrite.html

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 21:42 ` Trond Myklebust
@ 2019-09-19 22:16 ` Daniel Forrest
  2019-09-20  9:25 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Daniel Forrest @ 2019-09-19 22:16 UTC (permalink / raw)
To: Trond Myklebust; +Cc: alkisg, linux-nfs

On Thu, Sep 19, 2019 at 05:42:26PM -0400, Trond Myklebust wrote:
> On Thu, 2019-09-19 at 16:19 -0500, Daniel Forrest wrote:
> > On Thu, Sep 19, 2019 at 08:40:41PM +0000, Trond Myklebust wrote:
> > > On Thu, 2019-09-19 at 23:20 +0300, Alkis Georgopoulos wrote:
> > > > On 9/19/19 11:05 PM, Trond Myklebust wrote:
> > > > > There are plenty of operations that can take longer than 700 ms
> > > > > to complete. Synchronous writes to disk are one, but COMMIT
> > > > > (i.e. the NFS equivalent of fsync()) can often take much longer
> > > > > even though it has no payload.
> > > > >
> > > > > So the problem is not the size of the WRITE payload. The real
> > > > > problem is the timeout.
> > > > >
> > > > > The bottom line is that if you want to keep timeo=7 as a mount
> > > > > option for TCP, then you are on your own.
> > > >
> > > > The problem isn't timeo at all. If I understand it correctly,
> > > > when I try to launch firefox over nfsroot, NFS will wait until it
> > > > fills 1M before "replying" to the application. Thus the
> > > > applications will launch a lot slower, as they get "disk
> > > > feedback" in larger chunks and not "snappy".
> > > >
> > > > In numbers:
> > > > timeo=600,rsize=1M => firefox opens in 30 secs
> > > > timeo=600,rsize=32k => firefox opens in 20 secs
> > >
> > > That's a different problem, and is most likely due to readahead
> > > causing your client to read more data than it needs to. It is also
> > > true that the maximum readahead size is proportional to the rsize
> > > and that maybe it shouldn't be.
> > > However the VM layer is supposed to ensure that the kernel doesn't
> > > try to read ahead more than necessary. It is bounded by the maximum
> > > we set in the NFS layer, but it isn't supposed to hit that maximum
> > > unless the readahead heuristics show that the application may need
> > > it.
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> >
> > What may be happening here is something I have noticed with glibc.
> >
> > - statfs reports the rsize/wsize as the block size of the filesystem.
> >
> > - glibc uses the block size as the default buffer size for
> > fread/fwrite.
> >
> > If an application is using fread/fwrite on an NFS mounted file with
> > an rsize/wsize of 1M it will try to fill a 1MB buffer.
> >
> > I have often changed mounts to use rsize/wsize=64K to alleviate this.
>
> That sounds like an abuse of the filesystem block size. There is
> nothing in the POSIX definition of either fread() or fwrite() that
> requires glibc to do this:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fread.html
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwrite.html
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

It looks like this was fixed in glibc 2.25:

https://sourceware.org/bugzilla/show_bug.cgi?id=4099

But this version is not on the CentOS 6/7 systems I use.
--
Dan Forrest
Space Science and Engineering Center
University of Wisconsin, Madison
dforrest@wisc.edu

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-19 22:16 ` Daniel Forrest
@ 2019-09-20  9:25 ` Alkis Georgopoulos
  2019-09-20  9:48 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-20 9:25 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/20/19 1:16 AM, Daniel Forrest wrote:
>>> What may be happening here is something I have noticed with glibc.
>>>
>>> - statfs reports the rsize/wsize as the block size of the filesystem.
>>>
>>> - glibc uses the block size as the default buffer size for
>>> fread/fwrite.
>>>
>>> If an application is using fread/fwrite on an NFS mounted file with
>>> an rsize/wsize of 1M it will try to fill a 1MB buffer.
>>>
>>> I have often changed mounts to use rsize/wsize=64K to alleviate this.
>>
>> That sounds like an abuse of the filesystem block size. There is
>> nothing in the POSIX definition of either fread() or fwrite() that
>> requires glibc to do this:
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fread.html
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwrite.html
>>
>
> It looks like this was fixed in glibc 2.25:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=4099

This is likely not the exact issue I'm experiencing, as I'm testing e.g. with glibc 2.27-3ubuntu1 on Ubuntu 18.04 and kernel 5.0.

New benchmark, measuring the boot time of a netbooted client, from right after the kernel is loaded to the display manager screen:

1) On 10 Mbps:
a) tcp,timeo=600,rsize=32K: 304 secs
b) tcp,timeo=600,rsize=1M: 618 secs

2) On 100 Mbps:
a) tcp,timeo=600,rsize=32K: 40 secs
b) tcp,timeo=600,rsize=1M: 84 secs

3) On 1000 Mbps:
a) tcp,timeo=600,rsize=32K: 20 secs
b) tcp,timeo=600,rsize=1M: 24 secs

32K is always faster, even on full gigabit. Disk access on gigabit was *significantly* faster, enough to result in a 4-second-lower boot time. In the 10/100 cases, rsize=1M is pretty much unusable. There are no writes involved; they go to a local tmpfs/overlayfs.

Would it make sense for me to measure the *boot bandwidth* in each case, to see if more things (readahead) are downloaded with rsize=1M?

I can do whatever benchmarks and test whatever parameters you tell me to, but I do not know the NFS/kernel internals well enough to be able to explain why this happens.

The reason I investigated this is that I developed the new version of ltsp.org (GPL netbooting software), where we switched from squashfs-over-NBD to squashfs-over-NFS, and netbooting was extremely slow until I lowered rsize to 32K, so I thought I'd share my findings in case it makes a better default for everyone (or reveals a problem elsewhere). With rsize=32K, squashfs-over-NFS is as speedy as squashfs-over-NBD, but a lot more stable.

Of course the same rsize findings apply to NFS /home too (without nfsmount), or to just transferring large or small files, not just to /.

Btw, https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt says the kernel nfsroot defaults are timeo=7,rsize=4096,wsize=4096. This is about the internal kernel netbooting support, not klibc nfsmount; I haven't tested it, as it would involve compiling a kernel with my NIC driver.

Thank you,
Alkis Georgopoulos
LTSP developer

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-20  9:25 ` Alkis Georgopoulos
@ 2019-09-20  9:48 ` Alkis Georgopoulos
  2019-09-20 10:04 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-20 9:48 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/20/19 12:25 PM, Alkis Georgopoulos wrote:
> This is likely not the exact issue I'm experiencing, as I'm testing e.g.
> with glibc 2.27-3ubuntu1 on Ubuntu 18.04 and kernel 5.0.
>
> New benchmark, measuring the boot time of a netbooted client,
> from right after the kernel is loaded to the display manager screen:
>
> 1) On 10 Mbps:
> a) tcp,timeo=600,rsize=32K: 304 secs
> b) tcp,timeo=600,rsize=1M: 618 secs
>
> 2) On 100 Mbps:
> a) tcp,timeo=600,rsize=32K: 40 secs
> b) tcp,timeo=600,rsize=1M: 84 secs
>
> 3) On 1000 Mbps:
> a) tcp,timeo=600,rsize=32K: 20 secs
> b) tcp,timeo=600,rsize=1M: 24 secs
>
> 32K is always faster, even on full gigabit. Disk access on gigabit was
> *significantly* faster, enough to result in a 4-second-lower boot time.
> In the 10/100 cases, rsize=1M is pretty much unusable. There are no
> writes involved; they go to a local tmpfs/overlayfs.
>
> Would it make sense for me to measure the *boot bandwidth* in each case,
> to see if more things (readahead) are downloaded with rsize=1M?

I did test the boot bandwidth. On ext4-over-NFS, with tmpfs-and-overlayfs to make root writable:

2) On 100 Mbps:
a) tcp,timeo=600,rsize=32K: 471 MB
b) tcp,timeo=600,rsize=1M: 1250 MB

So it is indeed slower because it's transferring more data than the client needs. Maybe it is a different or a new aspect of the readahead issue that you guys mentioned above. Is it possible that NFS is always sending 1MB chunks even when the actual data inside them is smaller?

If you want me to test more things, I can; if you consider it a problem with glibc etc. that shouldn't involve this mailing list, I can try to report it there...

Thank you,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-20  9:48 ` Alkis Georgopoulos
@ 2019-09-20 10:04 ` Alkis Georgopoulos
  2019-09-21  7:52 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-20 10:04 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On 9/20/19 12:48 PM, Alkis Georgopoulos wrote:
> I did test the boot bandwidth (I mean how many MB were transferred).
> On ext4-over-NFS, with tmpfs-and-overlayfs to make root writable:

I also tested with the kernel netbooting default of rsize=4K, to compare. All on 100 Mbps, tcp,timeo=600:

| rsize | MB to boot | sec to boot |
|-------|------------|-------------|
| 1M    | 1250       | 84          |
| 32K   | 471        | 40          |
| 4K    | 320        | 31          |
| 2K    | 355        | 34          |

It appears that matching rsize to the cluster size (4K) gives the best results.

Thank you,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-20 10:04 ` Alkis Georgopoulos
@ 2019-09-21  7:52 ` Alkis Georgopoulos
  2019-09-21  7:59 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-21 7:52 UTC (permalink / raw)
To: linux-nfs

I think it's caused by the kernel readahead, not glibc readahead.

TL;DR: This solves the problem:
echo 4 > /sys/devices/virtual/bdi/0:58/read_ahead_kb

Question: how to configure NFS/kernel to automatically set that?

Long version:
Doing step (4) below results in a tremendous speedup:

1) mount -t nfs -o tcp,timeo=600,rsize=1048576,wsize=1048576 10.161.254.11:/srv/ltsp /mnt

2) cat /proc/fs/nfsfs/volumes
We see the DEV number from there, e.g. 0:58.

3) cat /sys/devices/virtual/bdi/0:58/read_ahead_kb
15360
I assume that this means the kernel will try to read ahead up to 15 MB for each accessed file. *THIS IS THE PROBLEM*. For non-NFS devices, this value is 128 (KB).

4) echo 4 > /sys/devices/virtual/bdi/0:58/read_ahead_kb

5) Test. Traffic now should be a *lot* less, and speed a *lot* higher. E.g. my NFS booting tests:
- read_ahead_kb=15360 (the default) => 1160 MB traffic to boot
- read_ahead_kb=128 => 324 MB traffic
- read_ahead_kb=4 => 223 MB traffic

So the question that remains is how to properly configure either NFS or the kernel to use small readahead values for NFS.

I'm currently doing it with this workaround:

for f in $(awk '/^v[0-9]/ { print $4 }' < /proc/fs/nfsfs/volumes); do
    echo 4 > /sys/devices/virtual/bdi/$f/read_ahead_kb
done

Thanks,
Alkis

^ permalink raw reply	[flat|nested] 19+ messages in thread
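One way to apply the loop above automatically on clients is a systemd oneshot unit; a sketch, where the unit name is made up, the doubled $$ is how systemd unit files pass a literal $ through to the shell, and an NFS root mounted from the initramfs would need this hooked in earlier than remote-fs.target:

# /etc/systemd/system/nfs-readahead.service (hypothetical name)
[Unit]
Description=Clamp NFS readahead to work around the 15*rsize default
After=remote-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c "for f in $$(awk '/^v[0-9]/ { print $$4 }' < /proc/fs/nfsfs/volumes); do echo 4 > /sys/devices/virtual/bdi/$$f/read_ahead_kb; done"

[Install]
WantedBy=multi-user.target

# then: systemctl enable --now nfs-readahead.service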
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-21  7:52 ` Alkis Georgopoulos
@ 2019-09-21  7:59 ` Alkis Georgopoulos
  2019-09-21 11:02 ` Alkis Georgopoulos
  0 siblings, 1 reply; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-21 7:59 UTC (permalink / raw)
To: linux-nfs

On 9/21/19 10:52 AM, Alkis Georgopoulos wrote:
> I think it's caused by the kernel readahead, not glibc readahead.
>
> TL;DR: This solves the problem:
> echo 4 > /sys/devices/virtual/bdi/0:58/read_ahead_kb
>
> Question: how to configure NFS/kernel to automatically set that?
>
> Long version:
> Doing step (4) below results in a tremendous speedup:
>
> 1) mount -t nfs -o tcp,timeo=600,rsize=1048576,wsize=1048576
> 10.161.254.11:/srv/ltsp /mnt
>
> 2) cat /proc/fs/nfsfs/volumes
> We see the DEV number from there, e.g. 0:58.
>
> 3) cat /sys/devices/virtual/bdi/0:58/read_ahead_kb
> 15360
> I assume that this means the kernel will try to read ahead up to 15 MB
> for each accessed file. *THIS IS THE PROBLEM*. For non-NFS devices,
> this value is 128 (KB).
>
> 4) echo 4 > /sys/devices/virtual/bdi/0:58/read_ahead_kb
>
> 5) Test. Traffic now should be a *lot* less, and speed a *lot* higher.
> E.g. my NFS booting tests:
> - read_ahead_kb=15360 (the default) => 1160 MB traffic to boot
> - read_ahead_kb=128 => 324 MB traffic
> - read_ahead_kb=4 => 223 MB traffic
>
> So the question that remains is how to properly configure either NFS or
> the kernel to use small readahead values for NFS.
>
> I'm currently doing it with this workaround:
> for f in $(awk '/^v[0-9]/ { print $4 }' < /proc/fs/nfsfs/volumes); do
> echo 4 > /sys/devices/virtual/bdi/$f/read_ahead_kb; done
>
> Thanks,
> Alkis

Quoting https://lkml.org/lkml/2010/2/26/48

> nfs: use 2*rsize readahead size
> With default rsize=512k and NFS_MAX_READAHEAD=15, the current NFS
> readahead size 512k*15=7680k is too large than necessary for typical
> clients.

I.e. the problem probably is that when NFS_MAX_READAHEAD=15 was implemented, rsize was 512k; now that rsize=1M, this results in readaheads of 15M, which cause all the traffic and lags.

^ permalink raw reply	[flat|nested] 19+ messages in thread
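That multiplication lines up exactly with the 15360 KB observed above; a quick sanity check, assuming 4 KB pages and the NFS_MAX_READAHEAD=15 factor from the quoted patch:

rsize=1048576; page=4096
rpages=$((rsize / page))                   # 256 pages per 1M request
echo "$((rpages * 15 * page / 1024)) KB"   # prints "15360 KB", the bdi read_ahead_kb default seen above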
* Re: rsize,wsize=1M causes severe lags in 10/100 Mbps
  2019-09-21  7:59 ` Alkis Georgopoulos
@ 2019-09-21 11:02 ` Alkis Georgopoulos
  0 siblings, 0 replies; 19+ messages in thread

From: Alkis Georgopoulos @ 2019-09-21 11:02 UTC (permalink / raw)
To: linux-nfs

On 9/21/19 10:59 AM, Alkis Georgopoulos wrote:
> I.e. the problem probably is that when NFS_MAX_READAHEAD=15 was
> implemented, rsize was 512k; now that rsize=1M, this results in
> readaheads of 15M, which cause all the traffic and lags.

I filed a bug report for this:
https://bugzilla.kernel.org/show_bug.cgi?id=204939

A quick workaround is to run on the clients, after the NFS mounts:

for f in $(awk '/^v[0-9]/ { print $4 }' < /proc/fs/nfsfs/volumes); do
    echo 4 > /sys/devices/virtual/bdi/$f/read_ahead_kb
done

Btw, the mail title is wrong: the workaround above causes the netboot traffic to drop from e.g. 1160 MB to 221 MB at any network speed; it was just more observable at lower speeds.

Thank you very much,
Alkis Georgopoulos

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2019-09-21 11:02 UTC | newest]

Thread overview: 19+ messages
2019-09-19  7:29 rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults? Alkis Georgopoulos
2019-09-19 15:08 ` Trond Myklebust
2019-09-19 15:58   ` rsize,wsize=1M causes severe lags in 10/100 Mbps Alkis Georgopoulos
2019-09-19 16:11     ` Trond Myklebust
2019-09-19 19:21       ` Alkis Georgopoulos
2019-09-19 19:51         ` Trond Myklebust
2019-09-19 19:57           ` Alkis Georgopoulos
2019-09-19 20:05             ` Trond Myklebust
2019-09-19 20:20               ` Alkis Georgopoulos
2019-09-19 20:40                 ` Trond Myklebust
2019-09-19 21:19                   ` Daniel Forrest
2019-09-19 21:42                     ` Trond Myklebust
2019-09-19 22:16                       ` Daniel Forrest
2019-09-20  9:25                         ` Alkis Georgopoulos
2019-09-20  9:48                           ` Alkis Georgopoulos
2019-09-20 10:04                             ` Alkis Georgopoulos
2019-09-21  7:52                               ` Alkis Georgopoulos
2019-09-21  7:59                                 ` Alkis Georgopoulos
2019-09-21 11:02                                   ` Alkis Georgopoulos