linux-kernel.vger.kernel.org archive mirror
* effect of nfs blocksize on I/O ?
@ 2003-09-29  6:42 Frank Cusack
  2003-09-29  7:19 ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Frank Cusack @ 2003-09-29  6:42 UTC
  To: Trond Myklebust, torvalds; +Cc: lkml

I am not talking about rsize/wsize, rather the fs blocksize.

2.4 sets this to MIN(MAX(MAX(4096,rsize),wsize),32768) = 8192 typically.
2.6 sets this to nfs_fsinfo.wtmult?nfs_fsinfo.wtmult:512 = 512 typically.

(My estimation of "typical" may be way off though.)
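
As an illustration only, the two expressions above can be written out as
plain C. The function names are invented for this sketch and are not kernel
symbols; the real code may also round to a power of two, which is not
modeled here.

```c
/* Sketch of the block size choices described above.
 * Function names are hypothetical, not kernel symbols. */

/* 2.4: MIN(MAX(MAX(4096, rsize), wsize), 32768) */
unsigned int blocksize_2_4(unsigned int rsize, unsigned int wsize)
{
	unsigned int bs = rsize > 4096 ? rsize : 4096;

	if (wsize > bs)
		bs = wsize;
	return bs < 32768 ? bs : 32768;
}

/* 2.6: wtmult if the server reports one, else 512 */
unsigned int blocksize_2_6(unsigned int wtmult)
{
	return wtmult ? wtmult : 512;
}
```

With rsize = wsize = 8192 the first function gives 8192, and with wtmult
unreported (0) the second gives 512, matching the "typical" values quoted.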

At a 512 byte blocksize, this overflows struct statfs for fs > 1TB.
Most of my NFS filesystems (on netapp) are larger than that.
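
The arithmetic behind the 1TB figure can be sketched as below. This assumes
the block counts in the 32-bit struct statfs are signed 32-bit values (an
assumption; an unsigned field would double the limit). The helper name is
made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: largest filesystem size, in bytes, whose block
 * count still fits in a signed 32-bit field for a given block size.
 * Assumes the 32-bit statfs counters are signed 32-bit values. */
uint64_t max_fs_bytes_32bit(uint32_t blocksize)
{
	return (uint64_t)INT32_MAX * blocksize;
}
```

Under that assumption a 512-byte block size tops out just under 2^40 bytes
(1TB), while 8192 would reach just under 2^44 bytes (16TB).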

But more importantly, what does the VFS *do* with the blocksize?
strace seems to show that glibc/stdio does consider it.  If I fprintf()
two 4096 byte strings, libc does a single write() with 8192 blocksize,
and 3 write()'s for 512 blocksize.  I haven't looked to see what goes
over the wire, but I assume that still follows rsize/wsize.
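
The block size that userspace sees per file can be checked directly. A
minimal sketch, assuming stdio sizes its buffer from st_blksize as returned
by stat() (consistent with the strace observation above, but an assumption
about libc internals); preferred_io_size is a made-up helper name.

```c
#include <sys/stat.h>

/* Return the st_blksize reported for path, or -1 if stat() fails.
 * This is the per-file "preferred I/O size" hint visible to userspace. */
long preferred_io_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}
```

Calling it on a file under an NFS mount would show whether the client
reports 512, 8192, or something else for that mount.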

Does any NFS server report wtmult?

Here's a patch.

/fc

--- a/fs/nfs/inode.c	2003-09-28 23:41:13.000000000 -0700
+++ b/fs/nfs/inode.c	2003-09-28 23:40:18.000000000 -0700
@@ -323,8 +323,8 @@
 		server->wsize = nfs_block_size(fsinfo.wtpref, NULL);
 	if (sb->s_blocksize == 0) {
 		if (fsinfo.wtmult == 0) {
-			sb->s_blocksize = 512;
-			sb->s_blocksize_bits = 9;
+			sb->s_blocksize = nfs_block_bits(server->rsize > server->wsize ? server->rsize : server->wsize,
+							 &sb->s_blocksize_bits);
 		} else
 			sb->s_blocksize = nfs_block_bits(fsinfo.wtmult,
 							 &sb->s_blocksize_bits);

* effect of nfs blocksize on I/O ?
  2003-09-29  6:42 effect of nfs blocksize on I/O ? Frank Cusack
@ 2003-09-29  7:19 ` Trond Myklebust
  2003-09-29  7:52   ` Frank Cusack
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2003-09-29  7:19 UTC
  To: Frank Cusack; +Cc: torvalds, lkml

>>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:

     > 2.6 sets this to nfs_fsinfo.wtmult?nfs_fsinfo.wtmult:512 = 512
     >     typically.

     > (My estimation of "typical" may be way off though.)

     > At a 512 byte blocksize, this overflows struct statfs for fs >
     > 1TB.  Most of my NFS filesystems (on netapp) are larger than
     > that.

Then you should use statfs64()/statvfs64(). Nobody is going to
guarantee to you that the equivalent 32-bit syscalls will hold for
arbitrary disk sizes.

     > But more importantly, what does the VFS *do* with the
     > blocksize?  strace seems to show that glibc/stdio does consider
     > it.  If I fprintf() two 4096 byte strings, libc does a single
     > write() with 8192 blocksize, and 3 write()'s for 512 blocksize.
     > I haven't looked to see what goes over the wire, but I assume
     > that still follows rsize/wsize.

In NFS you need to distinguish between the 'block size' (bsize) and
the 'fragment size' (frsize). The former is defined as the "optimal
transfer block size", the latter is the "block size on the underlying
filesystem" according to the manpages.

These are SUS-mandated definitions...


The VFS itself cares little about the blocksize, however programs like
'df' are supposed to use the fragment size as their basic unit when
reporting space usage. Putting arbitrary values in place of the true
fragment size typically leads to rounding errors, and so is not
recommended.

OTOH, bsize is of informational interest to programs that wish to
optimize I/O throughput by grouping their data into appropriately
sized records.
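
A short sketch of how a 'df'-like program would apply the distinction,
using the standard statvfs() call. POSIX counts f_blocks in units of
f_frsize, so the total size is computed from f_frsize and f_bsize is left
as an I/O hint. The helper name fs_total_bytes is hypothetical.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: total filesystem size in bytes for path.
 * f_blocks is counted in units of f_frsize, not f_bsize.
 * Returns 0 on success, -1 if statvfs() fails. */
int fs_total_bytes(const char *path, unsigned long long *total)
{
	struct statvfs vfs;

	if (statvfs(path, &vfs) != 0)
		return -1;
	*total = (unsigned long long)vfs.f_blocks * vfs.f_frsize;
	return 0;
}
```

Multiplying f_blocks by f_bsize instead gives a wrong total whenever the
two sizes differ.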

Cheers,
  Trond

* Re: effect of nfs blocksize on I/O ?
  2003-09-29  7:19 ` Trond Myklebust
@ 2003-09-29  7:52   ` Frank Cusack
  2003-09-29  8:27     ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Frank Cusack @ 2003-09-29  7:52 UTC
  To: Trond Myklebust; +Cc: torvalds, lkml

On Mon, Sep 29, 2003 at 12:19:30AM -0700, Trond Myklebust wrote:
> >>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:
> 
>      > 2.6 sets this to nfs_fsinfo.wtmult?nfs_fsinfo.wtmult:512 = 512
>      >     typically.
> 
>      > (My estimation of "typical" may be way off though.)
> 
>      > At a 512 byte blocksize, this overflows struct statfs for fs >
>      > 1TB.  Most of my NFS filesystems (on netapp) are larger than
>      > that.
> 
> Then you should use statfs64()/statvfs64(). Nobody is going to
> guarantee to you that the equivalent 32-bit syscalls will hold for
> arbitrary disk sizes.

I see.

> OTOH, bsize is of informational interest to programs that wish to
> optimize I/O throughput by grouping their data into appropriately
> sized records.

So then isn't the optimal record size 8192 for r/wsize=8192?  Since the
data is going to be grouped into 8192-byte reads and writes over the wire,
shouldn't bsize match that?  Why should I make 16x 512-byte write() syscalls
(if "optimal" I/O size is bsize=512) instead of 1x 8192-byte syscall?

SUSv3 says:

    unsigned long f_bsize    File system block size. 
    unsigned long f_frsize   Fundamental file system block size.

Solaris statvfs(2) says:

    u_long      f_bsize;             /* preferred file system block size */
    u_long      f_frsize;            /* fundamental filesystem block
                                         (size if supported) */

/fc

* Re: effect of nfs blocksize on I/O ?
  2003-09-29  7:52   ` Frank Cusack
@ 2003-09-29  8:27     ` Trond Myklebust
  2003-09-30  5:23       ` Frank Cusack
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2003-09-29  8:27 UTC
  To: Frank Cusack; +Cc: torvalds, lkml

>>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:

    >> OTOH, bsize is of informational interest to programs that wish
    >> to optimize I/O throughput by grouping their data into
    >> appropriately sized records.

     > So then isn't the optimal record size 8192 for r/wsize=8192?
     > Since the data is going to be grouped into 8192-byte reads and
     > writes over the wire, shouldn't bsize match that?  Why should I
     > make 16x 512-byte write() syscalls (if "optimal" I/O size is
     > bsize=512) instead of 1x 8192-byte syscall?

Yes. It is already on my list of bugs.

We basically need to feed 'wtpref' (a.k.a. 'wsize') into the f_bsize,
and 'wtmult' into f_frsize.

OTOH, the s_blocksize (and inode->i_blkbits) might well want to stay
with wtmult.

Cheers,
  Trond

* Re: effect of nfs blocksize on I/O ?
  2003-09-29  8:27     ` Trond Myklebust
@ 2003-09-30  5:23       ` Frank Cusack
  2003-09-30  6:04         ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Frank Cusack @ 2003-09-30  5:23 UTC
  To: Trond Myklebust; +Cc: torvalds, lkml

On Mon, Sep 29, 2003 at 01:27:51AM -0700, Trond Myklebust wrote:
> >>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:
> 
>     >> OTOH, bsize is of informational interest to programs that wish
>     >> to optimize I/O throughput by grouping their data into
>     >> appropriately sized records.
> 
>      > So then isn't the optimal record size 8192 for r/wsize=8192?
>      > Since the data is going to be grouped into 8192-byte reads and
>      > writes over the wire, shouldn't bsize match that?  Why should I
>      > make 16x 512-byte write() syscalls (if "optimal" I/O size is
>      > bsize=512) instead of 1x 8192-byte syscall?
> 
> Yes. It is already on my list of bugs.
> 
> We basically need to feed 'wtpref' (a.k.a. 'wsize') into the f_bsize,
> and 'wtmult' into f_frsize.

Then it sounds like the current wtmult/512 value for f_bsize is a bug.
Until such time as you get f_frsize going, just directly plugging
wsize into s_blocksize seems like a win.  Doesn't it?  At least, I don't
see the advantage of using wtmult.  (but could easily be missing it!)

> OTOH, the s_blocksize (and inode->i_blkbits) might well want to stay
> with wtmult.

ISTM that f_frsize is pretty useless for NFS.  Even if the server gives
you this value (as wtmult), what use besides conversion of tbytes/abytes
values does it have?

If you like, I can supply such a patch.

- s_blocksize, either
  . leave it as is (wtmult?wtmult:512)
  . set to wsize (ie, my first patch in this thread)
- statfs, both
  . report wtpref as f_bsize (already done if s_blocksize = wsize)
  . report (wtmult?wtmult:wtpref) as f_frsize

I think the second s_blocksize option is better because not only statfs()
but also stat() will use this value without any additional work.
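
The statfs half of the proposal above can be sketched as follows. The
structures are stand-ins invented for this example and are not the kernel's
own types; only the two assignments reflect the proposal.

```c
/* Hypothetical stand-ins for the server-reported values and the
 * statfs result; not the kernel's own structures. */
struct fsinfo_sketch {
	unsigned int wtpref;	/* preferred write size */
	unsigned int wtmult;	/* suggested write multiple, 0 if not reported */
};

struct statfs_sketch {
	unsigned long f_bsize;
	unsigned long f_frsize;
};

/* f_bsize = wtpref; f_frsize = wtmult ? wtmult : wtpref */
void fill_sizes(const struct fsinfo_sketch *fi, struct statfs_sketch *out)
{
	out->f_bsize = fi->wtpref;
	out->f_frsize = fi->wtmult ? fi->wtmult : fi->wtpref;
}
```

With wtpref = 8192 and no wtmult from the server, both fields come out as
8192; a server reporting wtmult = 512 would get f_frsize = 512.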

/fc

* Re: effect of nfs blocksize on I/O ?
  2003-09-30  5:23       ` Frank Cusack
@ 2003-09-30  6:04         ` Trond Myklebust
  0 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2003-09-30  6:04 UTC
  To: Frank Cusack; +Cc: torvalds, lkml

>>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:

     > Then it sounds like the current wtmult/512 value for f_bsize is
     > a bug.  Until such time as you get f_frsize going, just
     > directly plugging wsize into s_blocksize seems like a win.
     > Doesn't it?

No "until such time" solutions please... Let's do it right the first
time, or not at all!

s_blocksize should remain set to wtmult. It is only f_bsize and
f_frsize that are currently set to the wrong values.

     > ISTM that f_frsize is pretty useless for NFS.  Even if the
     > server gives you this value (as wtmult), what use besides
     > conversion of tbytes/abytes values does it have?

As I told you, the main use is for 'df' and friends. Get it wrong, and
the result will be rounding errors when reporting disk sizes etc. This
is currently the case in 2.4.x where we confuse f_bsize and f_frsize.
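
To make the rounding error concrete, here is a small sketch. It assumes the
client converts the server's byte counts to whole blocks by truncating
division (an assumption for illustration; rounding up would err in the
other direction). The function name and the numbers are made up.

```c
#include <stdint.h>

/* Bytes that go unreported when a byte count is converted to whole
 * blocks of size unit by truncating division (an assumption made for
 * illustration; rounding up would err in the other direction). */
uint64_t truncation_loss(uint64_t bytes, uint32_t unit)
{
	uint64_t blocks = bytes / unit;

	return bytes - blocks * unit;
}
```

For a server holding 1954 fragments of 512 bytes (1000448 bytes), counting
in 8192-byte units yields 122 blocks and drops 1024 bytes, while counting
in 512-byte units loses nothing.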

Cheers,
  Trond
