linux-kernel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@dilger.ca>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Al Viro <viro@ZenIV.linux.org.uk>,
	Ulrich Drepper <drepper@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [rfc] new stat*fs-like syscall?
Date: Fri, 25 Jun 2010 11:47:29 -0600	[thread overview]
Message-ID: <1A529206-98C4-4A88-A2CA-18522D4D631D@dilger.ca> (raw)
In-Reply-To: <20100625040156.GQ10441@laptop>

On 2010-06-24, at 22:01, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
>>> Other than types, other differences are:
>>> - statvfs(2) has f_frsize, which seems fairly useless.
>> 
>> Actually, we were just lamenting the fact that f_frsize is currently broken, because Lustre wants to export the IO size as 1MB for good RPC performance, but the underlying blocksize is 4kB (ext3 blocksize).  Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB even if the underlying filesystem blocksize is smaller.
>> 
>> 
>>> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>>> block size. The latter could be useful for disk space algorithms.
>>> Both can be ill-defined.
>> 
>> According to POSIX, "f_bsize" is the blocksize, but unfortunately this was 

Doh, typo.  "f_frsize" is the "blocksize" (i.e. the units of f_blocks), and "f_bsize" is the "optimal IO size".

The SUSv2 includes the following field definitions (not showing all of them):
> unsigned long f_bsize    file system block size
> unsigned long f_frsize   fundamental filesystem block size
> fsblkcnt_t    f_blocks   total number of blocks on file system
>                          in units of f_frsize

>> botched in the earlier Linux implementations so currently they are both set to the same value, and using anything other than that breaks userspace programs that get them mixed up.
> 
> So is "frsize" supposed to be the optimal block size, or what?

No, "frsize" is the minimum allocation unit - it is "fragment size".
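If f_frsize is the minimum allocation unit, a rough estimate of the space a file of a given length occupies is its length rounded up to a whole number of fragments. The helper below is a hypothetical illustration of that arithmetic only. It assumes a plain file with no sparse holes, inline data, tail packing or metadata overhead, so the actual usage reported by st_blocks from stat(2) can differ.

```c
#include <stdint.h>

/* Hypothetical helper: round `len` bytes up to whole fragments of
 * `frsize` bytes (the "fragment size", i.e. statvfs f_frsize).
 * Assumption: ignores holes, inline data, tail packing and metadata,
 * so this is only an estimate of on-disk usage. */
uint64_t round_up_to_fragment(uint64_t len, uint64_t frsize)
{
    if (frsize == 0)
        return len;                     /* no usable unit reported */

    /* Divide first so that len + frsize - 1 cannot overflow. */
    return (len / frsize + (len % frsize != 0)) * frsize;
}
```

For example, with a 4096-byte fragment size a 1-byte file rounds up to 4096 bytes and a 4097-byte file to 8192 bytes.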

> f_bsize AFAIKS should be filesystem allocation block size because 
> apparently some programs require it to calculate size of file on
> disk.

The statvfs()/struct statvfs API clearly documents that f_blocks is in units of f_frsize, but since this is a relatively new API on Linux, and statfs() used f_bsize for years to mean the same thing, some applications are broken.
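The fix for such applications is small: scale the block counts by f_frsize rather than f_bsize. The sketch below shows this with statvfs(3). The struct fs_bytes and the function fs_space_bytes() are hypothetical names, and the fallback to f_bsize when f_frsize is 0 is a defensive assumption, not something the SUSv2 definitions require.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical result type: filesystem space in bytes. */
struct fs_bytes {
    uint64_t total;   /* f_blocks * unit */
    uint64_t free;    /* f_bfree  * unit (includes root-reserved blocks) */
    uint64_t avail;   /* f_bavail * unit (available to unprivileged users) */
};

/* Hypothetical helper: convert statvfs block counts to bytes.
 * f_blocks, f_bfree and f_bavail are counted in units of f_frsize,
 * not f_bsize (f_bsize is the preferred I/O size).
 * Returns 0 on success, -1 if statvfs() fails. */
int fs_space_bytes(const char *path, struct fs_bytes *out)
{
    struct statvfs sv;
    uint64_t unit;

    if (statvfs(path, &sv) != 0)
        return -1;

    /* Assumption: treat a zero f_frsize as "same as f_bsize". */
    unit = sv.f_frsize ? sv.f_frsize : sv.f_bsize;

    out->total = (uint64_t)sv.f_blocks * unit;
    out->free  = (uint64_t)sv.f_bfree  * unit;
    out->avail = (uint64_t)sv.f_bavail * unit;
    return 0;
}
```

On current Linux the two fields usually hold the same value, which is exactly why code that multiplies by f_bsize appears to work until a filesystem such as Lustre or NFS reports a larger I/O size.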

> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.

I'd rather NOT start a proliferation of redundant syscalls, since there is no expectation that they will be used correctly either, and it just makes applications less portable.  I think it is less effort to fix the few current applications using sys_statvfs() incorrectly to use f_frsize than to use some new linux-only syscall.

>> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean.  That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.
> 
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.

SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also what is documented in the Linux/BSD/OSX statvfs(3) man page.  According to the Solaris statvfs(3) man page I found, it additionally defines:

ST_NOTRUNC   0x04    /* does not truncate file names longer than
                        NAME_MAX */
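For reference, a portable program can test only the two SUSv2 flags in f_flag and must guard anything else. The hypothetical helper print_fs_flags() below does that; the #ifdef around ST_NOTRUNC assumes, as the Solaris man page suggests, that the constant exists only on some platforms (glibc's <sys/statvfs.h> does not define it).

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: print the SUSv2 statvfs flags for `path`, plus the
 * Solaris-only ST_NOTRUNC where the headers provide it.
 * Returns the raw f_flag value, or -1 if statvfs() fails. */
long print_fs_flags(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;

    printf("%s:%s%s", path,
           (sv.f_flag & ST_RDONLY) ? " ST_RDONLY" : "",
           (sv.f_flag & ST_NOSUID) ? " ST_NOSUID" : "");
#ifdef ST_NOTRUNC
    /* Only compiled where the platform defines it (e.g. Solaris). */
    if (sv.f_flag & ST_NOTRUNC)
        printf(" ST_NOTRUNC");
#endif
    printf("\n");
    return (long)sv.f_flag;
}
```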

Cheers, Andreas


Thread overview: 29+ messages
2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
2010-06-24 14:03 ` Miklos Szeredi
2010-06-24 14:36   ` Nick Piggin
2010-06-24 14:08 ` Andy Lutomirski
2010-06-24 14:18   ` Miklos Szeredi
2010-06-24 14:37     ` Andrew Lutomirski
2010-06-24 14:48       ` Miklos Szeredi
2010-06-25  3:50         ` Nick Piggin
2010-06-24 23:06   ` Andreas Dilger
2010-06-25  6:37     ` Christoph Hellwig
2010-06-24 23:13 ` Andreas Dilger
2010-06-25  4:01   ` Nick Piggin
2010-06-25  4:33     ` Jeff Garzik
2010-06-25 17:47     ` Andreas Dilger [this message]
2010-06-25 17:52       ` Ulrich Drepper
2010-06-25 18:16         ` Christoph Hellwig
2010-06-25 18:45           ` Christoph Hellwig
2010-06-25 19:40             ` Ulrich Drepper
2010-06-26  5:53 ` J. R. Okajima
2010-06-26  9:35   ` Christoph Hellwig
2010-06-26 12:54     ` J. R. Okajima
2010-07-05 20:58       ` Brad Boyer
2010-07-05 23:31         ` J. R. Okajima
2010-07-06  0:45           ` Brad Boyer
2010-07-06 16:45             ` Linus Torvalds
2010-07-07  1:44               ` Christoph Hellwig
2010-07-07  2:28                 ` Linus Torvalds
2010-06-26 14:49     ` Ulrich Drepper
2010-06-26 10:13 ` Andi Kleen
