All of lore.kernel.org
 help / color / mirror / Atom feed
* [rfc] new stat*fs-like syscall?
@ 2010-06-24 13:14 Nick Piggin
  2010-06-24 14:03 ` Miklos Szeredi
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Nick Piggin @ 2010-06-24 13:14 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

This has come up a few times in the past, and I'd like to try to get
an agreement on it. statvfs(2) importantly contains f_flag (mount
flags), and is encouraged to use rather than statfs(2). The kernel
provides a statfs syscall only.

This means glibc has to provide f_flag support by parsing /proc/mounts
and stat(2)ing mount points. This is really slow, and /proc/mounts is
hard for the kernel to provide. It's actually the last scalability
bottleneck in the core vfs for dbench (samba) after my patches.

Not only that, but it's racy.

Other than types, other differences are:
- statvfs(2) has is f_frsize, which seems fairly useless.
- statvfs(2) has f_favail.
- statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
  block size. The latter could be useful for disk space algorithms.
  Both can be ill defned.
- statvfs(2) lacks f_type.

Is there anything more we should add here? Samba wants a capabilities
field, with things like sparse files, quotas, compression, encryption,
case preserving/sensitive.

Any thoughts?

Thanks,
Nick


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
@ 2010-06-24 14:03 ` Miklos Szeredi
  2010-06-24 14:36   ` Nick Piggin
  2010-06-24 14:08 ` Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2010-06-24 14:03 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and is encouraged to use rather than statfs(2). The kernel
> provides a statfs syscall only.
> 
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
> 
> Not only that, but it's racy.
> 
> Other than types, other differences are:
> - statvfs(2) has is f_frsize, which seems fairly useless.

statfs(2) also has f_frsize since 2.6.0, only it hasn't been
documented (should be fixed now).

> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defned.

They are the same, only the documentation is different.

> - statvfs(2) lacks f_type.
> 
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
> 
> Any thoughts?

"struct statfs" and "struct statfs64" have spare fields.  We could put
the f_flag in there including a magic "this is a valid f_flag" flag,
that distinguishes from the default zero value.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
  2010-06-24 14:03 ` Miklos Szeredi
@ 2010-06-24 14:08 ` Andy Lutomirski
  2010-06-24 14:18   ` Miklos Szeredi
  2010-06-24 23:06   ` Andreas Dilger
  2010-06-24 23:13 ` Andreas Dilger
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Andy Lutomirski @ 2010-06-24 14:08 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and is encouraged to use rather than statfs(2). The kernel
> provides a statfs syscall only.
> 
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
> 
> Not only that, but it's racy.
> 
> Other than types, other differences are:
> - statvfs(2) has is f_frsize, which seems fairly useless.
> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defned.
> - statvfs(2) lacks f_type.
> 
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
> 
> Any thoughts?

Something like fsid but actually specified to uniquely identify a 
superblock.  (Currently, fsid seems to be set by the filesystem, and 
nothing in particular ensures that two different filesystems couldn't 
have collisions.)  We could guarantee (or have a flag guaranteeing) that 
(fsid, st_inode) actually uniquely identifies an inode.

Similarly, something like fsid that uniquely identifies the vfsmount 
could be useful, although I don't know how easy that would be to provide 
for fstat?fs.

If we could expose the complete set of filesystem mount options so that 
mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then 
playing with chroots would be that much easier.

Should we expose superblock and vfsmount options separately?  We have 
read-only bind mounts now, but the way they work is rather inscrutable, 
and if stat?fs could say "superblock is read-write but vfsmount is 
readonly" then people might be able to make more sense of what's going on.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:08 ` Andy Lutomirski
@ 2010-06-24 14:18   ` Miklos Szeredi
  2010-06-24 14:37     ` Andrew Lutomirski
  2010-06-24 23:06   ` Andreas Dilger
  1 sibling, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2010-06-24 14:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Andy Lutomirski wrote:
> Something like fsid but actually specified to uniquely identify a 
> superblock.  (Currently, fsid seems to be set by the filesystem, and 
> nothing in particular ensures that two different filesystems couldn't 
> have collisions.)  We could guarantee (or have a flag guaranteeing) that 
> (fsid, st_inode) actually uniquely identifies an inode.
> 
> Similarly, something like fsid that uniquely identifies the vfsmount 
> could be useful, although I don't know how easy that would be to provide 
> for fstat?fs.
> 
> If we could expose the complete set of filesystem mount options so that 
> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then 
> playing with chroots would be that much easier.
> 
> Should we expose superblock and vfsmount options separately?  We have 
> read-only bind mounts now, but the way they work is rather inscrutable, 
> and if stat?fs could say "superblock is read-write but vfsmount is 
> readonly" then people might be able to make more sense of what's going on.

You'll find all of those things in /proc/self/mountinfo.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:03 ` Miklos Szeredi
@ 2010-06-24 14:36   ` Nick Piggin
  0 siblings, 0 replies; 30+ messages in thread
From: Nick Piggin @ 2010-06-24 14:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 04:03:05PM +0200, Miklos Szeredi wrote:
> On Thu, 24 Jun 2010, Nick Piggin wrote:
> > This has come up a few times in the past, and I'd like to try to get
> > an agreement on it. statvfs(2) importantly contains f_flag (mount
> > flags), and is encouraged to use rather than statfs(2). The kernel
> > provides a statfs syscall only.
> > 
> > This means glibc has to provide f_flag support by parsing /proc/mounts
> > and stat(2)ing mount points. This is really slow, and /proc/mounts is
> > hard for the kernel to provide. It's actually the last scalability
> > bottleneck in the core vfs for dbench (samba) after my patches.
> > 
> > Not only that, but it's racy.
> > 
> > Other than types, other differences are:
> > - statvfs(2) has is f_frsize, which seems fairly useless.
> 
> statfs(2) also has f_frsize since 2.6.0, only it hasn't been
> documented (should be fixed now).
> 
> > - statvfs(2) has f_favail.
> > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> >   block size. The latter could be useful for disk space algorithms.
> >   Both can be ill defned.
> 
> They are the same, only the documentation is different.
> 
> > - statvfs(2) lacks f_type.
> > 
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
> > 
> > Any thoughts?
> 
> "struct statfs" and "struct statfs64" have spare fields.  We could put
> the f_flag in there including a magic "this is a valid f_flag" flag,
> that distinguishes from the default zero value.

Ah so it does. We have 5 words spare. So we should have a version
number rather than just do a per-word hack each time. We could
probably pack the version number into a few bits of f_flag though.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:18   ` Miklos Szeredi
@ 2010-06-24 14:37     ` Andrew Lutomirski
  2010-06-24 14:48       ` Miklos Szeredi
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lutomirski @ 2010-06-24 14:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 10:18 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, 24 Jun 2010, Andy Lutomirski wrote:
>> Something like fsid but actually specified to uniquely identify a
>> superblock.  (Currently, fsid seems to be set by the filesystem, and
>> nothing in particular ensures that two different filesystems couldn't
>> have collisions.)  We could guarantee (or have a flag guaranteeing) that
>> (fsid, st_inode) actually uniquely identifies an inode.
>>
>> Similarly, something like fsid that uniquely identifies the vfsmount
>> could be useful, although I don't know how easy that would be to provide
>> for fstat?fs.
>>
>> If we could expose the complete set of filesystem mount options so that
>> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
>> playing with chroots would be that much easier.
>>
>> Should we expose superblock and vfsmount options separately?  We have
>> read-only bind mounts now, but the way they work is rather inscrutable,
>> and if stat?fs could say "superblock is read-write but vfsmount is
>> readonly" then people might be able to make more sense of what's going on.
>
> You'll find all of those things in /proc/self/mountinfo.

Wasn't the point that /proc/self/mounts (and presumably
/proc/self/mountinfo) isn't scalable and we wanted a syscall to query
it efficiently (and racelessly)?

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:37     ` Andrew Lutomirski
@ 2010-06-24 14:48       ` Miklos Szeredi
  2010-06-25  3:50         ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2010-06-24 14:48 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: miklos, npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Andrew Lutomirski wrote:
> Wasn't the point that /proc/self/mounts (and presumably
> /proc/self/mountinfo) isn't scalable and we wanted a syscall to query
> it efficiently (and racelessly)?

The question was how to support statvfs() efficiently, and the only
thing missing there is f_flags which can easily be added to the
existing statfs() syscall.

A separate mount_info() syscall might possibly be useful, but that's
another story.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:08 ` Andy Lutomirski
  2010-06-24 14:18   ` Miklos Szeredi
@ 2010-06-24 23:06   ` Andreas Dilger
  2010-06-25  6:37     ` Christoph Hellwig
  1 sibling, 1 reply; 30+ messages in thread
From: Andreas Dilger @ 2010-06-24 23:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 08:08, Andy Lutomirski wrote:
> Something like fsid but actually specified to uniquely identify a superblock.  (Currently, fsid seems to be set by the filesystem, and nothing in particular ensures that two different filesystems couldn't have collisions.)  We could guarantee (or have a flag guaranteeing) that (fsid, st_inode) actually uniquely identifies an inode.

I think the right solution for this issue is to (gradually) start enforcing the "uniqueness" of the UUID in the filesystem superblock.  That is what it is supposed to be for.  Using (fsid, st_inode) doesn't necessarily help anything, if "fsid" isn't unique, and the same "st_inode" number is used on two different mountpoints.

To start, tracking the UUID at mount time an printing a non-fatal error at mount time if the mounted UUID is not unique would help, as would having e.g. fsck track the UUIDs of the underlying filesystems and printing a non-fatal error if it hits a duplicate UUID.  

At some point in the future, the kernel can be changed to refuse to mount a filesystem with a duplicate UUID.  I believe mount.xfs already does this.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
  2010-06-24 14:03 ` Miklos Szeredi
  2010-06-24 14:08 ` Andy Lutomirski
@ 2010-06-24 23:13 ` Andreas Dilger
  2010-06-25  4:01   ` Nick Piggin
  2010-06-26  5:53 ` J. R. Okajima
  2010-06-26 10:13 ` Andi Kleen
  4 siblings, 1 reply; 30+ messages in thread
From: Andreas Dilger @ 2010-06-24 23:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 07:14, Nick Piggin wrote:
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide.

Not only that, but if a mountpoint is broken (e.g. remote NFS server) then the glibc stat of all the mountpoints can hang the statvfs() call even if there is no interest in that particular filesystem.

> It's actually the last scalability bottleneck in the core vfs for dbench (samba) after my patches.
> 
> Not only that, but it's racy.
> 
> Other than types, other differences are:
> - statvfs(2) has is f_frsize, which seems fairly useless.

Actually, we were just lamenting the fact that f_frsize is currently broken, because Lustre wants to export the IO size as 1MB for good RPC performance, but the underlying blocksize is 4kB (ext3 blocksize).  Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB even if the underlying filesystem blocksize is smaller.

> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>  block size. The latter could be useful for disk space algorithms.
>  Both can be ill defned.

According to POSIX, "f_bsize" is the blocksize, but unfortunately this was botched in the earlier Linux implementations so currently they are both set to the same value, and using anything other than that breaks userspace programs that get them mixed up.

> - statvfs(2) lacks f_type.
> 
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.

It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean.  That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 14:48       ` Miklos Szeredi
@ 2010-06-25  3:50         ` Nick Piggin
  0 siblings, 0 replies; 30+ messages in thread
From: Nick Piggin @ 2010-06-25  3:50 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Lutomirski, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 04:48:20PM +0200, Miklos Szeredi wrote:
> On Thu, 24 Jun 2010, Andrew Lutomirski wrote:
> > Wasn't the point that /proc/self/mounts (and presumably
> > /proc/self/mountinfo) isn't scalable and we wanted a syscall to query
> > it efficiently (and racelessly)?
> 
> The question was how to support statvfs() efficiently, and the only
> thing missing there is f_flags which can easily be added to the
> existing statfs() syscall.
> 
> A separate mount_info() syscall might possibly be useful, but that's
> another story.

Native statvfs() support is my motivation, but I am thinking that if
we are going to introduce a new syscall (or version rev the statfs
syscall somehow), then we should think hard about what else we can do.

More superblock info should be possible, more detailed info like like
related mounts will be costlier, so that may be better off as a
different syscall.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 23:13 ` Andreas Dilger
@ 2010-06-25  4:01   ` Nick Piggin
  2010-06-25  4:33     ` Jeff Garzik
  2010-06-25 17:47     ` Andreas Dilger
  0 siblings, 2 replies; 30+ messages in thread
From: Nick Piggin @ 2010-06-25  4:01 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
> On 2010-06-24, at 07:14, Nick Piggin wrote:
> > This means glibc has to provide f_flag support by parsing /proc/mounts
> > and stat(2)ing mount points. This is really slow, and /proc/mounts is
> > hard for the kernel to provide.
> 
> Not only that, but if a mountpoint is broken (e.g. remote NFS server) then the glibc stat of all the mountpoints can hang the statvfs() call even if there is no interest in that particular filesystem.

Good point.

 
> > It's actually the last scalability bottleneck in the core vfs for dbench (samba) after my patches.
> > 
> > Not only that, but it's racy.
> > 
> > Other than types, other differences are:
> > - statvfs(2) has is f_frsize, which seems fairly useless.
> 
> Actually, we were just lamenting the fact that f_frsize is currently broken, because Lustre wants to export the IO size as 1MB for good RPC performance, but the underlying blocksize is 4kB (ext3 blocksize).  Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB even if the underlying filesystem blocksize is smaller.
> 
> > - statvfs(2) has f_favail.
> > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> >  block size. The latter could be useful for disk space algorithms.
> >  Both can be ill defned.
> 
> According to POSIX, "f_bsize" is the blocksize, but unfortunately this was botched in the earlier Linux implementations so currently they are both set to the same value, and using anything other than that breaks userspace programs that get them mixed up.

So is "frsize" supposed to be the optimal block size, or what?
f_bsize AFAIKS should be filesystem allocation block size because 
apparently some programs require it to calculate size of file on
disk.

If we can't change existing suboptimal legacy things, then let's
introduce new APIs that do the right thing. Apps that care will
eventually start using eg. a new syscall.

> 
> > - statvfs(2) lacks f_type.
> > 
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
> 
> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean.  That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.

Yes it would be tricky. I don't want to add features that will just
be useless or go unused, but I don't want to change the syscall API
just to add f_flags, without looking at other possibilities.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25  4:01   ` Nick Piggin
@ 2010-06-25  4:33     ` Jeff Garzik
  2010-06-25 17:47     ` Andreas Dilger
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Garzik @ 2010-06-25  4:33 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andreas Dilger, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds

On 06/25/2010 12:01 AM, Nick Piggin wrote:
> So is "frsize" supposed to be the optimal block size, or what?
> f_bsize AFAIKS should be filesystem allocation block size because
> apparently some programs require it to calculate size of file on
> disk.
>
> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.
>
>>
>>> - statvfs(2) lacks f_type.
>>>
>>> Is there anything more we should add here? Samba wants a capabilities
>>> field, with things like sparse files, quotas, compression, encryption,
>>> case preserving/sensitive.
>>
>> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean.  That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.
>
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.


It would be nice to separate capabilities and fixed parameters (block 
size) from statistics which change frequently (free space).

And are capabilities really suited to a C struct, at all?  That seems 
more suited to a key/value type interface, a la NFSv4 attributes.

	Jeff




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 23:06   ` Andreas Dilger
@ 2010-06-25  6:37     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-06-25  6:37 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andy Lutomirski, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Thu, Jun 24, 2010 at 05:06:45PM -0600, Andreas Dilger wrote:
> I think the right solution for this issue is to (gradually) start enforcing the "uniqueness" of the UUID in the filesystem superblock.  That is what it is supposed to be for.  Using (fsid, st_inode) doesn't necessarily help anything, if "fsid" isn't unique, and the same "st_inode" number is used on two different mountpoints.
> 
> To start, tracking the UUID at mount time an printing a non-fatal error at mount time if the mounted UUID is not unique would help, as would having e.g. fsck track the UUIDs of the underlying filesystems and printing a non-fatal error if it hits a duplicate UUID.  
> 
> At some point in the future, the kernel can be changed to refuse to mount a filesystem with a duplicate UUID.  I believe mount.xfs already does this.

Tracking and exposing the uuid to be exact.  Having the full uuid in a
statfs/statvfs-like system call is one first step.  And yes, XFS does
check the uuid during mount.  But it's actually in kernelspace, not in a
mount helper which XFS doesn't have.  Take a look at xfs_uuid_mount().


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25  4:01   ` Nick Piggin
  2010-06-25  4:33     ` Jeff Garzik
@ 2010-06-25 17:47     ` Andreas Dilger
  2010-06-25 17:52       ` Ulrich Drepper
  1 sibling, 1 reply; 30+ messages in thread
From: Andreas Dilger @ 2010-06-25 17:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 22:01, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
>>> Other than types, other differences are:
>>> - statvfs(2) has is f_frsize, which seems fairly useless.
>> 
>> Actually, we were just lamenting the fact that f_frsize is currently broken, because Lustre wants to export the IO size as 1MB for good RPC performance, but the underlying blocksize is 4kB (ext3 blocksize).  Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB even if the underlying filesystem blocksize is smaller.
>> 
>> 
>>> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>>> block size. The latter could be useful for disk space algorithms.
>>> Both can be ill defned.
>> 
>> According to POSIX, "f_bsize" is the blocksize, but unfortunately this was 

Doh, typo.  "f_frsize" is the "blocksize" (i.e. the units of f_blocks), and "f_bsize" is the "optimal IO size".

The SUSv2 includes the following field definitions (not showing all of them):
> unsigned long f_bsize    file system block size
> unsigned long f_frsize   fundamental filesystem block size
> fsblkcnt_t    f_blocks   total number of blocks on file system
>                          in units of f_frsize

>> botched in the earlier Linux implementations so currently they are both set to the same value, and using anything other than that breaks userspace programs that get them mixed up.
> 
> So is "frsize" supposed to be the optimal block size, or what?

No, "frsize" is the minimum allocation unit - it is "fragment size".

> f_bsize AFAIKS should be filesystem allocation block size because 
> apparently some programs require it to calculate size of file on
> disk.

Using statvfs()/struct statvfs clearly documents that f_blocks is in units of f_frsize, but since this is a relatively new API on Linux, and statfs() used f_bsize for years to mean the same thing some applications are broken.

> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.

I'd rather NOT start a proliferation of redundant syscalls, since there is no expectation that they will be used correctly either, and it just makes applications less portable.  I think it less effort to fix the few current applications using sys_statvfs() incorrectly to use f_frsize than to use some new linux-only syscall.

>> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean.  That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.
> 
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.

SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also what is documented in the Linux/BSD/OSX statvfs(3) man page.  According to the Solaris statvfs(3) man page I found it additionally defines:

ST_NOTRUNC   0x04    /* does not truncate file names longer than
                        NAME_MAX */

Cheers, Andreas






^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25 17:47     ` Andreas Dilger
@ 2010-06-25 17:52       ` Ulrich Drepper
  2010-06-25 18:16         ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Ulrich Drepper @ 2010-06-25 17:52 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 10:47, Andreas Dilger <adilger@dilger.ca> wrote:
> SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also what is documented in the Linux/BSD/OSX statvfs(3) man page.  According to the Solaris statvfs(3) man page I found it additionally defines:
>
> ST_NOTRUNC   0x04    /* does not truncate file names longer than
>                        NAME_MAX */

glibc supports many more flags.  SuS of course has to restrict itself,
there are not that many flags which are portable and available on all
the platforms.  Look at /usr/include/bits/statvfs.h for what has to be
supported and the values to use.  If the values the kernel will use
differ I'd have to (unnecessarily) convert the values.  If some values
are missing/not supported I still would have to use /proc/mounts and
nothing is gained.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25 17:52       ` Ulrich Drepper
@ 2010-06-25 18:16         ` Christoph Hellwig
  2010-06-25 18:45           ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2010-06-25 18:16 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 10:52:05AM -0700, Ulrich Drepper wrote:
> there are not that many flags which are portable and available on all
> the platforms.  Look at /usr/include/bits/statvfs.h for what has to be
> supported and the values to use.  If the values the kernel will use
> differ I'd have to (unnecessarily) convert the values.  If some values
> are missing/not supported I still would have to use /proc/mounts and
> nothing is gained.

I don't quite get what ST_WRITE is supposed to mean.  All but that one
can be supported trivially.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25 18:16         ` Christoph Hellwig
@ 2010-06-25 18:45           ` Christoph Hellwig
  2010-06-25 19:40               ` Ulrich Drepper
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2010-06-25 18:45 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 02:16:38PM -0400, Christoph Hellwig wrote:
> On Fri, Jun 25, 2010 at 10:52:05AM -0700, Ulrich Drepper wrote:
> > there are not that many flags which are portable and available on all
> > the platforms.  Look at /usr/include/bits/statvfs.h for what has to be
> > supported and the values to use.  If the values the kernel will use
> > differ I'd have to (unnecessarily) convert the values.  If some values
> > are missing/not supported I still would have to use /proc/mounts and
> > nothing is gained.
> 
> I don't quite get what ST_WRITE is supposed to mean.  All but that one
> can be supported trivially.

In addition ST_APPEND and ST_IMMUTABLE are rather puzzling.  Do you
really want these to mean if the file we call statfs on have the
immutable/append only bits set?  That is mixing two bits of stat
information into statfs?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-25 18:45           ` Christoph Hellwig
@ 2010-06-25 19:40               ` Ulrich Drepper
  0 siblings, 0 replies; 30+ messages in thread
From: Ulrich Drepper @ 2010-06-25 19:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 11:45, Christoph Hellwig <hch@infradead.org> wrote:
> I don't quite get what ST_WRITE is supposed to mean.  All but that one
> can be supported trivially.

ST_WRITE comes elsewhere.  We don't use it on Linux.


> In addition ST_APPEND and ST_IMMUTABLE are rather puzzling.  Do you
> really want these to mean if the file we call statfs on have the
> immutable/append only bits set?  That is mixing two bits of stat
> information into statfs?

Ignore these as well, they also has a different source.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
@ 2010-06-25 19:40               ` Ulrich Drepper
  0 siblings, 0 replies; 30+ messages in thread
From: Ulrich Drepper @ 2010-06-25 19:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 11:45, Christoph Hellwig <hch@infradead.org> wrote:
> I don't quite get what ST_WRITE is supposed to mean.  All but that one
> can be supported trivially.

ST_WRITE comes elsewhere.  We don't use it on Linux.


> In addition ST_APPEND and ST_IMMUTABLE are rather puzzling.  Do you
> really want these to mean if the file we call statfs on have the
> immutable/append only bits set?  That is mixing two bits of stat
> information into statfs?

Ignore these as well, they also has a different source.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
                   ` (2 preceding siblings ...)
  2010-06-24 23:13 ` Andreas Dilger
@ 2010-06-26  5:53 ` J. R. Okajima
  2010-06-26  9:35   ` Christoph Hellwig
  2010-06-26 10:13 ` Andi Kleen
  4 siblings, 1 reply; 30+ messages in thread
From: J. R. Okajima @ 2010-06-26  5:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds


Nick Piggin:
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.

How about the max link count?
There was a post in last December.
See <http://marc.info/?l=linux-kernel&m=126008640210762&w=2> and its
thread in detail.


J. R. Okajima

----------------------------------------------------------------------
The pathconf(_PC_LINK_MAX) cannot get the correct value, since linux
kernel doesn't provide such interface. And the current implementation in
GLibc issues statfs(2) first and then returns the predefined value
(EXT2_LINK_MAX, etc) based upoin the filesystem type. But GLibc doesn't
support all filesystem types. ie. when the target filesystem is unknown
to pathconf(3), it will return LINUX_LINK_MAX (127).
For GLibc, there is no way except implementing this poor method.

This patch makes statfs(2) return the correct value via struct
statfs.f_spare[0].

RFC:
- Can we use f_spare for this purpose?
- Does pathconf(_PC_LINK_MAX) distinguish a dir and a non-dir?
  If a filesystem sets different limit for a dir as a link count from a
  non-dir, then should the filesystem checks the type of the specified
  dentry->d_inode->i_mode and return the different value?
  This patch series doesn't distinguish them and return a single value.
- Here I tried supporting only ext[23], nfs and tmpfs. Since I can test
  them by myself. I left other FSs as it is, which means if FS doesn't
  support _PC_LINK_MAX by modifying its s_op->statfs(), the default
  value will be returned. The default value is taken from GLibc trying
  to keep the compatibility. But it may not be important.
- Some FS such as ms-dos based one which doesn't support hardlink, will
  return LINK_MAX_UNSUPPORTED which is defined as 1.
- Other FS such as tmpfs which doesn't check the link count in link(2),
  will return LINK_MAX_UNLIMITED which is defined as -1. This value
  doesn't mean an error. The negative return value of pathconf(3) is
  valid.

Even if linux kernel return a correct value via statfs(2) (or anything
else), users will not get the value at once since the support in libc is
necessary too.


J. R. Okajima (5):
  vfs, support pathconf(3) with _PC_LINK_MAX
  ext2, support pathconf(3) with _PC_LINK_MAX
  ext3, support pathconf(3) with _PC_LINK_MAX
  nfs, support pathconf(3) with _PC_LINK_MAX
  tmpfs, support pathconf(3) with _PC_LINK_MAX

 fs/compat.c               |    5 +++--
 fs/ext2/super.c           |    1 +
 fs/ext3/super.c           |    1 +
 fs/libfs.c                |    1 +
 fs/nfs/client.c           |   10 +++++++---
 fs/nfs/super.c            |    1 +
 fs/open.c                 |    9 +++++++--
 include/linux/nfs_fs_sb.h |    1 +
 include/linux/statfs.h    |    6 ++++++
 mm/shmem.c                |    1 +
 10 files changed, 29 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-26  5:53 ` J. R. Okajima
@ 2010-06-26  9:35   ` Christoph Hellwig
  2010-06-26 12:54     ` J. R. Okajima
  2010-06-26 14:49     ` Ulrich Drepper
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-06-26  9:35 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds

On Sat, Jun 26, 2010 at 02:53:32PM +0900, J. R. Okajima wrote:
> 
> Nick Piggin:
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
> 
> How about the max link count?
> There was a post in last December.
> See <http://marc.info/?l=linux-kernel&m=126008640210762&w=2> and its
> thread in detail.

That's really job for a pathconf system call that allows quering random
paramters.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
                   ` (3 preceding siblings ...)
  2010-06-26  5:53 ` J. R. Okajima
@ 2010-06-26 10:13 ` Andi Kleen
  4 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2010-06-26 10:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

Nick Piggin <npiggin@suse.de> writes:

> Other than types, other differences are:
> - statvfs(2) has is f_frsize, which seems fairly useless.
> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defned.
> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.

I wonder if it would make sense to export the time stamp granuality
of the time stamps? We already have this information internally,
and it might allow user land to optimize its stat frequency or comparison.

Some file systems also have quotas with "project ids". Maybe add that 
too?

I think NTFS et.al. also have some more time stamps, but not sure
there's enough space for that.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-26  9:35   ` Christoph Hellwig
@ 2010-06-26 12:54     ` J. R. Okajima
  2010-07-05 20:58       ` Brad Boyer
  2010-06-26 14:49     ` Ulrich Drepper
  1 sibling, 1 reply; 30+ messages in thread
From: J. R. Okajima @ 2010-06-26 12:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds


Christoph Hellwig:
> That's really job for a pathconf system call that allows quering random
> paramters.

Do you mean it should be implemented such like this?
vfs_pathconf(struct dentry, int parm)
--> return d_sb->s_op->pathconf(parm)

I am afraid it is overdesign because the actual parameter(for FS) is
_PC_LINK_MAX only. All other params are already handled by VFS, glibc or
sb->statfs.


J. R. Okajima

(pathconf(3) parameters from the manual)
       _PC_LINK_MAX
              returns  the  maximum number of links to the file.  If fd or path refer to a direc-
              tory, then the value applies to the whole directory.  The  corresponding  macro  is
              _POSIX_LINK_MAX.

       _PC_MAX_CANON
              returns  the  maximum length of a formatted input line, where fd or path must refer
              to a terminal.  The corresponding macro is _POSIX_MAX_CANON.

       _PC_MAX_INPUT
              returns the maximum length of an input line, where fd or path must refer to a  ter-
              minal.  The corresponding macro is _POSIX_MAX_INPUT.

       _PC_NAME_MAX
              returns  the maximum length of a filename in the directory path or fd that the pro-
              cess is allowed to create.  The corresponding macro is _POSIX_NAME_MAX.

       _PC_PATH_MAX
              returns the maximum length of a relative pathname when path or fd  is  the  current
              working directory.  The corresponding macro is _POSIX_PATH_MAX.

       _PC_PIPE_BUF
              returns the size of the pipe buffer, where fd must refer to a pipe or FIFO and path
              must refer to a FIFO.  The corresponding macro is _POSIX_PIPE_BUF.

       _PC_CHOWN_RESTRICTED
              returns non-zero if the chown(2) call may not be used on this file.  If fd or  path
              refer to a directory, then this applies to all files in that directory.  The corre-
              sponding macro is _POSIX_CHOWN_RESTRICTED.

       _PC_NO_TRUNC
              returns non-zero if accessing filenames longer than  _POSIX_NAME_MAX  generates  an
              error.  The corresponding macro is _POSIX_NO_TRUNC.

       _PC_VDISABLE
              returns  non-zero if special character processing can be disabled, where fd or path
              must refer to a terminal.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-26  9:35   ` Christoph Hellwig
  2010-06-26 12:54     ` J. R. Okajima
@ 2010-06-26 14:49     ` Ulrich Drepper
  1 sibling, 0 replies; 30+ messages in thread
From: Ulrich Drepper @ 2010-06-26 14:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. R. Okajima, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro,
	Ulrich Drepper, Linus Torvalds

On Sat, Jun 26, 2010 at 02:35, Christoph Hellwig <hch@infradead.org> wrote:
> That's really job for a pathconf system call that allows quering random
> paramters.

Linus has always objected to sysconf/pathconf-like syscalls.  If you
get it in I'm all for it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-06-26 12:54     ` J. R. Okajima
@ 2010-07-05 20:58       ` Brad Boyer
  2010-07-05 23:31         ` J. R. Okajima
  0 siblings, 1 reply; 30+ messages in thread
From: Brad Boyer @ 2010-07-05 20:58 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Sat, Jun 26, 2010 at 09:54:44PM +0900, J. R. Okajima wrote:
> Christoph Hellwig:
> > That's really job for a pathconf system call that allows quering random
> > paramters.
> 
> Do you mean it should be implemented such like this?
> vfs_pathconf(struct dentry, int parm)
> --> return d_sb->s_op->pathconf(parm)

I would suggest making it an inode operation if we do actually add it. Most
cases are going to be per super-block, but it might be easier to transparently
handle things like _PC_PIPE_BUF in glibc if it could call an fpathconf type
system call on the pipe fd. I haven't looked at the current glibc code for
that particular selector. The only one I looked at in any detail was
_PC_LINK_MAX, which is the one you already discussed and is obviously a
per-sb option. The only drawback I can see is that making it an inode
operation would make the vfs_pathconf fail on a negative dentry, but that
seems like a very strange thing to support in any case.

> I am afraid it is overdesign because the actual parameter(for FS) is
> _PC_LINK_MAX only. All other params are already handled by VFS, glibc or
> sb->statfs.

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-07-05 20:58       ` Brad Boyer
@ 2010-07-05 23:31         ` J. R. Okajima
  2010-07-06  0:45           ` Brad Boyer
  0 siblings, 1 reply; 30+ messages in thread
From: J. R. Okajima @ 2010-07-05 23:31 UTC (permalink / raw)
  To: Brad Boyer
  Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds


Brad Boyer:
> I would suggest making it an inode operation if we do actually add it. Most
> cases are going to be per super-block, but it might be easier to transparently
> handle things like _PC_PIPE_BUF in glibc if it could call an fpathconf type
> system call on the pipe fd. I haven't looked at the current glibc code for
> that particular selector. The only one I looked at in any detail was
> _PC_LINK_MAX, which is the one you already discussed and is obviously a
> per-sb option. The only drawback I can see is that making it an inode
> operation would make the vfs_pathconf fail on a negative dentry, but that
> seems like a very strange thing to support in any case.

Recently the size of the pipe buffer becomes customizable, doesn't it?
For _PC_PIPE_BUF, fpathconf should issue fcntl(F_GETPIPE_SZ).

For negative dentry, it should be supported as long as some
standard/specification doesn't prohibit explicitly. So I still think
statfs is the best place to implement _PC_LINK_MAX.


J. R. Okajima

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-07-05 23:31         ` J. R. Okajima
@ 2010-07-06  0:45           ` Brad Boyer
  2010-07-06 16:45             ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Brad Boyer @ 2010-07-06  0:45 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel,
	Al Viro, Ulrich Drepper, Linus Torvalds

On Tue, Jul 06, 2010 at 08:31:30AM +0900, J. R. Okajima wrote:
> Recently the size of the pipe buffer becomes customizable, doesn't it?
> For _PC_PIPE_BUF, fpathconf should issue fcntl(F_GETPIPE_SZ).

That should work and is in line with my understanding of the current
code for pathconf in glibc.

> For negative dentry, it should be supported as long as some
> standard/specification doesn't prohibit explicitly. So I still think
> statfs is the best place to implement _PC_LINK_MAX.

If we're going to be changing statfs (or adding a new system call)
anyway, that does seem like a reasonable place to export this data
along with whatever else gets added. With the various things that
have been suggested, maybe we need something more like the stat
replacement that has been getting discussed with the room for some
larger optional fields and a way to request a specific set of fields.

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-07-06  0:45           ` Brad Boyer
@ 2010-07-06 16:45             ` Linus Torvalds
  2010-07-07  1:44               ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2010-07-06 16:45 UTC (permalink / raw)
  To: Brad Boyer
  Cc: J. R. Okajima, Christoph Hellwig, Nick Piggin, linux-fsdevel,
	linux-kernel, Al Viro, Ulrich Drepper

[-- Attachment #1: Type: text/plain, Size: 1106 bytes --]

On Mon, Jul 5, 2010 at 5:45 PM, Brad Boyer <flar@allandria.com> wrote:
> On Tue, Jul 06, 2010 at 08:31:30AM +0900, J. R. Okajima wrote:
>> For negative dentry, it should be supported as long as some
>> standard/specification doesn't prohibit explicitly. So I still think
>> statfs is the best place to implement _PC_LINK_MAX.
>
> If we're going to be changing statfs (or adding a new system call)
> anyway, that does seem like a reasonable place to export this data
> along with whatever else gets added. With the various things that
> have been suggested, maybe we need something more like the stat
> replacement that has been getting discussed with the room for some
> larger optional fields and a way to request a specific set of fields.

Let's not overdesign things. Just do something like the attached
patch, which is the obvious and straightforward thing to do.

Overdesigning is a disease. It's fundamentally wrong.

(Yeah, yeah,. the patch is untested, and doesn't actually _fill_ the
new f_flags value, but that's left as a trivial exercise for the
reader.)

                                Linus

[-- Attachment #2: diff --]
[-- Type: application/octet-stream, Size: 5204 bytes --]

 arch/ia64/include/asm/compat.h |    3 ++-
 arch/mips/include/asm/statfs.h |   12 ++++++++----
 arch/s390/include/asm/statfs.h |    9 ++++++---
 arch/x86/include/asm/compat.h  |    3 ++-
 fs/compat.c                    |    5 +++--
 include/asm-generic/statfs.h   |    9 ++++++---
 include/linux/statfs.h         |    3 ++-
 7 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/include/asm/compat.h b/arch/ia64/include/asm/compat.h
index f90edc8..ab15469 100644
--- a/arch/ia64/include/asm/compat.h
+++ b/arch/ia64/include/asm/compat.h
@@ -105,7 +105,8 @@ struct compat_statfs {
 	compat_fsid_t	f_fsid;
 	int		f_namelen;	/* SunOS ignores this field. */
 	int		f_frsize;
-	int		f_spare[5];
+	int		f_flags;
+	int		f_spare[4];
 };
 
 #define COMPAT_RLIM_OLD_INFINITY       0x7fffffff
diff --git a/arch/mips/include/asm/statfs.h b/arch/mips/include/asm/statfs.h
index c3ddf97..0f805c7 100644
--- a/arch/mips/include/asm/statfs.h
+++ b/arch/mips/include/asm/statfs.h
@@ -33,7 +33,8 @@ struct statfs {
 	/* Linux specials */
 	__kernel_fsid_t	f_fsid;
 	long		f_namelen;
-	long		f_spare[6];
+	long		f_flags;
+	long		f_spare[5];
 };
 
 #if (_MIPS_SIM == _MIPS_SIM_ABI32) || (_MIPS_SIM == _MIPS_SIM_NABI32)
@@ -53,7 +54,8 @@ struct statfs64 {
 	__u64	f_bavail;
 	__kernel_fsid_t f_fsid;
 	__u32	f_namelen;
-	__u32	f_spare[6];
+	__u32	f_flags;
+	__u32	f_spare[5];
 };
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
@@ -73,7 +75,8 @@ struct statfs64 {			/* Same as struct statfs */
 	/* Linux specials */
 	__kernel_fsid_t	f_fsid;
 	long		f_namelen;
-	long		f_spare[6];
+	long		f_flags;
+	long		f_spare[5];
 };
 
 struct compat_statfs64 {
@@ -88,7 +91,8 @@ struct compat_statfs64 {
 	__u64	f_bavail;
 	__kernel_fsid_t f_fsid;
 	__u32	f_namelen;
-	__u32	f_spare[6];
+	__u32	f_flags;
+	__u32	f_spare[5];
 };
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
diff --git a/arch/s390/include/asm/statfs.h b/arch/s390/include/asm/statfs.h
index 06cc703..3be7fbd 100644
--- a/arch/s390/include/asm/statfs.h
+++ b/arch/s390/include/asm/statfs.h
@@ -33,7 +33,8 @@ struct statfs {
 	__kernel_fsid_t f_fsid;
 	int  f_namelen;
 	int  f_frsize;
-	int  f_spare[5];
+	int  f_flags;
+	int  f_spare[4];
 };
 
 struct statfs64 {
@@ -47,7 +48,8 @@ struct statfs64 {
 	__kernel_fsid_t f_fsid;
 	int  f_namelen;
 	int  f_frsize;
-	int  f_spare[5];
+	int  f_flags;
+	int  f_spare[4];
 };
 
 struct compat_statfs64 {
@@ -61,7 +63,8 @@ struct compat_statfs64 {
 	__kernel_fsid_t f_fsid;
 	__u32 f_namelen;
 	__u32 f_frsize;
-	__u32 f_spare[5];
+	__u32 f_flags;
+	__u32 f_spare[4];
 };
 
 #endif /* __s390x__ */
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 306160e..9f9cdb8 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -108,7 +108,8 @@ struct compat_statfs {
 	compat_fsid_t	f_fsid;
 	int		f_namelen;	/* SunOS ignores this field. */
 	int		f_frsize;
-	int		f_spare[5];
+	int		f_flags;
+	int		f_spare[4];
 };
 
 #define COMPAT_RLIM_OLD_INFINITY	0x7fffffff
diff --git a/fs/compat.c b/fs/compat.c
index 6490d21..fe96e7d 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -245,7 +245,7 @@ static int put_compat_statfs(struct compat_statfs __user *ubuf, struct kstatfs *
 	    __put_user(kbuf->f_fsid.val[0], &ubuf->f_fsid.val[0]) ||
 	    __put_user(kbuf->f_fsid.val[1], &ubuf->f_fsid.val[1]) ||
 	    __put_user(kbuf->f_frsize, &ubuf->f_frsize) ||
-	    __put_user(0, &ubuf->f_spare[0]) || 
+	    __put_user(kbuf->f_flags, &ubuf->f_flags) || 
 	    __put_user(0, &ubuf->f_spare[1]) || 
 	    __put_user(0, &ubuf->f_spare[2]) || 
 	    __put_user(0, &ubuf->f_spare[3]) || 
@@ -318,7 +318,8 @@ static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstat
 	    __put_user(kbuf->f_namelen, &ubuf->f_namelen) ||
 	    __put_user(kbuf->f_fsid.val[0], &ubuf->f_fsid.val[0]) ||
 	    __put_user(kbuf->f_fsid.val[1], &ubuf->f_fsid.val[1]) ||
-	    __put_user(kbuf->f_frsize, &ubuf->f_frsize))
+	    __put_user(kbuf->f_frsize, &ubuf->f_frsize) ||
+	    __put_user(kbuf->f_flags, &ubuf->f_flags))
 		return -EFAULT;
 	return 0;
 }
diff --git a/include/asm-generic/statfs.h b/include/asm-generic/statfs.h
index 3b4fb3e..0fd28e0 100644
--- a/include/asm-generic/statfs.h
+++ b/include/asm-generic/statfs.h
@@ -33,7 +33,8 @@ struct statfs {
 	__kernel_fsid_t f_fsid;
 	__statfs_word f_namelen;
 	__statfs_word f_frsize;
-	__statfs_word f_spare[5];
+	__statfs_word f_flags;
+	__statfs_word f_spare[4];
 };
 
 /*
@@ -55,7 +56,8 @@ struct statfs64 {
 	__kernel_fsid_t f_fsid;
 	__statfs_word f_namelen;
 	__statfs_word f_frsize;
-	__statfs_word f_spare[5];
+	__statfs_word f_flags;
+	__statfs_word f_spare[4];
 } ARCH_PACK_STATFS64;
 
 /* 
@@ -77,7 +79,8 @@ struct compat_statfs64 {
 	__kernel_fsid_t f_fsid;
 	__u32 f_namelen;
 	__u32 f_frsize;
-	__u32 f_spare[5];
+	__u32 f_flags;
+	__u32 f_spare[4];
 } ARCH_PACK_COMPAT_STATFS64;
 
 #endif
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index b34cc82..dd8b4e7 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -16,7 +16,8 @@ struct kstatfs {
 	__kernel_fsid_t f_fsid;
 	long f_namelen;
 	long f_frsize;
-	long f_spare[5];
+	long f_flags;
+	long f_spare[4];
 };
 
 #endif

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-07-06 16:45             ` Linus Torvalds
@ 2010-07-07  1:44               ` Christoph Hellwig
  2010-07-07  2:28                 ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2010-07-07  1:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brad Boyer, J. R. Okajima, Christoph Hellwig, Nick Piggin,
	linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper

On Tue, Jul 06, 2010 at 09:45:26AM -0700, Linus Torvalds wrote:
> Let's not overdesign things. Just do something like the attached
> patch, which is the obvious and straightforward thing to do.
> 
> Overdesigning is a disease. It's fundamentally wrong.
> 
> (Yeah, yeah,. the patch is untested, and doesn't actually _fill_ the
> new f_flags value, but that's left as a trivial exercise for the
> reader.)

At least one of the readers posted a patch filling it in already.
Need to send out the version with the review comments addressed, but
I'm still waiting for Uli if he really insists on new syscall vectors
for the same structure.  Using that one ST_VALID bit seems a lot easier
to me.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc] new stat*fs-like syscall?
  2010-07-07  1:44               ` Christoph Hellwig
@ 2010-07-07  2:28                 ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2010-07-07  2:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Brad Boyer, J. R. Okajima, Nick Piggin, linux-fsdevel,
	linux-kernel, Al Viro, Ulrich Drepper

On Tue, Jul 6, 2010 at 6:44 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> I'm still waiting for Uli if he really insists on new syscall vectors
> for the same structure.  Using that one ST_VALID bit seems a lot easier
> to me.

Umm. Uli doesn't get to choose kernel system call conventions. It
matters not one whit whether he insists on new system calls or not,
it's not going to happen.

             Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2010-07-07  2:29 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin
2010-06-24 14:03 ` Miklos Szeredi
2010-06-24 14:36   ` Nick Piggin
2010-06-24 14:08 ` Andy Lutomirski
2010-06-24 14:18   ` Miklos Szeredi
2010-06-24 14:37     ` Andrew Lutomirski
2010-06-24 14:48       ` Miklos Szeredi
2010-06-25  3:50         ` Nick Piggin
2010-06-24 23:06   ` Andreas Dilger
2010-06-25  6:37     ` Christoph Hellwig
2010-06-24 23:13 ` Andreas Dilger
2010-06-25  4:01   ` Nick Piggin
2010-06-25  4:33     ` Jeff Garzik
2010-06-25 17:47     ` Andreas Dilger
2010-06-25 17:52       ` Ulrich Drepper
2010-06-25 18:16         ` Christoph Hellwig
2010-06-25 18:45           ` Christoph Hellwig
2010-06-25 19:40             ` Ulrich Drepper
2010-06-25 19:40               ` Ulrich Drepper
2010-06-26  5:53 ` J. R. Okajima
2010-06-26  9:35   ` Christoph Hellwig
2010-06-26 12:54     ` J. R. Okajima
2010-07-05 20:58       ` Brad Boyer
2010-07-05 23:31         ` J. R. Okajima
2010-07-06  0:45           ` Brad Boyer
2010-07-06 16:45             ` Linus Torvalds
2010-07-07  1:44               ` Christoph Hellwig
2010-07-07  2:28                 ` Linus Torvalds
2010-06-26 14:49     ` Ulrich Drepper
2010-06-26 10:13 ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.