* [rfc] new stat*fs-like syscall?
From: Nick Piggin @ 2010-06-24 13:14 UTC
To: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

This has come up a few times in the past, and I'd like to try to get
an agreement on it. statvfs(2) importantly contains f_flag (mount
flags), and its use is encouraged over statfs(2). The kernel
provides a statfs syscall only.

This means glibc has to provide f_flag support by parsing /proc/mounts
and stat(2)ing mount points. This is really slow, and /proc/mounts is
hard for the kernel to provide. It's actually the last scalability
bottleneck in the core vfs for dbench (samba) after my patches.

Not only that, but it's racy.

Other than types, other differences are:
- statvfs(2) has f_frsize, which seems fairly useless.
- statvfs(2) has f_favail.
- statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
  block size. The latter could be useful for disk space algorithms.
  Both can be ill defined.
- statvfs(2) lacks f_type.

Is there anything more we should add here? Samba wants a capabilities
field, with things like sparse files, quotas, compression, encryption,
case preserving/sensitive.

Any thoughts?

Thanks,
Nick
* Re: [rfc] new stat*fs-like syscall?
From: Miklos Szeredi @ 2010-06-24 14:03 UTC
To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and its use is encouraged over statfs(2). The kernel
> provides a statfs syscall only.
>
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
>
> Not only that, but it's racy.
>
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.

statfs(2) also has f_frsize since 2.6.0, only it hasn't been
documented (should be fixed now).

> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defined.

They are the same, only the documentation is different.

> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
>
> Any thoughts?

"struct statfs" and "struct statfs64" have spare fields. We could put
the f_flag in there including a magic "this is a valid f_flag" flag,
that distinguishes from the default zero value.

Thanks,
Miklos
* Re: [rfc] new stat*fs-like syscall?
From: Nick Piggin @ 2010-06-24 14:36 UTC
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 04:03:05PM +0200, Miklos Szeredi wrote:
> On Thu, 24 Jun 2010, Nick Piggin wrote:
> > This has come up a few times in the past, and I'd like to try to get
> > an agreement on it. statvfs(2) importantly contains f_flag (mount
> > flags), and its use is encouraged over statfs(2). The kernel
> > provides a statfs syscall only.
> >
> > This means glibc has to provide f_flag support by parsing /proc/mounts
> > and stat(2)ing mount points. This is really slow, and /proc/mounts is
> > hard for the kernel to provide. It's actually the last scalability
> > bottleneck in the core vfs for dbench (samba) after my patches.
> >
> > Not only that, but it's racy.
> >
> > Other than types, other differences are:
> > - statvfs(2) has f_frsize, which seems fairly useless.
>
> statfs(2) also has f_frsize since 2.6.0, only it hasn't been
> documented (should be fixed now).
>
> > - statvfs(2) has f_favail.
> > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> >   block size. The latter could be useful for disk space algorithms.
> >   Both can be ill defined.
>
> They are the same, only the documentation is different.
>
> > - statvfs(2) lacks f_type.
> >
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
> >
> > Any thoughts?
>
> "struct statfs" and "struct statfs64" have spare fields. We could put
> the f_flag in there including a magic "this is a valid f_flag" flag,
> that distinguishes from the default zero value.

Ah so it does. We have 5 words spare.

So we should have a version number rather than just do a per-word hack
each time. We could probably pack the version number into a few bits of
f_flag though.
* Re: [rfc] new stat*fs-like syscall?
From: Andy Lutomirski @ 2010-06-24 14:08 UTC
To: Nick Piggin
Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and its use is encouraged over statfs(2). The kernel
> provides a statfs syscall only.
>
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
>
> Not only that, but it's racy.
>
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.
> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defined.
> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
>
> Any thoughts?

Something like fsid but actually specified to uniquely identify a
superblock. (Currently, fsid seems to be set by the filesystem, and
nothing in particular ensures that two different filesystems couldn't
have collisions.) We could guarantee (or have a flag guaranteeing) that
(fsid, st_inode) actually uniquely identifies an inode.

Similarly, something like fsid that uniquely identifies the vfsmount
could be useful, although I don't know how easy that would be to provide
for fstat?fs.

If we could expose the complete set of filesystem mount options so that
mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
playing with chroots would be that much easier.

Should we expose superblock and vfsmount options separately? We have
read-only bind mounts now, but the way they work is rather inscrutable,
and if stat?fs could say "superblock is read-write but vfsmount is
readonly" then people might be able to make more sense of what's going on.

--Andy
* Re: [rfc] new stat*fs-like syscall?
From: Miklos Szeredi @ 2010-06-24 14:18 UTC
To: Andy Lutomirski
Cc: npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Andy Lutomirski wrote:
> Something like fsid but actually specified to uniquely identify a
> superblock. (Currently, fsid seems to be set by the filesystem, and
> nothing in particular ensures that two different filesystems couldn't
> have collisions.) We could guarantee (or have a flag guaranteeing) that
> (fsid, st_inode) actually uniquely identifies an inode.
>
> Similarly, something like fsid that uniquely identifies the vfsmount
> could be useful, although I don't know how easy that would be to provide
> for fstat?fs.
>
> If we could expose the complete set of filesystem mount options so that
> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
> playing with chroots would be that much easier.
>
> Should we expose superblock and vfsmount options separately? We have
> read-only bind mounts now, but the way they work is rather inscrutable,
> and if stat?fs could say "superblock is read-write but vfsmount is
> readonly" then people might be able to make more sense of what's going on.

You'll find all of those things in /proc/self/mountinfo.

Thanks,
Miklos
* Re: [rfc] new stat*fs-like syscall?
From: Andrew Lutomirski @ 2010-06-24 14:37 UTC
To: Miklos Szeredi
Cc: npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 10:18 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, 24 Jun 2010, Andy Lutomirski wrote:
>> Something like fsid but actually specified to uniquely identify a
>> superblock. (Currently, fsid seems to be set by the filesystem, and
>> nothing in particular ensures that two different filesystems couldn't
>> have collisions.) We could guarantee (or have a flag guaranteeing) that
>> (fsid, st_inode) actually uniquely identifies an inode.
>>
>> Similarly, something like fsid that uniquely identifies the vfsmount
>> could be useful, although I don't know how easy that would be to provide
>> for fstat?fs.
>>
>> If we could expose the complete set of filesystem mount options so that
>> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
>> playing with chroots would be that much easier.
>>
>> Should we expose superblock and vfsmount options separately? We have
>> read-only bind mounts now, but the way they work is rather inscrutable,
>> and if stat?fs could say "superblock is read-write but vfsmount is
>> readonly" then people might be able to make more sense of what's going on.
>
> You'll find all of those things in /proc/self/mountinfo.

Wasn't the point that /proc/self/mounts (and presumably
/proc/self/mountinfo) isn't scalable and we wanted a syscall to query
it efficiently (and racelessly)?

--Andy
* Re: [rfc] new stat*fs-like syscall?
From: Miklos Szeredi @ 2010-06-24 14:48 UTC
To: Andrew Lutomirski
Cc: miklos, npiggin, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, 24 Jun 2010, Andrew Lutomirski wrote:
> Wasn't the point that /proc/self/mounts (and presumably
> /proc/self/mountinfo) isn't scalable and we wanted a syscall to query
> it efficiently (and racelessly)?

The question was how to support statvfs() efficiently, and the only
thing missing there is f_flags which can easily be added to the
existing statfs() syscall.

A separate mount_info() syscall might possibly be useful, but that's
another story.

Thanks,
Miklos
* Re: [rfc] new stat*fs-like syscall?
From: Nick Piggin @ 2010-06-25 3:50 UTC
To: Miklos Szeredi
Cc: Andrew Lutomirski, linux-fsdevel, linux-kernel, viro, drepper, torvalds

On Thu, Jun 24, 2010 at 04:48:20PM +0200, Miklos Szeredi wrote:
> On Thu, 24 Jun 2010, Andrew Lutomirski wrote:
> > Wasn't the point that /proc/self/mounts (and presumably
> > /proc/self/mountinfo) isn't scalable and we wanted a syscall to query
> > it efficiently (and racelessly)?
>
> The question was how to support statvfs() efficiently, and the only
> thing missing there is f_flags which can easily be added to the
> existing statfs() syscall.
>
> A separate mount_info() syscall might possibly be useful, but that's
> another story.

Native statvfs() support is my motivation, but I am thinking that if
we are going to introduce a new syscall (or version rev the statfs
syscall somehow), then we should think hard about what else we can do.

More superblock info should be possible. More detailed info, like
related mounts, will be costlier, so that may be better off as a
different syscall.
* Re: [rfc] new stat*fs-like syscall?
From: Andreas Dilger @ 2010-06-24 23:06 UTC
To: Andy Lutomirski
Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 08:08, Andy Lutomirski wrote:
> Something like fsid but actually specified to uniquely identify a
> superblock. (Currently, fsid seems to be set by the filesystem, and
> nothing in particular ensures that two different filesystems couldn't
> have collisions.) We could guarantee (or have a flag guaranteeing) that
> (fsid, st_inode) actually uniquely identifies an inode.

I think the right solution for this issue is to (gradually) start
enforcing the "uniqueness" of the UUID in the filesystem superblock.
That is what it is supposed to be for. Using (fsid, st_inode) doesn't
necessarily help anything, if "fsid" isn't unique, and the same
"st_inode" number is used on two different mountpoints.

To start, tracking the UUID at mount time and printing a non-fatal error
at mount time if the mounted UUID is not unique would help, as would
having e.g. fsck track the UUIDs of the underlying filesystems and
printing a non-fatal error if it hits a duplicate UUID.

At some point in the future, the kernel can be changed to refuse to
mount a filesystem with a duplicate UUID. I believe mount.xfs already
does this.

Cheers, Andreas
* Re: [rfc] new stat*fs-like syscall?
From: Christoph Hellwig @ 2010-06-25 6:37 UTC
To: Andreas Dilger
Cc: Andy Lutomirski, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Thu, Jun 24, 2010 at 05:06:45PM -0600, Andreas Dilger wrote:
> I think the right solution for this issue is to (gradually) start
> enforcing the "uniqueness" of the UUID in the filesystem superblock.
> That is what it is supposed to be for. Using (fsid, st_inode) doesn't
> necessarily help anything, if "fsid" isn't unique, and the same
> "st_inode" number is used on two different mountpoints.
>
> To start, tracking the UUID at mount time and printing a non-fatal error
> at mount time if the mounted UUID is not unique would help, as would
> having e.g. fsck track the UUIDs of the underlying filesystems and
> printing a non-fatal error if it hits a duplicate UUID.
>
> At some point in the future, the kernel can be changed to refuse to
> mount a filesystem with a duplicate UUID. I believe mount.xfs already
> does this.

Tracking and exposing the uuid, to be exact. Having the full uuid in
a statfs/statvfs-like system call is one first step.

And yes, XFS does check the uuid during mount. But it's actually in
kernelspace, not in a mount helper, which XFS doesn't have. Take a look
at xfs_uuid_mount().
* Re: [rfc] new stat*fs-like syscall?
From: Andreas Dilger @ 2010-06-24 23:13 UTC
To: Nick Piggin
Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 07:14, Nick Piggin wrote:
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide.

Not only that, but if a mountpoint is broken (e.g. remote NFS server)
then the glibc stat of all the mountpoints can hang the statvfs() call
even if there is no interest in that particular filesystem.

> It's actually the last scalability bottleneck in the core vfs for
> dbench (samba) after my patches.
>
> Not only that, but it's racy.
>
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.

Actually, we were just lamenting the fact that f_frsize is currently
broken, because Lustre wants to export the IO size as 1MB for good RPC
performance, but the underlying blocksize is 4kB (ext3 blocksize).
Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB
even if the underlying filesystem blocksize is smaller.

> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill defined.

According to POSIX, "f_bsize" is the blocksize, but unfortunately this
was botched in the earlier Linux implementations so currently they are
both set to the same value, and using anything other than that breaks
userspace programs that get them mixed up.

> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.

It wouldn't be a bad idea, but then you could get into issues of what
exactly the above flags mean. That said, I think it is better to have
broad categories of features that may be slightly ill-defined than
having nothing at all.

Cheers, Andreas
* Re: [rfc] new stat*fs-like syscall?
From: Nick Piggin @ 2010-06-25 4:01 UTC
To: Andreas Dilger
Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
> On 2010-06-24, at 07:14, Nick Piggin wrote:
> > This means glibc has to provide f_flag support by parsing /proc/mounts
> > and stat(2)ing mount points. This is really slow, and /proc/mounts is
> > hard for the kernel to provide.
>
> Not only that, but if a mountpoint is broken (e.g. remote NFS server)
> then the glibc stat of all the mountpoints can hang the statvfs() call
> even if there is no interest in that particular filesystem.

Good point.

> > It's actually the last scalability bottleneck in the core vfs for
> > dbench (samba) after my patches.
> >
> > Not only that, but it's racy.
> >
> > Other than types, other differences are:
> > - statvfs(2) has f_frsize, which seems fairly useless.
>
> Actually, we were just lamenting the fact that f_frsize is currently
> broken, because Lustre wants to export the IO size as 1MB for good RPC
> performance, but the underlying blocksize is 4kB (ext3 blocksize).
> Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB
> even if the underlying filesystem blocksize is smaller.
>
> > - statvfs(2) has f_favail.
> > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> >   block size. The latter could be useful for disk space algorithms.
> >   Both can be ill defined.
>
> According to POSIX, "f_bsize" is the blocksize, but unfortunately this
> was botched in the earlier Linux implementations so currently they are
> both set to the same value, and using anything other than that breaks
> userspace programs that get them mixed up.

So is "frsize" supposed to be the optimal block size, or what?
f_bsize AFAIKS should be filesystem allocation block size because
apparently some programs require it to calculate size of file on
disk.

If we can't change existing suboptimal legacy things, then let's
introduce new APIs that do the right thing. Apps that care will
eventually start using e.g. a new syscall.

> > - statvfs(2) lacks f_type.
> >
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
>
> It wouldn't be a bad idea, but then you could get into issues of what
> exactly the above flags mean. That said, I think it is better to have
> broad categories of features that may be slightly ill-defined than
> having nothing at all.

Yes it would be tricky. I don't want to add features that will just
be useless or go unused, but I don't want to change the syscall API
just to add f_flags, without looking at other possibilities.

Thanks,
Nick
* Re: [rfc] new stat*fs-like syscall?
From: Jeff Garzik @ 2010-06-25 4:33 UTC
To: Nick Piggin
Cc: Andreas Dilger, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 06/25/2010 12:01 AM, Nick Piggin wrote:
> So is "frsize" supposed to be the optimal block size, or what?
> f_bsize AFAIKS should be filesystem allocation block size because
> apparently some programs require it to calculate size of file on
> disk.
>
> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.
>
>>
>>> - statvfs(2) lacks f_type.
>>>
>>> Is there anything more we should add here? Samba wants a capabilities
>>> field, with things like sparse files, quotas, compression, encryption,
>>> case preserving/sensitive.
>>
>> It wouldn't be a bad idea, but then you could get into issues of what
>> exactly the above flags mean. That said, I think it is better to have
>> broad categories of features that may be slightly ill-defined than
>> having nothing at all.
>
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.

It would be nice to separate capabilities and fixed parameters (block
size) from statistics which change frequently (free space).

And are capabilities really suited to a C struct, at all? That seems
more suited to a key/value type interface, a la NFSv4 attributes.

Jeff
* Re: [rfc] new stat*fs-like syscall?
From: Andreas Dilger @ 2010-06-25 17:47 UTC
To: Nick Piggin
Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On 2010-06-24, at 22:01, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
>>> Other than types, other differences are:
>>> - statvfs(2) has f_frsize, which seems fairly useless.
>>
>> Actually, we were just lamenting the fact that f_frsize is currently
>> broken, because Lustre wants to export the IO size as 1MB for good RPC
>> performance, but the underlying blocksize is 4kB (ext3 blocksize).
>> Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB
>> even if the underlying filesystem blocksize is smaller.
>>
>>
>>> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>>>   block size. The latter could be useful for disk space algorithms.
>>>   Both can be ill defined.
>>
>> According to POSIX, "f_bsize" is the blocksize, but unfortunately this was

Doh, typo. "f_frsize" is the "blocksize" (i.e. the units of f_blocks),
and "f_bsize" is the "optimal IO size". The SUSv2 includes the
following field definitions (not showing all of them):

> unsigned long f_bsize    file system block size
> unsigned long f_frsize   fundamental filesystem block size
> fsblkcnt_t    f_blocks   total number of blocks on file system
>                          in units of f_frsize

>> botched in the earlier Linux implementations so currently they are
>> both set to the same value, and using anything other than that breaks
>> userspace programs that get them mixed up.
>
> So is "frsize" supposed to be the optimal block size, or what?

No, "frsize" is the minimum allocation unit - it is "fragment size".

> f_bsize AFAIKS should be filesystem allocation block size because
> apparently some programs require it to calculate size of file on
> disk.

Using statvfs()/struct statvfs clearly documents that f_blocks is in
units of f_frsize, but since this is a relatively new API on Linux, and
statfs() used f_bsize for years to mean the same thing, some
applications are broken.

> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.

I'd rather NOT start a proliferation of redundant syscalls, since there
is no expectation that they will be used correctly either, and it just
makes applications less portable. I think it is less effort to fix the
few current applications using sys_statvfs() incorrectly to use
f_frsize than to use some new linux-only syscall.

>> It wouldn't be a bad idea, but then you could get into issues of what
>> exactly the above flags mean. That said, I think it is better to have
>> broad categories of features that may be slightly ill-defined than
>> having nothing at all.
>
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.

SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also
what is documented in the Linux/BSD/OSX statvfs(3) man page. According
to the Solaris statvfs(3) man page I found it additionally defines:

ST_NOTRUNC 0x04 /* does not truncate file names longer than NAME_MAX */

Cheers, Andreas
* Re: [rfc] new stat*fs-like syscall?
From: Ulrich Drepper @ 2010-06-25 17:52 UTC
To: Andreas Dilger
Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 10:47, Andreas Dilger <adilger@dilger.ca> wrote:
> SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also
> what is documented in the Linux/BSD/OSX statvfs(3) man page. According
> to the Solaris statvfs(3) man page I found it additionally defines:
>
> ST_NOTRUNC 0x04 /* does not truncate file names longer than
> NAME_MAX */

glibc supports many more flags. SuS of course has to restrict itself,
there are not that many flags which are portable and available on all
the platforms. Look at /usr/include/bits/statvfs.h for what has to be
supported and the values to use. If the values the kernel will use
differ I'd have to (unnecessarily) convert the values. If some values
are missing/not supported I still would have to use /proc/mounts and
nothing is gained.
* Re: [rfc] new stat*fs-like syscall?
From: Christoph Hellwig @ 2010-06-25 18:16 UTC
To: Ulrich Drepper
Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 10:52:05AM -0700, Ulrich Drepper wrote:
> there are not that many flags which are portable and available on all
> the platforms. Look at /usr/include/bits/statvfs.h for what has to be
> supported and the values to use. If the values the kernel will use
> differ I'd have to (unnecessarily) convert the values. If some values
> are missing/not supported I still would have to use /proc/mounts and
> nothing is gained.

I don't quite get what ST_WRITE is supposed to mean. All but that one
can be supported trivially.
* Re: [rfc] new stat*fs-like syscall?
From: Christoph Hellwig @ 2010-06-25 18:45 UTC
To: Ulrich Drepper
Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 02:16:38PM -0400, Christoph Hellwig wrote:
> On Fri, Jun 25, 2010 at 10:52:05AM -0700, Ulrich Drepper wrote:
> > there are not that many flags which are portable and available on all
> > the platforms. Look at /usr/include/bits/statvfs.h for what has to be
> > supported and the values to use. If the values the kernel will use
> > differ I'd have to (unnecessarily) convert the values. If some values
> > are missing/not supported I still would have to use /proc/mounts and
> > nothing is gained.
>
> I don't quite get what ST_WRITE is supposed to mean. All but that one
> can be supported trivially.

In addition ST_APPEND and ST_IMMUTABLE are rather puzzling. Do you
really want these to mean that the file we call statfs on has the
immutable/append-only bits set? That is mixing two bits of stat
information into statfs.
* Re: [rfc] new stat*fs-like syscall?
From: Ulrich Drepper @ 2010-06-25 19:40 UTC
To: Christoph Hellwig
Cc: Andreas Dilger, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds

On Fri, Jun 25, 2010 at 11:45, Christoph Hellwig <hch@infradead.org> wrote:
> I don't quite get what ST_WRITE is supposed to mean. All but that one
> can be supported trivially.

ST_WRITE comes from elsewhere. We don't use it on Linux.

> In addition ST_APPEND and ST_IMMUTABLE are rather puzzling. Do you
> really want these to mean if the file we call statfs on have the
> immutable/append only bits set? That is mixing two bits of stat
> information into statfs?

Ignore these as well, they also have a different source.
* Re: [rfc] new stat*fs-like syscall? 2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin ` (2 preceding siblings ...) 2010-06-24 23:13 ` Andreas Dilger @ 2010-06-26 5:53 ` J. R. Okajima 2010-06-26 9:35 ` Christoph Hellwig 2010-06-26 10:13 ` Andi Kleen 4 siblings, 1 reply; 29+ messages in thread From: J. R. Okajima @ 2010-06-26 5:53 UTC (permalink / raw) To: Nick Piggin Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds Nick Piggin: > Is there anything more we should add here? Samba wants a capabilities > field, with things like sparse files, quotas, compression, encryption, > case preserving/sensitive. How about the max link count? There was a post in last December. See <http://marc.info/?l=linux-kernel&m=126008640210762&w=2> and its thread in detail. J. R. Okajima ---------------------------------------------------------------------- The pathconf(_PC_LINK_MAX) cannot get the correct value, since linux kernel doesn't provide such interface. And the current implementation in GLibc issues statfs(2) first and then returns the predefined value (EXT2_LINK_MAX, etc) based upoin the filesystem type. But GLibc doesn't support all filesystem types. ie. when the target filesystem is unknown to pathconf(3), it will return LINUX_LINK_MAX (127). For GLibc, there is no way except implementing this poor method. This patch makes statfs(2) return the correct value via struct statfs.f_spare[0]. RFC: - Can we use f_spare for this purpose? - Does pathconf(_PC_LINK_MAX) distinguish a dir and a non-dir? If a filesystem sets different limit for a dir as a link count from a non-dir, then should the filesystem checks the type of the specified dentry->d_inode->i_mode and return the different value? This patch series doesn't distinguish them and return a single value. - Here I tried supporting only ext[23], nfs and tmpfs. Since I can test them by myself. 
I left other FSs as they are, which means that if an FS doesn't support _PC_LINK_MAX by modifying its s_op->statfs(), the default value will be returned. The default value is taken from glibc to keep compatibility, but that may not be important. - Some FSs, such as MS-DOS based ones, which don't support hard links will return LINK_MAX_UNSUPPORTED, which is defined as 1. - Other FSs, such as tmpfs, which don't check the link count in link(2) will return LINK_MAX_UNLIMITED, which is defined as -1. This value doesn't mean an error. A negative return value from pathconf(3) is valid. Even if the Linux kernel returns a correct value via statfs(2) (or anything else), users will not get the value at once, since support in libc is necessary too. J. R. Okajima (5): vfs, support pathconf(3) with _PC_LINK_MAX ext2, support pathconf(3) with _PC_LINK_MAX ext3, support pathconf(3) with _PC_LINK_MAX nfs, support pathconf(3) with _PC_LINK_MAX tmpfs, support pathconf(3) with _PC_LINK_MAX fs/compat.c | 5 +++-- fs/ext2/super.c | 1 + fs/ext3/super.c | 1 + fs/libfs.c | 1 + fs/nfs/client.c | 10 +++++++--- fs/nfs/super.c | 1 + fs/open.c | 9 +++++++-- include/linux/nfs_fs_sb.h | 1 + include/linux/statfs.h | 6 ++++++ mm/shmem.c | 1 + 10 files changed, 29 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-06-26 5:53 ` J. R. Okajima @ 2010-06-26 9:35 ` Christoph Hellwig 2010-06-26 12:54 ` J. R. Okajima 2010-06-26 14:49 ` Ulrich Drepper 0 siblings, 2 replies; 29+ messages in thread From: Christoph Hellwig @ 2010-06-26 9:35 UTC (permalink / raw) To: J. R. Okajima Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds On Sat, Jun 26, 2010 at 02:53:32PM +0900, J. R. Okajima wrote: > > Nick Piggin: > > Is there anything more we should add here? Samba wants a capabilities > > field, with things like sparse files, quotas, compression, encryption, > > case preserving/sensitive. > > How about the max link count? > There was a post last December. > See <http://marc.info/?l=linux-kernel&m=126008640210762&w=2> and its > thread for details. That's really a job for a pathconf system call that allows querying random parameters. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-06-26 9:35 ` Christoph Hellwig @ 2010-06-26 12:54 ` J. R. Okajima 2010-07-05 20:58 ` Brad Boyer 2010-06-26 14:49 ` Ulrich Drepper 1 sibling, 1 reply; 29+ messages in thread From: J. R. Okajima @ 2010-06-26 12:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds Christoph Hellwig: > That's really a job for a pathconf system call that allows querying random > parameters. Do you mean it should be implemented like this? vfs_pathconf(struct dentry, int parm) --> return d_sb->s_op->pathconf(parm) I am afraid it is overdesign because the actual parameter (for FS) is _PC_LINK_MAX only. All other params are already handled by VFS, glibc or sb->statfs. J. R. Okajima (pathconf(3) parameters from the manual) _PC_LINK_MAX returns the maximum number of links to the file. If fd or path refer to a directory, then the value applies to the whole directory. The corresponding macro is _POSIX_LINK_MAX. _PC_MAX_CANON returns the maximum length of a formatted input line, where fd or path must refer to a terminal. The corresponding macro is _POSIX_MAX_CANON. _PC_MAX_INPUT returns the maximum length of an input line, where fd or path must refer to a terminal. The corresponding macro is _POSIX_MAX_INPUT. _PC_NAME_MAX returns the maximum length of a filename in the directory path or fd that the process is allowed to create. The corresponding macro is _POSIX_NAME_MAX. _PC_PATH_MAX returns the maximum length of a relative pathname when path or fd is the current working directory. The corresponding macro is _POSIX_PATH_MAX. _PC_PIPE_BUF returns the size of the pipe buffer, where fd must refer to a pipe or FIFO and path must refer to a FIFO. The corresponding macro is _POSIX_PIPE_BUF. _PC_CHOWN_RESTRICTED returns non-zero if the chown(2) call may not be used on this file. If fd or path refer to a directory, then this applies to all files in that directory. The corresponding macro is _POSIX_CHOWN_RESTRICTED. _PC_NO_TRUNC returns non-zero if accessing filenames longer than _POSIX_NAME_MAX generates an error. The corresponding macro is _POSIX_NO_TRUNC. _PC_VDISABLE returns non-zero if special character processing can be disabled, where fd or path must refer to a terminal. ^ permalink raw reply [flat|nested] 29+ messages in thread
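For reference, this is roughly how user space asks for the limit being discussed. It is only a sketch: query_link_max() is a hypothetical helper name, not something from the thread or from glibc. It uses the standard pathconf(3) call and the POSIX convention that a return of -1 with errno left unchanged means "no limit", which is the case the cover letter above refers to when it says a negative return value is valid.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): query _PC_LINK_MAX for a path.
 *
 * Returns  0 and stores the limit in *limit on success,
 *          1 if pathconf(3) reports that there is no limit
 *            (-1 returned with errno left at 0),
 *         -1 on a real error (errno is set by pathconf).
 */
static int query_link_max(const char *path, long *limit)
{
    long v;

    assert(path != NULL && limit != NULL);

    errno = 0;                  /* needed to tell "no limit" from an error */
    v = pathconf(path, _PC_LINK_MAX);
    if (v == -1)
        return errno == 0 ? 1 : -1;

    *limit = v;
    return 0;
}
```

A caller would pass any path on the filesystem of interest, e.g. query_link_max("/", &limit). Whether the number that comes back is the real on-disk limit depends on the libc, which is exactly the problem raised in this thread.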
* Re: [rfc] new stat*fs-like syscall? 2010-06-26 12:54 ` J. R. Okajima @ 2010-07-05 20:58 ` Brad Boyer 2010-07-05 23:31 ` J. R. Okajima 0 siblings, 1 reply; 29+ messages in thread From: Brad Boyer @ 2010-07-05 20:58 UTC (permalink / raw) To: J. R. Okajima Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds On Sat, Jun 26, 2010 at 09:54:44PM +0900, J. R. Okajima wrote: > Christoph Hellwig: > > That's really a job for a pathconf system call that allows querying random > > parameters. > > Do you mean it should be implemented like this? > vfs_pathconf(struct dentry, int parm) > --> return d_sb->s_op->pathconf(parm) I would suggest making it an inode operation if we do actually add it. Most cases are going to be per super-block, but it might be easier to transparently handle things like _PC_PIPE_BUF in glibc if it could call an fpathconf-type system call on the pipe fd. I haven't looked at the current glibc code for that particular selector. The only one I looked at in any detail was _PC_LINK_MAX, which is the one you already discussed and is obviously a per-sb option. The only drawback I can see is that making it an inode operation would make vfs_pathconf fail on a negative dentry, but that seems like a very strange thing to support in any case. > I am afraid it is overdesign because the actual parameter (for FS) is > _PC_LINK_MAX only. All other params are already handled by VFS, glibc or > sb->statfs. Brad Boyer flar@allandria.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-07-05 20:58 ` Brad Boyer @ 2010-07-05 23:31 ` J. R. Okajima 2010-07-06 0:45 ` Brad Boyer 0 siblings, 1 reply; 29+ messages in thread From: J. R. Okajima @ 2010-07-05 23:31 UTC (permalink / raw) To: Brad Boyer Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds Brad Boyer: > I would suggest making it an inode operation if we do actually add it. Most > cases are going to be per super-block, but it might be easier to transparently > handle things like _PC_PIPE_BUF in glibc if it could call an fpathconf-type > system call on the pipe fd. I haven't looked at the current glibc code for > that particular selector. The only one I looked at in any detail was > _PC_LINK_MAX, which is the one you already discussed and is obviously a > per-sb option. The only drawback I can see is that making it an inode > operation would make vfs_pathconf fail on a negative dentry, but that > seems like a very strange thing to support in any case. Recently the size of the pipe buffer has become customizable, hasn't it? For _PC_PIPE_BUF, fpathconf should issue fcntl(F_GETPIPE_SZ). For negative dentry, it should be supported as long as some standard/specification doesn't prohibit explicitly. So I still think statfs is the best place to implement _PC_LINK_MAX. J. R. Okajima ^ permalink raw reply [flat|nested] 29+ messages in thread
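The fcntl(F_GETPIPE_SZ) call mentioned above can be sketched as follows. pipe_capacity() is a hypothetical helper name used only for illustration, and the numeric fallback for F_GETPIPE_SZ is an assumption taken from the Linux UAPI headers for builds where libc does not expose the constant. Note that this call reports the capacity of the pipe buffer, which is a different quantity from the atomic-write limit PIPE_BUF that POSIX specifies for _PC_PIPE_BUF.

```c
#define _GNU_SOURCE             /* exposes F_GETPIPE_SZ in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: fall back to the Linux UAPI value (F_LINUX_SPECIFIC_BASE + 8)
 * if the libc headers in use do not define the constant. */
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper (illustration only): return the capacity in bytes of
 * the pipe referred to by fd, or -1 on error (errno set by fcntl).
 */
static long pipe_capacity(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Either end of a pipe created with pipe(2) can be passed as fd; on a descriptor that is not a pipe the call fails and -1 is returned.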
* Re: [rfc] new stat*fs-like syscall? 2010-07-05 23:31 ` J. R. Okajima @ 2010-07-06 0:45 ` Brad Boyer 2010-07-06 16:45 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Brad Boyer @ 2010-07-06 0:45 UTC (permalink / raw) To: J. R. Okajima Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds On Tue, Jul 06, 2010 at 08:31:30AM +0900, J. R. Okajima wrote: > Recently the size of the pipe buffer has become customizable, hasn't it? > For _PC_PIPE_BUF, fpathconf should issue fcntl(F_GETPIPE_SZ). That should work and is in line with my understanding of the current code for pathconf in glibc. > For negative dentry, it should be supported as long as some > standard/specification doesn't prohibit explicitly. So I still think > statfs is the best place to implement _PC_LINK_MAX. If we're going to be changing statfs (or adding a new system call) anyway, that does seem like a reasonable place to export this data along with whatever else gets added. With the various things that have been suggested, maybe we need something more like the stat replacement that has been getting discussed, with room for some larger optional fields and a way to request a specific set of fields. Brad Boyer flar@allandria.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-07-06 0:45 ` Brad Boyer @ 2010-07-06 16:45 ` Linus Torvalds 2010-07-07 1:44 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2010-07-06 16:45 UTC (permalink / raw) To: Brad Boyer Cc: J. R. Okajima, Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper [-- Attachment #1: Type: text/plain, Size: 1106 bytes --] On Mon, Jul 5, 2010 at 5:45 PM, Brad Boyer <flar@allandria.com> wrote: > On Tue, Jul 06, 2010 at 08:31:30AM +0900, J. R. Okajima wrote: >> For negative dentry, it should be supported as long as some >> standard/specification doesn't prohibit explicitly. So I still think >> statfs is the best place to implement _PC_LINK_MAX. > > If we're going to be changing statfs (or adding a new system call) > anyway, that does seem like a reasonable place to export this data > along with whatever else gets added. With the various things that > have been suggested, maybe we need something more like the stat > replacement that has been getting discussed, with room for some > larger optional fields and a way to request a specific set of fields. Let's not overdesign things. Just do something like the attached patch, which is the obvious and straightforward thing to do. Overdesigning is a disease. It's fundamentally wrong. (Yeah, yeah, the patch is untested, and doesn't actually _fill_ the new f_flags value, but that's left as a trivial exercise for the reader.) 
Linus [-- Attachment #2: diff --] [-- Type: application/octet-stream, Size: 5204 bytes --] arch/ia64/include/asm/compat.h | 3 ++- arch/mips/include/asm/statfs.h | 12 ++++++++---- arch/s390/include/asm/statfs.h | 9 ++++++--- arch/x86/include/asm/compat.h | 3 ++- fs/compat.c | 5 +++-- include/asm-generic/statfs.h | 9 ++++++--- include/linux/statfs.h | 3 ++- 7 files changed, 29 insertions(+), 15 deletions(-) diff --git a/arch/ia64/include/asm/compat.h b/arch/ia64/include/asm/compat.h index f90edc8..ab15469 100644 --- a/arch/ia64/include/asm/compat.h +++ b/arch/ia64/include/asm/compat.h @@ -105,7 +105,8 @@ struct compat_statfs { compat_fsid_t f_fsid; int f_namelen; /* SunOS ignores this field. */ int f_frsize; - int f_spare[5]; + int f_flags; + int f_spare[4]; }; #define COMPAT_RLIM_OLD_INFINITY 0x7fffffff diff --git a/arch/mips/include/asm/statfs.h b/arch/mips/include/asm/statfs.h index c3ddf97..0f805c7 100644 --- a/arch/mips/include/asm/statfs.h +++ b/arch/mips/include/asm/statfs.h @@ -33,7 +33,8 @@ struct statfs { /* Linux specials */ __kernel_fsid_t f_fsid; long f_namelen; - long f_spare[6]; + long f_flags; + long f_spare[5]; }; #if (_MIPS_SIM == _MIPS_SIM_ABI32) || (_MIPS_SIM == _MIPS_SIM_NABI32) @@ -53,7 +54,8 @@ struct statfs64 { __u64 f_bavail; __kernel_fsid_t f_fsid; __u32 f_namelen; - __u32 f_spare[6]; + __u32 f_flags; + __u32 f_spare[5]; }; #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */ @@ -73,7 +75,8 @@ struct statfs64 { /* Same as struct statfs */ /* Linux specials */ __kernel_fsid_t f_fsid; long f_namelen; - long f_spare[6]; + long f_flags; + long f_spare[5]; }; struct compat_statfs64 { @@ -88,7 +91,8 @@ struct compat_statfs64 { __u64 f_bavail; __kernel_fsid_t f_fsid; __u32 f_namelen; - __u32 f_spare[6]; + __u32 f_flags; + __u32 f_spare[5]; }; #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */ diff --git a/arch/s390/include/asm/statfs.h b/arch/s390/include/asm/statfs.h index 06cc703..3be7fbd 100644 --- a/arch/s390/include/asm/statfs.h +++ 
b/arch/s390/include/asm/statfs.h @@ -33,7 +33,8 @@ struct statfs { __kernel_fsid_t f_fsid; int f_namelen; int f_frsize; - int f_spare[5]; + int f_flags; + int f_spare[4]; }; struct statfs64 { @@ -47,7 +48,8 @@ struct statfs64 { __kernel_fsid_t f_fsid; int f_namelen; int f_frsize; - int f_spare[5]; + int f_flags; + int f_spare[4]; }; struct compat_statfs64 { @@ -61,7 +63,8 @@ struct compat_statfs64 { __kernel_fsid_t f_fsid; __u32 f_namelen; __u32 f_frsize; - __u32 f_spare[5]; + __u32 f_flags; + __u32 f_spare[4]; }; #endif /* __s390x__ */ diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h index 306160e..9f9cdb8 100644 --- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -108,7 +108,8 @@ struct compat_statfs { compat_fsid_t f_fsid; int f_namelen; /* SunOS ignores this field. */ int f_frsize; - int f_spare[5]; + int f_flags; + int f_spare[4]; }; #define COMPAT_RLIM_OLD_INFINITY 0x7fffffff diff --git a/fs/compat.c b/fs/compat.c index 6490d21..fe96e7d 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -245,7 +245,7 @@ static int put_compat_statfs(struct compat_statfs __user *ubuf, struct kstatfs * __put_user(kbuf->f_fsid.val[0], &ubuf->f_fsid.val[0]) || __put_user(kbuf->f_fsid.val[1], &ubuf->f_fsid.val[1]) || __put_user(kbuf->f_frsize, &ubuf->f_frsize) || - __put_user(0, &ubuf->f_spare[0]) || + __put_user(kbuf->f_flags, &ubuf->f_flags) || __put_user(0, &ubuf->f_spare[1]) || __put_user(0, &ubuf->f_spare[2]) || __put_user(0, &ubuf->f_spare[3]) || @@ -318,7 +318,8 @@ static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstat __put_user(kbuf->f_namelen, &ubuf->f_namelen) || __put_user(kbuf->f_fsid.val[0], &ubuf->f_fsid.val[0]) || __put_user(kbuf->f_fsid.val[1], &ubuf->f_fsid.val[1]) || - __put_user(kbuf->f_frsize, &ubuf->f_frsize)) + __put_user(kbuf->f_frsize, &ubuf->f_frsize) || + __put_user(kbuf->f_flags, &ubuf->f_flags)) return -EFAULT; return 0; } diff --git a/include/asm-generic/statfs.h 
b/include/asm-generic/statfs.h index 3b4fb3e..0fd28e0 100644 --- a/include/asm-generic/statfs.h +++ b/include/asm-generic/statfs.h @@ -33,7 +33,8 @@ struct statfs { __kernel_fsid_t f_fsid; __statfs_word f_namelen; __statfs_word f_frsize; - __statfs_word f_spare[5]; + __statfs_word f_flags; + __statfs_word f_spare[4]; }; /* @@ -55,7 +56,8 @@ struct statfs64 { __kernel_fsid_t f_fsid; __statfs_word f_namelen; __statfs_word f_frsize; - __statfs_word f_spare[5]; + __statfs_word f_flags; + __statfs_word f_spare[4]; } ARCH_PACK_STATFS64; /* @@ -77,7 +79,8 @@ struct compat_statfs64 { __kernel_fsid_t f_fsid; __u32 f_namelen; __u32 f_frsize; - __u32 f_spare[5]; + __u32 f_flags; + __u32 f_spare[4]; } ARCH_PACK_COMPAT_STATFS64; #endif diff --git a/include/linux/statfs.h b/include/linux/statfs.h index b34cc82..dd8b4e7 100644 --- a/include/linux/statfs.h +++ b/include/linux/statfs.h @@ -16,7 +16,8 @@ struct kstatfs { __kernel_fsid_t f_fsid; long f_namelen; long f_frsize; - long f_spare[5]; + long f_flags; + long f_spare[4]; }; #endif ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-07-06 16:45 ` Linus Torvalds @ 2010-07-07 1:44 ` Christoph Hellwig 2010-07-07 2:28 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2010-07-07 1:44 UTC (permalink / raw) To: Linus Torvalds Cc: Brad Boyer, J. R. Okajima, Christoph Hellwig, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper On Tue, Jul 06, 2010 at 09:45:26AM -0700, Linus Torvalds wrote: > Let's not overdesign things. Just do something like the attached > patch, which is the obvious and straightforward thing to do. > > Overdesigning is a disease. It's fundamentally wrong. > > (Yeah, yeah, the patch is untested, and doesn't actually _fill_ the > new f_flags value, but that's left as a trivial exercise for the > reader.) At least one of the readers posted a patch filling it in already. I need to send out the version with the review comments addressed, but I'm still waiting to hear from Uli whether he really insists on new syscall vectors for the same structure. Using that one ST_VALID bit seems a lot easier to me. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-07-07 1:44 ` Christoph Hellwig @ 2010-07-07 2:28 ` Linus Torvalds 0 siblings, 0 replies; 29+ messages in thread From: Linus Torvalds @ 2010-07-07 2:28 UTC (permalink / raw) To: Christoph Hellwig Cc: Brad Boyer, J. R. Okajima, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper On Tue, Jul 6, 2010 at 6:44 PM, Christoph Hellwig <hch@infradead.org> wrote: > > I'm still waiting to hear from Uli whether he really insists on new syscall vectors > for the same structure. Using that one ST_VALID bit seems a lot easier > to me. Umm. Uli doesn't get to choose kernel system call conventions. It matters not one whit whether he insists on new system calls or not, it's not going to happen. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-06-26 9:35 ` Christoph Hellwig 2010-06-26 12:54 ` J. R. Okajima @ 2010-06-26 14:49 ` Ulrich Drepper 1 sibling, 0 replies; 29+ messages in thread From: Ulrich Drepper @ 2010-06-26 14:49 UTC (permalink / raw) To: Christoph Hellwig Cc: J. R. Okajima, Nick Piggin, linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds On Sat, Jun 26, 2010 at 02:35, Christoph Hellwig <hch@infradead.org> wrote: > That's really a job for a pathconf system call that allows querying random > parameters. Linus has always objected to sysconf/pathconf-like syscalls. If you get it in I'm all for it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [rfc] new stat*fs-like syscall? 2010-06-24 13:14 [rfc] new stat*fs-like syscall? Nick Piggin ` (3 preceding siblings ...) 2010-06-26 5:53 ` J. R. Okajima @ 2010-06-26 10:13 ` Andi Kleen 4 siblings, 0 replies; 29+ messages in thread From: Andi Kleen @ 2010-06-26 10:13 UTC (permalink / raw) To: Nick Piggin Cc: linux-fsdevel, linux-kernel, Al Viro, Ulrich Drepper, Linus Torvalds Nick Piggin <npiggin@suse.de> writes: > Other than types, other differences are: > - statvfs(2) has is f_frsize, which seems fairly useless. > - statvfs(2) has f_favail. > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs > block size. The latter could be useful for disk space algorithms. > Both can be ill defned. > - statvfs(2) lacks f_type. > > Is there anything more we should add here? Samba wants a capabilities > field, with things like sparse files, quotas, compression, encryption, > case preserving/sensitive. I wonder if it would make sense to export the granularity of the time stamps? We already have this information internally, and it might allow user land to optimize its stat frequency or comparison. Some file systems also have quotas with "project ids". Maybe add that too? I think NTFS et al. also have some more time stamps, but I'm not sure there's enough space for that. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 29+ messages in thread