* [PATCH 0/6] Extended file stat system call
@ 2012-04-19 14:05 ` David Howells
  0 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-19 14:05 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
	wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha


Implement a pair of new system calls to provide extended and further extensible
stat functions.

The first of the associated patches is the main patch that provides these new
system calls:

	ssize_t ret = xstat(int dfd,
			    const char *filename,
			    unsigned atflag,
			    unsigned mask,
			    struct xstat *buffer);

	ssize_t ret = fxstat(int fd,
			     unsigned atflag,
			     unsigned mask,
			     struct xstat *buffer);

which are more fully documented in the first patch's description.

These new stat functions provide a number of useful features, in summary:

 (1) More information: creation time, inode generation number, data version
     number, flags/attributes.  A subset of these is available through a number
     of filesystems (such as CIFS, NFS, AFS, Ext4 and BTRFS).

 (2) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server.

 (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
     cached attributes are up to date.

 (4) Allow the filesystem to indicate what it can/cannot provide: A filesystem
     can now say it doesn't support a standard stat feature if that isn't
     available.

 (5) Make the fields a consistent size on all arches, and make them large.

 (6) Can be extended by using more request flags and appending further data
     after the end of the standard return data.
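Features (2) and (4) boil down to a request mask going in and a result mask coming out. The following is only a rough sketch of how a caller might compare the two: the `demo_` struct, its fields and all bit values are illustrative assumptions, not the definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; the real XSTAT_* definitions are in
 * include/linux/stat.h in patch 1 and may well differ. */
#define DEMO_XSTAT_MODE     0x0001u
#define DEMO_XSTAT_SIZE     0x0002u
#define DEMO_XSTAT_MTIME    0x0004u
#define DEMO_XSTAT_BTIME    0x0008u	/* creation time */
#define DEMO_XSTAT_VERSION  0x0010u	/* data version number */

/* Hypothetical cut-down stand-in for struct xstat. */
struct demo_xstat {
	uint32_t result_mask;	/* fields the filesystem actually filled in */
	uint64_t size;
	int64_t  btime_sec;
};

/* Return the requested fields that the filesystem did not supply. */
uint32_t demo_missing_fields(uint32_t request_mask,
			     const struct demo_xstat *st)
{
	assert(st != NULL);
	return request_mask & ~st->result_mask;
}

/* Use the creation time only if the filesystem reported one. */
int64_t demo_btime_or(const struct demo_xstat *st, int64_t fallback)
{
	assert(st != NULL);
	return (st->result_mask & DEMO_XSTAT_BTIME) ? st->btime_sec : fallback;
}
```

A caller would pass the same mask it handed to the system call, then fall back or report "unknown" for whatever bits come back in `demo_missing_fields()`.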

Note that no lstat() equivalent is required as that can be implemented through
xstat() with atflag == 0.


=======
PATCHES
=======

Patch 1 defines the xstat() and fxstat() system calls.

Patches 2-6 implement extended stat facilities for Ext4, AFS, NFS and CIFS, and
make eCryptFS go to the lower filesystem for such details.


==============
CONSIDERATIONS
==============

Should fxstat() be implemented as xstat() with a NULL filename, using dfd as
fd?

Should the default for a network fs be to do an unconditional (heavyweight)
stat with a flag to suppress going to the server to update the locally held
attributes and flushing pending writebacks?

Should things like the Windows Archive, Hidden and System bits be handled
through IOC flags, perhaps expanded to 64 bits?

Are these things useful to userspace other than Samba and userspace NFS
servers?

Is it useful to pass the volume ID out?  Or is statfs() sufficient for this?

Should I add a sixth argument to xstat(), mark it reserved and require that it
be supplied as 0 to hedge against future use?

Is there anything else I can usefully add at the moment?


==========
TO BE DONE
==========

Autofs, ntfs, btrfs, ...

I should perhaps use u8/u32/u64 rather than uint8/32/64_t.

Handle remote filesystems being offline and indicate this with
XSTAT_INFO_OFFLINE.


=======
TESTING
=======

There's a test program attached to the description for the main patch.
It can be run as follows:

	[root@andromeda tmp]# ./xstat -R /mnt/foo
	xstat(/mnt/foo) = 0
	0000: 000081a40000ffef 0000000000000001 0000020000000000 0000100000080000
	0020: 0000000000000000 0000000600000008 000000004f88499a 0000000136fd9208
	0040: 000000004f88499a 0000000136fd9208 000000004f8849b9 0000000106daf187
	0060: 000000004f8849b9 0000000106daf187 000000000000000c 000000000000000f
	0080: 0000000000000008 00000000484ebbef 0000000000000025 5949ebd4711efd82
	00a0: d3250b5c15d5e380 0000000000000000 0000000000000000 0000000000000000
	00c0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
	00e0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
	results=ffef
	  Size: 15              Blocks: 8          IO Block: 4096    regular file
	Device: 08:06           Inode: 12          Links: 1
	Access: (0644/-rw-r--r--)  Uid: 0   Gid: 0
	Access: 2012-04-13 16:43:22.922587656+0100
	Modify: 2012-04-13 16:43:53.115011975+0100
	Change: 2012-04-13 16:43:53.115011975+0100
	Create: 2012-04-13 16:43:22.922587656+0100
	Inode version: 484ebbefh
	Data version: 25h
	Inode flags: 00080000 (-------- ----e--- -------- --------)
	Information: 00000200 (-------- -------- ------a- --------)
	Volume ID: 82fd1e71d4eb4959-80e3d5155c0b25d3

David
---
David Howells (6):
      xstat: eCryptFS: Return extended attributes
      xstat: CIFS: Return extended attributes
      xstat: NFS: Return extended attributes
      xstat: AFS: Return extended attributes
      xstat: Ext4: Return extended attributes
      xstat: Add a pair of system calls to make extended file stats available


 arch/x86/syscalls/syscall_32.tbl |    2 
 arch/x86/syscalls/syscall_64.tbl |    2 
 fs/afs/inode.c                   |   29 ++-
 fs/afs/super.c                   |    7 +
 fs/cifs/cifsfs.h                 |    4 
 fs/cifs/cifsglob.h               |   16 +-
 fs/cifs/dir.c                    |    2 
 fs/cifs/inode.c                  |  120 +++++++++++--
 fs/ecryptfs/inode.c              |   14 +-
 fs/ext4/ext4.h                   |    2 
 fs/ext4/file.c                   |    2 
 fs/ext4/inode.c                  |   32 +++
 fs/ext4/namei.c                  |    2 
 fs/ext4/super.c                  |    1 
 fs/ext4/symlink.c                |    2 
 fs/nfs/inode.c                   |   49 ++++-
 fs/nfs/super.c                   |    1 
 fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
 include/linux/fcntl.h            |    1 
 include/linux/fs.h               |    4 
 include/linux/stat.h             |  126 +++++++++++++-
 include/linux/syscalls.h         |    7 +
 22 files changed, 694 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 144+ messages in thread
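The hex dump in the test output can be related to the decoded "Volume ID" line if one assumes (this is inferred from the output, not stated in the posting) that the test program prints the buffer as native 64-bit words on a little-endian machine. Under that assumption the bytes in memory are each word reversed, which the sketch below reproduces; the `demo_` helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a 64-bit value into buf in little-endian byte order, i.e. the
 * order it would occupy in memory on x86. */
void demo_put_le64(uint8_t buf[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (8 * i));
}

/* Format a 16-byte volume ID as two dash-separated groups of hex digits.
 * out must have room for 34 bytes (16 + 1 + 16 digits plus the NUL). */
void demo_format_volume_id(const uint8_t id[16], char out[34])
{
	char *p = out;

	for (int i = 0; i < 16; i++) {
		if (i == 8)
			*p++ = '-';
		p += sprintf(p, "%02x", id[i]);
	}
	assert(strlen(out) == 33);
}
```

Feeding in the last word of row 0080 (`5949ebd4711efd82`) and the first word of row 00a0 (`d3250b5c15d5e380`) yields the string shown on the "Volume ID" line.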
* [PATCH 3/6] xstat: AFS: Return extended attributes
  2012-04-19 14:05 ` David Howells
  (?)
@ 2012-04-19 14:06 ` David Howells
  -1 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-19 14:06 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
	wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Return extended attributes from the AFS filesystem.  This includes the
following:

 (1) The vnode uniquifier as st_gen.

 (2) The data version number as st_data_version.

 (3) XSTAT_INFO_AUTOMOUNT will be set on automount directories by virtue of
     S_AUTOMOUNT being set on the inode.  These are referrals to other volumes
     or other cells.

 (4) XSTAT_INFO_AUTODIR on a directory that does cell lookup for non-existent
     names and mounts them (typically mounted on /afs with -o autocell).  The
     resulting directories are marked XSTAT_INFO_FABRICATED as they do not
     actually exist in the mounted AFS directory.

 (6) Files, directories and symlinks accessed over AFS are marked
     XSTAT_INFO_REMOTE.

 (7) XSTAT_INFO_NONSYSTEM_OWNERSHIP is set as the UID and GID retrieved from an
     AFS share may not be applicable on the system.

 (8) XSTAT_INFO_HAS_ACL is set as AFS directories have ACLs (the UID and GID
     are only used through the ACLs) and these ACLs apply to the contents of
     the directories.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c |   29 +++++++++++++++++++++--------
 fs/afs/super.c |    7 +++++++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index d890ae3..062def2 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -71,9 +71,9 @@ static int afs_inode_map_status(struct afs_vnode *vnode, struct key *key)
 	inode->i_uid		= vnode->status.owner;
 	inode->i_gid		= 0;
 	inode->i_size		= vnode->status.size;
-	inode->i_ctime.tv_sec	= vnode->status.mtime_server;
-	inode->i_ctime.tv_nsec	= 0;
-	inode->i_atime		= inode->i_mtime = inode->i_ctime;
+	inode->i_mtime.tv_sec	= vnode->status.mtime_server;
+	inode->i_mtime.tv_nsec	= 0;
+	inode->i_atime		= inode->i_ctime = inode->i_mtime;
 	inode->i_blocks		= 0;
 	inode->i_generation	= vnode->fid.unique;
 	inode->i_version	= vnode->status.data_version;
@@ -374,16 +374,29 @@ error_unlock:
 /*
  * read the attributes of an inode
  */
-int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-		      struct kstat *stat)
+int afs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
-	struct inode *inode;
-
-	inode = dentry->d_inode;
+	struct inode *inode = dentry->d_inode;
 
 	_enter("{ ino=%lu v=%u }", inode->i_ino, inode->i_generation);
 
 	generic_fillattr(inode, stat);
+
+	stat->result_mask &= ~(XSTAT_ATIME | XSTAT_CTIME | XSTAT_BLOCKS);
+	stat->result_mask |= XSTAT_GEN | XSTAT_VERSION;
+	stat->gen = inode->i_generation;
+	stat->version = inode->i_version;
+
+	if (test_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(inode)->flags))
+		stat->information |= XSTAT_INFO_AUTODIR;
+
+	if (test_bit(AFS_VNODE_PSEUDODIR, &AFS_FS_I(inode)->flags))
+		stat->information |= XSTAT_INFO_FABRICATED;
+	else
+		stat->information |= XSTAT_INFO_REMOTE;
+
+	stat->information |=
+		XSTAT_INFO_NONSYSTEM_OWNERSHIP | XSTAT_INFO_HAS_ACL;
 	return 0;
 }
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index f02b31e..1f13b48 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -314,6 +314,13 @@ static int afs_fill_super(struct super_block *sb,
 	sb->s_bdi = &as->volume->bdi;
 	strlcpy(sb->s_id, as->volume->vlocation->vldb.name, sizeof(sb->s_id));
 
+	/* construct a volume ID from the AFS volume ID and type */
+	sb->s_volume_id[4] = as->volume->type;
+	sb->s_volume_id[3] = as->volume->vid >>  0;
+	sb->s_volume_id[2] = as->volume->vid >>  8;
+	sb->s_volume_id[1] = as->volume->vid >> 16;
+	sb->s_volume_id[0] = as->volume->vid >> 24;
+
 	/* allocate the root inode and dentry */
 	fid.vid		= as->volume->vid;
 	fid.vnode	= 1;

^ permalink raw reply related	[flat|nested] 144+ messages in thread
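The two pieces of logic in this patch can be modelled outside the kernel roughly as follows. This is a userspace sketch only: the `DEMO_INFO_*` values are placeholders rather than the real XSTAT_INFO_* constants, the 32-bit volume number is an assumption about the width of `vid`, and zeroing the rest of the ID stands in for the superblock having been allocated zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag values for illustration; the real XSTAT_INFO_*
 * constants are defined by patch 1. */
#define DEMO_INFO_REMOTE               0x01u
#define DEMO_INFO_FABRICATED           0x02u
#define DEMO_INFO_AUTODIR              0x04u
#define DEMO_INFO_NONSYSTEM_OWNERSHIP  0x08u
#define DEMO_INFO_HAS_ACL              0x10u

/* Pack a 32-bit AFS volume number (most significant byte first) and the
 * volume type into the first five bytes of a 16-byte ID; the remaining
 * bytes are zeroed here as an assumption. */
void demo_afs_volume_id(uint8_t id[16], uint32_t vid, uint8_t type)
{
	assert(id != NULL);
	memset(id, 0, 16);
	id[0] = (uint8_t)(vid >> 24);
	id[1] = (uint8_t)(vid >> 16);
	id[2] = (uint8_t)(vid >> 8);
	id[3] = (uint8_t)(vid >> 0);
	id[4] = type;
}

/* Mirror the flag selection in the patched afs_getattr(). */
uint32_t demo_afs_information(int is_autocell_dir, int is_pseudo_dir)
{
	uint32_t info = DEMO_INFO_NONSYSTEM_OWNERSHIP | DEMO_INFO_HAS_ACL;

	if (is_autocell_dir)
		info |= DEMO_INFO_AUTODIR;
	info |= is_pseudo_dir ? DEMO_INFO_FABRICATED : DEMO_INFO_REMOTE;
	return info;
}
```

Storing the volume number most significant byte first keeps the ID independent of host byte order, which matters if the ID is ever compared across machines.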
* [PATCH 4/6] xstat: NFS: Return extended attributes
  2012-04-19 14:05 ` David Howells
  (?)
  (?)
@ 2012-04-19 14:06 ` David Howells
       [not found]   ` <20120419140653.17272.95035.stgit-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org>
  2012-04-26 13:52   ` David Howells
  -1 siblings, 2 replies; 144+ messages in thread
From: David Howells @ 2012-04-19 14:06 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
	wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Return extended attributes from the NFS filesystem.  This includes the
following:

 (1) The change attribute as st_data_version if NFSv4.

 (2) XSTAT_INFO_AUTOMOUNT and XSTAT_INFO_FABRICATED are set on referral or
     submount directories that are automounted upon.  NFS shows one directory
     with a different FSID, but the local filesystem has two: the mountpoint
     directory and the root of the filesystem mounted upon it.

 (3) XSTAT_INFO_REMOTE is set on files acquired over NFS.

Furthermore, what nfs_getattr() does can be controlled as follows:

 (1) If AT_FORCE_ATTR_SYNC is indicated, or mtime, ctime or data_version (NFSv4
     only) are requested then the outstanding writes will be written to the
     server first.

 (2) The inode's attributes may be synchronised with the server:

     (a) If AT_FORCE_ATTR_SYNC is indicated or if atime is requested (and atime
	 updating is not suppressed by a mount flag) then the attributes will
	 be reread unconditionally.

     (b) If the data version or any of the basic stats are requested then the
	 attributes will be reread if the cached attributes have expired.

     (c) Otherwise the cached attributes will be used - even if expired -
	 without reference to the server.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/nfs/inode.c |   49 +++++++++++++++++++++++++++++++++++++------------
 fs/nfs/super.c |    1 +
 2 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e8bbfa5..460fcf3 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -509,11 +509,18 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
 int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
 	struct inode *inode = dentry->d_inode;
+	unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
 	int err;
 
-	/* Flush out writes to the server in order to update c/mtime.  */
-	if (S_ISREG(inode->i_mode)) {
+	if (NFS_SERVER(inode)->nfs_client->rpc_ops->version < 4)
+		stat->request_mask &= ~XSTAT_VERSION;
+
+	/* Flush out writes to the server in order to update c/mtime
+	 * or data version if the user wants them */
+	if ((force || (stat->request_mask &
+		       (XSTAT_MTIME | XSTAT_CTIME | XSTAT_VERSION))) &&
+	    S_ISREG(inode->i_mode)) {
 		err = filemap_write_and_wait(inode->i_mapping);
 		if (err)
 			goto out;
@@ -528,18 +535,36 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	 *  - NFS never sets MS_NOATIME or MS_NODIRATIME so there is
 	 *    no point in checking those.
 	 */
- 	if ((mnt->mnt_flags & MNT_NOATIME) ||
- 	    ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
+	if (mnt->mnt_flags & MNT_NOATIME ||
+	    (mnt->mnt_flags & MNT_NODIRATIME && S_ISDIR(inode->i_mode))) {
+		stat->ioc_flags |= FS_NOATIME_FL;
+		need_atime = 0;
+	} else if (!(stat->request_mask & XSTAT_ATIME)) {
 		need_atime = 0;
+	}
 
-	if (need_atime)
-		err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
-	else
-		err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
-	if (!err) {
-		generic_fillattr(inode, stat);
-		stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+	if (force || stat->request_mask & (XSTAT_BASIC_STATS | XSTAT_VERSION)) {
+		if (force || need_atime)
+			err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		else
+			err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		if (err)
+			goto out;
 	}
+
+	generic_fillattr(inode, stat);
+	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+
+	if (stat->request_mask & XSTAT_VERSION) {
+		stat->version = inode->i_version;
+		stat->result_mask |= XSTAT_VERSION;
+	}
+
+	if (IS_AUTOMOUNT(inode))
+		stat->information |= XSTAT_INFO_FABRICATED;
+
+	stat->information |= XSTAT_INFO_REMOTE;
+
 out:
 	return err;
 }
@@ -852,7 +877,7 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
 static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
-	
+
 	if (mapping->nrpages != 0) {
 		int ret = invalidate_inode_pages2(mapping);
 		if (ret < 0)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 37412f7..faa652c 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2222,6 +2222,7 @@ static int nfs_set_super(struct super_block *s, void *data)
 	ret = set_anon_super(s, server);
 	if (ret == 0)
 		server->s_dev = s->s_dev;
+	memcpy(&s->s_volume_id, &server->fsid, sizeof(s->s_volume_id));
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 144+ messages in thread
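The control flow of the patched nfs_getattr() can be restated as a small pure function, which may make the flush and revalidation rules easier to follow. This is a model of the diff as posted, not kernel code: the `demo_` types and the mask bit values are invented for illustration, and a single `DEMO_XSTAT_SIZE` bit stands in for the remaining basic stats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real XSTAT_* constants come from patch 1. */
#define DEMO_XSTAT_ATIME    0x01u
#define DEMO_XSTAT_MTIME    0x02u
#define DEMO_XSTAT_CTIME    0x04u
#define DEMO_XSTAT_SIZE     0x08u	/* stands in for the other basic stats */
#define DEMO_XSTAT_VERSION  0x10u
#define DEMO_XSTAT_BTIME    0x20u	/* an example of a non-basic field */
#define DEMO_XSTAT_BASIC_STATS \
	(DEMO_XSTAT_ATIME | DEMO_XSTAT_MTIME | DEMO_XSTAT_CTIME | DEMO_XSTAT_SIZE)

static_assert((DEMO_XSTAT_BASIC_STATS & DEMO_XSTAT_VERSION) == 0,
	      "data version is not one of the basic stats in this model");

enum demo_revalidate {
	DEMO_USE_CACHE,		/* no revalidation call at all */
	DEMO_IF_EXPIRED,	/* nfs_revalidate_inode() */
	DEMO_UNCONDITIONAL,	/* __nfs_revalidate_inode() */
};

struct demo_query {
	uint32_t request_mask;
	bool force;			/* AT_FORCE_ATTR_SYNC was given */
	bool nfs_v4;
	bool regular_file;
	bool atime_suppressed;		/* noatime/nodiratime applies */
	bool cached_atime_invalid;	/* NFS_INO_INVALID_ATIME is set */
};

struct demo_plan {
	bool flush_writes;
	enum demo_revalidate revalidate;
};

struct demo_plan demo_nfs_getattr_plan(struct demo_query q)
{
	struct demo_plan plan = { false, DEMO_USE_CACHE };
	bool need_atime = q.cached_atime_invalid;

	/* The data version is only available on NFSv4. */
	if (!q.nfs_v4)
		q.request_mask &= ~DEMO_XSTAT_VERSION;

	/* Flush dirty data only when times or data version are wanted. */
	if ((q.force || (q.request_mask & (DEMO_XSTAT_MTIME | DEMO_XSTAT_CTIME |
					   DEMO_XSTAT_VERSION))) &&
	    q.regular_file)
		plan.flush_writes = true;

	if (q.atime_suppressed || !(q.request_mask & DEMO_XSTAT_ATIME))
		need_atime = false;

	if (q.force ||
	    (q.request_mask & (DEMO_XSTAT_BASIC_STATS | DEMO_XSTAT_VERSION)))
		plan.revalidate = (q.force || need_atime) ?
			DEMO_UNCONDITIONAL : DEMO_IF_EXPIRED;

	return plan;
}
```

Note that in this reading of the diff, requesting atime forces an unconditional reread only when the cached atime is already marked invalid; otherwise it falls under the "if expired" rule like the other basic stats.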
* Re: [PATCH 4/6] xstat: NFS: Return extended attributes
  2012-04-19 14:06 ` [PATCH 4/6] xstat: NFS: " David Howells
@ 2012-04-19 14:35       ` Myklebust, Trond
  2012-04-26 13:52   ` David Howells
  1 sibling, 0 replies; 144+ messages in thread
From: Myklebust, Trond @ 2012-04-19 14:35 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
	wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, 2012-04-19 at 15:06 +0100, David Howells wrote:
> Return extended attributes from the NFS filesystem.  This includes the
> following:
> 
>  (1) The change attribute as st_data_version if NFSv4.
> 
>  (2) XSTAT_INFO_AUTOMOUNT and XSTAT_INFO_FABRICATED are set on referral or
>      submount directories that are automounted upon.  NFS shows one directory
>      with a different FSID, but the local filesystem has two: the mountpoint
>      directory and the root of the filesystem mounted upon it.
> 
>  (3) XSTAT_INFO_REMOTE is set on files acquired over NFS.
> 
> Furthermore, what nfs_getattr() does can be controlled as follows:
> 
>  (1) If AT_FORCE_ATTR_SYNC is indicated, or mtime, ctime or data_version (NFSv4
>      only) are requested then the outstanding writes will be written to the
>      server first.
> 
>  (2) The inode's attributes may be synchronised with the server:
> 
>      (a) If AT_FORCE_ATTR_SYNC is indicated or if atime is requested (and atime
>      	 updating is not suppressed by a mount flag) then the attributes will
>      	 be reread unconditionally.
> 
>      (b) If the data version or any of basic stats are requested then the
>      	 attributes will be reread if the cached attributes have expired.
> 
>      (c) Otherwise the cached attributes will be used - even if expired -
>      	 without reference to the server.

Hmm... As far as I can see you are still doing an nfs_revalidate_inode()
in the non-forced case. That will cause expired attributes to be
retrieved from the server.

> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/nfs/inode.c |   49 +++++++++++++++++++++++++++++++++++++------------
>  fs/nfs/super.c |    1 +
>  2 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index e8bbfa5..460fcf3 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -509,11 +509,18 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
>  int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>  {
>  	struct inode *inode = dentry->d_inode;
> +	unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
>  	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
>  	int err;
>  
> -	/* Flush out writes to the server in order to update c/mtime.  */
> -	if (S_ISREG(inode->i_mode)) {
> +	if (NFS_SERVER(inode)->nfs_client->rpc_ops->version < 4)
> +		stat->request_mask &= ~XSTAT_VERSION;
> +
> +	/* Flush out writes to the server in order to update c/mtime
> +	 * or data version if the user wants them */
> +	if ((force || (stat->request_mask &
> +		       (XSTAT_MTIME | XSTAT_CTIME | XSTAT_VERSION))) &&
> +	    S_ISREG(inode->i_mode)) {
>  		err = filemap_write_and_wait(inode->i_mapping);

We can get rid of the filemap_write_and_wait() if the caller allows us
to approximate m/ctime values. That would give a major speed-up for most
stat() workloads.

>  		if (err)
>  			goto out;
> @@ -528,18 +535,36 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>  	 *  - NFS never sets MS_NOATIME or MS_NODIRATIME so there is
>  	 *    no point in checking those.
>  	 */
> - 	if ((mnt->mnt_flags & MNT_NOATIME) ||
> - 	    ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
> +	if (mnt->mnt_flags & MNT_NOATIME ||
> +	    (mnt->mnt_flags & MNT_NODIRATIME && S_ISDIR(inode->i_mode))) {
> +		stat->ioc_flags |= FS_NOATIME_FL;
> +		need_atime = 0;
> +	} else if (!(stat->request_mask & XSTAT_ATIME)) {
>  		need_atime = 0;
> +	}
>  
> -	if (need_atime)
> -		err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
> -	else
> -		err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> -	if (!err) {
> -		generic_fillattr(inode, stat);
> -		stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> +	if (force || stat->request_mask & (XSTAT_BASIC_STATS | XSTAT_VERSION)) {
> +		if (force || need_atime)
> +			err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +		else
> +			err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +		if (err)
> +			goto out;
>  	}
> +
> +	generic_fillattr(inode, stat);
> +	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> +
> +	if (stat->request_mask & XSTAT_VERSION) {
> +		stat->version = inode->i_version;
> +		stat->result_mask |= XSTAT_VERSION;
> +	}
> +
> +	if (IS_AUTOMOUNT(inode))
> +		stat->information |= XSTAT_INFO_FABRICATED;
> +
> +	stat->information |= XSTAT_INFO_REMOTE;
> +
>  out:
>  	return err;
>  }
> @@ -852,7 +877,7 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
>  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
>  {
>  	struct nfs_inode *nfsi = NFS_I(inode);
> -	
> +
>  	if (mapping->nrpages != 0) {
>  		int ret = invalidate_inode_pages2(mapping);
>  		if (ret < 0)
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 37412f7..faa652c 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2222,6 +2222,7 @@ static int nfs_set_super(struct super_block *s, void *data)
>  	ret = set_anon_super(s, server);
>  	if (ret == 0)
>  		server->s_dev = s->s_dev;
> +	memcpy(&s->s_volume_id, &server->fsid, sizeof(s->s_volume_id));
>  	return ret;
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 144+ messages in thread
* Re: [PATCH 4/6] xstat: NFS: Return extended attributes
  2012-04-19 14:06 ` [PATCH 4/6] xstat: NFS: " David Howells
       [not found] ` <20120419140653.17272.95035.stgit-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org>
@ 2012-04-26 13:52 ` David Howells
  1 sibling, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-26 13:52 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:

> Hmm... As far as I can see you are still doing an nfs_revalidate_inode()
> in the non-forced case. That will cause expired attributes to be
> retrieved from the server.

Revalidation is only done when you force it or explicitly ask for a basic stat
or the data version number:

-	if (need_atime)
-		err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
-	else
-		err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
-	if (!err) {
-		generic_fillattr(inode, stat);
-		stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+	if (force || stat->request_mask & (XSTAT_BASIC_STATS | XSTAT_VERSION)) {
+		if (force || need_atime)
+			err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		else
+			err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		if (err)
+			goto out;

Unfortunately, I think I have to revalidate if any of XSTAT_BASIC_STATS are
requested to maintain compatibility with stat() so that stat() can be done with
xstat().

On the other hand, stat() could be done by userspace with xstat() and
AT_FORCE_ATTR_SYNC, I suppose.

David

^ permalink raw reply [flat|nested] 144+ messages in thread
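The revalidation rule argued over in the message above can be restated as a small userspace decision function. This is only a sketch: the `SKETCH_` bit values and the `sketch_nfs_reval_policy()` helper are made up for illustration (the real mask definitions are in the first patch of the series), and the enum merely labels which of the two kernel helpers the quoted hunk would call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; NOT the values from the patch. */
#define SKETCH_XSTAT_BASIC_STATS 0x0001u
#define SKETCH_XSTAT_VERSION     0x0002u
#define SKETCH_XSTAT_BTIME       0x0004u

enum sketch_reval {
	SKETCH_REVAL_NONE,     /* answer from cached attributes, no server trip */
	SKETCH_REVAL_IF_STALE, /* nfs_revalidate_inode(): refetch only if expired */
	SKETCH_REVAL_ALWAYS,   /* __nfs_revalidate_inode(): refetch unconditionally */
};

/* Mirrors the "+" side of the quoted hunk: nothing is revalidated unless the
 * caller forces it or asks for basic stats or the data version number. */
static inline enum sketch_reval
sketch_nfs_reval_policy(bool force, bool need_atime, uint32_t request_mask)
{
	uint32_t needs_server = SKETCH_XSTAT_BASIC_STATS | SKETCH_XSTAT_VERSION;

	if (!force && !(request_mask & needs_server))
		return SKETCH_REVAL_NONE;
	if (force || need_atime)
		return SKETCH_REVAL_ALWAYS;
	return SKETCH_REVAL_IF_STALE;
}
```

Under these assumptions a request for creation time alone never touches the server, whereas any request that includes a basic stat field still goes through the expiry check that Trond objects to.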
* [PATCH 5/6] xstat: CIFS: Return extended attributes 2012-04-19 14:05 ` David Howells ` (2 preceding siblings ...) (?) @ 2012-04-19 14:07 ` David Howells [not found] ` <20120419140706.17272.72290.stgit-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org> -1 siblings, 1 reply; 144+ messages in thread From: David Howells @ 2012-04-19 14:07 UTC (permalink / raw) To: linux-fsdevel Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha Return extended attributes from the CIFS filesystem. This includes the following: (1) Return the file creation time as btime. We assume that the creation time won't change over the life of the inode. (2) Set XSTAT_INFO_AUTOMOUNT on referral/submount directories. (3) Unset XSTAT_INO if we made up the inode number and didn't get it from the server. (4) Unset XSTAT_[UG]ID if we are either returning values passed to mount and/or the server doesn't return them. (5) Map various Windows file attributes to FS_xxx_FL flags in st_ioc_flags and XSTAT_INFO_xxx flags in st_information, fetching them from the server if we don't have them yet or don't have a current copy. Possibly things like Hidden, System and Archive should be FS_xxx_FL flags rather than XSTAT_INFO_xxx flags and st_ioc_flags should be expanded to 64 bits. (6) Set XSTAT_INFO_REMOTE on all files fetched by CIFS. (7) Set XSTAT_INFO_NONSYSTEM_OWNERSHIP on all files as they all have Windows ownership details too. (8) Set XSTAT_INFO_HAS_ACL if CONFIG_CIFS_ACL=y as Windows ACLs are available on the object. Furthermore, what cifs_getattr() does can be controlled as follows: (1) If AT_FORCE_ATTR_SYNC is indicated, or if the inode flags or creation time are requested but not yet collected, then the attributes will be reread unconditionally. (2) If the basic stats are requested or if the inode flags are requested and have been collected previously, then the attributes will be reread if out of date. 
(3) Otherwise the cached attributes will be used - even if expired - without reference to the server. Note that cifs_revalidate_dentry() will issue an extra operation to get the FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to collect creation time and attributes on behalf of cifs_getattr(). [NOTE: THIS PATCH IS UNTESTED!] Signed-off-by: David Howells <dhowells@redhat.com> --- fs/cifs/cifsfs.h | 4 +- fs/cifs/cifsglob.h | 16 +++++-- fs/cifs/dir.c | 2 - fs/cifs/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++------- 4 files changed, 118 insertions(+), 24 deletions(-) diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index d1389bb..021e327 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -56,9 +56,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *); extern int cifs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *); extern int cifs_revalidate_file_attr(struct file *filp); -extern int cifs_revalidate_dentry_attr(struct dentry *); +extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool); extern int cifs_revalidate_file(struct file *filp); -extern int cifs_revalidate_dentry(struct dentry *); +extern int cifs_revalidate_dentry(struct dentry *, bool, bool); extern int cifs_invalidate_mapping(struct inode *inode); extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *); extern int cifs_setattr(struct dentry *, struct iattr *); diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 4ff6313..d3567da 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -621,11 +621,15 @@ struct cifsInodeInfo { /* BB add in lists for dirty pages i.e. write caching info for oplock */ struct list_head openFileList; __u32 cifsAttrs; /* e.g. 
DOS archive bit, sparse, compressed, system */ - bool clientCanCacheRead; /* read oplock */ - bool clientCanCacheAll; /* read and writebehind oplock */ - bool delete_pending; /* DELETE_ON_CLOSE is set */ - bool invalid_mapping; /* pagecache is invalid */ + bool clientCanCacheRead:1; /* read oplock */ + bool clientCanCacheAll:1; /* read and writebehind oplock */ + bool delete_pending:1; /* DELETE_ON_CLOSE is set */ + bool invalid_mapping:1; /* pagecache is invalid */ + bool btime_valid:1; /* stored creation time is valid */ + bool uid_faked:1; /* true if i_uid is faked */ + bool gid_faked:1; /* true if i_gid is faked */ unsigned long time; /* jiffies of last update of inode */ + struct timespec btime; /* creation time */ u64 server_eof; /* current file size on server -- protected by i_lock */ u64 uniqueid; /* server inode number */ u64 createtime; /* creation time on server */ @@ -833,6 +837,9 @@ struct dfs_info3_param { #define CIFS_FATTR_DELETE_PENDING 0x2 #define CIFS_FATTR_NEED_REVAL 0x4 #define CIFS_FATTR_INO_COLLISION 0x8 +#define CIFS_FATTR_WINATTRS_VALID 0x10 /* T if cf_btime and cf_cifsattrs valid */ +#define CIFS_FATTR_UID_FAKED 0x20 /* T if cf_uid is faked */ +#define CIFS_FATTR_GID_FAKED 0x40 /* T if cf_gid is faked */ struct cifs_fattr { u32 cf_flags; @@ -850,6 +857,7 @@ struct cifs_fattr { struct timespec cf_atime; struct timespec cf_mtime; struct timespec cf_ctime; + struct timespec cf_btime; }; static inline void free_dfs_info_param(struct dfs_info3_param *param) diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c index d172c8e..d9e03ae 100644 --- a/fs/cifs/dir.c +++ b/fs/cifs/dir.c @@ -664,7 +664,7 @@ cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd) return -ECHILD; if (direntry->d_inode) { - if (cifs_revalidate_dentry(direntry)) + if (cifs_revalidate_dentry(direntry, false, false)) return 0; else { /* diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 745da3d..662d5ce 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -135,13 +135,21 @@ 
cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) set_nlink(inode, fattr->cf_nlink); inode->i_uid = fattr->cf_uid; inode->i_gid = fattr->cf_gid; + if (fattr->cf_flags & CIFS_FATTR_UID_FAKED) + cifs_i->uid_faked = true; + if (fattr->cf_flags & CIFS_FATTR_GID_FAKED) + cifs_i->gid_faked = true; /* if dynperm is set, don't clobber existing mode */ if (inode->i_state & I_NEW || !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM)) inode->i_mode = fattr->cf_mode; - cifs_i->cifsAttrs = fattr->cf_cifsattrs; + if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) { + cifs_i->cifsAttrs = fattr->cf_cifsattrs; + cifs_i->btime = fattr->cf_btime; + cifs_i->btime_valid = true; + } if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL) cifs_i->time = 0; @@ -248,15 +256,19 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info, break; } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) { fattr->cf_uid = cifs_sb->mnt_uid; - else + fattr->cf_flags |= CIFS_FATTR_UID_FAKED; + } else { fattr->cf_uid = le64_to_cpu(info->Uid); + } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) { fattr->cf_gid = cifs_sb->mnt_gid; - else + fattr->cf_flags |= CIFS_FATTR_GID_FAKED; + } else { fattr->cf_gid = le64_to_cpu(info->Gid); + } fattr->cf_nlink = le64_to_cpu(info->Nlinks); } @@ -283,7 +295,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb) fattr->cf_ctime = CURRENT_TIME; fattr->cf_mtime = CURRENT_TIME; fattr->cf_nlink = 2; - fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL; + fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL | + CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED; } int cifs_get_file_info_unix(struct file *filp) @@ -510,6 +523,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); memset(fattr, 0, sizeof(*fattr)); + fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID; 
fattr->cf_cifsattrs = le32_to_cpu(info->Attributes); if (info->DeletePending) fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING; @@ -521,6 +535,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime); fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime); + fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime); if (adjust_tz) { fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj; @@ -1724,7 +1739,8 @@ int cifs_revalidate_file_attr(struct file *filp) return rc; } -int cifs_revalidate_dentry_attr(struct dentry *dentry) +int cifs_revalidate_dentry_attr(struct dentry *dentry, + bool want_extra_bits, bool force) { int xid; int rc = 0; @@ -1735,7 +1751,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry) if (inode == NULL) return -ENOENT; - if (!cifs_inode_needs_reval(inode)) + if (!force && !cifs_inode_needs_reval(inode)) return rc; xid = GetXid(); @@ -1752,9 +1768,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry) "%ld jiffies %ld", full_path, inode, inode->i_count.counter, dentry, dentry->d_time, jiffies); - if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) + if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) { rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid); - else + if (rc != 0) + goto out; + } + if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits) rc = cifs_get_inode_info(&inode, full_path, NULL, sb, xid, NULL); @@ -1779,12 +1798,13 @@ int cifs_revalidate_file(struct file *filp) } /* revalidate a dentry's inode attributes */ -int cifs_revalidate_dentry(struct dentry *dentry) +int cifs_revalidate_dentry(struct dentry *dentry, + bool want_extra_bits, bool force) { int rc; struct inode *inode = dentry->d_inode; - rc = cifs_revalidate_dentry_attr(dentry); + rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force); if (rc) return rc; @@ -1796,11 +1816,30 @@ int cifs_revalidate_dentry(struct dentry *dentry) int cifs_getattr(struct vfsmount *mnt, struct 
dentry *dentry, struct kstat *stat) { + struct cifsInodeInfo *cifs_i = CIFS_I(dentry->d_inode); struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); struct inode *inode = dentry->d_inode; + unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC; + bool want_extra_bits = false; + u32 info, ioc = 0; + u32 attrs; int rc; + if (cifs_i->uid_faked) + stat->request_mask &= ~XSTAT_UID; + if (cifs_i->gid_faked) + stat->request_mask &= ~XSTAT_GID; + + if (stat->request_mask & XSTAT_BTIME && !cifs_i->btime_valid) { + want_extra_bits = true; + force = true; + } + if (stat->request_mask & XSTAT_IOC_FLAGS) { + want_extra_bits = true; + force = true; + } + /* * We need to be sure that all dirty pages are written and the server * has actual ctime, mtime and file length. @@ -1814,13 +1853,14 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry, } } - rc = cifs_revalidate_dentry_attr(dentry); - if (rc) - return rc; + if (force || stat->request_mask & XSTAT_BASIC_STATS) { + rc = cifs_revalidate_dentry(dentry, want_extra_bits, force); + if (rc) + return rc; + } generic_fillattr(inode, stat); stat->blksize = CIFS_MAX_MSGSIZE; - stat->ino = CIFS_I(inode)->uniqueid; /* * If on a multiuser mount without unix extensions, and the admin hasn't @@ -1834,7 +1874,53 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry, if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) stat->gid = current_fsgid(); } - return rc; + + info = XSTAT_INFO_REMOTE | XSTAT_INFO_NONSYSTEM_OWNERSHIP; +#ifdef CONFIG_CIFS_ACL + info |= XSTAT_INFO_HAS_ACL; +#endif + + if (cifs_i->btime_valid) { + stat->btime = cifs_i->btime; + stat->result_mask |= XSTAT_BTIME; + } + + /* We don't promise an inode number if we made one up */ + stat->ino = cifs_i->uniqueid; + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) + stat->result_mask &= ~XSTAT_INO; + + /* + * If on a multiuser mount without unix extensions, and the admin + * hasn't overridden 
them, set the ownership to the fsuid/fsgid of the + * current process. + */ + if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) && + !tcon->unix_ext) { + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)) + stat->uid = current_fsuid(); + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) + stat->gid = current_fsgid(); + } + if (cifs_i->uid_faked) + stat->result_mask &= ~XSTAT_UID; + if (cifs_i->gid_faked) + stat->result_mask &= ~XSTAT_GID; + + attrs = cifs_i->cifsAttrs; + if (attrs & ATTR_HIDDEN) info |= XSTAT_INFO_HIDDEN; + if (attrs & ATTR_SYSTEM) info |= XSTAT_INFO_SYSTEM; + if (attrs & ATTR_ARCHIVE) info |= XSTAT_INFO_ARCHIVE; + if (attrs & ATTR_TEMPORARY) info |= XSTAT_INFO_TEMPORARY; + if (attrs & ATTR_REPARSE) info |= XSTAT_INFO_REPARSE_POINT; + if (attrs & ATTR_OFFLINE) info |= XSTAT_INFO_OFFLINE; + if (attrs & ATTR_ENCRYPTED) info |= XSTAT_INFO_ENCRYPTED; + stat->information |= info; + + if (attrs & ATTR_READONLY) ioc |= FS_IMMUTABLE_FL; + if (attrs & ATTR_COMPRESSED) ioc |= FS_COMPR_FL; + stat->ioc_flags |= ioc; + return 0; } static int cifs_truncate_page(struct address_space *mapping, loff_t from) ^ permalink raw reply related [flat|nested] 144+ messages in thread
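For readers following the control flow at the top of the patched cifs_getattr(), the way `force` and `want_extra_bits` are derived can be sketched in userspace as below. The `SKETCH_` bit values, the struct and the helper name are assumptions made for this illustration; only the branching is taken from the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; NOT the values from the patch. */
#define SKETCH_XSTAT_BASIC_STATS 0x0001u
#define SKETCH_XSTAT_BTIME       0x0002u
#define SKETCH_XSTAT_IOC_FLAGS   0x0004u

/* Hypothetical summary of what the patched cifs_getattr() decides up front. */
struct sketch_cifs_plan {
	bool revalidate;      /* call cifs_revalidate_dentry() at all? */
	bool force;           /* skip the "is the cache still fresh?" check */
	bool want_extra_bits; /* also fetch FILE_ALL_INFO for btime/attributes */
};

static inline struct sketch_cifs_plan
sketch_cifs_getattr_plan(bool at_force_attr_sync, bool btime_valid,
			 uint32_t request_mask)
{
	struct sketch_cifs_plan plan = {
		.revalidate = false,
		.force = at_force_attr_sync,
		.want_extra_bits = false,
	};

	/* Creation time requested but never collected: must ask the server. */
	if ((request_mask & SKETCH_XSTAT_BTIME) && !btime_valid) {
		plan.want_extra_bits = true;
		plan.force = true;
	}
	/* As posted, the hunk forces a reread whenever inode flags are requested. */
	if (request_mask & SKETCH_XSTAT_IOC_FLAGS) {
		plan.want_extra_bits = true;
		plan.force = true;
	}

	plan.revalidate = plan.force ||
			  (request_mask & SKETCH_XSTAT_BASIC_STATS) != 0;
	return plan;
}
```

Read this way, the posted code appears somewhat stricter than item (2) of the patch description: a request for the inode flags sets `force` even when the attributes were collected earlier.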
[parent not found: <20120419140706.17272.72290.stgit-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org>]
* Re: [PATCH 5/6] xstat: CIFS: Return extended attributes
  2012-04-19 14:07 ` [PATCH 5/6] xstat: CIFS: " David Howells
@ 2012-04-19 15:19 ` Steve French
  0 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-19 15:19 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw

For some of our users this would help A LOT.

Interesting ... just had discussions yesterday with some guys trying to
migrate to Linux and another set trying to back up Windows/NetApp from
Linux. Some things they brought up that they needed (beyond what we
already have with the cifs acl and SID and "dos attributes" xattrs, which
is even more useful now with the backup intent cifs mount flag) included:

- how do they tell if the inode number for a file was manufactured on the
  client, or whether we were able to use the server file's inode number
  ("UniqueId")
- how to get birth time (creation time)
- how to tell if file is "offline" (HSM)
- And is there a way to return the other less common cifs attributes
  (e.g. "reparse")

Dave's patch seems to address all of that. Samba server stuffs most of
this in an ndr encoded xattr blob which isn't much good for kernel code to
use, and I really prefer Dave's approach. Without this, I would need to
add another cifs specific ioctl, but since there is significant overlap
between some of these and ntfs, vfat, nfs etc. I like the xstat idea
better.

On Thu, Apr 19, 2012 at 9:07 AM, David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Return extended attributes from the CIFS filesystem. This includes the
> following:
>
> (1) Return the file creation time as btime.
We assume that the creation time > won't change over the life of the inode. > > (2) Set XSTAT_INFO_AUTOMOUNT on referral/submount directories. > > (3) Unset XSTAT_INO if we made up the inode number and didn't get it from the > server. > > (4) Unset XSTAT_[UG]ID if we are either returning values passed to mount > and/or the server doesn't return them. > > (5) Map various Windows file attributes to FS_xxx_FL flags in st_ioc_flags > and XSTAT_INFO_xxx flags in st_information, fetching them from the server > if we don't have them yet or don't have a current copy. > > Possibly things like Hidden, System and Archive should be FS_xxx_FL flags > rather than XSTAT_INFO_xxx flags and st_ioc_flags should be expanded to > 64 bits. > > (6) Set XSTAT_INFO_REMOTE on all files fetched by CIFS. > > (7) Set XSTAT_INFO_NONSYSTEM_OWNERSHIP on all files as they all have Windows > ownership details too. > > (8) Set XSTAT_INFO_HAS_ACL if CONFIG_CIFS_ACL=y as Windows ACLs are available > on the object. > > Furthermore, what cifs_getattr() does can be controlled as follows: > > (1) If AT_FORCE_ATTR_SYNC is indicated, or if the inode flags or creation time > are requested but not yet collected, then the attributes will be reread > unconditionally. > > (2) If the basic stats are requested or if the inode flags are requested and > have been collected previously, then the attributes will be reread if out > of date. > > (3) Otherwise the cached attributes will be used - even if expired - without > reference to the server. > > Note that cifs_revalidate_dentry() will issue an extra operation to get the > FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to collect > creation time and attributes on behalf of cifs_getattr(). > > [NOTE: THIS PATCH IS UNTESTED!] 
> > Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > > fs/cifs/cifsfs.h | 4 +- > fs/cifs/cifsglob.h | 16 +++++-- > fs/cifs/dir.c | 2 - > fs/cifs/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++------- > 4 files changed, 118 insertions(+), 24 deletions(-) > > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h > index d1389bb..021e327 100644 > --- a/fs/cifs/cifsfs.h > +++ b/fs/cifs/cifsfs.h > @@ -56,9 +56,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *); > extern int cifs_rename(struct inode *, struct dentry *, struct inode *, > struct dentry *); > extern int cifs_revalidate_file_attr(struct file *filp); > -extern int cifs_revalidate_dentry_attr(struct dentry *); > +extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool); > extern int cifs_revalidate_file(struct file *filp); > -extern int cifs_revalidate_dentry(struct dentry *); > +extern int cifs_revalidate_dentry(struct dentry *, bool, bool); > extern int cifs_invalidate_mapping(struct inode *inode); > extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *); > extern int cifs_setattr(struct dentry *, struct iattr *); > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > index 4ff6313..d3567da 100644 > --- a/fs/cifs/cifsglob.h > +++ b/fs/cifs/cifsglob.h > @@ -621,11 +621,15 @@ struct cifsInodeInfo { > /* BB add in lists for dirty pages i.e. write caching info for oplock */ > struct list_head openFileList; > __u32 cifsAttrs; /* e.g. 
DOS archive bit, sparse, compressed, system */ > - bool clientCanCacheRead; /* read oplock */ > - bool clientCanCacheAll; /* read and writebehind oplock */ > - bool delete_pending; /* DELETE_ON_CLOSE is set */ > - bool invalid_mapping; /* pagecache is invalid */ > + bool clientCanCacheRead:1; /* read oplock */ > + bool clientCanCacheAll:1; /* read and writebehind oplock */ > + bool delete_pending:1; /* DELETE_ON_CLOSE is set */ > + bool invalid_mapping:1; /* pagecache is invalid */ > + bool btime_valid:1; /* stored creation time is valid */ > + bool uid_faked:1; /* true if i_uid is faked */ > + bool gid_faked:1; /* true if i_gid is faked */ > unsigned long time; /* jiffies of last update of inode */ > + struct timespec btime; /* creation time */ > u64 server_eof; /* current file size on server -- protected by i_lock */ > u64 uniqueid; /* server inode number */ > u64 createtime; /* creation time on server */ > @@ -833,6 +837,9 @@ struct dfs_info3_param { > #define CIFS_FATTR_DELETE_PENDING 0x2 > #define CIFS_FATTR_NEED_REVAL 0x4 > #define CIFS_FATTR_INO_COLLISION 0x8 > +#define CIFS_FATTR_WINATTRS_VALID 0x10 /* T if cf_btime and cf_cifsattrs valid */ > +#define CIFS_FATTR_UID_FAKED 0x20 /* T if cf_uid is faked */ > +#define CIFS_FATTR_GID_FAKED 0x40 /* T if cf_gid is faked */ > > struct cifs_fattr { > u32 cf_flags; > @@ -850,6 +857,7 @@ struct cifs_fattr { > struct timespec cf_atime; > struct timespec cf_mtime; > struct timespec cf_ctime; > + struct timespec cf_btime; > }; > > static inline void free_dfs_info_param(struct dfs_info3_param *param) > diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c > index d172c8e..d9e03ae 100644 > --- a/fs/cifs/dir.c > +++ b/fs/cifs/dir.c > @@ -664,7 +664,7 @@ cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd) > return -ECHILD; > > if (direntry->d_inode) { > - if (cifs_revalidate_dentry(direntry)) > + if (cifs_revalidate_dentry(direntry, false, false)) > return 0; > else { > /* > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c 
> index 745da3d..662d5ce 100644 > --- a/fs/cifs/inode.c > +++ b/fs/cifs/inode.c > @@ -135,13 +135,21 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) > set_nlink(inode, fattr->cf_nlink); > inode->i_uid = fattr->cf_uid; > inode->i_gid = fattr->cf_gid; > + if (fattr->cf_flags & CIFS_FATTR_UID_FAKED) > + cifs_i->uid_faked = true; > + if (fattr->cf_flags & CIFS_FATTR_GID_FAKED) > + cifs_i->gid_faked = true; > > /* if dynperm is set, don't clobber existing mode */ > if (inode->i_state & I_NEW || > !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM)) > inode->i_mode = fattr->cf_mode; > > - cifs_i->cifsAttrs = fattr->cf_cifsattrs; > + if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) { > + cifs_i->cifsAttrs = fattr->cf_cifsattrs; > + cifs_i->btime = fattr->cf_btime; > + cifs_i->btime_valid = true; > + } > > if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL) > cifs_i->time = 0; > @@ -248,15 +256,19 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info, > break; > } > > - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) > + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) { > fattr->cf_uid = cifs_sb->mnt_uid; > - else > + fattr->cf_flags |= CIFS_FATTR_UID_FAKED; > + } else { > fattr->cf_uid = le64_to_cpu(info->Uid); > + } > > - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) > + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) { > fattr->cf_gid = cifs_sb->mnt_gid; > - else > + fattr->cf_flags |= CIFS_FATTR_GID_FAKED; > + } else { > fattr->cf_gid = le64_to_cpu(info->Gid); > + } > > fattr->cf_nlink = le64_to_cpu(info->Nlinks); > } > @@ -283,7 +295,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb) > fattr->cf_ctime = CURRENT_TIME; > fattr->cf_mtime = CURRENT_TIME; > fattr->cf_nlink = 2; > - fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL; > + fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL | > + CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED; > } > > int cifs_get_file_info_unix(struct file *filp) > @@ -510,6 
+523,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, > struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); > > memset(fattr, 0, sizeof(*fattr)); > + fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID; > fattr->cf_cifsattrs = le32_to_cpu(info->Attributes); > if (info->DeletePending) > fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING; > @@ -521,6 +535,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, > > fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime); > fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime); > + fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime); > > if (adjust_tz) { > fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj; > @@ -1724,7 +1739,8 @@ int cifs_revalidate_file_attr(struct file *filp) > return rc; > } > > -int cifs_revalidate_dentry_attr(struct dentry *dentry) > +int cifs_revalidate_dentry_attr(struct dentry *dentry, > + bool want_extra_bits, bool force) > { > int xid; > int rc = 0; > @@ -1735,7 +1751,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry) > if (inode == NULL) > return -ENOENT; > > - if (!cifs_inode_needs_reval(inode)) > + if (!force && !cifs_inode_needs_reval(inode)) > return rc; > > xid = GetXid(); > @@ -1752,9 +1768,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry) > "%ld jiffies %ld", full_path, inode, inode->i_count.counter, > dentry, dentry->d_time, jiffies); > > - if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) > + if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) { > rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid); > - else > + if (rc != 0) > + goto out; > + } > + if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits) > rc = cifs_get_inode_info(&inode, full_path, NULL, sb, > xid, NULL); > > @@ -1779,12 +1798,13 @@ int cifs_revalidate_file(struct file *filp) > } > > /* revalidate a dentry's inode attributes */ > -int cifs_revalidate_dentry(struct dentry *dentry) > +int cifs_revalidate_dentry(struct dentry *dentry, > + bool 
want_extra_bits, bool force) > { > int rc; > struct inode *inode = dentry->d_inode; > > - rc = cifs_revalidate_dentry_attr(dentry); > + rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force); > if (rc) > return rc; > > @@ -1796,11 +1816,30 @@ int cifs_revalidate_dentry(struct dentry *dentry) > int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry, > struct kstat *stat) > { > + struct cifsInodeInfo *cifs_i = CIFS_I(dentry->d_inode); > struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb); > struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); > struct inode *inode = dentry->d_inode; > + unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC; > + bool want_extra_bits = false; > + u32 info, ioc = 0; > + u32 attrs; > int rc; > > + if (cifs_i->uid_faked) > + stat->request_mask &= ~XSTAT_UID; > + if (cifs_i->gid_faked) > + stat->request_mask &= ~XSTAT_GID; > + > + if (stat->request_mask & XSTAT_BTIME && !cifs_i->btime_valid) { > + want_extra_bits = true; > + force = true; > + } > + if (stat->request_mask & XSTAT_IOC_FLAGS) { > + want_extra_bits = true; > + force = true; > + } > + > /* > * We need to be sure that all dirty pages are written and the server > * has actual ctime, mtime and file length. 
> @@ -1814,13 +1853,14 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry, > } > } > > - rc = cifs_revalidate_dentry_attr(dentry); > - if (rc) > - return rc; > + if (force || stat->request_mask & XSTAT_BASIC_STATS) { > + rc = cifs_revalidate_dentry(dentry, want_extra_bits, force); > + if (rc) > + return rc; > + } > > generic_fillattr(inode, stat); > stat->blksize = CIFS_MAX_MSGSIZE; > - stat->ino = CIFS_I(inode)->uniqueid; > > /* > * If on a multiuser mount without unix extensions, and the admin hasn't > @@ -1834,7 +1874,53 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry, > if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) > stat->gid = current_fsgid(); > } > - return rc; > + > + info = XSTAT_INFO_REMOTE | XSTAT_INFO_NONSYSTEM_OWNERSHIP; > +#ifdef CONFIG_CIFS_ACL > + info |= XSTAT_INFO_HAS_ACL; > +#endif > + > + if (cifs_i->btime_valid) { > + stat->btime = cifs_i->btime; > + stat->result_mask |= XSTAT_BTIME; > + } > + > + /* We don't promise an inode number if we made one up */ > + stat->ino = cifs_i->uniqueid; > + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) > + stat->result_mask &= ~XSTAT_INO; > + > + /* > + * If on a multiuser mount without unix extensions, and the admin > + * hasn't overridden them, set the ownership to the fsuid/fsgid of the > + * current process. 
> + */ > + if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) && > + !tcon->unix_ext) { > + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)) > + stat->uid = current_fsuid(); > + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) > + stat->gid = current_fsgid(); > + } > + if (cifs_i->uid_faked) > + stat->result_mask &= ~XSTAT_UID; > + if (cifs_i->gid_faked) > + stat->result_mask &= ~XSTAT_GID; > + > + attrs = cifs_i->cifsAttrs; > + if (attrs & ATTR_HIDDEN) info |= XSTAT_INFO_HIDDEN; > + if (attrs & ATTR_SYSTEM) info |= XSTAT_INFO_SYSTEM; > + if (attrs & ATTR_ARCHIVE) info |= XSTAT_INFO_ARCHIVE; > + if (attrs & ATTR_TEMPORARY) info |= XSTAT_INFO_TEMPORARY; > + if (attrs & ATTR_REPARSE) info |= XSTAT_INFO_REPARSE_POINT; > + if (attrs & ATTR_OFFLINE) info |= XSTAT_INFO_OFFLINE; > + if (attrs & ATTR_ENCRYPTED) info |= XSTAT_INFO_ENCRYPTED; > + stat->information |= info; > + > + if (attrs & ATTR_READONLY) ioc |= FS_IMMUTABLE_FL; > + if (attrs & ATTR_COMPRESSED) ioc |= FS_COMPR_FL; > + stat->ioc_flags |= ioc; > + return 0; > } > > static int cifs_truncate_page(struct address_space *mapping, loff_t from) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Thanks, Steve ^ permalink raw reply [flat|nested] 144+ messages in thread
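The questions in the reply above (was the inode number manufactured, is there a birth time, is the file offline) would, from userspace, come down to testing bits in the returned structure. The sketch below assumes a cut-down stand-in for struct xstat: the field name `st_result_mask`, the struct layout and all `SKETCH_` bit values are hypothetical, and only `st_information` and the XSTAT_INO / XSTAT_BTIME / XSTAT_INFO_OFFLINE / XSTAT_INFO_REPARSE_POINT names come from the patch text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; NOT the values from the patch. */
#define SKETCH_XSTAT_INO                0x0100u
#define SKETCH_XSTAT_BTIME              0x0800u
#define SKETCH_XSTAT_INFO_OFFLINE       0x0010u
#define SKETCH_XSTAT_INFO_REPARSE_POINT 0x0020u

/* Hypothetical, cut-down stand-in for the struct xstat filled in by xstat(). */
struct sketch_xstat {
	uint32_t st_result_mask; /* which fields the filesystem vouches for */
	uint32_t st_information; /* XSTAT_INFO_xxx style flags */
	uint64_t st_ino;
};

/* The CIFS patch clears XSTAT_INO when the client made the number up. */
static inline bool sketch_ino_from_server(const struct sketch_xstat *xs)
{
	return (xs->st_result_mask & SKETCH_XSTAT_INO) != 0;
}

/* Creation time is only meaningful if the filesystem reported it. */
static inline bool sketch_has_btime(const struct sketch_xstat *xs)
{
	return (xs->st_result_mask & SKETCH_XSTAT_BTIME) != 0;
}

/* HSM "offline" state, mapped from ATTR_OFFLINE in the CIFS patch. */
static inline bool sketch_is_offline(const struct sketch_xstat *xs)
{
	return (xs->st_information & SKETCH_XSTAT_INFO_OFFLINE) != 0;
}

/* Reparse point, mapped from ATTR_REPARSE in the CIFS patch. */
static inline bool sketch_is_reparse_point(const struct sketch_xstat *xs)
{
	return (xs->st_information & SKETCH_XSTAT_INFO_REPARSE_POINT) != 0;
}
```

A backup tool could, for instance, fall back to path-based identity whenever `sketch_ino_from_server()` is false rather than trusting `st_ino`.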
* Re: [PATCH 5/6] xstat: CIFS: Return extended attributes
@ 2012-04-19 15:19 ` Steve French
  0 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-19 15:19 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

For some of our users this would help A LOT.

Interesting ... just had discussions yesterday with some guys trying to
migrate to Linux and another set trying to back up Windows/NetApp from
Linux. Some things they brought up that they needed (beyond what we
already have with the cifs acl and SID and "dos attributes" xattrs, which
is even more useful now with the backup intent cifs mount flag) included:

- how do they tell if the inode number for a file was manufactured on the
  client, or whether we were able to use the server file's inode number
  ("UniqueId")
- how to get birth time (creation time)
- how to tell if file is "offline" (HSM)
- And is there a way to return the other less common cifs attributes
  (e.g. "reparse")

Dave's patch seems to address all of that. Samba server stuffs most of
this in an ndr encoded xattr blob which isn't much good for kernel code to
use, and I really prefer Dave's approach. Without this, I would need to
add another cifs specific ioctl, but since there is significant overlap
between some of these and ntfs, vfat, nfs etc. I like the xstat idea
better.

On Thu, Apr 19, 2012 at 9:07 AM, David Howells <dhowells@redhat.com> wrote:
> Return extended attributes from the CIFS filesystem. This includes the
> following:
>
> (1) Return the file creation time as btime. We assume that the creation time
> won't change over the life of the inode.
>
> (2) Set XSTAT_INFO_AUTOMOUNT on referral/submount directories.
>
> (3) Unset XSTAT_INO if we made up the inode number and didn't get it from the
> server.
>
>  (4) Unset XSTAT_[UG]ID if we are either returning values passed to mount
>      and/or the server doesn't return them.
>
>  (5) Map various Windows file attributes to FS_xxx_FL flags in st_ioc_flags
>      and XSTAT_INFO_xxx flags in st_information, fetching them from the server
>      if we don't have them yet or don't have a current copy.
>
>      Possibly things like Hidden, System and Archive should be FS_xxx_FL flags
>      rather than XSTAT_INFO_xxx flags and st_ioc_flags should be expanded to
>      64 bits.
>
>  (6) Set XSTAT_INFO_REMOTE on all files fetched by CIFS.
>
>  (7) Set XSTAT_INFO_NONSYSTEM_OWNERSHIP on all files as they all have Windows
>      ownership details too.
>
>  (8) Set XSTAT_INFO_HAS_ACL if CONFIG_CIFS_ACL=y as Windows ACLs are available
>      on the object.
>
> Furthermore, what cifs_getattr() does can be controlled as follows:
>
>  (1) If AT_FORCE_ATTR_SYNC is indicated, or if the inode flags or creation time
>      are requested but not yet collected, then the attributes will be reread
>      unconditionally.
>
>  (2) If the basic stats are requested or if the inode flags are requested and
>      have been collected previously, then the attributes will be reread if out
>      of date.
>
>  (3) Otherwise the cached attributes will be used - even if expired - without
>      reference to the server.
>
> Note that cifs_revalidate_dentry() will issue an extra operation to get the
> FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to collect
> creation time and attributes on behalf of cifs_getattr().
>
> [NOTE: THIS PATCH IS UNTESTED!]
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
>  fs/cifs/cifsfs.h   |    4 +-
>  fs/cifs/cifsglob.h |   16 +++++--
>  fs/cifs/dir.c      |    2 -
>  fs/cifs/inode.c    |  120 +++++++++++++++++++++++++++++++++++++++++++++-------
>  4 files changed, 118 insertions(+), 24 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index d1389bb..021e327 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -56,9 +56,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *);
>  extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
>  		       struct dentry *);
>  extern int cifs_revalidate_file_attr(struct file *filp);
> -extern int cifs_revalidate_dentry_attr(struct dentry *);
> +extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool);
>  extern int cifs_revalidate_file(struct file *filp);
> -extern int cifs_revalidate_dentry(struct dentry *);
> +extern int cifs_revalidate_dentry(struct dentry *, bool, bool);
>  extern int cifs_invalidate_mapping(struct inode *inode);
>  extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
>  extern int cifs_setattr(struct dentry *, struct iattr *);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 4ff6313..d3567da 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -621,11 +621,15 @@ struct cifsInodeInfo {
>  	/* BB add in lists for dirty pages i.e. write caching info for oplock */
>  	struct list_head openFileList;
>  	__u32 cifsAttrs; /* e.g. DOS archive bit, sparse, compressed, system */
> -	bool clientCanCacheRead;	/* read oplock */
> -	bool clientCanCacheAll;		/* read and writebehind oplock */
> -	bool delete_pending;		/* DELETE_ON_CLOSE is set */
> -	bool invalid_mapping;		/* pagecache is invalid */
> +	bool clientCanCacheRead:1;	/* read oplock */
> +	bool clientCanCacheAll:1;	/* read and writebehind oplock */
> +	bool delete_pending:1;		/* DELETE_ON_CLOSE is set */
> +	bool invalid_mapping:1;		/* pagecache is invalid */
> +	bool btime_valid:1;		/* stored creation time is valid */
> +	bool uid_faked:1;		/* true if i_uid is faked */
> +	bool gid_faked:1;		/* true if i_gid is faked */
>  	unsigned long time;		/* jiffies of last update of inode */
> +	struct timespec btime;		/* creation time */
>  	u64 server_eof;			/* current file size on server -- protected by i_lock */
>  	u64 uniqueid;			/* server inode number */
>  	u64 createtime;			/* creation time on server */
> @@ -833,6 +837,9 @@ struct dfs_info3_param {
>  #define CIFS_FATTR_DELETE_PENDING	0x2
>  #define CIFS_FATTR_NEED_REVAL		0x4
>  #define CIFS_FATTR_INO_COLLISION	0x8
> +#define CIFS_FATTR_WINATTRS_VALID	0x10	/* T if cf_btime and cf_cifsattrs valid */
> +#define CIFS_FATTR_UID_FAKED		0x20	/* T if cf_uid is faked */
> +#define CIFS_FATTR_GID_FAKED		0x40	/* T if cf_gid is faked */
>
>  struct cifs_fattr {
>  	u32 cf_flags;
> @@ -850,6 +857,7 @@ struct cifs_fattr {
>  	struct timespec cf_atime;
>  	struct timespec cf_mtime;
>  	struct timespec cf_ctime;
> +	struct timespec cf_btime;
>  };
>
>  static inline void free_dfs_info_param(struct dfs_info3_param *param)
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index d172c8e..d9e03ae 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -664,7 +664,7 @@ cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd)
>  		return -ECHILD;
>
>  	if (direntry->d_inode) {
> -		if (cifs_revalidate_dentry(direntry))
> +		if (cifs_revalidate_dentry(direntry, false, false))
>  			return 0;
>  		else {
>  			/*
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 745da3d..662d5ce 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -135,13 +135,21 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
>  	set_nlink(inode, fattr->cf_nlink);
>  	inode->i_uid = fattr->cf_uid;
>  	inode->i_gid = fattr->cf_gid;
> +	if (fattr->cf_flags & CIFS_FATTR_UID_FAKED)
> +		cifs_i->uid_faked = true;
> +	if (fattr->cf_flags & CIFS_FATTR_GID_FAKED)
> +		cifs_i->gid_faked = true;
>
>  	/* if dynperm is set, don't clobber existing mode */
>  	if (inode->i_state & I_NEW ||
>  	    !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM))
>  		inode->i_mode = fattr->cf_mode;
>
> -	cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +	if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) {
> +		cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +		cifs_i->btime = fattr->cf_btime;
> +		cifs_i->btime_valid = true;
> +	}
>
>  	if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL)
>  		cifs_i->time = 0;
> @@ -248,15 +256,19 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info,
>  		break;
>  	}
>
> -	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)
> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) {
>  		fattr->cf_uid = cifs_sb->mnt_uid;
> -	else
> +		fattr->cf_flags |= CIFS_FATTR_UID_FAKED;
> +	} else {
>  		fattr->cf_uid = le64_to_cpu(info->Uid);
> +	}
>
> -	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)
> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) {
>  		fattr->cf_gid = cifs_sb->mnt_gid;
> -	else
> +		fattr->cf_flags |= CIFS_FATTR_GID_FAKED;
> +	} else {
>  		fattr->cf_gid = le64_to_cpu(info->Gid);
> +	}
>
>  	fattr->cf_nlink = le64_to_cpu(info->Nlinks);
>  }
> @@ -283,7 +295,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
>  	fattr->cf_ctime = CURRENT_TIME;
>  	fattr->cf_mtime = CURRENT_TIME;
>  	fattr->cf_nlink = 2;
> -	fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
> +	fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL |
> +		CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED;
>  }
>
>  int cifs_get_file_info_unix(struct file *filp)
> @@ -510,6 +523,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>  	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>
>  	memset(fattr, 0, sizeof(*fattr));
> +	fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID;
>  	fattr->cf_cifsattrs = le32_to_cpu(info->Attributes);
>  	if (info->DeletePending)
>  		fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING;
> @@ -521,6 +535,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>
>  	fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
>  	fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
> +	fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime);
>
>  	if (adjust_tz) {
>  		fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj;
> @@ -1724,7 +1739,8 @@ int cifs_revalidate_file_attr(struct file *filp)
>  	return rc;
>  }
>
> -int cifs_revalidate_dentry_attr(struct dentry *dentry)
> +int cifs_revalidate_dentry_attr(struct dentry *dentry,
> +				bool want_extra_bits, bool force)
>  {
>  	int xid;
>  	int rc = 0;
> @@ -1735,7 +1751,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>  	if (inode == NULL)
>  		return -ENOENT;
>
> -	if (!cifs_inode_needs_reval(inode))
> +	if (!force && !cifs_inode_needs_reval(inode))
>  		return rc;
>
>  	xid = GetXid();
> @@ -1752,9 +1768,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>  		 "%ld jiffies %ld", full_path, inode, inode->i_count.counter,
>  		 dentry, dentry->d_time, jiffies);
>
> -	if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext)
> +	if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) {
>  		rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid);
> -	else
> +		if (rc != 0)
> +			goto out;
> +	}
> +	if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits)
>  		rc = cifs_get_inode_info(&inode, full_path, NULL, sb,
>  					 xid, NULL);
>
> @@ -1779,12 +1798,13 @@ int cifs_revalidate_file(struct file *filp)
>  }
>
>  /* revalidate a dentry's inode attributes */
> -int cifs_revalidate_dentry(struct dentry *dentry)
> +int cifs_revalidate_dentry(struct dentry *dentry,
> +			   bool want_extra_bits, bool force)
>  {
>  	int rc;
>  	struct inode *inode = dentry->d_inode;
>
> -	rc = cifs_revalidate_dentry_attr(dentry);
> +	rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force);
>  	if (rc)
>  		return rc;
>
> @@ -1796,11 +1816,30 @@ int cifs_revalidate_dentry(struct dentry *dentry)
>  int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		 struct kstat *stat)
>  {
> +	struct cifsInodeInfo *cifs_i = CIFS_I(dentry->d_inode);
>  	struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
>  	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>  	struct inode *inode = dentry->d_inode;
> +	unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
> +	bool want_extra_bits = false;
> +	u32 info, ioc = 0;
> +	u32 attrs;
>  	int rc;
>
> +	if (cifs_i->uid_faked)
> +		stat->request_mask &= ~XSTAT_UID;
> +	if (cifs_i->gid_faked)
> +		stat->request_mask &= ~XSTAT_GID;
> +
> +	if (stat->request_mask & XSTAT_BTIME && !cifs_i->btime_valid) {
> +		want_extra_bits = true;
> +		force = true;
> +	}
> +	if (stat->request_mask & XSTAT_IOC_FLAGS) {
> +		want_extra_bits = true;
> +		force = true;
> +	}
> +
>  	/*
>  	 * We need to be sure that all dirty pages are written and the server
>  	 * has actual ctime, mtime and file length.
> @@ -1814,13 +1853,14 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		}
>  	}
>
> -	rc = cifs_revalidate_dentry_attr(dentry);
> -	if (rc)
> -		return rc;
> +	if (force || stat->request_mask & XSTAT_BASIC_STATS) {
> +		rc = cifs_revalidate_dentry(dentry, want_extra_bits, force);
> +		if (rc)
> +			return rc;
> +	}
>
>  	generic_fillattr(inode, stat);
>  	stat->blksize = CIFS_MAX_MSGSIZE;
> -	stat->ino = CIFS_I(inode)->uniqueid;
>
>  	/*
>  	 * If on a multiuser mount without unix extensions, and the admin hasn't
> @@ -1834,7 +1874,53 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
>  			stat->gid = current_fsgid();
>  	}
> -	return rc;
> +
> +	info = XSTAT_INFO_REMOTE | XSTAT_INFO_NONSYSTEM_OWNERSHIP;
> +#ifdef CONFIG_CIFS_ACL
> +	info |= XSTAT_INFO_HAS_ACL;
> +#endif
> +
> +	if (cifs_i->btime_valid) {
> +		stat->btime = cifs_i->btime;
> +		stat->result_mask |= XSTAT_BTIME;
> +	}
> +
> +	/* We don't promise an inode number if we made one up */
> +	stat->ino = cifs_i->uniqueid;
> +	if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM))
> +		stat->result_mask &= ~XSTAT_INO;
> +
> +	/*
> +	 * If on a multiuser mount without unix extensions, and the admin
> +	 * hasn't overridden them, set the ownership to the fsuid/fsgid of the
> +	 * current process.
> +	 */
> +	if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) &&
> +	    !tcon->unix_ext) {
> +		if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID))
> +			stat->uid = current_fsuid();
> +		if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
> +			stat->gid = current_fsgid();
> +	}
> +	if (cifs_i->uid_faked)
> +		stat->result_mask &= ~XSTAT_UID;
> +	if (cifs_i->gid_faked)
> +		stat->result_mask &= ~XSTAT_GID;
> +
> +	attrs = cifs_i->cifsAttrs;
> +	if (attrs & ATTR_HIDDEN)	info |= XSTAT_INFO_HIDDEN;
> +	if (attrs & ATTR_SYSTEM)	info |= XSTAT_INFO_SYSTEM;
> +	if (attrs & ATTR_ARCHIVE)	info |= XSTAT_INFO_ARCHIVE;
> +	if (attrs & ATTR_TEMPORARY)	info |= XSTAT_INFO_TEMPORARY;
> +	if (attrs & ATTR_REPARSE)	info |= XSTAT_INFO_REPARSE_POINT;
> +	if (attrs & ATTR_OFFLINE)	info |= XSTAT_INFO_OFFLINE;
> +	if (attrs & ATTR_ENCRYPTED)	info |= XSTAT_INFO_ENCRYPTED;
> +	stat->information |= info;
> +
> +	if (attrs & ATTR_READONLY)	ioc |= FS_IMMUTABLE_FL;
> +	if (attrs & ATTR_COMPRESSED)	ioc |= FS_COMPR_FL;
> +	stat->ioc_flags |= ioc;
> +	return 0;
>  }
>
>  static int cifs_truncate_page(struct address_space *mapping, loff_t from)

--
Thanks,

Steve
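As a reading aid, the revalidation policy that the quoted changelog describes, and that the `force` / `want_extra_bits` logic in the diff implements, can be sketched as a pure user-space function. This is only a sketch and not code from the patch: the `SK_`-prefixed constants are placeholder values (the real `AT_FORCE_ATTR_SYNC` and `XSTAT_*` definitions come from patch 1 and are not shown in this thread), and `plan_getattr()`, `struct reval_plan` and `enum reval_action` are hypothetical names.

```c
#include <stdbool.h>

/* Placeholder values for illustration, NOT the definitions from patch 1. */
#define SK_AT_FORCE_ATTR_SYNC	0x2000u
#define SK_XSTAT_BASIC_STATS	0x00ffu
#define SK_XSTAT_BTIME		0x0100u
#define SK_XSTAT_IOC_FLAGS	0x0200u

enum reval_action {
	USE_CACHED,		/* answer from cached attributes, even if expired */
	REVALIDATE_IF_STALE,	/* reread only if the cached attributes timed out */
	FORCE_REREAD,		/* always go to the server */
};

struct reval_plan {
	enum reval_action action;
	bool want_extra_bits;	/* also collect creation time and attributes */
};

/* Mirrors the flag handling at the top of the patched cifs_getattr(). */
static struct reval_plan plan_getattr(unsigned query_flags,
				      unsigned request_mask,
				      bool btime_valid)
{
	struct reval_plan plan = { USE_CACHED, false };
	bool force = (query_flags & SK_AT_FORCE_ATTR_SYNC) != 0;

	/* Creation time requested but never collected: must ask the server. */
	if ((request_mask & SK_XSTAT_BTIME) && !btime_valid) {
		plan.want_extra_bits = true;
		force = true;
	}
	/* The posted diff forces a reread whenever inode flags are requested. */
	if (request_mask & SK_XSTAT_IOC_FLAGS) {
		plan.want_extra_bits = true;
		force = true;
	}

	if (force)
		plan.action = FORCE_REREAD;
	else if (request_mask & SK_XSTAT_BASIC_STATS)
		plan.action = REVALIDATE_IF_STALE;
	return plan;
}
```

The sketch follows the diff rather than the changelog: the diff sets `force` whenever `XSTAT_IOC_FLAGS` is requested, whereas case (2) of the changelog suggests that previously collected inode flags would only be reread when out of date.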
* Re: [PATCH 0/6] Extended file stat system call
From: Roland McGrath @ 2012-04-19 16:32 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

I have no comment on the functionality.  But "xstat" is probably a poor
choice of name.  There is precedent for that function name with different
meaning in the userland APIs.  (It's a moderately useless meaning inherited
from SVR4, but regardless overloading a name previously used is unwise.)

Thanks,
Roland
* Re: [PATCH 0/6] Extended file stat system call
From: Paul Eggert @ 2012-04-19 21:51 UTC
To: Roland McGrath
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 04/19/2012 09:32 AM, Roland McGrath wrote:
> I have no comment on the functionality.  But "xstat" is probably a poor
> choice of name.

In AIX 7.1 the (similar) function is called statxat instead of xstat.
The API isn't exactly the same, but it's the same basic idea.
Might be worth looking at, not merely to see whether the API
should be the same, but also to borrow good ideas even if not.

http://pic.dhe.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.basetechref/doc/basetrf2/statx.htm
* Re: [PATCH 0/6] Extended file stat system call
From: Roland McGrath @ 2012-04-19 23:05 UTC
To: Paul Eggert
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

statx seems like a better family of names.  I also think it's worthwhile to
see if the interface can be made to more closely match the AIX precedent.

Thanks,
Roland
* Re: [PATCH 0/6] Extended file stat system call
From: David Howells @ 2012-04-26 14:16 UTC
To: Roland McGrath
Cc: dhowells, Paul Eggert, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Roland McGrath <roland@hack.frob.com> wrote:

> statx seems like a better family of names.  I also think it's worthwhile to
> see if the interface can be made to more closely match the AIX precedent.

I'm not sure we can make Linux xstat (or whatever) match AIX statxat() very
closely, at least from a syscall interface point of view.  We really need the
AT_* flags mask to be compatible with the other Linux syscalls and the length
parameter that I originally had got argued out of existence.

I would also like to make it so that there aren't different 32-bit and 64-bit
interfaces to the kernel - though that means a burden on glibc/uClibc/etc., I
guess.

I do also want to be able to pass a mask of what we're actually interested in -
but that's not part of the AIX interface.

I also like the idea of having a larger buffer with the ability to return extra
info (such as security label xattrs), but I was told it was a poor idea, so
that got taken out.

David
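To illustrate the wish for a single kernel interface on 32-bit and 64-bit systems, here is a minimal sketch of a timestamp record built only from fixed-width fields. The layout is an assumption made for illustration and is not taken from the posted patches; `struct sketch_xtime` and `sketch_from_timespec()` are hypothetical names. The contrast is with `struct timespec`, whose field widths follow the platform ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fixed-width timestamp; not the layout used by the patches. */
struct sketch_xtime {
	int64_t  sec;		/* seconds, 64-bit on every architecture */
	uint32_t nsec;		/* nanoseconds */
	uint32_t reserved;	/* explicit padding so the size never varies */
};

/* These hold regardless of whether the compiler targets a 32- or 64-bit ABI. */
_Static_assert(sizeof(struct sketch_xtime) == 16, "size must not depend on ABI");
_Static_assert(offsetof(struct sketch_xtime, nsec) == 8, "no hidden padding");

/* Convert the ABI-dependent struct timespec into the fixed-width form. */
static struct sketch_xtime sketch_from_timespec(struct timespec ts)
{
	struct sketch_xtime x;

	x.sec = (int64_t)ts.tv_sec;
	x.nsec = (uint32_t)ts.tv_nsec;
	x.reserved = 0;
	return x;
}
```

With a record like this the C library would convert to and from its own `struct timespec`, which is the "burden on glibc/uClibc/etc." mentioned above.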
* Re: [PATCH 0/6] Extended file stat system call
From: Roland McGrath @ 2012-04-26 18:22 UTC
To: David Howells
Cc: Paul Eggert, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

> Roland McGrath <roland@hack.frob.com> wrote:
>
> > statx seems like a better family of names.  I also think it's worthwhile to
> > see if the interface can be made to more closely match the AIX precedent.
>
> I'm not sure we can make Linux xstat (or whatever) match AIX statxat() very
> closely, at least from a syscall interface point of view.

OK.  It was just worth a look.

Thanks,
Roland
* Re: [PATCH 0/6] Extended file stat system call
From: David Howells @ 2012-04-26 14:04 UTC
To: Paul Eggert
Cc: dhowells, Roland McGrath, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 04/19/2012 09:32 AM, Roland McGrath wrote:
> > I have no comment on the functionality.  But "xstat" is probably a poor
> > choice of name.
>
> In AIX 7.1 the (similar) function is called statxat instead of xstat.
> The API isn't exactly the same, but it's the same basic idea.
> Might be worth looking at, not merely to see whether the API
> should be the same, but also to borrow good ideas even if not.
>
> http://pic.dhe.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.basetechref/doc/basetrf2/statx.htm

Interesting.  I wasn't intending to provide both statx() and statxat()
variants, just the latter, in which case I'd've thought that the -at suffix is
redundant.

I note that they split their time fields into separate seconds and ns fields,
presumably for better packing.

David
* Re: [PATCH 0/6] Extended file stat system call
From: Roland McGrath @ 2012-04-26 18:24 UTC
To: David Howells
Cc: Paul Eggert, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

> Interesting.  I wasn't intending to provide both statx() and statxat()
> variants, just the latter, in which case I'd've thought that the -at suffix is
> redundant.

It's certainly fine to provide only *at flavors for any new syscall, IMHO.
The * case is always just a simple degenerate case of *at, and libc can
trivially provide the simpler user API as well using the *at syscall.
But please keep the uniformity that everything taking a descriptor and
AT_* flags is named *at.

Thanks,
Roland
* Re: [PATCH 0/6] Extended file stat system call
From: Andreas Dilger @ 2012-04-19 23:29 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 2012-04-19, at 8:05 AM, David Howells wrote:
> Implement a pair of new system calls to provide extended and further
> extensible stat functions.

Hallelujah for this.  I've been waiting/wanting something like this
for ages already.  Now if only we can get this landed before it
degrades into the mess it did last time.

> Should fxstat() be implemented as xstat() with a NULL filename,
> using dfd as fd?

I'm personally inclined toward fewer syscalls, especially since the
fstatxat()->statxat() mapping (if I can be so bold as to prefer the
names used later in this thread) is IMHO clear and unambiguous, and
avoids several thin wrappers in the kernel.

> Should the default for a network fs be to do an unconditional
> (heavyweight) stat with a flag to suppress going to the server
> to update the locally held attributes and flushing pending writebacks?

NOOOooo!  If application writers are going to use this, they should
request the information needed, and no more.  Make no assumptions
about what information is easy or hard for a filesystem to return,
since the overhead can vary wildly depending on the implementation.

Something like "ls --color" (no -l or -s) always stats the file just
to get the mode bits to color executable files differently.  Having
to return other information that isn't totally free almost ruins the
benefit of adding a new syscall in the first place.

> Should things like the Windows Archive, Hidden and System bits be
> handled through IOC flags, perhaps expanded to 64-bits?

I'm definitely in favour of a 64-bit IOC flags value, since they are
getting close to running out already.  As to whether those other bits
should be merged into the IOC flags, I'm mostly indifferent, but I
lean toward including them since they are definitely related.

I wouldn't object to 64-bit UID/GID values or split 32-bit low/hi
UID and GID values, since NFSv4 and Kerberos realms will likely need
this at some point as well.  That said, if the API is extensible, it
would be just as easy to add the low/hi split values when they are
needed in the future.

> Are these things useful to userspace other than Samba and userspace
> NFS servers?

Definitely yes.  The GNU fileutils can use a lot of this, since they
are VERY stat() heavy for things like checking st_dev and st_ino
changes during directory traversal, but don't need any of the other
info.

> Is it useful to pass the volume ID out?  Or is statfs() sufficient
> for this?

Can't hurt, IMHO.  It is a better (more persistent) identifier than
st_dev, and if it is free, or explicitly requested by the application
(Samba, Ganesha, etc) it can be very useful.

Cheers, Andreas
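The "request the information needed, and no more" advice can be made concrete with a small sketch of how an `ls`-like tool might build its request mask from its options. Everything named here is hypothetical: the `SK_XSTAT_*` bits are placeholder values (the real request-mask definitions live in patch 1, which is not shown in this thread), and `struct ls_options` and `ls_request_mask()` are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder request-mask bits, NOT the values defined by the patch set. */
#define SK_XSTAT_MODE	0x01u
#define SK_XSTAT_NLINK	0x02u
#define SK_XSTAT_UID	0x04u
#define SK_XSTAT_GID	0x08u
#define SK_XSTAT_MTIME	0x10u
#define SK_XSTAT_SIZE	0x20u
#define SK_XSTAT_BLOCKS	0x40u

/* Hypothetical subset of ls options that influence which fields are needed. */
struct ls_options {
	bool color;		/* --color: needs only the mode bits */
	bool long_format;	/* -l: needs ownership, size, times, link count */
	bool show_blocks;	/* -s: needs the allocated block count */
};

/* Build the smallest mask that still satisfies the selected options. */
static uint32_t ls_request_mask(const struct ls_options *opt)
{
	uint32_t mask = 0;

	if (opt->color)
		mask |= SK_XSTAT_MODE;
	if (opt->long_format)
		mask |= SK_XSTAT_MODE | SK_XSTAT_NLINK | SK_XSTAT_UID |
			SK_XSTAT_GID | SK_XSTAT_SIZE | SK_XSTAT_MTIME;
	if (opt->show_blocks)
		mask |= SK_XSTAT_BLOCKS;
	return mask;
}
```

For the "ls --color" case described above the mask contains only the mode bit, which would leave a network filesystem free to answer from cached attributes.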
* Re: [PATCH 0/6] Extended file stat system call
From: David Howells @ 2012-04-26 13:54 UTC
To: Roland McGrath
Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Roland McGrath <roland@hack.frob.com> wrote:

> I have no comment on the functionality.  But "xstat" is probably a poor
> choice of name.  There is precedent for that function name with different
> meaning in the userland APIs.  (It's a moderately useless meaning inherited
> from SVR4, but regardless overloading a name previously used is unwise.)

I've no particular attachment to the name 'xstat'.  If you'd prefer 'statx' I
could go for that.

David
* Re: [PATCH 0/6] Extended file stat system call
From: Roland McGrath @ 2012-04-26 18:25 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

> I've no particular attachment to the name 'xstat'.  If you'd prefer 'statx' I
> could go for that.

I prefer something other than xstat and statx(at) seems acceptable enough.
What I'd really prefer is a name that is less meaninglessly arcane,
but I haven't thought of any good ones.

Thanks,
Roland
* Re: [PATCH 0/6] Extended file stat system call
From: Paul Eggert @ 2012-04-27 23:54 UTC
To: Roland McGrath
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 04/26/2012 11:25 AM, Roland McGrath wrote:
> What I'd really prefer is a name that is less meaninglessly arcane,

Since people are used to the string "stat", and since this
is about getting related attributes, how about using the
string "statr"?  Thus "statrat" could be the syscall that acts like
fstatat, but gets related attributes too.

A bonus of this name is that a "stat rat" is slang for someone
who loves looking at statistics, often sports statistics.
Billy Beane is a stat rat.  Someone who cares about files'
statistics is also a stat rat.

The "r" in "statrat" could stand for "related", or for "relevant",
or for whatever you like.
* Re: [PATCH 0/6] Extended file stat system call @ 2012-04-26 21:54 ` David Howells 0 siblings, 0 replies; 144+ messages in thread From: David Howells @ 2012-04-26 21:54 UTC (permalink / raw) To: Roland McGrath Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha Roland McGrath <roland@hack.frob.com> wrote: > > I've no particular attachment to the name 'xstat'. If you'd prefer > > 'statx' I could go for that. > > I prefer something other than xstat and statx(at) seems acceptable enough. > What I'd really prefer is a name that is less meaninglessly arcane, > but I haven't thought of any good ones. fileinfoat() perhaps? I think stat*at() is better, though, as people are used to the stat function family. David ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call @ 2012-04-26 22:02 ` Roland McGrath 0 siblings, 0 replies; 144+ messages in thread From: Roland McGrath @ 2012-04-26 22:02 UTC (permalink / raw) To: David Howells Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha > fileinfoat() perhaps? I think stat*at() is better, though, as people are > used to the stat function family. Names like that were all I had thought of when I said I hadn't thought of any good ones. ;-) ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-26 22:02 ` Roland McGrath (?) @ 2012-04-26 22:21 ` Nix -1 siblings, 0 replies; 144+ messages in thread From: Nix @ 2012-04-26 22:21 UTC (permalink / raw) To: Roland McGrath Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On 26 Apr 2012, Roland McGrath verbalised: >> fileinfoat() perhaps? I think stat*at() is better, though, as people are >> used to the stat function family. > > Names like that were all I had thought of when I said I hadn't thought of > any good ones. ;-) Quite. It's a garden-path function name. "What's a foat and why would I want to put a file in one?" -- NULL && (void) ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-19 14:05 ` David Howells ` (6 preceding siblings ...) (?) @ 2012-04-26 14:25 ` David Howells 2012-04-26 14:54 ` Steve French 2012-04-26 15:52 ` David Howells -1 siblings, 2 replies; 144+ messages in thread From: David Howells @ 2012-04-26 14:25 UTC (permalink / raw) To: Steve French Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha Steve French <smfrench@gmail.com> wrote: > Would it be better to make the stable vs volatile inode number an attribute > of the volume or something returned by the proposed xstat? I'm not sure what you mean by a stable vs a volatile inode number. > > Should things like the Windows Archive, Hidden and System bits be handled > > through IOC flags, perhaps expanded to 64-bits? > > Today I export these through an psuedo-xattr in cifs.ko, I am curious how > NTFS and FAT export these on linux. NTFS: Not at all. FAT: The hidden bit causes the filename to get a dot prepended (and nothing else is noted). > > Autofs, ntfs, btrfs, ... > > Given the overlap in optional attributes between the network > protocol and local NTFS (and ReFS and to a lesser extent FAT) > I would expect cifs.ko and the ntfs implementations > info to map pretty closely. Yep. I wasn't going to do more filesystems till we'd finished arguing about the basic arrangement of things in struct xstat. > > Handle remote filesystems being offline and indicate this with > > XSTAT_INFO_OFFLINE. > > You already have support for an indicator for offline files (HSM), HSM? > would XSTAT_INFO_OFFLINE be intended for the case > where the network session to the server is disconnected > (and in which you case the application does not want to reconnect)? Hmmm... Interesting question. 
Both NTFS and CIFS have an offline attribute (which is where I originally got this from) - but should I have a separate indicator to indicate the client can't access a server over a network (ie. we've gone to disconnected operation on this file)? E.g. should there be a XSTAT_INFO_DISCONNECTED too? David ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-26 14:25 ` David Howells @ 2012-04-26 14:54 ` Steve French [not found] ` <CAH2r5mv1Lijdwk5zsQwYJr4Etb6fhrRyNXm-iFCQX+HecboGrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2012-04-26 15:52 ` David Howells 1 sibling, 1 reply; 144+ messages in thread From: Steve French @ 2012-04-26 14:54 UTC (permalink / raw) To: David Howells Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote: > Steve French <smfrench@gmail.com> wrote: > >> Would it be better to make the stable vs volatile inode number an attribute >> of the volume or something returned by the proposed xstat? > > I'm not sure what you mean by a stable vs a volatile inode number. Both NFS and CIFS (and SMB2) can return inode numbers or an equivalent unique identifier, but in the case of CIFS some old servers don't support the calls which return inode numbers (or don't return them for all file system types, Windows FAT?), so in these cases cifs has to create inode numbers on the fly on the client. Inode numbers created on the client are not "stable": they can change on unmount/remount (which can cause problems for backup applications). Similarly, NFSv4 does not require that servers always return stable inode numbers (that will never change) and introduced a concept of "volatile file handle." We have run into this in two cases (there are probably more): specialized NFS servers for HPC which deal with lots of transient inodes, and servers which base their inode number on path (Windows NFS?). See http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html or the NFSv4 RFC. Basically the question is whether it is worth reporting a flag on the call which returns the inode number to indicate that the inode number is "stable" (would not change on reboot or reconnection) or "volatile."
Since the majority of NFS and SMB2 servers can return stable inode numbers, I don't feel strongly about the need for an indicator of "stable" vs. "volatile", but I mention it because backup and migration applications mention this (if inode numbers are volatile, they may have to check for hardlinks differently, for example). >> > Should things like the Windows Archive, Hidden and System bits be handled >> > through IOC flags, perhaps expanded to 64-bits? >> >> Today I export these through an psuedo-xattr in cifs.ko, I am curious how >> NTFS and FAT export these on linux. > > NTFS: Not at all. > > FAT: The hidden bit causes the filename to get a dot prepended (and nothing > else is noted). > >> > Autofs, ntfs, btrfs, ... >> >> Given the overlap in optional attributes between the network >> protocol and local NTFS (and ReFS and to a lesser extent FAT) >> I would expect cifs.ko and the ntfs implementations >> info to map pretty closely. > > Yep. I wasn't going to do more filesystems till we'd finished arguing about > the basic arrangement of things in struct xstat. Makes sense. >> > Handle remote filesystems being offline and indicate this with >> > XSTAT_INFO_OFFLINE. >> >> You already have support for an indicator for offline files (HSM), > > HSM? HSM (hierarchical storage management) is the more general case of two-tiered data (disk vs. tape), see en.wikipedia.org/wiki/Hierarchical_storage_management. In the simplest case a file is either on disk (fast) or has been moved to tape (slow to retrieve). >> would XSTAT_INFO_OFFLINE be intended for the case >> where the network session to the server is disconnected >> (and in which you case the application does not want to reconnect)? > > Hmmm... Interesting question. Both NTFS and CIFS have an offline attribute > (which is where I originally got this from) - but should I have a separate > indicator to indicate the client can't access a server over a network > (ie. we've gone to disconnected operation on this file)? E.g. should there be > a XSTAT_INFO_DISCONNECTED too?
my reaction is no, since it adds complexity. If you do a stat on a disconnected volume (where the network is temporarily down) reconnection will be attempted. If reconnection fails then the xstat will either fail or be retried forever depending on the value of "hard" vs. "soft" mount flag. -- Thanks, Steve ^ permalink raw reply [flat|nested] 144+ messages in thread
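To make the hard-link concern in the message above concrete, here is a minimal sketch in C of how a backup tool commonly decides that two paths name the same file: it compares the (st_dev, st_ino) pair returned by lstat(). This is an illustration built on the existing POSIX interface, not code from the xstat patches, and the helper names same_file() and paths_are_hardlinked() are made up for this example. The comparison is only as trustworthy as the inode numbers themselves, which is why a client-generated ("volatile") number is a problem for such tools.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declaration of lstat() */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: two stat results refer to the same file when both
 * the device and the inode number match.  This assumes the filesystem
 * reports stable, unique inode numbers.
 */
static bool same_file(const struct stat *a, const struct stat *b)
{
    assert(a != NULL && b != NULL);
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical helper: returns 1 if both paths name the same file,
 * 0 if they name different files, and -1 if either lstat() call fails.
 * lstat() is used so that a symlink is compared as itself, not its target.
 */
static int paths_are_hardlinked(const char *p1, const char *p2)
{
    struct stat s1, s2;

    if (lstat(p1, &s1) != 0 || lstat(p2, &s2) != 0)
        return -1;
    return same_file(&s1, &s2) ? 1 : 0;
}
```

If the inode numbers were invented by the client and can change across an unmount/remount, a table of (st_dev, st_ino) pairs saved by an earlier backup run can no longer be matched against the current one, so a tool in that situation would need some other identity check.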
* Re: [PATCH 0/6] Extended file stat system call 2012-04-26 14:54 ` Steve French @ 2012-04-26 15:25 ` Myklebust, Trond 0 siblings, 0 replies; 144+ messages in thread From: Myklebust, Trond @ 2012-04-26 15:25 UTC (permalink / raw) To: Steve French Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote: > On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote: > > Steve French <smfrench@gmail.com> wrote: > > > >> Would it be better to make the stable vs volatile inode number an attribute > >> of the volume or something returned by the proposed xstat? > > > > I'm not sure what you mean by a stable vs a volatile inode number. > > Both NFS and CIFS (and SMB2) can return inode numbers or equivalent > unique identifier, but in the case of CIFS some old servers don't support the > calls which return inode numbers (or don't return them for all file system > types, Windows FAT?) so in these cases cifs has to create inode > numbers on the fly > on the client. inode numbers created on the client are not "stable" they > can change on unmount/remount (which can cause problems for backup > applications). > > Similarly NFSv4 does not require that servers always return stable inode numbers > (that will never change) and introduced a concept of "volatile file handle." > We have run into this in two cases (there are probably more) - > Specialized NFS servers > for HPC which deal with lots of transient inodes, and second those for servers > which base there inode number on path (Windows NFS?). See > http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html > or the NFSv4 RFC.
> > Basically the question is whether it is worth reporting a flag on the > call which returns > the inode number to indicate that the inode number is "stable" (would not change > on reboot or reconnection) or "volatile." Since the majority of NFS > and SMB2 servers > can return stable inode numbers, I don't feel strongly about the need > for an indicator > of "stable" vs. "volatile" but I mention it because backup and > migration applications > mention this (if inode numbers are volatile, they may have to check > for hardlinks differently > for example) I don't understand. If the filesystem doesn't support real inode numbers, then why report them at all? What use would an application have for an inode number that can't be used to identify hard linked files? Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call @ 2012-04-26 16:56 ` Steve French 0 siblings, 0 replies; 144+ messages in thread From: Steve French @ 2012-04-26 16:56 UTC (permalink / raw) To: Myklebust, Trond Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote: >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote: >> > Steve French <smfrench@gmail.com> wrote: >> > >> >> Would it be better to make the stable vs volatile inode number an attribute >> >> of the volume or something returned by the proposed xstat? >> > >> > I'm not sure what you mean by a stable vs a volatile inode number. >> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent >> unique identifier, but in the case of CIFS some old servers don't support the >> calls which return inode numbers (or don't return them for all file system >> types, Windows FAT?) so in these cases cifs has to create inode >> numbers on the fly >> on the client. inode numbers created on the client are not "stable" they >> can change on unmount/remount (which can cause problems for backup >> applications). >> >> Similarly NFSv4 does not require that servers always return stable inode numbers >> (that will never change) and introduced a concept of "volatile file handle." >> We have run into this in two cases (there are probably more) - >> Specialized NFS servers >> for HPC which deal with lots of transient inodes, and second those for servers >> which base there inode number on path (Windows NFS?). See >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html >> or the NFSv4 RFC. 
>> >> Basically the question is whether it is worth reporting a flag on the >> call which returns >> the inode number to indicate that the inode number is "stable" (would not change >> on reboot or reconnection) or "volatile." Since the majority of NFS >> and SMB2 servers >> can return stable inode numbers, I don't feel strongly about the need >> for an indicator >> of "stable" vs. "volatile" but I mention it because backup and >> migration applications >> mention this (if inode numbers are volatile, they may have to check >> for hardlinks differently >> for example) > > I don't understand. If the filesystem doesn't support real inode > numbers, then why report them at all? What use would an application have > for an inode number that can't be used to identify hard linked files? Well ... you have to have an inode number on the Linux client side even if the server doesn't report them (or has a bug and reports duplicates). If you can't tell hardlinked files apart, fix the server (but in the cases where the file system has this problem the server doesn't usually support hardlinks either). If the server's file system internal structures don't support real inode numbers (such as FAT or a ramdisk) then it has to make them up based on either something like the path name or some other attribute of the file on disk. Servers like NetApp are where this gets interesting: for cifs, e.g., level 1009 query file info is used to query_file_internal_info (the inode number), but what if the server cannot report inode numbers (due to a bug) in all cases? -- Thanks, Steve ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-26 16:56 ` Steve French @ 2012-04-26 17:00 ` Myklebust, Trond -1 siblings, 0 replies; 144+ messages in thread From: Myklebust, Trond @ 2012-04-26 17:00 UTC (permalink / raw) To: Steve French Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Thu, 2012-04-26 at 11:56 -0500, Steve French wrote: > On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote: > >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote: > >> > Steve French <smfrench@gmail.com> wrote: > >> > > >> >> Would it be better to make the stable vs volatile inode number an attribute > >> >> of the volume or something returned by the proposed xstat? > >> > > >> > I'm not sure what you mean by a stable vs a volatile inode number. > >> > >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent > >> unique identifier, but in the case of CIFS some old servers don't support the > >> calls which return inode numbers (or don't return them for all file system > >> types, Windows FAT?) so in these cases cifs has to create inode > >> numbers on the fly > >> on the client. inode numbers created on the client are not "stable" they > >> can change on unmount/remount (which can cause problems for backup > >> applications).
> >> > >> Similarly NFSv4 does not require that servers always return stable inode numbers > >> (that will never change) and introduced a concept of "volatile file handle." > >> We have run into this in two cases (there are probably more) - > >> Specialized NFS servers > >> for HPC which deal with lots of transient inodes, and second those for servers > >> which base there inode number on path (Windows NFS?). See > >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html > >> or the NFSv4 RFC. > >> > >> Basically the question is whether it is worth reporting a flag on the > >> call which returns > >> the inode number to indicate that the inode number is "stable" (would not change > >> on reboot or reconnection) or "volatile." Since the majority of NFS > >> and SMB2 servers > >> can return stable inode numbers, I don't feel strongly about the need > >> for an indicator > >> of "stable" vs. "volatile" but I mention it because backup and > >> migration applications > >> mention this (if inode numbers are volatile, they may have to check > >> for hardlinks differently > >> for example) > > > > I don't understand. If the filesystem doesn't support real inode > > numbers, then why report them at all? What use would an application have > > for an inode number that can't be used to identify hard linked files? > > Well ... you have to have an inode number on the Linux client side even if > the server doesn't report them (or has a bug and reports duplicates). > If you can't tell hardlinked files apart fix the server (but in the > cases where the file systems has this problem the server doesn't usually > support hardlinks either). > > If the server's file system internal structures don't support real inode > numbers (such as FAT or a ramdisk) then it either has to make them > up based on something like path name or some other attribute of the > file on disk. > > Servers like NetApp is where this gets interesting - for cifs e.g. 
level 1009 > query file info is used to query_file_internal_info (the inode number) but > what if the server can not report inode numbers (due to a bug) in > all cases. Right, but none of this explains why we need to report these bogus inode numbers to the application in the xstat() reply. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 17:00 ` Myklebust, Trond
@ 2012-04-26 17:03 ` Steve French
From: Steve French @ 2012-04-26 17:03 UTC
To: Myklebust, Trond
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
    linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 12:00 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Thu, 2012-04-26 at 11:56 -0500, Steve French wrote:
>> On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond
>> <Trond.Myklebust@netapp.com> wrote:
>> > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote:
>> >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote:
>> >> > Steve French <smfrench@gmail.com> wrote:
>> >> >
>> >> >> Would it be better to make the stable vs volatile inode number an attribute
>> >> >> of the volume or something returned by the proposed xstat?
>> >> >
>> >> > I'm not sure what you mean by a stable vs a volatile inode number.
>> >>
>> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent
>> >> unique identifier, but in the case of CIFS some old servers don't support the
>> >> calls which return inode numbers (or don't return them for all file system
>> >> types, Windows FAT?) so in these cases cifs has to create inode
>> >> numbers on the fly
>> >> on the client. inode numbers created on the client are not "stable" they
>> >> can change on unmount/remount (which can cause problems for backup
>> >> applications).
>> >>
>> >> Similarly NFSv4 does not require that servers always return stable inode numbers
>> >> (that will never change) and introduced a concept of "volatile file handle."
>> >> We have run into this in two cases (there are probably more) -
>> >> Specialized NFS servers
>> >> for HPC which deal with lots of transient inodes, and second those for servers
>> >> which base there inode number on path (Windows NFS?). See
>> >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html
>> >> or the NFSv4 RFC.
>> >>
>> >> Basically the question is whether it is worth reporting a flag on the
>> >> call which returns
>> >> the inode number to indicate that the inode number is "stable" (would not change
>> >> on reboot or reconnection) or "volatile." Since the majority of NFS
>> >> and SMB2 servers
>> >> can return stable inode numbers, I don't feel strongly about the need
>> >> for an indicator
>> >> of "stable" vs. "volatile" but I mention it because backup and
>> >> migration applications
>> >> mention this (if inode numbers are volatile, they may have to check
>> >> for hardlinks differently
>> >> for example)
>> >
>> > I don't understand. If the filesystem doesn't support real inode
>> > numbers, then why report them at all? What use would an application have
>> > for an inode number that can't be used to identify hard linked files?
>>
>> Well ... you have to have an inode number on the Linux client side even if
>> the server doesn't report them (or has a bug and reports duplicates).
>> If you can't tell hardlinked files apart fix the server (but in the
>> cases where the file systems has this problem the server doesn't usually
>> support hardlinks either).
>>
>> If the server's file system internal structures don't support real inode
>> numbers (such as FAT or a ramdisk) then it either has to make them
>> up based on something like path name or some other attribute of the
>> file on disk.
>>
>> Servers like NetApp is where this gets interesting - for cifs e.g. level 1009
>> query file info is used to query_file_internal_info (the inode number) but
>> what if the server can not report inode numbers (due to a bug) in
>> all cases.
>
> Right, but none of this explains why we need to report these bogus inode
> numbers to the application in the xstat() reply.

The question is whether the application (backup) would need to know
that the inode numbers are bogus, and from my conversations with
people writing backup software it seems that such data is useful to them.

-- 
Thanks,

Steve
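For context on why backup software cares about this: hard links are conventionally detected by comparing the (st_dev, st_ino) pair that POSIX stat() returns for two paths. The sketch below shows that comparison. The helper name same_file() is made up for illustration, and the result is only meaningful when the filesystem reports real, unique inode numbers, which is exactly the property under discussion here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the patch set): returns true when both
 * paths resolve to the same inode on the same device, which is how
 * tools usually recognise hard links. Returns false if either path
 * cannot be stat'ed. If the client fabricates inode numbers, a match
 * here proves nothing about the server-side file identity.
 */
static bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A backup tool that keeps a table keyed on this pair across runs would be affected if the numbers change on remount, which may be why such tools want to know whether the numbers are stable.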
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 17:03 ` Steve French
@ 2012-04-26 17:06 ` Myklebust, Trond
From: Myklebust, Trond @ 2012-04-26 17:06 UTC
To: Steve French
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
    linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, 2012-04-26 at 12:03 -0500, Steve French wrote:
> On Thu, Apr 26, 2012 at 12:00 PM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
> > On Thu, 2012-04-26 at 11:56 -0500, Steve French wrote:
> >> On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond
> >> <Trond.Myklebust@netapp.com> wrote:
> >> > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote:
> >> >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote:
> >> >> > Steve French <smfrench@gmail.com> wrote:
> >> >> >
> >> >> >> Would it be better to make the stable vs volatile inode number an attribute
> >> >> >> of the volume or something returned by the proposed xstat?
> >> >> >
> >> >> > I'm not sure what you mean by a stable vs a volatile inode number.
> >> >>
> >> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent
> >> >> unique identifier, but in the case of CIFS some old servers don't support the
> >> >> calls which return inode numbers (or don't return them for all file system
> >> >> types, Windows FAT?) so in these cases cifs has to create inode
> >> >> numbers on the fly
> >> >> on the client. inode numbers created on the client are not "stable" they
> >> >> can change on unmount/remount (which can cause problems for backup
> >> >> applications).
> >> >>
> >> >> Similarly NFSv4 does not require that servers always return stable inode numbers
> >> >> (that will never change) and introduced a concept of "volatile file handle."
> >> >> We have run into this in two cases (there are probably more) -
> >> >> Specialized NFS servers
> >> >> for HPC which deal with lots of transient inodes, and second those for servers
> >> >> which base there inode number on path (Windows NFS?). See
> >> >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html
> >> >> or the NFSv4 RFC.
> >> >>
> >> >> Basically the question is whether it is worth reporting a flag on the
> >> >> call which returns
> >> >> the inode number to indicate that the inode number is "stable" (would not change
> >> >> on reboot or reconnection) or "volatile." Since the majority of NFS
> >> >> and SMB2 servers
> >> >> can return stable inode numbers, I don't feel strongly about the need
> >> >> for an indicator
> >> >> of "stable" vs. "volatile" but I mention it because backup and
> >> >> migration applications
> >> >> mention this (if inode numbers are volatile, they may have to check
> >> >> for hardlinks differently
> >> >> for example)
> >> >
> >> > I don't understand. If the filesystem doesn't support real inode
> >> > numbers, then why report them at all? What use would an application have
> >> > for an inode number that can't be used to identify hard linked files?
> >>
> >> Well ... you have to have an inode number on the Linux client side even if
> >> the server doesn't report them (or has a bug and reports duplicates).
> >> If you can't tell hardlinked files apart fix the server (but in the
> >> cases where the file systems has this problem the server doesn't usually
> >> support hardlinks either).
> >>
> >> If the server's file system internal structures don't support real inode
> >> numbers (such as FAT or a ramdisk) then it either has to make them
> >> up based on something like path name or some other attribute of the
> >> file on disk.
> >>
> >> Servers like NetApp is where this gets interesting - for cifs e.g. level 1009
> >> query file info is used to query_file_internal_info (the inode number) but
> >> what if the server can not report inode numbers (due to a bug) in
> >> all cases.
> >
> > Right, but none of this explains why we need to report these bogus inode
> > numbers to the application in the xstat() reply.
>
> the question is whether the application (backup) would need to know
> that the inode numbers are bogus and from my conversations with
> guys writing backup software it seems that such data is useful to them.

You still have not explained why they need to know the values at all. If
the values are bogus, then don't return them, and don't set the flag
that says they are being returned.

Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
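The rule proposed here (no real inode number, no "inode returned" flag) can be sketched as follows. This is only an illustration under stated assumptions: struct demo_xstat, its fields, and the DEMO_XSTAT_INO bit are invented stand-ins, not the layout or constants from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "inode number is valid" bit; not the patch's constant. */
#define DEMO_XSTAT_INO 0x1u

/* Hypothetical, cut-down stand-in for the extended stat result. */
struct demo_xstat {
    uint32_t result_mask;   /* which fields the filesystem filled in */
    uint64_t ino;           /* only meaningful if DEMO_XSTAT_INO is set */
};

/* Filesystem side: report the inode number only if the server gave a real one. */
static void demo_fill_ino(struct demo_xstat *r, bool server_has_ino,
                          uint64_t server_ino)
{
    assert(r != NULL);
    if (server_has_ino) {
        r->ino = server_ino;
        r->result_mask |= DEMO_XSTAT_INO;
    } else {
        r->ino = 0;                         /* do not leak a made-up value */
        r->result_mask &= ~DEMO_XSTAT_INO;  /* and do not claim it is valid */
    }
}

/* Application side: trust the inode number only when the flag is set. */
static bool demo_ino_usable(const struct demo_xstat *r)
{
    assert(r != NULL);
    return (r->result_mask & DEMO_XSTAT_INO) != 0;
}
```

With this shape no separate "volatile" indicator is needed: an application that finds the bit clear knows it cannot rely on the inode number for identifying hard links.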
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 17:06 ` Myklebust, Trond
@ 2012-04-26 17:09 ` Steve French
From: Steve French @ 2012-04-26 17:09 UTC
To: Myklebust, Trond
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
    linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 12:06 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Thu, 2012-04-26 at 12:03 -0500, Steve French wrote:
>> On Thu, Apr 26, 2012 at 12:00 PM, Myklebust, Trond
>> <Trond.Myklebust@netapp.com> wrote:
>> > On Thu, 2012-04-26 at 11:56 -0500, Steve French wrote:
>> >> On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond
>> >> <Trond.Myklebust@netapp.com> wrote:
>> >> > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote:
>> >> >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote:
>> >> >> > Steve French <smfrench@gmail.com> wrote:
>> >> >> >
>> >> >> >> Would it be better to make the stable vs volatile inode number an attribute
>> >> >> >> of the volume or something returned by the proposed xstat?
>> >> >> >
>> >> >> > I'm not sure what you mean by a stable vs a volatile inode number.
>> >> >>
>> >> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent
>> >> >> unique identifier, but in the case of CIFS some old servers don't support the
>> >> >> calls which return inode numbers (or don't return them for all file system
>> >> >> types, Windows FAT?) so in these cases cifs has to create inode
>> >> >> numbers on the fly
>> >> >> on the client. inode numbers created on the client are not "stable" they
>> >> >> can change on unmount/remount (which can cause problems for backup
>> >> >> applications).
>> >> >>
>> >> >> Similarly NFSv4 does not require that servers always return stable inode numbers
>> >> >> (that will never change) and introduced a concept of "volatile file handle."
>> >> >> We have run into this in two cases (there are probably more) -
>> >> >> Specialized NFS servers
>> >> >> for HPC which deal with lots of transient inodes, and second those for servers
>> >> >> which base there inode number on path (Windows NFS?). See
>> >> >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html
>> >> >> or the NFSv4 RFC.
>> >> >>
>> >> >> Basically the question is whether it is worth reporting a flag on the
>> >> >> call which returns
>> >> >> the inode number to indicate that the inode number is "stable" (would not change
>> >> >> on reboot or reconnection) or "volatile." Since the majority of NFS
>> >> >> and SMB2 servers
>> >> >> can return stable inode numbers, I don't feel strongly about the need
>> >> >> for an indicator
>> >> >> of "stable" vs. "volatile" but I mention it because backup and
>> >> >> migration applications
>> >> >> mention this (if inode numbers are volatile, they may have to check
>> >> >> for hardlinks differently
>> >> >> for example)
>> >> >
>> >> > I don't understand. If the filesystem doesn't support real inode
>> >> > numbers, then why report them at all? What use would an application have
>> >> > for an inode number that can't be used to identify hard linked files?
>> >>
>> >> Well ... you have to have an inode number on the Linux client side even if
>> >> the server doesn't report them (or has a bug and reports duplicates).
>> >> If you can't tell hardlinked files apart fix the server (but in the
>> >> cases where the file systems has this problem the server doesn't usually
>> >> support hardlinks either).
>> >>
>> >> If the server's file system internal structures don't support real inode
>> >> numbers (such as FAT or a ramdisk) then it either has to make them
>> >> up based on something like path name or some other attribute of the
>> >> file on disk.
>> >>
>> >> Servers like NetApp is where this gets interesting - for cifs e.g. level 1009
>> >> query file info is used to query_file_internal_info (the inode number) but
>> >> what if the server can not report inode numbers (due to a bug) in
>> >> all cases.
>> >
>> > Right, but none of this explains why we need to report these bogus inode
>> > numbers to the application in the xstat() reply.
>>
>> the question is whether the application (backup) would need to know
>> that the inode numbers are bogus and from my conversations with
>> guys writing backup software it seems that such data is useful to them.
>
> You are still not explaining why they need to know the values at all? If
> the values are bogus, then don't return them, and don't set the flag
> that says they are being returned.

I don't know, but I assumed it was because it is an easy way to index
them, since the inode numbers, even if they "change" on remount, are
still unique.

-- 
Thanks,

Steve
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 17:09 ` Steve French
@ 2012-04-26 17:10 ` Steve French
  -1 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-26 17:10 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 12:09 PM, Steve French <smfrench@gmail.com> wrote:
> On Thu, Apr 26, 2012 at 12:06 PM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
>> On Thu, 2012-04-26 at 12:03 -0500, Steve French wrote:
>>> On Thu, Apr 26, 2012 at 12:00 PM, Myklebust, Trond
>>> <Trond.Myklebust@netapp.com> wrote:
>>> > On Thu, 2012-04-26 at 11:56 -0500, Steve French wrote:
>>> >> On Thu, Apr 26, 2012 at 10:25 AM, Myklebust, Trond
>>> >> <Trond.Myklebust@netapp.com> wrote:
>>> >> > On Thu, 2012-04-26 at 09:54 -0500, Steve French wrote:
>>> >> >> On Thu, Apr 26, 2012 at 9:25 AM, David Howells <dhowells@redhat.com> wrote:
>>> >> >> > Steve French <smfrench@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> Would it be better to make the stable vs volatile inode number an attribute
>>> >> >> >> of the volume or something returned by the proposed xstat?
>>> >> >> >
>>> >> >> > I'm not sure what you mean by a stable vs a volatile inode number.
>>> >> >>
>>> >> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent
>>> >> >> unique identifier, but in the case of CIFS some old servers don't support the
>>> >> >> calls which return inode numbers (or don't return them for all file system
>>> >> >> types, Windows FAT?) so in these cases cifs has to create inode
>>> >> >> numbers on the fly on the client. inode numbers created on the client
>>> >> >> are not "stable" they can change on unmount/remount (which can cause
>>> >> >> problems for backup applications).
>>> >> >>
>>> >> >> Similarly NFSv4 does not require that servers always return stable inode numbers
>>> >> >> (that will never change) and introduced a concept of "volatile file handle."
>>> >> >> We have run into this in two cases (there are probably more) -
>>> >> >> Specialized NFS servers for HPC which deal with lots of transient
>>> >> >> inodes, and second those for servers which base their inode number
>>> >> >> on path (Windows NFS?). See
>>> >> >> http://docs.oracle.com/cd/E19082-01/819-1634/rfsrefer-137/index.html
>>> >> >> or the NFSv4 RFC.
>>> >> >>
>>> >> >> Basically the question is whether it is worth reporting a flag on the
>>> >> >> call which returns the inode number to indicate that the inode number
>>> >> >> is "stable" (would not change on reboot or reconnection) or "volatile."
>>> >> >> Since the majority of NFS and SMB2 servers can return stable inode
>>> >> >> numbers, I don't feel strongly about the need for an indicator
>>> >> >> of "stable" vs. "volatile" but I mention it because backup and
>>> >> >> migration applications mention this (if inode numbers are volatile,
>>> >> >> they may have to check for hardlinks differently for example)
>>> >> >
>>> >> > I don't understand. If the filesystem doesn't support real inode
>>> >> > numbers, then why report them at all? What use would an application have
>>> >> > for an inode number that can't be used to identify hard linked files?
>>> >>
>>> >> Well ... you have to have an inode number on the Linux client side even if
>>> >> the server doesn't report them (or has a bug and reports duplicates).
>>> >> If you can't tell hardlinked files apart fix the server (but in the
>>> >> cases where the file system has this problem the server doesn't usually
>>> >> support hardlinks either).
>>> >>
>>> >> If the server's file system internal structures don't support real inode
>>> >> numbers (such as FAT or a ramdisk) then it either has to make them
>>> >> up based on something like path name or some other attribute of the
>>> >> file on disk.
>>> >>
>>> >> Servers like NetApp are where this gets interesting - for cifs e.g. level 1009
>>> >> query file info is used to query_file_internal_info (the inode number) but
>>> >> what if the server can not report inode numbers (due to a bug) in
>>> >> all cases.
>>> >
>>> > Right, but none of this explains why we need to report these bogus inode
>>> > numbers to the application in the xstat() reply.
>>>
>>> the question is whether the application (backup) would need to know
>>> that the inode numbers are bogus and from my conversations with
>>> guys writing backup software it seems that such data is useful to them.
>>
>> You are still not explaining why they need to know the values at all? If
>> the values are bogus, then don't return them, and don't set the flag
>> that says they are being returned.
>
> I don't know, but assumed it was because it was an easy way
> to index them since the inode numbers even if they "change"
> on remount, are still unique.

If the call allows the inode number not to be returned, that is
probably ok (they can always use the POSIX stat to get the client
generated one).

--
Thanks,

Steve

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 17:03 ` Steve French
@ 2012-04-26 21:57 ` David Howells
  -1 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-26 21:57 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: dhowells, Steve French, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:

> You are still not explaining why they need to know the values at all? If
> the values are bogus, then don't return them, and don't set the flag
> that says they are being returned.

What if xstat() and struct xstat eventually become what userspace uses as
stat() (as a wrapper) and struct stat (if such a thing is possible with glibc
versioning)?  Do older programs that think they're using stat() and don't know
about the extra fields available expect to see a useful value in st_ino?

David

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 21:57 ` David Howells
@ 2012-04-26 22:05 ` Roland McGrath
  -1 siblings, 0 replies; 144+ messages in thread
From: Roland McGrath @ 2012-04-26 22:05 UTC (permalink / raw)
  To: David Howells
  Cc: Myklebust, Trond, Steve French, linux-fsdevel, linux-nfs,
      linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

> What if the xstat() and struct xstat eventually becomes what userspace
> uses as stat() (as a wrapper) and struct stat (if such a thing is
> possible with glibc versioning)?

It's certainly possible with symbol versioning, though it seems much more
likely that we'd stick with the existing struct stat and stat* interfaces
and only have the implementation using statx underneath (e.g. for new
machines or kernel ABIs where the kernel stops providing any calls except
for statxat), at least for the foreseeable future.

> Do older programs that think they're using stat() and don't know about
> the extra fields available expect to see a useful value in st_ino?

POSIX requires that st_ino have a useful value for the standard *stat calls.

Thanks,
Roland

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 22:05 ` Roland McGrath
@ 2012-04-27  0:33 ` Myklebust, Trond
  -1 siblings, 0 replies; 144+ messages in thread
From: Myklebust, Trond @ 2012-04-27 0:33 UTC (permalink / raw)
  To: Roland McGrath
  Cc: David Howells, Steve French, linux-fsdevel, linux-nfs,
      linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On Thu, 2012-04-26 at 15:05 -0700, Roland McGrath wrote:
> > What if the xstat() and struct xstat eventually becomes what userspace
> > uses as stat() (as a wrapper) and struct stat (if such a thing is
> > possible with glibc versioning)?
>
> It's certainly possible with symbol versioning, though it seems much more
> likely that we'd stick with the existing struct stat and stat* interfaces
> and only have the implementation using statx underneath (e.g. for new
> machines or kernel ABIs where the kernel stops providing any calls except
> for statxat), at least for the foreseeable future.
>
> > Do older programs that think they're using stat() and don't know about
> > the extra fields available expect to see a useful value in st_ino?
>
> POSIX requires that st_ino have a useful value for the standard *stat calls.

Yes, but we're talking about non-POSIX filesystems here. If the
filesystem doesn't have a useful value for st_ino, then the usual way of
dealing with those POSIX requirements is to fake up values. The question
then becomes whether or not we care if it is the kernel or userland that
fakes up those values.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 144+ messages in thread
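To make the kernel-versus-userland question concrete, here is a minimal C sketch of the userland option: a stat()-style wrapper that takes the inode number from the xstat() result when the filesystem set XSTAT_INO in st_mask, and fabricates one otherwise. The struct layout, the XSTAT_INO bit value, and both function names are assumptions for illustration and are not taken from the patches; hashing the path is only one possible fabrication scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real struct xstat and XSTAT_INO are defined
 * by the patch series; this layout and bit value are assumed. */
#define XSTAT_INO 0x100u

struct xstat_sketch {
    uint32_t st_mask;  /* which fields the filesystem actually provided */
    uint64_t st_ino;   /* only meaningful if XSTAT_INO is set in st_mask */
};

/* Fabricate an inode number from the path with a 64-bit FNV-1a hash.
 * Like a client-generated number, the result changes on rename and is
 * not guaranteed to be collision-free. */
static uint64_t fake_ino_from_path(const char *path)
{
    assert(path != NULL);
    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;                   /* FNV prime */
    }
    return h ? h : 1;                            /* never hand out inode 0 */
}

/* What a stat() wrapper built on xstat() might put in st_ino. */
static uint64_t stat_ino_from_xstat(const struct xstat_sketch *xs,
                                    const char *path)
{
    assert(xs != NULL);
    if (xs->st_mask & XSTAT_INO)
        return xs->st_ino;                       /* real value from the fs */
    return fake_ino_from_path(path);             /* userland takes responsibility */
}
```

Doing this in userland keeps the fabrication policy out of the kernel, at the cost that different libraries could fabricate different numbers for the same file.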
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 21:57 ` David Howells
@ 2012-04-27  0:30 ` Myklebust, Trond
  -1 siblings, 0 replies; 144+ messages in thread
From: Myklebust, Trond @ 2012-04-27 0:30 UTC (permalink / raw)
  To: David Howells
  Cc: Steve French, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On Thu, 2012-04-26 at 22:57 +0100, David Howells wrote:
> Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:
>
> > You are still not explaining why they need to know the values at all? If
> > the values are bogus, then don't return them, and don't set the flag
> > that says they are being returned.
>
> What if the xstat() and struct xstat eventually becomes what userspace uses as
> stat() (as a wrapper) and struct stat (if such a thing is possible with glibc
> versioning)?  Do older programs that think they're using stat() and don't know
> about the extra fields available expect to see a useful value in st_ino?

Does it really matter whether it is the kernel or userland that is
responsible for faking up inode numbers? If userland wants to use
xstat() in order to fake up a stat() call, then it gets to take
responsibility for the results.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 14:25 ` David Howells
  2012-04-26 14:54   ` Steve French
@ 2012-04-26 15:52   ` David Howells
  2012-04-27  0:29     ` Andreas Dilger
       [not found]     ` <3F302713-B675-4BAA-B2B7-235E03C5975F-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
  1 sibling, 2 replies; 144+ messages in thread
From: David Howells @ 2012-04-26 15:52 UTC (permalink / raw)
  To: Steve French
  Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
      linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api,
      libc-alpha

Steve French <smfrench@gmail.com> wrote:

> >> Would it be better to make the stable vs volatile inode number an attribute
> >> of the volume or something returned by the proposed xstat?
> >
> > I'm not sure what you mean by a stable vs a volatile inode number.
>
> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent unique
> identifier, but in the case of CIFS some old servers don't support the calls
> which return inode numbers (or don't return them for all file system types,
> Windows FAT?) so in these cases cifs has to create inode numbers on the fly
> on the client. inode numbers created on the client are not "stable" they can
> change on unmount/remount (which can cause problems for backup applications).

In the volatile case you'd probably want to unset XSTAT_INO in st_mask as the
inode number is a local fabrication.  However, since there is a remote file ID,
we could add an XSTAT_INFO_FILE_ID flag to indicate there's a standard xattr
holding this.  On CIFS this could be the servername + pathname, on NFS this
could be the server address + FH on AFS the cell+volID+FID+uniquifier for
example.  That's independent of xstat, however, and wouldn't be returned as
it's a blob that could be quite large.

I presume in some cases, there is not a unique file ID that persists across
rename.

> Similarly NFSv4 does not require that servers always return stable inode
> numbers (that will never change) and introduced a concept of "volatile file
> handle."

Can I presume the inode number cannot be considered stable if the NFS4 FH is
non-volatile?  Furthermore, can I presume NFS2/3 inode numbers are supposed to
be stable?

> Basically the question is whether it is worth reporting a flag on the call
> which returns the inode number to indicate that the inode number is "stable"
> (would not change on reboot or reconnection) or "volatile." Since the
> majority of NFS and SMB2 servers can return stable inode numbers, I don't
> feel strongly about the need for an indicator of "stable" vs. "volatile" but
> I mention it because backup and migration applications mention this (if inode
> numbers are volatile, they may have to check for hardlinks differently for
> example)

It may be that unsetting XSTAT_INO if you've fabricated the inode number
locally is sufficient.

> >> > Handle remote filesystems being offline and indicate this with
> >> > XSTAT_INFO_OFFLINE.
> >>
> >> You already have support for an indicator for offline files (HSM),

Which indicator is this?  Or do you mean XSTAT_INFO_OFFLINE?

> >> would XSTAT_INFO_OFFLINE be intended for the case
> >> where the network session to the server is disconnected
> >> (and in which you case the application does not want to reconnect)?
> >
> > Hmmm...  Interesting question.  Both NTFS and CIFS have an offline
> > attribute (which is where I originally got this from) - but should I have a
> > separate indicator to indicate the client can't access a server over a
> > network (ie. we've gone to disconnected operation on this file)?
> > E.g. should there be a XSTAT_INFO_DISCONNECTED too?
>
> my reaction is no, since it adds complexity.  If you do a stat on a
> disconnected volume (where the network is temporarily down) reconnection will
> be attempted.  If reconnection fails then the xstat will either fail or be
> retried forever depending on the value of "hard" vs. "soft" mount flag.

I was thinking of how to handle disconnected operation, where you can't just
sit there and churn waiting for the server to come back or give an error.  On
the other hand, as long as there's some spare space in the struct, we can deal
with that later when we actually start to implement D/O.

David

^ permalink raw reply [flat|nested] 144+ messages in thread
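The suggestion to unset XSTAT_INO for fabricated inode numbers could be consumed by a backup tool roughly as follows. This is a minimal C sketch under stated assumptions: the struct layout (including a single st_dev field and an always-valid st_nlink), the XSTAT_INO bit value, the fixed-size table, and the function name are all hypothetical and not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XSTAT_INO 0x100u   /* assumed bit value, for illustration only */
#define MAX_SEEN  64       /* small fixed table keeps the sketch short */

/* Hypothetical subset of the fields a backup tool would look at. */
struct xstat_link_sketch {
    uint32_t st_mask;   /* XSTAT_INO set only if st_ino came from the fs */
    uint64_t st_dev;    /* assumed to be a single combined device number */
    uint64_t st_ino;
    uint32_t st_nlink;  /* assumed valid here */
};

struct seen_table {
    size_t count;
    struct { uint64_t dev, ino; } entries[MAX_SEEN];
};

/* Returns true if this file is a hard link to one already recorded.
 * Files whose inode number is a local fabrication are never matched,
 * so the caller must fall back to some other check for them. */
static bool is_known_hardlink(struct seen_table *t,
                              const struct xstat_link_sketch *xs)
{
    assert(t != NULL && xs != NULL);
    if (!(xs->st_mask & XSTAT_INO))
        return false;               /* fabricated inode: cannot identify links */
    if (xs->st_nlink < 2)
        return false;               /* only one name, nothing to match */
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].dev == xs->st_dev &&
            t->entries[i].ino == xs->st_ino)
            return true;
    if (t->count < MAX_SEEN) {      /* remember first sighting */
        t->entries[t->count].dev = xs->st_dev;
        t->entries[t->count].ino = xs->st_ino;
        t->count++;
    }
    return false;
}
```

With this shape the application never needs a separate "stable" versus "volatile" flag: the absence of XSTAT_INO in the returned mask already tells it not to key on the inode number.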
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-26 15:52   ` David Howells
@ 2012-04-27  0:29     ` Andreas Dilger
       [not found]     ` <3F302713-B675-4BAA-B2B7-235E03C5975F-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
  1 sibling, 0 replies; 144+ messages in thread
From: Andreas Dilger @ 2012-04-27 0:29 UTC (permalink / raw)
  To: David Howells
  Cc: Steve French, dhowells, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On 2012-04-26, at 10:52, David Howells <dhowells@redhat.com> wrote:
> Steve French <smfrench@gmail.com> wrote:
>>
>> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent unique
>> identifier, but in the case of CIFS some old servers don't support the calls
>> which return inode numbers (or don't return them for all file system types,
>> Windows FAT?) so in these cases cifs has to create inode numbers on the fly
>> on the client. inode numbers created on the client are not "stable" they can
>> change on unmount/remount (which can cause problems for backup applications).
>
> In the volatile case you'd probably want to unset XSTAT_INO in st_mask as the
> inode number is a local fabrication.

I'd agree. Why fake up an inode number if the application doesn't care?
Most apps don't actually use the inode. The only uses I know for the
inode number in userspace are backup, CIFS/NFS servers, and "ls -li".

> However, since there is a remote file ID,
> we could add an XSTAT_INFO_FILE_ID flag to indicate there's a standard xattr
> holding this.

It is a bit strange that the kernel would return a flag that was not
requested, but not fatal.

> On CIFS this could be the servername + pathname, on NFS this
> could be the server address + FH on AFS the cell+volID+FID+uniquifier for
> example.  That's independent of xstat, however, and wouldn't be returned as
> it's a blob that could be quite large.
>
> I presume in some cases, there is not a unique file ID that persists across
> rename.
>
>> Similarly NFSv4 does not require that servers always return stable inode
>> numbers (that will never change) and introduced a concept of "volatile file
>> handle."
>
> Can I presume the inode number cannot be considered stable if the NFS4 FH is
> non-volatile?  Furthermore, can I presume NFS2/3 inode numbers are supposed to
> be stable?
>
>> Basically the question is whether it is worth reporting a flag on the call
>> which returns the inode number to indicate that the inode number is "stable"
>> (would not change on reboot or reconnection) or "volatile." Since the
>> majority of NFS and SMB2 servers can return stable inode numbers, I don't
>> feel strongly about the need for an indicator of "stable" vs. "volatile" but
>> I mention it because backup and migration applications mention this (if inode
>> numbers are volatile, they may have to check for hardlinks differently for
>> example)
>
> It may be that unsetting XSTAT_INO if you've fabricated the inode number
> locally is sufficient.
>
>>>>> Handle remote filesystems being offline and indicate this with
>>>>> XSTAT_INFO_OFFLINE.
>>>>
>>>> You already have support for an indicator for offline files (HSM),
>
> Which indicator is this?  Or do you mean XSTAT_INFO_OFFLINE?
>
>>>> would XSTAT_INFO_OFFLINE be intended for the case
>>>> where the network session to the server is disconnected
>>>> (and in which you case the application does not want to reconnect)?
>>>
>>> Hmmm...  Interesting question.  Both NTFS and CIFS have an offline
>>> attribute (which is where I originally got this from) - but should I have a
>>> separate indicator to indicate the client can't access a server over a
>>> network (ie. we've gone to disconnected operation on this file)?
>>> E.g. should there be a XSTAT_INFO_DISCONNECTED too?
>>
>> my reaction is no, since it adds complexity.  If you do a stat on a
>> disconnected volume (where the network is temporarily down) reconnection will
>> be attempted.  If reconnection fails then the xstat will either fail or be
>> retried forever depending on the value of "hard" vs. "soft" mount flag.
>
> I was thinking of how to handle disconnected operation, where you can't just
> sit there and churn waiting for the server to come back or give an error.  On
> the other hand, as long as there's some spare space in the struct, we can deal
> with that later when we actually start to implement D/O.
>
> David

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call @ 2012-04-27 0:29 ` Andreas Dilger 0 siblings, 0 replies; 144+ messages in thread From: Andreas Dilger @ 2012-04-27 0:29 UTC (permalink / raw) Cc: Steve French, dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On 2012-04-26, at 10:52, David Howells <dhowells@redhat.com> wrote: > Steve French <smfrench@gmail.com> wrote: >> >> Both NFS and CIFS (and SMB2) can return inode numbers or equivalent unique >> identifier, but in the case of CIFS some old servers don't support the calls >> which return inode numbers (or don't return them for all file system types, >> Windows FAT?) so in these cases cifs has to create inode numbers on the fly >> on the client. inode numbers created on the client are not "stable" they can >> change on unmount/remount (which can cause problems for backup applications). > > In the volatile case you'd probably want to unset XSTAT_INO in st_mask as the > inode number is a local fabrication. I'd agree. Why fake up an inode number if the application doesn't care? Most apps don't actually use the inode. The only uses I know for the inode number in userspace are backup, CIFS/NFS servers, and "ls -li" . > However, since there is a remote file ID, > we could add an XSTAT_INFO_FILE_ID flag to indicate there's a standard xattr > holding this. It is a bit strange that the kernel would return a flag that was not requested, but not fatal. > On CIFS this could be the servername + pathname, on NFS this > could be the server address + FH on AFS the cell+volID+FID+uniquifier for > example. That's independent of xstat, however, and wouldn't be returned as > it's a blob that could be quite large. > > I presume in some cases, there is not a unique file ID that persists across > rename. 
> >> Similarly NFSv4 does not require that servers always return stable inode >> numbers (that will never change) and introduced a concept of "volatile file >> handle." > > Can I presume the inode number cannot be considered stable if the NFS4 FH is > non-volatile? Furthermore, can I presume NFS2/3 inode numbers are supposed to > be stable? > >> Basically the question is whether it is worth reporting a flag on the call >> which returns the inode number to indicate that the inode number is "stable" >> (would not change on reboot or reconnection) or "volatile." Since the >> majority of NFS and SMB2 servers can return stable inode numbers, I don't >> feel strongly about the need for an indicator of "stable" vs. "volatile" but >> I mention it because backup and migration applications mention this (if inode >> numbers are volatile, they may have to check for hardlinks differently for >> example) > > It may be that unsetting XSTAT_INO if you've fabricated the inode number > locally is sufficient. > >>>>> Handle remote filesystems being offline and indicate this with >>>>> XSTAT_INFO_OFFLINE. >>>> >>>> You already have support for an indicator for offline files (HSM), > > Which indicator is this? Or do you mean XSTAT_INFO_OFFLINE? > >>>> would XSTAT_INFO_OFFLINE be intended for the case >>>> where the network session to the server is disconnected >>>> (and in which you case the application does not want to reconnect)? >>> >>> Hmmm... Interesting question. Both NTFS and CIFS have an offline >>> attribute (which is where I originally got this from) - but should I have a >>> separate indicator to indicate the client can't access a server over a >>> network (ie. we've gone to disconnected operation on this file)? >>> E.g. should there be a XSTAT_INFO_DISCONNECTED too? >> >> my reaction is no, since it adds complexity. If you do a stat on a >> disconnected volume (where the network is temporarily down) reconnection will >> be attempted. 
If reconnection fails then the xstat will either fail or be >> retried forever depending on the value of "hard" vs. "soft" mount flag. > > I was thinking of how to handle disconnected operation, where you can't just > sit there and churn waiting for the server to come back or give an error. On > the other hand, as long as there's some spare space in the struct, we can deal > with that later when we actually start to implement D/O. > > David
* Re: [PATCH 0/6] Extended file stat system call 2012-04-26 15:52 ` David Howells @ 2012-04-27 9:19 ` David Howells [not found] ` <3F302713-B675-4BAA-B2B7-235E03C5975F-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org> 1 sibling, 0 replies; 144+ messages in thread From: David Howells @ 2012-04-27 9:19 UTC (permalink / raw) To: Andreas Dilger Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Steve French, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org> wrote: > > However, since there is a remote file ID, we could add an > > XSTAT_INFO_FILE_ID flag to indicate there's a standard xattr holding this. > > It is a bit strange that the kernel would return a flag that was not > requested, but not fatal. On the other hand, if it costs the kernel nothing to generate this indicator... For network filesystems, for instance, you might know that there is some other file ID because of the way the fs is specified. I was thinking that many of the fields will be given values, even if you don't ask for them, because the values (or approximations thereof) are present in memory already. David
* [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available 2012-04-19 14:05 ` David Howells @ 2012-04-19 14:06 ` David Howells -1 siblings, 0 replies; 144+ messages in thread From: David Howells @ 2012-04-19 14:06 UTC (permalink / raw) To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw

Add a pair of system calls to make extended file stats available, including
file creation time, inode version and data version where available through the
underlying filesystem.

The idea was initially proposed as a set of xattrs that could be retrieved with
getxattr(), but the general preference proved to be for new syscalls with an
extended stat structure.

This has a number of uses:

 (1) Creation time: The SMB protocol carries the creation time, which could be
     exported by Samba, which will in turn help CIFS make use of FS-Cache as
     that can be used for coherency data.

     This is also specified in NFSv4 as a recommended attribute and could be
     exported by NFSD [Steve French].

 (2) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper].

 (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
     cached attributes are up to date [Trond Myklebust].

 (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
     Schubert].

 (5) Data version number: Could be used by userspace NFS servers [Aneesh
     Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().
     It could get it from the kstat struct if it used vfs_xgetattr() instead.

 (6) BSD stat compatibility: Including more fields from the BSD stat such as
     creation time (st_btime) and inode generation number (st_gen) [Jeremy
     Allison, Bernd Schubert].

 (7) Extra coherency data may be useful in making backups [Andreas Dilger].

 (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
     can now say it doesn't support a standard stat feature if that isn't
     available, so if, for instance, inode numbers or UIDs don't exist...

 (9) Make the fields a consistent size on all arches and make them large.

(10) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

(11) Include granularity fields in the time data to indicate the granularity
     of each of the times (NFSv4 time_delta) [Steve French].

(12) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.

(13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

(14) Spare space, request flags and information flags are provided for future
     expansion.

The following structures are defined for the use of these new system calls:

	struct xstat_dev {
		uint32_t		major, minor;
	};

	struct xstat_time {
		uint64_t		tv_sec;
		uint32_t		tv_nsec;
		uint32_t		tv_granularity;
	};

	struct xstat {
		uint32_t		st_mask;
		uint32_t		st_mode;
		uint32_t		st_nlink;
		uint32_t		st_uid;
		uint32_t		st_gid;
		uint32_t		st_information;
		uint32_t		st_ioc_flags;
		uint32_t		st_blksize;
		struct xstat_dev	st_rdev;
		struct xstat_dev	st_dev;
		struct xstat_time	st_atime;
		struct xstat_time	st_btime;
		struct xstat_time	st_ctime;
		struct xstat_time	st_mtime;
		uint64_t		st_ino;
		uint64_t		st_size;
		uint64_t		st_blocks;
		uint64_t		st_gen;
		uint64_t		st_version;
		uint8_t			st_volume_id[16];
		uint64_t		__spares[11];
	};

where st_information is local system information about the file, st_btime is
the file creation time, st_gen is the inode generation (i_generation),
st_version is the data version number (i_version), st_ioc_flags is the flags
from FS_IOC_GETFLAGS, st_volume_id is where the volume identifier is stored,
st_mask is a bitmask indicating the data provided and __spares[] are where
as-yet undefined fields can be placed.

The defined bits in the request mask and st_mask are:

	XSTAT_MODE		Want/got st_mode
	XSTAT_NLINK		Want/got st_nlink
	XSTAT_UID		Want/got st_uid
	XSTAT_GID		Want/got st_gid
	XSTAT_RDEV		Want/got st_rdev
	XSTAT_ATIME		Want/got st_atime
	XSTAT_MTIME		Want/got st_mtime
	XSTAT_CTIME		Want/got st_ctime
	XSTAT_INO		Want/got st_ino
	XSTAT_SIZE		Want/got st_size
	XSTAT_BLOCKS		Want/got st_blocks
	XSTAT_BASIC_STATS	[The stuff in the normal stat struct]
	XSTAT_IOC_FLAGS		Want/got FS_IOC_GETFLAGS
	XSTAT_BTIME		Want/got st_btime
	XSTAT_GEN		Want/got st_gen
	XSTAT_VERSION		Want/got st_version
	XSTAT_VOLUME_ID		Want/got st_volume_id
	XSTAT_ALL_STATS		[All currently available stuff]

The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
that might be supplied by the filesystem. Note that Ext4 returns flags outside
of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.
Should {EXT4,FS}_FL_USER_VISIBLE be extended to cover them? Or should the extra
flags be suppressed?

The defined bits in the st_information field give local system data on a file,
how it is accessed, where it is and what it does:

	XSTAT_INFO_ENCRYPTED		File is encrypted
	XSTAT_INFO_TEMPORARY		File is temporary (NTFS/CIFS/deleted)
	XSTAT_INFO_FABRICATED		File was made up by filesystem
	XSTAT_INFO_KERNEL_API		File is kernel API (eg: procfs/sysfs)
	XSTAT_INFO_REMOTE		File is remote
	XSTAT_INFO_OFFLINE		File is offline (CIFS)
	XSTAT_INFO_AUTOMOUNT		Dir is automount trigger
	XSTAT_INFO_AUTODIR		Dir provides unlisted automounts
	XSTAT_INFO_NONSYSTEM_OWNERSHIP	File has non-system ownership details
	XSTAT_INFO_HAS_ACL		File has an ACL of some sort
	XSTAT_INFO_REPARSE_POINT	File is reparse point (NTFS/CIFS)
	XSTAT_INFO_HIDDEN		File is marked hidden (DOS+)
	XSTAT_INFO_SYSTEM		File is marked system (DOS+)
	XSTAT_INFO_ARCHIVE		File is marked archive (DOS+)

These are for the use of GUI tools that might want to mark files specially,
depending on what they are. I've tried not to provide overlap with
st_ioc_flags where something usable exists there. Should Hidden, System and
Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
64-bits?

The system calls are:

	ssize_t ret = xstat(int dfd,
			    const char *filename,
			    unsigned int flags,
			    unsigned int mask,
			    struct xstat *buffer);

	ssize_t ret = fxstat(unsigned fd,
			     unsigned int flags,
			     unsigned int mask,
			     struct xstat *buffer);

The dfd, filename, flags and fd parameters indicate the file to query. There
is no equivalent of lstat() as that can be emulated with xstat() by passing
AT_SYMLINK_NOFOLLOW in flags.

AT_FORCE_ATTR_SYNC can also be set in flags. This will require a network
filesystem to synchronise its attributes with the server.

mask is a bitmask indicating the fields in struct xstat that are of interest to
the caller. The user should set this to XSTAT_BASIC_STATS to get the basic set
returned by stat().
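
As a rough illustration of how a caller might compare the mask it passed in
with the st_mask that came back, here is a minimal sketch. The XSTAT_* values
are copied from the test program in this description; xstat_missing() is a
hypothetical helper name and is not part of the patch.

```c
#include <stdint.h>

/* Request/result bits, copied from the test program in this description. */
#define XSTAT_RDEV		0x00000010U
#define XSTAT_BASIC_STATS	0x000007ffU
#define XSTAT_BTIME		0x00000800U

/*
 * Hypothetical helper (an assumption for illustration, not part of the
 * patch): return the bits that the caller requested in mask but that the
 * filesystem did not set in the returned st_mask.
 */
static inline uint32_t xstat_missing(uint32_t mask, uint32_t st_mask)
{
	return mask & ~st_mask;
}
```

Taking the results=47ef value printed for /proc/$$ in the sample output of the
test program, a caller that asked for XSTAT_BASIC_STATS | XSTAT_BTIME would
see XSTAT_RDEV and XSTAT_BTIME reported as missing, and could then decide
whether to carry on without those fields.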

Should there just be one xstat() syscall that does fxstat() if filename is
NULL?

The fields in struct xstat come in a number of classes:

 (0) st_dev, st_blksize, st_information.

     These are local data and are always available.

 (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size,
     st_blocks.

     These will be returned whether the caller asks for them or not. The
     corresponding bits in st_mask will be set to indicate their presence.

     If the caller didn't ask for them, then they may be approximated. For
     example, NFS won't waste any time updating them from the server, unless
     as a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as UID
     or GID on a DOS file), then the bit won't be set in st_mask, even if the
     caller asked for the value. In that case the returned value will be a
     fabrication.

 (2) st_rdev.

     As for class (1), but this won't be returned if the file is not a
     blockdev or chardev. The bit will be cleared if the value is not
     returned.

 (3) File creation time (st_btime), inode generation (st_gen), data version
     (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).

     These will be returned if available whether the caller asked for them or
     not. The corresponding bits in st_mask will be set or cleared as
     appropriate to indicate their presence.

     If the caller didn't ask for them, then they may be approximated. For
     example, NFS won't waste any time updating them from the server, unless
     as a byproduct of updating something requested.

At the moment, this will only work on x86_64 and i386 as it requires system
calls to be wired up.

=======
TESTING
=======

The following test program can be used to test the xstat system call:

	/* Test the xstat() system call
	 *
	 * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
	 * Written by David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org)
	 *
	 * This program is free software; you can redistribute it and/or
	 * modify it under the terms of the GNU General Public Licence
	 * as published by the Free Software Foundation; either version
	 * 2 of the Licence, or (at your option) any later version.
	 */

	#define _GNU_SOURCE
	#define _ATFILE_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>	/* uint8_t/uint32_t/uint64_t used in struct xstat */
	#include <string.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <time.h>
	#include <sys/syscall.h>
	#include <sys/stat.h>
	#include <sys/types.h>

	#define AT_NO_AUTOMOUNT		0x800
	#define AT_FORCE_ATTR_SYNC	0x2000

	/* Bits for the request mask and for st_mask */
	#define XSTAT_MODE		0x00000001U
	#define XSTAT_NLINK		0x00000002U
	#define XSTAT_UID		0x00000004U
	#define XSTAT_GID		0x00000008U
	#define XSTAT_RDEV		0x00000010U
	#define XSTAT_ATIME		0x00000020U
	#define XSTAT_MTIME		0x00000040U
	#define XSTAT_CTIME		0x00000080U
	#define XSTAT_INO		0x00000100U
	#define XSTAT_SIZE		0x00000200U
	#define XSTAT_BLOCKS		0x00000400U
	#define XSTAT_BASIC_STATS	0x000007ffU
	#define XSTAT_BTIME		0x00000800U
	#define XSTAT_GEN		0x00001000U
	#define XSTAT_VERSION		0x00002000U
	#define XSTAT_IOC_FLAGS		0x00004000U
	#define XSTAT_VOLUME_ID		0x00008000U
	#define XSTAT_ALL_STATS		0x0000ffffU

	struct xstat_dev {
		uint32_t	major;
		uint32_t	minor;
	};

	struct xstat_time {
		uint64_t	tv_sec;
		uint32_t	tv_nsec;
		uint32_t	tv_granularity;
	};

	/* Field order and sizes follow the struct xstat layout given above:
	 * st_blksize appears once as a 32-bit field and st_volume_id is a
	 * 16-byte array.
	 */
	struct xstat {
		uint32_t		st_mask;
		uint32_t		st_mode;
		uint32_t		st_nlink;
		uint32_t		st_uid;
		uint32_t		st_gid;
		uint32_t		st_information;
		uint32_t		st_ioc_flags;
		uint32_t		st_blksize;
		struct xstat_dev	st_rdev;
		struct xstat_dev	st_dev;
		struct xstat_time	st_atim;
		struct xstat_time	st_btim;
		struct xstat_time	st_ctim;
		struct xstat_time	st_mtim;
		uint64_t		st_ino;
		uint64_t		st_size;
		uint64_t		st_blocks;
		uint64_t		st_gen;
		uint64_t		st_version;
		uint8_t			st_volume_id[16];
		uint64_t		st_spares[11];
	};

	/* Bits for st_information */
	#define XSTAT_INFO_ENCRYPTED		0x00000001U
	#define XSTAT_INFO_TEMPORARY		0x00000002U
	#define XSTAT_INFO_FABRICATED		0x00000004U
	#define XSTAT_INFO_KERNEL_API		0x00000008U
	#define XSTAT_INFO_REMOTE		0x00000010U
	#define XSTAT_INFO_OFFLINE		0x00000020U
	#define XSTAT_INFO_AUTOMOUNT		0x00000040U
	#define XSTAT_INFO_AUTODIR		0x00000080U
	#define XSTAT_INFO_NONSYSTEM_OWNERSHIP	0x00000100U
	#define XSTAT_INFO_HAS_ACL		0x00000200U
	#define XSTAT_INFO_REPARSE_POINT	0x00000400U
	#define XSTAT_INFO_HIDDEN		0x00000800U
	#define XSTAT_INFO_SYSTEM		0x00001000U
	#define XSTAT_INFO_ARCHIVE		0x00002000U

	/* x86_64 syscall numbers as wired up by this patch */
	#define __NR_xstat	312
	#define __NR_fxstat	313

	static __attribute__((unused))
	ssize_t xstat(int dfd, const char *filename, unsigned flags,
		      unsigned int mask, struct xstat *buffer)
	{
		return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
	}

	static __attribute__((unused))
	ssize_t fxstat(int fd, unsigned flags,
		       unsigned int mask, struct xstat *buffer)
	{
		return syscall(__NR_fxstat, fd, flags, mask, buffer);
	}

	/* Print one timestamp as date, time, nanoseconds and UTC offset */
	static void print_time(const char *field, const struct xstat_time *xstm)
	{
		struct tm tm;
		time_t tim;
		char buffer[100];
		int len;

		tim = xstm->tv_sec;
		if (!localtime_r(&tim, &tm)) {
			perror("localtime_r");
			exit(1);
		}
		len = strftime(buffer, 100, "%F %T", &tm);
		if (len == 0) {
			perror("strftime");
			exit(1);
		}
		printf("%s", field);
		fwrite(buffer, 1, len, stdout);
		printf(".%09u", xstm->tv_nsec);
		len = strftime(buffer, 100, "%z", &tm);
		if (len == 0) {
			perror("strftime2");
			exit(1);
		}
		fwrite(buffer, 1, len, stdout);
		printf("\n");
	}

	/* Print only those fields whose bits are set in st_mask */
	static void dump_xstat(struct xstat *xst)
	{
		char buffer[256], ft = '?';

		printf("results=%x\n", xst->st_mask);

		printf(" ");
		if (xst->st_mask & XSTAT_SIZE)
			printf(" Size: %-15llu", (unsigned long long) xst->st_size);
		if (xst->st_mask & XSTAT_BLOCKS)
			printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
		printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
		if (xst->st_mask & XSTAT_MODE) {
			switch (xst->st_mode & S_IFMT) {
			case S_IFIFO:	printf(" FIFO\n");			ft = 'p'; break;
			case S_IFCHR:	printf(" character special file\n");	ft = 'c'; break;
			case S_IFDIR:	printf(" directory\n");			ft = 'd'; break;
			case S_IFBLK:	printf(" block special file\n");	ft = 'b'; break;
			case S_IFREG:	printf(" regular file\n");		ft = '-'; break;
			case S_IFLNK:	printf(" symbolic link\n");		ft = 'l'; break;
			case S_IFSOCK:	printf(" socket\n");			ft = 's'; break;
			default:
				printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
				ft = '?';
				break;
			}
		}

		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
		printf("Device: %-15s", buffer);
		if (xst->st_mask & XSTAT_INO)
			printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
		/* The link count is governed by XSTAT_NLINK, not XSTAT_SIZE */
		if (xst->st_mask & XSTAT_NLINK)
			printf(" Links: %-5u", xst->st_nlink);
		if (xst->st_mask & XSTAT_RDEV)
			printf(" Device type: %u,%u",
			       xst->st_rdev.major, xst->st_rdev.minor);
		printf("\n");

		if (xst->st_mask & XSTAT_MODE)
			printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
			       xst->st_mode & 07777,
			       ft,
			       xst->st_mode & S_IRUSR ? 'r' : '-',
			       xst->st_mode & S_IWUSR ? 'w' : '-',
			       xst->st_mode & S_IXUSR ? 'x' : '-',
			       xst->st_mode & S_IRGRP ? 'r' : '-',
			       xst->st_mode & S_IWGRP ? 'w' : '-',
			       xst->st_mode & S_IXGRP ? 'x' : '-',
			       xst->st_mode & S_IROTH ? 'r' : '-',
			       xst->st_mode & S_IWOTH ? 'w' : '-',
			       xst->st_mode & S_IXOTH ? 'x' : '-');
		/* st_uid and st_gid are both unsigned 32-bit values */
		if (xst->st_mask & XSTAT_UID)
			printf("Uid: %u \n", xst->st_uid);
		if (xst->st_mask & XSTAT_GID)
			printf("Gid: %u\n", xst->st_gid);

		if (xst->st_mask & XSTAT_ATIME)
			print_time("Access: ", &xst->st_atim);
		if (xst->st_mask & XSTAT_MTIME)
			print_time("Modify: ", &xst->st_mtim);
		if (xst->st_mask & XSTAT_CTIME)
			print_time("Change: ", &xst->st_ctim);
		if (xst->st_mask & XSTAT_BTIME)
			print_time("Create: ", &xst->st_btim);

		if (xst->st_mask & XSTAT_GEN)
			printf("Inode version: %llxh\n",
			       (unsigned long long) xst->st_gen);
		if (xst->st_mask & XSTAT_VERSION)
			printf("Data version: %llxh\n",
			       (unsigned long long) xst->st_version);

		if (xst->st_mask & XSTAT_IOC_FLAGS) {
			unsigned char bits;
			int loop, byte;

			static char flag_representation[32 + 1] =
				/* FS_IOC_GETFLAGS flags: */
				"????????"
/* 31-24 0x00000000-ff000000 */ "????ehTD" /* 23-16 0x00000000-00ff0000 */ "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00 */ "AdaiScus" /* 7- 0 0x00000000-000000ff */ ; printf("Inode flags: %08x (", xst->st_ioc_flags); for (byte = 32 - 8; byte >= 0; byte -= 8) { bits = xst->st_ioc_flags >> byte; for (loop = 7; loop >= 0; loop--) { int bit = byte + loop; if (bits & 0x80) putchar(flag_representation[31 - bit]); else putchar('-'); bits <<= 1; } if (byte) putchar(' '); } printf(")\n"); } if (xst->st_information) { unsigned char bits; int loop, byte; static char info_representation[32 + 1] = /* XSTAT_INFO_ flags: */ "????????" /* 31-24 0x00000000-ff000000 */ "????????" /* 23-16 0x00000000-00ff0000 */ "??ASHRan" /* 15- 8 0x00000000-0000ff00 */ "dmorkfte" /* 7- 0 0x00000000-000000ff */ ; printf("Information: %08x (", xst->st_information); for (byte = 32 - 8; byte >= 0; byte -= 8) { bits = xst->st_information >> byte; for (loop = 7; loop >= 0; loop--) { int bit = byte + loop; if (bits & 0x80) putchar(info_representation[31 - bit]); else putchar('-'); bits <<= 1; } if (byte) putchar(' '); } printf(")\n"); } if (xst->st_mask & XSTAT_VOLUME_ID) { int loop; printf("Volume ID: "); for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) { printf("%02x", xst->st_volume_id[loop]); if (loop == 7) printf("-"); } printf("\n"); } } void dump_hex(unsigned long long *data, int from, int to) { unsigned offset, print_offset = 1, col = 0; from /= 8; to = (to + 7) / 8; for (offset = from; offset < to; offset++) { if (print_offset) { printf("%04x: ", offset * 8); print_offset = 0; } printf("%016llx", data[offset]); col++; if ((col & 3) == 0) { printf("\n"); print_offset = 1; } else { printf(" "); } } if (!print_offset) printf("\n"); } int main(int argc, char **argv) { struct xstat xst; int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; unsigned int mask = XSTAT_ALL_STATS; for (argv++; *argv; argv++) { if (strcmp(*argv, "-F") == 0) { atflag |= AT_FORCE_ATTR_SYNC; continue; } if (strcmp(*argv, "-L") == 
0) { atflag &= ~AT_SYMLINK_NOFOLLOW; continue; } if (strcmp(*argv, "-O") == 0) { mask &= ~XSTAT_BASIC_STATS; continue; } if (strcmp(*argv, "-A") == 0) { atflag |= AT_NO_AUTOMOUNT; continue; } if (strcmp(*argv, "-R") == 0) { raw = 1; continue; } memset(&xst, 0xbf, sizeof(xst)); ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst); printf("xstat(%s) = %d\n", *argv, ret); if (ret < 0) { perror(*argv); exit(1); } if (raw) dump_hex((unsigned long long *)&xst, 0, sizeof(xst)); dump_xstat(&xst); } return 0; } Just compile and run, passing it paths to the files you want to examine: [root@andromeda ~]# /tmp/xstat /proc/$$ xstat(/proc/2074) = 160 results=47ef Size: 0 Blocks: 0 IO Block: 1024 directory Device: 00:03 Inode: 9072 Links: 7 Access: (0555/dr-xr-xr-x) Uid: 0 Gid: 0 Access: 2010-07-14 16:50:46.609336272+0100 Modify: 2010-07-14 16:50:46.609336272+0100 Change: 2010-07-14 16:50:46.609336272+0100 Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------) [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160 results=77ef Size: 5413882 Blocks: 0 IO Block: 4096 regular file Device: 00:15 Inode: 2288 Links: 1 Access: (0644/-rw-r--r--) Uid: 75338 Gid: 0 Access: 2008-11-05 19:47:22.000000000+0000 Modify: 2008-11-05 19:47:22.000000000+0000 Change: 2008-11-05 19:47:22.000000000+0000 Inode version: 795h Data version: 2h Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------) Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- arch/x86/syscalls/syscall_32.tbl | 2 arch/x86/syscalls/syscall_64.tbl | 2 fs/stat.c | 350 +++++++++++++++++++++++++++++++++++--- include/linux/fcntl.h | 1 include/linux/fs.h | 4 include/linux/stat.h | 126 +++++++++++++- include/linux/syscalls.h | 7 + 7 files changed, 461 insertions(+), 31 
deletions(-) diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 29f9f05..980eb5a 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -355,3 +355,5 @@ 346 i386 setns sys_setns 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev +349 i386 xstat sys_xstat +350 i386 fxstat sys_fxstat diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index dd29a9e..7ae24bb 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -318,6 +318,8 @@ 309 common getcpu sys_getcpu 310 64 process_vm_readv sys_process_vm_readv 311 64 process_vm_writev sys_process_vm_writev +312 common xstat sys_xstat +313 common fxstat sys_fxstat # # x32-specific system call numbers start at 512 to avoid cache impact # for native 64-bit operation. diff --git a/fs/stat.c b/fs/stat.c index c733dc5..af3ef33 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -18,8 +18,20 @@ #include <asm/uaccess.h> #include <asm/unistd.h> +/** + * generic_fillattr - Fill in the basic attributes from the inode struct + * @inode: Inode to use as the source + * @stat: Where to fill in the attributes + * + * Fill in the basic attributes in the kstat structure from data that's to be + * found on the VFS inode structure. This is the default if no getattr inode + * operation is supplied. 
+ */ void generic_fillattr(struct inode *inode, struct kstat *stat) { + struct super_block *sb = inode->i_sb; + u32 x; + stat->dev = inode->i_sb->s_dev; stat->ino = inode->i_ino; stat->mode = inode->i_mode; @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat) stat->uid = inode->i_uid; stat->gid = inode->i_gid; stat->rdev = inode->i_rdev; - stat->size = i_size_read(inode); - stat->atime = inode->i_atime; stat->mtime = inode->i_mtime; stat->ctime = inode->i_ctime; - stat->blksize = (1 << inode->i_blkbits); + stat->size = i_size_read(inode); stat->blocks = inode->i_blocks; -} + stat->blksize = (1 << inode->i_blkbits); + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV; + if (IS_NOATIME(inode)) + stat->result_mask &= ~XSTAT_ATIME; + else + stat->atime = inode->i_atime; + + if (S_ISREG(stat->mode) && stat->nlink == 0) + stat->information |= XSTAT_INFO_TEMPORARY; + if (IS_AUTOMOUNT(inode)) + stat->information |= XSTAT_INFO_AUTOMOUNT; + if (IS_POSIXACL(inode)) + stat->information |= XSTAT_INFO_HAS_ACL; + + /* if unset, assume 1s granularity */ + stat->tv_granularity = sb->s_time_gran ?: 1000000000U; + + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode))) + stat->result_mask |= XSTAT_RDEV; + + x = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0]; + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1]; + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2]; + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3]; + if (x) + stat->result_mask |= XSTAT_VOLUME_ID; +} EXPORT_SYMBOL(generic_fillattr); -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +/** + * vfs_xgetattr - Get the basic and extra attributes of a file + * @mnt: The mountpoint to which the dentry belongs + * @dentry: The file of interest + * @stat: Where to return the statistics + * + * Ask the filesystem for a file's attributes. 
The caller must have preset + * stat->request_mask and stat->query_flags to indicate what they want. + * + * If the file is remote, the filesystem can be forced to update the attributes + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags. + * + * Bits must have been set in stat->request_mask to indicate which attributes + * the caller wants retrieving. Any such attribute not requested may be + * returned anyway, but the value may be approximate, and, if remote, may not + * have been synchronised with the server. + * + * 0 will be returned on success, and a -ve error code if unsuccessful. + */ +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry, + struct kstat *stat) { struct inode *inode = dentry->d_inode; int retval; @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) if (retval) return retval; + stat->result_mask = 0; + stat->information = 0; + stat->ioc_flags = 0; if (inode->i_op->getattr) return inode->i_op->getattr(mnt, dentry, stat); generic_fillattr(inode, stat); return 0; } +EXPORT_SYMBOL(vfs_xgetattr); +/** + * vfs_getattr - Get the basic attributes of a file + * @mnt: The mountpoint to which the dentry belongs + * @dentry: The file of interest + * @stat: Where to return the statistics + * + * Ask the filesystem for a file's attributes. If remote, the filesystem isn't + * forced to update its files from the backing store. Only the basic set of + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(), + * as must anyone who wants to force attributes to be sync'd with the server. + * + * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */ +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +{ + stat->query_flags = 0; + stat->request_mask = XSTAT_BASIC_STATS; + return vfs_xgetattr(mnt, dentry, stat); +} EXPORT_SYMBOL(vfs_getattr); -int vfs_fstat(unsigned int fd, struct kstat *stat) +/** + * vfs_fxstat - Get basic and extra attributes by file descriptor + * @fd: The file descriptor referring to the file of interest + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_xgetattr(). The main difference is + * that it uses a file descriptor to determine the file location. + * + * The caller must have preset stat->query_flags and stat->request_mask as for + * vfs_xgetattr(). + * + * 0 will be returned on success, and a -ve error code if unsuccessful. + */ +int vfs_fxstat(unsigned int fd, struct kstat *stat) { struct file *f = fget(fd); int error = -EBADF; + if (stat->query_flags & ~KSTAT_QUERY_FLAGS) + return -EINVAL; if (f) { - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat); + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat); fput(f); } return error; } +EXPORT_SYMBOL(vfs_fxstat); + +/** + * vfs_fstat - Get basic attributes by file descriptor + * @fd: The file descriptor referring to the file of interest + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_getattr(). The main difference is + * that it uses a file descriptor to determine the file location. + * + * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */ +int vfs_fstat(unsigned int fd, struct kstat *stat) +{ + stat->query_flags = 0; + stat->request_mask = XSTAT_BASIC_STATS; + return vfs_fxstat(fd, stat); +} EXPORT_SYMBOL(vfs_fstat); -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, - int flag) +/** + * vfs_xstat - Get basic and extra attributes by filename + * @dfd: A file descriptor representing the base dir for a relative filename + * @filename: The name of the file of interest + * @flags: Flags to control the query + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_xgetattr(). The main difference is + * that it uses a filename and base directory to determine the file location. + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a + * symlink at the given name from being referenced. + * + * The caller must have preset stat->request_mask as for vfs_xgetattr(). The + * flags are also used to load up stat->query_flags. + * + * 0 will be returned on success, and a -ve error code if unsuccessful. 
+ */ +int vfs_xstat(int dfd, const char __user *filename, int flags, + struct kstat *stat) { struct path path; - int error = -EINVAL; - int lookup_flags = 0; + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | - AT_EMPTY_PATH)) != 0) - goto out; + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | + AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0) + return -EINVAL; - if (!(flag & AT_SYMLINK_NOFOLLOW)) - lookup_flags |= LOOKUP_FOLLOW; - if (flag & AT_EMPTY_PATH) + if (flags & AT_SYMLINK_NOFOLLOW) + lookup_flags &= ~LOOKUP_FOLLOW; + if (flags & AT_NO_AUTOMOUNT) + lookup_flags &= ~LOOKUP_AUTOMOUNT; + if (flags & AT_EMPTY_PATH) lookup_flags |= LOOKUP_EMPTY; + stat->query_flags = flags & KSTAT_QUERY_FLAGS; error = user_path_at(dfd, filename, lookup_flags, &path); - if (error) - goto out; - - error = vfs_getattr(path.mnt, path.dentry, stat); - path_put(&path); -out: + if (!error) { + error = vfs_xgetattr(path.mnt, path.dentry, stat); + path_put(&path); + } return error; } +EXPORT_SYMBOL(vfs_xstat); + +/** + * vfs_fstatat - Get basic attributes by filename + * @dfd: A file descriptor representing the base dir for a relative filename + * @filename: The name of the file of interest + * @flags: Flags to control the query + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_xstat(). The difference is that it + * preselects basic stats only. The flags are used to load up + * stat->query_flags in addition to indicating symlink handling during path + * resolution. + * + * 0 will be returned on success, and a -ve error code if unsuccessful. 
+ */ +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, + int flags) +{ + stat->request_mask = XSTAT_BASIC_STATS; + return vfs_xstat(dfd, filename, flags, stat); +} EXPORT_SYMBOL(vfs_fstatat); -int vfs_stat(const char __user *name, struct kstat *stat) +/** + * vfs_stat - Get basic attributes by filename + * @filename: The name of the file of interest + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_xstat(). The difference is that it + * preselects basic stats only, terminal symlinks are followed regardless and a + * remote filesystem can't be forced to query the server. If such is desired, + * vfs_xstat() should be used instead. + * + * 0 will be returned on success, and a -ve error code if unsuccessful. + */ +int vfs_stat(const char __user *filename, struct kstat *stat) { - return vfs_fstatat(AT_FDCWD, name, stat, 0); + stat->request_mask = XSTAT_BASIC_STATS; + return vfs_xstat(AT_FDCWD, filename, 0, stat); } EXPORT_SYMBOL(vfs_stat); +/** + * vfs_lstat - Get basic attributes by filename, without following terminal symlink + * @filename: The name of the file of interest + * @stat: The result structure to fill in. + * + * This function is a wrapper around vfs_xstat(). The difference is that it + * preselects basic stats only, terminal symlinks are not followed regardless + * and a remote filesystem can't be forced to query the server. If such is + * desired, vfs_xstat() should be used instead. + * + * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */ int vfs_lstat(const char __user *name, struct kstat *stat) { - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW); + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat); } EXPORT_SYMBOL(vfs_lstat); @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta { static int warncount = 5; struct __old_kernel_stat tmp; - + if (warncount > 0) { warncount--; printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n", @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta #if BITS_PER_LONG == 32 if (stat->size > MAX_NON_LFS) return -EOVERFLOW; -#endif +#endif tmp.st_size = stat->size; tmp.st_atime = stat->atime.tv_sec; tmp.st_mtime = stat->mtime.tv_sec; @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) #if BITS_PER_LONG == 32 if (stat->size > MAX_NON_LFS) return -EOVERFLOW; -#endif +#endif tmp.st_size = stat->size; tmp.st_atime = stat->atime.tv_sec; tmp.st_mtime = stat->mtime.tv_sec; @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename, } #endif /* __ARCH_WANT_STAT64 */ +/* + * Get the xstat parameters if supplied + */ +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer, + struct kstat *stat) +{ + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING + + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer))) + return -EFAULT; + + stat->request_mask = mask & XSTAT_ALL_STATS; + stat->result_mask = 0; + return 0; +} + +/* + * Set the xstat results. + * + * If the buffer size was 0, we just return the size of the buffer needed to + * return the full result. + * + * If bufsize indicates a buffer of insufficient size to hold the full result, + * we return -E2BIG. + * + * Otherwise we copy the extended stats to userspace and return the amount of + * data written into the buffer (or -EFAULT). 
+ */ +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer) +{ + u32 mask = stat->result_mask, gran = stat->tv_granularity; + +#define __put_timestamp(kts, uts) ( \ + __put_user(kts.tv_sec, uts.tv_sec ) || \ + __put_user(kts.tv_nsec, uts.tv_nsec ) || \ + __put_user(gran, uts.tv_granularity )) + + /* clear out anything we're not returning */ + if (!(mask & XSTAT_IOC_FLAGS)) + stat->ioc_flags = 0; + if (!(mask & XSTAT_BTIME)) + memset(&stat->btime, 0, sizeof(stat->btime)); + if (!(mask & XSTAT_GEN)) + stat->gen = 0; + if (!(mask & XSTAT_VERSION)) + stat->version = 0; + if (!(mask & XSTAT_VOLUME_ID)) + memset(&stat->volume_id, 0, sizeof(stat->volume_id)); + + /* transfer the results */ + if (__put_user(mask, &buffer->st_mask ) || + __put_user(stat->mode, &buffer->st_mode ) || + __put_user(stat->nlink, &buffer->st_nlink ) || + __put_user(stat->uid, &buffer->st_uid ) || + __put_user(stat->gid, &buffer->st_gid ) || + __put_user(stat->information, &buffer->st_information ) || + __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) || + __put_user(stat->blksize, &buffer->st_blksize ) || + __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) || + __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) || + __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) || + __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) || + __put_timestamp(stat->atime, &buffer->st_atime ) || + __put_timestamp(stat->btime, &buffer->st_btime ) || + __put_timestamp(stat->ctime, &buffer->st_ctime ) || + __put_timestamp(stat->mtime, &buffer->st_mtime ) || + __put_user(stat->ino, &buffer->st_ino ) || + __put_user(stat->size, &buffer->st_size ) || + __put_user(stat->blocks, &buffer->st_blocks ) || + __put_user(stat->gen, &buffer->st_gen ) || + __put_user(stat->version, &buffer->st_version ) || + __copy_to_user(&buffer->st_volume_id, &stat->volume_id, + sizeof(buffer->st_volume_id) ) || + __clear_user(&buffer->__spares, sizeof(buffer->__spares))) + return -EFAULT; + return 0; +} 
+ +/* + * System call to get extended stats by path + */ +SYSCALL_DEFINE5(xstat, + int, dfd, const char __user *, filename, unsigned, flags, + unsigned int, mask, struct xstat __user *, buffer) +{ + struct kstat stat; + int error; + + error = xstat_get_params(mask, buffer, &stat); + if (error != 0) + return error; + error = vfs_xstat(dfd, filename, flags, &stat); + if (error) + return error; + return xstat_set_result(&stat, buffer); +} + +/* + * System call to get extended stats by file descriptor + */ +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags, + unsigned int, mask, struct xstat __user *, buffer) +{ + struct kstat stat; + int error; + + error = xstat_get_params(mask, buffer, &stat); + if (error < 0) + return error; + stat.query_flags = flags; + error = vfs_fxstat(fd, &stat); + if (error) + return error; + return xstat_set_result(&stat, buffer); +} + /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ void __inode_add_bytes(struct inode *inode, loff_t bytes) { diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index f550f89..faa9e5d 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -47,6 +47,7 @@ #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. 
*/ #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */ #ifdef __KERNEL__ diff --git a/include/linux/fs.h b/include/linux/fs.h index 8de6755..ec6c62e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1467,6 +1467,7 @@ struct super_block { char s_id[32]; /* Informational name */ u8 s_uuid[16]; /* UUID */ + unsigned char s_volume_id[16]; /* Volume identifier */ void *s_fs_info; /* Filesystem private info */ unsigned int s_max_links; @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations; extern int generic_readlink(struct dentry *, char __user *, int); extern void generic_fillattr(struct inode *, struct kstat *); extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *); void __inode_add_bytes(struct inode *inode, loff_t bytes); void inode_add_bytes(struct inode *inode, loff_t bytes); void inode_sub_bytes(struct inode *inode, loff_t bytes); @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *); extern int vfs_lstat(const char __user *, struct kstat *); extern int vfs_fstat(unsigned int, struct kstat *); extern int vfs_fstatat(int , const char __user *, struct kstat *, int); +extern int vfs_xstat(int, const char __user *, int, struct kstat *); +extern int vfs_xfstat(unsigned int, struct kstat *); extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, unsigned long arg); diff --git a/include/linux/stat.h b/include/linux/stat.h index 611c398..0ff561a 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -3,6 +3,7 @@ #ifdef __KERNEL__ +#include <linux/types.h> #include <asm/stat.h> #endif @@ -46,6 +47,117 @@ #endif +/* + * Query request/result mask + * + * Bits should be set in request_mask to request 
particular items when calling + * xstat() or fxstat(). + * + * The bits in st_mask may or may not be set upon return, in part depending on + * what was set in the mask argument: + * + * - if not available at all, the bit will be cleared before returning and the + * field will be cleared; otherwise, + * + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the + * server and the field and bit will be set on return; otherwise, + * + * - if explicitly requested, the datum will be synchronised to a server or + * other medium if out of date before being returned, and the bit will be set + * on return; otherwise, + * + * - if not requested, but available in approximate form without any effort, it + * will be filled in anyway, and the bit will be set upon return (it might + * not be up to date, however, and no attempt will be made to synchronise the + * internal state first); otherwise, + * + * - the field and the bit will be cleared before returning. + * + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they + * will have a value installed for compatibility purposes so that stat() and + * co. can be emulated in userspace. 
+ */ +#define XSTAT_MODE 0x00000001U /* want/got st_mode */ +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */ +#define XSTAT_UID 0x00000004U /* want/got st_uid */ +#define XSTAT_GID 0x00000008U /* want/got st_gid */ +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */ +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */ +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */ +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */ +#define XSTAT_INO 0x00000100U /* want/got st_ino */ +#define XSTAT_SIZE 0x00000200U /* want/got st_size */ +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */ +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */ +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */ +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */ +#define XSTAT_GEN 0x00002000U /* want/got st_gen */ +#define XSTAT_VERSION 0x00004000U /* want/got st_version */ +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */ +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */ + +/* + * Extended stat structures + */ +struct xstat_dev { + uint32_t major, minor; +}; + +struct xstat_time { + int64_t tv_sec; + uint32_t tv_nsec; + uint32_t tv_granularity; /* time granularity (in nS) */ +}; + +struct xstat { + uint32_t st_mask; /* what results were written */ + uint32_t st_mode; /* file mode */ + uint32_t st_nlink; /* number of hard links */ + uint32_t st_uid; /* user ID of owner */ + uint32_t st_gid; /* group ID of owner */ + uint32_t st_information; /* information about the file */ + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */ + uint32_t st_blksize; /* optimal size for filesystem I/O */ + struct xstat_dev st_rdev; /* device ID of special file */ + struct xstat_dev st_dev; /* ID of device containing file */ + struct xstat_time st_atime; /* last access time */ + struct xstat_time st_btime; /* file creation time */ + struct xstat_time st_ctime; /* last attribute change time */ + struct xstat_time 
st_mtime; /* last data modification time */ + uint64_t st_ino; /* inode number */ + uint64_t st_size; /* file size */ + uint64_t st_blocks; /* number of 512-byte blocks allocated */ + uint64_t st_gen; /* inode generation number */ + uint64_t st_version; /* data version number */ + uint8_t st_volume_id[16]; /* volume identifier */ + uint64_t __spares[11]; /* spare space for future expansion */ +}; + +/* + * Flags to be found in st_information + * + * These give information about the features or the state of a file that might + * be of use to ordinary userspace programs such as GUIs or ls rather than + * specialised tools. + * + * Additional information may be found in st_ioc_flags and we try not to + * overlap with it. + */ +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */ +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */ +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */ +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */ +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */ +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */ +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */ +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */ +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */ +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */ +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */ +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */ +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */ +#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */ + #ifdef __KERNEL__ #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO) #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO) @@ -60,6 +172,12 @@ #include <linux/time.h> struct kstat { + u32 
query_flags; /* operational flags */ +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC) + u32 request_mask; /* what fields the user asked for */ + u32 result_mask; /* what fields the user got */ + u32 information; + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */ u64 ino; dev_t dev; umode_t mode; @@ -67,14 +185,18 @@ struct kstat { uid_t uid; gid_t gid; dev_t rdev; + unsigned int tv_granularity; /* granularity of times (in nS) */ loff_t size; - struct timespec atime; + struct timespec atime; struct timespec mtime; struct timespec ctime; + struct timespec btime; /* file creation time */ unsigned long blksize; unsigned long long blocks; + u64 gen; /* inode generation */ + u64 version; /* data version */ + unsigned char volume_id[16]; /* volume identifier */ }; #endif - #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 3de3acb..ff9f8d9 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -45,6 +45,8 @@ struct shmid_ds; struct sockaddr; struct stat; struct stat64; +struct xstat_parameters; +struct xstat; struct statfs; struct statfs64; struct __sysctl_args; @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid, unsigned long riovcnt, unsigned long flags); +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags, + unsigned mask, struct xstat __user *buffer); +asmlinkage long sys_fxstat(unsigned fd, unsigned flags, + unsigned mask, struct xstat __user *buffer); + #endif ^ permalink raw reply related [flat|nested] 144+ messages in thread
* [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
@ 2012-04-19 14:06 ` David Howells
From: David Howells @ 2012-04-19 14:06 UTC
To: linux-fsdevel
Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
	wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Add a pair of system calls to make extended file stats available, including
file creation time, inode version and data version where available through the
underlying filesystem.

The idea was initially proposed as a set of xattrs that could be retrieved with
getxattr(), but the general preference proved to be for new syscalls with an
extended stat structure.

This has a number of uses:

 (1) Creation time: The SMB protocol carries the creation time, which could be
     exported by Samba, which will in turn help CIFS make use of FS-Cache as
     that can be used for coherency data.

     This is also specified in NFSv4 as a recommended attribute and could be
     exported by NFSD [Steve French].

 (2) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper].

 (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
     cached attributes are up to date [Trond Myklebust].

 (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
     Schubert].

 (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get it
     from the kstat struct if it used vfs_xgetattr() instead.

 (6) BSD stat compatibility: Including more fields from the BSD stat such as
     creation time (st_btime) and inode generation number (st_gen) [Jeremy
     Allison, Bernd Schubert].
 (7) Extra coherency data may be useful in making backups [Andreas Dilger].

 (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
     can now say it doesn't support a standard stat feature if that isn't
     available, so if, for instance, inode numbers or UIDs don't exist...

 (9) Make the fields a consistent size on all arches and make them large.

(10) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

(11) Include granularity fields in the time data to indicate the granularity of
     each of the times (NFSv4 time_delta) [Steve French].

(12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.

(13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

(14) Spare space, request flags and information flags are provided for future
     expansion.

The following structures are defined for the use of these new system calls:

	struct xstat_dev {
		uint32_t		major, minor;
	};

	struct xstat_time {
		int64_t			tv_sec;
		uint32_t		tv_nsec;
		uint32_t		tv_granularity;
	};

	struct xstat {
		uint32_t		st_mask;
		uint32_t		st_mode;
		uint32_t		st_nlink;
		uint32_t		st_uid;
		uint32_t		st_gid;
		uint32_t		st_information;
		uint32_t		st_ioc_flags;
		uint32_t		st_blksize;
		struct xstat_dev	st_rdev;
		struct xstat_dev	st_dev;
		struct xstat_time	st_atime;
		struct xstat_time	st_btime;
		struct xstat_time	st_ctime;
		struct xstat_time	st_mtime;
		uint64_t		st_ino;
		uint64_t		st_size;
		uint64_t		st_blocks;
		uint64_t		st_gen;
		uint64_t		st_version;
		uint8_t			st_volume_id[16];
		uint64_t		__spares[11];
	};

where st_information is local system information about the file, st_btime is
the file creation time, st_gen is the inode generation (i_generation),
st_version is the data version number (i_version), st_ioc_flags is the flags
from FS_IOC_GETFLAGS, st_volume_id is where the volume identifier is stored,
st_mask is a bitmask indicating the data provided and __spares[] are where
as-yet undefined fields can be placed.
The defined bits in request_mask and st_mask are:

	XSTAT_MODE		Want/got st_mode
	XSTAT_NLINK		Want/got st_nlink
	XSTAT_UID		Want/got st_uid
	XSTAT_GID		Want/got st_gid
	XSTAT_RDEV		Want/got st_rdev
	XSTAT_ATIME		Want/got st_atime
	XSTAT_MTIME		Want/got st_mtime
	XSTAT_CTIME		Want/got st_ctime
	XSTAT_INO		Want/got st_ino
	XSTAT_SIZE		Want/got st_size
	XSTAT_BLOCKS		Want/got st_blocks
	XSTAT_BASIC_STATS	[The stuff in the normal stat struct]
	XSTAT_IOC_FLAGS		Want/got FS_IOC_GETFLAGS
	XSTAT_BTIME		Want/got st_btime
	XSTAT_GEN		Want/got st_gen
	XSTAT_VERSION		Want/got st_version
	XSTAT_VOLUME_ID		Want/got st_volume_id
	XSTAT_ALL_STATS		[All currently available stuff]

The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
that might be supplied by the filesystem.  Note that Ext4 returns flags outside
of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
{EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
be suppressed?

The defined bits in the st_information field give local system data on a file,
how it is accessed, where it is and what it does:

	XSTAT_INFO_ENCRYPTED		File is encrypted
	XSTAT_INFO_TEMPORARY		File is temporary (NTFS/CIFS/deleted)
	XSTAT_INFO_FABRICATED		File was made up by filesystem
	XSTAT_INFO_KERNEL_API		File is kernel API (eg: procfs/sysfs)
	XSTAT_INFO_REMOTE		File is remote
	XSTAT_INFO_OFFLINE		File is offline (CIFS)
	XSTAT_INFO_AUTOMOUNT		Dir is automount trigger
	XSTAT_INFO_AUTODIR		Dir provides unlisted automounts
	XSTAT_INFO_NONSYSTEM_OWNERSHIP	File has non-system ownership details
	XSTAT_INFO_HAS_ACL		File has an ACL of some sort
	XSTAT_INFO_REPARSE_POINT	File is reparse point (NTFS/CIFS)
	XSTAT_INFO_HIDDEN		File is marked hidden (DOS+)
	XSTAT_INFO_SYSTEM		File is marked system (DOS+)
	XSTAT_INFO_ARCHIVE		File is marked archive (DOS+)

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.  I've tried not to provide overlap with
st_ioc_flags where something usable exists there.
Should Hidden, System and Archive flags be associated with ioc_flags, perhaps
with ioc_flags extended to 64-bits?

The system calls are:

	ssize_t ret = xstat(int dfd,
			    const char *filename,
			    unsigned int flags,
			    unsigned int mask,
			    struct xstat *buffer);

	ssize_t ret = fxstat(unsigned fd,
			     unsigned int flags,
			     unsigned int mask,
			     struct xstat *buffer);

The dfd, filename, flags and fd parameters indicate the file to query.  There
is no equivalent of lstat() as that can be emulated with xstat() by passing
AT_SYMLINK_NOFOLLOW in flags.

AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
filesystem to synchronise its attributes with the server.

mask is a bitmask indicating the fields in struct xstat that are of interest to
the caller.  The user should set this to XSTAT_BASIC_STATS to get the basic set
returned by stat().

Should there just be one xstat() syscall that does fxstat() if filename is
NULL?

The fields in struct xstat come in a number of classes:

 (0) st_dev, st_blksize, st_information.

     These are local data and are always available.

 (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size,
     st_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in st_mask will be set to indicate their presence.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server, unless as
     a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as UID
     or GID on a DOS file), then the bit won't be set in st_mask, even if the
     caller asked for the value, and the returned value will be a fabrication.

 (2) st_rdev.

     As for class (1), but this won't be returned if the file is not a blockdev
     or chardev.  The bit will be cleared if the value is not returned.
 (3) File creation time (st_btime), inode generation (st_gen), data version
     (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).

     These will be returned if available whether the caller asked for them or
     not.  The corresponding bits in st_mask will be set or cleared as
     appropriate to indicate their presence.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server, unless as
     a byproduct of updating something requested.

At the moment, this will only work on x86_64 and i386 as it requires system
calls to be wired up.


=======
TESTING
=======

The following test program can be used to test the xstat system call:

	/* Test the xstat() system call
	 *
	 * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
	 * Written by David Howells (dhowells@redhat.com)
	 *
	 * This program is free software; you can redistribute it and/or
	 * modify it under the terms of the GNU General Public Licence
	 * as published by the Free Software Foundation; either version
	 * 2 of the Licence, or (at your option) any later version.
	 */

	#define _GNU_SOURCE
	#define _ATFILE_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <time.h>
	#include <sys/syscall.h>
	#include <sys/stat.h>
	#include <sys/types.h>

	#define AT_NO_AUTOMOUNT		0x800
	#define AT_FORCE_ATTR_SYNC	0x2000

	/* These values mirror include/linux/stat.h in this patch */
	#define XSTAT_MODE		0x00000001U
	#define XSTAT_NLINK		0x00000002U
	#define XSTAT_UID		0x00000004U
	#define XSTAT_GID		0x00000008U
	#define XSTAT_RDEV		0x00000010U
	#define XSTAT_ATIME		0x00000020U
	#define XSTAT_MTIME		0x00000040U
	#define XSTAT_CTIME		0x00000080U
	#define XSTAT_INO		0x00000100U
	#define XSTAT_SIZE		0x00000200U
	#define XSTAT_BLOCKS		0x00000400U
	#define XSTAT_BASIC_STATS	0x000007ffU
	#define XSTAT_IOC_FLAGS		0x00000800U
	#define XSTAT_BTIME		0x00001000U
	#define XSTAT_GEN		0x00002000U
	#define XSTAT_VERSION		0x00004000U
	#define XSTAT_VOLUME_ID		0x00008000U
	#define XSTAT_ALL_STATS		0x0000ffffU

	struct xstat_dev {
		uint32_t	major;
		uint32_t	minor;
	};

	struct xstat_time {
		int64_t		tv_sec;
		uint32_t	tv_nsec;
		uint32_t	tv_granularity;
	};

	/* Must have the same layout as the kernel's struct xstat */
	struct xstat {
		uint32_t		st_mask;
		uint32_t		st_mode;
		uint32_t		st_nlink;
		uint32_t		st_uid;
		uint32_t		st_gid;
		uint32_t		st_information;
		uint32_t		st_ioc_flags;
		uint32_t		st_blksize;
		struct xstat_dev	st_rdev;
		struct xstat_dev	st_dev;
		struct xstat_time	st_atim;
		struct xstat_time	st_btim;
		struct xstat_time	st_ctim;
		struct xstat_time	st_mtim;
		uint64_t		st_ino;
		uint64_t		st_size;
		uint64_t		st_blocks;
		uint64_t		st_gen;
		uint64_t		st_version;
		uint8_t			st_volume_id[16];
		uint64_t		st_spares[11];
	};

	#define XSTAT_INFO_ENCRYPTED		0x00000001U
	#define XSTAT_INFO_TEMPORARY		0x00000002U
	#define XSTAT_INFO_FABRICATED		0x00000004U
	#define XSTAT_INFO_KERNEL_API		0x00000008U
	#define XSTAT_INFO_REMOTE		0x00000010U
	#define XSTAT_INFO_OFFLINE		0x00000020U
	#define XSTAT_INFO_AUTOMOUNT		0x00000040U
	#define XSTAT_INFO_AUTODIR		0x00000080U
	#define XSTAT_INFO_NONSYSTEM_OWNERSHIP	0x00000100U
	#define XSTAT_INFO_HAS_ACL		0x00000200U
	#define XSTAT_INFO_REPARSE_POINT	0x00000400U
	#define XSTAT_INFO_HIDDEN		0x00000800U
	#define XSTAT_INFO_SYSTEM		0x00001000U
	#define XSTAT_INFO_ARCHIVE		0x00002000U

	#define __NR_xstat		312
	#define __NR_fxstat		313

	static __attribute__((unused))
	ssize_t xstat(int dfd, const char *filename, unsigned flags,
		      unsigned int mask, struct xstat *buffer)
	{
		return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
	}

	static __attribute__((unused))
	ssize_t fxstat(int fd, unsigned flags,
		       unsigned int mask, struct xstat *buffer)
	{
		return syscall(__NR_fxstat, fd, flags, mask, buffer);
	}

	static void print_time(const char *field, const struct xstat_time *xstm)
	{
		struct tm tm;
		time_t tim;
		char buffer[100];
		int len;

		tim = xstm->tv_sec;
		if (!localtime_r(&tim, &tm)) {
			perror("localtime_r");
			exit(1);
		}
		len = strftime(buffer, 100, "%F %T", &tm);
		if (len == 0) {
			perror("strftime");
			exit(1);
		}
		printf("%s", field);
		fwrite(buffer, 1, len, stdout);
		printf(".%09u", xstm->tv_nsec);
		len = strftime(buffer, 100, "%z", &tm);
		if (len == 0) {
			perror("strftime2");
			exit(1);
		}
		fwrite(buffer, 1, len, stdout);
		printf("\n");
	}

	static void dump_xstat(struct xstat *xst)
	{
		char buffer[256], ft = '?';

		printf("results=%x\n", xst->st_mask);

		printf(" ");
		if (xst->st_mask & XSTAT_SIZE)
			printf(" Size: %-15llu", (unsigned long long) xst->st_size);
		if (xst->st_mask & XSTAT_BLOCKS)
			printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
		printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
		if (xst->st_mask & XSTAT_MODE) {
			switch (xst->st_mode & S_IFMT) {
			case S_IFIFO:	printf(" FIFO\n");			ft = 'p'; break;
			case S_IFCHR:	printf(" character special file\n");	ft = 'c'; break;
			case S_IFDIR:	printf(" directory\n");			ft = 'd'; break;
			case S_IFBLK:	printf(" block special file\n");	ft = 'b'; break;
			case S_IFREG:	printf(" regular file\n");		ft = '-'; break;
			case S_IFLNK:	printf(" symbolic link\n");		ft = 'l'; break;
			case S_IFSOCK:	printf(" socket\n");			ft = 's'; break;
			default:
				printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
				ft = '?';
				break;
			}
		}

		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
		printf("Device: %-15s", buffer);
		if (xst->st_mask & XSTAT_INO)
			printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
		if (xst->st_mask & XSTAT_NLINK)
			printf(" Links: %-5u", xst->st_nlink);
		if (xst->st_mask & XSTAT_RDEV)
			printf(" Device type: %u,%u",
			       xst->st_rdev.major, xst->st_rdev.minor);
		printf("\n");

		if (xst->st_mask & XSTAT_MODE)
			printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
			       xst->st_mode & 07777,
			       ft,
			       xst->st_mode & S_IRUSR ? 'r' : '-',
			       xst->st_mode & S_IWUSR ? 'w' : '-',
			       xst->st_mode & S_IXUSR ? 'x' : '-',
			       xst->st_mode & S_IRGRP ? 'r' : '-',
			       xst->st_mode & S_IWGRP ? 'w' : '-',
			       xst->st_mode & S_IXGRP ? 'x' : '-',
			       xst->st_mode & S_IROTH ? 'r' : '-',
			       xst->st_mode & S_IWOTH ? 'w' : '-',
			       xst->st_mode & S_IXOTH ? 'x' : '-');
		if (xst->st_mask & XSTAT_UID)
			printf("Uid: %u \n", xst->st_uid);
		if (xst->st_mask & XSTAT_GID)
			printf("Gid: %u\n", xst->st_gid);

		if (xst->st_mask & XSTAT_ATIME)
			print_time("Access: ", &xst->st_atim);
		if (xst->st_mask & XSTAT_MTIME)
			print_time("Modify: ", &xst->st_mtim);
		if (xst->st_mask & XSTAT_CTIME)
			print_time("Change: ", &xst->st_ctim);
		if (xst->st_mask & XSTAT_BTIME)
			print_time("Create: ", &xst->st_btim);

		if (xst->st_mask & XSTAT_GEN)
			printf("Inode version: %llxh\n",
			       (unsigned long long) xst->st_gen);
		if (xst->st_mask & XSTAT_VERSION)
			printf("Data version: %llxh\n",
			       (unsigned long long) xst->st_version);

		if (xst->st_mask & XSTAT_IOC_FLAGS) {
			unsigned char bits;
			int loop, byte;

			static char flag_representation[32 + 1] =
				/* FS_IOC_GETFLAGS flags: */
				"????????"	/* 31-24	0x00000000-ff000000 */
				"????ehTD"	/* 23-16	0x00000000-00ff0000 */
				"tj?IE?XZ"	/* 15- 8	0x00000000-0000ff00 */
				"AdaiScus"	/*  7- 0	0x00000000-000000ff */
				;

			printf("Inode flags: %08x (", xst->st_ioc_flags);
			for (byte = 32 - 8; byte >= 0; byte -= 8) {
				bits = xst->st_ioc_flags >> byte;
				for (loop = 7; loop >= 0; loop--) {
					int bit = byte + loop;

					if (bits & 0x80)
						putchar(flag_representation[31 - bit]);
					else
						putchar('-');
					bits <<= 1;
				}
				if (byte)
					putchar(' ');
			}
			printf(")\n");
		}

		if (xst->st_information) {
			unsigned char bits;
			int loop, byte;

			static char info_representation[32 + 1] =
				/* XSTAT_INFO_ flags: */
				"????????"	/* 31-24	0x00000000-ff000000 */
				"????????"	/* 23-16	0x00000000-00ff0000 */
				"??ASHRan"	/* 15- 8	0x00000000-0000ff00 */
				"dmorkfte"	/*  7- 0	0x00000000-000000ff */
				;

			printf("Information: %08x (", xst->st_information);
			for (byte = 32 - 8; byte >= 0; byte -= 8) {
				bits = xst->st_information >> byte;
				for (loop = 7; loop >= 0; loop--) {
					int bit = byte + loop;

					if (bits & 0x80)
						putchar(info_representation[31 - bit]);
					else
						putchar('-');
					bits <<= 1;
				}
				if (byte)
					putchar(' ');
			}
			printf(")\n");
		}

		if (xst->st_mask & XSTAT_VOLUME_ID) {
			int loop;

			printf("Volume ID: ");
			for (loop = 0; loop < (int) sizeof(xst->st_volume_id); loop++) {
				printf("%02x", xst->st_volume_id[loop]);
				if (loop == 7)
					printf("-");
			}
			printf("\n");
		}
	}

	void dump_hex(unsigned long long *data, int from, int to)
	{
		unsigned offset, print_offset = 1, col = 0;

		from /= 8;
		to = (to + 7) / 8;

		for (offset = from; offset < to; offset++) {
			if (print_offset) {
				printf("%04x: ", offset * 8);
				print_offset = 0;
			}
			printf("%016llx", data[offset]);
			col++;
			if ((col & 3) == 0) {
				printf("\n");
				print_offset = 1;
			} else {
				printf(" ");
			}
		}

		if (!print_offset)
			printf("\n");
	}

	int main(int argc, char **argv)
	{
		struct xstat xst;
		int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;

		unsigned int mask = XSTAT_ALL_STATS;

		for (argv++; *argv; argv++) {
			if (strcmp(*argv, "-F") == 0) {
				atflag |= AT_FORCE_ATTR_SYNC;
				continue;
			}
			if (strcmp(*argv, "-L") == 0) {
				atflag &= ~AT_SYMLINK_NOFOLLOW;
				continue;
			}
			if (strcmp(*argv, "-O") == 0) {
				mask &= ~XSTAT_BASIC_STATS;
				continue;
			}
			if (strcmp(*argv, "-A") == 0) {
				atflag |= AT_NO_AUTOMOUNT;
				continue;
			}
			if (strcmp(*argv, "-R") == 0) {
				raw = 1;
				continue;
			}

			memset(&xst, 0xbf, sizeof(xst));
			ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst);
			printf("xstat(%s) = %d\n", *argv, ret);
			if (ret < 0) {
				perror(*argv);
				exit(1);
			}

			if (raw)
				dump_hex((unsigned long long *)&xst, 0, sizeof(xst));

			dump_xstat(&xst);
		}
		return 0;
	}

Just compile and run, passing it paths to the files you want to examine:

	[root@andromeda ~]# /tmp/xstat /proc/$$
	xstat(/proc/2074) = 160
	results=47ef
	  Size: 0 Blocks: 0 IO Block: 1024 directory
	Device: 00:03 Inode: 9072 Links: 7
	Access: (0555/dr-xr-xr-x) Uid: 0 Gid: 0
	Access: 2010-07-14 16:50:46.609336272+0100
	Modify: 2010-07-14 16:50:46.609336272+0100
	Change: 2010-07-14 16:50:46.609336272+0100
	Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------)

	[root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm
	xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160
	results=77ef
	  Size: 5413882 Blocks: 0 IO Block: 4096 regular file
	Device: 00:15 Inode: 2288 Links: 1
	Access: (0644/-rw-r--r--) Uid: 75338 Gid: 0
	Access: 2008-11-05 19:47:22.000000000+0000
	Modify: 2008-11-05 19:47:22.000000000+0000
	Change: 2008-11-05 19:47:22.000000000+0000
	Inode version: 795h
	Data version: 2h
	Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------)

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/x86/syscalls/syscall_32.tbl |    2 
 arch/x86/syscalls/syscall_64.tbl |    2 
 fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
 include/linux/fcntl.h            |    1 
 include/linux/fs.h               |    4 
 include/linux/stat.h             |  126 +++++++++++++-
 include/linux/syscalls.h         |    7 +
 7 files changed, 461 insertions(+), 31 deletions(-)

diff --git
a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 29f9f05..980eb5a 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -355,3 +355,5 @@
 346	i386	setns			sys_setns
 347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+349	i386	xstat			sys_xstat
+350	i386	fxstat			sys_fxstat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index dd29a9e..7ae24bb 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -318,6 +318,8 @@
 309	common	getcpu			sys_getcpu
 310	64	process_vm_readv	sys_process_vm_readv
 311	64	process_vm_writev	sys_process_vm_writev
+312	common	xstat			sys_xstat
+313	common	fxstat			sys_fxstat
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
diff --git a/fs/stat.c b/fs/stat.c
index c733dc5..af3ef33 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -18,8 +18,20 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
+/**
+ * generic_fillattr - Fill in the basic attributes from the inode struct
+ * @inode: Inode to use as the source
+ * @stat: Where to fill in the attributes
+ *
+ * Fill in the basic attributes in the kstat structure from data that's to be
+ * found on the VFS inode structure.  This is the default if no getattr inode
+ * operation is supplied.
+ */
 void generic_fillattr(struct inode *inode, struct kstat *stat)
 {
+	struct super_block *sb = inode->i_sb;
+	u32 x;
+
 	stat->dev = inode->i_sb->s_dev;
 	stat->ino = inode->i_ino;
 	stat->mode = inode->i_mode;
@@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
 	stat->uid = inode->i_uid;
 	stat->gid = inode->i_gid;
 	stat->rdev = inode->i_rdev;
-	stat->size = i_size_read(inode);
-	stat->atime = inode->i_atime;
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
-	stat->blksize = (1 << inode->i_blkbits);
+	stat->size = i_size_read(inode);
 	stat->blocks = inode->i_blocks;
-}
+	stat->blksize = (1 << inode->i_blkbits);
 
+	stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV;
+	if (IS_NOATIME(inode))
+		stat->result_mask &= ~XSTAT_ATIME;
+	else
+		stat->atime = inode->i_atime;
+
+	if (S_ISREG(stat->mode) && stat->nlink == 0)
+		stat->information |= XSTAT_INFO_TEMPORARY;
+	if (IS_AUTOMOUNT(inode))
+		stat->information |= XSTAT_INFO_AUTOMOUNT;
+	if (IS_POSIXACL(inode))
+		stat->information |= XSTAT_INFO_HAS_ACL;
+
+	/* if unset, assume 1s granularity */
+	stat->tv_granularity = sb->s_time_gran ?: 1000000000U;
+
+	if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode)))
+		stat->result_mask |= XSTAT_RDEV;
+
+	x  = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0];
+	x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1];
+	x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2];
+	x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3];
+	if (x)
+		stat->result_mask |= XSTAT_VOLUME_ID;
+}
 EXPORT_SYMBOL(generic_fillattr);
 
-int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+/**
+ * vfs_xgetattr - Get the basic and extra attributes of a file
+ * @mnt: The mountpoint to which the dentry belongs
+ * @dentry: The file of interest
+ * @stat: Where to return the statistics
+ *
+ * Ask the filesystem for a file's attributes.
+ * The caller must have preset
+ * stat->request_mask and stat->query_flags to indicate what they want.
+ *
+ * If the file is remote, the filesystem can be forced to update the attributes
+ * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags.
+ *
+ * Bits must have been set in stat->request_mask to indicate which attributes
+ * the caller wants retrieving.  Any such attribute not requested may be
+ * returned anyway, but the value may be approximate, and, if remote, may not
+ * have been synchronised with the server.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry,
+		 struct kstat *stat)
 {
 	struct inode *inode = dentry->d_inode;
 	int retval;
 
@@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	if (retval)
 		return retval;
 
+	stat->result_mask = 0;
+	stat->information = 0;
+	stat->ioc_flags = 0;
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(mnt, dentry, stat);
 
 	generic_fillattr(inode, stat);
 	return 0;
 }
+EXPORT_SYMBOL(vfs_xgetattr);
 
+/**
+ * vfs_getattr - Get the basic attributes of a file
+ * @mnt: The mountpoint to which the dentry belongs
+ * @dentry: The file of interest
+ * @stat: Where to return the statistics
+ *
+ * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
+ * forced to update its files from the backing store.  Only the basic set of
+ * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(),
+ * as must anyone who wants to force attributes to be sync'd with the server.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+	stat->query_flags = 0;
+	stat->request_mask = XSTAT_BASIC_STATS;
+	return vfs_xgetattr(mnt, dentry, stat);
+}
 EXPORT_SYMBOL(vfs_getattr);
 
-int vfs_fstat(unsigned int fd, struct kstat *stat)
+/**
+ * vfs_fxstat - Get basic and extra attributes by file descriptor
+ * @fd: The file descriptor referring to the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xgetattr().  The main difference is
+ * that it uses a file descriptor to determine the file location.
+ *
+ * The caller must have preset stat->query_flags and stat->request_mask as for
+ * vfs_xgetattr().
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fxstat(unsigned int fd, struct kstat *stat)
 {
 	struct file *f = fget(fd);
 	int error = -EBADF;
 
+	if (stat->query_flags & ~KSTAT_QUERY_FLAGS)
+		return -EINVAL;
 	if (f) {
-		error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+		error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat);
 		fput(f);
 	}
 	return error;
 }
+EXPORT_SYMBOL(vfs_fxstat);
+
+/**
+ * vfs_fstat - Get basic attributes by file descriptor
+ * @fd: The file descriptor referring to the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_getattr().  The main difference is
+ * that it uses a file descriptor to determine the file location.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fstat(unsigned int fd, struct kstat *stat)
+{
+	stat->query_flags = 0;
+	stat->request_mask = XSTAT_BASIC_STATS;
+	return vfs_fxstat(fd, stat);
+}
 EXPORT_SYMBOL(vfs_fstat);
 
-int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
-		int flag)
+/**
+ * vfs_xstat - Get basic and extra attributes by filename
+ * @dfd: A file descriptor representing the base dir for a relative filename
+ * @filename: The name of the file of interest
+ * @flags: Flags to control the query
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xgetattr().  The main difference is
+ * that it uses a filename and base directory to determine the file location.
+ * Additionally, setting AT_SYMLINK_NOFOLLOW in flags will prevent a
+ * symlink at the given name from being dereferenced.
+ *
+ * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
+ * flags are also used to load up stat->query_flags.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_xstat(int dfd, const char __user *filename, int flags,
+	      struct kstat *stat)
 {
 	struct path path;
-	int error = -EINVAL;
-	int lookup_flags = 0;
+	int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
 
-	if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
-		      AT_EMPTY_PATH)) != 0)
-		goto out;
+	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
+		       AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
+		return -EINVAL;
 
-	if (!(flag & AT_SYMLINK_NOFOLLOW))
-		lookup_flags |= LOOKUP_FOLLOW;
-	if (flag & AT_EMPTY_PATH)
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (flags & AT_EMPTY_PATH)
 		lookup_flags |= LOOKUP_EMPTY;
+	stat->query_flags = flags & KSTAT_QUERY_FLAGS;
 
 	error = user_path_at(dfd, filename, lookup_flags, &path);
-	if (error)
-		goto out;
-
-	error = vfs_getattr(path.mnt, path.dentry, stat);
-	path_put(&path);
-out:
+	if (!error) {
+		error = vfs_xgetattr(path.mnt, path.dentry, stat);
+		path_put(&path);
+	}
 	return error;
 }
+EXPORT_SYMBOL(vfs_xstat);
+
+/**
+ * vfs_fstatat - Get basic attributes by filename
+ * @dfd: A file descriptor representing the base dir for a relative filename
+ * @filename: The name of the file of interest
+ * @flags: Flags to control the query
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only.  The flags are used to load up
+ * stat->query_flags in addition to indicating symlink handling during path
+ * resolution.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
+		int flags)
+{
+	stat->request_mask = XSTAT_BASIC_STATS;
+	return vfs_xstat(dfd, filename, flags, stat);
+}
 EXPORT_SYMBOL(vfs_fstatat);
 
-int vfs_stat(const char __user *name, struct kstat *stat)
+/**
+ * vfs_stat - Get basic attributes by filename
+ * @filename: The name of the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only, terminal symlinks are followed regardless and a
+ * remote filesystem can't be forced to query the server.  If such is desired,
+ * vfs_xstat() should be used instead.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_stat(const char __user *filename, struct kstat *stat)
 {
-	return vfs_fstatat(AT_FDCWD, name, stat, 0);
+	stat->request_mask = XSTAT_BASIC_STATS;
+	return vfs_xstat(AT_FDCWD, filename, 0, stat);
 }
 EXPORT_SYMBOL(vfs_stat);
 
+/**
+ * vfs_lstat - Get basic attributes by filename, without following terminal symlink
+ * @name: The name of the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only, terminal symlinks are not followed regardless
+ * and a remote filesystem can't be forced to query the server.  If such is
+ * desired, vfs_xstat() should be used instead.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
 int vfs_lstat(const char __user *name, struct kstat *stat)
 {
-	return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
+	return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
 }
 EXPORT_SYMBOL(vfs_lstat);
 
@@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
 {
 	static int warncount = 5;
 	struct __old_kernel_stat tmp;
-
+
 	if (warncount > 0) {
 		warncount--;
 		printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
@@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
 #if BITS_PER_LONG == 32
 	if (stat->size > MAX_NON_LFS)
 		return -EOVERFLOW;
-#endif
+#endif
 	tmp.st_size = stat->size;
 	tmp.st_atime = stat->atime.tv_sec;
 	tmp.st_mtime = stat->mtime.tv_sec;
@@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
 #if BITS_PER_LONG == 32
 	if (stat->size > MAX_NON_LFS)
 		return -EOVERFLOW;
-#endif
+#endif
 	tmp.st_size = stat->size;
 	tmp.st_atime = stat->atime.tv_sec;
 	tmp.st_mtime = stat->mtime.tv_sec;
@@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
 }
 #endif /* __ARCH_WANT_STAT64 */
 
+/*
+ * Get the xstat parameters if supplied
+ */
+static int xstat_get_params(unsigned int mask, struct xstat __user *buffer,
+			    struct kstat *stat)
+{
+	memset(stat, 0xde, sizeof(*stat)); // DEBUGGING
+
+	if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
+		return -EFAULT;
+
+	stat->request_mask = mask & XSTAT_ALL_STATS;
+	stat->result_mask = 0;
+	return 0;
+}
+
+/*
+ * Set the xstat results.
+ *
+ * If the buffer size was 0, we just return the size of the buffer needed to
+ * return the full result.
+ *
+ * If bufsize indicates a buffer of insufficient size to hold the full result,
+ * we return -E2BIG.
+ *
+ * Otherwise we copy the extended stats to userspace and return the amount of
+ * data written into the buffer (or -EFAULT).
+ */
+static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer)
+{
+	u32 mask = stat->result_mask, gran = stat->tv_granularity;
+
+#define __put_timestamp(kts, uts) (				\
+		__put_user(kts.tv_sec,	uts.tv_sec		) ||	\
+		__put_user(kts.tv_nsec,	uts.tv_nsec		) ||	\
+		__put_user(gran,	uts.tv_granularity	))
+
+	/* clear out anything we're not returning */
+	if (!(mask & XSTAT_IOC_FLAGS))
+		stat->ioc_flags = 0;
+	if (!(mask & XSTAT_BTIME))
+		memset(&stat->btime, 0, sizeof(stat->btime));
+	if (!(mask & XSTAT_GEN))
+		stat->gen = 0;
+	if (!(mask & XSTAT_VERSION))
+		stat->version = 0;
+	if (!(mask & XSTAT_VOLUME_ID))
+		memset(&stat->volume_id, 0, sizeof(stat->volume_id));
+
+	/* transfer the results */
+	if (__put_user(mask,			&buffer->st_mask	) ||
+	    __put_user(stat->mode,		&buffer->st_mode	) ||
+	    __put_user(stat->nlink,		&buffer->st_nlink	) ||
+	    __put_user(stat->uid,		&buffer->st_uid		) ||
+	    __put_user(stat->gid,		&buffer->st_gid		) ||
+	    __put_user(stat->information,	&buffer->st_information	) ||
+	    __put_user(stat->ioc_flags,		&buffer->st_ioc_flags	) ||
+	    __put_user(stat->blksize,		&buffer->st_blksize	) ||
+	    __put_user(MAJOR(stat->rdev),	&buffer->st_rdev.major	) ||
+	    __put_user(MINOR(stat->rdev),	&buffer->st_rdev.minor	) ||
+	    __put_user(MAJOR(stat->dev),	&buffer->st_dev.major	) ||
+	    __put_user(MINOR(stat->dev),	&buffer->st_dev.minor	) ||
+	    __put_timestamp(stat->atime,	&buffer->st_atime	) ||
+	    __put_timestamp(stat->btime,	&buffer->st_btime	) ||
+	    __put_timestamp(stat->ctime,	&buffer->st_ctime	) ||
+	    __put_timestamp(stat->mtime,	&buffer->st_mtime	) ||
+	    __put_user(stat->ino,		&buffer->st_ino		) ||
+	    __put_user(stat->size,		&buffer->st_size	) ||
+	    __put_user(stat->blocks,		&buffer->st_blocks	) ||
+	    __put_user(stat->gen,		&buffer->st_gen		) ||
+	    __put_user(stat->version,		&buffer->st_version	) ||
+	    __copy_to_user(&buffer->st_volume_id, &stat->volume_id,
+			   sizeof(buffer->st_volume_id)			) ||
+	    __clear_user(&buffer->__spares, sizeof(buffer->__spares)))
+		return -EFAULT;
+	return 0;
+}
+
+/*
+ * System call to get extended stats by path
+ */
+SYSCALL_DEFINE5(xstat,
+		int, dfd, const char __user *, filename, unsigned, flags,
+		unsigned int, mask, struct xstat __user *, buffer)
+{
+	struct kstat stat;
+	int error;
+
+	error = xstat_get_params(mask, buffer, &stat);
+	if (error != 0)
+		return error;
+	error = vfs_xstat(dfd, filename, flags, &stat);
+	if (error)
+		return error;
+	return xstat_set_result(&stat, buffer);
+}
+
+/*
+ * System call to get extended stats by file descriptor
+ */
+SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags,
+		unsigned int, mask, struct xstat __user *, buffer)
+{
+	struct kstat stat;
+	int error;
+
+	error = xstat_get_params(mask, buffer, &stat);
+	if (error < 0)
+		return error;
+	stat.query_flags = flags;
+	error = vfs_fxstat(fd, &stat);
+	if (error)
+		return error;
+	return xstat_set_result(&stat, buffer);
+}
+
 /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
 void __inode_add_bytes(struct inode *inode, loff_t bytes)
 {
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index f550f89..faa9e5d 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -47,6 +47,7 @@
 #define AT_SYMLINK_FOLLOW	0x400   /* Follow symbolic links.
 */
 #define AT_NO_AUTOMOUNT		0x800	/* Suppress terminal automount traversal */
 #define AT_EMPTY_PATH		0x1000	/* Allow empty relative pathname */
+#define AT_FORCE_ATTR_SYNC	0x2000	/* Force the attributes to be sync'd with the server */
 
 #ifdef __KERNEL__
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8de6755..ec6c62e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1467,6 +1467,7 @@ struct super_block {
 
 	char s_id[32];				/* Informational name */
 	u8 s_uuid[16];				/* UUID */
+	unsigned char s_volume_id[16];		/* Volume identifier */
 
 	void 			*s_fs_info;	/* Filesystem private info */
 	unsigned int		s_max_links;
@@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations;
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *);
 void __inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_sub_bytes(struct inode *inode, loff_t bytes);
@@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *);
 extern int vfs_lstat(const char __user *, struct kstat *);
 extern int vfs_fstat(unsigned int, struct kstat *);
 extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
+extern int vfs_xstat(int, const char __user *, int, struct kstat *);
+extern int vfs_fxstat(unsigned int, struct kstat *);
 
 extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 		    unsigned long arg);
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 611c398..0ff561a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -3,6 +3,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/types.h>
 #include <asm/stat.h>
 
 #endif
@@ -46,6 +47,117 @@
 
 #endif
 
+/*
+ * Query request/result mask
+ *
+ * Bits should be set in request_mask to request
+ * particular items when calling xstat() or fxstat().
+ *
+ * The bits in st_mask may or may not be set upon return, in part depending on
+ * what was set in the mask argument:
+ *
+ * - if not available at all, the bit will be cleared before returning and the
+ *   field will be cleared; otherwise,
+ *
+ * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the
+ *   server and the field and bit will be set on return; otherwise,
+ *
+ * - if explicitly requested, the datum will be synchronised to a server or
+ *   other medium if out of date before being returned, and the bit will be set
+ *   on return; otherwise,
+ *
+ * - if not requested, but available in approximate form without any effort, it
+ *   will be filled in anyway, and the bit will be set upon return (it might
+ *   not be up to date, however, and no attempt will be made to synchronise the
+ *   internal state first); otherwise,
+ *
+ * - the field and the bit will be cleared before returning.
+ *
+ * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they
+ * will have a value installed for compatibility purposes so that stat() and
+ * co. can be emulated in userspace.
+ */
+#define XSTAT_MODE		0x00000001U	/* want/got st_mode */
+#define XSTAT_NLINK		0x00000002U	/* want/got st_nlink */
+#define XSTAT_UID		0x00000004U	/* want/got st_uid */
+#define XSTAT_GID		0x00000008U	/* want/got st_gid */
+#define XSTAT_RDEV		0x00000010U	/* want/got st_rdev */
+#define XSTAT_ATIME		0x00000020U	/* want/got st_atime */
+#define XSTAT_MTIME		0x00000040U	/* want/got st_mtime */
+#define XSTAT_CTIME		0x00000080U	/* want/got st_ctime */
+#define XSTAT_INO		0x00000100U	/* want/got st_ino */
+#define XSTAT_SIZE		0x00000200U	/* want/got st_size */
+#define XSTAT_BLOCKS		0x00000400U	/* want/got st_blocks */
+#define XSTAT_BASIC_STATS	0x000007ffU	/* the stuff in the normal stat struct */
+#define XSTAT_IOC_FLAGS		0x00000800U	/* want/got FS_IOC_GETFLAGS */
+#define XSTAT_BTIME		0x00001000U	/* want/got st_btime */
+#define XSTAT_GEN		0x00002000U	/* want/got st_gen */
+#define XSTAT_VERSION		0x00004000U	/* want/got st_version */
+#define XSTAT_VOLUME_ID		0x00008000U	/* want/got st_volume_id */
+#define XSTAT_ALL_STATS		0x0000ffffU	/* all supported stats */
+
+/*
+ * Extended stat structures
+ */
+struct xstat_dev {
+	uint32_t		major, minor;
+};
+
+struct xstat_time {
+	int64_t			tv_sec;
+	uint32_t		tv_nsec;
+	uint32_t		tv_granularity;	/* time granularity (in nS) */
+};
+
+struct xstat {
+	uint32_t		st_mask;	/* what results were written */
+	uint32_t		st_mode;	/* file mode */
+	uint32_t		st_nlink;	/* number of hard links */
+	uint32_t		st_uid;		/* user ID of owner */
+	uint32_t		st_gid;		/* group ID of owner */
+	uint32_t		st_information;	/* information about the file */
+	uint32_t		st_ioc_flags;	/* as FS_IOC_GETFLAGS */
+	uint32_t		st_blksize;	/* optimal size for filesystem I/O */
+	struct xstat_dev	st_rdev;	/* device ID of special file */
+	struct xstat_dev	st_dev;		/* ID of device containing file */
+	struct xstat_time	st_atime;	/* last access time */
+	struct xstat_time	st_btime;	/* file creation time */
+	struct xstat_time	st_ctime;	/* last attribute change time */
+	struct xstat_time
+				st_mtime;	/* last data modification time */
+	uint64_t		st_ino;		/* inode number */
+	uint64_t		st_size;	/* file size */
+	uint64_t		st_blocks;	/* number of 512-byte blocks allocated */
+	uint64_t		st_gen;		/* inode generation number */
+	uint64_t		st_version;	/* data version number */
+	uint8_t			st_volume_id[16]; /* volume identifier */
+	uint64_t		__spares[11];	/* spare space for future expansion */
+};
+
+/*
+ * Flags to be found in st_information
+ *
+ * These give information about the features or the state of a file that might
+ * be of use to ordinary userspace programs such as GUIs or ls rather than
+ * specialised tools.
+ *
+ * Additional information may be found in st_ioc_flags and we try not to
+ * overlap with it.
+ */
+#define XSTAT_INFO_ENCRYPTED		0x00000001U /* File is encrypted */
+#define XSTAT_INFO_TEMPORARY		0x00000002U /* File is temporary (NTFS/CIFS) */
+#define XSTAT_INFO_FABRICATED		0x00000004U /* File was made up by filesystem */
+#define XSTAT_INFO_KERNEL_API		0x00000008U /* File is kernel API (eg: procfs/sysfs) */
+#define XSTAT_INFO_REMOTE		0x00000010U /* File is remote */
+#define XSTAT_INFO_OFFLINE		0x00000020U /* File is offline (CIFS) */
+#define XSTAT_INFO_AUTOMOUNT		0x00000040U /* Dir is automount trigger */
+#define XSTAT_INFO_AUTODIR		0x00000080U /* Dir provides unlisted automounts */
+#define XSTAT_INFO_NONSYSTEM_OWNERSHIP	0x00000100U /* File has non-system ownership details */
+#define XSTAT_INFO_HAS_ACL		0x00000200U /* File has an ACL of some sort */
+#define XSTAT_INFO_REPARSE_POINT	0x00000400U /* File is reparse point (NTFS/CIFS) */
+#define XSTAT_INFO_HIDDEN		0x00000800U /* File is marked hidden (DOS+) */
+#define XSTAT_INFO_SYSTEM		0x00001000U /* File is marked system (DOS+) */
+#define XSTAT_INFO_ARCHIVE		0x00002000U /* File is marked archive (DOS+) */
+
 #ifdef __KERNEL__
 #define S_IRWXUGO	(S_IRWXU|S_IRWXG|S_IRWXO)
 #define S_IALLUGO	(S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
@@ -60,6 +172,12 @@
 #include <linux/time.h>
 
 struct kstat {
+	u32
+			query_flags;		/* operational flags */
+#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC)
+	u32		request_mask;		/* what fields the user asked for */
+	u32		result_mask;		/* what fields the user got */
+	u32		information;
+	u32		ioc_flags;		/* inode flags (FS_IOC_GETFLAGS) */
 	u64		ino;
 	dev_t		dev;
 	umode_t		mode;
@@ -67,14 +185,18 @@ struct kstat {
 	uid_t		uid;
 	gid_t		gid;
 	dev_t		rdev;
+	unsigned int	tv_granularity;		/* granularity of times (in nS) */
 	loff_t		size;
-	struct timespec  atime;
+	struct timespec	atime;
 	struct timespec	mtime;
 	struct timespec	ctime;
+	struct timespec	btime;			/* file creation time */
 	unsigned long	blksize;
 	unsigned long long	blocks;
+	u64		gen;			/* inode generation */
+	u64		version;		/* data version */
+	unsigned char	volume_id[16];		/* volume identifier */
 };
 
 #endif
-
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3de3acb..ff9f8d9 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -45,6 +45,8 @@ struct shmid_ds;
 struct sockaddr;
 struct stat;
 struct stat64;
+struct xstat_parameters;
+struct xstat;
 struct statfs;
 struct statfs64;
 struct __sysctl_args;
@@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 				      unsigned long riovcnt,
 				      unsigned long flags);
 
+asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags,
+			  unsigned mask, struct xstat __user *buffer);
+asmlinkage long sys_fxstat(unsigned fd, unsigned flags,
+			   unsigned mask, struct xstat __user *buffer);
+
 #endif
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: Andreas Dilger @ 2012-04-19 23:36 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 2012-04-19, at 8:06 AM, David Howells wrote:
> Add a pair of system calls to make extended file stats available,
> including file creation time, inode version and data version where
> available through the underlying filesystem.
>
> The idea was initially proposed as a set of xattrs that could be
> retrieved with getxattr(), but the general preference proved to be
> for new syscalls with an extended stat structure.

I would comment that it was the opposite.  It was originally a
stat()-like extension that degraded into a getxattr() mess.

> (2) Lightweight stat: Ask for just those details of interest, and
>     allow a netfs (such as NFS) to approximate anything not of
>     interest, possibly without going to the server [Trond Myklebust,
>     Ulrich Drepper].

This was my original motivation for this functionality, so you can
put my name here also.

> The fields in struct xstat come in a number of classes:
>
> (0) st_dev, st_blksize, st_information.
>
>     These are local data and are always available.

For the extra two bits it would cost us, I don't think st_blksize and
st_information should always be returned.  st_blksize may be variable
for a distributed filesystem, and some of the fields in st_information
(offline) may not be free to access either.

Cheers, Andreas
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: J. Bruce Fields @ 2012-04-24 21:29 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote:
> Add a pair of system calls to make extended file stats available, including
> file creation time, inode version and data version where available through the
> underlying filesystem.
>
> The idea was initially proposed as a set of xattrs that could be retrieved with
> getxattr(), but the general preference proved to be for new syscalls with an
> extended stat structure.
>
> This has a number of uses:
>
>  (1) Creation time: The SMB protocol carries the creation time, which could be
>      exported by Samba, which will in turn help CIFS make use of FS-Cache as
>      that can be used for coherency data.
>
>      This is also specified in NFSv4 as a recommended attribute and could be
>      exported by NFSD [Steve French].
>
>  (2) Lightweight stat: Ask for just those details of interest, and allow a
>      netfs (such as NFS) to approximate anything not of interest, possibly
>      without going to the server [Trond Myklebust, Ulrich Drepper].
>
>  (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
>      cached attributes are up to date [Trond Myklebust].
>
>  (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
>      Schubert].
>
>  (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar].
>
>      Can also be used to modify fill_post_wcc() in NFSD which retrieves
>      i_version directly, but has just called vfs_getattr().  It could get it
>      from the kstat struct if it used vfs_xgetattr() instead.
>
>  (6) BSD stat compatibility: Including more fields from the BSD stat such as
>      creation time (st_btime) and inode generation number (st_gen) [Jeremy
>      Allison, Bernd Schubert].
>
>  (7) Extra coherency data may be useful in making backups [Andreas Dilger].
>
>  (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
>      can now say it doesn't support a standard stat feature if that isn't
>      available, so if, for instance, inode numbers or UIDs don't exist...
>
>  (9) Make the fields a consistent size on all arches and make them large.
>
> (10) Store a 16-byte volume ID in the superblock that can be returned in struct
>      xstat [Steve French].
>
> (11) Include granularity fields in the time data to indicate the granularity of
>      each of the times (NFSv4 time_delta) [Steve French].

It looks like you're including this with *each* time?  But surely
there's no filesystem with different granularity (say) for ctime than
for mtime.  Also, nfsd will want only one time_delta, not one for each
time.

Note also we need to document carefully what this means: I think it
should be the granularity that the filesystem is capable of
representing, but people are sometimes surprised to find out that the
actual time source is usually more coarse-grained than that.

--b.

>
> (12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>
> (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>      Michael Kerrisk].
>
> (14) Spare space, request flags and information flags are provided for future
>      expansion.
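
To make the granularity question concrete, here is a minimal user-space sketch of how a consumer might use a reported granularity to decide whether two timestamps are distinguishable. It assumes the struct xstat_time layout from the patch, a granularity expressed in nanoseconds, a non-negative tv_sec, and a granularity that is either a divisor of one second or a whole number of seconds. xstat_time_truncate() and xstat_time_same_granule() are hypothetical helpers, not part of the proposed API. Whether the reported value describes what the filesystem can represent or what the time source actually delivers is the documentation question raised above; the sketch does not settle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/* Mirrors struct xstat_time from the patch (tv_sec signed, as in the header). */
struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;	/* time granularity (in ns) */
};

/*
 * Hypothetical helper: round a timestamp down to a multiple of its own
 * granularity.  Assumes tv_sec >= 0.  A granularity of 0 or 1 leaves the
 * value untouched.
 */
static struct xstat_time xstat_time_truncate(struct xstat_time t)
{
	uint32_t gran = t.tv_granularity;

	if (gran <= 1)
		return t;
	if (gran < NSEC_PER_SEC) {
		/* sub-second granularity: drop the excess nanoseconds */
		t.tv_nsec -= t.tv_nsec % gran;
	} else {
		/* one second or coarser: work in whole seconds */
		int64_t secs = gran / NSEC_PER_SEC;

		t.tv_sec -= t.tv_sec % secs;
		t.tv_nsec = 0;
	}
	return t;
}

/*
 * Hypothetical helper: two times are indistinguishable if they land in the
 * same granule after truncation.
 */
static bool xstat_time_same_granule(struct xstat_time a, struct xstat_time b)
{
	a = xstat_time_truncate(a);
	b = xstat_time_truncate(b);
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}
```

With a single per-file (or per-superblock) granularity, as suggested above, the same helpers would simply take the granularity as a separate argument instead of reading it from each timestamp.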
>
>
> The following structures are defined for the use of these new system calls:
>
> 	struct xstat_dev {
> 		uint32_t		major, minor;
> 	};
>
> 	struct xstat_time {
> 		int64_t			tv_sec;
> 		uint32_t		tv_nsec;
> 		uint32_t		tv_granularity;
> 	};
>
> 	struct xstat {
> 		uint32_t		st_mask;
> 		uint32_t		st_mode;
> 		uint32_t		st_nlink;
> 		uint32_t		st_uid;
> 		uint32_t		st_gid;
> 		uint32_t		st_information;
> 		uint32_t		st_ioc_flags;
> 		uint32_t		st_blksize;
> 		struct xstat_dev	st_rdev;
> 		struct xstat_dev	st_dev;
> 		struct xstat_time	st_atime;
> 		struct xstat_time	st_btime;
> 		struct xstat_time	st_ctime;
> 		struct xstat_time	st_mtime;
> 		uint64_t		st_ino;
> 		uint64_t		st_size;
> 		uint64_t		st_blocks;
> 		uint64_t		st_gen;
> 		uint64_t		st_version;
> 		uint8_t			st_volume_id[16];
> 		uint64_t		__spares[11];
> 	};
>
> where st_information is local system information about the file, st_btime is
> the file creation time, st_gen is the inode generation (i_generation),
> st_version is the data version number (i_version), st_ioc_flags is the
> flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identifier is
> stored, st_mask is a bitmask indicating the data provided and __spares[]
> are where as-yet undefined fields can be placed.
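
The "consistent size on all arches" property of the structures quoted above can be checked mechanically. The following is a minimal user-space sketch, assuming the field order and types given in the patch's include/linux/stat.h (with tv_sec signed, as in that header). The _Static_assert checks and the xstat_struct_size() helper are illustrative only and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the structures proposed in the patch. */
struct xstat_dev {
	uint32_t major, minor;
};

struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;
};

struct xstat {
	uint32_t st_mask;
	uint32_t st_mode;
	uint32_t st_nlink;
	uint32_t st_uid;
	uint32_t st_gid;
	uint32_t st_information;
	uint32_t st_ioc_flags;
	uint32_t st_blksize;
	struct xstat_dev  st_rdev;
	struct xstat_dev  st_dev;
	struct xstat_time st_atime;
	struct xstat_time st_btime;
	struct xstat_time st_ctime;
	struct xstat_time st_mtime;
	uint64_t st_ino;
	uint64_t st_size;
	uint64_t st_blocks;
	uint64_t st_gen;
	uint64_t st_version;
	uint8_t  st_volume_id[16];
	uint64_t __spares[11];
};

/*
 * Every 64-bit member already sits on an 8-byte boundary, so the layout
 * contains no implicit padding and should not depend on whether the ABI
 * aligns uint64_t to 4 or 8 bytes.
 */
_Static_assert(sizeof(struct xstat_dev) == 8, "xstat_dev is two u32s");
_Static_assert(sizeof(struct xstat_time) == 16, "xstat_time is 16 bytes");
_Static_assert(offsetof(struct xstat, st_rdev) == 32, "eight u32s come first");
_Static_assert(offsetof(struct xstat, st_atime) == 48, "times follow the devs");
_Static_assert(offsetof(struct xstat, st_ino) == 112, "u64s follow the times");
_Static_assert(offsetof(struct xstat, st_volume_id) == 152, "volume ID offset");
_Static_assert(offsetof(struct xstat, __spares) == 168, "spares offset");
_Static_assert(sizeof(struct xstat) == 256, "struct xstat is 256 bytes");

/* Illustrative helper so the size can also be checked at run time. */
static size_t xstat_struct_size(void)
{
	return sizeof(struct xstat);
}
```

Under these assumptions the structure comes to 256 bytes, of which 88 are spare space.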
>
> The defined bits in request_mask and st_mask are:
>
> 	XSTAT_MODE		Want/got st_mode
> 	XSTAT_NLINK		Want/got st_nlink
> 	XSTAT_UID		Want/got st_uid
> 	XSTAT_GID		Want/got st_gid
> 	XSTAT_RDEV		Want/got st_rdev
> 	XSTAT_ATIME		Want/got st_atime
> 	XSTAT_MTIME		Want/got st_mtime
> 	XSTAT_CTIME		Want/got st_ctime
> 	XSTAT_INO		Want/got st_ino
> 	XSTAT_SIZE		Want/got st_size
> 	XSTAT_BLOCKS		Want/got st_blocks
> 	XSTAT_BASIC_STATS	[The stuff in the normal stat struct]
> 	XSTAT_IOC_FLAGS		Want/got FS_IOC_GETFLAGS
> 	XSTAT_BTIME		Want/got st_btime
> 	XSTAT_GEN		Want/got st_gen
> 	XSTAT_VERSION		Want/got st_version
> 	XSTAT_VOLUME_ID		Want/got st_volume_id
> 	XSTAT_ALL_STATS		[All currently available stuff]
>
> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
> that might be supplied by the filesystem.  Note that Ext4 returns flags outside
> of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
> {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
> be suppressed?
>
> The defined bits in the st_information field give local system data on a file,
> how it is accessed, where it is and what it does:
>
> 	XSTAT_INFO_ENCRYPTED		File is encrypted
> 	XSTAT_INFO_TEMPORARY		File is temporary (NTFS/CIFS/deleted)
> 	XSTAT_INFO_FABRICATED		File was made up by filesystem
> 	XSTAT_INFO_KERNEL_API		File is kernel API (eg: procfs/sysfs)
> 	XSTAT_INFO_REMOTE		File is remote
> 	XSTAT_INFO_OFFLINE		File is offline (CIFS)
> 	XSTAT_INFO_AUTOMOUNT		Dir is automount trigger
> 	XSTAT_INFO_AUTODIR		Dir provides unlisted automounts
> 	XSTAT_INFO_NONSYSTEM_OWNERSHIP	File has non-system ownership details
> 	XSTAT_INFO_HAS_ACL		File has an ACL of some sort
> 	XSTAT_INFO_REPARSE_POINT	File is reparse point (NTFS/CIFS)
> 	XSTAT_INFO_HIDDEN		File is marked hidden (DOS+)
> 	XSTAT_INFO_SYSTEM		File is marked system (DOS+)
> 	XSTAT_INFO_ARCHIVE		File is marked archive (DOS+)
>
> These are for the use of GUI tools that might want to mark files specially,
> depending on what they are.
> I've tried not to provide overlap with st_ioc_flags where something usable
> exists there.  Should the Hidden, System and Archive flags be associated with
> ioc_flags, perhaps with ioc_flags extended to 64 bits?
>
>
> The system calls are:
>
>     ssize_t ret = xstat(int dfd,
>                         const char *filename,
>                         unsigned int flags,
>                         unsigned int mask,
>                         struct xstat *buffer);
>
>     ssize_t ret = fxstat(unsigned fd,
>                          unsigned int flags,
>                          unsigned int mask,
>                          struct xstat *buffer);
>
>
> The dfd, filename, flags and fd parameters indicate the file to query.  There
> is no equivalent of lstat() as that can be emulated with xstat() by passing
> AT_SYMLINK_NOFOLLOW in flags.
>
> AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
> filesystem to synchronise its attributes with the server.
>
> mask is a bitmask indicating the fields in struct xstat that are of interest to
> the caller.  The user should set this to XSTAT_BASIC_STATS to get the
> basic set returned by stat().
>
> Should there just be one xstat() syscall that does fxstat() if filename is NULL?
>
> The fields in struct xstat come in a number of classes:
>
>  (0) st_dev, st_blksize, st_information.
>
>      These are local data and are always available.
>
>  (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size,
>      st_blocks.
>
>      These will be returned whether the caller asks for them or not.  The
>      corresponding bits in st_mask will be set to indicate their presence.
>
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless as
>      a byproduct of updating something requested.
>
>      If the values don't actually exist for the underlying object (such as UID
>      or GID on a DOS file), then the bit won't be set in st_mask, even if the
>      caller asked for the value, and the returned value will be a fabrication.
>
>  (2) st_rdev.
>
>      As for class (1), but this won't be returned if the file is not a blockdev
>      or chardev.  The bit will be cleared if the value is not returned.
>
>  (3) File creation time (st_btime), inode generation (st_gen), data version
>      (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).
>
>      These will be returned if available whether the caller asked for them or
>      not.  The corresponding bits in st_mask will be set or cleared as
>      appropriate to indicate their presence.
>
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless
>      as a byproduct of updating something requested.
>
> At the moment, this will only work on x86_64 and i386 as it requires system
> calls to be wired up.
>
>
> =======
> TESTING
> =======
>
> The following test program can be used to test the xstat system call:
>
> 	/* Test the xstat() system call
> 	 *
> 	 * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
> 	 * Written by David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org)
> 	 *
> 	 * This program is free software; you can redistribute it and/or
> 	 * modify it under the terms of the GNU General Public Licence
> 	 * as published by the Free Software Foundation; either version
> 	 * 2 of the Licence, or (at your option) any later version.
> 	 */
>
> 	#define _GNU_SOURCE
> 	#define _ATFILE_SOURCE
> 	#include <stdio.h>
> 	#include <stdlib.h>
> 	#include <stdint.h>
> 	#include <string.h>
> 	#include <unistd.h>
> 	#include <fcntl.h>
> 	#include <time.h>
> 	#include <sys/syscall.h>
> 	#include <sys/stat.h>
> 	#include <sys/types.h>
>
> 	#define AT_NO_AUTOMOUNT		0x800
> 	#define AT_FORCE_ATTR_SYNC	0x2000
>
> 	/* Mask bits, kept in step with include/linux/stat.h in this patch */
> 	#define XSTAT_MODE		0x00000001U
> 	#define XSTAT_NLINK		0x00000002U
> 	#define XSTAT_UID		0x00000004U
> 	#define XSTAT_GID		0x00000008U
> 	#define XSTAT_RDEV		0x00000010U
> 	#define XSTAT_ATIME		0x00000020U
> 	#define XSTAT_MTIME		0x00000040U
> 	#define XSTAT_CTIME		0x00000080U
> 	#define XSTAT_INO		0x00000100U
> 	#define XSTAT_SIZE		0x00000200U
> 	#define XSTAT_BLOCKS		0x00000400U
> 	#define XSTAT_BASIC_STATS	0x000007ffU
> 	#define XSTAT_IOC_FLAGS		0x00000800U
> 	#define XSTAT_BTIME		0x00001000U
> 	#define XSTAT_GEN		0x00002000U
> 	#define XSTAT_VERSION		0x00004000U
> 	#define XSTAT_VOLUME_ID		0x00008000U
> 	#define XSTAT_ALL_STATS		0x0000ffffU
>
> 	struct xstat_dev {
> 		uint32_t	major;
> 		uint32_t	minor;
> 	};
>
> 	struct xstat_time {
> 		int64_t		tv_sec;
> 		uint32_t	tv_nsec;
> 		uint32_t	tv_granularity;
> 	};
>
> 	/* Fields are named st_atim etc. because <sys/stat.h> may define
> 	 * st_atime and friends as macros. */
> 	struct xstat {
> 		uint32_t		st_mask;
> 		uint32_t		st_mode;
> 		uint32_t		st_nlink;
> 		uint32_t		st_uid;
> 		uint32_t		st_gid;
> 		uint32_t		st_information;
> 		uint32_t		st_ioc_flags;
> 		uint32_t		st_blksize;
> 		struct xstat_dev	st_rdev;
> 		struct xstat_dev	st_dev;
> 		struct xstat_time	st_atim;
> 		struct xstat_time	st_btim;
> 		struct xstat_time	st_ctim;
> 		struct xstat_time	st_mtim;
> 		uint64_t		st_ino;
> 		uint64_t		st_size;
> 		uint64_t		st_blocks;
> 		uint64_t		st_gen;
> 		uint64_t		st_version;
> 		uint8_t			st_volume_id[16];
> 		uint64_t		st_spares[11];
> 	};
>
> 	#define XSTAT_INFO_ENCRYPTED		0x00000001U
> 	#define XSTAT_INFO_TEMPORARY		0x00000002U
> 	#define XSTAT_INFO_FABRICATED		0x00000004U
> 	#define XSTAT_INFO_KERNEL_API		0x00000008U
> 	#define XSTAT_INFO_REMOTE		0x00000010U
> 	#define XSTAT_INFO_OFFLINE		0x00000020U
> 	#define XSTAT_INFO_AUTOMOUNT		0x00000040U
> 	#define XSTAT_INFO_AUTODIR
0x00000080U > #define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U > #define XSTAT_INFO_HAS_ACL 0x00000200U > #define XSTAT_INFO_REPARSE_POINT 0x00000400U > #define XSTAT_INFO_HIDDEN 0x00000800U > #define XSTAT_INFO_SYSTEM 0x00001000U > #define XSTAT_INFO_ARCHIVE 0x00002000U > > #define __NR_xstat 312 > #define __NR_fxstat 313 > > static __attribute__((unused)) > ssize_t xstat(int dfd, const char *filename, unsigned flags, > unsigned int mask, struct xstat *buffer) > { > return syscall(__NR_xstat, dfd, filename, flags, mask, buffer); > } > > static __attribute__((unused)) > ssize_t fxstat(int fd, unsigned flags, > unsigned int mask, struct xstat *buffer) > { > return syscall(__NR_fxstat, fd, flags, mask, buffer); > } > > static void print_time(const char *field, const struct xstat_time *xstm) > { > struct tm tm; > time_t tim; > char buffer[100]; > int len; > > tim = xstm->tv_sec; > if (!localtime_r(&tim, &tm)) { > perror("localtime_r"); > exit(1); > } > len = strftime(buffer, 100, "%F %T", &tm); > if (len == 0) { > perror("strftime"); > exit(1); > } > printf("%s", field); > fwrite(buffer, 1, len, stdout); > printf(".%09u", xstm->tv_nsec); > len = strftime(buffer, 100, "%z", &tm); > if (len == 0) { > perror("strftime2"); > exit(1); > } > fwrite(buffer, 1, len, stdout); > printf("\n"); > } > > static void dump_xstat(struct xstat *xst) > { > char buffer[256], ft; > > printf("results=%x\n", xst->st_mask); > > printf(" "); > if (xst->st_mask & XSTAT_SIZE) > printf(" Size: %-15llu", (unsigned long long) xst->st_size); > if (xst->st_mask & XSTAT_BLOCKS) > printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks); > printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize); > if (xst->st_mask & XSTAT_MODE) { > switch (xst->st_mode & S_IFMT) { > case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break; > case S_IFCHR: printf(" character special file\n"); ft = 'c'; break; > case S_IFDIR: printf(" directory\n"); ft = 'd'; break; > case S_IFBLK: printf(" block special 
file\n"); ft = 'b'; break;
> 			case S_IFREG:	printf(" regular file\n"); ft = '-'; break;
> 			case S_IFLNK:	printf(" symbolic link\n"); ft = 'l'; break;
> 			case S_IFSOCK:	printf(" socket\n"); ft = 's'; break;
> 			default:
> 				printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
> 				ft = '?';
> 				break;
> 			}
> 		}
>
> 		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
> 		printf("Device: %-15s", buffer);
> 		if (xst->st_mask & XSTAT_INO)
> 			printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
> 		/* the link count has its own mask bit; it is not tied to st_size */
> 		if (xst->st_mask & XSTAT_NLINK)
> 			printf(" Links: %-5u", xst->st_nlink);
> 		if (xst->st_mask & XSTAT_RDEV)
> 			printf(" Device type: %u,%u",
> 			       xst->st_rdev.major, xst->st_rdev.minor);
> 		printf("\n");
>
> 		if (xst->st_mask & XSTAT_MODE)
> 			printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
> 			       xst->st_mode & 07777,
> 			       ft,
> 			       xst->st_mode & S_IRUSR ? 'r' : '-',
> 			       xst->st_mode & S_IWUSR ? 'w' : '-',
> 			       xst->st_mode & S_IXUSR ? 'x' : '-',
> 			       xst->st_mode & S_IRGRP ? 'r' : '-',
> 			       xst->st_mode & S_IWGRP ? 'w' : '-',
> 			       xst->st_mode & S_IXGRP ? 'x' : '-',
> 			       xst->st_mode & S_IROTH ? 'r' : '-',
> 			       xst->st_mode & S_IWOTH ? 'w' : '-',
> 			       xst->st_mode & S_IXOTH ? 'x' : '-');
> 		/* st_uid and st_gid are both unsigned 32-bit values */
> 		if (xst->st_mask & XSTAT_UID)
> 			printf("Uid: %u \n", xst->st_uid);
> 		if (xst->st_mask & XSTAT_GID)
> 			printf("Gid: %u\n", xst->st_gid);
>
> 		if (xst->st_mask & XSTAT_ATIME)
> 			print_time("Access: ", &xst->st_atim);
> 		if (xst->st_mask & XSTAT_MTIME)
> 			print_time("Modify: ", &xst->st_mtim);
> 		if (xst->st_mask & XSTAT_CTIME)
> 			print_time("Change: ", &xst->st_ctim);
> 		if (xst->st_mask & XSTAT_BTIME)
> 			print_time("Create: ", &xst->st_btim);
>
> 		if (xst->st_mask & XSTAT_GEN)
> 			printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen);
> 		if (xst->st_mask & XSTAT_VERSION)
> 			printf("Data version: %llxh\n", (unsigned long long) xst->st_version);
>
> 		if (xst->st_mask & XSTAT_IOC_FLAGS) {
> 			unsigned char bits;
> 			int loop, byte;
>
> 			static char flag_representation[32 + 1] =
> 				/* FS_IOC_GETFLAGS flags: */
> 				"????????"
/* 31-24 0x00000000-ff000000 */ > "????ehTD" /* 23-16 0x00000000-00ff0000 */ > "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00 */ > "AdaiScus" /* 7- 0 0x00000000-000000ff */ > ; > > printf("Inode flags: %08x (", xst->st_ioc_flags); > for (byte = 32 - 8; byte >= 0; byte -= 8) { > bits = xst->st_ioc_flags >> byte; > for (loop = 7; loop >= 0; loop--) { > int bit = byte + loop; > > if (bits & 0x80) > putchar(flag_representation[31 - bit]); > else > putchar('-'); > bits <<= 1; > } > if (byte) > putchar(' '); > } > printf(")\n"); > } > > if (xst->st_information) { > unsigned char bits; > int loop, byte; > > static char info_representation[32 + 1] = > /* XSTAT_INFO_ flags: */ > "????????" /* 31-24 0x00000000-ff000000 */ > "????????" /* 23-16 0x00000000-00ff0000 */ > "??ASHRan" /* 15- 8 0x00000000-0000ff00 */ > "dmorkfte" /* 7- 0 0x00000000-000000ff */ > ; > > printf("Information: %08x (", xst->st_information); > for (byte = 32 - 8; byte >= 0; byte -= 8) { > bits = xst->st_information >> byte; > for (loop = 7; loop >= 0; loop--) { > int bit = byte + loop; > > if (bits & 0x80) > putchar(info_representation[31 - bit]); > else > putchar('-'); > bits <<= 1; > } > if (byte) > putchar(' '); > } > printf(")\n"); > } > > if (xst->st_mask & XSTAT_VOLUME_ID) { > int loop; > printf("Volume ID: "); > for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) { > printf("%02x", xst->st_volume_id[loop]); > if (loop == 7) > printf("-"); > } > printf("\n"); > } > } > > void dump_hex(unsigned long long *data, int from, int to) > { > unsigned offset, print_offset = 1, col = 0; > > from /= 8; > to = (to + 7) / 8; > > for (offset = from; offset < to; offset++) { > if (print_offset) { > printf("%04x: ", offset * 8); > print_offset = 0; > } > printf("%016llx", data[offset]); > col++; > if ((col & 3) == 0) { > printf("\n"); > print_offset = 1; > } else { > printf(" "); > } > } > > if (!print_offset) > printf("\n"); > } > > int main(int argc, char **argv) > { > struct xstat xst; > int ret, raw = 0, atflag 
= AT_SYMLINK_NOFOLLOW; > > unsigned int mask = XSTAT_ALL_STATS; > > for (argv++; *argv; argv++) { > if (strcmp(*argv, "-F") == 0) { > atflag |= AT_FORCE_ATTR_SYNC; > continue; > } > if (strcmp(*argv, "-L") == 0) { > atflag &= ~AT_SYMLINK_NOFOLLOW; > continue; > } > if (strcmp(*argv, "-O") == 0) { > mask &= ~XSTAT_BASIC_STATS; > continue; > } > if (strcmp(*argv, "-A") == 0) { > atflag |= AT_NO_AUTOMOUNT; > continue; > } > if (strcmp(*argv, "-R") == 0) { > raw = 1; > continue; > } > > memset(&xst, 0xbf, sizeof(xst)); > ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst); > printf("xstat(%s) = %d\n", *argv, ret); > if (ret < 0) { > perror(*argv); > exit(1); > } > > if (raw) > dump_hex((unsigned long long *)&xst, 0, sizeof(xst)); > > dump_xstat(&xst); > } > return 0; > } > > Just compile and run, passing it paths to the files you want to examine: > > [root@andromeda ~]# /tmp/xstat /proc/$$ > xstat(/proc/2074) = 160 > results=47ef > Size: 0 Blocks: 0 IO Block: 1024 directory > Device: 00:03 Inode: 9072 Links: 7 > Access: (0555/dr-xr-xr-x) Uid: 0 > Gid: 0 > Access: 2010-07-14 16:50:46.609336272+0100 > Modify: 2010-07-14 16:50:46.609336272+0100 > Change: 2010-07-14 16:50:46.609336272+0100 > Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------) > [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm > xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160 > results=77ef > Size: 5413882 Blocks: 0 IO Block: 4096 regular file > Device: 00:15 Inode: 2288 Links: 1 > Access: (0644/-rw-r--r--) Uid: 75338 > Gid: 0 > Access: 2008-11-05 19:47:22.000000000+0000 > Modify: 2008-11-05 19:47:22.000000000+0000 > Change: 2008-11-05 19:47:22.000000000+0000 > Inode version: 795h > Data version: 2h > Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------) > > Signed-off-by: David Howells 
<dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > > arch/x86/syscalls/syscall_32.tbl | 2 > arch/x86/syscalls/syscall_64.tbl | 2 > fs/stat.c | 350 +++++++++++++++++++++++++++++++++++--- > include/linux/fcntl.h | 1 > include/linux/fs.h | 4 > include/linux/stat.h | 126 +++++++++++++- > include/linux/syscalls.h | 7 + > 7 files changed, 461 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl > index 29f9f05..980eb5a 100644 > --- a/arch/x86/syscalls/syscall_32.tbl > +++ b/arch/x86/syscalls/syscall_32.tbl > @@ -355,3 +355,5 @@ > 346 i386 setns sys_setns > 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv > 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev > +349 i386 xstat sys_xstat > +350 i386 fxstat sys_fxstat > diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl > index dd29a9e..7ae24bb 100644 > --- a/arch/x86/syscalls/syscall_64.tbl > +++ b/arch/x86/syscalls/syscall_64.tbl > @@ -318,6 +318,8 @@ > 309 common getcpu sys_getcpu > 310 64 process_vm_readv sys_process_vm_readv > 311 64 process_vm_writev sys_process_vm_writev > +312 common xstat sys_xstat > +313 common fxstat sys_fxstat > # > # x32-specific system call numbers start at 512 to avoid cache impact > # for native 64-bit operation. > diff --git a/fs/stat.c b/fs/stat.c > index c733dc5..af3ef33 100644 > --- a/fs/stat.c > +++ b/fs/stat.c > @@ -18,8 +18,20 @@ > #include <asm/uaccess.h> > #include <asm/unistd.h> > > +/** > + * generic_fillattr - Fill in the basic attributes from the inode struct > + * @inode: Inode to use as the source > + * @stat: Where to fill in the attributes > + * > + * Fill in the basic attributes in the kstat structure from data that's to be > + * found on the VFS inode structure. This is the default if no getattr inode > + * operation is supplied. 
> + */ > void generic_fillattr(struct inode *inode, struct kstat *stat) > { > + struct super_block *sb = inode->i_sb; > + u32 x; > + > stat->dev = inode->i_sb->s_dev; > stat->ino = inode->i_ino; > stat->mode = inode->i_mode; > @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat) > stat->uid = inode->i_uid; > stat->gid = inode->i_gid; > stat->rdev = inode->i_rdev; > - stat->size = i_size_read(inode); > - stat->atime = inode->i_atime; > stat->mtime = inode->i_mtime; > stat->ctime = inode->i_ctime; > - stat->blksize = (1 << inode->i_blkbits); > + stat->size = i_size_read(inode); > stat->blocks = inode->i_blocks; > -} > + stat->blksize = (1 << inode->i_blkbits); > > + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV; > + if (IS_NOATIME(inode)) > + stat->result_mask &= ~XSTAT_ATIME; > + else > + stat->atime = inode->i_atime; > + > + if (S_ISREG(stat->mode) && stat->nlink == 0) > + stat->information |= XSTAT_INFO_TEMPORARY; > + if (IS_AUTOMOUNT(inode)) > + stat->information |= XSTAT_INFO_AUTOMOUNT; > + if (IS_POSIXACL(inode)) > + stat->information |= XSTAT_INFO_HAS_ACL; > + > + /* if unset, assume 1s granularity */ > + stat->tv_granularity = sb->s_time_gran ?: 1000000000U; > + > + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode))) > + stat->result_mask |= XSTAT_RDEV; > + > + x = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0]; > + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1]; > + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2]; > + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3]; > + if (x) > + stat->result_mask |= XSTAT_VOLUME_ID; > +} > EXPORT_SYMBOL(generic_fillattr); > > -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > +/** > + * vfs_xgetattr - Get the basic and extra attributes of a file > + * @mnt: The mountpoint to which the dentry belongs > + * @dentry: The file of interest > + * @stat: Where to return the statistics > + * > + * 
Ask the filesystem for a file's attributes.  The caller must have preset
> + * stat->request_mask and stat->query_flags to indicate what they want.
> + *
> + * If the file is remote, the filesystem can be forced to update the attributes
> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags.
> + *
> + * Bits must have been set in stat->request_mask to indicate which attributes
> + * the caller wants retrieved.  Any attribute not requested may be returned
> + * anyway, but the value may be approximate, and, if remote, may not have been
> + * synchronised with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry,
> +		 struct kstat *stat)
>  {
>  	struct inode *inode = dentry->d_inode;
>  	int retval;
> @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>  	if (retval)
>  		return retval;
>  
> +	stat->result_mask = 0;
> +	stat->information = 0;
> +	stat->ioc_flags = 0;
>  	if (inode->i_op->getattr)
>  		return inode->i_op->getattr(mnt, dentry, stat);
>  
>  	generic_fillattr(inode, stat);
>  	return 0;
>  }
> +EXPORT_SYMBOL(vfs_xgetattr);
>  
> +/**
> + * vfs_getattr - Get the basic attributes of a file
> + * @mnt: The mountpoint to which the dentry belongs
> + * @dentry: The file of interest
> + * @stat: Where to return the statistics
> + *
> + * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
> + * forced to update its files from the backing store.  Only the basic set of
> + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(),
> + * as must anyone who wants to force attributes to be sync'd with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */ > +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > +{ > + stat->query_flags = 0; > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_xgetattr(mnt, dentry, stat); > +} > EXPORT_SYMBOL(vfs_getattr); > > -int vfs_fstat(unsigned int fd, struct kstat *stat) > +/** > + * vfs_fxstat - Get basic and extra attributes by file descriptor > + * @fd: The file descriptor refering to the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xgetattr(). The main difference is > + * that it uses a file descriptor to determine the file location. > + * > + * The caller must have preset stat->query_flags and stat->request_mask as for > + * vfs_xgetattr(). > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. > + */ > +int vfs_fxstat(unsigned int fd, struct kstat *stat) > { > struct file *f = fget(fd); > int error = -EBADF; > > + if (stat->query_flags & ~KSTAT_QUERY_FLAGS) > + return -EINVAL; > if (f) { > - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat); > + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat); > fput(f); > } > return error; > } > +EXPORT_SYMBOL(vfs_fxstat); > + > +/** > + * vfs_fstat - Get basic attributes by file descriptor > + * @fd: The file descriptor refering to the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_getattr(). The main difference is > + * that it uses a file descriptor to determine the file location. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > +int vfs_fstat(unsigned int fd, struct kstat *stat) > +{ > + stat->query_flags = 0; > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_fxstat(fd, stat); > +} > EXPORT_SYMBOL(vfs_fstat); > > -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, > - int flag) > +/** > + * vfs_xstat - Get basic and extra attributes by filename > + * @dfd: A file descriptor representing the base dir for a relative filename > + * @filename: The name of the file of interest > + * @flags: Flags to control the query > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xgetattr(). The main difference is > + * that it uses a filename and base directory to determine the file location. > + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a > + * symlink at the given name from being referenced. > + * > + * The caller must have preset stat->request_mask as for vfs_xgetattr(). The > + * flags are also used to load up stat->query_flags. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > +int vfs_xstat(int dfd, const char __user *filename, int flags, > + struct kstat *stat) > { > struct path path; > - int error = -EINVAL; > - int lookup_flags = 0; > + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; > > - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | > - AT_EMPTY_PATH)) != 0) > - goto out; > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | > + AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0) > + return -EINVAL; > > - if (!(flag & AT_SYMLINK_NOFOLLOW)) > - lookup_flags |= LOOKUP_FOLLOW; > - if (flag & AT_EMPTY_PATH) > + if (flags & AT_SYMLINK_NOFOLLOW) > + lookup_flags &= ~LOOKUP_FOLLOW; > + if (flags & AT_NO_AUTOMOUNT) > + lookup_flags &= ~LOOKUP_AUTOMOUNT; > + if (flags & AT_EMPTY_PATH) > lookup_flags |= LOOKUP_EMPTY; > > + stat->query_flags = flags & KSTAT_QUERY_FLAGS; > error = user_path_at(dfd, filename, lookup_flags, &path); > - if (error) > - goto out; > - > - error = vfs_getattr(path.mnt, path.dentry, stat); > - path_put(&path); > -out: > + if (!error) { > + error = vfs_xgetattr(path.mnt, path.dentry, stat); > + path_put(&path); > + } > return error; > } > +EXPORT_SYMBOL(vfs_xstat); > + > +/** > + * vfs_fstatat - Get basic attributes by filename > + * @dfd: A file descriptor representing the base dir for a relative filename > + * @filename: The name of the file of interest > + * @flags: Flags to control the query > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xstat(). The difference is that it > + * preselects basic stats only. The flags are used to load up > + * stat->query_flags in addition to indicating symlink handling during path > + * resolution. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */
> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
> +		int flags)
> +{
> +	stat->request_mask = XSTAT_BASIC_STATS;
> +	return vfs_xstat(dfd, filename, flags, stat);
> +}
>  EXPORT_SYMBOL(vfs_fstatat);
>  
> -int vfs_stat(const char __user *name, struct kstat *stat)
> +/**
> + * vfs_stat - Get basic attributes by filename
> + * @filename: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xstat().  The difference is that it
> + * preselects basic stats only, terminal symlinks are followed regardless and a
> + * remote filesystem can't be forced to query the server.  If such is desired,
> + * vfs_xstat() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_stat(const char __user *filename, struct kstat *stat)
>  {
> -	return vfs_fstatat(AT_FDCWD, name, stat, 0);
> +	stat->request_mask = XSTAT_BASIC_STATS;
> +	return vfs_xstat(AT_FDCWD, filename, 0, stat);
>  }
>  EXPORT_SYMBOL(vfs_stat);
>  
> +/**
> + * vfs_lstat - Get basic attributes by filename, without following terminal symlink
> + * @name: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xstat().  The difference is that it
> + * preselects basic stats only, terminal symlinks are not followed regardless
> + * and a remote filesystem can't be forced to query the server.  If such is
> + * desired, vfs_xstat() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */ > int vfs_lstat(const char __user *name, struct kstat *stat) > { > - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW); > + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat); > } > EXPORT_SYMBOL(vfs_lstat); > > @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta > { > static int warncount = 5; > struct __old_kernel_stat tmp; > - > + > if (warncount > 0) { > warncount--; > printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n", > @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta > #if BITS_PER_LONG == 32 > if (stat->size > MAX_NON_LFS) > return -EOVERFLOW; > -#endif > +#endif > tmp.st_size = stat->size; > tmp.st_atime = stat->atime.tv_sec; > tmp.st_mtime = stat->mtime.tv_sec; > @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) > #if BITS_PER_LONG == 32 > if (stat->size > MAX_NON_LFS) > return -EOVERFLOW; > -#endif > +#endif > tmp.st_size = stat->size; > tmp.st_atime = stat->atime.tv_sec; > tmp.st_mtime = stat->mtime.tv_sec; > @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename, > } > #endif /* __ARCH_WANT_STAT64 */ > > +/* > + * Get the xstat parameters if supplied > + */ > +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer, > + struct kstat *stat) > +{ > + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING > + > + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer))) > + return -EFAULT; > + > + stat->request_mask = mask & XSTAT_ALL_STATS; > + stat->result_mask = 0; > + return 0; > +} > + > +/* > + * Set the xstat results. > + * > + * If the buffer size was 0, we just return the size of the buffer needed to > + * return the full result. > + * > + * If bufsize indicates a buffer of insufficient size to hold the full result, > + * we return -E2BIG. 
> + * > + * Otherwise we copy the extended stats to userspace and return the amount of > + * data written into the buffer (or -EFAULT). > + */ > +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer) > +{ > + u32 mask = stat->result_mask, gran = stat->tv_granularity; > + > +#define __put_timestamp(kts, uts) ( \ > + __put_user(kts.tv_sec, uts.tv_sec ) || \ > + __put_user(kts.tv_nsec, uts.tv_nsec ) || \ > + __put_user(gran, uts.tv_granularity )) > + > + /* clear out anything we're not returning */ > + if (!(mask & XSTAT_IOC_FLAGS)) > + stat->ioc_flags = 0; > + if (!(mask & XSTAT_BTIME)) > + memset(&stat->btime, 0, sizeof(stat->btime)); > + if (!(mask & XSTAT_GEN)) > + stat->gen = 0; > + if (!(mask & XSTAT_VERSION)) > + stat->version = 0; > + if (!(mask & XSTAT_VOLUME_ID)) > + memset(&stat->volume_id, 0, sizeof(stat->volume_id)); > + > + /* transfer the results */ > + if (__put_user(mask, &buffer->st_mask ) || > + __put_user(stat->mode, &buffer->st_mode ) || > + __put_user(stat->nlink, &buffer->st_nlink ) || > + __put_user(stat->uid, &buffer->st_uid ) || > + __put_user(stat->gid, &buffer->st_gid ) || > + __put_user(stat->information, &buffer->st_information ) || > + __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) || > + __put_user(stat->blksize, &buffer->st_blksize ) || > + __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) || > + __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) || > + __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) || > + __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) || > + __put_timestamp(stat->atime, &buffer->st_atime ) || > + __put_timestamp(stat->btime, &buffer->st_btime ) || > + __put_timestamp(stat->ctime, &buffer->st_ctime ) || > + __put_timestamp(stat->mtime, &buffer->st_mtime ) || > + __put_user(stat->ino, &buffer->st_ino ) || > + __put_user(stat->size, &buffer->st_size ) || > + __put_user(stat->blocks, &buffer->st_blocks ) || > + __put_user(stat->gen, &buffer->st_gen ) || > + 
__put_user(stat->version, &buffer->st_version ) || > + __copy_to_user(&buffer->st_volume_id, &stat->volume_id, > + sizeof(buffer->st_volume_id) ) || > + __clear_user(&buffer->__spares, sizeof(buffer->__spares))) > + return -EFAULT; > + return 0; > +} > + > +/* > + * System call to get extended stats by path > + */ > +SYSCALL_DEFINE5(xstat, > + int, dfd, const char __user *, filename, unsigned, flags, > + unsigned int, mask, struct xstat __user *, buffer) > +{ > + struct kstat stat; > + int error; > + > + error = xstat_get_params(mask, buffer, &stat); > + if (error != 0) > + return error; > + error = vfs_xstat(dfd, filename, flags, &stat); > + if (error) > + return error; > + return xstat_set_result(&stat, buffer); > +} > + > +/* > + * System call to get extended stats by file descriptor > + */ > +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags, > + unsigned int, mask, struct xstat __user *, buffer) > +{ > + struct kstat stat; > + int error; > + > + error = xstat_get_params(mask, buffer, &stat); > + if (error < 0) > + return error; > + stat.query_flags = flags; > + error = vfs_fxstat(fd, &stat); > + if (error) > + return error; > + return xstat_set_result(&stat, buffer); > +} > + > /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ > void __inode_add_bytes(struct inode *inode, loff_t bytes) > { > diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h > index f550f89..faa9e5d 100644 > --- a/include/linux/fcntl.h > +++ b/include/linux/fcntl.h > @@ -47,6 +47,7 @@ > #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. 
*/ > #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ > #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ > +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */ > > #ifdef __KERNEL__ > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 8de6755..ec6c62e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1467,6 +1467,7 @@ struct super_block { > > char s_id[32]; /* Informational name */ > u8 s_uuid[16]; /* UUID */ > + unsigned char s_volume_id[16]; /* Volume identifier */ > > void *s_fs_info; /* Filesystem private info */ > unsigned int s_max_links; > @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations; > extern int generic_readlink(struct dentry *, char __user *, int); > extern void generic_fillattr(struct inode *, struct kstat *); > extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); > +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *); > void __inode_add_bytes(struct inode *inode, loff_t bytes); > void inode_add_bytes(struct inode *inode, loff_t bytes); > void inode_sub_bytes(struct inode *inode, loff_t bytes); > @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *); > extern int vfs_lstat(const char __user *, struct kstat *); > extern int vfs_fstat(unsigned int, struct kstat *); > extern int vfs_fstatat(int , const char __user *, struct kstat *, int); > +extern int vfs_xstat(int, const char __user *, int, struct kstat *); > +extern int vfs_xfstat(unsigned int, struct kstat *); > > extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, > unsigned long arg); > diff --git a/include/linux/stat.h b/include/linux/stat.h > index 611c398..0ff561a 100644 > --- a/include/linux/stat.h > +++ b/include/linux/stat.h > @@ -3,6 +3,7 @@ > > #ifdef __KERNEL__ > > +#include <linux/types.h> > #include <asm/stat.h> > > #endif > @@ -46,6 +47,117 @@ 
> > #endif > > +/* > + * Query request/result mask > + * > + * Bits should be set in request_mask to request particular items when calling > + * xstat() or fxstat(). > + * > + * The bits in st_mask may or may not be set upon return, in part depending on > + * what was set in the mask argument: > + * > + * - if not available at all, the bit will be cleared before returning and the > + * field will be cleared; otherwise, > + * > + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the > + * server and the field and bit will be set on return; otherwise, > + * > + * - if explicitly requested, the datum will be synchronised to a server or > + * other medium if out of date before being returned, and the bit will be set > + * on return; otherwise, > + * > + * - if not requested, but available in approximate form without any effort, it > + * will be filled in anyway, and the bit will be set upon return (it might > + * not be up to date, however, and no attempt will be made to synchronise the > + * internal state first); otherwise, > + * > + * - the field and the bit will be cleared before returning. > + * > + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they > + * will have a value installed for compatibility purposes so that stat() and > + * co. can be emulated in userspace. 
> + */ > +#define XSTAT_MODE 0x00000001U /* want/got st_mode */ > +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */ > +#define XSTAT_UID 0x00000004U /* want/got st_uid */ > +#define XSTAT_GID 0x00000008U /* want/got st_gid */ > +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */ > +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */ > +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */ > +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */ > +#define XSTAT_INO 0x00000100U /* want/got st_ino */ > +#define XSTAT_SIZE 0x00000200U /* want/got st_size */ > +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */ > +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */ > +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */ > +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */ > +#define XSTAT_GEN 0x00002000U /* want/got st_gen */ > +#define XSTAT_VERSION 0x00004000U /* want/got st_version */ > +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */ > +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */ > + > +/* > + * Extended stat structures > + */ > +struct xstat_dev { > + uint32_t major, minor; > +}; > + > +struct xstat_time { > + int64_t tv_sec; > + uint32_t tv_nsec; > + uint32_t tv_granularity; /* time granularity (in nS) */ > +}; > + > +struct xstat { > + uint32_t st_mask; /* what results were written */ > + uint32_t st_mode; /* file mode */ > + uint32_t st_nlink; /* number of hard links */ > + uint32_t st_uid; /* user ID of owner */ > + uint32_t st_gid; /* group ID of owner */ > + uint32_t st_information; /* information about the file */ > + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */ > + uint32_t st_blksize; /* optimal size for filesystem I/O */ > + struct xstat_dev st_rdev; /* device ID of special file */ > + struct xstat_dev st_dev; /* ID of device containing file */ > + struct xstat_time st_atime; /* last access time */ > + struct xstat_time st_btime; /* file creation 
time */ > + struct xstat_time st_ctime; /* last attribute change time */ > + struct xstat_time st_mtime; /* last data modification time */ > + uint64_t st_ino; /* inode number */ > + uint64_t st_size; /* file size */ > + uint64_t st_blocks; /* number of 512-byte blocks allocated */ > + uint64_t st_gen; /* inode generation number */ > + uint64_t st_version; /* data version number */ > + uint8_t st_volume_id[16]; /* volume identifier */ > + uint64_t __spares[11]; /* spare space for future expansion */ > +}; > + > +/* > + * Flags to be found in st_information > + * > + * These give information about the features or the state of a file that might > + * be of use to ordinary userspace programs such as GUIs or ls rather than > + * specialised tools. > + * > + * Additional information may be found in st_ioc_flags and we try not to > + * overlap with it. > + */ > +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */ > +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */ > +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */ > +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */ > +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */ > +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */ > +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */ > +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */ > +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */ > +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */ > +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */ > +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */ > +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */ > +#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */ > + > #ifdef __KERNEL__ > 
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO) > #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO) > @@ -60,6 +172,12 @@ > #include <linux/time.h> > > struct kstat { > + u32 query_flags; /* operational flags */ > +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC) > + u32 request_mask; /* what fields the user asked for */ > + u32 result_mask; /* what fields the user got */ > + u32 information; > + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */ > u64 ino; > dev_t dev; > umode_t mode; > @@ -67,14 +185,18 @@ struct kstat { > uid_t uid; > gid_t gid; > dev_t rdev; > + unsigned int tv_granularity; /* granularity of times (in nS) */ > loff_t size; > - struct timespec atime; > + struct timespec atime; > struct timespec mtime; > struct timespec ctime; > + struct timespec btime; /* file creation time */ > unsigned long blksize; > unsigned long long blocks; > + u64 gen; /* inode generation */ > + u64 version; /* data version */ > + unsigned char volume_id[16]; /* volume identifier */ > }; > > #endif > - > #endif > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 3de3acb..ff9f8d9 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -45,6 +45,8 @@ struct shmid_ds; > struct sockaddr; > struct stat; > struct stat64; > +struct xstat_parameters; > +struct xstat; > struct statfs; > struct statfs64; > struct __sysctl_args; > @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid, > unsigned long riovcnt, > unsigned long flags); > > +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags, > + unsigned mask, struct xstat __user *buffer); > +asmlinkage long sys_fxstat(unsigned fd, unsigned flags, > + unsigned mask, struct xstat __user *buffer); > + > #endif > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply 
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available @ 2012-04-24 21:29 ` J. Bruce Fields 0 siblings, 0 replies; 144+ messages in thread From: J. Bruce Fields @ 2012-04-24 21:29 UTC (permalink / raw) To: David Howells Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote: > Add a pair of system calls to make extended file stats available, including > file creation time, inode version and data version where available through the > underlying filesystem. > > The idea was initially proposed as a set of xattrs that could be retrieved with > getxattr(), but the general preference proved to be for new syscalls with an > extended stat structure. > > This has a number of uses: > > (1) Creation time: The SMB protocol carries the creation time, which could be > exported by Samba, which will in turn help CIFS make use of FS-Cache as > that can be used for coherency data. > > This is also specified in NFSv4 as a recommended attribute and could be > exported by NFSD [Steve French]. > > (2) Lightweight stat: Ask for just those details of interest, and allow a > netfs (such as NFS) to approximate anything not of interest, possibly > without going to the server [Trond Myklebust, Ulrich Drepper]. > > (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its > cached attributes are up to date [Trond Myklebust]. > > (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd > Schubert]. > > (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. > > Can also be used to modify fill_post_wcc() in NFSD which retrieves > i_version directly, but has just called vfs_getattr(). It could get it > from the kstat struct if it used vfs_xgetattr() instead.
> > (6) BSD stat compatibility: Including more fields from the BSD stat such as > creation time (st_btime) and inode generation number (st_gen) [Jeremy > Allison, Bernd Schubert]. > > (7) Extra coherency data may be useful in making backups [Andreas Dilger]. > > (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem > can now say it doesn't support a standard stat feature if that isn't > available, so if, for instance, inode numbers or UIDs don't exist... > > (9) Make the fields a consistent size on all arches and make them large. > > (10) Store a 16-byte volume ID in the superblock that can be returned in struct > xstat [Steve French]. > > (11) Include granularity fields in the time data to indicate the granularity of > each of the times (NFSv4 time_delta) [Steve French]. It looks like you're including this with *each* time? But surely there's no filesystem with different granularity (say) for ctime than for mtime. Also, nfsd will want only one time_delta, not one for each time. Note also we need to document carefully what this means: I think it should be the granularity that the filesystem is capable of representing, but people are sometimes surprised to find out that the actual time source is usually more coarse-grained than that. --b. > > (12) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. > > (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, > Michael Kerrisk]. > > (14) Spare space, request flags and information flags are provided for future > expansion. 
> > > The following structures are defined for the use of these new system calls: > > struct xstat_dev { > uint32_t major, minor; > }; > > struct xstat_time { > int64_t tv_sec; > uint32_t tv_nsec; > uint32_t tv_granularity; > }; > > struct xstat { > uint32_t st_mask; > uint32_t st_mode; > uint32_t st_nlink; > uint32_t st_uid; > uint32_t st_gid; > uint32_t st_information; > uint32_t st_ioc_flags; > uint32_t st_blksize; > struct xstat_dev st_rdev; > struct xstat_dev st_dev; > struct xstat_time st_atime; > struct xstat_time st_btime; > struct xstat_time st_ctime; > struct xstat_time st_mtime; > uint64_t st_ino; > uint64_t st_size; > uint64_t st_blocks; > uint64_t st_gen; > uint64_t st_version; > uint8_t st_volume_id[16]; > uint64_t __spares[11]; > }; > > where st_information is local system information about the file, st_btime is > the file creation time, st_gen is the inode generation (i_generation), > st_version is the data version number (i_version), st_ioc_flags is the > flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identifier is > stored, st_mask is a bitmask indicating the data provided and __spares[] > is where as-yet undefined fields can be placed.
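The claim that the fields have a consistent size on all arches can be checked in userspace by restating the layout and asserting its offsets at compile time. The following is only a sketch, assuming the structure definitions exactly as quoted above with tv_sec signed as in the include/linux/stat.h hunk. The 256-byte total and the individual offsets are derived from those definitions rather than stated anywhere in the patch, and xstat_spare_bytes() is a hypothetical helper added for illustration.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>

/* Layout restated from the patch description; not an installed header. */
struct xstat_dev {
	uint32_t major, minor;
};

struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;
};

struct xstat {
	uint32_t st_mask;
	uint32_t st_mode;
	uint32_t st_nlink;
	uint32_t st_uid;
	uint32_t st_gid;
	uint32_t st_information;
	uint32_t st_ioc_flags;
	uint32_t st_blksize;
	struct xstat_dev  st_rdev;
	struct xstat_dev  st_dev;
	struct xstat_time st_atime;
	struct xstat_time st_btime;
	struct xstat_time st_ctime;
	struct xstat_time st_mtime;
	uint64_t st_ino;
	uint64_t st_size;
	uint64_t st_blocks;
	uint64_t st_gen;
	uint64_t st_version;
	uint8_t  st_volume_id[16];
	uint64_t __spares[11];
};

/* Every 64-bit member lands on a multiple of 8, so no padding is needed. */
static_assert(sizeof(struct xstat_dev) == 8, "xstat_dev is two u32s");
static_assert(sizeof(struct xstat_time) == 16, "xstat_time is 16 bytes");
static_assert(offsetof(struct xstat, st_rdev) == 32, "eight u32s come first");
static_assert(offsetof(struct xstat, st_atime) == 48, "times follow the devs");
static_assert(offsetof(struct xstat, st_ino) == 112, "four times of 16 bytes");
static_assert(offsetof(struct xstat, st_volume_id) == 152, "five u64s");
static_assert(offsetof(struct xstat, __spares) == 168, "after the volume ID");
static_assert(sizeof(struct xstat) == 256, "total size under this layout");

/* Hypothetical helper: bytes reserved at the tail for future fields. */
static inline size_t xstat_spare_bytes(void)
{
	return sizeof(((struct xstat *)0)->__spares);
}
```

If any assertion fails on a given compiler or ABI, the layout would not be identical there, which is the property the fixed-width types are meant to guarantee.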
> > The defined bits in request_mask and st_mask are: > > XSTAT_MODE Want/got st_mode > XSTAT_NLINK Want/got st_nlink > XSTAT_UID Want/got st_uid > XSTAT_GID Want/got st_gid > XSTAT_RDEV Want/got st_rdev > XSTAT_ATIME Want/got st_atime > XSTAT_MTIME Want/got st_mtime > XSTAT_CTIME Want/got st_ctime > XSTAT_INO Want/got st_ino > XSTAT_SIZE Want/got st_size > XSTAT_BLOCKS Want/got st_blocks > XSTAT_BASIC_STATS [The stuff in the normal stat struct] > XSTAT_IOC_FLAGS Want/got FS_IOC_GETFLAGS > XSTAT_BTIME Want/got st_btime > XSTAT_GEN Want/got st_gen > XSTAT_VERSION Want/got st_version > XSTAT_VOLUME_ID Want/got st_volume_id > XSTAT_ALL_STATS [All currently available stuff] > > The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags > that might be supplied by the filesystem. Note that Ext4 returns flags outside > of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS. Should > {EXT4,FS}_FL_USER_VISIBLE be extended to cover them? Or should the extra flags > be suppressed? > > The defined bits in the st_information field give local system data on a file, > how it is accessed, where it is and what it does: > > XSTAT_INFO_ENCRYPTED File is encrypted > XSTAT_INFO_TEMPORARY File is temporary (NTFS/CIFS/deleted) > XSTAT_INFO_FABRICATED File was made up by filesystem > XSTAT_INFO_KERNEL_API File is kernel API (eg: procfs/sysfs) > XSTAT_INFO_REMOTE File is remote > XSTAT_INFO_OFFLINE File is offline (CIFS) > XSTAT_INFO_AUTOMOUNT Dir is automount trigger > XSTAT_INFO_AUTODIR Dir provides unlisted automounts > XSTAT_INFO_NONSYSTEM_OWNERSHIP File has non-system ownership details > XSTAT_INFO_HAS_ACL File has an ACL of some sort > XSTAT_INFO_REPARSE_POINT File is reparse point (NTFS/CIFS) > XSTAT_INFO_HIDDEN File is marked hidden (DOS+) > XSTAT_INFO_SYSTEM File is marked system (DOS+) > XSTAT_INFO_ARCHIVE File is marked archive (DOS+) > > These are for the use of GUI tools that might want to mark files specially, > depending on what they are.
I've tried not to provide overlap with > st_ioc_flags where something usable exists there. Should Hidden, System and > Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to > 64 bits? > > > The system calls are: > > ssize_t ret = xstat(int dfd, > const char *filename, > unsigned int flags, > unsigned int mask, > struct xstat *buffer); > > ssize_t ret = fxstat(unsigned fd, > unsigned int flags, > unsigned int mask, > struct xstat *buffer); > > > The dfd, filename, flags and fd parameters indicate the file to query. There > is no equivalent of lstat() as that can be emulated with xstat() by passing > AT_SYMLINK_NOFOLLOW in flags. > > AT_FORCE_ATTR_SYNC can also be set in flags. This will require a network > filesystem to synchronise its attributes with the server. > > mask is a bitmask indicating the fields in struct xstat that are of interest to > the caller. The user should set this to XSTAT_BASIC_STATS to get the > basic set returned by stat(). > > Should there just be one xstat() syscall that does fxstat() if filename is NULL? > > The fields in struct xstat come in a number of classes: > > (0) st_dev, st_blksize, st_information. > > These are local data and are always available. > > (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size, > st_blocks. > > These will be returned whether the caller asks for them or not. The > corresponding bits in result_mask will be set to indicate their presence. > > If the caller didn't ask for them, then they may be approximated. For > example, NFS won't waste any time updating them from the server, unless as > a byproduct of updating something requested. > > If the values don't actually exist for the underlying object (such as UID > or GID on a DOS file), then the bit won't be set in the result_mask, even > if the caller asked for the value, and the returned value will be a > fabrication. > > (2) st_rdev.
> > As for class (1), but this won't be returned if the file is not a blockdev > or chardev. The bit will be cleared if the value is not returned. > > (3) File creation time (st_btime), inode generation (st_gen), data version > (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags). > > These will be returned if available whether the caller asked for them or > not. The corresponding bits in result_mask will be set or cleared as > appropriate to indicate their presence. > > If the caller didn't ask for them, then they may be approximated. For > example, NFS won't waste any time updating them from the server, unless > as a byproduct of updating something requested. > > At the moment, this will only work on x86_64 and i386 as it requires system > calls to be wired up. > > > ======= > TESTING > ======= > > The following test program can be used to test the xstat system call: > > /* Test the xstat() system call > * > * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved. > * Written by David Howells (dhowells@redhat.com) > * > * This program is free software; you can redistribute it and/or > * modify it under the terms of the GNU General Public Licence > * as published by the Free Software Foundation; either version > * 2 of the Licence, or (at your option) any later version. 
> */ > > #define _GNU_SOURCE > #define _ATFILE_SOURCE > #include <stdio.h> > #include <stdlib.h> > #include <stdint.h> > #include <string.h> > #include <unistd.h> > #include <fcntl.h> > #include <time.h> > #include <sys/syscall.h> > #include <sys/stat.h> > #include <sys/types.h> > > #define AT_NO_AUTOMOUNT 0x800 > #define AT_FORCE_ATTR_SYNC 0x2000 > > #define XSTAT_MODE 0x00000001U > #define XSTAT_NLINK 0x00000002U > #define XSTAT_UID 0x00000004U > #define XSTAT_GID 0x00000008U > #define XSTAT_RDEV 0x00000010U > #define XSTAT_ATIME 0x00000020U > #define XSTAT_MTIME 0x00000040U > #define XSTAT_CTIME 0x00000080U > #define XSTAT_INO 0x00000100U > #define XSTAT_SIZE 0x00000200U > #define XSTAT_BLOCKS 0x00000400U > #define XSTAT_BASIC_STATS 0x000007ffU > #define XSTAT_BTIME 0x00000800U > #define XSTAT_GEN 0x00001000U > #define XSTAT_VERSION 0x00002000U > #define XSTAT_IOC_FLAGS 0x00004000U > #define XSTAT_VOLUME_ID 0x00008000U > #define XSTAT_ALL_STATS 0x0000ffffU > > struct xstat_dev { > uint32_t major; > uint32_t minor; > }; > > struct xstat_time { > uint64_t tv_sec; > uint32_t tv_nsec; > uint32_t tv_granularity; > }; > > struct xstat { > uint32_t st_mask; > uint32_t st_mode; > uint32_t st_nlink; > uint32_t st_uid; > uint32_t st_gid; > uint32_t st_information; > uint32_t st_ioc_flags; > uint32_t st_blksize; > struct xstat_dev st_rdev; > struct xstat_dev st_dev; > struct xstat_time st_atim; > struct xstat_time st_btim; > struct xstat_time st_ctim; > struct xstat_time st_mtim; > uint64_t st_ino; > uint64_t st_size; > uint64_t st_blocks; > uint64_t st_gen; > uint64_t st_version; > uint8_t st_volume_id[16]; > uint64_t st_spares[11]; > }; > > #define XSTAT_INFO_ENCRYPTED 0x00000001U > #define XSTAT_INFO_TEMPORARY 0x00000002U > #define XSTAT_INFO_FABRICATED 0x00000004U > #define XSTAT_INFO_KERNEL_API 0x00000008U > #define XSTAT_INFO_REMOTE 0x00000010U > #define XSTAT_INFO_OFFLINE 0x00000020U > #define XSTAT_INFO_AUTOMOUNT 0x00000040U > #define XSTAT_INFO_AUTODIR
0x00000080U > #define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U > #define XSTAT_INFO_HAS_ACL 0x00000200U > #define XSTAT_INFO_REPARSE_POINT 0x00000400U > #define XSTAT_INFO_HIDDEN 0x00000800U > #define XSTAT_INFO_SYSTEM 0x00001000U > #define XSTAT_INFO_ARCHIVE 0x00002000U > > #define __NR_xstat 312 > #define __NR_fxstat 313 > > static __attribute__((unused)) > ssize_t xstat(int dfd, const char *filename, unsigned flags, > unsigned int mask, struct xstat *buffer) > { > return syscall(__NR_xstat, dfd, filename, flags, mask, buffer); > } > > static __attribute__((unused)) > ssize_t fxstat(int fd, unsigned flags, > unsigned int mask, struct xstat *buffer) > { > return syscall(__NR_fxstat, fd, flags, mask, buffer); > } > > static void print_time(const char *field, const struct xstat_time *xstm) > { > struct tm tm; > time_t tim; > char buffer[100]; > int len; > > tim = xstm->tv_sec; > if (!localtime_r(&tim, &tm)) { > perror("localtime_r"); > exit(1); > } > len = strftime(buffer, 100, "%F %T", &tm); > if (len == 0) { > perror("strftime"); > exit(1); > } > printf("%s", field); > fwrite(buffer, 1, len, stdout); > printf(".%09u", xstm->tv_nsec); > len = strftime(buffer, 100, "%z", &tm); > if (len == 0) { > perror("strftime2"); > exit(1); > } > fwrite(buffer, 1, len, stdout); > printf("\n"); > } > > static void dump_xstat(struct xstat *xst) > { > char buffer[256], ft; > > printf("results=%x\n", xst->st_mask); > > printf(" "); > if (xst->st_mask & XSTAT_SIZE) > printf(" Size: %-15llu", (unsigned long long) xst->st_size); > if (xst->st_mask & XSTAT_BLOCKS) > printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks); > printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize); > if (xst->st_mask & XSTAT_MODE) { > switch (xst->st_mode & S_IFMT) { > case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break; > case S_IFCHR: printf(" character special file\n"); ft = 'c'; break; > case S_IFDIR: printf(" directory\n"); ft = 'd'; break; > case S_IFBLK: printf(" block special 
file\n"); ft = 'b'; break; > case S_IFREG: printf(" regular file\n"); ft = '-'; break; > case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break; > case S_IFSOCK: printf(" socket\n"); ft = 's'; break; > default: > printf("unknown type (%o)\n", xst->st_mode & S_IFMT); > ft = '?'; > break; > } > } > > sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor); > printf("Device: %-15s", buffer); > if (xst->st_mask & XSTAT_INO) > printf(" Inode: %-11llu", (unsigned long long) xst->st_ino); > if (xst->st_mask & XSTAT_NLINK) > printf(" Links: %-5u", xst->st_nlink); > if (xst->st_mask & XSTAT_RDEV) > printf(" Device type: %u,%u", > xst->st_rdev.major, xst->st_rdev.minor); > printf("\n"); > > if (xst->st_mask & XSTAT_MODE) > printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ", > xst->st_mode & 07777, > ft, > xst->st_mode & S_IRUSR ? 'r' : '-', > xst->st_mode & S_IWUSR ? 'w' : '-', > xst->st_mode & S_IXUSR ? 'x' : '-', > xst->st_mode & S_IRGRP ? 'r' : '-', > xst->st_mode & S_IWGRP ? 'w' : '-', > xst->st_mode & S_IXGRP ? 'x' : '-', > xst->st_mode & S_IROTH ? 'r' : '-', > xst->st_mode & S_IWOTH ? 'w' : '-', > xst->st_mode & S_IXOTH ? 'x' : '-'); > if (xst->st_mask & XSTAT_UID) > printf("Uid: %u \n", xst->st_uid); > if (xst->st_mask & XSTAT_GID) > printf("Gid: %u\n", xst->st_gid); > > if (xst->st_mask & XSTAT_ATIME) > print_time("Access: ", &xst->st_atim); > if (xst->st_mask & XSTAT_MTIME) > print_time("Modify: ", &xst->st_mtim); > if (xst->st_mask & XSTAT_CTIME) > print_time("Change: ", &xst->st_ctim); > if (xst->st_mask & XSTAT_BTIME) > print_time("Create: ", &xst->st_btim); > > if (xst->st_mask & XSTAT_GEN) > printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen); > if (xst->st_mask & XSTAT_VERSION) > printf("Data version: %llxh\n", (unsigned long long) xst->st_version); > > if (xst->st_mask & XSTAT_IOC_FLAGS) { > unsigned char bits; > int loop, byte; > > static char flag_representation[32 + 1] = > /* FS_IOC_GETFLAGS flags: */ > "????????"
/* 31-24 0x00000000-ff000000 */ > "????ehTD" /* 23-16 0x00000000-00ff0000 */ > "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00 */ > "AdaiScus" /* 7- 0 0x00000000-000000ff */ > ; > > printf("Inode flags: %08x (", xst->st_ioc_flags); > for (byte = 32 - 8; byte >= 0; byte -= 8) { > bits = xst->st_ioc_flags >> byte; > for (loop = 7; loop >= 0; loop--) { > int bit = byte + loop; > > if (bits & 0x80) > putchar(flag_representation[31 - bit]); > else > putchar('-'); > bits <<= 1; > } > if (byte) > putchar(' '); > } > printf(")\n"); > } > > if (xst->st_information) { > unsigned char bits; > int loop, byte; > > static char info_representation[32 + 1] = > /* XSTAT_INFO_ flags: */ > "????????" /* 31-24 0x00000000-ff000000 */ > "????????" /* 23-16 0x00000000-00ff0000 */ > "??ASHRan" /* 15- 8 0x00000000-0000ff00 */ > "dmorkfte" /* 7- 0 0x00000000-000000ff */ > ; > > printf("Information: %08x (", xst->st_information); > for (byte = 32 - 8; byte >= 0; byte -= 8) { > bits = xst->st_information >> byte; > for (loop = 7; loop >= 0; loop--) { > int bit = byte + loop; > > if (bits & 0x80) > putchar(info_representation[31 - bit]); > else > putchar('-'); > bits <<= 1; > } > if (byte) > putchar(' '); > } > printf(")\n"); > } > > if (xst->st_mask & XSTAT_VOLUME_ID) { > int loop; > printf("Volume ID: "); > for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) { > printf("%02x", xst->st_volume_id[loop]); > if (loop == 7) > printf("-"); > } > printf("\n"); > } > } > > void dump_hex(unsigned long long *data, int from, int to) > { > unsigned offset, print_offset = 1, col = 0; > > from /= 8; > to = (to + 7) / 8; > > for (offset = from; offset < to; offset++) { > if (print_offset) { > printf("%04x: ", offset * 8); > print_offset = 0; > } > printf("%016llx", data[offset]); > col++; > if ((col & 3) == 0) { > printf("\n"); > print_offset = 1; > } else { > printf(" "); > } > } > > if (!print_offset) > printf("\n"); > } > > int main(int argc, char **argv) > { > struct xstat xst; > int ret, raw = 0, atflag 
= AT_SYMLINK_NOFOLLOW; > > unsigned int mask = XSTAT_ALL_STATS; > > for (argv++; *argv; argv++) { > if (strcmp(*argv, "-F") == 0) { > atflag |= AT_FORCE_ATTR_SYNC; > continue; > } > if (strcmp(*argv, "-L") == 0) { > atflag &= ~AT_SYMLINK_NOFOLLOW; > continue; > } > if (strcmp(*argv, "-O") == 0) { > mask &= ~XSTAT_BASIC_STATS; > continue; > } > if (strcmp(*argv, "-A") == 0) { > atflag |= AT_NO_AUTOMOUNT; > continue; > } > if (strcmp(*argv, "-R") == 0) { > raw = 1; > continue; > } > > memset(&xst, 0xbf, sizeof(xst)); > ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst); > printf("xstat(%s) = %d\n", *argv, ret); > if (ret < 0) { > perror(*argv); > exit(1); > } > > if (raw) > dump_hex((unsigned long long *)&xst, 0, sizeof(xst)); > > dump_xstat(&xst); > } > return 0; > } > > Just compile and run, passing it paths to the files you want to examine: > > [root@andromeda ~]# /tmp/xstat /proc/$$ > xstat(/proc/2074) = 160 > results=47ef > Size: 0 Blocks: 0 IO Block: 1024 directory > Device: 00:03 Inode: 9072 Links: 7 > Access: (0555/dr-xr-xr-x) Uid: 0 > Gid: 0 > Access: 2010-07-14 16:50:46.609336272+0100 > Modify: 2010-07-14 16:50:46.609336272+0100 > Change: 2010-07-14 16:50:46.609336272+0100 > Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------) > [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm > xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160 > results=77ef > Size: 5413882 Blocks: 0 IO Block: 4096 regular file > Device: 00:15 Inode: 2288 Links: 1 > Access: (0644/-rw-r--r--) Uid: 75338 > Gid: 0 > Access: 2008-11-05 19:47:22.000000000+0000 > Modify: 2008-11-05 19:47:22.000000000+0000 > Change: 2008-11-05 19:47:22.000000000+0000 > Inode version: 795h > Data version: 2h > Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------) > > Signed-off-by: David Howells <dhowells@redhat.com> 
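The results= values in the sample output above can be decoded by hand. The following is a minimal sketch that does so, assuming the bit numbering used by the test program; for the bits from 0x800 to 0x4000 that numbering differs from the include/linux/stat.h hunk. The table and the decode_mask() helper are hypothetical names introduced for illustration and are not part of the patch.

```c
#include <assert.h>	/* assert, for quick self-checks */
#include <stdio.h>	/* snprintf */
#include <string.h>	/* strcmp, for comparing decoded strings */

/* Bit names as numbered in the test program (not the header hunk). */
static const struct {
	unsigned int bit;
	const char *name;
} xstat_bits[] = {
	{ 0x0001U, "MODE" },    { 0x0002U, "NLINK" },   { 0x0004U, "UID" },
	{ 0x0008U, "GID" },     { 0x0010U, "RDEV" },    { 0x0020U, "ATIME" },
	{ 0x0040U, "MTIME" },   { 0x0080U, "CTIME" },   { 0x0100U, "INO" },
	{ 0x0200U, "SIZE" },    { 0x0400U, "BLOCKS" },  { 0x0800U, "BTIME" },
	{ 0x1000U, "GEN" },     { 0x2000U, "VERSION" }, { 0x4000U, "IOC_FLAGS" },
	{ 0x8000U, "VOLUME_ID" },
};

/* Hypothetical helper: write the names of the set bits, space-separated. */
static const char *decode_mask(unsigned int mask, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(xstat_bits) / sizeof(xstat_bits[0]); i++) {
		int n;

		if (!(mask & xstat_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", xstat_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: stop with what fits */
		used += (size_t)n;
	}
	return buf;
}
```

With a 128-byte buffer, decode_mask(0x47ef, ...) gives "MODE NLINK UID GID ATIME MTIME CTIME INO SIZE BLOCKS IOC_FLAGS", which matches the /proc example: the basic stats without RDEV, plus the inode flags. For 0x77ef, GEN and VERSION are added, matching the extra "Inode version" and "Data version" lines in the AFS example.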
> --- > > arch/x86/syscalls/syscall_32.tbl | 2 > arch/x86/syscalls/syscall_64.tbl | 2 > fs/stat.c | 350 +++++++++++++++++++++++++++++++++++--- > include/linux/fcntl.h | 1 > include/linux/fs.h | 4 > include/linux/stat.h | 126 +++++++++++++- > include/linux/syscalls.h | 7 + > 7 files changed, 461 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl > index 29f9f05..980eb5a 100644 > --- a/arch/x86/syscalls/syscall_32.tbl > +++ b/arch/x86/syscalls/syscall_32.tbl > @@ -355,3 +355,5 @@ > 346 i386 setns sys_setns > 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv > 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev > +349 i386 xstat sys_xstat > +350 i386 fxstat sys_fxstat > diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl > index dd29a9e..7ae24bb 100644 > --- a/arch/x86/syscalls/syscall_64.tbl > +++ b/arch/x86/syscalls/syscall_64.tbl > @@ -318,6 +318,8 @@ > 309 common getcpu sys_getcpu > 310 64 process_vm_readv sys_process_vm_readv > 311 64 process_vm_writev sys_process_vm_writev > +312 common xstat sys_xstat > +313 common fxstat sys_fxstat > # > # x32-specific system call numbers start at 512 to avoid cache impact > # for native 64-bit operation. > diff --git a/fs/stat.c b/fs/stat.c > index c733dc5..af3ef33 100644 > --- a/fs/stat.c > +++ b/fs/stat.c > @@ -18,8 +18,20 @@ > #include <asm/uaccess.h> > #include <asm/unistd.h> > > +/** > + * generic_fillattr - Fill in the basic attributes from the inode struct > + * @inode: Inode to use as the source > + * @stat: Where to fill in the attributes > + * > + * Fill in the basic attributes in the kstat structure from data that's to be > + * found on the VFS inode structure. This is the default if no getattr inode > + * operation is supplied. 
> + */ > void generic_fillattr(struct inode *inode, struct kstat *stat) > { > + struct super_block *sb = inode->i_sb; > + u32 x; > + > stat->dev = inode->i_sb->s_dev; > stat->ino = inode->i_ino; > stat->mode = inode->i_mode; > @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat) > stat->uid = inode->i_uid; > stat->gid = inode->i_gid; > stat->rdev = inode->i_rdev; > - stat->size = i_size_read(inode); > - stat->atime = inode->i_atime; > stat->mtime = inode->i_mtime; > stat->ctime = inode->i_ctime; > - stat->blksize = (1 << inode->i_blkbits); > + stat->size = i_size_read(inode); > stat->blocks = inode->i_blocks; > -} > + stat->blksize = (1 << inode->i_blkbits); > > + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV; > + if (IS_NOATIME(inode)) > + stat->result_mask &= ~XSTAT_ATIME; > + else > + stat->atime = inode->i_atime; > + > + if (S_ISREG(stat->mode) && stat->nlink == 0) > + stat->information |= XSTAT_INFO_TEMPORARY; > + if (IS_AUTOMOUNT(inode)) > + stat->information |= XSTAT_INFO_AUTOMOUNT; > + if (IS_POSIXACL(inode)) > + stat->information |= XSTAT_INFO_HAS_ACL; > + > + /* if unset, assume 1s granularity */ > + stat->tv_granularity = sb->s_time_gran ?: 1000000000U; > + > + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode))) > + stat->result_mask |= XSTAT_RDEV; > + > + x = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0]; > + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1]; > + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2]; > + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3]; > + if (x) > + stat->result_mask |= XSTAT_VOLUME_ID; > +} > EXPORT_SYMBOL(generic_fillattr); > > -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > +/** > + * vfs_xgetattr - Get the basic and extra attributes of a file > + * @mnt: The mountpoint to which the dentry belongs > + * @dentry: The file of interest > + * @stat: Where to return the statistics > + * > + * 
Ask the filesystem for a file's attributes. The caller must have preset > + * stat->request_mask and stat->query_flags to indicate what they want. > + * > + * If the file is remote, the filesystem can be forced to update the attributes > + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags. > + * > + * Bits must have been set in stat->request_mask to indicate which attributes > + * the caller wants retrieving. Any such attribute not requested may be > + * returned anyway, but the value may be approximate, and, if remote, may not > + * have been synchronised with the server. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. > + */ > +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry, > + struct kstat *stat) > { > struct inode *inode = dentry->d_inode; > int retval; > @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > if (retval) > return retval; > > + stat->result_mask = 0; > + stat->information = 0; > + stat->ioc_flags = 0; > if (inode->i_op->getattr) > return inode->i_op->getattr(mnt, dentry, stat); > > generic_fillattr(inode, stat); > return 0; > } > +EXPORT_SYMBOL(vfs_xgetattr); > > +/** > + * vfs_getattr - Get the basic attributes of a file > + * @mnt: The mountpoint to which the dentry belongs > + * @dentry: The file of interest > + * @stat: Where to return the statistics > + * > + * Ask the filesystem for a file's attributes. If remote, the filesystem isn't > + * forced to update its files from the backing store. Only the basic set of > + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(), > + * as must anyone who wants to force attributes to be sync'd with the server. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */ > +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > +{ > + stat->query_flags = 0; > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_xgetattr(mnt, dentry, stat); > +} > EXPORT_SYMBOL(vfs_getattr); > > -int vfs_fstat(unsigned int fd, struct kstat *stat) > +/** > + * vfs_fxstat - Get basic and extra attributes by file descriptor > + * @fd: The file descriptor referring to the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xgetattr(). The main difference is > + * that it uses a file descriptor to determine the file location. > + * > + * The caller must have preset stat->query_flags and stat->request_mask as for > + * vfs_xgetattr(). > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. > + */ > +int vfs_fxstat(unsigned int fd, struct kstat *stat) > { > struct file *f = fget(fd); > int error = -EBADF; > > + if (stat->query_flags & ~KSTAT_QUERY_FLAGS) > + return -EINVAL; > if (f) { > - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat); > + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat); > fput(f); > } > return error; > } > +EXPORT_SYMBOL(vfs_fxstat); > + > +/** > + * vfs_fstat - Get basic attributes by file descriptor > + * @fd: The file descriptor referring to the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_getattr(). The main difference is > + * that it uses a file descriptor to determine the file location. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > +int vfs_fstat(unsigned int fd, struct kstat *stat) > +{ > + stat->query_flags = 0; > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_fxstat(fd, stat); > +} > EXPORT_SYMBOL(vfs_fstat); > > -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, > - int flag) > +/** > + * vfs_xstat - Get basic and extra attributes by filename > + * @dfd: A file descriptor representing the base dir for a relative filename > + * @filename: The name of the file of interest > + * @flags: Flags to control the query > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xgetattr(). The main difference is > + * that it uses a filename and base directory to determine the file location. > + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a > + * symlink at the given name from being referenced. > + * > + * The caller must have preset stat->request_mask as for vfs_xgetattr(). The > + * flags are also used to load up stat->query_flags. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > +int vfs_xstat(int dfd, const char __user *filename, int flags, > + struct kstat *stat) > { > struct path path; > - int error = -EINVAL; > - int lookup_flags = 0; > + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; > > - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | > - AT_EMPTY_PATH)) != 0) > - goto out; > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | > + AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0) > + return -EINVAL; > > - if (!(flag & AT_SYMLINK_NOFOLLOW)) > - lookup_flags |= LOOKUP_FOLLOW; > - if (flag & AT_EMPTY_PATH) > + if (flags & AT_SYMLINK_NOFOLLOW) > + lookup_flags &= ~LOOKUP_FOLLOW; > + if (flags & AT_NO_AUTOMOUNT) > + lookup_flags &= ~LOOKUP_AUTOMOUNT; > + if (flags & AT_EMPTY_PATH) > lookup_flags |= LOOKUP_EMPTY; > > + stat->query_flags = flags & KSTAT_QUERY_FLAGS; > error = user_path_at(dfd, filename, lookup_flags, &path); > - if (error) > - goto out; > - > - error = vfs_getattr(path.mnt, path.dentry, stat); > - path_put(&path); > -out: > + if (!error) { > + error = vfs_xgetattr(path.mnt, path.dentry, stat); > + path_put(&path); > + } > return error; > } > +EXPORT_SYMBOL(vfs_xstat); > + > +/** > + * vfs_fstatat - Get basic attributes by filename > + * @dfd: A file descriptor representing the base dir for a relative filename > + * @filename: The name of the file of interest > + * @flags: Flags to control the query > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xstat(). The difference is that it > + * preselects basic stats only. The flags are used to load up > + * stat->query_flags in addition to indicating symlink handling during path > + * resolution. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, > + int flags) > +{ > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_xstat(dfd, filename, flags, stat); > +} > EXPORT_SYMBOL(vfs_fstatat); > > -int vfs_stat(const char __user *name, struct kstat *stat) > +/** > + * vfs_stat - Get basic attributes by filename > + * @filename: The name of the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xstat(). The difference is that it > + * preselects basic stats only, terminal symlinks are followed regardless and a > + * remote filesystem can't be forced to query the server. If such is desired, > + * vfs_xstat() should be used instead. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. > + */ > +int vfs_stat(const char __user *filename, struct kstat *stat) > { > - return vfs_fstatat(AT_FDCWD, name, stat, 0); > + stat->request_mask = XSTAT_BASIC_STATS; > + return vfs_xstat(AT_FDCWD, filename, 0, stat); > } > EXPORT_SYMBOL(vfs_stat); > > +/** > + * vfs_lstat - Get basic attributes by filename, without following terminal symlink > + * @name: The name of the file of interest > + * @stat: The result structure to fill in. > + * > + * This function is a wrapper around vfs_xstat(). The difference is that it > + * preselects basic stats only, terminal symlinks are not followed regardless > + * and a remote filesystem can't be forced to query the server. If such is > + * desired, vfs_xstat() should be used instead. > + * > + * 0 will be returned on success, and a -ve error code if unsuccessful. 
> + */ > int vfs_lstat(const char __user *name, struct kstat *stat) > { > - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW); > + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat); > } > EXPORT_SYMBOL(vfs_lstat); > > @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta > { > static int warncount = 5; > struct __old_kernel_stat tmp; > - > + > if (warncount > 0) { > warncount--; > printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n", > @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta > #if BITS_PER_LONG == 32 > if (stat->size > MAX_NON_LFS) > return -EOVERFLOW; > -#endif > +#endif > tmp.st_size = stat->size; > tmp.st_atime = stat->atime.tv_sec; > tmp.st_mtime = stat->mtime.tv_sec; > @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) > #if BITS_PER_LONG == 32 > if (stat->size > MAX_NON_LFS) > return -EOVERFLOW; > -#endif > +#endif > tmp.st_size = stat->size; > tmp.st_atime = stat->atime.tv_sec; > tmp.st_mtime = stat->mtime.tv_sec; > @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename, > } > #endif /* __ARCH_WANT_STAT64 */ > > +/* > + * Get the xstat parameters if supplied > + */ > +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer, > + struct kstat *stat) > +{ > + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING > + > + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer))) > + return -EFAULT; > + > + stat->request_mask = mask & XSTAT_ALL_STATS; > + stat->result_mask = 0; > + return 0; > +} > + > +/* > + * Set the xstat results. > + * > + * If the buffer size was 0, we just return the size of the buffer needed to > + * return the full result. > + * > + * If bufsize indicates a buffer of insufficient size to hold the full result, > + * we return -E2BIG. 
> + * > + * Otherwise we copy the extended stats to userspace and return the amount of > + * data written into the buffer (or -EFAULT). > + */ > +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer) > +{ > + u32 mask = stat->result_mask, gran = stat->tv_granularity; > + > +#define __put_timestamp(kts, uts) ( \ > + __put_user(kts.tv_sec, uts.tv_sec ) || \ > + __put_user(kts.tv_nsec, uts.tv_nsec ) || \ > + __put_user(gran, uts.tv_granularity )) > + > + /* clear out anything we're not returning */ > + if (!(mask & XSTAT_IOC_FLAGS)) > + stat->ioc_flags = 0; > + if (!(mask & XSTAT_BTIME)) > + memset(&stat->btime, 0, sizeof(stat->btime)); > + if (!(mask & XSTAT_GEN)) > + stat->gen = 0; > + if (!(mask & XSTAT_VERSION)) > + stat->version = 0; > + if (!(mask & XSTAT_VOLUME_ID)) > + memset(&stat->volume_id, 0, sizeof(stat->volume_id)); > + > + /* transfer the results */ > + if (__put_user(mask, &buffer->st_mask ) || > + __put_user(stat->mode, &buffer->st_mode ) || > + __put_user(stat->nlink, &buffer->st_nlink ) || > + __put_user(stat->uid, &buffer->st_uid ) || > + __put_user(stat->gid, &buffer->st_gid ) || > + __put_user(stat->information, &buffer->st_information ) || > + __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) || > + __put_user(stat->blksize, &buffer->st_blksize ) || > + __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) || > + __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) || > + __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) || > + __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) || > + __put_timestamp(stat->atime, &buffer->st_atime ) || > + __put_timestamp(stat->btime, &buffer->st_btime ) || > + __put_timestamp(stat->ctime, &buffer->st_ctime ) || > + __put_timestamp(stat->mtime, &buffer->st_mtime ) || > + __put_user(stat->ino, &buffer->st_ino ) || > + __put_user(stat->size, &buffer->st_size ) || > + __put_user(stat->blocks, &buffer->st_blocks ) || > + __put_user(stat->gen, &buffer->st_gen ) || > + 
__put_user(stat->version, &buffer->st_version ) || > + __copy_to_user(&buffer->st_volume_id, &stat->volume_id, > + sizeof(buffer->st_volume_id) ) || > + __clear_user(&buffer->__spares, sizeof(buffer->__spares))) > + return -EFAULT; > + return 0; > +} > + > +/* > + * System call to get extended stats by path > + */ > +SYSCALL_DEFINE5(xstat, > + int, dfd, const char __user *, filename, unsigned, flags, > + unsigned int, mask, struct xstat __user *, buffer) > +{ > + struct kstat stat; > + int error; > + > + error = xstat_get_params(mask, buffer, &stat); > + if (error != 0) > + return error; > + error = vfs_xstat(dfd, filename, flags, &stat); > + if (error) > + return error; > + return xstat_set_result(&stat, buffer); > +} > + > +/* > + * System call to get extended stats by file descriptor > + */ > +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags, > + unsigned int, mask, struct xstat __user *, buffer) > +{ > + struct kstat stat; > + int error; > + > + error = xstat_get_params(mask, buffer, &stat); > + if (error < 0) > + return error; > + stat.query_flags = flags; > + error = vfs_fxstat(fd, &stat); > + if (error) > + return error; > + return xstat_set_result(&stat, buffer); > +} > + > /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ > void __inode_add_bytes(struct inode *inode, loff_t bytes) > { > diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h > index f550f89..faa9e5d 100644 > --- a/include/linux/fcntl.h > +++ b/include/linux/fcntl.h > @@ -47,6 +47,7 @@ > #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. 
*/ > #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ > #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ > +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */ > > #ifdef __KERNEL__ > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 8de6755..ec6c62e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1467,6 +1467,7 @@ struct super_block { > > char s_id[32]; /* Informational name */ > u8 s_uuid[16]; /* UUID */ > + unsigned char s_volume_id[16]; /* Volume identifier */ > > void *s_fs_info; /* Filesystem private info */ > unsigned int s_max_links; > @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations; > extern int generic_readlink(struct dentry *, char __user *, int); > extern void generic_fillattr(struct inode *, struct kstat *); > extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); > +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *); > void __inode_add_bytes(struct inode *inode, loff_t bytes); > void inode_add_bytes(struct inode *inode, loff_t bytes); > void inode_sub_bytes(struct inode *inode, loff_t bytes); > @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *); > extern int vfs_lstat(const char __user *, struct kstat *); > extern int vfs_fstat(unsigned int, struct kstat *); > extern int vfs_fstatat(int , const char __user *, struct kstat *, int); > +extern int vfs_xstat(int, const char __user *, int, struct kstat *); > +extern int vfs_fxstat(unsigned int, struct kstat *); > > extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, > unsigned long arg); > diff --git a/include/linux/stat.h b/include/linux/stat.h > index 611c398..0ff561a 100644 > --- a/include/linux/stat.h > +++ b/include/linux/stat.h > @@ -3,6 +3,7 @@ > > #ifdef __KERNEL__ > > +#include <linux/types.h> > #include <asm/stat.h> > > #endif > @@ -46,6 +47,117 @@ 
> > #endif > > +/* > + * Query request/result mask > + * > + * Bits should be set in request_mask to request particular items when calling > + * xstat() or fxstat(). > + * > + * The bits in st_mask may or may not be set upon return, in part depending on > + * what was set in the mask argument: > + * > + * - if not available at all, the bit will be cleared before returning and the > + * field will be cleared; otherwise, > + * > + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the > + * server and the field and bit will be set on return; otherwise, > + * > + * - if explicitly requested, the datum will be synchronised to a server or > + * other medium if out of date before being returned, and the bit will be set > + * on return; otherwise, > + * > + * - if not requested, but available in approximate form without any effort, it > + * will be filled in anyway, and the bit will be set upon return (it might > + * not be up to date, however, and no attempt will be made to synchronise the > + * internal state first); otherwise, > + * > + * - the field and the bit will be cleared before returning. > + * > + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they > + * will have a value installed for compatibility purposes so that stat() and > + * co. can be emulated in userspace. 
> + */ > +#define XSTAT_MODE 0x00000001U /* want/got st_mode */ > +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */ > +#define XSTAT_UID 0x00000004U /* want/got st_uid */ > +#define XSTAT_GID 0x00000008U /* want/got st_gid */ > +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */ > +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */ > +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */ > +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */ > +#define XSTAT_INO 0x00000100U /* want/got st_ino */ > +#define XSTAT_SIZE 0x00000200U /* want/got st_size */ > +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */ > +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */ > +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */ > +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */ > +#define XSTAT_GEN 0x00002000U /* want/got st_gen */ > +#define XSTAT_VERSION 0x00004000U /* want/got st_version */ > +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */ > +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */ > + > +/* > + * Extended stat structures > + */ > +struct xstat_dev { > + uint32_t major, minor; > +}; > + > +struct xstat_time { > + int64_t tv_sec; > + uint32_t tv_nsec; > + uint32_t tv_granularity; /* time granularity (in nS) */ > +}; > + > +struct xstat { > + uint32_t st_mask; /* what results were written */ > + uint32_t st_mode; /* file mode */ > + uint32_t st_nlink; /* number of hard links */ > + uint32_t st_uid; /* user ID of owner */ > + uint32_t st_gid; /* group ID of owner */ > + uint32_t st_information; /* information about the file */ > + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */ > + uint32_t st_blksize; /* optimal size for filesystem I/O */ > + struct xstat_dev st_rdev; /* device ID of special file */ > + struct xstat_dev st_dev; /* ID of device containing file */ > + struct xstat_time st_atime; /* last access time */ > + struct xstat_time st_btime; /* file creation 
time */ > + struct xstat_time st_ctime; /* last attribute change time */ > + struct xstat_time st_mtime; /* last data modification time */ > + uint64_t st_ino; /* inode number */ > + uint64_t st_size; /* file size */ > + uint64_t st_blocks; /* number of 512-byte blocks allocated */ > + uint64_t st_gen; /* inode generation number */ > + uint64_t st_version; /* data version number */ > + uint8_t st_volume_id[16]; /* volume identifier */ > + uint64_t __spares[11]; /* spare space for future expansion */ > +}; > + > +/* > + * Flags to be found in st_information > + * > + * These give information about the features or the state of a file that might > + * be of use to ordinary userspace programs such as GUIs or ls rather than > + * specialised tools. > + * > + * Additional information may be found in st_ioc_flags and we try not to > + * overlap with it. > + */ > +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */ > +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */ > +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */ > +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */ > +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */ > +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */ > +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */ > +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */ > +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */ > +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */ > +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */ > +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */ > +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */ > +#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */ > + > #ifdef __KERNEL__ > 
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO) > #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO) > @@ -60,6 +172,12 @@ > #include <linux/time.h> > > struct kstat { > + u32 query_flags; /* operational flags */ > +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC) > + u32 request_mask; /* what fields the user asked for */ > + u32 result_mask; /* what fields the user got */ > + u32 information; > + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */ > u64 ino; > dev_t dev; > umode_t mode; > @@ -67,14 +185,18 @@ struct kstat { > uid_t uid; > gid_t gid; > dev_t rdev; > + unsigned int tv_granularity; /* granularity of times (in nS) */ > loff_t size; > - struct timespec atime; > + struct timespec atime; > struct timespec mtime; > struct timespec ctime; > + struct timespec btime; /* file creation time */ > unsigned long blksize; > unsigned long long blocks; > + u64 gen; /* inode generation */ > + u64 version; /* data version */ > + unsigned char volume_id[16]; /* volume identifier */ > }; > > #endif > - > #endif > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 3de3acb..ff9f8d9 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -45,6 +45,8 @@ struct shmid_ds; > struct sockaddr; > struct stat; > struct stat64; > +struct xstat_parameters; > +struct xstat; > struct statfs; > struct statfs64; > struct __sysctl_args; > @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid, > unsigned long riovcnt, > unsigned long flags); > > +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags, > + unsigned mask, struct xstat __user *buffer); > +asmlinkage long sys_fxstat(unsigned fd, unsigned flags, > + unsigned mask, struct xstat __user *buffer); > + > #endif > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 144+ messages 
in thread
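As an aside on the request/result mask semantics described in the patch above, here is a minimal userspace sketch of how a caller might interpret st_mask once a call has returned. The XSTAT_* values and the struct xstat layout are copied from the proposed include/linux/stat.h; the helper names xstat_missing() and xstat_has() are hypothetical and not part of the patch, and no system call is made because xstat() is only a proposal at this point in the thread.

```c
#include <stdint.h>

/* Mask bits as proposed in the patch's include/linux/stat.h */
#define XSTAT_MODE		0x00000001U
#define XSTAT_RDEV		0x00000010U
#define XSTAT_BASIC_STATS	0x000007ffU
#define XSTAT_BTIME		0x00001000U
#define XSTAT_VOLUME_ID		0x00008000U
#define XSTAT_ALL_STATS		0x0000ffffU

/* Structures as proposed in the patch */
struct xstat_dev {
	uint32_t major, minor;
};

struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;	/* time granularity (in nS) */
};

struct xstat {
	uint32_t st_mask;		/* what results were written */
	uint32_t st_mode;
	uint32_t st_nlink;
	uint32_t st_uid;
	uint32_t st_gid;
	uint32_t st_information;
	uint32_t st_ioc_flags;
	uint32_t st_blksize;
	struct xstat_dev st_rdev;
	struct xstat_dev st_dev;
	struct xstat_time st_atime;
	struct xstat_time st_btime;
	struct xstat_time st_ctime;
	struct xstat_time st_mtime;
	uint64_t st_ino;
	uint64_t st_size;
	uint64_t st_blocks;
	uint64_t st_gen;
	uint64_t st_version;
	uint8_t  st_volume_id[16];
	uint64_t __spares[11];
};

/* Hypothetical helper: requested bits the filesystem did not mark valid */
static uint32_t xstat_missing(const struct xstat *st, uint32_t requested)
{
	return requested & ~st->st_mask;
}

/* Hypothetical helper: true only if every bit in 'bits' was returned */
static int xstat_has(const struct xstat *st, uint32_t bits)
{
	return (st->st_mask & bits) == bits;
}
```

Going by generic_fillattr() as posted, a regular file on a filesystem that fills in nothing beyond the generic attributes and has no volume ID would presumably come back with XSTAT_RDEV and all the extended bits clear, so xstat_missing(&st, XSTAT_ALL_STATS) would report those and a caller would know not to trust st_btime, st_gen and friends.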
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available 2012-04-24 21:29 ` J. Bruce Fields @ 2012-04-24 22:08 ` Steve French -1 siblings, 0 replies; 144+ messages in thread From: Steve French @ 2012-04-24 22:08 UTC (permalink / raw) To: J. Bruce Fields Cc: David Howells, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw On Tue, Apr 24, 2012 at 4:29 PM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote: >> Add a pair of system calls to make extended file stats available, including >> file creation time, inode version and data version where available through the >> underlying filesystem. >> >> The idea was initially proposed as a set of xattrs that could be retrieved with >> getxattr(), but the general preference proved to be for new syscalls with an >> extended stat structure. >> >> This has a number of uses: >> >> (1) Creation time: The SMB protocol carries the creation time, which could be >> exported by Samba, which will in turn help CIFS make use of FS-Cache as >> that can be used for coherency data. >> >> This is also specified in NFSv4 as a recommended attribute and could be >> exported by NFSD [Steve French]. >> >> (2) Lightweight stat: Ask for just those details of interest, and allow a >> netfs (such as NFS) to approximate anything not of interest, possibly >> without going to the server [Trond Myklebust, Ulrich Drepper]. >> >> (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its >> cached attributes are up to date [Trond Myklebust]. 
>> >> (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd >> Schubert]. >> >> (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. >> >> Can also be used to modify fill_post_wcc() in NFSD which retrieves >> i_version directly, but has just called vfs_getattr(). It could get it >> from the kstat struct if it used vfs_xgetattr() instead. >> >> (6) BSD stat compatibility: Including more fields from the BSD stat such as >> creation time (st_btime) and inode generation number (st_gen) [Jeremy >> Allison, Bernd Schubert]. >> >> (7) Extra coherency data may be useful in making backups [Andreas Dilger]. >> >> (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem >> can now say it doesn't support a standard stat feature if that isn't >> available, so if, for instance, inode numbers or UIDs don't exist... >> >> (9) Make the fields a consistent size on all arches and make them large. >> >> (10) Store a 16-byte volume ID in the superblock that can be returned in struct >> xstat [Steve French]. >> >> (11) Include granularity fields in the time data to indicate the granularity of >> each of the times (NFSv4 time_delta) [Steve French]. > > It looks like you're including this with *each* time? But surely > there's no filesystem with different granularity (say) for ctime than > for mtime. Also, nfsd will want only one time_delta, not one for each > time. > > Note also we need to document carefully what this means: I think it > should be the granularity that the filesystem is capable of > representing, but people are sometimes surprised to find out that the > actual time source is usually more coarse-grained than that. I also would prefer that we simply treat the time granularity as part of the superblock (mounted volume) ie returned on fstat rather than on every stat of the filesystem. 
For cifs mounts we could conceivably have different time granularity (1 or 2 second) on mounts to old servers rather than 100 nanoseconds. -- Thanks, Steve ^ permalink raw reply [flat|nested] 144+ messages in thread
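To make the granularity discussion above a little more concrete, here is a rough sketch of how a consumer (a backup tool or a cache, say) might use a reported granularity when deciding whether two timestamps are distinguishable. It assumes struct xstat_time as proposed in the patch; the helper names xstat_time_floor_ns() and xstat_time_same() are hypothetical, and the arithmetic assumes non-negative tv_sec values small enough for the nanosecond count to fit in 64 bits. Whether the granularity ends up per-timestamp or per-superblock, the comparison would look much the same.

```c
#include <stdint.h>

/* Timestamp layout as proposed in the patch */
struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;	/* time granularity (in nS) */
};

#define XSTAT_NSEC_PER_SEC 1000000000LL	/* hypothetical name */

/*
 * Round a timestamp down to a multiple of gran_ns nanoseconds.
 * A granularity of 0 is treated as 1ns (no rounding).
 */
static int64_t xstat_time_floor_ns(const struct xstat_time *t, uint32_t gran_ns)
{
	int64_t ns = t->tv_sec * XSTAT_NSEC_PER_SEC + t->tv_nsec;

	if (gran_ns == 0)
		gran_ns = 1;
	return ns - ns % gran_ns;
}

/*
 * Two timestamps are treated as indistinguishable if they are equal once
 * both are rounded to the coarser of the two reported granularities.
 */
static int xstat_time_same(const struct xstat_time *a,
			   const struct xstat_time *b)
{
	uint32_t gran = a->tv_granularity > b->tv_granularity ?
			a->tv_granularity : b->tv_granularity;

	return xstat_time_floor_ns(a, gran) == xstat_time_floor_ns(b, gran);
}
```

A 2-second granularity (2000000000) still fits in the 32-bit tv_granularity field, so the coarse old-server case mentioned above is representable. Note that this only models what the filesystem can represent, not how coarse the actual time source is.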
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available 2012-04-24 21:29 ` J. Bruce Fields @ 2012-04-25 14:44 ` Andreas Dilger -1 siblings, 0 replies; 144+ messages in thread From: Andreas Dilger @ 2012-04-25 14:44 UTC (permalink / raw) To: J. Bruce Fields Cc: David Howells, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw On 2012-04-24, at 4:29 PM, J. Bruce Fields wrote: > On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote: >> (11) Include granularity fields in the time data to indicate the >> granularity of each of the times (NFSv4 time_delta) [Steve French]. > > It looks like you're including this with *each* time? But surely > there's no filesystem with different granularity (say) for ctime than > for mtime. Also, nfsd will want only one time_delta, not one for each > time. I suspect the main reason for having a separate time_delta per timestamp is to use the extra 32-bit field in the timestamp structs. Since those structs have a 64-bit + 32-bit field, it would be messy to pack them, and leaving the spare bytes unused and adding an additional field for the granularity would just increase the struct size. > Note also we need to document carefully what this means: I think it > should be the granularity that the filesystem is capable of > representing, but people are sometimes surprised to find out that the > actual time source is usually more coarse-grained than that. > > --b. > >> >> (12) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. >> >> (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, >> Michael Kerrisk]. 
>> >> (14) Spare space, request flags and information flags are provided for future >> expansion. >> >> >> The following structures are defined for the use of these new system calls: >> >> struct xstat_dev { >> uint32_t major, minor; >> }; >> >> struct xstat_time { >> uint64_t tv_sec; >> uint32_t tv_nsec; >> uint32_t tv_granularity; >> }; >> >> struct xstat { >> uint32_t st_mask; >> uint32_t st_mode; >> uint32_t st_nlink; >> uint32_t st_uid; >> uint32_t st_gid; >> uint32_t st_information; >> uint32_t st_ioc_flags; >> uint32_t st_blksize; >> struct xstat_dev st_rdev; >> struct xstat_dev st_dev; >> struct xstat_time st_atime; >> struct xstat_time st_btime; >> struct xstat_time st_ctime; >> struct xstat_time st_mtime; >> uint64_t st_ino; >> uint64_t st_size; >> uint64_t st_blocks; >> uint64_t st_gen; >> uint64_t st_version; >> uint8_t st_volume_id[16]; >> uint64_t __spares[11]; >> }; >> >> where st_information is local system information about the file, st_btime is >> the file creation time, st_gen is the inode generation (i_generation), >> st_data_version is the data version number (i_version), st_ioc_flags is the >> flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identified is >> stored, st_result_mask is a bitmask indicating the data provided and __spares[] >> are where as-yet undefined fields can be placed. 
>>
>> The defined bits in the mask argument and st_mask are:
>>
>> 	XSTAT_MODE		Want/got st_mode
>> 	XSTAT_NLINK		Want/got st_nlink
>> 	XSTAT_UID		Want/got st_uid
>> 	XSTAT_GID		Want/got st_gid
>> 	XSTAT_RDEV		Want/got st_rdev
>> 	XSTAT_ATIME		Want/got st_atime
>> 	XSTAT_MTIME		Want/got st_mtime
>> 	XSTAT_CTIME		Want/got st_ctime
>> 	XSTAT_INO		Want/got st_ino
>> 	XSTAT_SIZE		Want/got st_size
>> 	XSTAT_BLOCKS		Want/got st_blocks
>> 	XSTAT_BASIC_STATS	[The stuff in the normal stat struct]
>> 	XSTAT_IOC_FLAGS		Want/got FS_IOC_GETFLAGS
>> 	XSTAT_BTIME		Want/got st_btime
>> 	XSTAT_GEN		Want/got st_gen
>> 	XSTAT_VERSION		Want/got st_version
>> 	XSTAT_VOLUME_ID		Want/got st_volume_id
>> 	XSTAT_ALL_STATS		[All currently available stuff]
>>
>> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
>> that might be supplied by the filesystem.  Note that Ext4 returns flags outside
>> of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
>> {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
>> be suppressed?
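
As a rough sketch of how userspace might use these bits, the helper below turns a mask into a list of field names, which makes it easy to see what a filesystem did not return. The bit values are the ones from the test program in the patch description; the table, the short names and xstat_mask_names() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values as given in the test program in the patch description. */
#define XSTAT_MODE		0x00000001U
#define XSTAT_NLINK		0x00000002U
#define XSTAT_UID		0x00000004U
#define XSTAT_GID		0x00000008U
#define XSTAT_RDEV		0x00000010U
#define XSTAT_ATIME		0x00000020U
#define XSTAT_MTIME		0x00000040U
#define XSTAT_CTIME		0x00000080U
#define XSTAT_INO		0x00000100U
#define XSTAT_SIZE		0x00000200U
#define XSTAT_BLOCKS		0x00000400U
#define XSTAT_BASIC_STATS	0x000007ffU
#define XSTAT_BTIME		0x00000800U
#define XSTAT_GEN		0x00001000U
#define XSTAT_VERSION		0x00002000U
#define XSTAT_IOC_FLAGS		0x00004000U
#define XSTAT_VOLUME_ID		0x00008000U
#define XSTAT_ALL_STATS		0x0000ffffU

/* Hypothetical name table, for illustration only. */
static const struct {
	uint32_t bit;
	const char *name;
} xstat_bits[] = {
	{ XSTAT_MODE, "mode" },		{ XSTAT_NLINK, "nlink" },
	{ XSTAT_UID, "uid" },		{ XSTAT_GID, "gid" },
	{ XSTAT_RDEV, "rdev" },		{ XSTAT_ATIME, "atime" },
	{ XSTAT_MTIME, "mtime" },	{ XSTAT_CTIME, "ctime" },
	{ XSTAT_INO, "ino" },		{ XSTAT_SIZE, "size" },
	{ XSTAT_BLOCKS, "blocks" },	{ XSTAT_BTIME, "btime" },
	{ XSTAT_GEN, "gen" },		{ XSTAT_VERSION, "version" },
	{ XSTAT_IOC_FLAGS, "ioc_flags" }, { XSTAT_VOLUME_ID, "volume_id" },
};

/*
 * Write a comma-separated list of the fields flagged in mask into buf and
 * return how many were named.  Stops early if buf is too small.
 */
static int xstat_mask_names(uint32_t mask, char *buf, size_t len)
{
	size_t used = 0, i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(xstat_bits) / sizeof(xstat_bits[0]); i++) {
		int n;

		if (!(mask & xstat_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? "," : "", xstat_bits[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

For the /proc example quoted further down (results=47ef after asking for XSTAT_ALL_STATS), the fields not returned are XSTAT_ALL_STATS & ~0x47ef = 0xb810, which this helper would name as rdev, btime, gen, version and volume_id.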
>>
>> The defined bits in the st_information field give local system data on a file,
>> how it is accessed, where it is and what it does:
>>
>> 	XSTAT_INFO_ENCRYPTED		File is encrypted
>> 	XSTAT_INFO_TEMPORARY		File is temporary (NTFS/CIFS/deleted)
>> 	XSTAT_INFO_FABRICATED		File was made up by filesystem
>> 	XSTAT_INFO_KERNEL_API		File is kernel API (eg: procfs/sysfs)
>> 	XSTAT_INFO_REMOTE		File is remote
>> 	XSTAT_INFO_OFFLINE		File is offline (CIFS)
>> 	XSTAT_INFO_AUTOMOUNT		Dir is automount trigger
>> 	XSTAT_INFO_AUTODIR		Dir provides unlisted automounts
>> 	XSTAT_INFO_NONSYSTEM_OWNERSHIP	File has non-system ownership details
>> 	XSTAT_INFO_HAS_ACL		File has an ACL of some sort
>> 	XSTAT_INFO_REPARSE_POINT	File is reparse point (NTFS/CIFS)
>> 	XSTAT_INFO_HIDDEN		File is marked hidden (DOS+)
>> 	XSTAT_INFO_SYSTEM		File is marked system (DOS+)
>> 	XSTAT_INFO_ARCHIVE		File is marked archive (DOS+)
>>
>> These are for the use of GUI tools that might want to mark files specially,
>> depending on what they are.  I've tried not to provide overlap with
>> st_ioc_flags where something usable exists there.  Should Hidden, System and
>> Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
>> 64-bits?
>>
>>
>> The system calls are:
>>
>> 	ssize_t ret = xstat(int dfd,
>> 			    const char *filename,
>> 			    unsigned int flags,
>> 			    unsigned int mask,
>> 			    struct xstat *buffer);
>>
>> 	ssize_t ret = fxstat(unsigned fd,
>> 			     unsigned int flags,
>> 			     unsigned int mask,
>> 			     struct xstat *buffer);
>>
>>
>> The dfd, filename, flags and fd parameters indicate the file to query.  There
>> is no equivalent of lstat() as that can be emulated with xstat() by passing
>> AT_SYMLINK_NOFOLLOW in flags.
>>
>> AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
>> filesystem to synchronise its attributes with the server.
>>
>> mask is a bitmask indicating the fields in struct xstat that are of interest to
>> the caller.
>> The user should set this to XSTAT_BASIC_STATS to get the
>> basic set returned by stat().
>>
>> Should there just be one xstat() syscall that does fxstat() if filename is NULL?
>>
>> The fields in struct xstat come in a number of classes:
>>
>>  (0) st_dev, st_blksize, st_information.
>>
>>      These are local data and are always available.
>>
>>  (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size,
>>      st_blocks.
>>
>>      These will be returned whether the caller asks for them or not.  The
>>      corresponding bits in st_mask will be set to indicate their presence.
>>
>>      If the caller didn't ask for them, then they may be approximated.  For
>>      example, NFS won't waste any time updating them from the server, unless as
>>      a byproduct of updating something requested.
>>
>>      If the values don't actually exist for the underlying object (such as UID
>>      or GID on a DOS file), then the bit won't be set in st_mask, even
>>      if the caller asked for the value, and the returned value will be a
>>      fabrication.
>>
>>  (2) st_rdev.
>>
>>      As for class (1), but this won't be returned if the file is not a blockdev
>>      or chardev.  The bit will be cleared if the value is not returned.
>>
>>  (3) File creation time (st_btime), inode generation (st_gen), data version
>>      (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).
>>
>>      These will be returned if available whether the caller asked for them or
>>      not.  The corresponding bits in st_mask will be set or cleared as
>>      appropriate to indicate their presence.
>>
>>      If the caller didn't ask for them, then they may be approximated.  For
>>      example, NFS won't waste any time updating them from the server, unless
>>      as a byproduct of updating something requested.
>>
>> At the moment, this will only work on x86_64 and i386 as it requires system
>> calls to be wired up.
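
One way a caller might act on the field classes above is to sort each field into one of three states from the mask it passed in and the st_mask it got back. This is only a sketch of my reading of the description: enum xstat_quality and xstat_field_quality() are hypothetical names, and the two bit values are copied from the test program.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as given in the test program in the patch description. */
#define XSTAT_UID	0x00000004U
#define XSTAT_BTIME	0x00000800U

/* Hypothetical classification of a single returned field. */
enum xstat_quality {
	XSTAT_Q_ABSENT,		/* bit clear: value is zeroed or a fabrication */
	XSTAT_Q_APPROXIMATE,	/* bit set but not requested: may be stale */
	XSTAT_Q_REQUESTED,	/* bit set and requested */
};

/*
 * Classify one field given the mask passed to xstat() (requested), the
 * st_mask that came back, and the bit for the field of interest.
 */
static enum xstat_quality xstat_field_quality(uint32_t requested,
					      uint32_t st_mask,
					      uint32_t bit)
{
	if (!(st_mask & bit))
		return XSTAT_Q_ABSENT;
	if (!(requested & bit))
		return XSTAT_Q_APPROXIMATE;
	return XSTAT_Q_REQUESTED;
}
```

A GUI tool could, for instance, show st_btime only when the result is not XSTAT_Q_ABSENT, and a Samba-like server could re-query with the bit set in mask whenever it sees XSTAT_Q_APPROXIMATE for a field it needs to be current.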
>>
>> =======
>> TESTING
>> =======
>>
>> The following test program can be used to test the xstat system call:
>>
>> 	/* Test the xstat() system call
>> 	 *
>> 	 * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
>> 	 * Written by David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org)
>> 	 *
>> 	 * This program is free software; you can redistribute it and/or
>> 	 * modify it under the terms of the GNU General Public Licence
>> 	 * as published by the Free Software Foundation; either version
>> 	 * 2 of the Licence, or (at your option) any later version.
>> 	 */
>>
>> 	#define _GNU_SOURCE
>> 	#define _ATFILE_SOURCE
>> 	#include <stdio.h>
>> 	#include <stdlib.h>
>> 	#include <stdint.h>
>> 	#include <string.h>
>> 	#include <unistd.h>
>> 	#include <fcntl.h>
>> 	#include <time.h>
>> 	#include <sys/syscall.h>
>> 	#include <sys/stat.h>
>> 	#include <sys/types.h>
>>
>> 	#define AT_NO_AUTOMOUNT		0x800
>> 	#define AT_FORCE_ATTR_SYNC	0x2000
>>
>> 	#define XSTAT_MODE		0x00000001U
>> 	#define XSTAT_NLINK		0x00000002U
>> 	#define XSTAT_UID		0x00000004U
>> 	#define XSTAT_GID		0x00000008U
>> 	#define XSTAT_RDEV		0x00000010U
>> 	#define XSTAT_ATIME		0x00000020U
>> 	#define XSTAT_MTIME		0x00000040U
>> 	#define XSTAT_CTIME		0x00000080U
>> 	#define XSTAT_INO		0x00000100U
>> 	#define XSTAT_SIZE		0x00000200U
>> 	#define XSTAT_BLOCKS		0x00000400U
>> 	#define XSTAT_BASIC_STATS	0x000007ffU
>> 	#define XSTAT_BTIME		0x00000800U
>> 	#define XSTAT_GEN		0x00001000U
>> 	#define XSTAT_VERSION		0x00002000U
>> 	#define XSTAT_IOC_FLAGS		0x00004000U
>> 	#define XSTAT_VOLUME_ID		0x00008000U
>> 	#define XSTAT_ALL_STATS		0x0000ffffU
>>
>> 	struct xstat_dev {
>> 		uint32_t		major;
>> 		uint32_t		minor;
>> 	};
>>
>> 	struct xstat_time {
>> 		uint64_t		tv_sec;
>> 		uint32_t		tv_nsec;
>> 		uint32_t		tv_granularity;
>> 	};
>>
>> 	/* Must match the kernel's struct xstat field for field. */
>> 	struct xstat {
>> 		uint32_t		st_mask;
>> 		uint32_t		st_mode;
>> 		uint32_t		st_nlink;
>> 		uint32_t		st_uid;
>> 		uint32_t		st_gid;
>> 		uint32_t		st_information;
>> 		uint32_t		st_ioc_flags;
>> 		uint32_t		st_blksize;
>> 		struct xstat_dev	st_rdev;
>> 		struct xstat_dev	st_dev;
>> 		struct xstat_time	st_atim;
>> 		struct xstat_time	st_btim;
>> 		struct xstat_time	st_ctim;
>> 		struct xstat_time	st_mtim;
>> 		uint64_t		st_ino;
>> 		uint64_t		st_size;
>> 		uint64_t		st_blocks;
>> 		uint64_t		st_gen;
>> 		uint64_t		st_version;
>> 		uint8_t			st_volume_id[16];
>> 		uint64_t		st_spares[11];
>> 	};
>>
>> 	#define XSTAT_INFO_ENCRYPTED		0x00000001U
>> 	#define XSTAT_INFO_TEMPORARY		0x00000002U
>> 	#define XSTAT_INFO_FABRICATED		0x00000004U
>> 	#define XSTAT_INFO_KERNEL_API		0x00000008U
>> 	#define XSTAT_INFO_REMOTE		0x00000010U
>> 	#define XSTAT_INFO_OFFLINE		0x00000020U
>> 	#define XSTAT_INFO_AUTOMOUNT		0x00000040U
>> 	#define XSTAT_INFO_AUTODIR		0x00000080U
>> 	#define XSTAT_INFO_NONSYSTEM_OWNERSHIP	0x00000100U
>> 	#define XSTAT_INFO_HAS_ACL		0x00000200U
>> 	#define XSTAT_INFO_REPARSE_POINT	0x00000400U
>> 	#define XSTAT_INFO_HIDDEN		0x00000800U
>> 	#define XSTAT_INFO_SYSTEM		0x00001000U
>> 	#define XSTAT_INFO_ARCHIVE		0x00002000U
>>
>> 	#define __NR_xstat		312
>> 	#define __NR_fxstat		313
>>
>> 	static __attribute__((unused))
>> 	ssize_t xstat(int dfd, const char *filename, unsigned flags,
>> 		      unsigned int mask, struct xstat *buffer)
>> 	{
>> 		return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
>> 	}
>>
>> 	static __attribute__((unused))
>> 	ssize_t fxstat(int fd, unsigned flags,
>> 		       unsigned int mask, struct xstat *buffer)
>> 	{
>> 		return syscall(__NR_fxstat, fd, flags, mask, buffer);
>> 	}
>>
>> 	/* Print one timestamp as "YYYY-MM-DD HH:MM:SS.nnnnnnnnn+zzzz" */
>> 	static void print_time(const char *field, const struct xstat_time *xstm)
>> 	{
>> 		struct tm tm;
>> 		time_t tim;
>> 		char buffer[100];
>> 		int len;
>>
>> 		tim = xstm->tv_sec;
>> 		if (!localtime_r(&tim, &tm)) {
>> 			perror("localtime_r");
>> 			exit(1);
>> 		}
>> 		len = strftime(buffer, 100, "%F %T", &tm);
>> 		if (len == 0) {
>> 			perror("strftime");
>> 			exit(1);
>> 		}
>> 		printf("%s", field);
>> 		fwrite(buffer, 1, len, stdout);
>> 		printf(".%09u", xstm->tv_nsec);
>> 		len = strftime(buffer, 100, "%z", &tm);
>> 		if (len == 0) {
>> 			perror("strftime2");
>> 			exit(1);
>> 		}
>> 		fwrite(buffer, 1, len, stdout);
>> 		printf("\n");
>> 	}
>>
>> 	static void dump_xstat(struct xstat *xst)
>> 	{
>> 		/* ft defaults to '?' in case st_mode was not returned */
>> 		char buffer[256], ft = '?';
>>
>> 		printf("results=%x\n", xst->st_mask);
>>
>> 		printf(" ");
>> 		if (xst->st_mask & XSTAT_SIZE)
>> 			printf(" Size: %-15llu", (unsigned long long) xst->st_size);
>> 		if (xst->st_mask & XSTAT_BLOCKS)
>> 			printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
>> 		printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
>> 		if (xst->st_mask & XSTAT_MODE) {
>> 			switch (xst->st_mode & S_IFMT) {
>> 			case S_IFIFO:	printf(" FIFO\n");			ft = 'p'; break;
>> 			case S_IFCHR:	printf(" character special file\n");	ft = 'c'; break;
>> 			case S_IFDIR:	printf(" directory\n");			ft = 'd'; break;
>> 			case S_IFBLK:	printf(" block special file\n");	ft = 'b'; break;
>> 			case S_IFREG:	printf(" regular file\n");		ft = '-'; break;
>> 			case S_IFLNK:	printf(" symbolic link\n");		ft = 'l'; break;
>> 			case S_IFSOCK:	printf(" socket\n");			ft = 's'; break;
>> 			default:
>> 				printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
>> 				ft = '?';
>> 				break;
>> 			}
>> 		}
>>
>> 		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
>> 		printf("Device: %-15s", buffer);
>> 		if (xst->st_mask & XSTAT_INO)
>> 			printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
>> 		/* the link count is governed by XSTAT_NLINK, not XSTAT_SIZE */
>> 		if (xst->st_mask & XSTAT_NLINK)
>> 			printf(" Links: %-5u", xst->st_nlink);
>> 		if (xst->st_mask & XSTAT_RDEV)
>> 			printf(" Device type: %u,%u",
>> 			       xst->st_rdev.major, xst->st_rdev.minor);
>> 		printf("\n");
>>
>> 		if (xst->st_mask & XSTAT_MODE)
>> 			printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
>> 			       xst->st_mode & 07777,
>> 			       ft,
>> 			       xst->st_mode & S_IRUSR ? 'r' : '-',
>> 			       xst->st_mode & S_IWUSR ? 'w' : '-',
>> 			       xst->st_mode & S_IXUSR ? 'x' : '-',
>> 			       xst->st_mode & S_IRGRP ? 'r' : '-',
>> 			       xst->st_mode & S_IWGRP ? 'w' : '-',
>> 			       xst->st_mode & S_IXGRP ? 'x' : '-',
>> 			       xst->st_mode & S_IROTH ? 'r' : '-',
>> 			       xst->st_mode & S_IWOTH ? 'w' : '-',
>> 			       xst->st_mode & S_IXOTH ?
'x' : '-'); >> if (xst->st_mask & XSTAT_UID) >> printf("Uid: %d \n", xst->st_uid); >> if (xst->st_mask & XSTAT_GID) >> printf("Gid: %u\n", xst->st_gid); >> >> if (xst->st_mask & XSTAT_ATIME) >> print_time("Access: ", &xst->st_atim); >> if (xst->st_mask & XSTAT_MTIME) >> print_time("Modify: ", &xst->st_mtim); >> if (xst->st_mask & XSTAT_CTIME) >> print_time("Change: ", &xst->st_ctim); >> if (xst->st_mask & XSTAT_BTIME) >> print_time("Create: ", &xst->st_btim); >> >> if (xst->st_mask & XSTAT_GEN) >> printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen); >> if (xst->st_mask & XSTAT_VERSION) >> printf("Data version: %llxh\n", (unsigned long long) xst->st_version); >> >> if (xst->st_mask & XSTAT_IOC_FLAGS) { >> unsigned char bits; >> int loop, byte; >> >> static char flag_representation[32 + 1] = >> /* FS_IOC_GETFLAGS flags: */ >> "????????" /* 31-24 0x00000000-ff000000 */ >> "????ehTD" /* 23-16 0x00000000-00ff0000 */ >> "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00 */ >> "AdaiScus" /* 7- 0 0x00000000-000000ff */ >> ; >> >> printf("Inode flags: %08x (", xst->st_ioc_flags); >> for (byte = 32 - 8; byte >= 0; byte -= 8) { >> bits = xst->st_ioc_flags >> byte; >> for (loop = 7; loop >= 0; loop--) { >> int bit = byte + loop; >> >> if (bits & 0x80) >> putchar(flag_representation[31 - bit]); >> else >> putchar('-'); >> bits <<= 1; >> } >> if (byte) >> putchar(' '); >> } >> printf(")\n"); >> } >> >> if (xst->st_information) { >> unsigned char bits; >> int loop, byte; >> >> static char info_representation[32 + 1] = >> /* XSTAT_INFO_ flags: */ >> "????????" /* 31-24 0x00000000-ff000000 */ >> "????????" 
/* 23-16 0x00000000-00ff0000 */ >> "??ASHRan" /* 15- 8 0x00000000-0000ff00 */ >> "dmorkfte" /* 7- 0 0x00000000-000000ff */ >> ; >> >> printf("Information: %08x (", xst->st_information); >> for (byte = 32 - 8; byte >= 0; byte -= 8) { >> bits = xst->st_information >> byte; >> for (loop = 7; loop >= 0; loop--) { >> int bit = byte + loop; >> >> if (bits & 0x80) >> putchar(info_representation[31 - bit]); >> else >> putchar('-'); >> bits <<= 1; >> } >> if (byte) >> putchar(' '); >> } >> printf(")\n"); >> } >> >> if (xst->st_mask & XSTAT_VOLUME_ID) { >> int loop; >> printf("Volume ID: "); >> for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) { >> printf("%02x", xst->st_volume_id[loop]); >> if (loop == 7) >> printf("-"); >> } >> printf("\n"); >> } >> } >> >> void dump_hex(unsigned long long *data, int from, int to) >> { >> unsigned offset, print_offset = 1, col = 0; >> >> from /= 8; >> to = (to + 7) / 8; >> >> for (offset = from; offset < to; offset++) { >> if (print_offset) { >> printf("%04x: ", offset * 8); >> print_offset = 0; >> } >> printf("%016llx", data[offset]); >> col++; >> if ((col & 3) == 0) { >> printf("\n"); >> print_offset = 1; >> } else { >> printf(" "); >> } >> } >> >> if (!print_offset) >> printf("\n"); >> } >> >> int main(int argc, char **argv) >> { >> struct xstat xst; >> int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; >> >> unsigned int mask = XSTAT_ALL_STATS; >> >> for (argv++; *argv; argv++) { >> if (strcmp(*argv, "-F") == 0) { >> atflag |= AT_FORCE_ATTR_SYNC; >> continue; >> } >> if (strcmp(*argv, "-L") == 0) { >> atflag &= ~AT_SYMLINK_NOFOLLOW; >> continue; >> } >> if (strcmp(*argv, "-O") == 0) { >> mask &= ~XSTAT_BASIC_STATS; >> continue; >> } >> if (strcmp(*argv, "-A") == 0) { >> atflag |= AT_NO_AUTOMOUNT; >> continue; >> } >> if (strcmp(*argv, "-R") == 0) { >> raw = 1; >> continue; >> } >> >> memset(&xst, 0xbf, sizeof(xst)); >> ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst); >> printf("xstat(%s) = %d\n", *argv, ret); >> if (ret < 0) { >> 
perror(*argv); >> exit(1); >> } >> >> if (raw) >> dump_hex((unsigned long long *)&xst, 0, sizeof(xst)); >> >> dump_xstat(&xst); >> } >> return 0; >> } >> >> Just compile and run, passing it paths to the files you want to examine: >> >> [root@andromeda ~]# /tmp/xstat /proc/$$ >> xstat(/proc/2074) = 160 >> results=47ef >> Size: 0 Blocks: 0 IO Block: 1024 directory >> Device: 00:03 Inode: 9072 Links: 7 >> Access: (0555/dr-xr-xr-x) Uid: 0 >> Gid: 0 >> Access: 2010-07-14 16:50:46.609336272+0100 >> Modify: 2010-07-14 16:50:46.609336272+0100 >> Change: 2010-07-14 16:50:46.609336272+0100 >> Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------) >> [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm >> xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160 >> results=77ef >> Size: 5413882 Blocks: 0 IO Block: 4096 regular file >> Device: 00:15 Inode: 2288 Links: 1 >> Access: (0644/-rw-r--r--) Uid: 75338 >> Gid: 0 >> Access: 2008-11-05 19:47:22.000000000+0000 >> Modify: 2008-11-05 19:47:22.000000000+0000 >> Change: 2008-11-05 19:47:22.000000000+0000 >> Inode version: 795h >> Data version: 2h >> Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------) >> >> Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> --- >> >> arch/x86/syscalls/syscall_32.tbl | 2 >> arch/x86/syscalls/syscall_64.tbl | 2 >> fs/stat.c | 350 +++++++++++++++++++++++++++++++++++--- >> include/linux/fcntl.h | 1 >> include/linux/fs.h | 4 >> include/linux/stat.h | 126 +++++++++++++- >> include/linux/syscalls.h | 7 + >> 7 files changed, 461 insertions(+), 31 deletions(-) >> >> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl >> index 29f9f05..980eb5a 100644 >> --- a/arch/x86/syscalls/syscall_32.tbl >> +++ b/arch/x86/syscalls/syscall_32.tbl >> @@ -355,3 +355,5 @@ >> 
346 i386 setns sys_setns >> 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv >> 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev >> +349 i386 xstat sys_xstat >> +350 i386 fxstat sys_fxstat >> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl >> index dd29a9e..7ae24bb 100644 >> --- a/arch/x86/syscalls/syscall_64.tbl >> +++ b/arch/x86/syscalls/syscall_64.tbl >> @@ -318,6 +318,8 @@ >> 309 common getcpu sys_getcpu >> 310 64 process_vm_readv sys_process_vm_readv >> 311 64 process_vm_writev sys_process_vm_writev >> +312 common xstat sys_xstat >> +313 common fxstat sys_fxstat >> # >> # x32-specific system call numbers start at 512 to avoid cache impact >> # for native 64-bit operation. >> diff --git a/fs/stat.c b/fs/stat.c >> index c733dc5..af3ef33 100644 >> --- a/fs/stat.c >> +++ b/fs/stat.c >> @@ -18,8 +18,20 @@ >> #include <asm/uaccess.h> >> #include <asm/unistd.h> >> >> +/** >> + * generic_fillattr - Fill in the basic attributes from the inode struct >> + * @inode: Inode to use as the source >> + * @stat: Where to fill in the attributes >> + * >> + * Fill in the basic attributes in the kstat structure from data that's to be >> + * found on the VFS inode structure. This is the default if no getattr inode >> + * operation is supplied. 
>> + */ >> void generic_fillattr(struct inode *inode, struct kstat *stat) >> { >> + struct super_block *sb = inode->i_sb; >> + u32 x; >> + >> stat->dev = inode->i_sb->s_dev; >> stat->ino = inode->i_ino; >> stat->mode = inode->i_mode; >> @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat) >> stat->uid = inode->i_uid; >> stat->gid = inode->i_gid; >> stat->rdev = inode->i_rdev; >> - stat->size = i_size_read(inode); >> - stat->atime = inode->i_atime; >> stat->mtime = inode->i_mtime; >> stat->ctime = inode->i_ctime; >> - stat->blksize = (1 << inode->i_blkbits); >> + stat->size = i_size_read(inode); >> stat->blocks = inode->i_blocks; >> -} >> + stat->blksize = (1 << inode->i_blkbits); >> >> + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV; >> + if (IS_NOATIME(inode)) >> + stat->result_mask &= ~XSTAT_ATIME; >> + else >> + stat->atime = inode->i_atime; >> + >> + if (S_ISREG(stat->mode) && stat->nlink == 0) >> + stat->information |= XSTAT_INFO_TEMPORARY; >> + if (IS_AUTOMOUNT(inode)) >> + stat->information |= XSTAT_INFO_AUTOMOUNT; >> + if (IS_POSIXACL(inode)) >> + stat->information |= XSTAT_INFO_HAS_ACL; >> + >> + /* if unset, assume 1s granularity */ >> + stat->tv_granularity = sb->s_time_gran ?: 1000000000U; >> + >> + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode))) >> + stat->result_mask |= XSTAT_RDEV; >> + >> + x = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0]; >> + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1]; >> + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2]; >> + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3]; >> + if (x) >> + stat->result_mask |= XSTAT_VOLUME_ID; >> +} >> EXPORT_SYMBOL(generic_fillattr); >> >> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> +/** >> + * vfs_xgetattr - Get the basic and extra attributes of a file >> + * @mnt: The mountpoint to which the dentry belongs >> + * @dentry: The file of interest >> 
+ * @stat: Where to return the statistics >> + * >> + * Ask the filesystem for a file's attributes. The caller must have preset >> + * stat->request_mask and stat->query_flags to indicate what they want. >> + * >> + * If the file is remote, the filesystem can be forced to update the attributes >> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags. >> + * >> + * Bits must have been set in stat->request_mask to indicate which attributes >> + * the caller wants retrieving. Any such attribute not requested may be >> + * returned anyway, but the value may be approximate, and, if remote, may not >> + * have been synchronised with the server. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry, >> + struct kstat *stat) >> { >> struct inode *inode = dentry->d_inode; >> int retval; >> @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> if (retval) >> return retval; >> >> + stat->result_mask = 0; >> + stat->information = 0; >> + stat->ioc_flags = 0; >> if (inode->i_op->getattr) >> return inode->i_op->getattr(mnt, dentry, stat); >> >> generic_fillattr(inode, stat); >> return 0; >> } >> +EXPORT_SYMBOL(vfs_xgetattr); >> >> +/** >> + * vfs_getattr - Get the basic attributes of a file >> + * @mnt: The mountpoint to which the dentry belongs >> + * @dentry: The file of interest >> + * @stat: Where to return the statistics >> + * >> + * Ask the filesystem for a file's attributes. If remote, the filesystem isn't >> + * forced to update its files from the backing store. Only the basic set of >> + * attributes will be retrieved; anyone wanting more must use vfs_getxattr(), >> + * as must anyone who wants to force attributes to be sync'd with the server. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> +{ >> + stat->query_flags = 0; >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xgetattr(mnt, dentry, stat); >> +} >> EXPORT_SYMBOL(vfs_getattr); >> >> -int vfs_fstat(unsigned int fd, struct kstat *stat) >> +/** >> + * vfs_fxstat - Get basic and extra attributes by file descriptor >> + * @fd: The file descriptor refering to the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xgetattr(). The main difference is >> + * that it uses a file descriptor to determine the file location. >> + * >> + * The caller must have preset stat->query_flags and stat->request_mask as for >> + * vfs_xgetattr(). >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_fxstat(unsigned int fd, struct kstat *stat) >> { >> struct file *f = fget(fd); >> int error = -EBADF; >> >> + if (stat->query_flags & ~KSTAT_QUERY_FLAGS) >> + return -EINVAL; >> if (f) { >> - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat); >> + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat); >> fput(f); >> } >> return error; >> } >> +EXPORT_SYMBOL(vfs_fxstat); >> + >> +/** >> + * vfs_fstat - Get basic attributes by file descriptor >> + * @fd: The file descriptor refering to the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_getattr(). The main difference is >> + * that it uses a file descriptor to determine the file location. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_fstat(unsigned int fd, struct kstat *stat) >> +{ >> + stat->query_flags = 0; >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_fxstat(fd, stat); >> +} >> EXPORT_SYMBOL(vfs_fstat); >> >> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, >> - int flag) >> +/** >> + * vfs_xstat - Get basic and extra attributes by filename >> + * @dfd: A file descriptor representing the base dir for a relative filename >> + * @filename: The name of the file of interest >> + * @flags: Flags to control the query >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xgetattr(). The main difference is >> + * that it uses a filename and base directory to determine the file location. >> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a >> + * symlink at the given name from being referenced. >> + * >> + * The caller must have preset stat->request_mask as for vfs_xgetattr(). The >> + * flags are also used to load up stat->query_flags. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_xstat(int dfd, const char __user *filename, int flags, >> + struct kstat *stat) >> { >> struct path path; >> - int error = -EINVAL; >> - int lookup_flags = 0; >> + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; >> >> - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | >> - AT_EMPTY_PATH)) != 0) >> - goto out; >> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | >> + AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0) >> + return -EINVAL; >> >> - if (!(flag & AT_SYMLINK_NOFOLLOW)) >> - lookup_flags |= LOOKUP_FOLLOW; >> - if (flag & AT_EMPTY_PATH) >> + if (flags & AT_SYMLINK_NOFOLLOW) >> + lookup_flags &= ~LOOKUP_FOLLOW; >> + if (flags & AT_NO_AUTOMOUNT) >> + lookup_flags &= ~LOOKUP_AUTOMOUNT; >> + if (flags & AT_EMPTY_PATH) >> lookup_flags |= LOOKUP_EMPTY; >> >> + stat->query_flags = flags & KSTAT_QUERY_FLAGS; >> error = user_path_at(dfd, filename, lookup_flags, &path); >> - if (error) >> - goto out; >> - >> - error = vfs_getattr(path.mnt, path.dentry, stat); >> - path_put(&path); >> -out: >> + if (!error) { >> + error = vfs_xgetattr(path.mnt, path.dentry, stat); >> + path_put(&path); >> + } >> return error; >> } >> +EXPORT_SYMBOL(vfs_xstat); >> + >> +/** >> + * vfs_fstatat - Get basic attributes by filename >> + * @dfd: A file descriptor representing the base dir for a relative filename >> + * @filename: The name of the file of interest >> + * @flags: Flags to control the query >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only. The flags are used to load up >> + * stat->query_flags in addition to indicating symlink handling during path >> + * resolution. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, >> + int flags) >> +{ >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xstat(dfd, filename, flags, stat); >> +} >> EXPORT_SYMBOL(vfs_fstatat); >> >> -int vfs_stat(const char __user *name, struct kstat *stat) >> +/** >> + * vfs_stat - Get basic attributes by filename >> + * @filename: The name of the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only, terminal symlinks are followed regardless and a >> + * remote filesystem can't be forced to query the server. If such is desired, >> + * vfs_xstat() should be used instead. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_stat(const char __user *filename, struct kstat *stat) >> { >> - return vfs_fstatat(AT_FDCWD, name, stat, 0); >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xstat(AT_FDCWD, filename, 0, stat); >> } >> EXPORT_SYMBOL(vfs_stat); >> >> +/** >> + * vfs_stat - Get basic attributes by filename, without following terminal symlink >> + * @filename: The name of the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only, terminal symlinks are note followed regardless >> + * and a remote filesystem can't be forced to query the server. If such is >> + * desired, vfs_xstat() should be used instead. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> int vfs_lstat(const char __user *name, struct kstat *stat) >> { >> - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW); >> + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat); >> } >> EXPORT_SYMBOL(vfs_lstat); >> >> @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta >> { >> static int warncount = 5; >> struct __old_kernel_stat tmp; >> - >> + >> if (warncount > 0) { >> warncount--; >> printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n", >> @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta >> #if BITS_PER_LONG == 32 >> if (stat->size > MAX_NON_LFS) >> return -EOVERFLOW; >> -#endif >> +#endif >> tmp.st_size = stat->size; >> tmp.st_atime = stat->atime.tv_sec; >> tmp.st_mtime = stat->mtime.tv_sec; >> @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) >> #if BITS_PER_LONG == 32 >> if (stat->size > MAX_NON_LFS) >> return -EOVERFLOW; >> -#endif >> +#endif >> tmp.st_size = stat->size; >> tmp.st_atime = stat->atime.tv_sec; >> tmp.st_mtime = stat->mtime.tv_sec; >> @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename, >> } >> #endif /* __ARCH_WANT_STAT64 */ >> >> +/* >> + * Get the xstat parameters if supplied >> + */ >> +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer, >> + struct kstat *stat) >> +{ >> + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING >> + >> + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer))) >> + return -EFAULT; >> + >> + stat->request_mask = mask & XSTAT_ALL_STATS; >> + stat->result_mask = 0; >> + return 0; >> +} >> + >> +/* >> + * Set the xstat results. >> + * >> + * If the buffer size was 0, we just return the size of the buffer needed to >> + * return the full result. >> + * >> + * If bufsize indicates a buffer of insufficient size to hold the full result, >> + * we return -E2BIG. 
>> + * >> + * Otherwise we copy the extended stats to userspace and return the amount of >> + * data written into the buffer (or -EFAULT). >> + */ >> +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer) >> +{ >> + u32 mask = stat->result_mask, gran = stat->tv_granularity; >> + >> +#define __put_timestamp(kts, uts) ( \ >> + __put_user(kts.tv_sec, uts.tv_sec ) || \ >> + __put_user(kts.tv_nsec, uts.tv_nsec ) || \ >> + __put_user(gran, uts.tv_granularity )) >> + >> + /* clear out anything we're not returning */ >> + if (!(mask & XSTAT_IOC_FLAGS)) >> + stat->ioc_flags = 0; >> + if (!(mask & XSTAT_BTIME)) >> + memset(&stat->btime, 0, sizeof(stat->btime)); >> + if (!(mask & XSTAT_GEN)) >> + stat->gen = 0; >> + if (!(mask & XSTAT_VERSION)) >> + stat->version = 0; >> + if (!(mask & XSTAT_VOLUME_ID)) >> + memset(&stat->volume_id, 0, sizeof(stat->volume_id)); >> + >> + /* transfer the results */ >> + if (__put_user(mask, &buffer->st_mask ) || >> + __put_user(stat->mode, &buffer->st_mode ) || >> + __put_user(stat->nlink, &buffer->st_nlink ) || >> + __put_user(stat->uid, &buffer->st_uid ) || >> + __put_user(stat->gid, &buffer->st_gid ) || >> + __put_user(stat->information, &buffer->st_information ) || >> + __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) || >> + __put_user(stat->blksize, &buffer->st_blksize ) || >> + __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) || >> + __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) || >> + __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) || >> + __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) || >> + __put_timestamp(stat->atime, &buffer->st_atime ) || >> + __put_timestamp(stat->btime, &buffer->st_btime ) || >> + __put_timestamp(stat->ctime, &buffer->st_ctime ) || >> + __put_timestamp(stat->mtime, &buffer->st_mtime ) || >> + __put_user(stat->ino, &buffer->st_ino ) || >> + __put_user(stat->size, &buffer->st_size ) || >> + __put_user(stat->blocks, &buffer->st_blocks ) || >> + 
__put_user(stat->gen, &buffer->st_gen ) || >> + __put_user(stat->version, &buffer->st_version ) || >> + __copy_to_user(&buffer->st_volume_id, &stat->volume_id, >> + sizeof(buffer->st_volume_id) ) || >> + __clear_user(&buffer->__spares, sizeof(buffer->__spares))) >> + return -EFAULT; >> + return 0; >> +} >> + >> +/* >> + * System call to get extended stats by path >> + */ >> +SYSCALL_DEFINE5(xstat, >> + int, dfd, const char __user *, filename, unsigned, flags, >> + unsigned int, mask, struct xstat __user *, buffer) >> +{ >> + struct kstat stat; >> + int error; >> + >> + error = xstat_get_params(mask, buffer, &stat); >> + if (error != 0) >> + return error; >> + error = vfs_xstat(dfd, filename, flags, &stat); >> + if (error) >> + return error; >> + return xstat_set_result(&stat, buffer); >> +} >> + >> +/* >> + * System call to get extended stats by file descriptor >> + */ >> +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags, >> + unsigned int, mask, struct xstat __user *, buffer) >> +{ >> + struct kstat stat; >> + int error; >> + >> + error = xstat_get_params(mask, buffer, &stat); >> + if (error < 0) >> + return error; >> + stat.query_flags = flags; >> + error = vfs_fxstat(fd, &stat); >> + if (error) >> + return error; >> + return xstat_set_result(&stat, buffer); >> +} >> + >> /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ >> void __inode_add_bytes(struct inode *inode, loff_t bytes) >> { >> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h >> index f550f89..faa9e5d 100644 >> --- a/include/linux/fcntl.h >> +++ b/include/linux/fcntl.h >> @@ -47,6 +47,7 @@ >> #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. 
*/ >> #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ >> #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ >> +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */ >> >> #ifdef __KERNEL__ >> >> diff --git a/include/linux/fs.h b/include/linux/fs.h >> index 8de6755..ec6c62e 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -1467,6 +1467,7 @@ struct super_block { >> >> char s_id[32]; /* Informational name */ >> u8 s_uuid[16]; /* UUID */ >> + unsigned char s_volume_id[16]; /* Volume identifier */ >> >> void *s_fs_info; /* Filesystem private info */ >> unsigned int s_max_links; >> @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations; >> extern int generic_readlink(struct dentry *, char __user *, int); >> extern void generic_fillattr(struct inode *, struct kstat *); >> extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); >> +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *); >> void __inode_add_bytes(struct inode *inode, loff_t bytes); >> void inode_add_bytes(struct inode *inode, loff_t bytes); >> void inode_sub_bytes(struct inode *inode, loff_t bytes); >> @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *); >> extern int vfs_lstat(const char __user *, struct kstat *); >> extern int vfs_fstat(unsigned int, struct kstat *); >> extern int vfs_fstatat(int , const char __user *, struct kstat *, int); >> +extern int vfs_xstat(int, const char __user *, int, struct kstat *); >> +extern int vfs_xfstat(unsigned int, struct kstat *); >> >> extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, >> unsigned long arg); >> diff --git a/include/linux/stat.h b/include/linux/stat.h >> index 611c398..0ff561a 100644 >> --- a/include/linux/stat.h >> +++ b/include/linux/stat.h >> @@ -3,6 +3,7 @@ >> >> #ifdef __KERNEL__ >> >> +#include <linux/types.h> >> #include 
<asm/stat.h> >> >> #endif >> @@ -46,6 +47,117 @@ >> >> #endif >> >> +/* >> + * Query request/result mask >> + * >> + * Bits should be set in request_mask to request particular items when calling >> + * xstat() or fxstat(). >> + * >> + * The bits in st_mask may or may not be set upon return, in part depending on >> + * what was set in the mask argument: >> + * >> + * - if not available at all, the bit will be cleared before returning and the >> + * field will be cleared; otherwise, >> + * >> + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the >> + * server and the field and bit will be set on return; otherwise, >> + * >> + * - if explicitly requested, the datum will be synchronised to a server or >> + * other medium if out of date before being returned, and the bit will be set >> + * on return; otherwise, >> + * >> + * - if not requested, but available in approximate form without any effort, it >> + * will be filled in anyway, and the bit will be set upon return (it might >> + * not be up to date, however, and no attempt will be made to synchronise the >> + * internal state first); otherwise, >> + * >> + * - the field and the bit will be cleared before returning. >> + * >> + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they >> + * will have a value installed for compatibility purposes so that stat() and >> + * co. can be emulated in userspace. 
>> + */ >> +#define XSTAT_MODE 0x00000001U /* want/got st_mode */ >> +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */ >> +#define XSTAT_UID 0x00000004U /* want/got st_uid */ >> +#define XSTAT_GID 0x00000008U /* want/got st_gid */ >> +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */ >> +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */ >> +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */ >> +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */ >> +#define XSTAT_INO 0x00000100U /* want/got st_ino */ >> +#define XSTAT_SIZE 0x00000200U /* want/got st_size */ >> +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */ >> +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */ >> +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */ >> +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */ >> +#define XSTAT_GEN 0x00002000U /* want/got st_gen */ >> +#define XSTAT_VERSION 0x00004000U /* want/got st_version */ >> +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */ >> +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */ >> + >> +/* >> + * Extended stat structures >> + */ >> +struct xstat_dev { >> + uint32_t major, minor; >> +}; >> + >> +struct xstat_time { >> + int64_t tv_sec; >> + uint32_t tv_nsec; >> + uint32_t tv_granularity; /* time granularity (in nS) */ >> +}; >> + >> +struct xstat { >> + uint32_t st_mask; /* what results were written */ >> + uint32_t st_mode; /* file mode */ >> + uint32_t st_nlink; /* number of hard links */ >> + uint32_t st_uid; /* user ID of owner */ >> + uint32_t st_gid; /* group ID of owner */ >> + uint32_t st_information; /* information about the file */ >> + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */ >> + uint32_t st_blksize; /* optimal size for filesystem I/O */ >> + struct xstat_dev st_rdev; /* device ID of special file */ >> + struct xstat_dev st_dev; /* ID of device containing file */ >> + struct xstat_time st_atime; /* last access time */ >> 
+ struct xstat_time st_btime; /* file creation time */ >> + struct xstat_time st_ctime; /* last attribute change time */ >> + struct xstat_time st_mtime; /* last data modification time */ >> + uint64_t st_ino; /* inode number */ >> + uint64_t st_size; /* file size */ >> + uint64_t st_blocks; /* number of 512-byte blocks allocated */ >> + uint64_t st_gen; /* inode generation number */ >> + uint64_t st_version; /* data version number */ >> + uint8_t st_volume_id[16]; /* volume identifier */ >> + uint64_t __spares[11]; /* spare space for future expansion */ >> +}; >> + >> +/* >> + * Flags to be found in st_information >> + * >> + * These give information about the features or the state of a file that might >> + * be of use to ordinary userspace programs such as GUIs or ls rather than >> + * specialised tools. >> + * >> + * Additional information may be found in st_ioc_flags and we try not to >> + * overlap with it. >> + */ >> +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */ >> +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */ >> +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */ >> +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */ >> +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */ >> +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */ >> +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */ >> +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */ >> +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */ >> +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */ >> +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */ >> +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */ >> +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */ >> +#define 
XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */ >> + >> #ifdef __KERNEL__ >> #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO) >> #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO) >> @@ -60,6 +172,12 @@ >> #include <linux/time.h> >> >> struct kstat { >> + u32 query_flags; /* operational flags */ >> +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC) >> + u32 request_mask; /* what fields the user asked for */ >> + u32 result_mask; /* what fields the user got */ >> + u32 information; >> + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */ >> u64 ino; >> dev_t dev; >> umode_t mode; >> @@ -67,14 +185,18 @@ struct kstat { >> uid_t uid; >> gid_t gid; >> dev_t rdev; >> + unsigned int tv_granularity; /* granularity of times (in nS) */ >> loff_t size; >> - struct timespec atime; >> + struct timespec atime; >> struct timespec mtime; >> struct timespec ctime; >> + struct timespec btime; /* file creation time */ >> unsigned long blksize; >> unsigned long long blocks; >> + u64 gen; /* inode generation */ >> + u64 version; /* data version */ >> + unsigned char volume_id[16]; /* volume identifier */ >> }; >> >> #endif >> - >> #endif >> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h >> index 3de3acb..ff9f8d9 100644 >> --- a/include/linux/syscalls.h >> +++ b/include/linux/syscalls.h >> @@ -45,6 +45,8 @@ struct shmid_ds; >> struct sockaddr; >> struct stat; >> struct stat64; >> +struct xstat_parameters; >> +struct xstat; >> struct statfs; >> struct statfs64; >> struct __sysctl_args; >> @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid, >> unsigned long riovcnt, >> unsigned long flags); >> >> +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags, >> + unsigned mask, struct xstat __user *buffer); >> +asmlinkage long sys_fxstat(unsigned fd, unsigned flags, >> + unsigned mask, struct xstat __user *buffer); >> + >> #endif >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of 
a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: Andreas Dilger @ 2012-04-25 14:44 UTC
To: J. Bruce Fields
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 2012-04-24, at 4:29 PM, J. Bruce Fields wrote:
> On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote:
>> (11) Include granularity fields in the time data to indicate the
>>      granularity of each of the times (NFSv4 time_delta) [Steve French].
>
> It looks like you're including this with *each* time?  But surely
> there's no filesystem with different granularity (say) for ctime than
> for mtime.  Also, nfsd will want only one time_delta, not one for each
> time.

I suspect the main reason for having a separate time_delta per timestamp
is to use the extra 32-bit field in the timestamp structs.  Since those
structs have a 64-bit + 32-bit field, it would be messy to pack them, and
leaving the spare bytes unused and adding an additional field for the
granularity would just increase the struct size.

> Note also we need to document carefully what this means: I think it
> should be the granularity that the filesystem is capable of
> representing, but people are sometimes surprised to find out that the
> actual time source is usually more coarse-grained than that.
>
> --b.
>
>>
>> (12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>>
>> (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>>      Michael Kerrisk].
>>
>> (14) Spare space, request flags and information flags are provided for future
>>      expansion.
>>
>>
>> The following structures are defined for the use of these new system calls:
>>
>> 	struct xstat_dev {
>> 		uint32_t	major, minor;
>> 	};
>>
>> 	struct xstat_time {
>> 		int64_t		tv_sec;
>> 		uint32_t	tv_nsec;
>> 		uint32_t	tv_granularity;
>> 	};
>>
>> 	struct xstat {
>> 		uint32_t		st_mask;
>> 		uint32_t		st_mode;
>> 		uint32_t		st_nlink;
>> 		uint32_t		st_uid;
>> 		uint32_t		st_gid;
>> 		uint32_t		st_information;
>> 		uint32_t		st_ioc_flags;
>> 		uint32_t		st_blksize;
>> 		struct xstat_dev	st_rdev;
>> 		struct xstat_dev	st_dev;
>> 		struct xstat_time	st_atime;
>> 		struct xstat_time	st_btime;
>> 		struct xstat_time	st_ctime;
>> 		struct xstat_time	st_mtime;
>> 		uint64_t		st_ino;
>> 		uint64_t		st_size;
>> 		uint64_t		st_blocks;
>> 		uint64_t		st_gen;
>> 		uint64_t		st_version;
>> 		uint8_t			st_volume_id[16];
>> 		uint64_t		__spares[11];
>> 	};
>>
>> where st_information is local system information about the file, st_btime is
>> the file creation time, st_gen is the inode generation (i_generation),
>> st_version is the data version number (i_version), st_ioc_flags is the
>> flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identifier is
>> stored, st_mask is a bitmask indicating the data provided and __spares[]
>> are where as-yet undefined fields can be placed.
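
As a quick sanity check on the layout quoted above, here is a minimal sketch (not part of the patch) that restates the structures and asserts their size and a couple of offsets at compile time.  The helper name xstat_layout_size() is made up for illustration, and the 256-byte total and the offsets are arithmetic done from the field list, assuming tv_sec is 64 bits wide as in the include/linux/stat.h hunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Structures restated from the patch description. */
struct xstat_dev {
	uint32_t major, minor;
};

struct xstat_time {
	int64_t  tv_sec;
	uint32_t tv_nsec;
	uint32_t tv_granularity;	/* fills what would otherwise be padding */
};

struct xstat {
	uint32_t st_mask;
	uint32_t st_mode;
	uint32_t st_nlink;
	uint32_t st_uid;
	uint32_t st_gid;
	uint32_t st_information;
	uint32_t st_ioc_flags;
	uint32_t st_blksize;
	struct xstat_dev  st_rdev;
	struct xstat_dev  st_dev;
	struct xstat_time st_atime;
	struct xstat_time st_btime;
	struct xstat_time st_ctime;
	struct xstat_time st_mtime;
	uint64_t st_ino;
	uint64_t st_size;
	uint64_t st_blocks;
	uint64_t st_gen;
	uint64_t st_version;
	uint8_t  st_volume_id[16];
	uint64_t __spares[11];
};

/* Every member sits on a natural boundary, so no implicit padding is
 * expected and the size should not vary between 32-bit and 64-bit ABIs. */
_Static_assert(sizeof(struct xstat_dev) == 8, "xstat_dev is two u32s");
_Static_assert(sizeof(struct xstat_time) == 16, "xstat_time has no padding");
_Static_assert(offsetof(struct xstat, st_atime) == 48, "timestamps start at 48");
_Static_assert(sizeof(struct xstat) == 256, "xstat is 256 bytes");

/* Hypothetical helper: report the buffer size a caller must provide. */
size_t xstat_layout_size(void)
{
	return sizeof(struct xstat);
}
```

Dropping tv_granularity from struct xstat_time would leave sizeof() at 16 on ABIs that align 64-bit integers to 8 bytes, which is the packing argument made in the reply above.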
>>
>> The defined bits in request_mask and st_mask are:
>>
>> 	XSTAT_MODE		Want/got st_mode
>> 	XSTAT_NLINK		Want/got st_nlink
>> 	XSTAT_UID		Want/got st_uid
>> 	XSTAT_GID		Want/got st_gid
>> 	XSTAT_RDEV		Want/got st_rdev
>> 	XSTAT_ATIME		Want/got st_atime
>> 	XSTAT_MTIME		Want/got st_mtime
>> 	XSTAT_CTIME		Want/got st_ctime
>> 	XSTAT_INO		Want/got st_ino
>> 	XSTAT_SIZE		Want/got st_size
>> 	XSTAT_BLOCKS		Want/got st_blocks
>> 	XSTAT_BASIC_STATS	[The stuff in the normal stat struct]
>> 	XSTAT_IOC_FLAGS		Want/got FS_IOC_GETFLAGS
>> 	XSTAT_BTIME		Want/got st_btime
>> 	XSTAT_GEN		Want/got st_gen
>> 	XSTAT_VERSION		Want/got st_version
>> 	XSTAT_VOLUME_ID		Want/got st_volume_id
>> 	XSTAT_ALL_STATS		[All currently available stuff]
>>
>> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
>> that might be supplied by the filesystem.  Note that Ext4 returns flags outside
>> of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
>> {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
>> be suppressed?
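
To make the mask bits listed above easier to read in test output, a small decoder along these lines might help.  This is only a sketch: the bit values are copied from the include/linux/stat.h hunk of this patch, while the table, the short names and the function xstat_mask_names() are invented here for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as defined in the include/linux/stat.h hunk of this patch. */
#define XSTAT_MODE		0x00000001U
#define XSTAT_NLINK		0x00000002U
#define XSTAT_UID		0x00000004U
#define XSTAT_GID		0x00000008U
#define XSTAT_RDEV		0x00000010U
#define XSTAT_ATIME		0x00000020U
#define XSTAT_MTIME		0x00000040U
#define XSTAT_CTIME		0x00000080U
#define XSTAT_INO		0x00000100U
#define XSTAT_SIZE		0x00000200U
#define XSTAT_BLOCKS		0x00000400U
#define XSTAT_BASIC_STATS	0x000007ffU
#define XSTAT_IOC_FLAGS		0x00000800U
#define XSTAT_BTIME		0x00001000U
#define XSTAT_GEN		0x00002000U
#define XSTAT_VERSION		0x00004000U
#define XSTAT_VOLUME_ID		0x00008000U

/* Hypothetical name table; the short names are not defined by the patch. */
static const struct {
	uint32_t bit;
	const char *name;
} xstat_bit_names[] = {
	{ XSTAT_MODE, "mode" },		{ XSTAT_NLINK, "nlink" },
	{ XSTAT_UID, "uid" },		{ XSTAT_GID, "gid" },
	{ XSTAT_RDEV, "rdev" },		{ XSTAT_ATIME, "atime" },
	{ XSTAT_MTIME, "mtime" },	{ XSTAT_CTIME, "ctime" },
	{ XSTAT_INO, "ino" },		{ XSTAT_SIZE, "size" },
	{ XSTAT_BLOCKS, "blocks" },	{ XSTAT_IOC_FLAGS, "ioc_flags" },
	{ XSTAT_BTIME, "btime" },	{ XSTAT_GEN, "gen" },
	{ XSTAT_VERSION, "version" },	{ XSTAT_VOLUME_ID, "volume_id" },
};

/*
 * Write a comma-separated list of the names of the bits set in mask into
 * buf and return how many names were written.  Output stops early, at a
 * name boundary, if buf is too small.
 */
int xstat_mask_names(uint32_t mask, char *buf, size_t len)
{
	size_t used = 0, i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(xstat_bit_names) / sizeof(xstat_bit_names[0]); i++) {
		int n;

		if (!(mask & xstat_bit_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? "," : "", xstat_bit_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Passing requested & ~st_mask to the same function would list what the filesystem declined to provide.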
>>
>> The defined bits in the st_information field give local system data on a file,
>> how it is accessed, where it is and what it does:
>>
>> 	XSTAT_INFO_ENCRYPTED		File is encrypted
>> 	XSTAT_INFO_TEMPORARY		File is temporary (NTFS/CIFS/deleted)
>> 	XSTAT_INFO_FABRICATED		File was made up by filesystem
>> 	XSTAT_INFO_KERNEL_API		File is kernel API (eg: procfs/sysfs)
>> 	XSTAT_INFO_REMOTE		File is remote
>> 	XSTAT_INFO_OFFLINE		File is offline (CIFS)
>> 	XSTAT_INFO_AUTOMOUNT		Dir is automount trigger
>> 	XSTAT_INFO_AUTODIR		Dir provides unlisted automounts
>> 	XSTAT_INFO_NONSYSTEM_OWNERSHIP	File has non-system ownership details
>> 	XSTAT_INFO_HAS_ACL		File has an ACL of some sort
>> 	XSTAT_INFO_REPARSE_POINT	File is reparse point (NTFS/CIFS)
>> 	XSTAT_INFO_HIDDEN		File is marked hidden (DOS+)
>> 	XSTAT_INFO_SYSTEM		File is marked system (DOS+)
>> 	XSTAT_INFO_ARCHIVE		File is marked archive (DOS+)
>>
>> These are for the use of GUI tools that might want to mark files specially,
>> depending on what they are.  I've tried not to provide overlap with
>> st_ioc_flags where something usable exists there.  Should Hidden, System and
>> Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
>> 64-bits?
>>
>>
>> The system calls are:
>>
>> 	ssize_t ret = xstat(int dfd,
>> 			    const char *filename,
>> 			    unsigned int flags,
>> 			    unsigned int mask,
>> 			    struct xstat *buffer);
>>
>> 	ssize_t ret = fxstat(unsigned fd,
>> 			     unsigned int flags,
>> 			     unsigned int mask,
>> 			     struct xstat *buffer);
>>
>>
>> The dfd, filename, flags and fd parameters indicate the file to query.  There
>> is no equivalent of lstat() as that can be emulated with xstat() by passing
>> AT_SYMLINK_NOFOLLOW in flags.
>>
>> AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
>> filesystem to synchronise its attributes with the server.
>>
>> mask is a bitmask indicating the fields in struct xstat that are of interest to
>> the caller.
The user should set this to XSTAT_BASIC_STATS to get the
>> basic set returned by stat().
>>
>> Should there just be one xstat() syscall that does fxstat() if filename is NULL?
>>
>> The fields in struct xstat come in a number of classes:
>>
>>  (0) st_dev, st_blksize, st_information.
>>
>>      These are local data and are always available.
>>
>>  (1) st_mode, st_nlink, st_uid, st_gid, st_[amc]time, st_ino, st_size,
>>      st_blocks.
>>
>>      These will be returned whether the caller asks for them or not.  The
>>      corresponding bits in st_mask will be set to indicate their presence.
>>
>>      If the caller didn't ask for them, then they may be approximated.  For
>>      example, NFS won't waste any time updating them from the server, unless as
>>      a byproduct of updating something requested.
>>
>>      If the values don't actually exist for the underlying object (such as UID
>>      or GID on a DOS file), then the bit won't be set in st_mask, even
>>      if the caller asked for the value, and the returned value will be a
>>      fabrication.
>>
>>  (2) st_rdev.
>>
>>      As for class (1), but this won't be returned if the file is not a blockdev
>>      or chardev.  The bit will be cleared if the value is not returned.
>>
>>  (3) File creation time (st_btime), inode generation (st_gen), data version
>>      (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).
>>
>>      These will be returned if available whether the caller asked for them or
>>      not.  The corresponding bits in st_mask will be set or cleared as
>>      appropriate to indicate their presence.
>>
>>      If the caller didn't ask for them, then they may be approximated.  For
>>      example, NFS won't waste any time updating them from the server, unless
>>      as a byproduct of updating something requested.
>>
>> At the moment, this will only work on x86_64 and i386 as it requires system
>> calls to be wired up.
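
From the caller's side, the class rules above seem to reduce to a three-way decision per datum.  The sketch below is one reading of that text and not code from the patch: enum xstat_datum and xstat_classify() are hypothetical names, and only three of the XSTAT_* values from the include/linux/stat.h hunk are repeated to keep it short.

```c
#include <stdint.h>

/* A few bit values copied from the include/linux/stat.h hunk. */
#define XSTAT_RDEV	0x00000010U
#define XSTAT_SIZE	0x00000200U
#define XSTAT_BTIME	0x00001000U

/* Hypothetical classification of one datum after xstat() returns. */
enum xstat_datum {
	XSTAT_DATUM_ABSENT,	 /* bit clear: no real value; a basic stat may
				  * still hold a fabricated one */
	XSTAT_DATUM_REQUESTED,	 /* asked for and returned: brought up to date
				  * first if it was stale */
	XSTAT_DATUM_UNREQUESTED, /* returned without being asked for: may be
				  * approximate */
};

/*
 * requested is the mask passed to xstat(), st_mask is what came back in
 * the buffer, and bit is the single XSTAT_* bit being examined.
 */
enum xstat_datum xstat_classify(uint32_t requested, uint32_t st_mask,
				uint32_t bit)
{
	if (!(st_mask & bit))
		return XSTAT_DATUM_ABSENT;
	if (requested & bit)
		return XSTAT_DATUM_REQUESTED;
	return XSTAT_DATUM_UNREQUESTED;
}
```

A caller that needs fresh values for everything, including items it did not name in the mask, would still have to pass AT_FORCE_ATTR_SYNC in flags.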
>>
>>
>> =======
>> TESTING
>> =======
>>
>> The following test program can be used to test the xstat system call:
>>
>> 	/* Test the xstat() system call
>> 	 *
>> 	 * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
>> 	 * Written by David Howells (dhowells@redhat.com)
>> 	 *
>> 	 * This program is free software; you can redistribute it and/or
>> 	 * modify it under the terms of the GNU General Public Licence
>> 	 * as published by the Free Software Foundation; either version
>> 	 * 2 of the Licence, or (at your option) any later version.
>> 	 */
>>
>> 	#define _GNU_SOURCE
>> 	#define _ATFILE_SOURCE
>> 	#include <stdio.h>
>> 	#include <stdlib.h>
>> 	#include <stdint.h>
>> 	#include <string.h>
>> 	#include <unistd.h>
>> 	#include <fcntl.h>
>> 	#include <time.h>
>> 	#include <sys/syscall.h>
>> 	#include <sys/stat.h>
>> 	#include <sys/types.h>
>>
>> 	#define AT_NO_AUTOMOUNT		0x800
>> 	#define AT_FORCE_ATTR_SYNC	0x2000
>>
>> 	/* These values must match include/linux/stat.h in this patch */
>> 	#define XSTAT_MODE		0x00000001U
>> 	#define XSTAT_NLINK		0x00000002U
>> 	#define XSTAT_UID		0x00000004U
>> 	#define XSTAT_GID		0x00000008U
>> 	#define XSTAT_RDEV		0x00000010U
>> 	#define XSTAT_ATIME		0x00000020U
>> 	#define XSTAT_MTIME		0x00000040U
>> 	#define XSTAT_CTIME		0x00000080U
>> 	#define XSTAT_INO		0x00000100U
>> 	#define XSTAT_SIZE		0x00000200U
>> 	#define XSTAT_BLOCKS		0x00000400U
>> 	#define XSTAT_BASIC_STATS	0x000007ffU
>> 	#define XSTAT_IOC_FLAGS		0x00000800U
>> 	#define XSTAT_BTIME		0x00001000U
>> 	#define XSTAT_GEN		0x00002000U
>> 	#define XSTAT_VERSION		0x00004000U
>> 	#define XSTAT_VOLUME_ID		0x00008000U
>> 	#define XSTAT_ALL_STATS		0x0000ffffU
>>
>> 	struct xstat_dev {
>> 		uint32_t	major;
>> 		uint32_t	minor;
>> 	};
>>
>> 	struct xstat_time {
>> 		int64_t		tv_sec;
>> 		uint32_t	tv_nsec;
>> 		uint32_t	tv_granularity;
>> 	};
>>
>> 	/* Same layout as struct xstat in include/linux/stat.h */
>> 	struct xstat {
>> 		uint32_t		st_mask;
>> 		uint32_t		st_mode;
>> 		uint32_t		st_nlink;
>> 		uint32_t		st_uid;
>> 		uint32_t		st_gid;
>> 		uint32_t		st_information;
>> 		uint32_t		st_ioc_flags;
>> 		uint32_t		st_blksize;
>> 		struct xstat_dev	st_rdev;
>> 		struct xstat_dev	st_dev;
>> 		struct xstat_time	st_atim;
>> 		struct xstat_time	st_btim;
>> 		struct xstat_time	st_ctim;
>> 		struct xstat_time	st_mtim;
>> 		uint64_t		st_ino;
>> 		uint64_t		st_size;
>> 		uint64_t		st_blocks;
>> 		uint64_t		st_gen;
>> 		uint64_t		st_version;
>> 		uint8_t			st_volume_id[16];
>> 		uint64_t		st_spares[11];
>> 	};
>>
>> 	#define XSTAT_INFO_ENCRYPTED		0x00000001U
>> 	#define XSTAT_INFO_TEMPORARY		0x00000002U
>> 	#define XSTAT_INFO_FABRICATED		0x00000004U
>> 	#define XSTAT_INFO_KERNEL_API		0x00000008U
>> 	#define XSTAT_INFO_REMOTE		0x00000010U
>> 	#define XSTAT_INFO_OFFLINE		0x00000020U
>> 	#define XSTAT_INFO_AUTOMOUNT		0x00000040U
>> 	#define XSTAT_INFO_AUTODIR		0x00000080U
>> 	#define XSTAT_INFO_NONSYSTEM_OWNERSHIP	0x00000100U
>> 	#define XSTAT_INFO_HAS_ACL		0x00000200U
>> 	#define XSTAT_INFO_REPARSE_POINT	0x00000400U
>> 	#define XSTAT_INFO_HIDDEN		0x00000800U
>> 	#define XSTAT_INFO_SYSTEM		0x00001000U
>> 	#define XSTAT_INFO_ARCHIVE		0x00002000U
>>
>> 	/* x86_64 syscall numbers from this patch; i386 uses 349 and 350 */
>> 	#define __NR_xstat	312
>> 	#define __NR_fxstat	313
>>
>> 	static __attribute__((unused))
>> 	ssize_t xstat(int dfd, const char *filename, unsigned flags,
>> 		      unsigned int mask, struct xstat *buffer)
>> 	{
>> 		return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
>> 	}
>>
>> 	static __attribute__((unused))
>> 	ssize_t fxstat(int fd, unsigned flags,
>> 		       unsigned int mask, struct xstat *buffer)
>> 	{
>> 		return syscall(__NR_fxstat, fd, flags, mask, buffer);
>> 	}
>>
>> 	static void print_time(const char *field, const struct xstat_time *xstm)
>> 	{
>> 		struct tm tm;
>> 		time_t tim;
>> 		char buffer[100];
>> 		int len;
>>
>> 		tim = xstm->tv_sec;
>> 		if (!localtime_r(&tim, &tm)) {
>> 			perror("localtime_r");
>> 			exit(1);
>> 		}
>> 		len = strftime(buffer, 100, "%F %T", &tm);
>> 		if (len == 0) {
>> 			perror("strftime");
>> 			exit(1);
>> 		}
>> 		printf("%s", field);
>> 		fwrite(buffer, 1, len, stdout);
>> 		printf(".%09u", xstm->tv_nsec);
>> 		len = strftime(buffer, 100, "%z", &tm);
>> 		if (len == 0) {
>> 			perror("strftime2");
>> 			exit(1);
>> 		}
>> 		fwrite(buffer, 1, len, stdout);
>> 		printf("\n");
>> 	}
>>
>> 	static void
dump_xstat(struct xstat *xst)
>> 	{
>> 		/* ft stays '?' if the file type was not returned */
>> 		char buffer[256], ft = '?';
>>
>> 		printf("results=%x\n", xst->st_mask);
>>
>> 		printf(" ");
>> 		if (xst->st_mask & XSTAT_SIZE)
>> 			printf(" Size: %-15llu", (unsigned long long) xst->st_size);
>> 		if (xst->st_mask & XSTAT_BLOCKS)
>> 			printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
>> 		printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
>> 		if (xst->st_mask & XSTAT_MODE) {
>> 			switch (xst->st_mode & S_IFMT) {
>> 			case S_IFIFO:	printf(" FIFO\n");			ft = 'p'; break;
>> 			case S_IFCHR:	printf(" character special file\n");	ft = 'c'; break;
>> 			case S_IFDIR:	printf(" directory\n");			ft = 'd'; break;
>> 			case S_IFBLK:	printf(" block special file\n");	ft = 'b'; break;
>> 			case S_IFREG:	printf(" regular file\n");		ft = '-'; break;
>> 			case S_IFLNK:	printf(" symbolic link\n");		ft = 'l'; break;
>> 			case S_IFSOCK:	printf(" socket\n");			ft = 's'; break;
>> 			default:
>> 				printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
>> 				ft = '?';
>> 				break;
>> 			}
>> 		}
>>
>> 		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
>> 		printf("Device: %-15s", buffer);
>> 		if (xst->st_mask & XSTAT_INO)
>> 			printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
>> 		if (xst->st_mask & XSTAT_NLINK)
>> 			printf(" Links: %-5u", xst->st_nlink);
>> 		if (xst->st_mask & XSTAT_RDEV)
>> 			printf(" Device type: %u,%u",
>> 			       xst->st_rdev.major, xst->st_rdev.minor);
>> 		printf("\n");
>>
>> 		if (xst->st_mask & XSTAT_MODE)
>> 			printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
>> 			       xst->st_mode & 07777,
>> 			       ft,
>> 			       xst->st_mode & S_IRUSR ? 'r' : '-',
>> 			       xst->st_mode & S_IWUSR ? 'w' : '-',
>> 			       xst->st_mode & S_IXUSR ? 'x' : '-',
>> 			       xst->st_mode & S_IRGRP ? 'r' : '-',
>> 			       xst->st_mode & S_IWGRP ? 'w' : '-',
>> 			       xst->st_mode & S_IXGRP ? 'x' : '-',
>> 			       xst->st_mode & S_IROTH ? 'r' : '-',
>> 			       xst->st_mode & S_IWOTH ? 'w' : '-',
>> 			       xst->st_mode & S_IXOTH ?
'x' : '-');
>> 		if (xst->st_mask & XSTAT_UID)
>> 			printf("Uid: %u \n", xst->st_uid);
>> 		if (xst->st_mask & XSTAT_GID)
>> 			printf("Gid: %u\n", xst->st_gid);
>>
>> 		if (xst->st_mask & XSTAT_ATIME)
>> 			print_time("Access: ", &xst->st_atim);
>> 		if (xst->st_mask & XSTAT_MTIME)
>> 			print_time("Modify: ", &xst->st_mtim);
>> 		if (xst->st_mask & XSTAT_CTIME)
>> 			print_time("Change: ", &xst->st_ctim);
>> 		if (xst->st_mask & XSTAT_BTIME)
>> 			print_time("Create: ", &xst->st_btim);
>>
>> 		if (xst->st_mask & XSTAT_GEN)
>> 			printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen);
>> 		if (xst->st_mask & XSTAT_VERSION)
>> 			printf("Data version: %llxh\n", (unsigned long long) xst->st_version);
>>
>> 		if (xst->st_mask & XSTAT_IOC_FLAGS) {
>> 			unsigned char bits;
>> 			int loop, byte;
>>
>> 			static char flag_representation[32 + 1] =
>> 				/* FS_IOC_GETFLAGS flags: */
>> 				"????????"	/* 31-24	0x00000000-ff000000 */
>> 				"????ehTD"	/* 23-16	0x00000000-00ff0000 */
>> 				"tj?IE?XZ"	/* 15- 8	0x00000000-0000ff00 */
>> 				"AdaiScus"	/*  7- 0	0x00000000-000000ff */
>> 				;
>>
>> 			printf("Inode flags: %08x (", xst->st_ioc_flags);
>> 			for (byte = 32 - 8; byte >= 0; byte -= 8) {
>> 				bits = xst->st_ioc_flags >> byte;
>> 				for (loop = 7; loop >= 0; loop--) {
>> 					int bit = byte + loop;
>>
>> 					if (bits & 0x80)
>> 						putchar(flag_representation[31 - bit]);
>> 					else
>> 						putchar('-');
>> 					bits <<= 1;
>> 				}
>> 				if (byte)
>> 					putchar(' ');
>> 			}
>> 			printf(")\n");
>> 		}
>>
>> 		if (xst->st_information) {
>> 			unsigned char bits;
>> 			int loop, byte;
>>
>> 			static char info_representation[32 + 1] =
>> 				/* XSTAT_INFO_ flags: */
>> 				"????????"	/* 31-24	0x00000000-ff000000 */
>> 				"????????"
/* 23-16 0x00000000-00ff0000 */ >> "??ASHRan" /* 15- 8 0x00000000-0000ff00 */ >> "dmorkfte" /* 7- 0 0x00000000-000000ff */ >> ; >> >> printf("Information: %08x (", xst->st_information); >> for (byte = 32 - 8; byte >= 0; byte -= 8) { >> bits = xst->st_information >> byte; >> for (loop = 7; loop >= 0; loop--) { >> int bit = byte + loop; >> >> if (bits & 0x80) >> putchar(info_representation[31 - bit]); >> else >> putchar('-'); >> bits <<= 1; >> } >> if (byte) >> putchar(' '); >> } >> printf(")\n"); >> } >> >> if (xst->st_mask & XSTAT_VOLUME_ID) { >> int loop; >> printf("Volume ID: "); >> for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) { >> printf("%02x", xst->st_volume_id[loop]); >> if (loop == 7) >> printf("-"); >> } >> printf("\n"); >> } >> } >> >> void dump_hex(unsigned long long *data, int from, int to) >> { >> unsigned offset, print_offset = 1, col = 0; >> >> from /= 8; >> to = (to + 7) / 8; >> >> for (offset = from; offset < to; offset++) { >> if (print_offset) { >> printf("%04x: ", offset * 8); >> print_offset = 0; >> } >> printf("%016llx", data[offset]); >> col++; >> if ((col & 3) == 0) { >> printf("\n"); >> print_offset = 1; >> } else { >> printf(" "); >> } >> } >> >> if (!print_offset) >> printf("\n"); >> } >> >> int main(int argc, char **argv) >> { >> struct xstat xst; >> int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; >> >> unsigned int mask = XSTAT_ALL_STATS; >> >> for (argv++; *argv; argv++) { >> if (strcmp(*argv, "-F") == 0) { >> atflag |= AT_FORCE_ATTR_SYNC; >> continue; >> } >> if (strcmp(*argv, "-L") == 0) { >> atflag &= ~AT_SYMLINK_NOFOLLOW; >> continue; >> } >> if (strcmp(*argv, "-O") == 0) { >> mask &= ~XSTAT_BASIC_STATS; >> continue; >> } >> if (strcmp(*argv, "-A") == 0) { >> atflag |= AT_NO_AUTOMOUNT; >> continue; >> } >> if (strcmp(*argv, "-R") == 0) { >> raw = 1; >> continue; >> } >> >> memset(&xst, 0xbf, sizeof(xst)); >> ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst); >> printf("xstat(%s) = %d\n", *argv, ret); >> if (ret < 0) { >> 
perror(*argv); >> exit(1); >> } >> >> if (raw) >> dump_hex((unsigned long long *)&xst, 0, sizeof(xst)); >> >> dump_xstat(&xst); >> } >> return 0; >> } >> >> Just compile and run, passing it paths to the files you want to examine: >> >> [root@andromeda ~]# /tmp/xstat /proc/$$ >> xstat(/proc/2074) = 160 >> results=47ef >> Size: 0 Blocks: 0 IO Block: 1024 directory >> Device: 00:03 Inode: 9072 Links: 7 >> Access: (0555/dr-xr-xr-x) Uid: 0 >> Gid: 0 >> Access: 2010-07-14 16:50:46.609336272+0100 >> Modify: 2010-07-14 16:50:46.609336272+0100 >> Change: 2010-07-14 16:50:46.609336272+0100 >> Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------) >> [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm >> xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160 >> results=77ef >> Size: 5413882 Blocks: 0 IO Block: 4096 regular file >> Device: 00:15 Inode: 2288 Links: 1 >> Access: (0644/-rw-r--r--) Uid: 75338 >> Gid: 0 >> Access: 2008-11-05 19:47:22.000000000+0000 >> Modify: 2008-11-05 19:47:22.000000000+0000 >> Change: 2008-11-05 19:47:22.000000000+0000 >> Inode version: 795h >> Data version: 2h >> Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------) >> >> Signed-off-by: David Howells <dhowells@redhat.com> >> --- >> >> arch/x86/syscalls/syscall_32.tbl | 2 >> arch/x86/syscalls/syscall_64.tbl | 2 >> fs/stat.c | 350 +++++++++++++++++++++++++++++++++++--- >> include/linux/fcntl.h | 1 >> include/linux/fs.h | 4 >> include/linux/stat.h | 126 +++++++++++++- >> include/linux/syscalls.h | 7 + >> 7 files changed, 461 insertions(+), 31 deletions(-) >> >> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl >> index 29f9f05..980eb5a 100644 >> --- a/arch/x86/syscalls/syscall_32.tbl >> +++ b/arch/x86/syscalls/syscall_32.tbl >> @@ -355,3 +355,5 @@ >> 346 i386 setns sys_setns >> 
347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv >> 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev >> +349 i386 xstat sys_xstat >> +350 i386 fxstat sys_fxstat >> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl >> index dd29a9e..7ae24bb 100644 >> --- a/arch/x86/syscalls/syscall_64.tbl >> +++ b/arch/x86/syscalls/syscall_64.tbl >> @@ -318,6 +318,8 @@ >> 309 common getcpu sys_getcpu >> 310 64 process_vm_readv sys_process_vm_readv >> 311 64 process_vm_writev sys_process_vm_writev >> +312 common xstat sys_xstat >> +313 common fxstat sys_fxstat >> # >> # x32-specific system call numbers start at 512 to avoid cache impact >> # for native 64-bit operation. >> diff --git a/fs/stat.c b/fs/stat.c >> index c733dc5..af3ef33 100644 >> --- a/fs/stat.c >> +++ b/fs/stat.c >> @@ -18,8 +18,20 @@ >> #include <asm/uaccess.h> >> #include <asm/unistd.h> >> >> +/** >> + * generic_fillattr - Fill in the basic attributes from the inode struct >> + * @inode: Inode to use as the source >> + * @stat: Where to fill in the attributes >> + * >> + * Fill in the basic attributes in the kstat structure from data that's to be >> + * found on the VFS inode structure. This is the default if no getattr inode >> + * operation is supplied. 
>> + */ >> void generic_fillattr(struct inode *inode, struct kstat *stat) >> { >> + struct super_block *sb = inode->i_sb; >> + u32 x; >> + >> stat->dev = inode->i_sb->s_dev; >> stat->ino = inode->i_ino; >> stat->mode = inode->i_mode; >> @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat) >> stat->uid = inode->i_uid; >> stat->gid = inode->i_gid; >> stat->rdev = inode->i_rdev; >> - stat->size = i_size_read(inode); >> - stat->atime = inode->i_atime; >> stat->mtime = inode->i_mtime; >> stat->ctime = inode->i_ctime; >> - stat->blksize = (1 << inode->i_blkbits); >> + stat->size = i_size_read(inode); >> stat->blocks = inode->i_blocks; >> -} >> + stat->blksize = (1 << inode->i_blkbits); >> >> + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV; >> + if (IS_NOATIME(inode)) >> + stat->result_mask &= ~XSTAT_ATIME; >> + else >> + stat->atime = inode->i_atime; >> + >> + if (S_ISREG(stat->mode) && stat->nlink == 0) >> + stat->information |= XSTAT_INFO_TEMPORARY; >> + if (IS_AUTOMOUNT(inode)) >> + stat->information |= XSTAT_INFO_AUTOMOUNT; >> + if (IS_POSIXACL(inode)) >> + stat->information |= XSTAT_INFO_HAS_ACL; >> + >> + /* if unset, assume 1s granularity */ >> + stat->tv_granularity = sb->s_time_gran ?: 1000000000U; >> + >> + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode))) >> + stat->result_mask |= XSTAT_RDEV; >> + >> + x = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0]; >> + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1]; >> + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2]; >> + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3]; >> + if (x) >> + stat->result_mask |= XSTAT_VOLUME_ID; >> +} >> EXPORT_SYMBOL(generic_fillattr); >> >> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> +/** >> + * vfs_xgetattr - Get the basic and extra attributes of a file >> + * @mnt: The mountpoint to which the dentry belongs >> + * @dentry: The file of interest >> 
+ * @stat: Where to return the statistics >> + * >> + * Ask the filesystem for a file's attributes. The caller must have preset >> + * stat->request_mask and stat->query_flags to indicate what they want. >> + * >> + * If the file is remote, the filesystem can be forced to update the attributes >> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags. >> + * >> + * Bits must have been set in stat->request_mask to indicate which attributes >> + * the caller wants retrieving. Any such attribute not requested may be >> + * returned anyway, but the value may be approximate, and, if remote, may not >> + * have been synchronised with the server. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry, >> + struct kstat *stat) >> { >> struct inode *inode = dentry->d_inode; >> int retval; >> @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> if (retval) >> return retval; >> >> + stat->result_mask = 0; >> + stat->information = 0; >> + stat->ioc_flags = 0; >> if (inode->i_op->getattr) >> return inode->i_op->getattr(mnt, dentry, stat); >> >> generic_fillattr(inode, stat); >> return 0; >> } >> +EXPORT_SYMBOL(vfs_xgetattr); >> >> +/** >> + * vfs_getattr - Get the basic attributes of a file >> + * @mnt: The mountpoint to which the dentry belongs >> + * @dentry: The file of interest >> + * @stat: Where to return the statistics >> + * >> + * Ask the filesystem for a file's attributes. If remote, the filesystem isn't >> + * forced to update its files from the backing store. Only the basic set of >> + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(), >> + * as must anyone who wants to force attributes to be sync'd with the server. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */ >> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) >> +{ >> + stat->query_flags = 0; >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xgetattr(mnt, dentry, stat); >> +} >> EXPORT_SYMBOL(vfs_getattr); >> >> -int vfs_fstat(unsigned int fd, struct kstat *stat) >> +/** >> + * vfs_fxstat - Get basic and extra attributes by file descriptor >> + * @fd: The file descriptor referring to the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xgetattr(). The main difference is >> + * that it uses a file descriptor to determine the file location. >> + * >> + * The caller must have preset stat->query_flags and stat->request_mask as for >> + * vfs_xgetattr(). >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_fxstat(unsigned int fd, struct kstat *stat) >> { >> struct file *f = fget(fd); >> int error = -EBADF; >> >> + if (stat->query_flags & ~KSTAT_QUERY_FLAGS) >> + return -EINVAL; >> if (f) { >> - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat); >> + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat); >> fput(f); >> } >> return error; >> } >> +EXPORT_SYMBOL(vfs_fxstat); >> + >> +/** >> + * vfs_fstat - Get basic attributes by file descriptor >> + * @fd: The file descriptor referring to the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_fxstat(). The main difference is >> + * that it preselects basic stats only. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */ >> +int vfs_fstat(unsigned int fd, struct kstat *stat) >> +{ >> + stat->query_flags = 0; >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_fxstat(fd, stat); >> +} >> EXPORT_SYMBOL(vfs_fstat); >> >> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, >> - int flag) >> +/** >> + * vfs_xstat - Get basic and extra attributes by filename >> + * @dfd: A file descriptor representing the base dir for a relative filename >> + * @filename: The name of the file of interest >> + * @flags: Flags to control the query >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xgetattr(). The main difference is >> + * that it uses a filename and base directory to determine the file location. >> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a >> + * symlink at the given name from being referenced. >> + * >> + * The caller must have preset stat->request_mask as for vfs_xgetattr(). The >> + * flags are also used to load up stat->query_flags. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_xstat(int dfd, const char __user *filename, int flags, >> + struct kstat *stat) >> { >> struct path path; >> - int error = -EINVAL; >> - int lookup_flags = 0; >> + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; >> >> - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | >> - AT_EMPTY_PATH)) != 0) >> - goto out; >> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | >> + AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0) >> + return -EINVAL; >> >> - if (!(flag & AT_SYMLINK_NOFOLLOW)) >> - lookup_flags |= LOOKUP_FOLLOW; >> - if (flag & AT_EMPTY_PATH) >> + if (flags & AT_SYMLINK_NOFOLLOW) >> + lookup_flags &= ~LOOKUP_FOLLOW; >> + if (flags & AT_NO_AUTOMOUNT) >> + lookup_flags &= ~LOOKUP_AUTOMOUNT; >> + if (flags & AT_EMPTY_PATH) >> lookup_flags |= LOOKUP_EMPTY; >> >> + stat->query_flags = flags & KSTAT_QUERY_FLAGS; >> error = user_path_at(dfd, filename, lookup_flags, &path); >> - if (error) >> - goto out; >> - >> - error = vfs_getattr(path.mnt, path.dentry, stat); >> - path_put(&path); >> -out: >> + if (!error) { >> + error = vfs_xgetattr(path.mnt, path.dentry, stat); >> + path_put(&path); >> + } >> return error; >> } >> +EXPORT_SYMBOL(vfs_xstat); >> + >> +/** >> + * vfs_fstatat - Get basic attributes by filename >> + * @dfd: A file descriptor representing the base dir for a relative filename >> + * @filename: The name of the file of interest >> + * @flags: Flags to control the query >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only. The flags are used to load up >> + * stat->query_flags in addition to indicating symlink handling during path >> + * resolution. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. 
>> + */ >> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, >> + int flags) >> +{ >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xstat(dfd, filename, flags, stat); >> +} >> EXPORT_SYMBOL(vfs_fstatat); >> >> -int vfs_stat(const char __user *name, struct kstat *stat) >> +/** >> + * vfs_stat - Get basic attributes by filename >> + * @filename: The name of the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only, terminal symlinks are followed regardless and a >> + * remote filesystem can't be forced to query the server. If such is desired, >> + * vfs_xstat() should be used instead. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful. >> + */ >> +int vfs_stat(const char __user *filename, struct kstat *stat) >> { >> - return vfs_fstatat(AT_FDCWD, name, stat, 0); >> + stat->request_mask = XSTAT_BASIC_STATS; >> + return vfs_xstat(AT_FDCWD, filename, 0, stat); >> } >> EXPORT_SYMBOL(vfs_stat); >> >> +/** >> + * vfs_lstat - Get basic attributes by filename, without following terminal symlink >> + * @name: The name of the file of interest >> + * @stat: The result structure to fill in. >> + * >> + * This function is a wrapper around vfs_xstat(). The difference is that it >> + * preselects basic stats only, terminal symlinks are not followed >> + * and a remote filesystem can't be forced to query the server. If such is >> + * desired, vfs_xstat() should be used instead. >> + * >> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */ >> int vfs_lstat(const char __user *name, struct kstat *stat) >> { >> - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW); >> + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat); >> } >> EXPORT_SYMBOL(vfs_lstat); >> >> @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta >> { >> static int warncount = 5; >> struct __old_kernel_stat tmp; >> - >> + >> if (warncount > 0) { >> warncount--; >> printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n", >> @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta >> #if BITS_PER_LONG == 32 >> if (stat->size > MAX_NON_LFS) >> return -EOVERFLOW; >> -#endif >> +#endif >> tmp.st_size = stat->size; >> tmp.st_atime = stat->atime.tv_sec; >> tmp.st_mtime = stat->mtime.tv_sec; >> @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) >> #if BITS_PER_LONG == 32 >> if (stat->size > MAX_NON_LFS) >> return -EOVERFLOW; >> -#endif >> +#endif >> tmp.st_size = stat->size; >> tmp.st_atime = stat->atime.tv_sec; >> tmp.st_mtime = stat->mtime.tv_sec; >> @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename, >> } >> #endif /* __ARCH_WANT_STAT64 */ >> >> +/* >> + * Get the xstat parameters if supplied >> + */ >> +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer, >> + struct kstat *stat) >> +{ >> + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING >> + >> + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer))) >> + return -EFAULT; >> + >> + stat->request_mask = mask & XSTAT_ALL_STATS; >> + stat->result_mask = 0; >> + return 0; >> +} >> + >> +/* >> + * Set the xstat results. >> + * >> + * If the buffer size was 0, we just return the size of the buffer needed to >> + * return the full result. >> + * >> + * If bufsize indicates a buffer of insufficient size to hold the full result, >> + * we return -E2BIG. 
>> + * >> + * Otherwise we copy the extended stats to userspace and return the amount of >> + * data written into the buffer (or -EFAULT). >> + */ >> +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer) >> +{ >> + u32 mask = stat->result_mask, gran = stat->tv_granularity; >> + >> +#define __put_timestamp(kts, uts) ( \ >> + __put_user(kts.tv_sec, uts.tv_sec ) || \ >> + __put_user(kts.tv_nsec, uts.tv_nsec ) || \ >> + __put_user(gran, uts.tv_granularity )) >> + >> + /* clear out anything we're not returning */ >> + if (!(mask & XSTAT_IOC_FLAGS)) >> + stat->ioc_flags = 0; >> + if (!(mask & XSTAT_BTIME)) >> + memset(&stat->btime, 0, sizeof(stat->btime)); >> + if (!(mask & XSTAT_GEN)) >> + stat->gen = 0; >> + if (!(mask & XSTAT_VERSION)) >> + stat->version = 0; >> + if (!(mask & XSTAT_VOLUME_ID)) >> + memset(&stat->volume_id, 0, sizeof(stat->volume_id)); >> + >> + /* transfer the results */ >> + if (__put_user(mask, &buffer->st_mask ) || >> + __put_user(stat->mode, &buffer->st_mode ) || >> + __put_user(stat->nlink, &buffer->st_nlink ) || >> + __put_user(stat->uid, &buffer->st_uid ) || >> + __put_user(stat->gid, &buffer->st_gid ) || >> + __put_user(stat->information, &buffer->st_information ) || >> + __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) || >> + __put_user(stat->blksize, &buffer->st_blksize ) || >> + __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) || >> + __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) || >> + __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) || >> + __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) || >> + __put_timestamp(stat->atime, &buffer->st_atime ) || >> + __put_timestamp(stat->btime, &buffer->st_btime ) || >> + __put_timestamp(stat->ctime, &buffer->st_ctime ) || >> + __put_timestamp(stat->mtime, &buffer->st_mtime ) || >> + __put_user(stat->ino, &buffer->st_ino ) || >> + __put_user(stat->size, &buffer->st_size ) || >> + __put_user(stat->blocks, &buffer->st_blocks ) || >> + 
__put_user(stat->gen, &buffer->st_gen ) || >> + __put_user(stat->version, &buffer->st_version ) || >> + __copy_to_user(&buffer->st_volume_id, &stat->volume_id, >> + sizeof(buffer->st_volume_id) ) || >> + __clear_user(&buffer->__spares, sizeof(buffer->__spares))) >> + return -EFAULT; >> + return 0; >> +} >> + >> +/* >> + * System call to get extended stats by path >> + */ >> +SYSCALL_DEFINE5(xstat, >> + int, dfd, const char __user *, filename, unsigned, flags, >> + unsigned int, mask, struct xstat __user *, buffer) >> +{ >> + struct kstat stat; >> + int error; >> + >> + error = xstat_get_params(mask, buffer, &stat); >> + if (error != 0) >> + return error; >> + error = vfs_xstat(dfd, filename, flags, &stat); >> + if (error) >> + return error; >> + return xstat_set_result(&stat, buffer); >> +} >> + >> +/* >> + * System call to get extended stats by file descriptor >> + */ >> +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags, >> + unsigned int, mask, struct xstat __user *, buffer) >> +{ >> + struct kstat stat; >> + int error; >> + >> + error = xstat_get_params(mask, buffer, &stat); >> + if (error < 0) >> + return error; >> + stat.query_flags = flags; >> + error = vfs_fxstat(fd, &stat); >> + if (error) >> + return error; >> + return xstat_set_result(&stat, buffer); >> +} >> + >> /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ >> void __inode_add_bytes(struct inode *inode, loff_t bytes) >> { >> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h >> index f550f89..faa9e5d 100644 >> --- a/include/linux/fcntl.h >> +++ b/include/linux/fcntl.h >> @@ -47,6 +47,7 @@ >> #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. 
*/ >> #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ >> #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ >> +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */ >> >> #ifdef __KERNEL__ >> >> diff --git a/include/linux/fs.h b/include/linux/fs.h >> index 8de6755..ec6c62e 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -1467,6 +1467,7 @@ struct super_block { >> >> char s_id[32]; /* Informational name */ >> u8 s_uuid[16]; /* UUID */ >> + unsigned char s_volume_id[16]; /* Volume identifier */ >> >> void *s_fs_info; /* Filesystem private info */ >> unsigned int s_max_links; >> @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations; >> extern int generic_readlink(struct dentry *, char __user *, int); >> extern void generic_fillattr(struct inode *, struct kstat *); >> extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); >> +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *); >> void __inode_add_bytes(struct inode *inode, loff_t bytes); >> void inode_add_bytes(struct inode *inode, loff_t bytes); >> void inode_sub_bytes(struct inode *inode, loff_t bytes); >> @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *); >> extern int vfs_lstat(const char __user *, struct kstat *); >> extern int vfs_fstat(unsigned int, struct kstat *); >> extern int vfs_fstatat(int , const char __user *, struct kstat *, int); >> +extern int vfs_xstat(int, const char __user *, int, struct kstat *); >> +extern int vfs_xfstat(unsigned int, struct kstat *); >> >> extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, >> unsigned long arg); >> diff --git a/include/linux/stat.h b/include/linux/stat.h >> index 611c398..0ff561a 100644 >> --- a/include/linux/stat.h >> +++ b/include/linux/stat.h >> @@ -3,6 +3,7 @@ >> >> #ifdef __KERNEL__ >> >> +#include <linux/types.h> >> #include 
<asm/stat.h> >> >> #endif >> @@ -46,6 +47,117 @@ >> >> #endif >> >> +/* >> + * Query request/result mask >> + * >> + * Bits should be set in request_mask to request particular items when calling >> + * xstat() or fxstat(). >> + * >> + * The bits in st_mask may or may not be set upon return, in part depending on >> + * what was set in the mask argument: >> + * >> + * - if not available at all, the bit will be cleared before returning and the >> + * field will be cleared; otherwise, >> + * >> + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the >> + * server and the field and bit will be set on return; otherwise, >> + * >> + * - if explicitly requested, the datum will be synchronised to a server or >> + * other medium if out of date before being returned, and the bit will be set >> + * on return; otherwise, >> + * >> + * - if not requested, but available in approximate form without any effort, it >> + * will be filled in anyway, and the bit will be set upon return (it might >> + * not be up to date, however, and no attempt will be made to synchronise the >> + * internal state first); otherwise, >> + * >> + * - the field and the bit will be cleared before returning. >> + * >> + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they >> + * will have a value installed for compatibility purposes so that stat() and >> + * co. can be emulated in userspace. 
>> + */ >> +#define XSTAT_MODE 0x00000001U /* want/got st_mode */ >> +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */ >> +#define XSTAT_UID 0x00000004U /* want/got st_uid */ >> +#define XSTAT_GID 0x00000008U /* want/got st_gid */ >> +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */ >> +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */ >> +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */ >> +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */ >> +#define XSTAT_INO 0x00000100U /* want/got st_ino */ >> +#define XSTAT_SIZE 0x00000200U /* want/got st_size */ >> +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */ >> +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */ >> +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */ >> +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */ >> +#define XSTAT_GEN 0x00002000U /* want/got st_gen */ >> +#define XSTAT_VERSION 0x00004000U /* want/got st_version */ >> +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */ >> +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */ >> + >> +/* >> + * Extended stat structures >> + */ >> +struct xstat_dev { >> + uint32_t major, minor; >> +}; >> + >> +struct xstat_time { >> + int64_t tv_sec; >> + uint32_t tv_nsec; >> + uint32_t tv_granularity; /* time granularity (in nS) */ >> +}; >> + >> +struct xstat { >> + uint32_t st_mask; /* what results were written */ >> + uint32_t st_mode; /* file mode */ >> + uint32_t st_nlink; /* number of hard links */ >> + uint32_t st_uid; /* user ID of owner */ >> + uint32_t st_gid; /* group ID of owner */ >> + uint32_t st_information; /* information about the file */ >> + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */ >> + uint32_t st_blksize; /* optimal size for filesystem I/O */ >> + struct xstat_dev st_rdev; /* device ID of special file */ >> + struct xstat_dev st_dev; /* ID of device containing file */ >> + struct xstat_time st_atime; /* last access time */ >> 
+ struct xstat_time st_btime; /* file creation time */ >> + struct xstat_time st_ctime; /* last attribute change time */ >> + struct xstat_time st_mtime; /* last data modification time */ >> + uint64_t st_ino; /* inode number */ >> + uint64_t st_size; /* file size */ >> + uint64_t st_blocks; /* number of 512-byte blocks allocated */ >> + uint64_t st_gen; /* inode generation number */ >> + uint64_t st_version; /* data version number */ >> + uint8_t st_volume_id[16]; /* volume identifier */ >> + uint64_t __spares[11]; /* spare space for future expansion */ >> +}; >> + >> +/* >> + * Flags to be found in st_information >> + * >> + * These give information about the features or the state of a file that might >> + * be of use to ordinary userspace programs such as GUIs or ls rather than >> + * specialised tools. >> + * >> + * Additional information may be found in st_ioc_flags and we try not to >> + * overlap with it. >> + */ >> +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */ >> +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */ >> +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */ >> +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */ >> +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */ >> +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */ >> +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */ >> +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */ >> +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */ >> +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */ >> +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */ >> +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */ >> +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */ >> +#define 
XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */ >> + >> #ifdef __KERNEL__ >> #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO) >> #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO) >> @@ -60,6 +172,12 @@ >> #include <linux/time.h> >> >> struct kstat { >> + u32 query_flags; /* operational flags */ >> +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC) >> + u32 request_mask; /* what fields the user asked for */ >> + u32 result_mask; /* what fields the user got */ >> + u32 information; >> + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */ >> u64 ino; >> dev_t dev; >> umode_t mode; >> @@ -67,14 +185,18 @@ struct kstat { >> uid_t uid; >> gid_t gid; >> dev_t rdev; >> + unsigned int tv_granularity; /* granularity of times (in nS) */ >> loff_t size; >> - struct timespec atime; >> + struct timespec atime; >> struct timespec mtime; >> struct timespec ctime; >> + struct timespec btime; /* file creation time */ >> unsigned long blksize; >> unsigned long long blocks; >> + u64 gen; /* inode generation */ >> + u64 version; /* data version */ >> + unsigned char volume_id[16]; /* volume identifier */ >> }; >> >> #endif >> - >> #endif >> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h >> index 3de3acb..ff9f8d9 100644 >> --- a/include/linux/syscalls.h >> +++ b/include/linux/syscalls.h >> @@ -45,6 +45,8 @@ struct shmid_ds; >> struct sockaddr; >> struct stat; >> struct stat64; >> +struct xstat_parameters; >> +struct xstat; >> struct statfs; >> struct statfs64; >> struct __sysctl_args; >> @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid, >> unsigned long riovcnt, >> unsigned long flags); >> >> +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags, >> + unsigned mask, struct xstat __user *buffer); >> +asmlinkage long sys_fxstat(unsigned fd, unsigned flags, >> + unsigned mask, struct xstat __user *buffer); >> + >> #endif >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of 
a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
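The request/result mask rules in the quoted include/linux/stat.h comment imply that a caller should look at st_mask before trusting a field. A minimal sketch of that check follows, assuming only the bit values from the quoted hunk; the helper names xstat_has() and xstat_missing() are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Bit values copied from the quoted include/linux/stat.h hunk. */
#define XSTAT_ATIME        0x00000020U
#define XSTAT_BASIC_STATS  0x000007ffU
#define XSTAT_BTIME        0x00001000U

/*
 * Hypothetical helper (not in the patch): true only if every bit in
 * 'wanted' was reported as valid in the st_mask the kernel returned.
 */
int xstat_has(uint32_t st_mask, uint32_t wanted)
{
	return (st_mask & wanted) == wanted;
}

/*
 * Hypothetical helper (not in the patch): the bits that were requested
 * but that the filesystem did not mark as returned.
 */
uint32_t xstat_missing(uint32_t request_mask, uint32_t st_mask)
{
	return request_mask & ~st_mask;
}
```

For instance, a caller asking for XSTAT_BASIC_STATS | XSTAT_BTIME on a noatime inode of a filesystem without creation times would see both XSTAT_ATIME and XSTAT_BTIME in xstat_missing(), and should treat st_atime as a compatibility filler and st_btime as cleared.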
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: David Howells @ 2012-04-26 13:45 UTC
To: Steve French
Cc: dhowells, J. Bruce Fields, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Steve French <smfrench@gmail.com> wrote:

> I also would prefer that we simply treat the time granularity as part
> of the superblock (mounted volume) ie returned on fstat rather than on
> every stat of the filesystem. For cifs mounts we could conceivably
> have different time granularity (1 or 2 second) on mounts to old
> servers rather than 100 nanoseconds.

The question is whether you want to have to do a statfs in addition to a stat?
I suppose you can potentially cache the statfs based on device number.

That said, there are cases where caching filesystem-level info based on i_dev
doesn't work. OpenAFS springs to mind as that only has one superblock and
thus one set of device numbers, but keeps all the inodes for all the different
volumes it may have mounted there.

I don't know whether this would be a problem for CIFS too - say on a windows
server you fabricate P:, for example, by joining together several filesystems
(with junctions?). How does this appear on a Linux client when it steps from
one filesystem to another within a mounted share?

David
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: J. Bruce Fields @ 2012-04-26 14:28 UTC
To: David Howells
Cc: Steve French, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 02:45:54PM +0100, David Howells wrote:
> Steve French <smfrench@gmail.com> wrote:
>
> > I also would prefer that we simply treat the time granularity as part
> > of the superblock (mounted volume) ie returned on fstat rather than on
> > every stat of the filesystem. For cifs mounts we could conceivably
> > have different time granularity (1 or 2 second) on mounts to old
> > servers rather than 100 nanoseconds.
>
> The question is whether you want to have to do a statfs in addition to a stat?
> I suppose you can potentially cache the statfs based on device number.
>
> That said, there are cases where caching filesystem-level info based on i_dev
> doesn't work. OpenAFS springs to mind as that only has one superblock and
> thus one set of device numbers, but keeps all the inodes for all the different
> volumes it may have mounted there.
>
> I don't know whether this would be a problem for CIFS too - say on a windows
> server you fabricate P:, for example, by joining together several filesystems
> (with junctions?). How does this appear on a Linux client when it steps from
> one filesystem to another within a mounted share?

In the NFS case we do try to preserve filesystem boundaries as well as
we can--the protocol has an fsid field and the client creates a new
mount each time it sees it change. And the protocol defines time_delta
as a per-filesystem attribute (though, somewhat hilariously, there's
also a per-filesystem "homogeneous" attribute that a server can clear to
indicate the per-filesystem attributes might actually vary within the
filesystem.)

--b.
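Whether the granularity arrives per file or per filesystem, a consumer would typically use it the same way: to decide whether two timestamps are distinguishable on that filesystem. A minimal sketch under stated assumptions follows; ts_truncate() and ts_same() are hypothetical helper names, the granularity is taken in nanoseconds as in the proposed tv_granularity field, and times before the epoch are not handled.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000U

/*
 * Round 'ts' down to a multiple of 'gran_ns' nanoseconds.
 * Assumes ts.tv_sec >= 0; granularities of one second or more are
 * treated as a whole number of seconds.
 */
struct timespec ts_truncate(struct timespec ts, uint32_t gran_ns)
{
	if (gran_ns <= 1)
		return ts;	/* nanosecond resolution: nothing to do */

	if (gran_ns < NSEC_PER_SEC) {
		ts.tv_nsec -= ts.tv_nsec % (long)gran_ns;
	} else {
		time_t secs = (time_t)(gran_ns / NSEC_PER_SEC);

		ts.tv_sec -= ts.tv_sec % secs;
		ts.tv_nsec = 0;
	}
	return ts;
}

/* True if the two timestamps cannot be told apart at this granularity. */
int ts_same(struct timespec a, struct timespec b, uint32_t gran_ns)
{
	a = ts_truncate(a, gran_ns);
	b = ts_truncate(b, gran_ns);
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}
```

With a 2-second granularity, as on the old-server mounts mentioned earlier in the thread, two modification times one second apart can compare equal, which is why a tool such as a backup or sync program would want to know the value.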
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
From: Steve French @ 2012-04-26 17:06 UTC
To: J. Bruce Fields
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 9:28 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Thu, Apr 26, 2012 at 02:45:54PM +0100, David Howells wrote:
>> Steve French <smfrench@gmail.com> wrote:
>>
>> > I also would prefer that we simply treat the time granularity as part
>> > of the superblock (mounted volume) ie returned on fstat rather than on
>> > every stat of the filesystem. For cifs mounts we could conceivably
>> > have different time granularity (1 or 2 second) on mounts to old
>> > servers rather than 100 nanoseconds.
>>
>> The question is whether you want to have to do a statfs in addition to a stat?
>> I suppose you can potentially cache the statfs based on device number.
>>
>> That said, there are cases where caching filesystem-level info based on i_dev
>> doesn't work. OpenAFS springs to mind as that only has one superblock and
>> thus one set of device numbers, but keeps all the inodes for all the different
>> volumes it may have mounted there.
>>
>> I don't know whether this would be a problem for CIFS too - say on a windows
>> server you fabricate P:, for example, by joining together several filesystems
>> (with junctions?). How does this appear on a Linux client when it steps from
>> one filesystem to another within a mounted share?
>
> In the NFS case we do try to preserve filesystem boundaries as well as
> we can--the protocol has an fsid field and the client creates a new
> mount each time it sees it change. And the protocol defines time_delta
> as a per-filesystem attribute (though, somewhat hilariously, there's
> also a per-filesystem "homogeneous" attribute that a server can clear to
> indicate the per-filesystem attributes might actually vary within the
> filesystem.)

Thank you for reminding me, I need to look at this case more ...
although cifs creates implicit submounts (as we traverse DFS referrals)
there are probably cases where we need to do the same thing as NFS and
look at the fsid so we don't run into a Windows server exporting
something with a "junction" (e.g. directory redirection to a DVD drive
for example) and thus cross file system volume boundaries.

--
Thanks,

Steve
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-19 14:06 ` David Howells
  ` (2 preceding siblings ...)
  (?)
@ 2012-04-26 13:32 ` David Howells
  [not found] ` <18195.1335447156-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  -1 siblings, 1 reply; 144+ messages in thread
From: David Howells @ 2012-04-26 13:32 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
      linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api,
      libc-alpha

Andreas Dilger <adilger@dilger.ca> wrote:

> > The idea was initially proposed as a set of xattrs that could be
> > retrieved with getxattr(), but the general preference proved to be
> > for new syscalls with an extended stat structure.
>
> I would comment that it was the opposite. It was originally a
> stat()-like extension that degraded into a messy getxattr() mess.

Ummm... No, my first attempt was definitely through getxattr(). You even
commented on it.

> > The fields in struct xstat come in a number of classes:
> >
> > (0) st_dev, st_blksize, st_information.
> >
> >     These are local data and are always available.
>
> For the extra two bits it would cost us, I don't think st_blksize
> and st_information should always be returned.

Fair enough.

> st_blksize may be variable for a distributed filesystem,

I wonder if there's a way to make this explicit - or is it something that if
the bit isn't set, you can't use the value in st_blksize. I wonder if this
value always has to be non-zero to make sure existing stat() doesn't explode.

> and some of the fields in st_information (offline) may not be free to access
> either.

True.

David

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-26 13:32 ` David Howells
@ 2012-04-27 0:51 ` Dave Chinner
  0 siblings, 0 replies; 144+ messages in thread
From: Dave Chinner @ 2012-04-27 0:51 UTC (permalink / raw)
  To: David Howells
  Cc: Andreas Dilger, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 02:32:36PM +0100, David Howells wrote:
> Andreas Dilger <adilger@dilger.ca> wrote:
> > st_blksize may be variable for a distributed filesystem,

It can be variable for local filesystems, too. XFS will vary the
block size based on the configuration of the inode. e.g. if there is
an extent allocation size hint on the inode, or it's on the realtime
device, and so on. There is no guarantee that from file to file that
it is constant.

> I wonder if there's a way to make this explicit - or is it something that if
> the bit isn't set, you can't use the value in st_blksize.
> I wonder if this
> value always has to be non-zero to make sure existing stat() doesn't explode.

More likely it probably needs to be non-zero to prevent applications
doing division by block size from exploding... ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-27 0:51 ` Dave Chinner
@ 2012-04-27 3:11 ` Andreas Dilger
  -1 siblings, 0 replies; 144+ messages in thread
From: Andreas Dilger @ 2012-04-27 3:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs,
      samba-technical, linux-ext4, wine-devel, kfm-devel,
      nautilus-list, linux-api, libc-alpha

On 2012-04-26, at 6:51 PM, Dave Chinner wrote:
> On Thu, Apr 26, 2012 at 02:32:36PM +0100, David Howells wrote:
>> I wonder if there's a way to make this explicit - or is it something that if the bit isn't set, you can't use the value in st_blksize.
>> I wonder if this value always has to be non-zero to make sure existing
>> stat() doesn't explode.
>
> More likely it probably needs to be non-zero to prevent applications
> doing division by block size from exploding... ;)

Right, and any application which knows it needs the blocksize should
also be requesting it when using the statxat() (or whatever) syscall.

Cheers, Andreas

^ permalink raw reply [flat|nested] 144+ messages in thread
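The division-by-block-size concern raised in this subthread can be illustrated with a small caller-side guard. This is only a sketch under stated assumptions: the helper name blocks_for_size(), the blksize_valid flag (standing in for "the caller requested the block size and the filesystem reported it") and the 4096-byte fallback are all hypothetical and do not come from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fallback used when the reported block size is unusable. */
#define FALLBACK_BLKSIZE 4096u

/*
 * Return the number of I/O blocks needed to cover `size` bytes.
 * `blksize` is whatever the stat call reported; `blksize_valid` says
 * whether the caller asked for it and the filesystem supplied it.
 * A zero or unrequested block size falls back to FALLBACK_BLKSIZE so
 * the division below can never be by zero.
 */
static uint64_t blocks_for_size(uint64_t size, uint32_t blksize,
				int blksize_valid)
{
	uint64_t bs = (blksize_valid && blksize != 0) ? blksize
						      : FALLBACK_BLKSIZE;

	assert(bs != 0);
	/* Round up without overflowing for sizes near UINT64_MAX. */
	return size / bs + (size % bs != 0);
}
```

A caller would pass the reported block size together with the result of its own "was this field returned?" check, for example blocks_for_size(file_size, reported_blksize, have_blksize).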
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-19 14:06 ` David Howells
  ` (3 preceding siblings ...)
  (?)
@ 2012-04-26 13:40 ` David Howells
  [not found] ` <18533.1335447617-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-04-30 16:27 ` Ben Hutchings
  -1 siblings, 2 replies; 144+ messages in thread
From: David Howells @ 2012-04-26 13:40 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
      linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api,
      libc-alpha

J. Bruce Fields <bfields@fieldses.org> wrote:

> > (11) Include granularity fields in the time data to indicate the
> >      granularity of each of the times (NFSv4 time_delta) [Steve French].
>
> It looks like you're including this with *each* time? But surely
> there's no filesystem with different granularity (say) for ctime than
> for mtime.

I put it in each time struct to use up the hole there. I could, I suppose,
split tv_sec from tv_nsec to get rid of the holes and then put the granularity
separately. That means that someone who wanted both the tv_sec and tv_nsec
would have to fish them out separately, but that's probably okay.

I could even make the granularity bigger then, to allow for the possibility of
having a granularity >4s, but I don't know of anywhere that requires a gran >2.

> Also, nfsd will want only one time_delta, not one for each time.

time_delta? Is that the same as granularity?

> Note also we need to document carefully what this means: I think it
> should be the granularity that the filesystem is capable of
> representing, but people are sometimes surprised to find out that the
> actual time source is usually more coarse-grained than that.

Yeah, but the latter is something you may not be able to determine, and may
indeed change over time (say someone updates the server kernel to one with a
more fine-grained software clock). Also, for a network fs, it may depend on
the client that happened to set that time last.

David

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-26 13:40 ` David Howells
@ 2012-04-26 14:23 ` J. Bruce Fields
  2012-04-30 16:27 ` Ben Hutchings
  1 sibling, 0 replies; 144+ messages in thread
From: J. Bruce Fields @ 2012-04-26 14:23 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
      linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api,
      libc-alpha

On Thu, Apr 26, 2012 at 02:40:17PM +0100, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
>
> > > (11) Include granularity fields in the time data to indicate the
> > >      granularity of each of the times (NFSv4 time_delta) [Steve French].
> >
> > It looks like you're including this with *each* time? But surely
> > there's no filesystem with different granularity (say) for ctime than
> > for mtime.
>
> I put it in each time struct to use up the hole there. I could, I suppose,
> split tv_sec from tv_nsec to get rid of the holes and then put the granularity
> separately. That means that someone who wanted both the tv_sec and tv_nsec
> would have to fish them out separately, but that's probably okay.
>
> I could even make the granularity bigger then, to allow for the possibility of
> having a granularity >4s, but I don't know of anywhere that requires a gran >2.
>
> > Also, nfsd will want only one time_delta, not one for each time.
>
> time_delta? Is that the same as granularity?

Right, sorry, that's just the NFS word for the same thing.

So my whine here is just that most callers only want to know one number
and we're giving them three. Whatever, they can just pick one. It
feels a little ugly, but feel free to ignore my nitpicking....

(Though as Steve French asked: could we add this to statfs (or something
similar) instead?)

> > Note also we need to document carefully what this means: I think it
> > should be the granularity that the filesystem is capable of
> > representing, but people are sometimes surprised to find out that the
> > actual time source is usually more coarse-grained than that.
>
> Yeah, but the latter is something you may not be able to determine, and may
> indeed change over time (say someone updates the server kernel to one with a
> more fine-grained software clock). Also, for a network fs, it may depend on
> the client that happened to set that time last.

Yep, agreed, the granularity should be what the filesystem can store, we
should just make sure that statement makes it into any eventual man
pages or other documentation, since it does seem to surprise people.

--b.

^ permalink raw reply [flat|nested] 144+ messages in thread
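One way to read "the granularity should be what the filesystem can store" is that a caller can round a timestamp down to the stored granularity before comparing it with another. The following is a minimal sketch of that reading, not anything proposed in the thread: struct ts, ts_truncate() and ts_same_at_granularity() are hypothetical names, the granularity is assumed to be given in nanoseconds, and timestamps are assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Timestamp in the tv_sec/tv_nsec style discussed in the thread. */
struct ts {
	int64_t  sec;
	uint32_t nsec;	/* 0 .. 999999999 */
};

/*
 * Round `t` down to a multiple of `gran_ns` nanoseconds, i.e. to a value
 * a filesystem with that storage granularity could hold.
 * Assumes gran_ns > 0 and a non-negative time small enough that the
 * total nanosecond count fits in 64 bits.
 */
static struct ts ts_truncate(struct ts t, uint64_t gran_ns)
{
	uint64_t total;

	assert(gran_ns > 0);
	assert(t.sec >= 0 && (uint64_t)t.sec < UINT64_MAX / NSEC_PER_SEC);

	total = (uint64_t)t.sec * NSEC_PER_SEC + t.nsec;
	total -= total % gran_ns;
	t.sec = (int64_t)(total / NSEC_PER_SEC);
	t.nsec = (uint32_t)(total % NSEC_PER_SEC);
	return t;
}

/* Two times are indistinguishable if they truncate to the same value. */
static int ts_same_at_granularity(struct ts a, struct ts b, uint64_t gran_ns)
{
	struct ts ta = ts_truncate(a, gran_ns);
	struct ts tb = ts_truncate(b, gran_ns);

	return ta.sec == tb.sec && ta.nsec == tb.nsec;
}
```

With a 100 ns granularity (the cifs figure mentioned elsewhere in the thread) only the last two decimal digits of tv_nsec are dropped; with a 2 second granularity, two times inside the same even-second window compare as equal.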
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-26 13:40 ` David Howells
  [not found] ` <18533.1335447617-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-04-30 16:27 ` Ben Hutchings
  2012-04-30 20:15 ` David Howells
  1 sibling, 1 reply; 144+ messages in thread
From: Ben Hutchings @ 2012-04-30 16:27 UTC (permalink / raw)
  To: David Howells; +Cc: linux-fsdevel, J. Bruce Fields

[Apologies for trimming the cc's; I've had to pick this out of an
archive and don't have all the addresses.]

David Howells <dhowells@redhat.com> wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
>
> > > (11) Include granularity fields in the time data to indicate the
> > >      granularity of each of the times (NFSv4 time_delta) [Steve French].
> >
> > It looks like you're including this with *each* time? But surely
> > there's no filesystem with different granularity (say) for ctime than
> > for mtime.

There is an extremely obscure filesystem with this property: VFAT.

> I put it in each time struct to use up the hole there. I could, I suppose,
> split tv_sec from tv_nsec to get rid of the holes and then put the granularity
> separately. That means that someone who wanted both the tv_sec and tv_nsec
> would have to fish them out separately, but that's probably okay.
>
> I could even make the granularity bigger then, to allow for the possibility of
> having a granularity >4s, but I don't know of anywhere that requires a gran >2.
[...]

Try 86,400 seconds - the actual granularity of atime on VFAT. (For
mtime it's 2 seconds, and for ctime 0.01 seconds.)

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                        - Albert Camus

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-30 16:27 ` Ben Hutchings
@ 2012-04-30 20:15 ` David Howells
  2012-04-30 20:30 ` J. Bruce Fields
  0 siblings, 1 reply; 144+ messages in thread
From: David Howells @ 2012-04-30 20:15 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: dhowells, linux-fsdevel, J. Bruce Fields

Ben Hutchings <ben@decadent.org.uk> wrote:

> Try 86,400 seconds - the actual granularity of atime on VFAT.

24 hours? Really? I guess it makes the updating of atime on the media
something you can be lazy about.

> (For mtime it's 2 seconds, and for ctime 0.01 seconds.)

Sigh. Okay. Ugh. I guess I need separate granularities after all... Not
only that, but a 32-bit integer isn't sufficiently capacious to hold the full
range I now know about (1nS up to 1 day).

I wonder if granularity should be left to a statfsxat() syscall?

And I know Linus didn't like it, but I wonder if I can pack it in to a 32-bit
word either by doing an x * 10^y thing.

David

^ permalink raw reply [flat|nested] 144+ messages in thread
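To make the "x * 10^y" idea concrete: a plain 32-bit nanosecond count tops out at about 4.29 seconds, whereas a mantissa and decimal exponent packed into the same word can cover everything from 1 ns to a day. The layout below (low 4 bits for the exponent, upper 28 bits for the mantissa, 0 meaning "not representable") and the names gran_pack() and gran_unpack() are assumptions made for illustration, not the encoding of any posted patch.

```c
#include <assert.h>
#include <stdint.h>

#define GRAN_EXP_BITS 4u
#define GRAN_EXP_MASK ((1u << GRAN_EXP_BITS) - 1)
#define GRAN_MANT_MAX (UINT32_MAX >> GRAN_EXP_BITS)

/*
 * Decode mantissa * 10^exponent back into nanoseconds.
 * Values produced by gran_pack() always fit in 64 bits; arbitrary
 * inputs with a large mantissa and exponent may wrap.
 */
static uint64_t gran_unpack(uint32_t packed)
{
	uint64_t v = packed >> GRAN_EXP_BITS;
	uint32_t exp10 = packed & GRAN_EXP_MASK;

	while (exp10--)
		v *= 10;
	return v;
}

/*
 * Encode a granularity in nanoseconds as mantissa * 10^exponent.
 * Returns 0 if gran_ns is 0 or cannot be represented exactly.
 */
static uint32_t gran_pack(uint64_t gran_ns)
{
	uint64_t mant = gran_ns;
	uint32_t exp10 = 0;
	uint32_t packed;

	if (gran_ns == 0)
		return 0;
	/* Move trailing decimal zeros from the mantissa to the exponent. */
	while (mant % 10 == 0 && exp10 < GRAN_EXP_MASK) {
		mant /= 10;
		exp10++;
	}
	if (mant > GRAN_MANT_MAX)
		return 0;
	packed = ((uint32_t)mant << GRAN_EXP_BITS) | exp10;
	assert(gran_unpack(packed) == gran_ns);	/* exact round trip */
	return packed;
}
```

Under this layout the VFAT figures quoted above all encode exactly: 0.01 s is 1 * 10^7 ns, 2 s is 2 * 10^9 ns and 86,400 s is 864 * 10^11 ns.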
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-30 20:15 ` David Howells
@ 2012-04-30 20:30 ` J. Bruce Fields
  2012-04-30 23:31 ` Ben Hutchings
  0 siblings, 1 reply; 144+ messages in thread
From: J. Bruce Fields @ 2012-04-30 20:30 UTC (permalink / raw)
  To: David Howells; +Cc: Ben Hutchings, linux-fsdevel

On Mon, Apr 30, 2012 at 09:15:56PM +0100, David Howells wrote:
> Ben Hutchings <ben@decadent.org.uk> wrote:
>
> > Try 86,400 seconds - the actual granularity of atime on VFAT.
>
> 24 hours? Really? I guess it makes the updating of atime on the media
> something you can be lazy about.
>
> > (For mtime it's 2 seconds, and for ctime 0.01 seconds.)

Does it actually support ctime, or is there some confusion here between
unix ctime and file creation time?

> Sigh. Okay. Ugh. I guess I need separate granularities after all... Not
> only that, but a 32-bit integer isn't sufficiently capacious to hold the full
> range I now know about (1nS up to 1 day).
>
> I wonder if granularity should be left to a statfsxat() syscall?
>
> And I know Linus didn't like it, but I wonder if I can pack it in to a 32-bit
> word either by doing an x * 10^y thing.

Does it matter if we don't get vfat's atime granularity exactly right?
Is anyone ever going to use it for anything?

--b.

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available
  2012-04-30 20:30 ` J. Bruce Fields
@ 2012-04-30 23:31 ` Ben Hutchings
  0 siblings, 0 replies; 144+ messages in thread
From: Ben Hutchings @ 2012-04-30 23:31 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Howells, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1490 bytes --]

On Mon, 2012-04-30 at 16:30 -0400, J. Bruce Fields wrote:
> On Mon, Apr 30, 2012 at 09:15:56PM +0100, David Howells wrote:
> > Ben Hutchings <ben@decadent.org.uk> wrote:
> >
> > > Try 86,400 seconds - the actual granularity of atime on VFAT.
> >
> > 24 hours? Really? I guess it makes the updating of atime on the media
> > something you can be lazy about.
> >
> > > (For mtime it's 2 seconds, and for ctime 0.01 seconds.)
>
> Does it actually support ctime, or is there some confusion here between
> unix ctime and file creation time?

I think there is some confusion, yes. The Linux implementation stores
ctime, but the documentation I can find refers to creation time.

> > Sigh. Okay. Ugh. I guess I need separate granularities after all... Not
> > only that, but a 32-bit integer isn't sufficiently capacious to hold the full
> > range I now know about (1nS up to 1 day).
> >
> > I wonder if granularity should be left to a statfsxat() syscall?
> >
> > And I know Linus didn't like it, but I wonder if I can pack it in to a 32-bit
> > word either by doing an x * 10^y thing.
>
> Does it matter if we don't get vfat's atime granularity exactly right?
> Is anyone ever going to use it for anything?

I thought the point of this extension was to let callers know what
information we *really* have. In which case, let's not cop out on this.

Ben.

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply [flat|nested] 144+ messages in thread
* [PATCH 2/6] xstat: Ext4: Return extended attributes
  2012-04-19 14:05 ` David Howells
@ 2012-04-19 14:06 ` David Howells
  -1 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-19 14:06 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
      wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Return extended attributes from the Ext4 filesystem. This includes the
following:

 (1) The inode creation time (i_crtime) as i_btime.

 (2) The inode i_generation as i_gen if not the root directory.

 (3) The inode i_version as st_data_version if a file with I_VERSION set or a
     directory.

 (4) FS_xxx_FL flags are returned as for ioctl(FS_IOC_GETFLAGS).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/ext4/ext4.h    |  2 ++
 fs/ext4/file.c    |  2 +-
 fs/ext4/inode.c   | 32 +++++++++++++++++++++++++++++---
 fs/ext4/namei.c   |  2 ++
 fs/ext4/super.c   |  1 +
 fs/ext4/symlink.c |  2 ++
 6 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ab2594a..81806da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1899,6 +1899,8 @@ extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
 				struct kstat *stat);
 extern void ext4_evict_inode(struct inode *);
 extern void ext4_clear_inode(struct inode *);
+extern int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			     struct kstat *stat);
 extern int ext4_sync_inode(handle_t *, struct inode *);
 extern void ext4_dirty_inode(struct inode *, int);
 extern int ext4_change_inode_journal_flag(struct inode *, int);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index cb70f18..ae8654c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -249,7 +249,7 @@ const struct file_operations ext4_file_operations = {
 
 const struct inode_operations ext4_file_inode_operations = {
 	.setattr = ext4_setattr,
-	.getattr = ext4_getattr,
+	.getattr = ext4_file_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
 	.setxattr = generic_setxattr,
 	.getxattr = generic_getxattr,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c77b0bd..eafc188 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4191,11 +4191,37 @@ err_out:
 int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		 struct kstat *stat)
 {
-	struct inode *inode;
-	unsigned long delalloc_blocks;
+	struct inode *inode = dentry->d_inode;
+	struct ext4_inode_info *ei = EXT4_I(inode);
+
+	stat->result_mask |= XSTAT_BTIME;
+	stat->btime.tv_sec = ei->i_crtime.tv_sec;
+	stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
+
+	if (inode->i_ino != EXT4_ROOT_INO) {
+		stat->result_mask |= XSTAT_GEN;
+		stat->gen = inode->i_generation;
+	}
+	if (S_ISDIR(inode->i_mode) || IS_I_VERSION(inode)) {
+		stat->result_mask |= XSTAT_VERSION;
+		stat->version = inode->i_version;
+	}
+
+	ext4_get_inode_flags(ei);
+	stat->ioc_flags |= ei->i_flags & EXT4_FL_USER_VISIBLE;
+	stat->result_mask |= XSTAT_IOC_FLAGS;
 
-	inode = dentry->d_inode;
 	generic_fillattr(inode, stat);
+	return 0;
+}
+
+int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+		      struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	u64 delalloc_blocks;
+
+	ext4_getattr(mnt, dentry, stat);
 
 	/*
 	 * We can't update i_blocks if the block allocation is delayed
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 349d7b3..6162387 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2579,6 +2579,7 @@ const struct inode_operations ext4_dir_inode_operations = {
 	.mknod = ext4_mknod,
 	.rename = ext4_rename,
 	.setattr = ext4_setattr,
+	.getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
 	.setxattr = generic_setxattr,
 	.getxattr = generic_getxattr,
@@ -2591,6 +2592,7 @@ const struct inode_operations ext4_dir_inode_operations = {
 
 const struct inode_operations ext4_special_inode_operations = {
 	.setattr = ext4_setattr,
+	.getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
 	.setxattr = generic_setxattr,
 	.getxattr = generic_getxattr,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ceebaf8..2d395bf 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3040,6 +3040,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	if (sb->s_magic != EXT4_SUPER_MAGIC)
 		goto cantfind_ext4;
 	sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
+	memcpy(sb->s_volume_id, es->s_uuid, sizeof(sb->s_volume_id));
 
 	/* Set defaults before we parse the mount options */
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ed9354a..d8fe7fb 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -35,6 +35,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
 	.follow_link = page_follow_link_light,
 	.put_link = page_put_link,
 	.setattr = ext4_setattr,
+	.getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
 	.setxattr = generic_setxattr,
 	.getxattr = generic_getxattr,
@@ -47,6 +48,7 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
 	.readlink = generic_readlink,
 	.follow_link = ext4_follow_link,
 	.setattr = ext4_setattr,
+	.getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
 	.setxattr = generic_setxattr,
 	.getxattr = generic_getxattr,

^ permalink raw reply related [flat|nested] 144+ messages in thread
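Because this patch sets XSTAT_GEN only for non-root inodes and XSTAT_VERSION only for directories or I_VERSION files, a consumer presumably has to test result_mask before trusting the matching field. The sketch below shows that pattern in isolation. Everything in it is a stand-in: struct xstat_sketch is a cut-down hypothetical structure, the bit values are invented for the example (the real ones are defined in patch 1/6), and the accessor names are not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined in patch 1/6. */
#define XSTAT_BTIME     0x1u
#define XSTAT_GEN       0x2u
#define XSTAT_VERSION   0x4u
#define XSTAT_IOC_FLAGS 0x8u

/* Cut-down stand-in for the fields this patch fills in. */
struct xstat_sketch {
	uint32_t result_mask;
	int64_t  btime_sec;
	uint32_t gen;
	uint64_t version;
	uint64_t ioc_flags;
};

/* Store the birth time in *out and return 1 only if it was supplied. */
static int xstat_get_btime(const struct xstat_sketch *st, int64_t *out)
{
	assert(st && out);
	if (!(st->result_mask & XSTAT_BTIME))
		return 0;
	*out = st->btime_sec;
	return 1;
}

/* Same pattern for the inode generation, which ext4 omits for the root. */
static int xstat_get_gen(const struct xstat_sketch *st, uint32_t *out)
{
	assert(st && out);
	if (!(st->result_mask & XSTAT_GEN))
		return 0;
	*out = st->gen;
	return 1;
}
```

The accessors leave *out untouched when the bit is clear, so a caller can keep a default value there and ignore whatever happens to be in the unset field.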
* Re: [PATCH 2/6] xstat: Ext4: Return extended attributes
  2012-04-19 14:06 ` David Howells
@ 2012-04-19 16:03 ` Steve French
  -1 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-19 16:03 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

This patch reminds me of a question on time stamps - how can an
application query the time granularity, i.e. sb->s_time_gran, for a mount
(e.g. 1 second for some file systems, 100 nanoseconds for cifs/smb2, 1
nanosecond for others etc.)?

On Thu, Apr 19, 2012 at 9:06 AM, David Howells <dhowells@redhat.com> wrote:
> Return extended attributes from the Ext4 filesystem.  This includes the
> following:
>
>  (1) The inode creation time (i_crtime) as i_btime.
>
>  (2) The inode i_generation as i_gen if not the root directory.
>
>  (3) The inode i_version as st_data_version if a file with I_VERSION set or a
>      directory.
>
>  (4) FS_xxx_FL flags are returned as for ioctl(FS_IOC_GETFLAGS).
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
>  fs/ext4/ext4.h    |    2 ++
>  fs/ext4/file.c    |    2 +-
>  fs/ext4/inode.c   |   32 +++++++++++++++++++++++++++++---
>  fs/ext4/namei.c   |    2 ++
>  fs/ext4/super.c   |    1 +
>  fs/ext4/symlink.c |    2 ++
>  6 files changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index ab2594a..81806da 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1899,6 +1899,8 @@ extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  				struct kstat *stat);
>  extern void ext4_evict_inode(struct inode *);
>  extern void ext4_clear_inode(struct inode *);
> +extern int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
> +			     struct kstat *stat);
>  extern int ext4_sync_inode(handle_t *, struct inode *);
>  extern void ext4_dirty_inode(struct inode *, int);
>  extern int ext4_change_inode_journal_flag(struct inode *, int);
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index cb70f18..ae8654c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -249,7 +249,7 @@ const struct file_operations ext4_file_operations = {
>
>  const struct inode_operations ext4_file_inode_operations = {
>  	.setattr = ext4_setattr,
> -	.getattr = ext4_getattr,
> +	.getattr = ext4_file_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>  	.setxattr = generic_setxattr,
>  	.getxattr = generic_getxattr,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c77b0bd..eafc188 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4191,11 +4191,37 @@ err_out:
>  int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		 struct kstat *stat)
>  {
> -	struct inode *inode;
> -	unsigned long delalloc_blocks;
> +	struct inode *inode = dentry->d_inode;
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +
> +	stat->result_mask |= XSTAT_BTIME;
> +	stat->btime.tv_sec = ei->i_crtime.tv_sec;
> +	stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
> +
> +	if (inode->i_ino != EXT4_ROOT_INO) {
> +		stat->result_mask |= XSTAT_GEN;
> +		stat->gen = inode->i_generation;
> +	}
> +	if (S_ISDIR(inode->i_mode) || IS_I_VERSION(inode)) {
> +		stat->result_mask |= XSTAT_VERSION;
> +		stat->version = inode->i_version;
> +	}
> +
> +	ext4_get_inode_flags(ei);
> +	stat->ioc_flags |= ei->i_flags & EXT4_FL_USER_VISIBLE;
> +	stat->result_mask |= XSTAT_IOC_FLAGS;
>
> -	inode = dentry->d_inode;
>  	generic_fillattr(inode, stat);
> +	return 0;
> +}
> +
> +int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
> +		      struct kstat *stat)
> +{
> +	struct inode *inode = dentry->d_inode;
> +	u64 delalloc_blocks;
> +
> +	ext4_getattr(mnt, dentry, stat);
>
>  	/*
>  	 * We can't update i_blocks if the block allocation is delayed
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 349d7b3..6162387 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2579,6 +2579,7 @@ const struct inode_operations ext4_dir_inode_operations = {
>  	.mknod = ext4_mknod,
>  	.rename = ext4_rename,
>  	.setattr = ext4_setattr,
> +	.getattr = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>  	.setxattr = generic_setxattr,
>  	.getxattr = generic_getxattr,
> @@ -2591,6 +2592,7 @@ const struct inode_operations ext4_dir_inode_operations = {
>
>  const struct inode_operations ext4_special_inode_operations = {
>  	.setattr = ext4_setattr,
> +	.getattr = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>  	.setxattr = generic_setxattr,
>  	.getxattr = generic_getxattr,
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index ceebaf8..2d395bf 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3040,6 +3040,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  	if (sb->s_magic != EXT4_SUPER_MAGIC)
>  		goto cantfind_ext4;
>  	sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
> +	memcpy(sb->s_volume_id, es->s_uuid, sizeof(sb->s_volume_id));
>
>  	/* Set defaults before we parse the mount options */
>  	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
> diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
> index ed9354a..d8fe7fb 100644
> --- a/fs/ext4/symlink.c
> +++ b/fs/ext4/symlink.c
> @@ -35,6 +35,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
>  	.follow_link = page_follow_link_light,
>  	.put_link = page_put_link,
>  	.setattr = ext4_setattr,
> +	.getattr = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>  	.setxattr = generic_setxattr,
>  	.getxattr = generic_getxattr,
> @@ -47,6 +48,7 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
>  	.readlink = generic_readlink,
>  	.follow_link = ext4_follow_link,
>  	.setattr = ext4_setattr,
> +	.getattr = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>  	.setxattr = generic_setxattr,
>  	.getxattr = generic_getxattr,

-- 
Thanks,

Steve

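[Editorial sketch, not part of the thread.] The quoted ext4_getattr() sets one bit in result_mask for every extended field it fills in, so a consumer would presumably test the bit before trusting the field. The user-space model below mirrors only the conditions visible in the patch (no generation number for the root inode; a data version only for directories or inodes with I_VERSION set). The struct layouts, the DEMO_* bit values and the demo_* helper names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real XSTAT_* constants are defined by the patch set. */
#define DEMO_XSTAT_BTIME   0x1u
#define DEMO_XSTAT_GEN     0x2u
#define DEMO_XSTAT_VERSION 0x4u

/* Hypothetical stand-in for the inode state the patch inspects. */
struct demo_inode {
	bool     is_root;        /* models i_ino == EXT4_ROOT_INO */
	bool     is_dir;         /* models S_ISDIR(i_mode) */
	bool     has_i_version;  /* models IS_I_VERSION(inode) */
	int64_t  crtime_sec;
	uint32_t generation;
	uint64_t version;
};

/* Hypothetical stand-in for the extended part of struct kstat. */
struct demo_stat {
	uint32_t result_mask;
	int64_t  btime_sec;
	uint32_t gen;
	uint64_t version;
};

/* Mirror the conditions in the quoted patch: one mask bit per field filled in. */
static void demo_getattr(const struct demo_inode *in, struct demo_stat *st)
{
	assert(in != NULL && st != NULL);

	st->result_mask |= DEMO_XSTAT_BTIME;
	st->btime_sec = in->crtime_sec;

	if (!in->is_root) {
		st->result_mask |= DEMO_XSTAT_GEN;
		st->gen = in->generation;
	}
	if (in->is_dir || in->has_i_version) {
		st->result_mask |= DEMO_XSTAT_VERSION;
		st->version = in->version;
	}
}

/* Consumer side: return false when the filesystem did not provide the field. */
static bool demo_get_gen(const struct demo_stat *st, uint32_t *gen_out)
{
	if (!(st->result_mask & DEMO_XSTAT_GEN))
		return false;
	*gen_out = st->gen;
	return true;
}
```

In this model a root directory yields BTIME and VERSION but no GEN, so demo_get_gen() reports the field as absent rather than returning a stale or zero value.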
* Re: [PATCH 2/6] xstat: Ext4: Return extended attributes
  2012-04-19 14:06 ` David Howells
                   (?)
                   (?)
@ 2012-04-26 13:47 ` David Howells
  2012-04-26 17:00   ` Steve French
  -1 siblings, 1 reply; 144+ messages in thread
From: David Howells @ 2012-04-26 13:47 UTC
To: Steve French
Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
    linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Steve French <smfrench@gmail.com> wrote:

> This patch reminds me of a question on time stamps - how can an
> application query the time granularity ie sb_s_time_gran for a mount
> (e.g. 1 second for some file systems, 100 nanoseconds for cifs/smb2, 1
> nanosecond for others etc.)

Ummm...  In what context?  With the proposed xstat() interface it will be
provided.

David

* Re: [PATCH 2/6] xstat: Ext4: Return extended attributes
  2012-04-26 13:47 ` David Howells
@ 2012-04-26 17:00 ` Steve French
  0 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-26 17:00 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 26, 2012 at 8:47 AM, David Howells <dhowells@redhat.com> wrote:
> Steve French <smfrench@gmail.com> wrote:
>
>> This patch reminds me of a question on time stamps - how can an
>> application query the time granularity ie sb_s_time_gran for a mount
>> (e.g. 1 second for some file systems, 100 nanoseconds for cifs/smb2, 1
>> nanosecond for others etc.)
>
> Ummm...  In what context?  With the proposed xstat() interface it will be
> provided.

Great (although I thought it would be a "stat -f" property).

-- 
Thanks,

Steve

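[Editorial sketch, not part of the thread.] The exchange above does not say how xstat() would report granularity, so the following only illustrates why an application might want the value: a timestamp written with nanosecond precision may come back truncated on a mount with coarser granularity, and a naive equality test would then fail. The granularities used are the ones Steve French lists (1 second, 100 ns, 1 ns); the demo_* helpers and DEMO_NSEC_PER_SEC are hypothetical names, and the code assumes tv_nsec is already within [0, 1e9).

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Truncate ts down to a multiple of gran_ns, with 1 <= gran_ns <= 1 second. */
static struct timespec demo_trunc(struct timespec ts, long gran_ns)
{
	assert(gran_ns >= 1 && gran_ns <= DEMO_NSEC_PER_SEC);

	/* For gran_ns == 1 second this clears tv_nsec entirely. */
	ts.tv_nsec -= ts.tv_nsec % gran_ns;
	return ts;
}

/* Two timestamps are indistinguishable on a mount if they truncate to the same value. */
static bool demo_same_at_gran(struct timespec a, struct timespec b, long gran_ns)
{
	struct timespec ta = demo_trunc(a, gran_ns);
	struct timespec tb = demo_trunc(b, gran_ns);

	return ta.tv_sec == tb.tv_sec && ta.tv_nsec == tb.tv_nsec;
}
```

A backup or sync tool could use a comparison like demo_same_at_gran() to avoid treating a file as modified merely because the destination filesystem stores coarser timestamps.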
* [PATCH 6/6] xstat: eCryptFS: Return extended attributes
  2012-04-19 14:05 ` David Howells
@ 2012-04-19 14:07 ` David Howells
  -1 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-04-19 14:07 UTC
To: linux-fsdevel
Cc: dhowells, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

Return extended attributes from the eCryptFS filesystem, dredged up from the
lower filesystem.  XSTAT_INFO_ENCRYPTED is set on the files whose cryptography
is handled by eCryptFS.

Possibly eCryptFS should also set FS_COMPR_FL on its compressed files.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/ecryptfs/inode.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index ab35b11..62865e9 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1060,13 +1060,23 @@ int ecryptfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
 	struct kstat lower_stat;
 	int rc;
 
-	rc = vfs_getattr(ecryptfs_dentry_to_lower_mnt(dentry),
-			 ecryptfs_dentry_to_lower(dentry), &lower_stat);
+	lower_stat.query_flags = stat->query_flags;
+	lower_stat.request_mask = stat->request_mask | XSTAT_BLOCKS;
+	rc = vfs_xgetattr(ecryptfs_dentry_to_lower_mnt(dentry),
+			  ecryptfs_dentry_to_lower(dentry), &lower_stat);
 	if (!rc) {
 		fsstack_copy_attr_all(dentry->d_inode,
 				      ecryptfs_inode_to_lower(dentry->d_inode));
 		generic_fillattr(dentry->d_inode, stat);
 		stat->blocks = lower_stat.blocks;
+		stat->result_mask = lower_stat.result_mask;
+		stat->information = lower_stat.information;
+		stat->information |= XSTAT_INFO_ENCRYPTED;
+		stat->gen = lower_stat.gen;
+		stat->version = lower_stat.version;
+		stat->ioc_flags = lower_stat.ioc_flags;
+		memcpy(&stat->volume_id, lower_stat.volume_id,
+		       sizeof(stat->volume_id));
 	}
 	return rc;
 }

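[Editorial sketch, not part of the thread.] The patch above follows a common stacking pattern: forward the caller's request mask to the lower filesystem (always adding the block count, which the upper layer needs), copy back whatever the lower layer says it provided, then OR in the flag that only the upper layer knows about. The user-space model below reproduces that flow. Every name and value in it (struct demo_kstat, the DEMO_* constants, the demo_* functions, the fixed block count and generation) is a hypothetical stand-in, and unlike the patch it zero-initialises the lower structure for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values standing in for XSTAT_BLOCKS, XSTAT_GEN and XSTAT_INFO_ENCRYPTED. */
#define DEMO_XSTAT_BLOCKS         0x1u
#define DEMO_XSTAT_GEN            0x2u
#define DEMO_XSTAT_INFO_ENCRYPTED 0x1u

/* Hypothetical, reduced stand-in for struct kstat. */
struct demo_kstat {
	uint32_t request_mask;  /* what the caller asked for */
	uint32_t result_mask;   /* what the filesystem actually filled in */
	uint32_t information;   /* per-file information flags */
	uint64_t blocks;
	uint32_t gen;
};

/* Model of a lower filesystem that supplies exactly the requested fields. */
static int demo_lower_getattr(struct demo_kstat *st)
{
	if (st->request_mask & DEMO_XSTAT_BLOCKS) {
		st->blocks = 8;
		st->result_mask |= DEMO_XSTAT_BLOCKS;
	}
	if (st->request_mask & DEMO_XSTAT_GEN) {
		st->gen = 7;
		st->result_mask |= DEMO_XSTAT_GEN;
	}
	return 0;
}

/* Model of the stacked filesystem's getattr, following the flow of the patch. */
static int demo_stacked_getattr(struct demo_kstat *stat)
{
	struct demo_kstat lower = { 0 };  /* zeroed here; a simplification of the model */
	int rc;

	assert(stat != NULL);

	lower.request_mask = stat->request_mask | DEMO_XSTAT_BLOCKS;
	rc = demo_lower_getattr(&lower);
	if (!rc) {
		stat->blocks = lower.blocks;
		stat->result_mask = lower.result_mask;
		stat->information = lower.information | DEMO_XSTAT_INFO_ENCRYPTED;
		stat->gen = lower.gen;
	}
	return rc;
}
```

One consequence visible in the model: because the block count is always requested from the lower layer, the result mask handed back to the caller can contain a bit the caller never asked for.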
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-19 14:05 ` David Howells
@ 2012-04-19 17:11 ` Steve French
  -1 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-19 17:11 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 19, 2012 at 9:05 AM, David Howells <dhowells@redhat.com> wrote:
>
> Implement a pair of new system calls to provide extended and further extensible stat functions.
<snip>
> Should the default for a network fs be to do an unconditional (heavyweight)
> stat with a flag to suppress going to the server to update the locally held
> attributes and flushing pending writebacks?

Even though we can use leases (oplocks) to avoid the roundtrip,
it is probably too expensive to default to forcing a cache flush,
especially when a common case is to get the file creation time
or inode number information (stable vs volatile).  Would it be
better to make the stable vs volatile inode number an attribute of
the volume or something returned by the proposed xstat?

> Should things like the Windows Archive, Hidden and System bits be handled
> through IOC flags, perhaps expanded to 64-bits?

Today I export these through a pseudo-xattr in cifs.ko; I am curious how
NTFS and FAT export these on Linux.

> ==========
> TO BE DONE
> ==========
>
> Autofs, ntfs, btrfs, ...

Given the overlap in optional attributes between the network protocol
and local NTFS (and ReFS and to a lesser extent FAT), I would expect
the info from cifs.ko and the ntfs implementations to map pretty closely.

> I should perhaps use u8/u32/u64 rather than uint8/32/64_t.
>
> Handle remote filesystems being offline and indicate this with
> XSTAT_INFO_OFFLINE.

You already have support for an indicator for offline files (HSM).
Would XSTAT_INFO_OFFLINE be intended for the case where
the network session to the server is disconnected (and in which
case the application does not want to reconnect)?

-- 
Thanks,

Steve

* Re: [PATCH 0/6] Extended file stat system call
  2012-04-19 14:05 ` David Howells
@ 2012-04-27  1:06 ` Dave Chinner
  -1 siblings, 0 replies; 144+ messages in thread
From: Dave Chinner @ 2012-04-27  1:06 UTC
To: David Howells
Cc: linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On Thu, Apr 19, 2012 at 03:05:58PM +0100, David Howells wrote:
>
> Implement a pair of new system calls to provide extended and further extensible
> stat functions.
>
> The second of the associated patches is the main patch that provides these new
> system calls:
>
> 	ssize_t ret = xstat(int dfd,
> 			    const char *filename,
> 			    unsigned atflag,
> 			    unsigned mask,
> 			    struct xstat *buffer);
>
> 	ssize_t ret = fxstat(int fd,
> 			     unsigned atflag,
> 			     unsigned mask,
> 			     struct xstat *buffer);
>
> which are more fully documented in the first patch's description.
>
> These new stat functions provide a number of useful features, in summary:
>
>  (1) More information: creation time, inode generation number, data version
>      number, flags/attributes.  A subset of these is available through a number
>      of filesystems (such as CIFS, NFS, AFS, Ext4 and BTRFS).

If we are adding per-inode flags, then what do we do with filesystem
specific flags? e.g. XFS has quite a number of per-inode flags that
don't align with any other filesystem (e.g. filestream allocator,
real time file, behaviour inheritance flags, etc), but may be useful
to retrieve in such a call. We currently have an ioctl to get that
information from each inode. Have you thought about how to handle
such flags?

Along the same lines, filesystems can have allocation constraints
that differ from the filesystem block size - ext4 with its
bigalloc hack, XFS with its per-inode extent size hints and the
realtime device, etc. Then there's optimal IO characteristics
(e.g. geometry hints like stripe unit/stripe width for the
allocation policy of that given file) that applications could use
if they were present rather than having to expose them through
ioctls that nobody even knows about...

Perhaps also exposing the project ID for quota purposes, like we do
UID and GID. That way we wouldn't need a filesystem specific ioctl
to read it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 0/6] Extended file stat system call
  2012-04-27  1:06 ` Dave Chinner
                   (?)
@ 2012-04-27  3:22 ` Andreas Dilger
       [not found]   ` <ED5B8F1B-6C99-4516-85FA-A767E94B635F-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
  -1 siblings, 1 reply; 144+ messages in thread
From: Andreas Dilger @ 2012-04-27  3:22 UTC
To: Dave Chinner
Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
    linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha

On 2012-04-26, at 7:06 PM, Dave Chinner wrote:
> On Thu, Apr 19, 2012 at 03:05:58PM +0100, David Howells wrote:
>>
>> Implement a pair of new system calls to provide extended and further extensible stat functions.
>>
>> The second of the associated patches is the main patch that provides these new system calls:
>>
>> 	ssize_t ret = xstat(int dfd,
>> 			    const char *filename,
>> 			    unsigned atflag,
>> 			    unsigned mask,
>> 			    struct xstat *buffer);
>>
>> 	ssize_t ret = fxstat(int fd,
>> 			     unsigned atflag,
>> 			     unsigned mask,
>> 			     struct xstat *buffer);
>>
>> which are more fully documented in the first patch's description.
>>
>> These new stat functions provide a number of useful features, in summary:
>>
>> (1) More information: creation time, inode generation number, data
>>     version number, flags/attributes.  A subset of these is available
>>     through a number of filesystems (CIFS, NFS, AFS, Ext4 and BTRFS).
>
> If we are adding per-inode flags, then what do we do with filesystem
> specific flags? e.g. XFS has quite a number of per-inode flags that
> don't align with any other filesystem (e.g. filestream allocator,
> real time file, behaviour inheritance flags, etc), but may be useful
> to retrieve in such a call. We currently have an ioctl to get that
> information from each inode. Have you thought about how to handle
> such flags?

I'm sympathetic to your cause, but I don't want this to degrade into
the same morass that it did last time when every attribute under the
sun was added to the call.

The intent is to replace the stat() call with something that can avoid
overhead on filesystems for which some attributes are expensive, and
that applications may not need.  Some common attributes were added
that are used by multiple filesystems.

If it is too filesystem-specific, and there is little possibility
that these attributes will be usable on other filesystems, then it
should remain a filesystem-specific ioctl() call.

If you can make a case that these attributes have value on a few
other filesystems, and applications are reasonably likely to be
able to use them, and their addition does not make the API overly
complex, then suggest away.

> Along the same lines, filesystems can have allocation constraints
> that differ from the filesystem block size - ext4 with its
> bigalloc hack, XFS with its per-inode extent size hints and the
> realtime device, etc. Then there's optimal IO characteristics
> (e.g. geometry hints like stripe unit/stripe width for the
> allocation policy of that given file) that applications could use
> if they were present rather than having to expose them through
> ioctls that nobody even knows about...

There is already an "optimal IO size" that the application can use;
how do the geometry hints differ?  Userspace is able to handle an
st_blksize of several MB in size without problems, and any sane
application will do the IO sized + aligned on multiples of this.

> Perhaps also exposing the project ID for quota purposes, like we do
> UID and GID. That way we wouldn't need a filesystem specific ioctl
> to read it....

This seems reasonable and generic and simple.  This is similar to
directory quotas in other filesystems.

Cheers, Andreas

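[Editorial sketch, not part of the thread.] Andreas Dilger's remark about st_blksize can be illustrated with the existing POSIX interface: fstat() reports the preferred IO size in st_blksize, and an application can size and align its IO on multiples of it. The demo_* helper names below are hypothetical; fstat() and the st_blksize field are standard. The rounding helper does not guard against overflow for lengths close to SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Round len up to the next multiple of blk; blk must be non-zero. */
static size_t demo_round_up(size_t len, size_t blk)
{
	assert(blk > 0);
	return (len + blk - 1) / blk * blk;
}

/* Return the preferred IO size for fd, or 0 if fstat() fails or reports none. */
static size_t demo_preferred_io_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0 || st.st_blksize <= 0)
		return 0;
	return (size_t)st.st_blksize;
}
```

A caller would typically fall back to a fixed default when demo_preferred_io_size() returns 0, and otherwise allocate its buffer with demo_round_up(wanted_len, blk).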
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-27  3:22 ` Andreas Dilger
@ 2012-04-28  0:38 ` Dave Chinner
  0 siblings, 0 replies; 144+ messages in thread
From: Dave Chinner @ 2012-04-28  0:38 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs,
	samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list,
	linux-api, libc-alpha

On Thu, Apr 26, 2012 at 09:22:04PM -0600, Andreas Dilger wrote:
> On 2012-04-26, at 7:06 PM, Dave Chinner wrote:
> > On Thu, Apr 19, 2012 at 03:05:58PM +0100, David Howells wrote:
> >>
> >> Implement a pair of new system calls to provide extended and further extensible stat functions.
> >>
> >> The second of the associated patches is the main patch that provides these new system calls:
> >>
> >>	ssize_t ret = xstat(int dfd,
> >>			    const char *filename,
> >>			    unsigned atflag,
> >>			    unsigned mask,
> >>			    struct xstat *buffer);
> >>
> >>	ssize_t ret = fxstat(int fd,
> >>			     unsigned atflag,
> >>			     unsigned mask,
> >>			     struct xstat *buffer);
> >>
> >> which are more fully documented in the first patch's description.
> >>
> >> These new stat functions provide a number of useful features, in summary:
> >>
> >> (1) More information: creation time, inode generation number, data
> >>     version number, flags/attributes.  A subset of these is available
> >>     through a number of filesystems (CIFS, NFS, AFS, Ext4 and BTRFS).
> >
> > If we are adding per-inode flags, then what do we do with filesystem
> > specific flags? e.g. XFS has quite a number of per-inode flags that
> > don't align with any other filesystem (e.g. filestream allocator,
> > real time file, behaviour inheritence flags, etc), but may be useful
> > to retrieve in such a call. We currently have an ioctl to get that
> > information from each inode. Have you thought about how to handle
> > such flags?
>
> I'm sympathetic to your cause, but I don't want this to degrade into
> the same morass that it did last time when every attribute under the
> sun was added to the call.

Understood, which is why I'm not asking for everything under the sun
to be supported. I'm more interested in finding the useful subset of
information that a typical application might make use of.

> The intent is to replace the stat() call
> with something that can avoid overhead on filesystems for which some
> attributes are expensive, and that applications may not need.  Some
> common attributes were added that are used by multiple filesystems.
>
> If it is too filesystem-specific, and there is little possibility
> that these attributes will be usable on other filesystems, then it
> should remain a filesystem specific ioctl() call.

Right, that's why I didn't mention the real-time bits, the
filestream allocation bits, or other things that are tightly bound
to the way XFS works....

> If you can make
> a case that these attributes have value on a few other filesystems,
> and applications are reasonably likely to be able to use them, and
> their addition does not make the API overly complex, then suggest
> away.

Exactly my thoughts ;)

> > Along the same lines, filesytsems can have different allocation
> > constraints to IO the filesystem block size - ext4 with it's
> > bigalloc hack, XFS with it's per-inode extent size hints and the
> > realtime device, etc. Then there's optimal IO characteristics
> > (e.g. geometery hints like stripe unit/stripe width for the
> > allocation policy of that given file) that applications could use
> > if they were present rather than having to expose them through
> > ioctls that nobody even knows about...
>
> There is already "optimal IO size" that the application can use,
> how do the geometry hints differ?

Have a look at how XFS overloads stat.st_blksize depending on the
filesystem and inode config. It's amazingly convoluted, and based on
a combination of filesystem geometry, inode bits and mount options:

xfs_vn_getattr()
....
	if (XFS_IS_REALTIME_INODE(ip)) {
		/*
		 * If the file blocks are being allocated from a
		 * realtime volume, then return the inode's realtime
		 * extent size or the realtime volume's extent size.
		 */
		stat->blksize =
			xfs_get_extsz_hint(ip) << mp->m_sb.sb_blocklog;
	} else
		stat->blksize = xfs_preferred_iosize(mp);
......

xfs_extlen_t
xfs_get_extsz_hint(
	struct xfs_inode	*ip)
{
	if ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) && ip->i_d.di_extsize)
		return ip->i_d.di_extsize;
	if (XFS_IS_REALTIME_INODE(ip))
		return ip->i_mount->m_sb.sb_rextsize;
	return 0;
}
....
static inline unsigned long
xfs_preferred_iosize(xfs_mount_t *mp)
{
	if (mp->m_flags & XFS_MOUNT_COMPAT_IOSIZE)
		return PAGE_CACHE_SIZE;
	return (mp->m_swidth ?
		(mp->m_swidth << mp->m_sb.sb_blocklog) :
		((mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) ?
			(1 << (int)MAX(mp->m_readio_log, mp->m_writeio_log)) :
			PAGE_CACHE_SIZE));
}

All of that can be exported as 4 parameters for normal files:

	allocation block size (extent size hint)
	minimum io size (PAGE_CACHE_SIZE)
	preferred minimum IO size (mp->m_readio_log/mp->m_writeio_log)
	best aligned IO size (stripe width)

And for realtime files it's a bit different because of the
block-based bitmap allocator it uses:

	allocation block size (extent size hint)
	minimum io size (PAGE_CACHE_SIZE)
	preferred minimum IO size (extent size hint)
	best aligned IO size (some multiple of extent size hint)

> Userspace is able to handle
> st_blksize of several MB in size without problems, and any sane
> application will do the IO sized + aligned on multiples of this.

Actually, some applications still have problems with that. That's
the reason we only expose stripe widths in st_blksize when a mount
option is set. Stripe widths are known to get into the tens of MB,
and applications using st_blksize for memory allocation of IO
buffers tend to get into trouble with those.

That's why I'd prefer specific optimal IO hints - we don't have to
overload st_blksize with lots of meanings to pass what is relatively
trivial information back to the application.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 144+ messages in thread
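Editorial sketch (not part of the original message): one way an application might guard against the buffer-allocation problem described above is to clamp whatever st_blksize reports before using it as a buffer size. The limits IOBUF_MIN and IOBUF_MAX and the helper names are assumptions chosen for illustration; nothing in the thread prescribes these values.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

#define IOBUF_MIN ((size_t)4096)      /* assumed lower bound */
#define IOBUF_MAX ((size_t)1 << 20)   /* assumed 1 MiB upper bound */

/* Clamp a reported block size into [IOBUF_MIN, IOBUF_MAX]. */
static size_t clamp_iobuf_size(size_t blksize)
{
	if (blksize < IOBUF_MIN)
		return IOBUF_MIN;
	if (blksize > IOBUF_MAX)
		return IOBUF_MAX;
	return blksize;
}

/*
 * Hypothetical helper: allocate an I/O buffer for fd sized from
 * st_blksize, but never larger than IOBUF_MAX even if the filesystem
 * reports a stripe width of tens of MB.  Falls back to IOBUF_MIN if
 * fstat() fails.  The chosen size is stored in *size_out; the caller
 * frees the buffer with free().
 */
static void *alloc_iobuf(int fd, size_t *size_out)
{
	struct stat st;
	size_t size = IOBUF_MIN;

	if (fstat(fd, &st) == 0 && st.st_blksize > 0)
		size = clamp_iobuf_size((size_t)st.st_blksize);
	*size_out = size;
	return malloc(size);
}
```

The clamp throws away the large-stripe information, which is the limitation that separate, purpose-specific I/O hints would avoid.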
* Re: [PATCH 0/6] Extended file stat system call
  2012-04-28  0:38 ` Dave Chinner
  (?)
@ 2012-04-28  0:54 ` Steve French
  -1 siblings, 0 replies; 144+ messages in thread
From: Steve French @ 2012-04-28  0:54 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, David Howells, linux-fsdevel, linux-nfs,
	linux-cifs, samba-technical, linux-ext4, wine-devel, linux-api,
	libc-alpha

On Fri, Apr 27, 2012 at 7:38 PM, Dave Chinner <david@fromorbit.com> wrote:
>        preferred minimum IO size (mp->m_readio_log/mp->m_writeio_log)

This discussion about i/o sizes is very interesting.  For network
file system (at least for SMB2 to all known servers, and for cifs
mounts to Samba, but probably for recent NFS), ideal i/o sizes are
often well over a megabyte ... but how to indicate that to the
application...

> That's why I'd prefer specific optimal IO hints - we don't have to
> overload st_blksize with lots of meanings to pass what is relatively
> trivial information back to the application.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner

Yes.

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 144+ messages in thread
* Extended file stat: Splitting file- and fs-specific info?
  2012-04-19 14:05 ` David Howells
@ 2012-05-08 20:19 ` David Howells
  -1 siblings, 0 replies; 144+ messages in thread
From: David Howells @ 2012-05-08 20:19 UTC (permalink / raw)
  To: adilger, david, bfields, smfrench, ben, Trond.Myklebust, roland
  Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical,
	linux-ext4, linux-api, libc-alpha

Should I split the file-specific info and the fs-specific info and make the
second optional?  What I'm thinking of is something like this:

Have a file information structure:

struct statx {
	/* 0x00 */
	uint32_t	st_mask;	/* What results were written */
	uint32_t	st_information;	/* Information about the file */
	uint16_t	st_mode;	/* File mode */
	uint16_t	__spare0[3];
	/* 0x10 */
	uint32_t	st_uid;		/* User ID of owner */
	uint32_t	st_gid;		/* Group ID of owner */
	uint32_t	st_nlink;	/* Number of hard links */
	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
	/* 0x20 */
	struct statx_dev st_rdev;	/* Device ID of special file */
	struct statx_dev st_dev;	/* ID of device containing file */
	/* 0x30 */
	int32_t		st_atime_ns;	/* Last access time (ns part) */
	int32_t		st_btime_ns;	/* File creation time (ns part) */
	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
	/* 0x40 */
	int64_t		st_atime;	/* Last access time */
	int64_t		st_btime;	/* File creation time */
	int64_t		st_ctime;	/* Last attribute change time */
	int64_t		st_mtime;	/* Last data modification time */
	/* 0x60 */
	uint64_t	st_ino;		/* Inode number */
	uint64_t	st_size;	/* File size */
	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
	uint64_t	st_gen;		/* Inode generation number */
	uint64_t	st_version;	/* Data version number */
	uint64_t	st_ioc_flags;	/* As FS_IOC_GETFLAGS */
	/* 0x90 */
	uint64_t	__spare1[13];	/* Spare space for future expansion */
	/* 0x100 */
};

And an fs information structure for less commonly needed data:

struct statx_fsinfo {
	/* 0x00 - General info */
	uint32_t	st_mask;	/* What optional fields are filled in */
	uint32_t	st_type;	/* Filesystem type from linux/magic.h */

	/* 0x08 - file timestamp granularity info */
	uint16_t	st_atime_gran_mantissa;	/* gran(secs) = mant * 10^exp */
	uint16_t	st_btime_gran_mantissa;
	uint16_t	st_ctime_gran_mantissa;
	uint16_t	st_mtime_gran_mantissa;
	/* 0x10 */
	int8_t		st_atime_gran_exponent;
	int8_t		st_btime_gran_exponent;
	int8_t		st_ctime_gran_exponent;
	int8_t		st_mtime_gran_exponent;

	/* 0x14 - I/O parameters */
	uint32_t	st_blksize;	  /* File block size */
	uint32_t	st_alloc_blksize; /* Allocation block size/alignment */
	uint32_t	st_small_io_size; /* IO size/alignment that avoids fs/page cache RMW */
	uint32_t	st_pref_io_size;  /* Preferred IO size for general usage */
	uint32_t	st_large_io_size; /* IO size/alignment for high bandwidth sequential IO */

	/* 0x28 - Restrictions on struct statx contents */
	uint64_t	st_supported_ioc_flags; /* FS_IOC_GETFLAGS flags supported */

	/* 0x30 - Volume/filesystem information */
	uint64_t	st_fsid;	/* Short 64-bit Filesystem ID (as statfs) */
	uint64_t	__spare0[3];
	/* 0x50 */
	uint8_t		st_volume_id[16]; /* Volume/fs identifier */
	uint8_t		st_volume_uuid[16]; /* Volume/fs UUID */
	/* 0x80 */
	uint64_t	__spare1[8];
	/* 0xc0 */
	uint8_t		st_volume_name[64]; /* Volume name (up to 64 chars) */
	/* 0x100 */
	uint8_t		st_domain_name[256]; /* Domain/cell/workgroup name (up to 256 chars) */
	/* 0x200 */
};

One could argue a bit over what goes in which, should we go for this.  This
may be better split between multiple syscalls though (with the race that that
implies) and potentially merging with statfs.

The statxat() syscall [née xstat] could then use the 6th parameter thusly:

	asmlinkage long sys_statxat(int dfd,
				    const char __user *path,
				    unsigned flags,
				    unsigned mask,
				    struct statx __user *buffer,
				    struct statx_fsinfo __user *fsinfo);

letting fsinfo be NULL to indicate a lack of interest.  I'm not sure we want to
do that, though.

Also, do Dave Chinner's ideas for indicating five I/O parameters want to be
32-bit numbers?  Larger?  Smaller?  Can they be log2?

Note also, that I've suggested that we represent the timestamp granularity
information as a decimal float (which requires 3 bytes per timestamp) and that
we provide separate granularities for each timestamp.

David

^ permalink raw reply	[flat|nested] 144+ messages in thread
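Editorial sketch (not part of the original message): the decimal-float granularity encoding proposed above, gran(secs) = mant * 10^exp with a 16-bit mantissa and an 8-bit exponent, could be decoded into nanoseconds along these lines. The function name gran_to_ns() and the choice to return 0 for values that are not a whole number of nanoseconds or do not fit in 64 bits are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Convert a (mantissa, exponent) pair, meaning mant * 10^exp seconds,
 * into nanoseconds.  Returns 0 when the result would be finer than
 * 1 ns or would not fit in a uint64_t.
 */
static uint64_t gran_to_ns(uint16_t mant, int8_t exp)
{
	int e = (int)exp + 9;	/* shift from seconds to nanoseconds */
	uint64_t ns = mant;

	/* 65535 * 10^14 still fits in 64 bits; 10^15 would not. */
	if (e < 0 || e > 14)
		return 0;
	while (e-- > 0)
		ns *= 10;
	return ns;
}
```

Under this encoding a 1 ns granularity is (1, -9), a 100 ns granularity is (1, -7), a 1 s granularity is (1, 0) and a 2 s granularity is (2, 0), so each timestamp needs two bytes of mantissa plus one byte of exponent, matching the "3 bytes per timestamp" figure in the message.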
* Re: Extended file stat: Splitting file- and fs-specific info?
  2012-05-08 20:19 ` David Howells
@ 2012-05-08 21:13 ` Myklebust, Trond
  -1 siblings, 0 replies; 144+ messages in thread
From: Myklebust, Trond @ 2012-05-08 21:13 UTC (permalink / raw)
  To: David Howells
  Cc: adilger, david, bfields, smfrench, ben, roland, linux-fsdevel,
	linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api,
	libc-alpha

On Tue, 2012-05-08 at 21:19 +0100, David Howells wrote:
> Should I split the file-specific info and the fs-specific info and make the
> second optional?  What I'm thinking of is something like this:
>
> Have a file information structure:
>
> struct statx {
> 	/* 0x00 */
> 	uint32_t	st_mask;	/* What results were written */
> 	uint32_t	st_information;	/* Information about the file */
> 	uint16_t	st_mode;	/* File mode */
> 	uint16_t	__spare0[3];
> 	/* 0x10 */
> 	uint32_t	st_uid;		/* User ID of owner */
> 	uint32_t	st_gid;		/* Group ID of owner */
> 	uint32_t	st_nlink;	/* Number of hard links */
> 	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
> 	/* 0x20 */
> 	struct statx_dev st_rdev;	/* Device ID of special file */
> 	struct statx_dev st_dev;	/* ID of device containing file */
> 	/* 0x30 */
> 	int32_t		st_atime_ns;	/* Last access time (ns part) */
> 	int32_t		st_btime_ns;	/* File creation time (ns part) */
> 	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
> 	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
> 	/* 0x40 */
> 	int64_t		st_atime;	/* Last access time */
> 	int64_t		st_btime;	/* File creation time */
> 	int64_t		st_ctime;	/* Last attribute change time */
> 	int64_t		st_mtime;	/* Last data modification time */
> 	/* 0x60 */
> 	uint64_t	st_ino;		/* Inode number */
> 	uint64_t	st_size;	/* File size */

Should we consider making the st_size and st_blocks 128-bit values while
we're at it? Alternatively, we could add an st_ioc_flag for it later...

> 	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
> 	uint64_t	st_gen;		/* Inode generation number */
> 	uint64_t	st_version;	/* Data version number */
> 	uint64_t	st_ioc_flags;	/* As FS_IOC_GETFLAGS */
> 	/* 0x90 */
> 	uint64_t	__spare1[13];	/* Spare space for future expansion */
> 	/* 0x100 */
> };
>
> And an fs information structure for less commonly needed data:
>
> struct statx_fsinfo {
> 	/* 0x00 - General info */
> 	uint32_t	st_mask;	/* What optional fields are filled in */
> 	uint32_t	st_type;	/* Filesystem type from linux/magic.h */
>
> 	/* 0x08 - file timestamp granularity info */
> 	uint16_t	st_atime_gran_mantissa;	/* gran(secs) = mant * 10^exp */
> 	uint16_t	st_btime_gran_mantissa;
> 	uint16_t	st_ctime_gran_mantissa;
> 	uint16_t	st_mtime_gran_mantissa;
> 	/* 0x10 */
> 	int8_t		st_atime_gran_exponent;
> 	int8_t		st_btime_gran_exponent;
> 	int8_t		st_ctime_gran_exponent;
> 	int8_t		st_mtime_gran_exponent;
>
> 	/* 0x14 - I/O parameters */
> 	uint32_t	st_blksize;	  /* File block size */
> 	uint32_t	st_alloc_blksize; /* Allocation block size/alignment */
> 	uint32_t	st_small_io_size; /* IO size/alignment that avoids fs/page cache RMW */
> 	uint32_t	st_pref_io_size;  /* Preferred IO size for general usage */
> 	uint32_t	st_large_io_size; /* IO size/alignment for high bandwidth sequential IO */
>
> 	/* 0x28 - Restrictions on struct statx contents */
> 	uint64_t	st_supported_ioc_flags; /* FS_IOC_GETFLAGS flags supported */
>
> 	/* 0x30 - Volume/filesystem information */
> 	uint64_t	st_fsid;	/* Short 64-bit Filesystem ID (as statfs) */
> 	uint64_t	__spare0[3];
> 	/* 0x50 */
> 	uint8_t		st_volume_id[16]; /* Volume/fs identifier */
> 	uint8_t		st_volume_uuid[16]; /* Volume/fs UUID */
> 	/* 0x80 */
> 	uint64_t	__spare1[8];
> 	/* 0xc0 */
> 	uint8_t		st_volume_name[64]; /* Volume name (up to 64 chars) */
> 	/* 0x100 */
> 	uint8_t		st_domain_name[256]; /* Domain/cell/workgroup name (up to 256 chars) */
> 	/* 0x200 */
> };

If you are making a separate fsinfo structure, then it would be nice to
have flags to indicate what kind of acls the filesystem supports, and if
it supports features such as xattrs, subfiles and/or snapshots.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 144+ messages in thread
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Dave Chinner @ 2012-05-09 0:24 UTC
To: David Howells
Cc: adilger, bfields, smfrench, ben, Trond.Myklebust, roland,
    linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    linux-api, libc-alpha

On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
>
> Should I split the file-specific info and the fs-specific info and make the
> second optional?  What I'm thinking of is something like this:
>
> Have a file information structure:
>
> struct statx {
> 	/* 0x00 */
> 	uint32_t	st_mask;	/* What results were written */
> 	uint32_t	st_information;	/* Information about the file */
> 	uint16_t	st_mode;	/* File mode */
> 	uint16_t	__spare0[3];
> 	/* 0x10 */
> 	uint32_t	st_uid;		/* User ID of owner */
> 	uint32_t	st_gid;		/* Group ID of owner */
> 	uint32_t	st_nlink;	/* Number of hard links */
> 	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
> 	/* 0x20 */
> 	struct statx_dev st_rdev;	/* Device ID of special file */
> 	struct statx_dev st_dev;	/* ID of device containing file */
> 	/* 0x30 */
> 	int32_t		st_atime_ns;	/* Last access time (ns part) */
> 	int32_t		st_btime_ns;	/* File creation time (ns part) */
> 	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
> 	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
> 	/* 0x40 */
> 	int64_t		st_atime;	/* Last access time */
> 	int64_t		st_btime;	/* File creation time */
> 	int64_t		st_ctime;	/* Last attribute change time */
> 	int64_t		st_mtime;	/* Last data modification time */
> 	/* 0x60 */
> 	uint64_t	st_ino;		/* Inode number */
> 	uint64_t	st_size;	/* File size */
> 	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
> 	uint64_t	st_gen;		/* Inode generation number */

I don't think we want to expose the inode generation numbers. It is
trivial to construct NFS file handles (usually just fsid, inode
number and generation) with that information and hence bypass
security checks to access files.

> 	uint64_t	st_version;	/* Data version number */
> 	uint64_t	st_ioc_flags;	/* As FS_IOC_GETFLAGS */
> 	/* 0x90 */
> 	uint64_t	__spare1[13];	/* Spare space for future expansion */
> 	/* 0x100 */
> };
>
> And an fs information structure for less commonly needed data:
>
> struct statx_fsinfo {
> 	/* 0x00 - General info */
> 	uint32_t	st_mask;	/* What optional fields are filled in */
> 	uint32_t	st_type;	/* Filesystem type from linux/magic.h */
>
> 	/* 0x08 - file timestamp granularity info */
> 	uint16_t	st_atime_gran_mantissa;	/* gran(secs) = mant * 10^exp */
> 	uint16_t	st_btime_gran_mantissa;
> 	uint16_t	st_ctime_gran_mantissa;
> 	uint16_t	st_mtime_gran_mantissa;
> 	/* 0x10 */
> 	int8_t		st_atime_gran_exponent;
> 	int8_t		st_btime_gran_exponent;
> 	int8_t		st_ctime_gran_exponent;
> 	int8_t		st_mtime_gran_exponent;
>
> 	/* 0x14 - I/O parameters */
> 	uint32_t	st_blksize;	  /* File block size */
> 	uint32_t	st_alloc_blksize; /* Allocation block size/alignment */
> 	uint32_t	st_small_io_size; /* IO size/alignment that avoids fs/page cache RMW */
> 	uint32_t	st_pref_io_size;  /* Preferred IO size for general usage */
> 	uint32_t	st_large_io_size; /* IO size/alignment for high bandwidth sequential IO */

That's per file information, not per filesystem. XFS definitely
needs this IO information per-file....

>
> 	/* 0x28 - Restrictions on struct statx contents */
> 	uint64_t	st_supported_ioc_flags; /* FS_IOC_GETFLAGS flags supported */
>
> 	/* 0x30 - Volume/filesystem information */
> 	uint64_t	st_fsid;	/* Short 64-bit Filesystem ID (as statfs) */
> 	uint64_t	__spare0[3];
> 	/* 0x50 */
> 	uint8_t		st_volume_id[16]; /* Volume/fs identifier */
> 	uint8_t		st_volume_uuid[16]; /* Volume/fs UUID */

And there's all the remaining information needed to construct NFS
file handles without root privileges...

> 	/* 0x80 */
> 	uint64_t	__spare1[8];
> 	/* 0xc0 */
> 	uint8_t		st_volume_name[64]; /* Volume name (up to 64 chars) */
> 	/* 0x100 */
> 	uint8_t		st_domain_name[256]; /* Domain/cell/workgroup name (up to 256 chars) */
> 	/* 0x200 */
> };
>
> One could argue a bit over what goes in which, should we go for this.  This
> may be better split between multiple syscalls though (with the race that that
> implies) and potentially merging with statfs.
>
>
> The statxat() syscall [née xstat] could then use the 6th parameter thusly:
>
> asmlinkage long sys_statxat(int dfd, const char __user *path, unsigned flags,
>                             unsigned mask, struct statx __user *buffer,
>                             struct statx_fsinfo __user *fsinfo);
>
>
> letting fsinfo be NULL to indicate a lack of interest.  I'm not sure we want
> to do that, though.
>
>
> Also, do Dave Chinner's ideas for indicating five I/O parameters want to be
> 32-bit numbers?  Larger?  Smaller?  Can they be log2?

Definitely 32 bit, IMO, as it's not uncommon to see optimal IO sizes
in the tens of megabytes on large, high bandwidth storage systems.
As for being log2 - that's just making it more complex to use and
making code ugly - we'd have to convert to log2 in kernel, then
convert back in every single application....

> Note also, that I've suggested that we represent the timestamp granularity
> information as a decimal float (which requires 3 bytes per timestamp) and that
> we provide separate granularities for each timestamp.
>
> David
>

-- 
Dave Chinner
david@fromorbit.com
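
[Illustration, not part of the original thread.] The granularity encoding quoted above, gran(secs) = mant * 10^exp with a uint16_t mantissa and an int8_t exponent (3 bytes per timestamp), could be decoded by a consumer roughly as follows. This is only a sketch: gran_to_ns() is a hypothetical helper that does not appear in the proposal, and treating a granularity finer than 1 ns as an error is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a (mantissa, exponent) granularity pair,
 * meaning mant * 10^exp seconds, into whole nanoseconds.
 * Returns 0 on success, -1 if the value is finer than 1 ns or does not
 * fit in 64 bits.
 */
int gran_to_ns(uint16_t mant, int8_t exp, uint64_t *ns)
{
	int shift = (int)exp + 9;	/* power of ten relative to 1 ns */
	uint64_t v = mant;

	assert(ns != NULL);
	if (shift < 0)
		return -1;		/* finer than 1 ns: not representable here */
	while (shift-- > 0) {
		if (v > UINT64_MAX / 10)
			return -1;	/* would overflow 64 bits */
		v *= 10;
	}
	*ns = v;
	return 0;
}
```

For example, mant = 1 with exp = -9 describes a 1 ns granularity, and mant = 2 with exp = 0 describes a 2 second granularity. Working in integer nanoseconds avoids floating point, at the cost of rejecting sub-nanosecond values.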
* Re: Extended file stat: Splitting file- and fs-specific info?
From: J. Bruce Fields @ 2012-05-09 1:09 UTC
To: Dave Chinner
Cc: David Howells, adilger, smfrench, ben, Trond.Myklebust, roland,
    linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    linux-api, libc-alpha

On Wed, May 09, 2012 at 10:24:20AM +1000, Dave Chinner wrote:
> On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
> >
> > Should I split the file-specific info and the fs-specific info and make the
> > second optional?  What I'm thinking of is something like this:
> >
> > Have a file information structure:
> >
> > struct statx {
> > 	/* 0x00 */
> > 	uint32_t	st_mask;	/* What results were written */
> > 	uint32_t	st_information;	/* Information about the file */
> > 	uint16_t	st_mode;	/* File mode */
> > 	uint16_t	__spare0[3];
> > 	/* 0x10 */
> > 	uint32_t	st_uid;		/* User ID of owner */
> > 	uint32_t	st_gid;		/* Group ID of owner */
> > 	uint32_t	st_nlink;	/* Number of hard links */
> > 	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
> > 	/* 0x20 */
> > 	struct statx_dev st_rdev;	/* Device ID of special file */
> > 	struct statx_dev st_dev;	/* ID of device containing file */
> > 	/* 0x30 */
> > 	int32_t		st_atime_ns;	/* Last access time (ns part) */
> > 	int32_t		st_btime_ns;	/* File creation time (ns part) */
> > 	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
> > 	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
> > 	/* 0x40 */
> > 	int64_t		st_atime;	/* Last access time */
> > 	int64_t		st_btime;	/* File creation time */
> > 	int64_t		st_ctime;	/* Last attribute change time */
> > 	int64_t		st_mtime;	/* Last data modification time */
> > 	/* 0x60 */
> > 	uint64_t	st_ino;		/* Inode number */
> > 	uint64_t	st_size;	/* File size */
> > 	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
> > 	uint64_t	st_gen;		/* Inode generation number */
>
> I don't think we want to expose the inode generation numbers. It is
> trivial to construct NFS file handles (usually just fsid, inode
> number and generation) with that information and hence bypass
> security checks to access files.

I'm not convinced there's much value in trying to keep filehandles
secret.

If you're going to base your security on a secret, then it should be
hard to guess, easy to keep secret, and changeable in case it ever does
get out.  Filehandles are pretty easy to guess (even without help like
this), they usually go over the wire in cleartext, and they can't be
changed.

--b.

>
> > 	uint64_t	st_version;	/* Data version number */
> > 	uint64_t	st_ioc_flags;	/* As FS_IOC_GETFLAGS */
> > 	/* 0x90 */
> > 	uint64_t	__spare1[13];	/* Spare space for future expansion */
> > 	/* 0x100 */
> > };
> >
> > And an fs information structure for less commonly needed data:
> >
> > struct statx_fsinfo {
> > 	/* 0x00 - General info */
> > 	uint32_t	st_mask;	/* What optional fields are filled in */
> > 	uint32_t	st_type;	/* Filesystem type from linux/magic.h */
> >
> > 	/* 0x08 - file timestamp granularity info */
> > 	uint16_t	st_atime_gran_mantissa;	/* gran(secs) = mant * 10^exp */
> > 	uint16_t	st_btime_gran_mantissa;
> > 	uint16_t	st_ctime_gran_mantissa;
> > 	uint16_t	st_mtime_gran_mantissa;
> > 	/* 0x10 */
> > 	int8_t		st_atime_gran_exponent;
> > 	int8_t		st_btime_gran_exponent;
> > 	int8_t		st_ctime_gran_exponent;
> > 	int8_t		st_mtime_gran_exponent;
> >
> > 	/* 0x14 - I/O parameters */
> > 	uint32_t	st_blksize;	  /* File block size */
> > 	uint32_t	st_alloc_blksize; /* Allocation block size/alignment */
> > 	uint32_t	st_small_io_size; /* IO size/alignment that avoids fs/page cache RMW */
> > 	uint32_t	st_pref_io_size;  /* Preferred IO size for general usage */
> > 	uint32_t	st_large_io_size; /* IO size/alignment for high bandwidth sequential IO */
>
> That's per file information, not per filesystem. XFS definitely
> needs this IO information per-file....
>
> >
> > 	/* 0x28 - Restrictions on struct statx contents */
> > 	uint64_t	st_supported_ioc_flags; /* FS_IOC_GETFLAGS flags supported */
> >
> > 	/* 0x30 - Volume/filesystem information */
> > 	uint64_t	st_fsid;	/* Short 64-bit Filesystem ID (as statfs) */
> > 	uint64_t	__spare0[3];
> > 	/* 0x50 */
> > 	uint8_t		st_volume_id[16]; /* Volume/fs identifier */
> > 	uint8_t		st_volume_uuid[16]; /* Volume/fs UUID */
>
> And there's all the remaining information needed to construct NFS
> file handles without root privileges...
>
> > 	/* 0x80 */
> > 	uint64_t	__spare1[8];
> > 	/* 0xc0 */
> > 	uint8_t		st_volume_name[64]; /* Volume name (up to 64 chars) */
> > 	/* 0x100 */
> > 	uint8_t		st_domain_name[256]; /* Domain/cell/workgroup name (up to 256 chars) */
> > 	/* 0x200 */
> > };
> >
> > One could argue a bit over what goes in which, should we go for this.  This
> > may be better split between multiple syscalls though (with the race that that
> > implies) and potentially merging with statfs.
> >
> >
> > The statxat() syscall [née xstat] could then use the 6th parameter thusly:
> >
> > asmlinkage long sys_statxat(int dfd, const char __user *path, unsigned flags,
> >                             unsigned mask, struct statx __user *buffer,
> >                             struct statx_fsinfo __user *fsinfo);
> >
> >
> > letting fsinfo be NULL to indicate a lack of interest.  I'm not sure we want
> > to do that, though.
> >
> >
> > Also, do Dave Chinner's ideas for indicating five I/O parameters want to be
> > 32-bit numbers?  Larger?  Smaller?  Can they be log2?
>
> Definitely 32 bit, IMO, as it's not uncommon to see optimal IO sizes
> in the tens of megabytes on large, high bandwidth storage systems.
> As for being log2 - that's just making it more complex to use and
> making code ugly - we'd have to convert to log2 in kernel, then
> convert back in every single application....
>
> > Note also, that I've suggested that we represent the timestamp granularity
> > information as a decimal float (which requires 3 bytes per timestamp) and that
> > we provide separate granularities for each timestamp.
> >
> > David
> >
>
> -- 
> Dave Chinner
> david@fromorbit.com
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Dave Chinner @ 2012-05-09 4:25 UTC
To: J. Bruce Fields
Cc: David Howells, adilger, smfrench, ben, Trond.Myklebust, roland,
    linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    linux-api, libc-alpha

On Tue, May 08, 2012 at 09:09:41PM -0400, J. Bruce Fields wrote:
> On Wed, May 09, 2012 at 10:24:20AM +1000, Dave Chinner wrote:
> > On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
> > >
> > > Should I split the file-specific info and the fs-specific info and make the
> > > second optional?  What I'm thinking of is something like this:
> > >
> > > Have a file information structure:
> > >
> > > struct statx {
> > > 	/* 0x00 */
> > > 	uint32_t	st_mask;	/* What results were written */
> > > 	uint32_t	st_information;	/* Information about the file */
> > > 	uint16_t	st_mode;	/* File mode */
> > > 	uint16_t	__spare0[3];
> > > 	/* 0x10 */
> > > 	uint32_t	st_uid;		/* User ID of owner */
> > > 	uint32_t	st_gid;		/* Group ID of owner */
> > > 	uint32_t	st_nlink;	/* Number of hard links */
> > > 	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
> > > 	/* 0x20 */
> > > 	struct statx_dev st_rdev;	/* Device ID of special file */
> > > 	struct statx_dev st_dev;	/* ID of device containing file */
> > > 	/* 0x30 */
> > > 	int32_t		st_atime_ns;	/* Last access time (ns part) */
> > > 	int32_t		st_btime_ns;	/* File creation time (ns part) */
> > > 	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
> > > 	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
> > > 	/* 0x40 */
> > > 	int64_t		st_atime;	/* Last access time */
> > > 	int64_t		st_btime;	/* File creation time */
> > > 	int64_t		st_ctime;	/* Last attribute change time */
> > > 	int64_t		st_mtime;	/* Last data modification time */
> > > 	/* 0x60 */
> > > 	uint64_t	st_ino;		/* Inode number */
> > > 	uint64_t	st_size;	/* File size */
> > > 	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
> > > 	uint64_t	st_gen;		/* Inode generation number */
> >
> > I don't think we want to expose the inode generation numbers. It is
> > trivial to construct NFS file handles (usually just fsid, inode
> > number and generation) with that information and hence bypass
> > security checks to access files.
>
> I'm not convinced there's much value in trying to keep filehandles
> secret.

Sure, but I can't really see any good reason to expose filesystem
internal implementation details like this - a generation number is
usually used to differentiate between inode life cycles which
userspace has no concept of and is different for every filesystem,
so its behaviour and values are not going to be consistent across
filesystems. Some filesystems might not even have a generation
number they can export, and that makes me wonder if there is any
good reason for exposing it at all.

If you need to discriminate between versions of files with the same
name, then use name_to_handle_at() and compare filehandles....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
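
[Illustration, not part of the original thread.] The suggestion above, to use name_to_handle_at() and compare filehandles, might look roughly like the sketch below. name_to_handle_at(), struct file_handle and MAX_HANDLE_SZ are the existing Linux/glibc interface; alloc_handle(), handles_equal() and same_file() are hypothetical helper names introduced here. Comparing mount IDs as well is a conservative assumption: it reports two paths as different when they reach the same file through different mounts.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for name_to_handle_at() and struct file_handle */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#ifndef MAX_HANDLE_SZ
#define MAX_HANDLE_SZ 128
#endif

/* Hypothetical helper: allocate a zeroed handle buffer of maximum size. */
struct file_handle *alloc_handle(void)
{
	struct file_handle *fh = calloc(1, sizeof(*fh) + MAX_HANDLE_SZ);

	if (fh)
		fh->handle_bytes = MAX_HANDLE_SZ;
	return fh;
}

/* Two handles match if type, length and opaque bytes all match. */
bool handles_equal(const struct file_handle *a, const struct file_handle *b)
{
	assert(a != NULL && b != NULL);
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}

/* Returns 1 if both paths name the same object, 0 if not, -1 on error. */
int same_file(const char *path1, const char *path2)
{
	struct file_handle *h1 = alloc_handle();
	struct file_handle *h2 = alloc_handle();
	int mnt1, mnt2, ret = -1;

	if (h1 && h2 &&
	    name_to_handle_at(AT_FDCWD, path1, h1, &mnt1, 0) == 0 &&
	    name_to_handle_at(AT_FDCWD, path2, h2, &mnt2, 0) == 0)
		ret = (mnt1 == mnt2 && handles_equal(h1, h2)) ? 1 : 0;

	free(h1);
	free(h2);
	return ret;
}
```

To tell apart two versions of a file that had the same name at different times, a caller would save the handle obtained at the first point and pass it to handles_equal() together with a freshly obtained one. Not every filesystem supports file handles, so callers should be prepared for same_file() to return -1.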
* Re: Extended file stat: Splitting file- and fs-specific info?
From: J. Bruce Fields @ 2012-05-09 11:14 UTC
To: Dave Chinner
Cc: David Howells, adilger, smfrench, ben, Trond.Myklebust, roland,
    linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4,
    linux-api, libc-alpha

On Wed, May 09, 2012 at 02:25:32PM +1000, Dave Chinner wrote:
> On Tue, May 08, 2012 at 09:09:41PM -0400, J. Bruce Fields wrote:
> > On Wed, May 09, 2012 at 10:24:20AM +1000, Dave Chinner wrote:
> > > On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
> > > >
> > > > Should I split the file-specific info and the fs-specific info and make the
> > > > second optional?  What I'm thinking of is something like this:
> > > >
> > > > Have a file information structure:
> > > >
> > > > struct statx {
> > > > 	/* 0x00 */
> > > > 	uint32_t	st_mask;	/* What results were written */
> > > > 	uint32_t	st_information;	/* Information about the file */
> > > > 	uint16_t	st_mode;	/* File mode */
> > > > 	uint16_t	__spare0[3];
> > > > 	/* 0x10 */
> > > > 	uint32_t	st_uid;		/* User ID of owner */
> > > > 	uint32_t	st_gid;		/* Group ID of owner */
> > > > 	uint32_t	st_nlink;	/* Number of hard links */
> > > > 	uint32_t	st_blksize;	/* Optimal size for filesystem I/O */
> > > > 	/* 0x20 */
> > > > 	struct statx_dev st_rdev;	/* Device ID of special file */
> > > > 	struct statx_dev st_dev;	/* ID of device containing file */
> > > > 	/* 0x30 */
> > > > 	int32_t		st_atime_ns;	/* Last access time (ns part) */
> > > > 	int32_t		st_btime_ns;	/* File creation time (ns part) */
> > > > 	int32_t		st_ctime_ns;	/* Last attribute change time (ns part) */
> > > > 	int32_t		st_mtime_ns;	/* Last data modification time (ns part) */
> > > > 	/* 0x40 */
> > > > 	int64_t		st_atime;	/* Last access time */
> > > > 	int64_t		st_btime;	/* File creation time */
> > > > 	int64_t		st_ctime;	/* Last attribute change time */
> > > > 	int64_t		st_mtime;	/* Last data modification time */
> > > > 	/* 0x60 */
> > > > 	uint64_t	st_ino;		/* Inode number */
> > > > 	uint64_t	st_size;	/* File size */
> > > > 	uint64_t	st_blocks;	/* Number of 512-byte blocks allocated */
> > > > 	uint64_t	st_gen;		/* Inode generation number */
> > >
> > > I don't think we want to expose the inode generation numbers. It is
> > > trivial to construct NFS file handles (usually just fsid, inode
> > > number and generation) with that information and hence bypass
> > > security checks to access files.
> >
> > I'm not convinced there's much value in trying to keep filehandles
> > secret.
>
> Sure, but I can't really see any good reason to expose filesystem
> internal implementation details like this - a generation number is
> usually used to differentiate between inode life cycles which
> userspace has no concept of and is different for every filesystem,
> so its behaviour and values are not going to be consistent across
> filesystems.

That's OK.  The only requirement would be that the (inode number, inode
generation) pair be different for different inodes on the same
filesystem.

> Some filesystems might not even have a generation
> number they can export, and that makes me wonder if there is any
> good reason for exposing it at all.

That's true of a number of these new attributes.

> If you need to discriminate between versions of files with the same
> name, then use name_to_handle_at() and compare filehandles....

Sure.  Since the only use case given for this has been constructing
filehandles, and since we already have an interface for that, I don't
feel particularly strongly about this.

--b.
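
[Illustration, not part of the original thread.] The requirement stated above, that the (inode number, inode generation) pair be different for different inodes on the same filesystem, amounts to treating the pair as an identity key. A minimal sketch, assuming the proposed st_ino and st_gen fields were both returned and valid; struct inode_ident and same_inode() are hypothetical names used only here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identity record built from the proposed st_ino and st_gen. */
struct inode_ident {
	uint64_t ino;	/* inode number */
	uint64_t gen;	/* inode generation number */
};

/*
 * Two records from the same filesystem refer to the same inode only if
 * both the number and the generation match.  A reused inode number with
 * a new generation is a different inode.
 */
bool same_inode(const struct inode_ident *a, const struct inode_ident *b)
{
	assert(a != NULL && b != NULL);
	return a->ino == b->ino && a->gen == b->gen;
}
```

The comparison is only meaningful within one filesystem, and, as noted in the thread, some filesystems may have no generation number to export at all.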
* Re: Extended file stat: Splitting file- and fs-specific info?
From: J. Bruce Fields @ 2012-05-09 11:14 UTC
To: Dave Chinner
Cc: David Howells, adilger, smfrench, ben, Trond.Myklebust, roland, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On Wed, May 09, 2012 at 02:25:32PM +1000, Dave Chinner wrote:
> On Tue, May 08, 2012 at 09:09:41PM -0400, J. Bruce Fields wrote:
> > On Wed, May 09, 2012 at 10:24:20AM +1000, Dave Chinner wrote:
> > > On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
> > > >
> > > > Should I split the file-specific info and the fs-specific info and make the
> > > > second optional?  What I'm thinking of is something like this:
> > > >
> > > > Have a file information structure:
> > > >
> > > >     struct statx {
> > > >     /* 0x00 */
> > > >         uint32_t    st_mask;        /* What results were written */
> > > >         uint32_t    st_information; /* Information about the file */
> > > >         uint16_t    st_mode;        /* File mode */
> > > >         uint16_t    __spare0[3];
> > > >     /* 0x10 */
> > > >         uint32_t    st_uid;         /* User ID of owner */
> > > >         uint32_t    st_gid;         /* Group ID of owner */
> > > >         uint32_t    st_nlink;       /* Number of hard links */
> > > >         uint32_t    st_blksize;     /* Optimal size for filesystem I/O */
> > > >     /* 0x20 */
> > > >         struct statx_dev st_rdev;   /* Device ID of special file */
> > > >         struct statx_dev st_dev;    /* ID of device containing file */
> > > >     /* 0x30 */
> > > >         int32_t     st_atime_ns;    /* Last access time (ns part) */
> > > >         int32_t     st_btime_ns;    /* File creation time (ns part) */
> > > >         int32_t     st_ctime_ns;    /* Last attribute change time (ns part) */
> > > >         int32_t     st_mtime_ns;    /* Last data modification time (ns part) */
> > > >     /* 0x40 */
> > > >         int64_t     st_atime;       /* Last access time */
> > > >         int64_t     st_btime;       /* File creation time */
> > > >         int64_t     st_ctime;       /* Last attribute change time */
> > > >         int64_t     st_mtime;       /* Last data modification time */
> > > >     /* 0x60 */
> > > >         uint64_t    st_ino;         /* Inode number */
> > > >         uint64_t    st_size;        /* File size */
> > > >         uint64_t    st_blocks;      /* Number of 512-byte blocks allocated */
> > > >         uint64_t    st_gen;         /* Inode generation number */
> > >
> > > I don't think we want to expose the inode generation numbers. It is
> > > trivial to construct NFS file handles (usually just fsid, inode
> > > number and generation) with that information and hence bypass
> > > security checks to access files.
> >
> > I'm not convinced there's much value in trying to keep filehandles
> > secret.
>
> Sure, but I can't really see any good reason to expose filesystem
> internal implementation details like this - a generation number is
> usually used to differentiate between inode life cycles which
> userspace has no concept of and is different for every filesystem,
> so its behaviour and values are not going to be consistent across
> filesystems.

That's OK.  The only requirement would be that the (inode number, inode
generation) pair be different for different inodes on the same
filesystem.

> Some filesystems might not even have a generation
> number they can export, and that makes me wonder if there is any
> good reason for exposing it at all.

That's true of a number of these new attributes.

> If you need to discriminate between versions of files with the same
> name, then use name_to_handle_at() and compare filehandles....

Sure.  Since the only use case given for this has been constructing
filehandles, and since we already have an interface for that, I don't
feel particularly strongly about this.

--b.
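As an illustration of the name_to_handle_at() suggestion above, here is a minimal sketch in C of comparing two paths by their opaque file handles instead of by (inode number, generation). The helper name same_file_by_handle() is hypothetical and not part of the thread; the sketch assumes glibc's wrapper for name_to_handle_at() and treats the handle bytes as opaque, as the filesystem intends.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread).
 * Returns 1 if both paths yield the same handle on the same mount,
 * 0 if they differ, and -1 if a handle could not be obtained
 * (for example, the filesystem does not support file handles).
 */
int same_file_by_handle(const char *path_a, const char *path_b)
{
    struct file_handle *ha = malloc(sizeof(*ha) + MAX_HANDLE_SZ);
    struct file_handle *hb = malloc(sizeof(*hb) + MAX_HANDLE_SZ);
    int mount_a, mount_b;
    int result = -1;

    if (ha == NULL || hb == NULL)
        goto out;

    /* Tell the kernel how much room follows each header. */
    ha->handle_bytes = MAX_HANDLE_SZ;
    hb->handle_bytes = MAX_HANDLE_SZ;

    if (name_to_handle_at(AT_FDCWD, path_a, ha, &mount_a, 0) == -1 ||
        name_to_handle_at(AT_FDCWD, path_b, hb, &mount_b, 0) == -1)
        goto out;

    /* The handle contents are opaque: compare them byte for byte only. */
    result = mount_a == mount_b &&
             ha->handle_type == hb->handle_type &&
             ha->handle_bytes == hb->handle_bytes &&
             memcmp(ha->f_handle, hb->f_handle, ha->handle_bytes) == 0;
out:
    free(ha);
    free(hb);
    return result;
}
```

Comparing mount IDs is a conservative choice made for this sketch: the same filesystem mounted at two places would be reported as different. A caller would typically treat -1 as "unknown" and fall back to another check.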
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Andreas Dilger @ 2012-05-09 1:16 UTC
To: Dave Chinner
Cc: David Howells, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On 2012-05-08, at 18:24, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, May 08, 2012 at 09:19:42PM +0100, David Howells wrote:
>>
>> Should I split the file-specific info and the fs-specific info and make the
>> second optional?  What I'm thinking of is something like this:
>>
>> Have a file information structure:
>>
>>     struct statx {
>>     /* 0x00 */
>>         uint32_t    st_mask;        /* What results were written */
>>         uint32_t    st_information; /* Information about the file */
>>         uint16_t    st_mode;        /* File mode */
>>         uint16_t    __spare0[3];
>>     /* 0x10 */
>>         uint32_t    st_uid;         /* User ID of owner */
>>         uint32_t    st_gid;         /* Group ID of owner */
>>         uint32_t    st_nlink;       /* Number of hard links */
>>         uint32_t    st_blksize;     /* Optimal size for filesystem I/O */
>>     /* 0x20 */
>>         struct statx_dev st_rdev;   /* Device ID of special file */
>>         struct statx_dev st_dev;    /* ID of device containing file */
>>     /* 0x30 */
>>         int32_t     st_atime_ns;    /* Last access time (ns part) */
>>         int32_t     st_btime_ns;    /* File creation time (ns part) */
>>         int32_t     st_ctime_ns;    /* Last attribute change time (ns part) */
>>         int32_t     st_mtime_ns;    /* Last data modification time (ns part) */
>>     /* 0x40 */
>>         int64_t     st_atime;       /* Last access time */
>>         int64_t     st_btime;       /* File creation time */
>>         int64_t     st_ctime;       /* Last attribute change time */
>>         int64_t     st_mtime;       /* Last data modification time */
>>     /* 0x60 */
>>         uint64_t    st_ino;         /* Inode number */
>>         uint64_t    st_size;        /* File size */
>>         uint64_t    st_blocks;      /* Number of 512-byte blocks allocated */
>>         uint64_t    st_gen;         /* Inode generation number */
>
> I don't think we want to expose the inode generation numbers. It is
> trivial to construct NFS file handles (usually just fsid, inode
> number and generation) with that information and hence bypass
> security checks to access files.

At the same time, if the user can stat the file and get this information, that isn't making the file any less secure than if they can access the file via NFS.  They have already passed all of the pathname security checks by the time they can do a statxat() on the file.

>>         uint64_t    st_version;     /* Data version number */
>>         uint64_t    st_ioc_flags;   /* As FS_IOC_GETFLAGS */
>>     /* 0x90 */
>>         uint64_t    __spare1[13];   /* Spare space for future expansion */
>>     /* 0x100 */
>>     };
>>
>> And an fs information structure for less commonly needed data:

One comment on this struct is that it would probably be better to use sx_ or sf_ as the prefix for these fields.

>>     struct statx_fsinfo {
>>     /* 0x00 - General info */
>>         uint32_t    st_mask;        /* What optional fields are filled in */
>>         uint32_t    st_type;        /* Filesystem type from linux/magic.h */
>>
>>     /* 0x08 - file timestamp granularity info */
>>         uint16_t    st_atime_gran_mantissa; /* gran(secs) = mant * 10^exp */
>>         uint16_t    st_btime_gran_mantissa;
>>         uint16_t    st_ctime_gran_mantissa;
>>         uint16_t    st_mtime_gran_mantissa;
>>     /* 0x10 */
>>         int8_t      st_atime_gran_exponent;
>>         int8_t      st_btime_gran_exponent;
>>         int8_t      st_ctime_gran_exponent;
>>         int8_t      st_mtime_gran_exponent;
>>
>>     /* 0x14 - I/O parameters */
>>         uint32_t    st_blksize;       /* File block size */
>>         uint32_t    st_alloc_blksize; /* Allocation block size/alignment */
>>         uint32_t    st_small_io_size; /* IO size/alignment that avoids fs/page cache RMW */
>>         uint32_t    st_pref_io_size;  /* Preferred IO size for general usage */
>>         uint32_t    st_large_io_size; /* IO size/alignment for high bandwidth sequential IO */
>
> That's per file information, not per filesystem. XFS definitely
> needs this IO information per-file....

Definitely.  Lustre can have wildly different layouts for each file.  This will also avoid duplication of st_blksize in both the statx and statx_fsinfo structs.

The main question I have about these fields is what the difference is between st_blksize and st_alloc_blksize?

>>     /* 0x28 - Restrictions on struct statx contents */
>>         uint64_t    st_supported_ioc_flags; /* FS_IOC_GETFLAGS flags supported */
>>
>>     /* 0x30 - Volume/filesystem information */
>>         uint64_t    st_fsid;        /* Short 64-bit Filesystem ID (as statfs) */
>>         uint64_t    __spare0[3];
>>     /* 0x50 */
>>         uint8_t     st_volume_id[16];   /* Volume/fs identifier */
>>         uint8_t     st_volume_uuid[16]; /* Volume/fs UUID */
>
> And there's all the remaining information needed to construct file
> NFS handles without root privileges...
>
>>     /* 0x80 */
>>         uint64_t    __spare1[8];
>>     /* 0xc0 */
>>         uint8_t     st_volume_name[64];  /* Volume name (up to 64 chars) */
>>     /* 0x100 */
>>         uint8_t     st_domain_name[256]; /* Domain/cell/workgroup name (up to 256 chars) */
>>     /* 0x200 */
>>     };
>>
>> One could argue a bit over what goes in which, should we go for this.  This
>> may be better split between multiple syscalls though (with the race that that
>> implies) and potentially merging with statfs.
>>
>>
>> The statxat() syscall [née xstat] could then use the 6th parameter thusly:
>>
>>     asmlinkage long sys_statxat(int dfd, const char __user *path, unsigned flags,
>>                                 unsigned mask, struct statx __user *buffer,
>>                                 struct statx_fsinfo __user *fsinfo);
>>
>>
>> letting fsinfo be NULL to indicate a lack of interest.  I'm not sure we want
>> to do that, though.
>>
>>
>> Also, do Dave Chinner's ideas for indicating five I/O parameters want to be
>> 32-bit numbers?  Larger?  Smaller?  Can they be log2?
>
> Definitely 32 bit, IMO, as it's not uncommon to see optimal IO sizes
> in the tens of megabytes on large, high bandwidth storage systems.
> As for being log2 - that's just making it more complex to use and
> making code ugly - we'd have to convert to log2 in kernel, then
> convert back in every single application....
>
>> Note also, that I've suggested that we represent the timestamp granularity
>> information as a decimal float (which requires 3 bytes per timestamp) and that
>> we provide separate granularities for each timestamp.
>>
>> David
>>
>
> --
> Dave Chinner
> david@fromorbit.com
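The proposed granularity encoding, gran(secs) = mant * 10^exp with a 16-bit mantissa and an 8-bit signed exponent, can be sketched as a small decoder. This is only an illustration under stated assumptions: gran_to_ns() is a hypothetical helper, and returning 0 for values that are not a whole number of nanoseconds (or that overflow 64 bits) is a choice made for this sketch, not something specified in the proposal.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): decode the proposed
 * (mantissa, exponent) pair, where granularity in seconds is
 * mantissa * 10^exponent, into whole nanoseconds.
 * Returns 0 if the value is not a whole number of nanoseconds
 * or does not fit in 64 bits.
 */
uint64_t gran_to_ns(uint16_t mantissa, int8_t exponent)
{
    int shift = exponent + 9;   /* power of ten relative to 1 ns */
    uint64_t ns = mantissa;

    /* Finer than nanoseconds: divide, but only if it divides exactly. */
    while (shift < 0) {
        if (ns % 10 != 0)
            return 0;
        ns /= 10;
        shift++;
    }

    /* Coarser than nanoseconds: multiply, guarding against overflow. */
    while (shift > 0) {
        if (ns > UINT64_MAX / 10)
            return 0;
        ns *= 10;
        shift--;
    }
    return ns;
}
```

For example, a filesystem with 1 ns timestamps would report (1, -9), one with 100 ns timestamps (1, -7), and one with 2 s timestamps (2, 0).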
* Re: Extended file stat: Splitting file- and fs-specific info?
From: David Howells @ 2012-05-10 9:23 UTC
To: Andreas Dilger, Dave Chinner
Cc: dhowells, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

Andreas Dilger <aedilger@gmail.com> wrote:

> The main question I have about these fields is what the difference is
> between st_blksize and st_alloc_blksize?

I would assume that st_blksize is some ideal I/O size, not the allocation block size.

Dave Chinner might be able to answer better as he suggested these I/O information fields.

David
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Andreas Dilger @ 2012-05-10 16:05 UTC
To: David Howells
Cc: Andreas Dilger, Dave Chinner, bfields, smfrench, ben, Trond.Myklebust, roland, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On 2012-05-10, at 3:23 AM, David Howells wrote:
> Andreas Dilger <aedilger@gmail.com> wrote:
>> The main question I have about these fields is what the difference is
>> between st_blksize and st_alloc_blksize?
>
> I would assume that st_blksize is some ideal I/O size, not the
> allocation block size.

But there are several optimal IO sizes, so unless there is a clear description of what each of these fields is intended to be used for, it doesn't make sense to add them in.

> Dave Chinner might be able to answer better as he suggested these
> I/O information fields.
>
> David

Cheers, Andreas
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Roland McGrath @ 2012-05-10 17:10 UTC
To: David Howells
Cc: Andreas Dilger, Dave Chinner, adilger, bfields, smfrench, ben, Trond.Myklebust, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

POSIX specifies st_blksize thusly: "A file system-specific preferred I/O block size for this object.  In some file system types, this may vary from file to file."

Since there is only one available to POSIX applications, it should map to the one that's described as "preferred IO size for general usage".

Thanks,
Roland
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Andreas Dilger @ 2012-05-11 8:54 UTC
To: Roland McGrath
Cc: David Howells, Dave Chinner, bfields@fieldses.org, smfrench@gmail.com, ben@decadent.org.uk, Trond.Myklebust@netapp.com, linux-fsdevel@vger.kernel.org, linux-nfs, linux-cifs, samba-technical, linux-ext4@vger.kernel.org, linux-api, libc-alpha

On 2012-05-10, at 11:10 AM, Roland McGrath wrote:
> POSIX specifies st_blksize thusly: "A file system-specific preferred
> I/O block size for this object. In some file system types, this may
> vary from file to file."
>
> Since there is only one available to POSIX applications, it should map
> to the one that's described as "preferred IO size for general usage".

Sure, but statxat() isn't a POSIX API.  While I agree with the idea that there should be enough information about the underlying layout for applications to be able to submit good IO, it doesn't help if we have a bunch of extra fields that have vague meanings.  They will get filled in by the filesystem in a haphazard way, and will not be used by application developers that don't understand what they mean.

Cheers, Andreas
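For context on the single POSIX field being discussed, this is a minimal sketch of how an application reads st_blksize through the existing stat() call to pick a buffer size. The helper name preferred_io_size() and its fallback parameter are hypothetical and introduced here only for illustration; the additional I/O size fields proposed in the thread are not used.

```c
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the thread): return the POSIX
 * "preferred I/O block size" for path, or fallback if stat() fails
 * or reports a non-positive value.
 */
size_t preferred_io_size(const char *path, size_t fallback)
{
    struct stat st;

    if (stat(path, &st) != 0 || st.st_blksize <= 0)
        return fallback;
    return (size_t)st.st_blksize;
}
```

A caller might allocate its read buffer with malloc(preferred_io_size(path, 4096)). Because POSIX gives only this one value, an application has no portable way to distinguish a small RMW-avoiding size from a large sequential one.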
* Re: Extended file stat: Splitting file- and fs-specific info?
From: David Howells @ 2012-05-09 9:21 UTC
To: Dave Chinner
Cc: dhowells, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, jra, bernd.schubert, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

Dave Chinner <david@fromorbit.com> wrote:

> I don't think we want to expose the inode generation numbers. It is
> trivial to construct NFS file handles (usually just fsid, inode
> number and generation) with that information and hence bypass
> security checks to access files.

I was asked for it by Bernd Schubert for userspace NFS servers and FUSE - maybe he can say what he wants it for.  I also have a note that Jeremy Allison asked for it, but I can't find where or why, so that might be an error.

It looks like FreeBSD do have an st_gen field in their stat struct, but it's only filled in for root.  Maybe I could do something like that?

David
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Christoph Hellwig @ 2012-05-09 11:19 UTC
To: David Howells
Cc: Dave Chinner, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, jra, bernd.schubert, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On Wed, May 09, 2012 at 10:21:14AM +0100, David Howells wrote:
> Dave Chinner <david@fromorbit.com> wrote:
>
> > I don't think we want to expose the inode generation numbers. It is
> > trivial to construct NFS file handles (usually just fsid, inode
> > number and generation) with that information and hence bypass
> > security checks to access files.
>
> I was asked for it by Bernd Schubert for userspace NFS servers and FUSE -
> maybe he can say what he wants it for.

It's entirely broken, as a generation number might be part of the file handle (and for Linux-like filesystems normally is), but it's entirely up to the filesystem to decide how it works.  That's why we added system calls to do operations on opaque file handles that the file system controls.  Exposing a completely meaningless "generation" is a bad idea.
* Re: Extended file stat: Splitting file- and fs-specific info?
From: Bernd Schubert @ 2012-05-09 11:55 UTC
To: Christoph Hellwig
Cc: David Howells, Dave Chinner, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, jra, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On 05/09/2012 01:19 PM, Christoph Hellwig wrote:
> On Wed, May 09, 2012 at 10:21:14AM +0100, David Howells wrote:
>> Dave Chinner <david@fromorbit.com> wrote:
>>
>>> I don't think we want to expose the inode generation numbers. It is
>>> trivial to construct NFS file handles (usually just fsid, inode
>>> number and generation) with that information and hence bypass
>>> security checks to access files.
>>
>> I was asked for it by Bernd Schubert for userspace NFS servers and FUSE -
>> maybe he can say what he wants it for.
>
> It's entirely broken, as a generation number might be part of the file
> handle (and for Linux-like filesystems normally is), but it's entirely
> up to the filesystem to decide how it works. That's why we added system
> calls to do operations on opaque file handles that the file system
> controls. Exposing a completely meaningless "generation" is a bad idea.
>

The basic idea of generation numbers is to check if an inode was recycled, so only if the tuple of inode-number and generation-number matches we still have the same file.  Kernel nfs uses that and unfs3 uses it via EXT2_IOC_GETVERSION, which has the overhead of an additional syscall.

Unionfs-fuse usually keeps files open; however, it might run out of the maximum allowed files and I plan to add a mode to close and re-open files as a fallback mode.  For that, definite knowledge of whether a file/inode is still the very same and the inode was not just recycled is crucial.

All of that being said, I think with the open_by_handle_at() syscall we don't need the inode generation number any more.

Cheers, Bernd
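The recycling check described above can be sketched as a comparison of a saved identity record against a fresh one. This is a hypothetical illustration: struct file_id and same_inode_life() are invented names, not part of any proposed ABI, and how the generation is obtained (for example via EXT2_IOC_GETVERSION where the filesystem supports it) is left out and assumed to happen elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity record, not part of any proposed ABI. */
struct file_id {
    uint64_t dev;   /* device containing the file */
    uint64_t ino;   /* inode number */
    uint64_t gen;   /* inode generation, as reported by the filesystem */
};

/*
 * True only if device, inode number and generation all match, i.e. the
 * inode number was not recycled for a different file in between.
 */
bool same_inode_life(const struct file_id *saved, const struct file_id *now)
{
    return saved->dev == now->dev &&
           saved->ino == now->ino &&
           saved->gen == now->gen;
}
```

As the thread concludes, code that can use open_by_handle_at() gets the same guarantee from the opaque handle without interpreting a generation number itself.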
* Re: Extended file stat: Splitting file- and fs-specific info? 2012-05-09 11:55 ` Bernd Schubert @ 2012-05-09 12:05 ` Christoph Hellwig -1 siblings, 0 replies; 144+ messages in thread From: Christoph Hellwig @ 2012-05-09 12:05 UTC (permalink / raw) To: Bernd Schubert Cc: Christoph Hellwig, David Howells, Dave Chinner, adilger, bfields, smfrench, ben, Trond.Myklebust, roland, jra, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha

On Wed, May 09, 2012 at 01:55:16PM +0200, Bernd Schubert wrote:
> The basic idea of generation numbers is to check if an inode was
> recycled, so only if the tuple of inode-number and generation-number
> matches we still have the same file. Kernel nfs

NFS does not and should not look at the inode generation. Except for a bit of legacy code for the old pre-Linux 2.4 filehandles it looks at the opaque file handle returned and only interpreted by the filesystem. Any userspace NFS server should do the same.
* Re: Extended file stat: Splitting file- and fs-specific info? 2012-05-09 12:05 ` Christoph Hellwig @ 2012-05-09 12:25 ` Bernd Schubert -1 siblings, 0 replies; 144+ messages in thread From: Bernd Schubert @ 2012-05-09 12:25 UTC (permalink / raw) To: Christoph Hellwig Cc: David Howells, Dave Chinner, adilger-m1MBpc4rdrD3fQ9qLvQP4Q, bfields-uC3wQj2KruNg9hUCZPvPmw, smfrench-Re5JQEeQqe8AvxtiuMwx3w, ben-/+tVBieCtBitmTQ+vhA3Yw, Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA, roland-/Z5OmTQCD9xF6kxbq+BtvQ, jra-eUNUBHrolfbYtjvyW6yDsg, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw On 05/09/2012 02:05 PM, Christoph Hellwig wrote: > On Wed, May 09, 2012 at 01:55:16PM +0200, Bernd Schubert wrote: >> The basic idea of generation numbers is to check if an inode was >> recycled, so only if the tuple of inode-number and generation-number >> matches we still have the same file. Kernel nfs > > NFS does not and should not look at the inode generation. Except for a > bit of legacy code for the old pre-Linux 2.4 filehandles it looks at the > opaque file handle returned and only interpreted by the filesystem. Any > userspace NFS server should do the same. Ok, I didn't look how kernel NFS does it for quite some time already... User space NFS only can do it beginning with 2.6.39 - given that user space also needs to support older kernels and other OSs, which might not have open_by_handle, userspace unfortunately cannot entirely rely on that feature. Cheers, Bernd ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Extended file stat: Splitting file- and fs-specific info? 2012-05-09 12:25 ` Bernd Schubert (?) @ 2012-05-09 13:51 ` Andreas Dilger 2012-05-09 14:12 ` Bernd Schubert -1 siblings, 1 reply; 144+ messages in thread From: Andreas Dilger @ 2012-05-09 13:51 UTC (permalink / raw) To: Bernd Schubert Cc: Christoph Hellwig, David Howells, Dave Chinner, bfields, smfrench, ben, Trond.Myklebust, roland, jra, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha On 2012-05-09, at 6:25 AM, Bernd Schubert wrote: > On 05/09/2012 02:05 PM, Christoph Hellwig wrote: >> On Wed, May 09, 2012 at 01:55:16PM +0200, Bernd Schubert wrote: >>> The basic idea of generation numbers is to check if an inode was >>> recycled, so only if the tuple of inode-number and generation-number >>> matches we still have the same file. Kernel nfs >> >> NFS does not and should not look at the inode generation. Except for a >> bit of legacy code for the old pre-Linux 2.4 filehandles it looks at the >> opaque file handle returned and only interpreted by the filesystem. Any >> userspace NFS server should do the same. > > Ok, I didn't look how kernel NFS does it for quite some time already... > User space NFS only can do it beginning with 2.6.39 - given that user space also needs to support older kernels and other OSs, which might not have open_by_handle, userspace unfortunately cannot entirely rely on that feature. But even fewer kernels have sys_statxat() in them (i.e. none), so you can rely on that even less than open_by_handle()... Cheers, Andreas ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Extended file stat: Splitting file- and fs-specific info? 2012-05-09 13:51 ` Andreas Dilger @ 2012-05-09 14:12 ` Bernd Schubert 0 siblings, 0 replies; 144+ messages in thread From: Bernd Schubert @ 2012-05-09 14:12 UTC (permalink / raw) To: Andreas Dilger Cc: Christoph Hellwig, David Howells, Dave Chinner, bfields, smfrench, ben, Trond.Myklebust, roland, jra, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, linux-api, libc-alpha On 05/09/2012 03:51 PM, Andreas Dilger wrote: > On 2012-05-09, at 6:25 AM, Bernd Schubert wrote: >> On 05/09/2012 02:05 PM, Christoph Hellwig wrote: >>> On Wed, May 09, 2012 at 01:55:16PM +0200, Bernd Schubert wrote: >>>> The basic idea of generation numbers is to check if an inode was >>>> recycled, so only if the tuple of inode-number and generation-number >>>> matches we still have the same file. Kernel nfs >>> >>> NFS does not and should not look at the inode generation. Except for a >>> bit of legacy code for the old pre-Linux 2.4 filehandles it looks at the >>> opaque file handle returned and only interpreted by the filesystem. Any >>> userspace NFS server should do the same. >> >> Ok, I didn't look how kernel NFS does it for quite some time already... >> User space NFS only can do it beginning with 2.6.39 - given that user space also needs to support older kernels and other OSs, which might not have open_by_handle, userspace unfortunately cannot entirely rely on that feature. > > But even fewer kernels have sys_statxat() in them (i.e. none), so you can rely on that even less than open_by_handle()... Well, I didn't say that :) In summary, an application needs to try to use the open-by-handle call and if that is not supported, it has to fall back to traditional stat and generation-number-ioctl. And as I said before, open-by-handle very likely removes the requirement for generation numbers in sys_statxat(). Cheers, Bernd ^ permalink raw reply [flat|nested] 144+ messages in thread
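The fallback Bernd summarises above (try the handle call first, otherwise use traditional stat plus the generation-number ioctl) could be sketched roughly as follows. This is only an illustration under stated assumptions: struct file_identity and the three helper functions are hypothetical names that appear nowhere in the thread, the code assumes Linux with a glibc that exposes name_to_handle_at(), and it assumes the generation ioctl stores an int, as the ext2/3/4 implementations do.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for name_to_handle_at() and struct file_handle */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>           /* FS_IOC_GETVERSION */

/* Hypothetical record of "which file is this"; all names are illustrative. */
struct file_identity {
    struct file_handle *handle; /* opaque handle, or NULL if unavailable */
    int mount_id;               /* filled in by name_to_handle_at() */
    dev_t dev;                  /* fallback only: device and inode number ... */
    ino_t ino;
    int have_generation;        /* ... plus the generation, if the ioctl works */
    int generation;
};

/* Fill the fallback fields from an open descriptor. Returns 0 on success. */
static int identity_from_fd(int fd, struct file_identity *id)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    /* Not every filesystem implements this ioctl; treat failure as "unknown". */
    id->have_generation = (ioctl(fd, FS_IOC_GETVERSION, &id->generation) == 0);
    return 0;
}

/* Prefer the opaque handle; fall back to stat plus the generation ioctl. */
static int get_identity(const char *path, struct file_identity *id)
{
    int fd, ret;

    memset(id, 0, sizeof(*id));
    id->handle = malloc(sizeof(*id->handle) + MAX_HANDLE_SZ);
    if (id->handle) {
        id->handle->handle_bytes = MAX_HANDLE_SZ;
        if (name_to_handle_at(AT_FDCWD, path, id->handle, &id->mount_id, 0) == 0)
            return 0;
        free(id->handle);       /* old kernel or unsupported filesystem */
        id->handle = NULL;
    }

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = identity_from_fd(fd, id);
    close(fd);
    return ret;
}

/*
 * Fallback check, valid only for records without a handle: does a freshly
 * re-opened fd still refer to the same, non-recycled inode?
 */
static int fd_matches_identity(int fd, const struct file_identity *id)
{
    struct file_identity now;

    memset(&now, 0, sizeof(now));
    if (identity_from_fd(fd, &now) < 0)
        return 0;
    if (now.dev != id->dev || now.ino != id->ino)
        return 0;
    if (id->have_generation)
        return now.have_generation && now.generation == id->generation;
    return 1;                   /* no generation available: weaker guarantee */
}
```

When a handle was obtained, the re-open step would go through open_by_handle_at(), which is expected to fail with ESTALE if the file is gone. That call needs CAP_DAC_READ_SEARCH, which is why it is left out of the sketch.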
* Re: Extended file stat: Splitting file- and fs-specific info? 2012-05-08 20:19 ` David Howells @ 2012-05-10 9:14 ` David Howells -1 siblings, 0 replies; 144+ messages in thread From: David Howells @ 2012-05-10 9:14 UTC (permalink / raw) To: Dave Chinner Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, adilger-m1MBpc4rdrD3fQ9qLvQP4Q, bfields-uC3wQj2KruNg9hUCZPvPmw, smfrench-Re5JQEeQqe8AvxtiuMwx3w, ben-/+tVBieCtBitmTQ+vhA3Yw, Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA, roland-/Z5OmTQCD9xF6kxbq+BtvQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> wrote: > > Also, do Dave Chinner's ideas for indicating five I/O parameters want to be > > 32-bit numbers? Larger? Smaller? Can they be log2? > > Definitely 32 bit, IMO, as it's not uncommon to see optimal IO sizes > in the tens of megabytes on large, high bandwidth storage systems. > As for being log2 - that's just making it more complex to use and > making code ugly - we'd have to convert to log2 in kernel, then > convert back in every single application.... ilog2() in the kernel uses the CPU's bit-scan instruction if it has one and converting back is just a bitshift operator. But let's go with 32-bit fields for the moment. I presume we aren't worried about a driver that wants to do a 4GB transfer... David ^ permalink raw reply [flat|nested] 144+ messages in thread
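To make the log2 trade-off discussed above concrete, here is a small userspace sketch. The helper names are hypothetical and are not part of the proposed interface; the loop merely mimics what the kernel's ilog2() computes for a non-zero value. It shows the shift-based conversion back to bytes, that a log2 encoding can only carry powers of two, and where a plain 32-bit byte count runs out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(log2(size)) for size > 0, like the kernel's ilog2(). */
static inline unsigned io_size_to_log2(uint32_t size)
{
    unsigned shift = 0;

    assert(size > 0);
    while (size >>= 1)
        shift++;
    return shift;
}

/* Converting a log2-encoded hint back to bytes is a single shift. */
static inline uint64_t log2_to_io_size(unsigned shift)
{
    return (uint64_t)1 << shift;
}

/* A plain 32-bit byte count tops out just below 4 GiB. */
static inline int fits_in_u32(uint64_t size)
{
    return size <= UINT32_MAX;
}
```

For instance, a 48 MiB optimal IO size (3 * 2^24 bytes) encodes to 25 and decodes to 32 MiB, so the round trip is lossy for anything that is not a power of two. A plain 32-bit byte count keeps such values exact but cannot express a 4 GiB transfer.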
* Re: [PATCH 0/6] Extended file stat system call 2012-04-19 14:05 ` David Howells ` (8 preceding siblings ...) (?) @ 2012-04-27 9:39 ` David Howells [not found] ` <4111.1335519545-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> -1 siblings, 1 reply; 144+ messages in thread From: David Howells @ 2012-04-27 9:39 UTC (permalink / raw) To: Dave Chinner Cc: dhowells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha Dave Chinner <david@fromorbit.com> wrote: > If we are adding per-inode flags, then what do we do with filesystem specific > flags? e.g. XFS has quite a number of per-inode flags that don't align with > any other filesystem (e.g. filestream allocator, real time file, behaviour > inheritence flags, etc), but may be useful to retrieve in such a call. We > currently have an ioctl to get that information from each inode. Have you > thought about how to handle such flags? I haven't looked at XFS with regard to xstat as yet, so I'm not sure exactly which flags you're talking about. The question, though, is what will actually make use of these flags? Will it just be XFS tools or are they something that a GUI might make use of? Either you can add some of them to the ioc flags (which may be impractical, I grant you) or we'd have to add an arbitrary fs-type specific field and specify the host fs (the provision of which might not be a bad idea in and of itself) to tell userspace how to interpret them. > Along the same lines, filesytsems can have different allocation constraints > to IO the filesystem block size - ext4 with it's bigalloc hack, XFS with it's > per-inode extent size hints and the realtime device, etc. Then there's > optimal IO characteristics (e.g. geometery hints like stripe unit/stripe > width for the allocation policy of that given file) that applications could > use if they were present rather than having to expose them through ioctls > that nobody even knows about... Yeah... 
Not representable by one number. You'd have to unset a flag to say you were providing this information. However, providing a whole bunch of hints about I/O characteristics is probably beyond this syscall - especially if it isn't constant over the length of a file. That's specialist knowledge that most applications don't need to know. Having a generic way to retrieve it, though, may be a good idea. OTOH, there's plenty of uncommitted space, so if we can condense the hints down to something small, we could perhaps add it later - but from your paragraph above, it doesn't sound like it'll be small. > Perhaps also exposing the project ID for quota purposes, like we do UID and > GID. That way we wouldn't need a filesystem specific ioctl to read it.... Is this an XFS only thing? If so, can it be generalised? David ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 9:39 ` [PATCH 0/6] Extended file stat system call David Howells @ 2012-04-27 13:13 ` Dave Chinner 0 siblings, 0 replies; 144+ messages in thread From: Dave Chinner @ 2012-04-27 13:13 UTC (permalink / raw) To: David Howells Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, linux-ext4-u79uwXL29TY76Z2rM5mHXA, wine-devel-5vRYHf7vrtgdnm+yROfE0A, kfm-devel-RoXCvvDuEio, nautilus-list-rDKQcyrBJuzYtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA, libc-alpha-9JcytcrH/bA+uJoB2kUjGw On Fri, Apr 27, 2012 at 10:39:05AM +0100, David Howells wrote: > Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> wrote: > > > If we are adding per-inode flags, then what do we do with filesystem specific > > flags? e.g. XFS has quite a number of per-inode flags that don't align with > > any other filesystem (e.g. filestream allocator, real time file, behaviour > > inheritence flags, etc), but may be useful to retrieve in such a call. We > > currently have an ioctl to get that information from each inode. Have you > > thought about how to handle such flags? > > I haven't looked at XFS with regard to xstat as yet, so I'm not sure exactly > which flags you're talking about. The question, though, is what will actually > make use of these flags? Will it just be XFS tools or are they something that > a GUI might make use of? Have a look at fs/xfs/xfs_dinode.h. There's a bunch of flags defined at the bottom of the file. Stuff like the "nodefrag", "nodump", and "prealloc" bits seem fairly generic - they are for indicating that files are to be avoided for defrag or backup purposes, the prealloc bit indicates that fallocate has been used to reserve space on the inode (finding files that space can be punched out of safely), and so on. 
Currently these things are queried and manipulated by ioctls (XFS_IOC_FSX[GS]ETATTR) along with extent size hints, project quotas, etc. but I think there's some wider use for many of the flags, which is why I was asking if there's any thought to this sort of flag being exposed by the VFS. Historically the flags exposed by the VFS are those used by extN - I see little reason why we should favour one filesystem's flags over any others in an extended stat interface if they are generically useful.... > Either you can add some of them to the ioc flags (which may be impractical, I > grant you) or we'd have to add an arbitrary fs-type specific field and specify > the host fs (the provision of which might not be a bad idea in and of itself) > to tell userspace how to interpret them. Well, that's the complexity, isn't it. I have no good answer to that... > > Along the same lines, filesystems can have different allocation constraints > > to IO the filesystem block size - ext4 with its bigalloc hack, XFS with its > > per-inode extent size hints and the realtime device, etc. Then there's > > optimal IO characteristics (e.g. geometry hints like stripe unit/stripe > > width for the allocation policy of that given file) that applications could > > use if they were present rather than having to expose them through ioctls > > that nobody even knows about... > > Yeah... Not representable by one number. You'd have to unset a flag to say > you were providing this information. > > However, providing a whole bunch of hints about I/O characteristics is probably > beyond this syscall - especially if it isn't constant over the length of a > file. That's specialist knowledge that most applications don't need to know. > Having a generic way to retrieve it, though, may be a good idea. We're continually talking about applications giving us usage hints on what IO they are going to do so the storage can optimise the IO.
IO is still a GIGO problem, though, and the idea of geometry hints is to enable us to tell the application to do well formed IO. i.e. less garbage. XFS has ioctls to expose filesystem geometry, optimal IO sizes, the alignment limits for direct IO, etc, and they are very useful to applications that care about high performance IO. A lot of this can be distilled down to a simple set of geometries, and generally speaking they don't change midway through a file.... > OTOH, there's plenty of uncommitted space, so if we can condense the hints down > to something small, we could perhaps add it later - but from your paragraph > above, it doesn't sound like it'll be small. Allocation block size, minimum sane IO size (to avoid page cache RMW cycles or DIO zeroing), minimum preferred IO size (e.g. stripe unit), optimal IO size for bandwidth (e.g. stripe width). I don't think there's much more than that which will be really usable by applications. > > Perhaps also exposing the project ID for quota purposes, like we do UID and > > GID. That way we wouldn't need a filesystem specific ioctl to read it.... > > Is this an XFS only thing? If so, can it be generalised? Right now it is, but there have been patches in the past to introduce project quotas to ext4. That didn't go far because it was done in a way that was semantically different to XFS (for no reason that I could understand) and nobody wanted two different sets of semantics for the "same" feature. The most common use of project quotas is to implement sub-tree quotas, which is probably of more interest to btrfs folks as it is an exact match for per-subvolume quotas. So, yes, I do see it as something generically useful - it's a feature that a lot of people use XFS specifically for.... Cheers, Dave. -- Dave Chinner david@fromorbit.com
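The four hints Dave lists above might look something like this if they were packed into 32-bit fields. This is purely a sketch: struct io_geometry, its member names and the sizing policy are all assumptions made for illustration, and nothing like it was defined in the patch set at this point in the thread.

```c
#include <stdint.h>

/* Hypothetical layout: the four hints from the message, as 32-bit byte counts. */
struct io_geometry {
    uint32_t alloc_size;    /* allocation block size */
    uint32_t min_io_size;   /* minimum sane IO size (avoids RMW cycles / DIO zeroing) */
    uint32_t pref_io_size;  /* minimum preferred IO size, e.g. stripe unit */
    uint32_t opt_io_size;   /* optimal IO size for bandwidth, e.g. stripe width */
};

/* Round len up to a multiple of unit; unit need not be a power of two. */
static uint64_t round_up_to(uint64_t len, uint32_t unit)
{
    if (unit == 0)
        return len;         /* hint not provided: leave the length alone */
    return (len + unit - 1) / unit * unit;
}

/*
 * One possible policy for sizing a buffer: align to the largest hint that
 * the request already reaches, so small requests are not blown up to a
 * full stripe width.
 */
static uint64_t well_formed_io_size(const struct io_geometry *g, uint64_t want)
{
    if (g->opt_io_size && want >= g->opt_io_size)
        return round_up_to(want, g->opt_io_size);
    if (g->pref_io_size && want >= g->pref_io_size)
        return round_up_to(want, g->pref_io_size);
    return round_up_to(want, g->min_io_size);
}
```

With a 4 KiB block size, a 64 KiB stripe unit and a 512 KiB stripe width, a 100000-byte request would be sized to 128 KiB and a 600000-byte request to 1 MiB. Other policies (rounding down, or always using the stripe width) are equally plausible; the point is only that four small numbers are enough for an application to choose well formed IO sizes.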
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 13:13 ` Dave Chinner (?) @ 2012-04-27 15:10 ` J. Bruce Fields [not found] ` <20120427151057.GA16580-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> -1 siblings, 1 reply; 144+ messages in thread From: J. Bruce Fields @ 2012-04-27 15:10 UTC (permalink / raw) To: Dave Chinner Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Fri, Apr 27, 2012 at 11:13:06PM +1000, Dave Chinner wrote: > Right now it is, but there's ben patches in the past to introduce > project quotas to ext4. That didn't go far because it was done in a > way that was semantically different to XFS (for no reason that I > could understand) and nobody wanted two different sets of semantics > for the "same" feature. The most common use of project quotas is to > implement sub-tree quotas, (Though it's also useful as a way to do safe subtree NFS exports). --b. > which is probably of more interest to > btrfs folks as it is an exact match for per-subvolume quotas. > > So, yes, I do see it as something generically useful - it's a > feature that a lot of people use XFS specifically for.... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 15:10 ` J. Bruce Fields @ 2012-04-27 16:32 ` Steve French 0 siblings, 0 replies; 144+ messages in thread From: Steve French @ 2012-04-27 16:32 UTC (permalink / raw) To: J. Bruce Fields Cc: Dave Chinner, David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4

On Fri, Apr 27, 2012 at 10:10 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Fri, Apr 27, 2012 at 11:13:06PM +1000, Dave Chinner wrote:
>> Right now it is, but there have been patches in the past to introduce
>> project quotas to ext4. That didn't go far because it was done in a
>> way that was semantically different to XFS (for no reason that I
>> could understand) and nobody wanted two different sets of semantics
>> for the "same" feature. The most common use of project quotas is to
>> implement sub-tree quotas,
>
> (Though it's also useful as a way to do safe subtree NFS exports).

Quotas are very important to Samba (and Windows) admins. See e.g. http://www.samba.org/samba/docs/man/manpages-3/smbcquotas.1.html

But I would defer to Metze and others on the server side to see whether XFS or the proposed ext4 quotas match the protocol requirements which Samba has to support. Does anyone know whether Samba can handle all of the remote quota admin operations on Linux/XFS today (and local enforcement)?

-- Thanks, Steve
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 13:13 ` Dave Chinner (?) (?) @ 2012-04-27 19:31 ` Andreas Dilger 2012-04-28 0:58 ` Dave Chinner 2012-05-10 9:51 ` David Howells -1 siblings, 2 replies; 144+ messages in thread From: Andreas Dilger @ 2012-04-27 19:31 UTC (permalink / raw) To: Dave Chinner Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On 2012-04-27, at 7:13 AM, Dave Chinner wrote: > Have a look at fs/xfs/xfs_dinode.h. There's a bunch of flags defined > at the bottom of the file. > > Stuff like the "nodefrag", "nodump", and "prealloc" bits seem fairly > generic - they are for indicating that files are to be avoided for > defrag or backup purposes, the prealloc bit indicates that fallocate > has been used to reserve space on the inode (finding files that space > can be punched out of safely), and so on. There is already the FS_NODUMP_FL in the standard FS_IOC_GETFLAGS ioctl and I expect this to be in statxat() also. In ext4 there was also an EXT4_EOFBLOCKS_FL added for inodes with fallocate'd data beyond EOF, but Eric thought it was a pain to maintain and it has been deprecated in ext4 and e2fsprogs recently. > Currently these things are queried and manipulated by ioctls > (XFS_IOC_FSX[GS]ETATTR) along with extent size hints, project > quotas, etc. but I think there's some wider use for many of the > flags, which is why I was asking is there's any thought to this sort > of flag being exposed by the VFS. > > Historically the flags exposed by the VFS are those used by extN - I > see little reason why we should favour one filesystem's flags over > any others in an extended stat interface if they are generically > useful.... Sure, they started as ext4 flags because the "lsattr" and "chattr" tools were using this ioctl/flags, but have become more generic in recent years. 
FS_NOTAIL_FL was added for Reiserfs, and FS_NOCOW_FL was added for another filesystem (maybe Btrfs?). I'm not against adding more flags here that are generically useful, and recommended that statxat() have a 64-bit st_ioc_flags, since there are already 22 FS_*_FL flags defined today. >> Either you can add some of them to the ioc flags (which may be >> impractical, I grant you) or we'd have to add an arbitrary fs-type >> specific field and specify the host fs (the provision of which might >> not be a bad idea in and of itself) to tell userspace how to interpret them. > > Well, that's the complexity, isn't it. I have no good answer to > that... > >>> Along the same lines, filesystems can have different allocation >>> constraints to IO the filesystem block size - ext4 with its >>> bigalloc hack, XFS with its per-inode extent size hints and the >>> realtime device, etc. Then there's optimal IO characteristics >>> (e.g. geometry hints like stripe unit/stripe width for the >>> allocation policy of that given file) that applications could >>> use if they were present rather than having to expose them >>> through ioctls that nobody even knows about... >> >> Yeah... Not representable by one number. You'd have to unset a >> flag to say you were providing this information. >> >> However, providing a whole bunch of hints about I/O characteristics >> is probably beyond this syscall - especially if it isn't constant >> over the length of a file. That's specialist knowledge that most >> applications don't need to know. >> Having a generic way to retrieve it, though, may be a good idea. > > We're continually talking about applications giving us usage hints > on what IO they are going to do so the storage can optimise the IO. > IO is still a GIGO problem, though, and the idea of geometry hints > is to enable us to tell the application to do well formed IO. i.e. > less garbage. 
> > XFS has ioctls to expose filesystem geometry, optimal IO sizes, the > alignment limits for direct IO, etc, and they are very useful to > applications that care about high performance IO. A lot of this can > be distilled down to a simple set of geometries, and generally > speaking they don't change mid way through a file.... > >> OTOH, there's plenty of uncommitted space, so if we can condense >> the hints down to something small, we could perhaps add it later - >> but from your paragraph above, it doesn't sound like it'll be small. > > Allocation block size, minimum sane IO size (to avoid page cache RMW > cycles or DIO zeroing), minimum preferred IO size (e.g. stripe unit), > optimal IO size for bandwidth (e.g. stripe width). I don't think > there's much more than that which will be really usable by > applications. I think this is a minimal set that makes sense, and is manageable for both the interface and for users. Even if it isn't 100% correct for every file of every filesystem, it still makes sense for many systems. I'd suggest st_frsize (like BSD statvfs() f_frsize) would be the minimum fragment or page size, st_iosize (BSD f_iosize) could be the optimal IO size, and "st_stripesize" for the minimum preferred RAID/chunk size. One could argue that "st_blksize" is used for the "optimal IO size" on Linux today, but this is an overloaded term. It _appears_ to represent the filesystem blocksize, which it usually is not, and on BSD st_bsize means the minimum blocksize and has a confusingly similar name. Since any application using this API needs to do some extra coding already, we may as well give the structure members good names that are not ambiguous. >>> Perhaps also exposing the project ID for quota purposes, like we >>> do UID and GID. That way we wouldn't need a filesystem specific >>> ioctl to read it.... >> >> Is this an XFS only thing? If so, can it be generalised? 
> > Right now it is, but there's been patches in the past to introduce > project quotas to ext4. That didn't go far because it was done in a > way that was semantically different to XFS (for no reason that I > could understand) and nobody wanted two different sets of semantics > for the "same" feature. The most common use of project quotas is to > implement sub-tree quotas, which is probably of more interest to > btrfs folks as it is an exact match for per-subvolume quotas. > > So, yes, I do see it as something generically useful - it's a > feature that a lot of people use XFS specifically for.... I'd agree. There was the tree quota project for ext4, and I've also heard this is available in other filesystems. Cheers, Andreas ^ permalink raw reply [flat|nested] 144+ messages in thread
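For readers unfamiliar with the existing interface mentioned in the message above, the following is a minimal sketch of how the FS_*_FL inode flags are read today through the FS_IOC_GETFLAGS ioctl. The helper names get_inode_flags() and describe_inode_flags() are made up for illustration; only the ioctl and the flag constants come from <linux/fs.h>. Not every filesystem implements the ioctl, so a failure return should be expected and handled.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS and the FS_*_FL bits */

/* Hypothetical helper: write the names of a few well-known FS_*_FL
 * bits found in 'flags' into 'buf', each followed by a space. */
void describe_inode_flags(int flags, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (flags & FS_NODUMP_FL)
        strncat(buf, "nodump ", len - strlen(buf) - 1);
    if (flags & FS_NOTAIL_FL)
        strncat(buf, "notail ", len - strlen(buf) - 1);
    if (flags & FS_IMMUTABLE_FL)
        strncat(buf, "immutable ", len - strlen(buf) - 1);
    if (flags & FS_APPEND_FL)
        strncat(buf, "append ", len - strlen(buf) - 1);
}

/* Hypothetical helper: fetch the inode flags of 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or the
 * filesystem does not support FS_IOC_GETFLAGS. */
int get_inode_flags(const char *path, int *flags)
{
    int fd, ret;

    assert(path != NULL && flags != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, FS_IOC_GETFLAGS, flags);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

A caller would run get_inode_flags() on a path and pass the result to describe_inode_flags(). The point of the proposal discussed in this thread is that such flags could be returned by the extended stat call itself, without needing an open file descriptor and a separate ioctl.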
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 19:31 ` Andreas Dilger @ 2012-04-28 0:58 ` Dave Chinner 2012-05-10 9:51 ` David Howells 1 sibling, 0 replies; 144+ messages in thread From: Dave Chinner @ 2012-04-28 0:58 UTC (permalink / raw) To: Andreas Dilger Cc: David Howells, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha On Fri, Apr 27, 2012 at 01:31:07PM -0600, Andreas Dilger wrote: > On 2012-04-27, at 7:13 AM, Dave Chinner wrote: > > Have a look at fs/xfs/xfs_dinode.h. There's a bunch of flags defined > > at the bottom of the file. > > > > Stuff like the "nodefrag", "nodump", and "prealloc" bits seem fairly > > generic - they are for indicating that files are to be avoided for > > defrag or backup purposes, the prealloc bit indicates that fallocate > > has been used to reserve space on the inode (finding files that space > > can be punched out of safely), and so on. > > There is already the FS_NODUMP_FL in the standard FS_IOC_GETFLAGS ioctl > and I expect this to be in statxat() also. I forgot that was one of the generic flags :/ > In ext4 there was also an > EXT4_EOFBLOCKS_FL added for inodes with fallocate'd data beyond EOF, > but Eric thought it was a pain to maintain and it has been deprecated > in ext4 and e2fsprogs recently. I'd think that flag is more of a "filesystem implementation specific" flag than a general "this file contained persistent preallocation" flag, which is essentially what the XFS flag says. XFS uses it in various ways to optimise extent management on the file (e.g. don't truncate extents past EOF when closing the file), but it is not specific to one particular aspect of the preallocation implementation. > >> OTOH, there's plenty of uncommitted space, so if we can condense > >> the hints down to something small, we could perhaps add it later - > >> but from your paragraph above, it doesn't sound like it'll be small. 
> > > > Allocation block size, minimum sane IO size (to avoid page cache RMW > > cycles or DIO zeroing), minimum preferred IO size (e.g. stripe unit), > > optimal IO size for bandwidth (e.g. stripe width). I don't think > > there's much more than that which will be really usable by > > applications. > > I think this is a minimal set that makes sense, and is manageable for > both the interface and for users. Even if it isn't 100% correct for > every file of every filesystem, it still makes sense for many systems. That's the aim, isn't it? To expose what is useful to the majority in a simple manner? > I'd suggest st_frsize (like BSD statvfs() f_frsize) would be the > minimum fragment or page size, st_iosize (BSD f_iosize) could be > the optimal IO size, and "st_stripesize" for the minimum preferred RAID/chunk size. Personally, I think those names are, well, terribly lacking in obviousness. Something more along the lines of:
st_blksize - file block size
st_alloc_blksize - allocation block size/alignment
st_small_io_size - IO size/alignment that avoids filesystem/page cache RMW
st_preferred_io_size - preferred IO size for general usage.
st_large_io_size - IO size/alignment for high bandwidth sequential IO
With the aim that applications tend to use st_preferred_io_size for all general IO (i.e. the default), st_small_io_size for small IO, IOPS intensive workloads, and st_large_io_size for writing large chunks of sequential data. > One could argue that "st_blksize" is used for the "optimal IO size" > on Linux today, but this is an overloaded term. It _appears_ to > represent the filesystem blocksize, which it usually is not, and on > BSD st_bsize means the minimum blocksize and has a confusingly > similar name. Since any application using this API needs to do some > extra coding already, we may as well give the structure members good > names that are not ambiguous. Well said - I couldn't have stated the case better myself. ;) Cheers, Dave. 
-- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: [PATCH 0/6] Extended file stat system call 2012-04-27 19:31 ` Andreas Dilger 2012-04-28 0:58 ` Dave Chinner @ 2012-05-10 9:51 ` David Howells 1 sibling, 0 replies; 144+ messages in thread From: David Howells @ 2012-05-10 9:51 UTC (permalink / raw) To: Dave Chinner Cc: dhowells, Andreas Dilger, linux-fsdevel, linux-nfs, linux-cifs, samba-technical, linux-ext4, wine-devel, kfm-devel, nautilus-list, linux-api, libc-alpha Dave Chinner <david@fromorbit.com> wrote:
> st_blksize - file block size
> st_alloc_blksize - allocation block size/alignment
> st_small_io_size - IO size/alignment that avoids filesystem/page cache RMW
> st_preferred_io_size - preferred IO size for general usage.
> st_large_io_size - IO size/alignment for high bandwidth sequential IO
What is st_blksize here? Is it directly comparable (if such a thing is possible) to st_blksize in struct stat? Or does st_preferred_io_size map to the current st_blksize and your st_blksize map to actual media block size (and if so, is that the same as st_alloc_blksize? - I presume st_alloc_blksize must be a multiple of the media block size). David ^ permalink raw reply [flat|nested] 144+ messages in thread