All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v2 0/6] ext4: yet another project quota
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

Projects quota allows to enforce disk quota for several subtrees or even
individual files on the filesystem. Each inode is marked with project-id
(independently from uid and gid) and accounted into corresponding project
quota. New files inherits project id from directory where they are created.

This is must-have feature for deploying lightweight containers.
Also project quota can tell size of subtree without doing recursive 'du'.

This patchset adds project id and quota into ext4.

This time I've prepared patches also for e2fsprogs and quota-tools.

All patches are available at github:
https://github.com/koct9i/linux --branch project
https://github.com/koct9i/e2fsprogs --branch project
https://github.com/koct9i/quota-tools --branch project

---

Porposed behavior is similar to project quota in XFS:

* all inode are marked with project id
* new files inherits project id from parent directory
* project quota accounts inodes and enforces limits
* cross-project link and rename operations are restricted

Differences:

There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
project id inheritance), instead of that project which userspace sees as '0'
(in nested user-name space that might be actually non-zero project) acts as
default project where restrictions for link/rename are ignored.
(also see below, in "why new ioctl" part)

This implementation adds shortcut for moving files from one project into
another: non-directory inodes with n_link == 1 are transferred without
copying (XFS in this case returns -EXDEV and userspace have to copy file).

In XFS file owner (or process with CAP_FOWNER) can set set any project id,
XFS permits changing project id only from init user-namespace.

This patchset adds sysctl fs.protected_projects. By default it's 0 and project
id acts as XFS project. Setting it to 1 makes chaning project id priviliged
operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
project id mapping for nested user-namespace also requires that capability.
Thus there are two levels of control: project id mapping in user-ns defines set
of permitted projects and capability protects operations within this set.

I see no problems with supporting all this in XFS, all difference in interface.

Ext4 layout
-----------

Project id introduce ro-compatible feature 'project'.

Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
suggested by Darrick J. Wong in previous discussions of project quota).
Linux never used that field and present fsck checks that it contains zero.

Quota information is stored in special inode №11 (by default only 10 inodes are
reserved for special usage, I've add option to resize2fs to reserve more).
(see e2fsprogs patches for details) For symmetry with other quotas inode number
is stored in superblock.

Project quota supports only modern 'hidden' journaled mode.

Interface
---------

Interface for changing limits / query current usage is common vfs quotactl()
where quotatype = PRJQUOTA = 2. User can query current state of any project
mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.

Two new ioctls for getting / changing inode project id:
int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);

They acts as interface for super-block methods get_project / set_project
Generic code checks permissions, does project id translation in user-namespace
mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
Filesystem method only updates inode and transfers project quota.

No new mount options added. Disk usage tracking is enabled at mount.
Limits are enabeld later by "quotaon".

(BTW why journalled quota doesn't enable limits right at the time of mounting?)

Why new ioctls?
---------------

XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
But it has several flaws and doesn't fit for a role of generic interface.

It contains a lot of xfs-specific flags and racy by design: set operation
commits all fields at once thus it's used in sequence get-change-set without
any lock, Concurrent updates from user space will collide.

Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
project id from parent directory or not. This flag is barely useful and only
makes everything complicated. Even tools in xfsprogs don't use it: they always
set it together with project id and clears when set project id back to zero.

And the main reason: this compatibility gives nothing. The only user of xfs
ioctl which I've found is the xfsprogs. And these tools check filesystem name
and don't work anywhere except 'xfs'.

Links
-----

[1] 2014-12-09  ext4: add project quota support by Li Xi
http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2

[2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
http://marc.info/?l=linux-ext4&m=139089109605795&w=2

[3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
http://thread.gmane.org/gmane.linux.file-systems/65752

[4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
http://thread.gmane.org/gmane.comp.file-systems.ext4/17530


---

Konstantin Khlebnikov (6):
      fs: vfs ioctls for managing project id
      fs: protected project id
      quota: generic project quota
      ext4: support project id and project quota
      ext4: add shortcut for moving files across projects
      ext4: mangle statfs results accourding to project quota usage and limits


 Documentation/filesystems/Locking |    4 +
 Documentation/filesystems/vfs.txt |    8 +++
 Documentation/sysctl/fs.txt       |   16 ++++++
 fs/compat_ioctl.c                 |    2 +
 fs/ext4/ext4.h                    |   15 ++++-
 fs/ext4/ialloc.c                  |    3 +
 fs/ext4/inode.c                   |   15 +++++
 fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
 fs/ext4/super.c                   |   61 ++++++++++++++++++++--
 fs/ioctl.c                        |   62 ++++++++++++++++++++++
 fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
 fs/quota/quota.c                  |    8 ++-
 fs/quota/quotaio_v2.h             |    6 +-
 include/linux/fs.h                |    3 +
 include/linux/quota.h             |    1 
 include/linux/quotaops.h          |   16 ++++++
 include/uapi/linux/capability.h   |    1 
 include/uapi/linux/fs.h           |    3 +
 include/uapi/linux/quota.h        |    6 +-
 kernel/sysctl.c                   |    9 +++
 kernel/user_namespace.c           |    4 +
 21 files changed, 416 insertions(+), 25 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 0/6] ext4: yet another project quota
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

Projects quota allows to enforce disk quota for several subtrees or even
individual files on the filesystem. Each inode is marked with project-id
(independently from uid and gid) and accounted into corresponding project
quota. New files inherits project id from directory where they are created.

This is must-have feature for deploying lightweight containers.
Also project quota can tell size of subtree without doing recursive 'du'.

This patchset adds project id and quota into ext4.

This time I've prepared patches also for e2fsprogs and quota-tools.

All patches are available at github:
https://github.com/koct9i/linux --branch project
https://github.com/koct9i/e2fsprogs --branch project
https://github.com/koct9i/quota-tools --branch project

---

Porposed behavior is similar to project quota in XFS:

* all inode are marked with project id
* new files inherits project id from parent directory
* project quota accounts inodes and enforces limits
* cross-project link and rename operations are restricted

Differences:

There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
project id inheritance), instead of that project which userspace sees as '0'
(in nested user-name space that might be actually non-zero project) acts as
default project where restrictions for link/rename are ignored.
(also see below, in "why new ioctl" part)

This implementation adds shortcut for moving files from one project into
another: non-directory inodes with n_link == 1 are transferred without
copying (XFS in this case returns -EXDEV and userspace have to copy file).

In XFS file owner (or process with CAP_FOWNER) can set set any project id,
XFS permits changing project id only from init user-namespace.

This patchset adds sysctl fs.protected_projects. By default it's 0 and project
id acts as XFS project. Setting it to 1 makes chaning project id priviliged
operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
project id mapping for nested user-namespace also requires that capability.
Thus there are two levels of control: project id mapping in user-ns defines set
of permitted projects and capability protects operations within this set.

I see no problems with supporting all this in XFS, all difference in interface.

Ext4 layout
-----------

Project id introduce ro-compatible feature 'project'.

Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
suggested by Darrick J. Wong in previous discussions of project quota).
Linux never used that field and present fsck checks that it contains zero.

Quota information is stored in special inode №11 (by default only 10 inodes are
reserved for special usage, I've add option to resize2fs to reserve more).
(see e2fsprogs patches for details) For symmetry with other quotas inode number
is stored in superblock.

Project quota supports only modern 'hidden' journaled mode.

Interface
---------

Interface for changing limits / query current usage is common vfs quotactl()
where quotatype = PRJQUOTA = 2. User can query current state of any project
mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.

Two new ioctls for getting / changing inode project id:
int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);

They acts as interface for super-block methods get_project / set_project
Generic code checks permissions, does project id translation in user-namespace
mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
Filesystem method only updates inode and transfers project quota.

No new mount options added. Disk usage tracking is enabled at mount.
Limits are enabeld later by "quotaon".

(BTW why journalled quota doesn't enable limits right at the time of mounting?)

Why new ioctls?
---------------

XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
But it has several flaws and doesn't fit for a role of generic interface.

It contains a lot of xfs-specific flags and racy by design: set operation
commits all fields at once thus it's used in sequence get-change-set without
any lock, Concurrent updates from user space will collide.

Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
project id from parent directory or not. This flag is barely useful and only
makes everything complicated. Even tools in xfsprogs don't use it: they always
set it together with project id and clears when set project id back to zero.

And the main reason: this compatibility gives nothing. The only user of xfs
ioctl which I've found is the xfsprogs. And these tools check filesystem name
and don't work anywhere except 'xfs'.

Links
-----

[1] 2014-12-09  ext4: add project quota support by Li Xi
http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2

[2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
http://marc.info/?l=linux-ext4&m=139089109605795&w=2

[3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
http://thread.gmane.org/gmane.linux.file-systems/65752

[4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
http://thread.gmane.org/gmane.comp.file-systems.ext4/17530


---

Konstantin Khlebnikov (6):
      fs: vfs ioctls for managing project id
      fs: protected project id
      quota: generic project quota
      ext4: support project id and project quota
      ext4: add shortcut for moving files across projects
      ext4: mangle statfs results accourding to project quota usage and limits


 Documentation/filesystems/Locking |    4 +
 Documentation/filesystems/vfs.txt |    8 +++
 Documentation/sysctl/fs.txt       |   16 ++++++
 fs/compat_ioctl.c                 |    2 +
 fs/ext4/ext4.h                    |   15 ++++-
 fs/ext4/ialloc.c                  |    3 +
 fs/ext4/inode.c                   |   15 +++++
 fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
 fs/ext4/super.c                   |   61 ++++++++++++++++++++--
 fs/ioctl.c                        |   62 ++++++++++++++++++++++
 fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
 fs/quota/quota.c                  |    8 ++-
 fs/quota/quotaio_v2.h             |    6 +-
 include/linux/fs.h                |    3 +
 include/linux/quota.h             |    1 
 include/linux/quotaops.h          |   16 ++++++
 include/uapi/linux/capability.h   |    1 
 include/uapi/linux/fs.h           |    3 +
 include/uapi/linux/quota.h        |    6 +-
 kernel/sysctl.c                   |    9 +++
 kernel/user_namespace.c           |    4 +
 21 files changed, 416 insertions(+), 25 deletions(-)

--
Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 1/6] fs: vfs ioctls for managing project id
  2015-03-10 17:22 ` Konstantin Khlebnikov
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  2015-03-11  7:00   ` Andreas Dilger
  -1 siblings, 1 reply; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

This patch adds generic vfs interface for project id and
related super-block methods.

Two new ioctls:
int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/filesystems/Locking |    4 +++
 Documentation/filesystems/vfs.txt |    8 +++++
 fs/compat_ioctl.c                 |    2 +
 fs/ioctl.c                        |   58 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h                |    2 +
 include/uapi/linux/fs.h           |    3 ++
 6 files changed, 77 insertions(+)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f91926f2f482..7a01e2b606d0 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -126,6 +126,8 @@ prototypes:
 	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	kprojid_t (*get_project) (struct inode *);
+	int (*set_project) (struct inode *, kprojid_t);
 
 locking rules:
 	All may block [not true, see below]
@@ -147,6 +149,8 @@ show_options:		no		(namespace_sem)
 quota_read:		no		(see below)
 quota_write:		no		(see below)
 bdev_try_to_free_page:	no		(see below)
+get_project		no
+set_project		no		(i_mutex)
 
 ->statfs() has s_umount (shared) when called by ustat(2) (native or
 compat), but that's an accident of bad API; s_umount is used to pin
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 966b22829f3b..aeff3c54021b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -230,6 +230,8 @@ struct super_operations {
         ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 	int (*nr_cached_objects)(struct super_block *);
 	void (*free_cached_objects)(struct super_block *, int);
+	kprojid_t (*get_project)(struct inode *);
+	int (*set_project)(struct inode *, kprojid_t);
 };
 
 All methods are called without any locks being held, unless otherwise
@@ -319,6 +321,12 @@ or bottom half).
 	implementations will cause holdoff problems due to large scan batch
 	sizes.
 
+  get_project: called by the ioctl and quota code to get project id of an inode.
+	Method should return INVALID_PROJID if feature isn't supported.
+
+  set_project: called by the ioctl to set project id for an inode.
+	This is called with i_mutex held.
+
 Whoever sets up the inode is responsible for filling in the "i_op" field. This
 is a pointer to a "struct inode_operations" which describes the methods that
 can be performed on individual inodes.
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index afec6450450f..9a24e4038377 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -889,6 +889,8 @@ COMPATIBLE_IOCTL(FIOASYNC)
 COMPATIBLE_IOCTL(FIONBIO)
 COMPATIBLE_IOCTL(FIONREAD)  /* This is also TIOCINQ */
 COMPATIBLE_IOCTL(FS_IOC_FIEMAP)
+COMPATIBLE_IOCTL(FS_IOC_GETPROJECT)
+COMPATIBLE_IOCTL(FS_IOC_SETPROJECT)
 /* 0x00 */
 COMPATIBLE_IOCTL(FIBMAP)
 COMPATIBLE_IOCTL(FIGETBSZ)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5d01d2638ca5..d351576d95c8 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -9,6 +9,7 @@
 #include <linux/capability.h>
 #include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/export.h>
 #include <linux/uaccess.h>
@@ -545,6 +546,57 @@ static int ioctl_fsthaw(struct file *filp)
 	return thaw_super(sb);
 }
 
+static int ioctl_getproject(struct file *filp, projid_t __user *argp)
+{
+	struct user_namespace *ns = current_user_ns();
+	struct inode *inode = file_inode(filp);
+	struct super_block *sb = inode->i_sb;
+	kprojid_t kprojid;
+	projid_t projid;
+
+	if (!sb->s_op->get_project)
+		return -EOPNOTSUPP;
+	kprojid = sb->s_op->get_project(inode);
+	if (!projid_valid(kprojid))
+		return -EOPNOTSUPP;
+	projid = from_kprojid(ns, kprojid);
+	if (projid == (projid_t)-1)
+		return -EACCES;
+	return put_user(projid, argp);
+}
+
+static int ioctl_setproject(struct file *filp, projid_t __user *argp)
+{
+	struct user_namespace *ns = current_user_ns();
+	struct inode *inode = file_inode(filp);
+	struct super_block *sb = inode->i_sb;
+	kprojid_t kprojid;
+	projid_t projid;
+	int ret;
+
+	if (!sb->s_op->set_project)
+		return -EOPNOTSUPP;
+	if (ns != &init_user_ns)
+		return -EPERM;
+	ret = get_user(projid, argp);
+	if (ret)
+		return ret;
+	kprojid = make_kprojid(ns, projid);
+	if (!projid_valid(kprojid))
+		return -EACCES;
+	ret = mnt_want_write_file(filp);
+	if (ret)
+		return ret;
+	mutex_lock(&inode->i_mutex);
+	if (inode_owner_or_capable(inode))
+		ret = sb->s_op->set_project(inode, kprojid);
+	else
+		ret = -EPERM;
+	mutex_unlock(&inode->i_mutex);
+	mnt_drop_write_file(filp);
+	return ret;
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -597,6 +649,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
+	case FS_IOC_GETPROJECT:
+		return ioctl_getproject(filp, argp);
+
+	case FS_IOC_SETPROJECT:
+		return ioctl_setproject(filp, argp);
+
 	case FIGETBSZ:
 		return put_user(inode->i_sb->s_blocksize, argp);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d71b5e1ff2..42156801739e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1655,6 +1655,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	kprojid_t (*get_project)(struct inode *);
+	int (*set_project)(struct inode *, kprojid_t);
 };
 
 /*
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9b964a5920af..144426326e15 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -170,6 +170,9 @@ struct inodes_stat_t {
 #define FS_IOC32_GETVERSION		_IOR('v', 1, int)
 #define FS_IOC32_SETVERSION		_IOW('v', 2, int)
 
+#define FS_IOC_GETPROJECT		_IOR('f', 20, unsigned)
+#define FS_IOC_SETPROJECT		_IOW('f', 21, unsigned)
+
 /*
  * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
  */


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 2/6] fs: protected project id
  2015-03-10 17:22 ` Konstantin Khlebnikov
  (?)
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  2015-03-10 17:32   ` Andy Lutomirski
  -1 siblings, 1 reply; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

Historically XFS project id doesn't have any permission control: file owner
is able to set any project id. Later they was sealed with user-namespace:
XFS allows to change it only from init user-ns. That works fine for isolated
containers or if user doesn't have direct access to the filesystem (NFS/FTP).

This patch adds sysctl fs.protected_projects which makes changing project id
privileged operation which requires CAP_SYS_RESOURCE in current user-namespace.
Thus there are two levels of protection: project id mapping in user-ns defines
set of permitted projects and capability protects operations within this set.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/sysctl/fs.txt     |   16 ++++++++++++++++
 fs/ioctl.c                      |    6 +++++-
 include/linux/fs.h              |    1 +
 include/uapi/linux/capability.h |    1 +
 kernel/sysctl.c                 |    9 +++++++++
 kernel/user_namespace.c         |    4 ++--
 6 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 88152f214f48..9f6579b99be6 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -34,6 +34,7 @@ Currently, these files are in /proc/sys/fs:
 - overflowgid
 - protected_hardlinks
 - protected_symlinks
+- protected_projects
 - suid_dumpable
 - super-max
 - super-nr
@@ -199,6 +200,21 @@ This protection is based on the restrictions in Openwall and grsecurity.
 
 ==============================================================
 
+protected_projects:
+
+Project id allows to enforce disk quota for several subtrees or individual
+files on the filesystem. Historically changing project id was a unprivileged
+operation and file owner is able to set any project id.
+
+When set to "0", changing project id is unprivileged operation. File owner
+can set any project id mapped in current user namespace.
+
+When set to "1" changing project id requires capability CAP_SYS_RESOURCE
+in current user namespace. Also defining project id mapping for nested
+user namespace requires CAP_SYS_RESOURCE in the parent user namespace.
+
+==============================================================
+
 suid_dumpable:
 
 This value can be used to query and set the core dump mode for setuid
diff --git a/fs/ioctl.c b/fs/ioctl.c
index d351576d95c8..2acf5efbc045 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -565,6 +565,8 @@ static int ioctl_getproject(struct file *filp, projid_t __user *argp)
 	return put_user(projid, argp);
 }
 
+int sysctl_protected_projects;
+
 static int ioctl_setproject(struct file *filp, projid_t __user *argp)
 {
 	struct user_namespace *ns = current_user_ns();
@@ -576,7 +578,9 @@ static int ioctl_setproject(struct file *filp, projid_t __user *argp)
 
 	if (!sb->s_op->set_project)
 		return -EOPNOTSUPP;
-	if (ns != &init_user_ns)
+	if (sysctl_protected_projects ?
+	    !ns_capable(ns, CAP_SYS_RESOURCE) :
+	    (ns != &init_user_ns))
 		return -EPERM;
 	ret = get_user(projid, argp);
 	if (ret)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 42156801739e..d3021feb3f7f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -64,6 +64,7 @@ extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
 extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
+extern int sysctl_protected_projects;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 12c37a197d24..0292885567cc 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -278,6 +278,7 @@ struct vfs_cap_data {
 /* Override resource limits. Set resource limits. */
 /* Override quota limits. */
 /* Override reserved space on ext2 filesystem */
+/* Modify file project id if protected_projects = 1 */
 /* Modify data journaling mode on ext3 filesystem (uses journaling
    resources) */
 /* NOTE: ext2 honors fsuid when checking for resource overrides, so
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 88ea2d6e0031..cb6f9fb13de3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1649,6 +1649,15 @@ static struct ctl_table fs_table[] = {
 		.extra2		= &one,
 	},
 	{
+		.procname	= "protected_projects",
+		.data		= &sysctl_protected_projects,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 4109f8320684..88f66198b251 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -807,8 +807,8 @@ ssize_t proc_projid_map_write(struct file *file, const char __user *buf,
 	if ((seq_ns != ns) && (seq_ns != ns->parent))
 		return -EPERM;
 
-	/* Anyone can set any valid project id no capability needed */
-	return map_write(file, buf, size, ppos, -1,
+	return map_write(file, buf, size, ppos,
+			 sysctl_protected_projects ? CAP_SYS_RESOURCE : -1,
 			 &ns->projid_map, &ns->parent->projid_map);
 }
 


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 3/6] quota: generic project quota
  2015-03-10 17:22 ` Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

This patch adds infrastructure for project quotas in generic vfs quota.

User can query current usage and limits for all projects mapped into current
user-namespace. Changing limits requires CAP_SYS_ADMIN in init user-namespace.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/quota/dquot.c           |   33 +++++++++++++++++++++++++++++++--
 fs/quota/quota.c           |    8 ++++++--
 fs/quota/quotaio_v2.h      |    6 ++++--
 include/linux/quota.h      |    1 +
 include/linux/quotaops.h   |    1 +
 include/uapi/linux/quota.h |    6 ++++--
 6 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 0ccd4ba3a246..04c27cdeca05 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1159,8 +1159,8 @@ static int need_print_warning(struct dquot_warn *warn)
 			return uid_eq(current_fsuid(), warn->w_dq_id.uid);
 		case GRPQUOTA:
 			return in_group_p(warn->w_dq_id.gid);
-		case PRJQUOTA:	/* Never taken... Just make gcc happy */
-			return 0;
+		case PRJQUOTA:
+			return 1;
 	}
 	return 0;
 }
@@ -1418,6 +1418,13 @@ static void __dquot_initialize(struct inode *inode, int type)
 		case GRPQUOTA:
 			qid = make_kqid_gid(inode->i_gid);
 			break;
+		case PRJQUOTA:
+			if (sb->s_op->get_project)
+				qid = make_kqid_projid(sb->s_op->
+						get_project(inode));
+			else
+				qid = make_kqid_invalid(PRJQUOTA);
+			break;
 		}
 		got[cnt] = dqget(sb, qid);
 	}
@@ -1951,6 +1958,24 @@ int dquot_transfer(struct inode *inode, struct iattr *iattr)
 EXPORT_SYMBOL(dquot_transfer);
 
 /*
+ * Helper function for transferring inode into another project.
+ */
+int dquot_transfer_project(struct inode *inode, kprojid_t projid)
+{
+	struct dquot *transfer_to[MAXQUOTAS] = {};
+	struct super_block *sb = inode->i_sb;
+	int ret;
+
+	if (!sb_has_quota_active(sb, PRJQUOTA))
+		return 0;
+	transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(projid));
+	ret = __dquot_transfer(inode, transfer_to);
+	dqput_all(transfer_to);
+	return ret;
+}
+EXPORT_SYMBOL(dquot_transfer_project);
+
+/*
  * Write info of quota file to disk
  */
 int dquot_commit_info(struct super_block *sb, int type)
@@ -2165,6 +2190,10 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
 		error = -EINVAL;
 		goto out_fmt;
 	}
+	if (type == PRJQUOTA && !sb->s_op->get_project) {
+		error = -EINVAL;
+		goto out_fmt;
+	}
 	/* Usage always has to be set... */
 	if (!(flags & DQUOT_USAGE_ENABLED)) {
 		error = -EINVAL;
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index d14a799c7785..0acd1bb4bafd 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -30,11 +30,15 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 	case Q_XGETQSTATV:
 	case Q_XQUOTASYNC:
 		break;
-	/* allow to query information for dquots we "own" */
+	/*
+	 * Allow to query information for user/group dquots we "own".
+	 * Allow querying project quota present in our user-namespace.
+	 */
 	case Q_GETQUOTA:
 	case Q_XGETQUOTA:
 		if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) ||
-		    (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))))
+		    (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))) ||
+		    (type == PRJQUOTA && projid_valid(make_kprojid(current_user_ns(), id))))
 			break;
 		/*FALLTHROUGH*/
 	default:
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b42c2fd..4e95430093d9 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
  */
 #define V2_INITQMAGICS {\
 	0xd9c01f11,	/* USRQUOTA */\
-	0xd9c01927	/* GRPQUOTA */\
+	0xd9c01927,	/* GRPQUOTA */\
+	0xd9c03f14,	/* PRJQUOTA */\
 }
 
 #define V2_INITQVERSIONS {\
 	1,		/* USRQUOTA */\
-	1		/* GRPQUOTA */\
+	1,		/* GRPQUOTA */\
+	1,		/* PRJQUOTA */\
 }
 
 /* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index d534e8ed308a..8bad159a268b 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -50,6 +50,7 @@
 
 #undef USRQUOTA
 #undef GRPQUOTA
+#undef PRJQUOTA
 enum quota_type {
 	USRQUOTA = 0,		/* element used for user quotas */
 	GRPQUOTA = 1,		/* element used for group quotas */
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index df73258cca47..ba54745fe408 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -104,6 +104,7 @@ int dquot_set_dqblk(struct super_block *sb, struct kqid id,
 
 int __dquot_transfer(struct inode *inode, struct dquot **transfer_to);
 int dquot_transfer(struct inode *inode, struct iattr *iattr);
+int dquot_transfer_project(struct inode *inode, kprojid_t projid);
 
 static inline struct mem_dqinfo *sb_dqinfo(struct super_block *sb, int type)
 {
diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h
index 1f49b8341c99..9c95b2c1c88a 100644
--- a/include/uapi/linux/quota.h
+++ b/include/uapi/linux/quota.h
@@ -36,11 +36,12 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
-#define __DQUOT_VERSION__	"dquot_6.5.2"
+#define __DQUOT_VERSION__	"dquot_6.6.0"
 
-#define MAXQUOTAS 2
+#define MAXQUOTAS 3
 #define USRQUOTA  0		/* element used for user quotas */
 #define GRPQUOTA  1		/* element used for group quotas */
+#define PRJQUOTA  2		/* element used for project quotas */
 
 /*
  * Definitions for the default names of the quotas files.
@@ -48,6 +49,7 @@
 #define INITQFNAMES { \
 	"user",    /* USRQUOTA */ \
 	"group",   /* GRPQUOTA */ \
+	"project", /* PRJQUOTA */ \
 	"undefined", \
 };
 


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 4/6] ext4: support project id and project quota
  2015-03-10 17:22 ` Konstantin Khlebnikov
                   ` (3 preceding siblings ...)
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

This patch implements project quota feature.

If EXT4_FEATURE_RO_COMPAT_PROJECT is set then inode field i_faddr contains
project id (otherwise all project id are set to invalid: -1).
I_faddr is obsolete for a long time, it seems linux never used it.

New files always inherit project id from directory where they are created.

Link/rename works only if projects of inode and target directory are matched,
or if project id of directory is mapped to zero. Otherwise -EXDEV is returned.
User-space tool 'mv' handles EXDEV as cross-filesystem move: it makes a copy of
files and unlinks source, this files will be transferred into target project.
(next patch adds optimization for that)

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/ext4/ext4.h   |   15 ++++++++++----
 fs/ext4/ialloc.c |    3 +++
 fs/ext4/inode.c  |   15 ++++++++++++++
 fs/ext4/namei.c  |   25 +++++++++++++++++++++++
 fs/ext4/super.c  |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 5 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f63c3d5805c4..8ba523ff5f84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -212,6 +212,8 @@ struct ext4_io_submit {
 /* First non-reserved inode for old ext4 filesystems */
 #define EXT4_GOOD_OLD_FIRST_INO	11
 
+#define EXT4_PRJ_QUOTA_INO	11	/* Project quota inode */
+
 /*
  * Maximal count of links to a file
  */
@@ -653,7 +655,7 @@ struct ext4_inode {
 	__le32	i_generation;	/* File version (for NFS) */
 	__le32	i_file_acl_lo;	/* File ACL */
 	__le32	i_size_high;
-	__le32	i_obso_faddr;	/* Obsoleted fragment address */
+	__le32	i_project;	/* Formerly i_faddr, fragment address */
 	union {
 		struct {
 			__le16	l_i_blocks_high; /* were l_i_reserved1 */
@@ -939,6 +941,7 @@ struct ext4_inode_info {
 
 	/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
 	__u32 i_csum_seed;
+	kprojid_t i_project;
 };
 
 /*
@@ -1169,7 +1172,8 @@ struct ext4_super_block {
 	__le32	s_overhead_clusters;	/* overhead blocks/clusters in fs */
 	__le32	s_backup_bgs[2];	/* groups with sparse_super2 SBs */
 	__u8	s_encrypt_algos[4];	/* Encryption algorithms in use  */
-	__le32	s_reserved[105];	/* Padding to the end of the block */
+	__le32	s_prj_quota_inum;	/* inode for tracking project quota */
+	__le32	s_reserved[104];	/* Padding to the end of the block */
 	__le32	s_checksum;		/* crc32c(superblock) */
 };
 
@@ -1184,7 +1188,7 @@ struct ext4_super_block {
 #define EXT4_MF_FS_ABORTED	0x0002	/* Fatal error detected */
 
 /* Number of quota types we support */
-#define EXT4_MAXQUOTAS 2
+#define EXT4_MAXQUOTAS 3
 
 /*
  * fourth extended-fs super-block data in memory
@@ -1373,6 +1377,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 	return ino == EXT4_ROOT_INO ||
 		ino == EXT4_USR_QUOTA_INO ||
 		ino == EXT4_GRP_QUOTA_INO ||
+		ino == EXT4_PRJ_QUOTA_INO ||
 		ino == EXT4_BOOT_LOADER_INO ||
 		ino == EXT4_JOURNAL_INO ||
 		ino == EXT4_RESIZE_INO ||
@@ -1536,6 +1541,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
  */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 #define EXT4_FEATURE_RO_COMPAT_READONLY		0x1000
+#define EXT4_FEATURE_RO_COMPAT_PROJECT		0x2000
 
 #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
 #define EXT4_FEATURE_INCOMPAT_FILETYPE		0x0002
@@ -1586,7 +1592,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
 					 EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
 					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
-					 EXT4_FEATURE_RO_COMPAT_QUOTA)
+					 EXT4_FEATURE_RO_COMPAT_QUOTA |\
+					 EXT4_FEATURE_RO_COMPAT_PROJECT)
 
 /*
  * Default values for user and/or group using reserved blocks
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index ac644c31ca67..5d812caf4bef 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -756,6 +756,9 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		inode->i_gid = dir->i_gid;
 	} else
 		inode_init_owner(inode, dir, mode);
+
+	ei->i_project = EXT4_I(dir)->i_project;
+
 	dquot_initialize(inode);
 
 	if (!goal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cb9a212b86f..fc55590437af 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3968,6 +3968,14 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	}
 	i_uid_write(inode, i_uid);
 	i_gid_write(inode, i_gid);
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+		projid_t project = le32_to_cpu(raw_inode->i_project);
+
+		ei->i_project = make_kprojid(&init_user_ns, project);
+	} else
+		ei->i_project = INVALID_PROJID;
+
 	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
 
 	ext4_clear_state_flags(ei);	/* Only relevant on 32-bit archs */
@@ -4293,6 +4301,13 @@ static int ext4_do_update_inode(handle_t *handle,
 		raw_inode->i_uid_high = 0;
 		raw_inode->i_gid_high = 0;
 	}
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+		projid_t i_project = from_kprojid(&init_user_ns, ei->i_project);
+
+		raw_inode->i_project = cpu_to_le32(i_project);
+	}
+
 	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
 
 	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 8110dd20ad3f..094f7096a41c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2934,6 +2934,17 @@ err_drop_inode:
 	return err;
 }
 
+static bool ext4_check_project(struct inode *dir, struct inode *inode)
+{
+	/*
+	 * Allow link/rename only into directory with matched project or
+	 * if direcorty belongs to default 'zero' project. Without feature
+	 * RO_COMPAT_PROJECT all i_project are 'invalid' thus always equal.
+	 */
+	return projid_eq(EXT4_I(dir)->i_project, EXT4_I(inode)->i_project) ||
+	       from_kprojid(current_user_ns(), EXT4_I(dir)->i_project) == 0;
+}
+
 static int ext4_link(struct dentry *old_dentry,
 		     struct inode *dir, struct dentry *dentry)
 {
@@ -2944,6 +2955,9 @@ static int ext4_link(struct dentry *old_dentry,
 	if (inode->i_nlink >= EXT4_LINK_MAX)
 		return -EMLINK;
 
+	if (!ext4_check_project(dir, inode))
+		return -EXDEV;
+
 	dquot_initialize(dir);
 
 retry:
@@ -3260,6 +3274,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
 		ext4_alloc_da_blocks(old.inode);
 
+	if (!ext4_check_project(new.dir, old.inode)) {
+		retval = -EXDEV;
+		goto end_rename;
+	}
+
 	credits = (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
 	if (!(flags & RENAME_WHITEOUT)) {
@@ -3436,6 +3455,12 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino)
 		goto end_rename;
 
+	if (!ext4_check_project(new.dir, old.inode) ||
+	    !ext4_check_project(old.dir, new.inode)) {
+		retval = -EXDEV;
+		goto end_rename;
+	}
+
 	handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
 		(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
 		 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e061e66c8280..c62ed5b554ae 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1035,9 +1035,54 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static kprojid_t ext4_get_project(struct inode *inode)
+{
+	/* Without feature RO_COMPAT_PROJECT all i_project are set to invalid */
+	return EXT4_I(inode)->i_project;
+}
+
+static int ext4_set_project(struct inode *inode, kprojid_t project)
+{
+	struct super_block *sb = inode->i_sb;
+	struct ext4_iloc iloc;
+	handle_t *handle;
+	int ret;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+		return -EOPNOTSUPP;
+
+	if (projid_eq(EXT4_I(inode)->i_project, project))
+		return 0;
+
+	dquot_initialize(inode);
+
+	handle = ext4_journal_start(inode, EXT4_HT_INODE, 1 +
+					EXT4_QUOTA_INIT_BLOCKS(sb) +
+					EXT4_QUOTA_DEL_BLOCKS(sb));
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
+
+	ret = dquot_transfer_project(inode, project);
+	if (ret)
+		goto out;
+
+	ret = ext4_reserve_inode_write(handle, inode, &iloc);
+	if (!ret) {
+		inode->i_ctime = ext4_current_time(inode);
+		EXT4_I(inode)->i_project = project;
+		ret = ext4_mark_iloc_dirty(handle, inode, &iloc);
+	}
+	if (IS_SYNC(inode))
+		ext4_handle_sync(handle);
+out:
+	ext4_journal_stop(handle);
+
+	return ret;
+}
+
 #ifdef CONFIG_QUOTA
-#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
-#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
+static char *quotatypes[] = INITQFNAMES;
+#define QTYPE2NAME(t) (quotatypes[t])
 
 static int ext4_write_dquot(struct dquot *dquot);
 static int ext4_acquire_dquot(struct dquot *dquot);
@@ -1103,6 +1148,8 @@ static const struct super_operations ext4_sops = {
 	.get_dquots	= ext4_get_dquots,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_project	= ext4_get_project,
+	.set_project	= ext4_set_project,
 };
 
 static const struct export_operations ext4_export_ops = {
@@ -3965,6 +4012,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	else
 		sb->s_qcop = &ext4_qctl_operations;
 	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP;
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+		sb->s_quota_types |= QTYPE_MASK_PRJ;
 #endif
 	memcpy(sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
 
@@ -5269,7 +5318,8 @@ static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
 	struct inode *qf_inode;
 	unsigned long qf_inums[EXT4_MAXQUOTAS] = {
 		le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
-		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum),
 	};
 
 	BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA));
@@ -5297,7 +5347,8 @@ static int ext4_enable_quotas(struct super_block *sb)
 	int type, err = 0;
 	unsigned long qf_inums[EXT4_MAXQUOTAS] = {
 		le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
-		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum),
 	};
 
 	sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 5/6] ext4: add shortcut for moving files across projects
  2015-03-10 17:22 ` Konstantin Khlebnikov
                   ` (4 preceding siblings ...)
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

This patch adds useful optimization for most common case of moving files
across projects: non-directory inode without extra hardlinks (i_nlink == 1)
can be moved into different project without making a copy. We just have to
change project in and reaccount disk usage in one transaction with rename.

As a result simple recursive 'mv' works much faster: it creates new
directories but files are moved without copying.

Flag DQUOT_TRANSFER_NOFAIL tells dquot_transfer_project() to move inode
regardless of quota limits. This is required for moving inode back into
the old project if rename had failed. This error-path little-bit racy
(user could use more that quota allows) but there are not so much errors
which might trigger this path: filesystem corruption or disk failure.
They seem bigger problem than potential quota abuse.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/ext4/namei.c          |   93 +++++++++++++++++++++++++++++++++++++++++-----
 fs/ext4/super.c          |    2 -
 fs/quota/dquot.c         |   16 +++++---
 include/linux/quotaops.h |   10 ++++-
 4 files changed, 103 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 094f7096a41c..2c738bae7c36 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3029,6 +3029,7 @@ struct ext4_renament {
 	struct inode *inode;
 	bool is_dir;
 	int dir_nlink_delta;
+	bool transfer_project;
 
 	/* entry for "dentry" */
 	struct buffer_head *bh;
@@ -3274,13 +3275,23 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
 		ext4_alloc_da_blocks(old.inode);
 
+	credits = (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
+		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
+
 	if (!ext4_check_project(new.dir, old.inode)) {
-		retval = -EXDEV;
-		goto end_rename;
+		/*
+		 * Shortcut for moving files across projects: inode with one
+		 * hardlink can be tranferred as is without making a copy.
+		 */
+		if (!S_ISDIR(old.inode->i_mode) && old.inode->i_nlink == 1) {
+			credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(old.dir->i_sb);
+			old.transfer_project = true;
+		} else {
+			retval = -EXDEV;
+			goto end_rename;
+		}
 	}
 
-	credits = (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
-		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
 	if (!(flags & RENAME_WHITEOUT)) {
 		handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits);
 		if (IS_ERR(handle)) {
@@ -3297,6 +3308,15 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		}
 	}
 
+	if (old.transfer_project) {
+		retval = dquot_transfer_project(old.inode,
+				EXT4_I(new.dir)->i_project, 0);
+		if (retval) {
+			old.transfer_project = false;
+			goto end_rename;
+		}
+	}
+
 	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
 		ext4_handle_sync(handle);
 
@@ -3355,6 +3375,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 * rename.
 	 */
 	old.inode->i_ctime = ext4_current_time(old.inode);
+	if (old.transfer_project)
+		EXT4_I(old.inode)->i_project = EXT4_I(new.dir)->i_project;
 	ext4_mark_inode_dirty(handle, old.inode);
 
 	if (!whiteout) {
@@ -3395,6 +3417,9 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	retval = 0;
 
 end_rename:
+	if (retval && old.transfer_project)
+		dquot_transfer_project(old.inode, EXT4_I(old.inode)->i_project,
+				       DQUOT_TRANSFER_NOFAIL);
 	brelse(old.dir_bh);
 	brelse(old.bh);
 	brelse(new.bh);
@@ -3425,6 +3450,7 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	};
 	u8 new_file_type;
 	int retval;
+	int credits;
 
 	dquot_initialize(old.dir);
 	dquot_initialize(new.dir);
@@ -3455,21 +3481,56 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino)
 		goto end_rename;
 
-	if (!ext4_check_project(new.dir, old.inode) ||
-	    !ext4_check_project(old.dir, new.inode)) {
-		retval = -EXDEV;
-		goto end_rename;
+	credits = 2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
+		  2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2;
+
+	if (!ext4_check_project(new.dir, old.inode)) {
+		if (!S_ISDIR(old.inode->i_mode) && old.inode->i_nlink == 1) {
+			credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(old.dir->i_sb);
+			old.transfer_project = true;
+		} else {
+			retval = -EXDEV;
+			goto end_rename;
+		}
+	}
+
+	if (!ext4_check_project(old.dir, new.inode)) {
+		if (!S_ISDIR(new.inode->i_mode) && new.inode->i_nlink == 1) {
+			credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(old.dir->i_sb);
+			new.transfer_project = true;
+		} else {
+			old.transfer_project = false;
+			retval = -EXDEV;
+			goto end_rename;
+		}
 	}
 
-	handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
-		(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
-		 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
+	handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits);
 	if (IS_ERR(handle)) {
 		retval = PTR_ERR(handle);
 		handle = NULL;
 		goto end_rename;
 	}
 
+	if (old.transfer_project) {
+		retval = dquot_transfer_project(old.inode,
+				EXT4_I(new.dir)->i_project, 0);
+		if (retval) {
+			old.transfer_project = false;
+			new.transfer_project = false;
+			goto end_rename;
+		}
+	}
+
+	if (new.transfer_project) {
+		retval = dquot_transfer_project(new.inode,
+				EXT4_I(old.dir)->i_project, 0);
+		if (retval) {
+			new.transfer_project = false;
+			goto end_rename;
+		}
+	}
+
 	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
 		ext4_handle_sync(handle);
 
@@ -3514,6 +3575,10 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 */
 	old.inode->i_ctime = ext4_current_time(old.inode);
 	new.inode->i_ctime = ext4_current_time(new.inode);
+	if (old.transfer_project)
+		EXT4_I(old.inode)->i_project = EXT4_I(new.dir)->i_project;
+	if (new.transfer_project)
+		EXT4_I(new.inode)->i_project = EXT4_I(old.dir)->i_project;
 	ext4_mark_inode_dirty(handle, old.inode);
 	ext4_mark_inode_dirty(handle, new.inode);
 
@@ -3532,6 +3597,12 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	retval = 0;
 
 end_rename:
+	if (retval && old.transfer_project)
+		dquot_transfer_project(old.inode, EXT4_I(old.inode)->i_project,
+				       DQUOT_TRANSFER_NOFAIL);
+	if (retval && new.transfer_project)
+		dquot_transfer_project(new.inode, EXT4_I(new.inode)->i_project,
+				       DQUOT_TRANSFER_NOFAIL);
 	brelse(old.dir_bh);
 	brelse(new.dir_bh);
 	brelse(old.bh);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c62ed5b554ae..6a6506bce53c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1062,7 +1062,7 @@ static int ext4_set_project(struct inode *inode, kprojid_t project)
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
-	ret = dquot_transfer_project(inode, project);
+	ret = dquot_transfer_project(inode, project, 0);
 	if (ret)
 		goto out;
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 04c27cdeca05..0b61357554ed 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1838,7 +1838,8 @@ EXPORT_SYMBOL(dquot_free_inode);
  * We are holding reference on transfer_from & transfer_to, no need to
  * protect them by srcu_read_lock().
  */
-int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
+static int do_dquot_transfer(struct inode *inode,
+		struct dquot **transfer_to, int flags)
 {
 	qsize_t space, cur_space;
 	qsize_t rsv_space = 0;
@@ -1879,10 +1880,10 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
 		is_valid[cnt] = 1;
 		transfer_from[cnt] = i_dquot(inode)[cnt];
 		ret = check_idq(transfer_to[cnt], 1, &warn_to[cnt]);
-		if (ret)
+		if (ret && !(flags & DQUOT_TRANSFER_NOFAIL))
 			goto over_quota;
 		ret = check_bdq(transfer_to[cnt], space, 0, &warn_to[cnt]);
-		if (ret)
+		if (ret && !(flags & DQUOT_TRANSFER_NOFAIL))
 			goto over_quota;
 	}
 
@@ -1932,6 +1933,11 @@ over_quota:
 	flush_warnings(warn_to);
 	return ret;
 }
+
+int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
+{
+	return do_dquot_transfer(inode, transfer_to, 0);
+}
 EXPORT_SYMBOL(__dquot_transfer);
 
 /* Wrapper for transferring ownership of an inode for uid/gid only
@@ -1960,7 +1966,7 @@ EXPORT_SYMBOL(dquot_transfer);
 /*
  * Helper function for transferring inode into another project.
  */
-int dquot_transfer_project(struct inode *inode, kprojid_t projid)
+int dquot_transfer_project(struct inode *inode, kprojid_t projid, int flags)
 {
 	struct dquot *transfer_to[MAXQUOTAS] = {};
 	struct super_block *sb = inode->i_sb;
@@ -1969,7 +1975,7 @@ int dquot_transfer_project(struct inode *inode, kprojid_t projid)
 	if (!sb_has_quota_active(sb, PRJQUOTA))
 		return 0;
 	transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(projid));
-	ret = __dquot_transfer(inode, transfer_to);
+	ret = do_dquot_transfer(inode, transfer_to, flags);
 	dqput_all(transfer_to);
 	return ret;
 }
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index ba54745fe408..810b88c69c5b 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -9,10 +9,18 @@
 
 #include <linux/fs.h>
 
+/*
+ * Flags for functions __dquot_alloc_space() and __dquot_free_space()
+ */
 #define DQUOT_SPACE_WARN	0x1
 #define DQUOT_SPACE_RESERVE	0x2
 #define DQUOT_SPACE_NOFAIL	0x4
 
+/*
+ * Flags for functions dquot_transfer_*
+ */
+#define DQUOT_TRANSFER_NOFAIL	0x1
+
 static inline struct quota_info *sb_dqopt(struct super_block *sb)
 {
 	return &sb->s_dquot;
@@ -104,7 +112,7 @@ int dquot_set_dqblk(struct super_block *sb, struct kqid id,
 
 int __dquot_transfer(struct inode *inode, struct dquot **transfer_to);
 int dquot_transfer(struct inode *inode, struct iattr *iattr);
-int dquot_transfer_project(struct inode *inode, kprojid_t projid);
+int dquot_transfer_project(struct inode *inode, kprojid_t projid, int flags);
 
 static inline struct mem_dqinfo *sb_dqinfo(struct super_block *sb, int type)
 {


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 6/6] ext4: mangle statfs results accourding to project quota usage and limits
  2015-03-10 17:22 ` Konstantin Khlebnikov
                   ` (5 preceding siblings ...)
  (?)
@ 2015-03-10 17:22 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 17:22 UTC (permalink / raw)
  To: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4, Theodore Ts'o
  Cc: Dmitry Monakhov, Andy Lutomirski, linux-kernel, Li Xi

This patch adds helper function dquot_mangle_statfs() which fills statfs
result with information from project quota counters. XFS does the same
thing in function xfs_fill_statvfs_from_dquot().

As a result subtree under project quota acts like separate filesystem,
for example 'df' inside chroot or container will show expected numbers.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/ext4/super.c          |    2 +-
 fs/quota/dquot.c         |   51 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/quotaops.h |    7 ++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6a6506bce53c..09ffa4905651 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5154,7 +5154,7 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
 	buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;
 
-	return 0;
+	return dquot_mangle_statfs(dentry, buf);
 }
 
 /* Helper function for writing quotas on sync - we need to start transaction
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 0b61357554ed..b05ebbcb4d5e 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2025,6 +2025,57 @@ int dquot_file_open(struct inode *inode, struct file *file)
 EXPORT_SYMBOL(dquot_file_open);
 
 /*
+ * Update statfs results accourding to project quota limits.
+ */
+int dquot_mangle_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	u64 blimit = 0, ilimit = 0, busage, iusage, bfree, ifree;
+	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = dentry->d_sb;
+	struct dquot *dquot;
+
+	if (!sb_has_quota_limits_enabled(sb, PRJQUOTA))
+		return 0;
+
+	__dquot_initialize(inode, PRJQUOTA);
+
+	spin_lock(&dq_data_lock);
+	dquot = i_dquot(inode)[PRJQUOTA];
+	if (dquot) {
+		struct mem_dqblk *dm = &(dquot->dq_dqb);
+
+		blimit = dm->dqb_bsoftlimit ?: dm->dqb_bhardlimit;
+		busage = dm->dqb_curspace + dm->dqb_rsvspace;
+		ilimit = dm->dqb_isoftlimit ?: dm->dqb_ihardlimit;
+		iusage = dm->dqb_curinodes;
+	}
+	spin_unlock(&dq_data_lock);
+
+	if (blimit) {
+		blimit = div_u64(blimit, buf->f_bsize);
+		busage = div_u64(busage, buf->f_bsize);
+		bfree = (blimit <= busage) ? 0 : (blimit - busage);
+		if (buf->f_blocks > blimit)
+			buf->f_blocks = blimit;
+		if (buf->f_bfree > bfree)
+			buf->f_bfree = bfree;
+		if (buf->f_bavail > bfree)
+			buf->f_bavail = bfree;
+	}
+
+	if (ilimit) {
+		ifree = (ilimit <= iusage) ? 0 : (ilimit - iusage);
+		if (buf->f_files > ilimit)
+			buf->f_files = ilimit;
+		if (buf->f_ffree > ifree)
+			buf->f_ffree = ifree;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(dquot_mangle_statfs);
+
+/*
  * Turn quota off on a device. type == -1 ==> quotaoff for all types (umount)
  */
 int dquot_disable(struct super_block *sb, int type, unsigned int flags)
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index 810b88c69c5b..cd4c02e3d17e 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -8,6 +8,7 @@
 #define _LINUX_QUOTAOPS_
 
 #include <linux/fs.h>
+#include <linux/statfs.h>
 
 /*
  * Flags for functions __dquot_alloc_space() and __dquot_free_space()
@@ -93,6 +94,7 @@ int dquot_commit_info(struct super_block *sb, int type);
 int dquot_mark_dquot_dirty(struct dquot *dquot);
 
 int dquot_file_open(struct inode *inode, struct file *file);
+int dquot_mangle_statfs(struct dentry *dentry, struct kstatfs *buf);
 
 int dquot_enable(struct inode *inode, int type, int format_id,
 	unsigned int flags);
@@ -283,6 +285,11 @@ static inline int dquot_resume(struct super_block *sb, int type)
 
 #define dquot_file_open		generic_file_open
 
+static inline int dquot_mangle_statfs(struct dentry *d, struct kstatfs *buf)
+{
+	return 0;
+}
+
 static inline int dquot_writeback_dquots(struct super_block *sb, int type)
 {
 	return 0;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 2/6] fs: protected project id
  2015-03-10 17:22 ` [PATCH RFC v2 2/6] fs: protected " Konstantin Khlebnikov
@ 2015-03-10 17:32   ` Andy Lutomirski
  2015-03-10 18:51     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2015-03-10 17:32 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Linux FS Devel, Dave Chinner, Jan Kara, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, linux-kernel, Li Xi

On Tue, Mar 10, 2015 at 10:22 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> Historically XFS project id doesn't have any permission control: file owner
> is able to set any project id. Later they was sealed with user-namespace:
> XFS allows to change it only from init user-ns. That works fine for isolated
> containers or if user doesn't have direct access to the filesystem (NFS/FTP).
>
> This patch adds sysctl fs.protected_projects which makes changing project id
> privileged operation which requires CAP_SYS_RESOURCE in current user-namespace.
> Thus there are two levels of protection: project id mapping in user-ns defines
> set of permitted projects and capability protects operations within this set.

If I understand this right, this doesn't work.  If I lack
CAP_SYS_RESOURCE but I have two projids mapped, then I can create a
new userns, map both projids, and get CAP_SYS_RESOURCE.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 2/6] fs: protected project id
  2015-03-10 17:32   ` Andy Lutomirski
@ 2015-03-10 18:51     ` Konstantin Khlebnikov
  2015-03-10 18:57       ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-10 18:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Konstantin Khlebnikov, Linux FS Devel, Dave Chinner, Jan Kara,
	linux-ext4, Theodore Ts'o, Dmitry Monakhov, linux-kernel,
	Li Xi

On Tue, Mar 10, 2015 at 8:32 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Mar 10, 2015 at 10:22 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>> Historically XFS project id doesn't have any permission control: file owner
>> is able to set any project id. Later they was sealed with user-namespace:
>> XFS allows to change it only from init user-ns. That works fine for isolated
>> containers or if user doesn't have direct access to the filesystem (NFS/FTP).
>>
>> This patch adds sysctl fs.protected_projects which makes changing project id
>> privileged operation which requires CAP_SYS_RESOURCE in current user-namespace.
>> Thus there are two levels of protection: project id mapping in user-ns defines
>> set of permitted projects and capability protects operations within this set.
>
> If I understand this right, this doesn't work.  If I lack
> CAP_SYS_RESOURCE but I have two projids mapped, then I can create a
> new userns, map both projids, and get CAP_SYS_RESOURCE.

Setting project id mapping for nested user-namespace also requires
this capability in parent namespace. The same as for setting uid/gid
mapping but without special case for mapping current uid/gid because
task has no "current" project id.

This is mentioned in cover letter but I forget it here. Sorry.

>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 2/6] fs: protected project id
  2015-03-10 18:51     ` Konstantin Khlebnikov
@ 2015-03-10 18:57       ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2015-03-10 18:57 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Konstantin Khlebnikov, Linux FS Devel, Dave Chinner, Jan Kara,
	linux-ext4, Theodore Ts'o, Dmitry Monakhov, linux-kernel,
	Li Xi

On Tue, Mar 10, 2015 at 11:51 AM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Tue, Mar 10, 2015 at 8:32 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Mar 10, 2015 at 10:22 AM, Konstantin Khlebnikov
>> <khlebnikov@yandex-team.ru> wrote:
>>> Historically XFS project id doesn't have any permission control: file owner
>>> is able to set any project id. Later they was sealed with user-namespace:
>>> XFS allows to change it only from init user-ns. That works fine for isolated
>>> containers or if user doesn't have direct access to the filesystem (NFS/FTP).
>>>
>>> This patch adds sysctl fs.protected_projects which makes changing project id
>>> privileged operation which requires CAP_SYS_RESOURCE in current user-namespace.
>>> Thus there are two levels of protection: project id mapping in user-ns defines
>>> set of permitted projects and capability protects operations within this set.
>>
>> If I understand this right, this doesn't work.  If I lack
>> CAP_SYS_RESOURCE but I have two projids mapped, then I can create a
>> new userns, map both projids, and get CAP_SYS_RESOURCE.
>
> Setting project id mapping for nested user-namespace also requires
> this capability in parent namespace. The same as for setting uid/gid
> mapping but without special case for mapping current uid/gid because
> task has no "current" project id.
>
> This is mentioned in cover letter but I forget it here. Sorry.

Right, sorry.  I'm still used to projid mappings being unprotected.

--Andy

>
>>
>> --Andy
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 1/6] fs: vfs ioctls for managing project id
  2015-03-10 17:22 ` [PATCH RFC v2 1/6] fs: vfs ioctls for managing project id Konstantin Khlebnikov
@ 2015-03-11  7:00   ` Andreas Dilger
  2015-03-11  7:19     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Dilger @ 2015-03-11  7:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	linux-kernel, Li Xi

How do your patches relate to the project quota patches from Li Xi?

Cheers, Andreas

> On Mar 10, 2015, at 13:22, Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
> 
> This patch adds generic vfs interface for project id and
> related super-block methods.
> 
> Two new ioctls:
> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
> Documentation/filesystems/Locking |    4 +++
> Documentation/filesystems/vfs.txt |    8 +++++
> fs/compat_ioctl.c                 |    2 +
> fs/ioctl.c                        |   58 +++++++++++++++++++++++++++++++++++++
> include/linux/fs.h                |    2 +
> include/uapi/linux/fs.h           |    3 ++
> 6 files changed, 77 insertions(+)
> 
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index f91926f2f482..7a01e2b606d0 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -126,6 +126,8 @@ prototypes:
>    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
>    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
>    int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> +    kprojid_t (*get_project) (struct inode *);
> +    int (*set_project) (struct inode *, kprojid_t);
> 
> locking rules:
>    All may block [not true, see below]
> @@ -147,6 +149,8 @@ show_options:        no        (namespace_sem)
> quota_read:        no        (see below)
> quota_write:        no        (see below)
> bdev_try_to_free_page:    no        (see below)
> +get_project        no
> +set_project        no        (i_mutex)
> 
> ->statfs() has s_umount (shared) when called by ustat(2) (native or
> compat), but that's an accident of bad API; s_umount is used to pin
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index 966b22829f3b..aeff3c54021b 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -230,6 +230,8 @@ struct super_operations {
>         ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
>    int (*nr_cached_objects)(struct super_block *);
>    void (*free_cached_objects)(struct super_block *, int);
> +    kprojid_t (*get_project)(struct inode *);
> +    int (*set_project)(struct inode *, kprojid_t);
> };
> 
> All methods are called without any locks being held, unless otherwise
> @@ -319,6 +321,12 @@ or bottom half).
>    implementations will cause holdoff problems due to large scan batch
>    sizes.
> 
> +  get_project: called by the ioctl and quota code to get project id of an inode.
> +    Method should return INVALID_PROJID if feature isn't supported.
> +
> +  set_project: called by the ioctl to set project id for an inode.
> +    This is called with i_mutex held.
> +
> Whoever sets up the inode is responsible for filling in the "i_op" field. This
> is a pointer to a "struct inode_operations" which describes the methods that
> can be performed on individual inodes.
> diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
> index afec6450450f..9a24e4038377 100644
> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -889,6 +889,8 @@ COMPATIBLE_IOCTL(FIOASYNC)
> COMPATIBLE_IOCTL(FIONBIO)
> COMPATIBLE_IOCTL(FIONREAD)  /* This is also TIOCINQ */
> COMPATIBLE_IOCTL(FS_IOC_FIEMAP)
> +COMPATIBLE_IOCTL(FS_IOC_GETPROJECT)
> +COMPATIBLE_IOCTL(FS_IOC_SETPROJECT)
> /* 0x00 */
> COMPATIBLE_IOCTL(FIBMAP)
> COMPATIBLE_IOCTL(FIGETBSZ)
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5d01d2638ca5..d351576d95c8 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -9,6 +9,7 @@
> #include <linux/capability.h>
> #include <linux/file.h>
> #include <linux/fs.h>
> +#include <linux/mount.h>
> #include <linux/security.h>
> #include <linux/export.h>
> #include <linux/uaccess.h>
> @@ -545,6 +546,57 @@ static int ioctl_fsthaw(struct file *filp)
>    return thaw_super(sb);
> }
> 
> +static int ioctl_getproject(struct file *filp, projid_t __user *argp)
> +{
> +    struct user_namespace *ns = current_user_ns();
> +    struct inode *inode = file_inode(filp);
> +    struct super_block *sb = inode->i_sb;
> +    kprojid_t kprojid;
> +    projid_t projid;
> +
> +    if (!sb->s_op->get_project)
> +        return -EOPNOTSUPP;
> +    kprojid = sb->s_op->get_project(inode);
> +    if (!projid_valid(kprojid))
> +        return -EOPNOTSUPP;
> +    projid = from_kprojid(ns, kprojid);
> +    if (projid == (projid_t)-1)
> +        return -EACCES;
> +    return put_user(projid, argp);
> +}
> +
> +static int ioctl_setproject(struct file *filp, projid_t __user *argp)
> +{
> +    struct user_namespace *ns = current_user_ns();
> +    struct inode *inode = file_inode(filp);
> +    struct super_block *sb = inode->i_sb;
> +    kprojid_t kprojid;
> +    projid_t projid;
> +    int ret;
> +
> +    if (!sb->s_op->set_project)
> +        return -EOPNOTSUPP;
> +    if (ns != &init_user_ns)
> +        return -EPERM;
> +    ret = get_user(projid, argp);
> +    if (ret)
> +        return ret;
> +    kprojid = make_kprojid(ns, projid);
> +    if (!projid_valid(kprojid))
> +        return -EACCES;
> +    ret = mnt_want_write_file(filp);
> +    if (ret)
> +        return ret;
> +    mutex_lock(&inode->i_mutex);
> +    if (inode_owner_or_capable(inode))
> +        ret = sb->s_op->set_project(inode, kprojid);
> +    else
> +        ret = -EPERM;
> +    mutex_unlock(&inode->i_mutex);
> +    mnt_drop_write_file(filp);
> +    return ret;
> +}
> +
> /*
>  * When you add any new common ioctls to the switches above and below
>  * please update compat_sys_ioctl() too.
> @@ -597,6 +649,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
>    case FS_IOC_FIEMAP:
>        return ioctl_fiemap(filp, arg);
> 
> +    case FS_IOC_GETPROJECT:
> +        return ioctl_getproject(filp, argp);
> +
> +    case FS_IOC_SETPROJECT:
> +        return ioctl_setproject(filp, argp);
> +
>    case FIGETBSZ:
>        return put_user(inode->i_sb->s_blocksize, argp);
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b4d71b5e1ff2..42156801739e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1655,6 +1655,8 @@ struct super_operations {
>                  struct shrink_control *);
>    long (*free_cached_objects)(struct super_block *,
>                    struct shrink_control *);
> +    kprojid_t (*get_project)(struct inode *);
> +    int (*set_project)(struct inode *, kprojid_t);
> };
> 
> /*
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 9b964a5920af..144426326e15 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -170,6 +170,9 @@ struct inodes_stat_t {
> #define FS_IOC32_GETVERSION        _IOR('v', 1, int)
> #define FS_IOC32_SETVERSION        _IOW('v', 2, int)
> 
> +#define FS_IOC_GETPROJECT        _IOR('f', 20, unsigned)
> +#define FS_IOC_SETPROJECT        _IOW('f', 21, unsigned)
> +
> /*
>  * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
>  */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 1/6] fs: vfs ioctls for managing project id
  2015-03-11  7:00   ` Andreas Dilger
@ 2015-03-11  7:19     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11  7:19 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Konstantin Khlebnikov, linux-fsdevel, Dave Chinner, Jan Kara,
	linux-ext4, Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	linux-kernel, Li Xi

On Wed, Mar 11, 2015 at 10:00 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> How do your patches relate to the project quota patches from Li Xi?

In v1 patchset was some modified patches from him, in v2 nothing left
from that code.
Probably couple hunks left in headers, for example the same magic
number is used for project quota inode. =)

>
> Cheers, Andreas
>
>> On Mar 10, 2015, at 13:22, Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
>>
>> This patch adds generic vfs interface for project id and
>> related super-block methods.
>>
>> Two new ioctls:
>> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
>> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 0/6] ext4: yet another project quota
  2015-03-10 17:22 ` Konstantin Khlebnikov
@ 2015-03-16 16:52   ` Jan Kara
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2015-03-16 16:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	linux-kernel, Li Xi

  Hello,

On Tue 10-03-15 20:22:04, Konstantin Khlebnikov wrote:
> Projects quota allows to enforce disk quota for several subtrees or even
> individual files on the filesystem. Each inode is marked with project-id
> (independently from uid and gid) and accounted into corresponding project
> quota. New files inherits project id from directory where they are created.
> 
> This is must-have feature for deploying lightweight containers.
> Also project quota can tell size of subtree without doing recursive 'du'.
> 
> This patchset adds project id and quota into ext4.
> 
> This time I've prepared patches also for e2fsprogs and quota-tools.
> 
> All patches are available at github:
> https://github.com/koct9i/linux --branch project
> https://github.com/koct9i/e2fsprogs --branch project
> https://github.com/koct9i/quota-tools --branch project
>
> Porposed behavior is similar to project quota in XFS:
> 
> * all inode are marked with project id
> * new files inherits project id from parent directory
> * project quota accounts inodes and enforces limits
> * cross-project link and rename operations are restricted
  Thanks for the patches and sorry for taking a long time to look into
them. I like your patches and they looks mostly fine to me but can we
*please* start with making things completely compatible with how XFS works?
I understand that may seem broken / not useful for your usecase but
consistency among filesystems is really important. 

I was talking with Dave Chinner last week and we agreed that the direction
you are changing things makes sense but getting things compatible with how
they were for 15 years in XFS is an important first step (because there may
be people with other usecases depending on the old behavior and they would
get confused by ext4 behaving differently). After we have compatible code
in, we can add your fs.protected_projects thing on top of that (and also
support it in XFS to stay compatible).

> Differences:
> 
> There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
> project id inheritance), instead of that project which userspace sees as '0'
> (in nested user-name space that might be actually non-zero project) acts as
> default project where restrictions for link/rename are ignored.
> (also see below, in "why new ioctl" part)
> 
> This implementation adds shortcut for moving files from one project into
> another: non-directory inodes with n_link == 1 are transferred without
> copying (XFS in this case returns -EXDEV and userspace have to copy file).
> 
> In XFS file owner (or process with CAP_FOWNER) can set set any project id,
> XFS permits changing project id only from init user-namespace.
> 
> This patchset adds sysctl fs.protected_projects. By default it's 0 and project
> id acts as XFS project. Setting it to 1 makes chaning project id priviliged
> operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
> project id mapping for nested user-namespace also requires that capability.
> Thus there are two levels of control: project id mapping in user-ns defines set
> of permitted projects and capability protects operations within this set.
> 
> I see no problems with supporting all this in XFS, all difference in interface.
> 
> Ext4 layout
> -----------
> 
> Project id introduce ro-compatible feature 'project'.
> 
> Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
> suggested by Darrick J. Wong in previous discussions of project quota).
> Linux never used that field and present fsck checks that it contains zero.
> 
> Quota information is stored in special inode №11 (by default only 10 inodes are
> reserved for special usage, I've add option to resize2fs to reserve more).
> (see e2fsprogs patches for details) For symmetry with other quotas inode number
> is stored in superblock.
> 
> Project quota supports only modern 'hidden' journaled mode.
> 
> Interface
> ---------
> 
> Interface for changing limits / query current usage is common vfs quotactl()
> where quotatype = PRJQUOTA = 2. User can query current state of any project
> mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.
> 
> Two new ioctls for getting / changing inode project id:
> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
> 
> They acts as interface for super-block methods get_project / set_project
> Generic code checks permissions, does project id translation in user-namespace
> mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
> Filesystem method only updates inode and transfers project quota.
> 
> No new mount options added. Disk usage tracking is enabled at mount.
> Limits are enabeld later by "quotaon".
> 
> (BTW why journalled quota doesn't enable limits right at the time of mounting?)
  Well, mostly historical reasons :) The biggest probably was that to
properly initialize quotas (or fix them if quota file gets corrupted) you
have to still run quotacheck and for that kernel mustn't touch the files.
At the time when I was writing journaled quotas I didn't feel like adding
quotacheck support into e2fsck... We eventually did that anyway for support
of quota in system files but how we did that was actually based on the
experience I had from the journaled quota lesson ;).

> Why new ioctls?
> ---------------
> 
> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
> But it has several flaws and doesn't fit for a role of generic interface.
> 
> It contains a lot of xfs-specific flags and racy by design: set operation
> commits all fields at once thus it's used in sequence get-change-set without
> any lock, Concurrent updates from user space will collide.
  Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
well.

> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
> project id from parent directory or not. This flag is barely useful and only
> makes everything complicated. Even tools in xfsprogs don't use it: they always
> set it together with project id and clears when set project id back to zero.
  I think this is about balance - adding the support for the PROJINHERIT flag
costs us close to nothing and we get a compatibility with XFS. So even though
the usefulness of that flag is doubtful, I think the additional
compatibility is worth it.

> And the main reason: this compatibility gives nothing. The only user of xfs
> ioctl which I've found is the xfsprogs. And these tools check filesystem name
> and don't work anywhere except 'xfs'.
  The fact that you didn't find any other user doesn't mean there isn't
any. And historically if we though "nobody could be relying on this", we
were sometimes proven wrong - a few years later when it was *much* more
painful to make things compatible.

Here we speak about new feature for the filesystem so compatibility
requirements aren't that strong but still I find the case that the same
ioctl will just work on ext4 as it does on xfs more appealing than creating
a new simpler generic ioctl. That way userspace doesn't have to care on
which filesystem it is working or whether the current kernel already
supports the new ioctl for XFS. So please use the XFS interface is Li Xi
does in his patches.

								Honza

> Links
> -----
> 
> [1] 2014-12-09  ext4: add project quota support by Li Xi
> http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2
> 
> [2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
> http://marc.info/?l=linux-ext4&m=139089109605795&w=2
> 
> [3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
> http://thread.gmane.org/gmane.linux.file-systems/65752
> 
> [4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
> http://thread.gmane.org/gmane.comp.file-systems.ext4/17530
> 
> 
> ---
> 
> Konstantin Khlebnikov (6):
>       fs: vfs ioctls for managing project id
>       fs: protected project id
>       quota: generic project quota
>       ext4: support project id and project quota
>       ext4: add shortcut for moving files across projects
>       ext4: mangle statfs results accourding to project quota usage and limits
> 
> 
>  Documentation/filesystems/Locking |    4 +
>  Documentation/filesystems/vfs.txt |    8 +++
>  Documentation/sysctl/fs.txt       |   16 ++++++
>  fs/compat_ioctl.c                 |    2 +
>  fs/ext4/ext4.h                    |   15 ++++-
>  fs/ext4/ialloc.c                  |    3 +
>  fs/ext4/inode.c                   |   15 +++++
>  fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
>  fs/ext4/super.c                   |   61 ++++++++++++++++++++--
>  fs/ioctl.c                        |   62 ++++++++++++++++++++++
>  fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
>  fs/quota/quota.c                  |    8 ++-
>  fs/quota/quotaio_v2.h             |    6 +-
>  include/linux/fs.h                |    3 +
>  include/linux/quota.h             |    1 
>  include/linux/quotaops.h          |   16 ++++++
>  include/uapi/linux/capability.h   |    1 
>  include/uapi/linux/fs.h           |    3 +
>  include/uapi/linux/quota.h        |    6 +-
>  kernel/sysctl.c                   |    9 +++
>  kernel/user_namespace.c           |    4 +
>  21 files changed, 416 insertions(+), 25 deletions(-)
> 
> --
> Signature
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 0/6] ext4: yet another project quota
@ 2015-03-16 16:52   ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2015-03-16 16:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, Dave Chinner, Jan Kara, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	linux-kernel, Li Xi

  Hello,

On Tue 10-03-15 20:22:04, Konstantin Khlebnikov wrote:
> Projects quota allows to enforce disk quota for several subtrees or even
> individual files on the filesystem. Each inode is marked with project-id
> (independently from uid and gid) and accounted into corresponding project
> quota. New files inherits project id from directory where they are created.
> 
> This is must-have feature for deploying lightweight containers.
> Also project quota can tell size of subtree without doing recursive 'du'.
> 
> This patchset adds project id and quota into ext4.
> 
> This time I've prepared patches also for e2fsprogs and quota-tools.
> 
> All patches are available at github:
> https://github.com/koct9i/linux --branch project
> https://github.com/koct9i/e2fsprogs --branch project
> https://github.com/koct9i/quota-tools --branch project
>
> Porposed behavior is similar to project quota in XFS:
> 
> * all inode are marked with project id
> * new files inherits project id from parent directory
> * project quota accounts inodes and enforces limits
> * cross-project link and rename operations are restricted
  Thanks for the patches and sorry for taking a long time to look into
them. I like your patches and they looks mostly fine to me but can we
*please* start with making things completely compatible with how XFS works?
I understand that may seem broken / not useful for your usecase but
consistency among filesystems is really important. 

I was talking with Dave Chinner last week and we agreed that the direction
you are changing things makes sense but getting things compatible with how
they were for 15 years in XFS is an important first step (because there may
be people with other usecases depending on the old behavior and they would
get confused by ext4 behaving differently). After we have compatible code
in, we can add your fs.protected_projects thing on top of that (and also
support it in XFS to stay compatible).

> Differences:
> 
> There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
> project id inheritance), instead of that project which userspace sees as '0'
> (in nested user-name space that might be actually non-zero project) acts as
> default project where restrictions for link/rename are ignored.
> (also see below, in "why new ioctl" part)
> 
> This implementation adds shortcut for moving files from one project into
> another: non-directory inodes with n_link == 1 are transferred without
> copying (XFS in this case returns -EXDEV and userspace have to copy file).
> 
> In XFS file owner (or process with CAP_FOWNER) can set set any project id,
> XFS permits changing project id only from init user-namespace.
> 
> This patchset adds sysctl fs.protected_projects. By default it's 0 and project
> id acts as XFS project. Setting it to 1 makes chaning project id priviliged
> operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
> project id mapping for nested user-namespace also requires that capability.
> Thus there are two levels of control: project id mapping in user-ns defines set
> of permitted projects and capability protects operations within this set.
> 
> I see no problems with supporting all this in XFS, all difference in interface.
> 
> Ext4 layout
> -----------
> 
> Project id introduce ro-compatible feature 'project'.
> 
> Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
> suggested by Darrick J. Wong in previous discussions of project quota).
> Linux never used that field and present fsck checks that it contains zero.
> 
> Quota information is stored in special inode №11 (by default only 10 inodes are
> reserved for special usage, I've add option to resize2fs to reserve more).
> (see e2fsprogs patches for details) For symmetry with other quotas inode number
> is stored in superblock.
> 
> Project quota supports only modern 'hidden' journaled mode.
> 
> Interface
> ---------
> 
> Interface for changing limits / query current usage is common vfs quotactl()
> where quotatype = PRJQUOTA = 2. User can query current state of any project
> mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.
> 
> Two new ioctls for getting / changing inode project id:
> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
> 
> They acts as interface for super-block methods get_project / set_project
> Generic code checks permissions, does project id translation in user-namespace
> mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
> Filesystem method only updates inode and transfers project quota.
> 
> No new mount options added. Disk usage tracking is enabled at mount.
> Limits are enabeld later by "quotaon".
> 
> (BTW why journalled quota doesn't enable limits right at the time of mounting?)
  Well, mostly historical reasons :) The biggest probably was that to
properly initialize quotas (or fix them if quota file gets corrupted) you
have to still run quotacheck and for that kernel mustn't touch the files.
At the time when I was writing journaled quotas I didn't feel like adding
quotacheck support into e2fsck... We eventually did that anyway for support
of quota in system files but how we did that was actually based on the
experience I had from the journaled quota lesson ;).

> Why new ioctls?
> ---------------
> 
> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
> But it has several flaws and doesn't fit for a role of generic interface.
> 
> It contains a lot of xfs-specific flags and racy by design: set operation
> commits all fields at once thus it's used in sequence get-change-set without
> any lock, Concurrent updates from user space will collide.
  Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
well.

> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
> project id from parent directory or not. This flag is barely useful and only
> makes everything complicated. Even tools in xfsprogs don't use it: they always
> set it together with project id and clears when set project id back to zero.
  I think this is about balance - adding the support for the PROJINHERIT flag
costs us close to nothing and we get a compatibility with XFS. So even though
the usefulness of that flag is doubtful, I think the additional
compatibility is worth it.

> And the main reason: this compatibility gives nothing. The only user of xfs
> ioctl which I've found is the xfsprogs. And these tools check filesystem name
> and don't work anywhere except 'xfs'.
  The fact that you didn't find any other user doesn't mean there isn't
any. And historically if we though "nobody could be relying on this", we
were sometimes proven wrong - a few years later when it was *much* more
painful to make things compatible.

Here we speak about new feature for the filesystem so compatibility
requirements aren't that strong but still I find the case that the same
ioctl will just work on ext4 as it does on xfs more appealing than creating
a new simpler generic ioctl. That way userspace doesn't have to care on
which filesystem it is working or whether the current kernel already
supports the new ioctl for XFS. So please use the XFS interface is Li Xi
does in his patches.

								Honza

> Links
> -----
> 
> [1] 2014-12-09  ext4: add project quota support by Li Xi
> http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2
> 
> [2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
> http://marc.info/?l=linux-ext4&m=139089109605795&w=2
> 
> [3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
> http://thread.gmane.org/gmane.linux.file-systems/65752
> 
> [4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
> http://thread.gmane.org/gmane.comp.file-systems.ext4/17530
> 
> 
> ---
> 
> Konstantin Khlebnikov (6):
>       fs: vfs ioctls for managing project id
>       fs: protected project id
>       quota: generic project quota
>       ext4: support project id and project quota
>       ext4: add shortcut for moving files across projects
>       ext4: mangle statfs results accourding to project quota usage and limits
> 
> 
>  Documentation/filesystems/Locking |    4 +
>  Documentation/filesystems/vfs.txt |    8 +++
>  Documentation/sysctl/fs.txt       |   16 ++++++
>  fs/compat_ioctl.c                 |    2 +
>  fs/ext4/ext4.h                    |   15 ++++-
>  fs/ext4/ialloc.c                  |    3 +
>  fs/ext4/inode.c                   |   15 +++++
>  fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
>  fs/ext4/super.c                   |   61 ++++++++++++++++++++--
>  fs/ioctl.c                        |   62 ++++++++++++++++++++++
>  fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
>  fs/quota/quota.c                  |    8 ++-
>  fs/quota/quotaio_v2.h             |    6 +-
>  include/linux/fs.h                |    3 +
>  include/linux/quota.h             |    1 
>  include/linux/quotaops.h          |   16 ++++++
>  include/uapi/linux/capability.h   |    1 
>  include/uapi/linux/fs.h           |    3 +
>  include/uapi/linux/quota.h        |    6 +-
>  kernel/sysctl.c                   |    9 +++
>  kernel/user_namespace.c           |    4 +
>  21 files changed, 416 insertions(+), 25 deletions(-)
> 
> --
> Signature
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 0/6] ext4: yet another project quota
  2015-03-16 16:52   ` Jan Kara
@ 2015-03-17  5:40     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-17  5:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Konstantin Khlebnikov, linux-fsdevel, Dave Chinner, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	Linux Kernel Mailing List, Li Xi

On Mon, Mar 16, 2015 at 7:52 PM, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 10-03-15 20:22:04, Konstantin Khlebnikov wrote:
>> Projects quota allows to enforce disk quota for several subtrees or even
>> individual files on the filesystem. Each inode is marked with project-id
>> (independently from uid and gid) and accounted into corresponding project
>> quota. New files inherits project id from directory where they are created.
>>
>> This is must-have feature for deploying lightweight containers.
>> Also project quota can tell size of subtree without doing recursive 'du'.
>>
>> This patchset adds project id and quota into ext4.
>>
>> This time I've prepared patches also for e2fsprogs and quota-tools.
>>
>> All patches are available at github:
>> https://github.com/koct9i/linux --branch project
>> https://github.com/koct9i/e2fsprogs --branch project
>> https://github.com/koct9i/quota-tools --branch project
>>
>> Porposed behavior is similar to project quota in XFS:
>>
>> * all inode are marked with project id
>> * new files inherits project id from parent directory
>> * project quota accounts inodes and enforces limits
>> * cross-project link and rename operations are restricted
>
>   Thanks for the patches and sorry for taking a long time to look into
> them. I like your patches and they looks mostly fine to me but can we
> *please* start with making things completely compatible with how XFS works?
> I understand that may seem broken / not useful for your usecase but
> consistency among filesystems is really important.
>
> I was talking with Dave Chinner last week and we agreed that the direction
> you are changing things makes sense but getting things compatible with how
> they were for 15 years in XFS is an important first step (because there may
> be people with other usecases depending on the old behavior and they would
> get confused by ext4 behaving differently). After we have compatible code
> in, we can add your fs.protected_projects thing on top of that (and also
> support it in XFS to stay compatible).
>
>
>> Differences:
>>
>> There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
>> project id inheritance), instead of that project which userspace sees as '0'
>> (in nested user-name space that might be actually non-zero project) acts as
>> default project where restrictions for link/rename are ignored.
>> (also see below, in "why new ioctl" part)
>>
>> This implementation adds shortcut for moving files from one project into
>> another: non-directory inodes with n_link == 1 are transferred without
>> copying (XFS in this case returns -EXDEV and userspace have to copy file).
>>
>> In XFS file owner (or process with CAP_FOWNER) can set set any project id,
>> XFS permits changing project id only from init user-namespace.
>>
>> This patchset adds sysctl fs.protected_projects. By default it's 0 and project
>> id acts as XFS project. Setting it to 1 makes chaning project id priviliged
>> operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
>> project id mapping for nested user-namespace also requires that capability.
>> Thus there are two levels of control: project id mapping in user-ns defines set
>> of permitted projects and capability protects operations within this set.
>>
>> I see no problems with supporting all this in XFS, all difference in interface.
>>
>> Ext4 layout
>> -----------
>>
>> Project id introduce ro-compatible feature 'project'.
>>
>> Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
>> suggested by Darrick J. Wong in previous discussions of project quota).
>> Linux never used that field and present fsck checks that it contains zero.
>>
>> Quota information is stored in special inode №11 (by default only 10 inodes are
>> reserved for special usage, I've add option to resize2fs to reserve more).
>> (see e2fsprogs patches for details) For symmetry with other quotas inode number
>> is stored in superblock.
>>
>> Project quota supports only modern 'hidden' journaled mode.
>>
>> Interface
>> ---------
>>
>> Interface for changing limits / query current usage is common vfs quotactl()
>> where quotatype = PRJQUOTA = 2. User can query current state of any project
>> mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.
>>
>> Two new ioctls for getting / changing inode project id:
>> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
>> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
>>
>> They acts as interface for super-block methods get_project / set_project
>> Generic code checks permissions, does project id translation in user-namespace
>> mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
>> Filesystem method only updates inode and transfers project quota.
>>
>> No new mount options added. Disk usage tracking is enabled at mount.
>> Limits are enabeld later by "quotaon".
>>
>> (BTW why journalled quota doesn't enable limits right at the time of mounting?)
>   Well, mostly historical reasons :) The biggest probably was that to
> properly initialize quotas (or fix them if quota file gets corrupted) you
> have to still run quotacheck and for that kernel mustn't touch the files.
> At the time when I was writing journaled quotas I didn't feel like adding
> quotacheck support into e2fsck... We eventually did that anyway for support
> of quota in system files but how we did that was actually based on the
> experience I had from the journaled quota lesson ;).
>
>> Why new ioctls?
>> ---------------
>>
>> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
>> But it has several flaws and doesn't fit for a role of generic interface.
>>
>> It contains a lot of xfs-specific flags and racy by design: set operation
>> commits all fields at once thus it's used in sequence get-change-set without
>> any lock, Concurrent updates from user space will collide.

>   Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
> well.

In this case we care only about project id. We can make interface clear,
non-racy and do it in just one syscall instead of two.

>
>> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
>> project id from parent directory or not. This flag is barely useful and only
>> makes everything complicated. Even tools in xfsprogs don't use it: they always
>> set it together with project id and clears when set project id back to zero.
>   I think this is about balance - adding the support for the PROJINHERIT flag
> costs us close to nothing and we get a compatibility with XFS. So even though
> the usefulness of that flag is doubtful, I think the additional
> compatibility is worth it.

There're only few free bits left. This bit gives nothing, xfsprogs can change it
but project quota management doesn't uses it. It gives nothing but complication.

Also this bit makes difference only for accounting directories itself
and it's internal
structures. This is pretty fs-specific thing.

>
>> And the main reason: this compatibility gives nothing. The only user of xfs
>> ioctl which I've found is the xfsprogs. And these tools check filesystem name
>> and don't work anywhere except 'xfs'.
>   The fact that you didn't find any other user doesn't mean there isn't
> any. And historically if we though "nobody could be relying on this", we
> were sometimes proven wrong - a few years later when it was *much* more
> painful to make things compatible.
>
> Here we speak about new feature for the filesystem so compatibility
> requirements aren't that strong but still I find the case that the same
> ioctl will just work on ext4 as it does on xfs more appealing than creating
> a new simpler generic ioctl. That way userspace doesn't have to care on
> which filesystem it is working or whether the current kernel already
> supports the new ioctl for XFS. So please use the XFS interface is Li Xi
> does in his patches.

Please consider this as _new_ feature which has compatible behaviour
from user's point of view. That's perfect time to fix its birth defects and
redesign behavior to satisfy expectations of new users. I do not care
about migration from xfs to ext4, that's not my case. I do not care about
mythical closed source software which might use this interface.

As I already said ioctl compatibility gives nothing but legacy problems.
Unexpected apperance of partial compatibility might be a nightmare.
And I've seen a lot when I worked with mixed containerized evironments.
it emerges untested and unexpected combinations of old and new software.
But, of course not in this case because nobody uses this interface except
xfsprogs. And as I said without update it works only with XFS.

Also that's ioctl is so poorly designed for generic usage so resulting code
looks like a shit. I just cannot add this into the kernel without any
good reason.

I've implemented interface and behaviour xfs-compatible enough to add
support of that into xfsprogs/xfstests in few lines. That's my next step after
polishing support in quota-tools.

>
>                                                                 Honza
>
>> Links
>> -----
>>
>> [1] 2014-12-09  ext4: add project quota support by Li Xi
>> http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2
>>
>> [2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
>> http://marc.info/?l=linux-ext4&m=139089109605795&w=2
>>
>> [3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
>> http://thread.gmane.org/gmane.linux.file-systems/65752
>>
>> [4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
>> http://thread.gmane.org/gmane.comp.file-systems.ext4/17530
>>
>>
>> ---
>>
>> Konstantin Khlebnikov (6):
>>       fs: vfs ioctls for managing project id
>>       fs: protected project id
>>       quota: generic project quota
>>       ext4: support project id and project quota
>>       ext4: add shortcut for moving files across projects
>>       ext4: mangle statfs results accourding to project quota usage and limits
>>
>>
>>  Documentation/filesystems/Locking |    4 +
>>  Documentation/filesystems/vfs.txt |    8 +++
>>  Documentation/sysctl/fs.txt       |   16 ++++++
>>  fs/compat_ioctl.c                 |    2 +
>>  fs/ext4/ext4.h                    |   15 ++++-
>>  fs/ext4/ialloc.c                  |    3 +
>>  fs/ext4/inode.c                   |   15 +++++
>>  fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
>>  fs/ext4/super.c                   |   61 ++++++++++++++++++++--
>>  fs/ioctl.c                        |   62 ++++++++++++++++++++++
>>  fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
>>  fs/quota/quota.c                  |    8 ++-
>>  fs/quota/quotaio_v2.h             |    6 +-
>>  include/linux/fs.h                |    3 +
>>  include/linux/quota.h             |    1
>>  include/linux/quotaops.h          |   16 ++++++
>>  include/uapi/linux/capability.h   |    1
>>  include/uapi/linux/fs.h           |    3 +
>>  include/uapi/linux/quota.h        |    6 +-
>>  kernel/sysctl.c                   |    9 +++
>>  kernel/user_namespace.c           |    4 +
>>  21 files changed, 416 insertions(+), 25 deletions(-)
>>
>> --
>> Signature
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 0/6] ext4: yet another project quota
@ 2015-03-17  5:40     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-17  5:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Konstantin Khlebnikov, linux-fsdevel, Dave Chinner, linux-ext4,
	Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	Linux Kernel Mailing List, Li Xi

On Mon, Mar 16, 2015 at 7:52 PM, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 10-03-15 20:22:04, Konstantin Khlebnikov wrote:
>> Projects quota allows to enforce disk quota for several subtrees or even
>> individual files on the filesystem. Each inode is marked with project-id
>> (independently from uid and gid) and accounted into corresponding project
>> quota. New files inherits project id from directory where they are created.
>>
>> This is must-have feature for deploying lightweight containers.
>> Also project quota can tell size of subtree without doing recursive 'du'.
>>
>> This patchset adds project id and quota into ext4.
>>
>> This time I've prepared patches also for e2fsprogs and quota-tools.
>>
>> All patches are available at github:
>> https://github.com/koct9i/linux --branch project
>> https://github.com/koct9i/e2fsprogs --branch project
>> https://github.com/koct9i/quota-tools --branch project
>>
>> Porposed behavior is similar to project quota in XFS:
>>
>> * all inode are marked with project id
>> * new files inherits project id from parent directory
>> * project quota accounts inodes and enforces limits
>> * cross-project link and rename operations are restricted
>
>   Thanks for the patches and sorry for taking a long time to look into
> them. I like your patches and they looks mostly fine to me but can we
> *please* start with making things completely compatible with how XFS works?
> I understand that may seem broken / not useful for your usecase but
> consistency among filesystems is really important.
>
> I was talking with Dave Chinner last week and we agreed that the direction
> you are changing things makes sense but getting things compatible with how
> they were for 15 years in XFS is an important first step (because there may
> be people with other usecases depending on the old behavior and they would
> get confused by ext4 behaving differently). After we have compatible code
> in, we can add your fs.protected_projects thing on top of that (and also
> support it in XFS to stay compatible).
>
>
>> Differences:
>>
>> There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
>> project id inheritance), instead of that project which userspace sees as '0'
>> (in nested user-name space that might be actually non-zero project) acts as
>> default project where restrictions for link/rename are ignored.
>> (also see below, in "why new ioctl" part)
>>
>> This implementation adds shortcut for moving files from one project into
>> another: non-directory inodes with n_link == 1 are transferred without
>> copying (XFS in this case returns -EXDEV and userspace have to copy file).
>>
>> In XFS file owner (or process with CAP_FOWNER) can set set any project id,
>> XFS permits changing project id only from init user-namespace.
>>
>> This patchset adds sysctl fs.protected_projects. By default it's 0 and project
>> id acts as XFS project. Setting it to 1 makes chaning project id priviliged
>> operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
>> project id mapping for nested user-namespace also requires that capability.
>> Thus there are two levels of control: project id mapping in user-ns defines set
>> of permitted projects and capability protects operations within this set.
>>
>> I see no problems with supporting all this in XFS, all difference in interface.
>>
>> Ext4 layout
>> -----------
>>
>> Project id introduce ro-compatible feature 'project'.
>>
>> Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
>> suggested by Darrick J. Wong in previous discussions of project quota).
>> Linux never used that field and present fsck checks that it contains zero.
>>
>> Quota information is stored in special inode №11 (by default only 10 inodes are
>> reserved for special usage, I've add option to resize2fs to reserve more).
>> (see e2fsprogs patches for details) For symmetry with other quotas inode number
>> is stored in superblock.
>>
>> Project quota supports only modern 'hidden' journaled mode.
>>
>> Interface
>> ---------
>>
>> Interface for changing limits / query current usage is common vfs quotactl()
>> where quotatype = PRJQUOTA = 2. User can query current state of any project
>> mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.
>>
>> Two new ioctls for getting / changing inode project id:
>> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
>> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
>>
>> They acts as interface for super-block methods get_project / set_project
>> Generic code checks permissions, does project id translation in user-namespace
>> mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
>> Filesystem method only updates inode and transfers project quota.
>>
>> No new mount options added. Disk usage tracking is enabled at mount.
>> Limits are enabeld later by "quotaon".
>>
>> (BTW why journalled quota doesn't enable limits right at the time of mounting?)
>   Well, mostly historical reasons :) The biggest probably was that to
> properly initialize quotas (or fix them if quota file gets corrupted) you
> have to still run quotacheck and for that kernel mustn't touch the files.
> At the time when I was writing journaled quotas I didn't feel like adding
> quotacheck support into e2fsck... We eventually did that anyway for support
> of quota in system files but how we did that was actually based on the
> experience I had from the journaled quota lesson ;).
>
>> Why new ioctls?
>> ---------------
>>
>> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
>> But it has several flaws and doesn't fit for a role of generic interface.
>>
>> It contains a lot of xfs-specific flags and racy by design: set operation
>> commits all fields at once thus it's used in sequence get-change-set without
>> any lock, Concurrent updates from user space will collide.

>   Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
> well.

In this case we care only about project id. We can make interface clear,
non-racy and do it in just one syscall instead of two.

>
>> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
>> project id from parent directory or not. This flag is barely useful and only
>> makes everything complicated. Even tools in xfsprogs don't use it: they always
>> set it together with project id and clears when set project id back to zero.
>   I think this is about balance - adding the support for the PROJINHERIT flag
> costs us close to nothing and we get a compatibility with XFS. So even though
> the usefulness of that flag is doubtful, I think the additional
> compatibility is worth it.

There're only few free bits left. This bit gives nothing, xfsprogs can change it
but project quota management doesn't uses it. It gives nothing but complication.

Also this bit makes difference only for accounting directories itself
and it's internal
structures. This is pretty fs-specific thing.

>
>> And the main reason: this compatibility gives nothing. The only user of xfs
>> ioctl which I've found is the xfsprogs. And these tools check filesystem name
>> and don't work anywhere except 'xfs'.
>   The fact that you didn't find any other user doesn't mean there isn't
> any. And historically if we though "nobody could be relying on this", we
> were sometimes proven wrong - a few years later when it was *much* more
> painful to make things compatible.
>
> Here we speak about new feature for the filesystem so compatibility
> requirements aren't that strong but still I find the case that the same
> ioctl will just work on ext4 as it does on xfs more appealing than creating
> a new simpler generic ioctl. That way userspace doesn't have to care on
> which filesystem it is working or whether the current kernel already
> supports the new ioctl for XFS. So please use the XFS interface is Li Xi
> does in his patches.

Please consider this as _new_ feature which has compatible behaviour
from user's point of view. That's perfect time to fix its birth defects and
redesign behavior to satisfy expectations of new users. I do not care
about migration from xfs to ext4, that's not my case. I do not care about
mythical closed source software which might use this interface.

As I already said ioctl compatibility gives nothing but legacy problems.
Unexpected apperance of partial compatibility might be a nightmare.
And I've seen a lot when I worked with mixed containerized evironments.
it emerges untested and unexpected combinations of old and new software.
But, of course not in this case because nobody uses this interface except
xfsprogs. And as I said without update it works only with XFS.

Also that's ioctl is so poorly designed for generic usage so resulting code
looks like a shit. I just cannot add this into the kernel without any
good reason.

I've implemented interface and behaviour xfs-compatible enough to add
support of that into xfsprogs/xfstests in few lines. That's my next step after
polishing support in quota-tools.

>
>                                                                 Honza
>
>> Links
>> -----
>>
>> [1] 2014-12-09  ext4: add project quota support by Li Xi
>> http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2
>>
>> [2] 2014-01-28  A draft for making ext4 support project quota by Zheng Liu
>> http://marc.info/?l=linux-ext4&m=139089109605795&w=2
>>
>> [3] 2012-07-09  introduce extended inode owner identifier v10 by Dmitry Monakhov
>> http://thread.gmane.org/gmane.linux.file-systems/65752
>>
>> [4] 2010-02-08  Introduce subtree quota support by Dmitry Monakhov
>> http://thread.gmane.org/gmane.comp.file-systems.ext4/17530
>>
>>
>> ---
>>
>> Konstantin Khlebnikov (6):
>>       fs: vfs ioctls for managing project id
>>       fs: protected project id
>>       quota: generic project quota
>>       ext4: support project id and project quota
>>       ext4: add shortcut for moving files across projects
>>       ext4: mangle statfs results accourding to project quota usage and limits
>>
>>
>>  Documentation/filesystems/Locking |    4 +
>>  Documentation/filesystems/vfs.txt |    8 +++
>>  Documentation/sysctl/fs.txt       |   16 ++++++
>>  fs/compat_ioctl.c                 |    2 +
>>  fs/ext4/ext4.h                    |   15 ++++-
>>  fs/ext4/ialloc.c                  |    3 +
>>  fs/ext4/inode.c                   |   15 +++++
>>  fs/ext4/namei.c                   |  102 ++++++++++++++++++++++++++++++++++++-
>>  fs/ext4/super.c                   |   61 ++++++++++++++++++++--
>>  fs/ioctl.c                        |   62 ++++++++++++++++++++++
>>  fs/quota/dquot.c                  |   96 +++++++++++++++++++++++++++++++++--
>>  fs/quota/quota.c                  |    8 ++-
>>  fs/quota/quotaio_v2.h             |    6 +-
>>  include/linux/fs.h                |    3 +
>>  include/linux/quota.h             |    1
>>  include/linux/quotaops.h          |   16 ++++++
>>  include/uapi/linux/capability.h   |    1
>>  include/uapi/linux/fs.h           |    3 +
>>  include/uapi/linux/quota.h        |    6 +-
>>  kernel/sysctl.c                   |    9 +++
>>  kernel/user_namespace.c           |    4 +
>>  21 files changed, 416 insertions(+), 25 deletions(-)
>>
>> --
>> Signature
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 0/6] ext4: yet another project quota
  2015-03-17  5:40     ` Konstantin Khlebnikov
  (?)
@ 2015-03-19  9:16     ` Jan Kara
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2015-03-19  9:16 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Jan Kara, Konstantin Khlebnikov, linux-fsdevel, Dave Chinner,
	linux-ext4, Theodore Ts'o, Dmitry Monakhov, Andy Lutomirski,
	Linux Kernel Mailing List, Li Xi

On Tue 17-03-15 08:40:19, Konstantin Khlebnikov wrote:
> On Mon, Mar 16, 2015 at 7:52 PM, Jan Kara <jack@suse.cz> wrote:
> >> Why new ioctls?
> >> ---------------
> >>
> >> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
> >> But it has several flaws and doesn't fit for a role of generic interface.
> >>
> >> It contains a lot of xfs-specific flags and racy by design: set operation
> >> commits all fields at once thus it's used in sequence get-change-set without
> >> any lock, Concurrent updates from user space will collide.
> 
> >   Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
> > well.
> 
> In this case we care only about project id. We can make interface clear,
> non-racy and do it in just one syscall instead of two.
> 
> >
> >> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
> >> project id from parent directory or not. This flag is barely useful and only
> >> makes everything complicated. Even tools in xfsprogs don't use it: they always
> >> set it together with project id and clears when set project id back to zero.
> >   I think this is about balance - adding the support for the PROJINHERIT flag
> > costs us close to nothing and we get a compatibility with XFS. So even though
> > the usefulness of that flag is doubtful, I think the additional
> > compatibility is worth it.
> 
> There're only few free bits left. This bit gives nothing, xfsprogs can change it
> but project quota management doesn't uses it. It gives nothing but complication.
> 
> Also this bit makes difference only for accounting directories itself
> and it's internal structures. This is pretty fs-specific thing.
> 
> >
> >> And the main reason: this compatibility gives nothing. The only user of xfs
> >> ioctl which I've found is the xfsprogs. And these tools check filesystem name
> >> and don't work anywhere except 'xfs'.
> >   The fact that you didn't find any other user doesn't mean there isn't
> > any. And historically if we though "nobody could be relying on this", we
> > were sometimes proven wrong - a few years later when it was *much* more
> > painful to make things compatible.
> >
> > Here we speak about new feature for the filesystem so compatibility
> > requirements aren't that strong but still I find the case that the same
> > ioctl will just work on ext4 as it does on xfs more appealing than creating
> > a new simpler generic ioctl. That way userspace doesn't have to care on
> > which filesystem it is working or whether the current kernel already
> > supports the new ioctl for XFS. So please use the XFS interface is Li Xi
> > does in his patches.
> 
> Please consider this as _new_ feature which has compatible behaviour
> from user's point of view. That's perfect time to fix its birth defects and
> redesign behavior to satisfy expectations of new users. I do not care
> about migration from xfs to ext4, that's not my case. I do not care about
> mythical closed source software which might use this interface.
> 
> As I already said ioctl compatibility gives nothing but legacy problems.
> Unexpected apperance of partial compatibility might be a nightmare.
> And I've seen a lot when I worked with mixed containerized evironments.
> it emerges untested and unexpected combinations of old and new software.
> But, of course not in this case because nobody uses this interface except
> xfsprogs. And as I said without update it works only with XFS.
> 
> Also that's ioctl is so poorly designed for generic usage so resulting code
> looks like a shit. I just cannot add this into the kernel without any
> good reason.
> 
> I've implemented interface and behaviour xfs-compatible enough to add
> support of that into xfsprogs/xfstests in few lines. That's my next step
> after polishing support in quota-tools.
  Well, you are certainly free to write the feature any way you like. My
experience tells me that writing code compatible between filesystems pays
off even though the code isn't elegant. Ultimately, it's upto Ted as an
ext4 maintainer to decide which approach to take.

For now I've merged Li Xi's first patch which enables project quotas in VFS
quota core and that's independent of this decision.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-03-19  9:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-10 17:22 [PATCH RFC v2 0/6] ext4: yet another project quota Konstantin Khlebnikov
2015-03-10 17:22 ` Konstantin Khlebnikov
2015-03-10 17:22 ` [PATCH RFC v2 1/6] fs: vfs ioctls for managing project id Konstantin Khlebnikov
2015-03-11  7:00   ` Andreas Dilger
2015-03-11  7:19     ` Konstantin Khlebnikov
2015-03-10 17:22 ` [PATCH RFC v2 2/6] fs: protected " Konstantin Khlebnikov
2015-03-10 17:32   ` Andy Lutomirski
2015-03-10 18:51     ` Konstantin Khlebnikov
2015-03-10 18:57       ` Andy Lutomirski
2015-03-10 17:22 ` [PATCH RFC v2 3/6] quota: generic project quota Konstantin Khlebnikov
2015-03-10 17:22 ` [PATCH RFC v2 4/6] ext4: support project id and " Konstantin Khlebnikov
2015-03-10 17:22 ` [PATCH RFC v2 5/6] ext4: add shortcut for moving files across projects Konstantin Khlebnikov
2015-03-10 17:22 ` [PATCH RFC v2 6/6] ext4: mangle statfs results accourding to project quota usage and limits Konstantin Khlebnikov
2015-03-16 16:52 ` [PATCH RFC v2 0/6] ext4: yet another project quota Jan Kara
2015-03-16 16:52   ` Jan Kara
2015-03-17  5:40   ` Konstantin Khlebnikov
2015-03-17  5:40     ` Konstantin Khlebnikov
2015-03-19  9:16     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.