linux-kernel.vger.kernel.org archive mirror
* [PATCH] ovl: do not ignore disk quota if current task is not privileged
@ 2017-01-10 11:26 Konstantin Khlebnikov
  2017-01-10 14:46 ` Vivek Goyal
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 11:26 UTC
  To: linux-fsdevel, linux-kernel, linux-unionfs; +Cc: Vivek Goyal, Miklos Szeredi

If overlay was mounted by root, then a quota set for the upper layer does not
work, because overlay now always uses the mounter's credentials for operations.

This patch adds a second copy of the credentials without CAP_SYS_RESOURCE and
uses it if the current task doesn't have this capability in the mounter's
user-ns. This affects creation of new files, whiteouts, and copy-up operations.

Now quota limits are ignored only if both the mounter and the current task have
the CAP_SYS_RESOURCE capability in the root user namespace.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/ovl_entry.h |    2 ++
 fs/overlayfs/super.c     |   13 ++++++++++++-
 fs/overlayfs/util.c      |   10 +++++++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index d14bca1850d9..55eb3b08e292 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -27,6 +27,8 @@ struct ovl_fs {
 	struct ovl_config config;
 	/* creds of process who forced instantiation of super block */
 	const struct cred *creator_cred;
+	/* the same credentials without CAP_SYS_RESOURCE */
+	const struct cred *creator_cred_unpriv;
 };
 
 /* private information held for every overlayfs dentry */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 20f48abbb82f..6a15693641e0 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -157,6 +157,7 @@ static void ovl_put_super(struct super_block *sb)
 	kfree(ufs->config.upperdir);
 	kfree(ufs->config.workdir);
 	put_cred(ufs->creator_cred);
+	put_cred(ufs->creator_cred_unpriv);
 	kfree(ufs);
 }
 
@@ -701,6 +702,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	unsigned int stacklen = 0;
 	unsigned int i;
 	bool remote = false;
+	struct cred *cred;
 	int err;
 
 	err = -ENOMEM;
@@ -874,10 +876,17 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ufs->creator_cred)
 		goto out_put_lower_mnt;
 
+	cred = prepare_creds();
+	if (!cred)
+		goto out_put_cred;
+
+	ufs->creator_cred_unpriv = cred;
+	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
+
 	err = -ENOMEM;
 	oe = ovl_alloc_entry(numlower);
 	if (!oe)
-		goto out_put_cred;
+		goto out_put_cred_unpriv;
 
 	sb->s_magic = OVERLAYFS_SUPER_MAGIC;
 	sb->s_op = &ovl_super_operations;
@@ -914,6 +923,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 out_free_oe:
 	kfree(oe);
+out_put_cred_unpriv:
+	put_cred(ufs->creator_cred_unpriv);
 out_put_cred:
 	put_cred(ufs->creator_cred);
 out_put_lower_mnt:
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 952286f4826c..92f60096c5da 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -35,8 +35,16 @@ struct dentry *ovl_workdir(struct dentry *dentry)
 const struct cred *ovl_override_creds(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
+	const struct cred *cred = ofs->creator_cred;
 
-	return override_creds(ofs->creator_cred);
+	/*
+	 * Do not override quota inode limit if current task is not
+	 * capable to do that in mounter's user namespace.
+	 */
+	if (!ns_capable_noaudit(cred->user_ns, CAP_SYS_RESOURCE))
+		cred = ofs->creator_cred_unpriv;
+
+	return override_creds(cred);
 }
 
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
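
The selection logic in ovl_override_creds() above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
kernel code: the model_* types and functions are hypothetical names made up
for illustration, and the caller_capable flag stands in for the result of
ns_capable_noaudit(cred->user_ns, CAP_SYS_RESOURCE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cred: only the bit this patch cares about. */
struct model_cred {
	bool cap_sys_resource;
};

/* Hypothetical stand-in for the two credential copies kept in struct ovl_fs. */
struct model_ovl_fs {
	struct model_cred creator_cred;        /* mounter's creds saved at mount time */
	struct model_cred creator_cred_unpriv; /* same creds, CAP_SYS_RESOURCE lowered */
};

/* Mount time: save the mounter's creds and derive the unprivileged copy. */
static void model_fill_super(struct model_ovl_fs *ofs,
			     const struct model_cred *mounter)
{
	assert(ofs && mounter);
	ofs->creator_cred = *mounter;
	ofs->creator_cred_unpriv = *mounter;
	ofs->creator_cred_unpriv.cap_sys_resource = false;
}

/*
 * Operation time: choose which saved creds are used on the upper layer.
 * caller_capable models ns_capable_noaudit(cred->user_ns, CAP_SYS_RESOURCE).
 */
static const struct model_cred *
model_override_creds(const struct model_ovl_fs *ofs, bool caller_capable)
{
	if (!caller_capable)
		return &ofs->creator_cred_unpriv;
	return &ofs->creator_cred;
}

/* Quota is bypassed only if the chosen creds still carry the capability. */
static bool model_quota_bypassed(const struct model_ovl_fs *ofs,
				 bool caller_capable)
{
	return model_override_creds(ofs, caller_capable)->cap_sys_resource;
}
```

In this model the bypass happens only when both the mounter and the calling
task hold the capability, which matches the behaviour described in the commit
message.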


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 11:26 [PATCH] ovl: do not ignore disk quota if current task is not privileged Konstantin Khlebnikov
@ 2017-01-10 14:46 ` Vivek Goyal
  2017-01-10 15:57   ` Miklos Szeredi
  2017-01-10 16:06 ` Vivek Goyal
  2017-01-10 18:30 ` [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials Konstantin Khlebnikov
  2 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2017-01-10 14:46 UTC
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, Miklos Szeredi

On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
> If overlay was mounted by root then quota set for upper layer does not work
> because overlay now always use mounter's credentials for operations.
> 
> This patch adds second copy of credentials without CAP_SYS_RESOURCE and
> use it if current task doesn't have this capability in mounter's user-ns.
> This affects creation new files, whiteouts, and copy-up operations.
> 
> Now quota limits are ignored only if both mounter and current task have
> capability CAP_SYS_RESOURCE in root user namespace.

This makes sense to me. I too would like quota to take effect for
containers on overlay.

I will test it. 

Vivek



* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 14:46 ` Vivek Goyal
@ 2017-01-10 15:57   ` Miklos Szeredi
  2017-01-10 16:34     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2017-01-10 15:57 UTC
  To: Vivek Goyal
  Cc: Konstantin Khlebnikov, linux-fsdevel, linux-kernel, linux-unionfs

On Tue, Jan 10, 2017 at 3:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>> If overlay was mounted by root then quota set for upper layer does not work
>> because overlay now always use mounter's credentials for operations.
>>
>> This patch adds second copy of credentials without CAP_SYS_RESOURCE and
>> use it if current task doesn't have this capability in mounter's user-ns.
>> This affects creation new files, whiteouts, and copy-up operations.
>>
>> Now quota limits are ignored only if both mounter and current task have
>> capability CAP_SYS_RESOURCE in root user namespace.
>
> This makes sense to me. I too would like quota to take effect for
> containers on overlay.

At first sight I hated this patch.  It breaks the nice concept that
underlying filesystems are just storage for the overlay and don't care
about caller's privileges (as a block device wouldn't care about
caller's privileges when allocating space).

However I don't see a good way around this, so...

Looks like this also has an effect on reserving space in ext4; not sure
what that entails.

Thanks,
Miklos


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 11:26 [PATCH] ovl: do not ignore disk quota if current task is not privileged Konstantin Khlebnikov
  2017-01-10 14:46 ` Vivek Goyal
@ 2017-01-10 16:06 ` Vivek Goyal
  2017-01-10 16:30   ` Konstantin Khlebnikov
  2017-01-10 16:35   ` Vivek Goyal
  2017-01-10 18:30 ` [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials Konstantin Khlebnikov
  2 siblings, 2 replies; 16+ messages in thread
From: Vivek Goyal @ 2017-01-10 16:06 UTC
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, Miklos Szeredi

On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
> If overlay was mounted by root then quota set for upper layer does not work
> because overlay now always use mounter's credentials for operations.
> 

Hi Konstantin,

So CAP_SYS_RESOURCE bypasses the quota checks?

I just created the dir upper on an xfs filesystem and defined a quota of 1G,
and as the root user (with CAP_SYS_RESOURCE) I am not able to create a file
bigger than 1G in that dir. So it looks like the xfs quota took effect even
for a privileged user with CAP_SYS_RESOURCE set.

What am I missing?

Vivek



* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 16:06 ` Vivek Goyal
@ 2017-01-10 16:30   ` Konstantin Khlebnikov
  2017-01-10 16:35   ` Vivek Goyal
  1 sibling, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 16:30 UTC
  To: Vivek Goyal; +Cc: linux-fsdevel, linux-kernel, linux-unionfs, Miklos Szeredi

On 10.01.2017 19:06, Vivek Goyal wrote:
> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>> If overlay was mounted by root then quota set for upper layer does not work
>> because overlay now always use mounter's credentials for operations.
>>
>
> Hi Konstantin,
>
> So CAP_SYS_RESOURCE bypasses the quota checks?

Yep. See fs/quota/dquot.c:

static int ignore_hardlimit(struct dquot *dquot)
{
	struct mem_dqinfo *info = &sb_dqopt(dquot->dq_sb)->info[dquot->dq_id.type];

	return capable(CAP_SYS_RESOURCE) &&
	       (info->dqi_format->qf_fmt_id != QFMT_VFS_OLD ||
		!(info->dqi_flags & DQF_ROOT_SQUASH));
}

The DQF_ROOT_SQUASH feature, which disables this bypass, is obsolete in modern quota formats.
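
To make the condition in ignore_hardlimit() easier to read, here is a
user-space sketch of the same boolean structure. It is an illustration only:
struct model_quota_info and model_ignore_hardlimit() are hypothetical names,
and the two flags are assumed to stand for the qf_fmt_id == QFMT_VFS_OLD
comparison and the DQF_ROOT_SQUASH bit from the function quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified inputs; the kernel derives these from the dquot. */
struct model_quota_info {
	bool old_vfs_format; /* models qf_fmt_id == QFMT_VFS_OLD */
	bool root_squash;    /* models dqi_flags & DQF_ROOT_SQUASH */
};

/*
 * Same boolean structure as ignore_hardlimit(): the hard limit is ignored
 * when the task has CAP_SYS_RESOURCE, unless the quota file uses the old
 * format with root squash enabled.
 */
static bool model_ignore_hardlimit(bool has_cap_sys_resource,
				   const struct model_quota_info *info)
{
	assert(info);
	return has_cap_sys_resource &&
	       (!info->old_vfs_format || !info->root_squash);
}
```

With a modern quota format the second operand is always true in this model,
so the outcome depends only on whether the task holds the capability.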

>
> I just created dir upper on xfs filesystem and defined quota of 1G and
> as root user (with cap_sys_resoureces), I am not able to create file
> bigger than 1g in that dir. So looks like xfs quota took affect even
> for privileged user with CAP_SYS_RESOURCE set.
>
> What am I missing?

XFS has its own quota implementation and I cannot find any
capable(CAP_SYS_RESOURCE) check in it. So XFS probably ignores this
capability and always applies limits to the root user.



-- 
Konstantin


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 15:57   ` Miklos Szeredi
@ 2017-01-10 16:34     ` Konstantin Khlebnikov
  2017-01-10 18:11       ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 16:34 UTC
  To: Miklos Szeredi, Vivek Goyal; +Cc: linux-fsdevel, linux-kernel, linux-unionfs

On 10.01.2017 18:57, Miklos Szeredi wrote:
> On Tue, Jan 10, 2017 at 3:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>>> If overlay was mounted by root then quota set for upper layer does not work
>>> because overlay now always use mounter's credentials for operations.
>>>
>>> This patch adds second copy of credentials without CAP_SYS_RESOURCE and
>>> use it if current task doesn't have this capability in mounter's user-ns.
>>> This affects creation new files, whiteouts, and copy-up operations.
>>>
>>> Now quota limits are ignored only if both mounter and current task have
>>> capability CAP_SYS_RESOURCE in root user namespace.
>>
>> This makes sense to me. I too would like quota to take effect for
>> containers on overlay.
>
> At first sight I hated this patch.  It breaks the nice concept that
> underlying filesystems are just storage for the overlay and don't care
> about caller's privileges (as a block device wouldn't care about
> caller's privileges when allocating space).
>
> However I don't see a good way around this, so...

Another solution: just always drop CAP_SYS_RESOURCE from capabilities.

> Looks like this also has effect on reserving space in ext4, not sure
> what that entails.

Yes, CAP_SYS_RESOURCE allows the use of reserved space and inodes.
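
The idea of a reserve that only privileged callers may consume can be sketched
in user space as follows. This is a deliberately simplified model and not the
ext4 allocator: struct model_fs and model_may_allocate() are hypothetical
names, and the real check involves more inputs than CAP_SYS_RESOURCE alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a filesystem with a reserved area. */
struct model_fs {
	uint64_t free_blocks;     /* blocks currently free */
	uint64_t reserved_blocks; /* blocks held back for privileged callers */
};

/* Decide whether an allocation of "want" blocks is allowed in this model. */
static bool model_may_allocate(const struct model_fs *fs, uint64_t want,
			       bool has_cap_sys_resource)
{
	assert(fs);
	/* Nobody can allocate more than what is physically free. */
	if (want > fs->free_blocks)
		return false;
	/* Enough space remains outside the reserve: anyone may allocate. */
	if (fs->free_blocks - want >= fs->reserved_blocks)
		return true;
	/* The allocation would dip into the reserve: privileged callers only. */
	return has_cap_sys_resource;
}
```

If overlay always acts with credentials that carry the capability, every
caller takes the last branch as a privileged one, so the reserve offers no
protection against ordinary users in this model.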

-- 
Konstantin


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 16:06 ` Vivek Goyal
  2017-01-10 16:30   ` Konstantin Khlebnikov
@ 2017-01-10 16:35   ` Vivek Goyal
  2017-01-10 16:44     ` Konstantin Khlebnikov
  1 sibling, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2017-01-10 16:35 UTC
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, Miklos Szeredi

On Tue, Jan 10, 2017 at 11:06:47AM -0500, Vivek Goyal wrote:
> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
> > If overlay was mounted by root then quota set for upper layer does not work
> > because overlay now always use mounter's credentials for operations.
> > 
> 
> Hi Konstantin,
> 
> So CAP_SYS_RESOURCE bypasses the quota checks?
> 
> I just created dir upper on xfs filesystem and defined quota of 1G and
> as root user (with cap_sys_resoureces), I am not able to create file
> bigger than 1g in that dir. So looks like xfs quota took affect even
> for privileged user with CAP_SYS_RESOURCE set.
> 
> What am I missing?

As the root user, I set a limit of 1G on the work/ and upper/ dirs, created
a 4G file in the lower/ dir, mounted overlay on the "merged" dir,
ran "touch zerofile", and got an error message.

touch: cannot touch 'zerofile': No space left on device

So it looks like I am doing something wrong and am not able to reproduce
the issue to begin with.

Vivek


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 16:35   ` Vivek Goyal
@ 2017-01-10 16:44     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 16:44 UTC
  To: Vivek Goyal; +Cc: linux-fsdevel, linux-kernel, linux-unionfs, Miklos Szeredi

On 10.01.2017 19:35, Vivek Goyal wrote:
> On Tue, Jan 10, 2017 at 11:06:47AM -0500, Vivek Goyal wrote:
>> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>>> If overlay was mounted by root then quota set for upper layer does not work
>>> because overlay now always use mounter's credentials for operations.
>>>
>>
>> Hi Konstantin,
>>
>> So CAP_SYS_RESOURCE bypasses the quota checks?
>>
>> I just created dir upper on xfs filesystem and defined quota of 1G and
>> as root user (with cap_sys_resoureces), I am not able to create file
>> bigger than 1g in that dir. So looks like xfs quota took affect even
>> for privileged user with CAP_SYS_RESOURCE set.
>>
>> What am I missing?
>
> As root user, I set limit of 1G on work/ and upper/ dir and created
> a file of 4G in lower/ dir and mounted overlay on "merged" dir and
> did "touch zerofile", and go error message.
>
> touch: cannot touch 'zerofile': No space left on device
>
> So looks like I am doing something wrong and not be able to reproduce
> the issue to begin with.
>

I'm using ext4 project quota around the upper/work directories.
ext4 uses the generic quota implementation: it ignores hard limits if
CAP_SYS_RESOURCE is set.

It seems XFS ignores CAP_SYS_RESOURCE for any kind of quota.
But I've found a special meaning for CAP_FOWNER: chown() can ignore user/group
quota if the caller has this capability.

-- 
Konstantin


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 16:34     ` Konstantin Khlebnikov
@ 2017-01-10 18:11       ` Amir Goldstein
  2017-01-10 18:30         ` Konstantin Khlebnikov
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2017-01-10 18:11 UTC
  To: Konstantin Khlebnikov
  Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, linux-kernel, linux-unionfs

On Tue, Jan 10, 2017 at 6:34 PM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> On 10.01.2017 18:57, Miklos Szeredi wrote:
>>
>> On Tue, Jan 10, 2017 at 3:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>>
>>> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>>>>
>>>> If overlay was mounted by root then quota set for upper layer does not work
>>>> because overlay now always use mounter's credentials for operations.
>>>>
>>>> This patch adds second copy of credentials without CAP_SYS_RESOURCE and
>>>> use it if current task doesn't have this capability in mounter's user-ns.
>>>> This affects creation new files, whiteouts, and copy-up operations.
>>>>
>>>> Now quota limits are ignored only if both mounter and current task have
>>>> capability CAP_SYS_RESOURCE in root user namespace.
>>>
>>>
>>> This makes sense to me. I too would like quota to take effect for
>>> containers on overlay.
>>
>>
>> At first sight I hated this patch.  It breaks the nice concept that
>> underlying filesystems are just storage for the overlay and don't care
>> about caller's privileges (as a block device wouldn't care about
>> caller's privileges when allocating space).
>>
>> However I don't see a good way around this, so...
>
>
> Another solution: just always drop CAP_SYS_RESOURCE from capabilities.
>

That sounds like a better (and simpler) solution.

Let overlayfs support mount options noquota|quota (default configurable
from Kconfig and module param) and 'quota' means drop CAP_SYS_RESOURCE.


>> Looks like this also has effect on reserving space in ext4, not sure
>> what that entails.
>
>
> Yes, CAP_SYS_RESOURCE allows to use reserved space and inodes.
>

That's really not good. It goes beyond disobeying user quotas, because the
file system may end up in an unrecoverable state if corruption is detected
after the root reserved space has already been filled.

Amir.


* Re: [PATCH] ovl: do not ignore disk quota if current task is not privileged
  2017-01-10 18:11       ` Amir Goldstein
@ 2017-01-10 18:30         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 18:30 UTC
  To: Amir Goldstein
  Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, linux-kernel, linux-unionfs

On 10.01.2017 21:11, Amir Goldstein wrote:
> On Tue, Jan 10, 2017 at 6:34 PM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>>
>> On 10.01.2017 18:57, Miklos Szeredi wrote:
>>>
>>> On Tue, Jan 10, 2017 at 3:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>
>>>> On Tue, Jan 10, 2017 at 02:26:48PM +0300, Konstantin Khlebnikov wrote:
>>>>>
>>>>> If overlay was mounted by root then quota set for upper layer does not work
>>>>> because overlay now always use mounter's credentials for operations.
>>>>>
>>>>> This patch adds second copy of credentials without CAP_SYS_RESOURCE and
>>>>> use it if current task doesn't have this capability in mounter's user-ns.
>>>>> This affects creation new files, whiteouts, and copy-up operations.
>>>>>
>>>>> Now quota limits are ignored only if both mounter and current task have
>>>>> capability CAP_SYS_RESOURCE in root user namespace.
>>>>
>>>>
>>>> This makes sense to me. I too would like quota to take effect for
>>>> containers on overlay.
>>>
>>>
>>> At first sight I hated this patch.  It breaks the nice concept that
>>> underlying filesystems are just storage for the overlay and don't care
>>> about caller's privileges (as a block device wouldn't care about
>>> caller's privileges when allocating space).
>>>
>>> However I don't see a good way around this, so...
>>
>>
>> Another solution: just always drop CAP_SYS_RESOURCE from capabilities.
>>
>
> That sounds like a better (and simpler) solution.
>
> Let overlayfs support mount options noquota|quota (default configurable
> from Kconfig and module param) and 'quota' means drop CAP_SYS_RESOURCE.

Too complicated for me. Let's drop it unconditionally. See v2 patch.

>
>
>>> Looks like this also has effect on reserving space in ext4, not sure
>>> what that entails.
>>
>>
>> Yes, CAP_SYS_RESOURCE allows to use reserved space and inodes.
>>
>
> That's really not good. It's beyond disobeying user quotas, because
> file system may get to unrecoverable state when corruption is detected
> and already filled the root reserved space.
>
> Amir.
>


-- 
Konstantin


* [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-01-10 11:26 [PATCH] ovl: do not ignore disk quota if current task is not privileged Konstantin Khlebnikov
  2017-01-10 14:46 ` Vivek Goyal
  2017-01-10 16:06 ` Vivek Goyal
@ 2017-01-10 18:30 ` Konstantin Khlebnikov
  2017-01-10 19:17   ` Vivek Goyal
  2 siblings, 1 reply; 16+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-10 18:30 UTC
  To: linux-fsdevel, linux-kernel, linux-unionfs
  Cc: Amir Goldstein, Vivek Goyal, Miklos Szeredi

If overlay was mounted by root, then a quota set for the upper layer does not
work, because overlay now always uses the mounter's credentials for operations.
Overlay might also deplete reserved space and inodes in ext4.

This patch drops CAP_SYS_RESOURCE from the saved credentials.
This affects creation of new files, whiteouts, and copy-up operations.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/super.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 20f48abbb82f..8dba982e1af5 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -701,6 +701,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	unsigned int stacklen = 0;
 	unsigned int i;
 	bool remote = false;
+	struct cred *cred;
 	int err;
 
 	err = -ENOMEM;
@@ -870,10 +871,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else
 		sb->s_d_op = &ovl_dentry_operations;
 
-	ufs->creator_cred = prepare_creds();
-	if (!ufs->creator_cred)
+	cred = prepare_creds();
+	if (!cred)
 		goto out_put_lower_mnt;
 
+	/* Never override disk quota limits or use reserved space */
+	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
+	ufs->creator_cred = cred;
+
 	err = -ENOMEM;
 	oe = ovl_alloc_entry(numlower);
 	if (!oe)

^ permalink raw reply related	[flat|nested] 16+ messages in thread
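
A minimal userspace sketch of what the v2 patch above does to the saved
credentials, assuming a capability set can be modelled as a plain 64-bit
mask. The names struct model_cred, model_cap_lower(), model_cap_raised()
and model_save_creator_cred() are hypothetical stand-ins for illustration
and are not kernel APIs; only the capability number is taken from the uapi
header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability number from include/uapi/linux/capability.h. */
#define CAP_SYS_RESOURCE 24

/* Hypothetical stand-in for struct cred: only the effective set is kept. */
struct model_cred {
	uint64_t cap_effective;
};

static inline uint64_t model_cap_bit(int cap)
{
	return (uint64_t)1 << cap;
}

/* Userspace analogue of cap_lower(): clear one capability bit. */
static inline void model_cap_lower(struct model_cred *cred, int cap)
{
	cred->cap_effective &= ~model_cap_bit(cap);
}

/* Report whether a capability bit is set in the effective set. */
static inline bool model_cap_raised(const struct model_cred *cred, int cap)
{
	return (cred->cap_effective & model_cap_bit(cap)) != 0;
}

/*
 * Models the mount-time step in v2: copy the mounter's credentials and
 * drop CAP_SYS_RESOURCE from the copy. The mounter's own set is untouched.
 */
static inline struct model_cred
model_save_creator_cred(const struct model_cred *mounter)
{
	struct model_cred saved = *mounter;

	model_cap_lower(&saved, CAP_SYS_RESOURCE);
	return saved;
}
```

In this model every later overlay operation that runs with the saved copy
lacks CAP_SYS_RESOURCE regardless of what the calling task holds, which is
the behaviour discussed in the replies below.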

* Re: [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-01-10 18:30 ` [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials Konstantin Khlebnikov
@ 2017-01-10 19:17   ` Vivek Goyal
  2017-07-22  9:30     ` Amir Goldstein
  2017-07-24  8:19     ` Miklos Szeredi
  0 siblings, 2 replies; 16+ messages in thread
From: Vivek Goyal @ 2017-01-10 19:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, Amir Goldstein,
	Miklos Szeredi

On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
> If an overlay was mounted by root, a quota set for the upper layer does
> not work because overlay now always uses the mounter's credentials for
> operations. Overlay might also deplete reserved space and inodes in ext4.
> 
> This patch drops the CAP_SYS_RESOURCE capability from the saved credentials.
> This affects the creation of new files, whiteouts, and copy-up operations.
> 

I am not an expert in this area, but I thought the previous patch was
better. I am not sure why overlay internal operations should be
done without CAP_SYS_RESOURCE when the caller has CAP_SYS_RESOURCE. That
might be counter-intuitive.

If a task is allowed to bypass quota limitations on a file system,
then the same should be true when the task is working on overlay.

Similarly, if a task is allowed to use reserved space on a filesystem, then
the same task should be allowed to use reserved space on the underlying
filesystem when using overlay.  Should it be overlay's job to prevent that?

Maybe it is just me....

Vivek



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-01-10 19:17   ` Vivek Goyal
@ 2017-07-22  9:30     ` Amir Goldstein
  2017-07-24  8:15       ` Miklos Szeredi
  2017-07-24  8:19     ` Miklos Szeredi
  1 sibling, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2017-07-22  9:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Konstantin Khlebnikov, linux-fsdevel, linux-kernel, overlayfs,
	Miklos Szeredi, stable [v4.8]

Bumped into this patch (now upstream commit 51f8f3c4e225) and realized
it is missing Cc: stable # v4.8

At least this Docker issue suggests that the regression introduced in v4.8
will not be appreciated down the road:
https://github.com/moby/moby/issues/29364


On Tue, Jan 10, 2017 at 9:17 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
>> If an overlay was mounted by root, a quota set for the upper layer does
>> not work because overlay now always uses the mounter's credentials for
>> operations. Overlay might also deplete reserved space and inodes in ext4.
>>
>> This patch drops the CAP_SYS_RESOURCE capability from the saved credentials.
>> This affects the creation of new files, whiteouts, and copy-up operations.
>>
>
> I am not an expert in this area, but I thought the previous patch was
> better. I am not sure why overlay internal operations should be
> done without CAP_SYS_RESOURCE when the caller has CAP_SYS_RESOURCE. That
> might be counter-intuitive.
>
> If a task is allowed to bypass quota limitations on a file system,
> then the same should be true when the task is working on overlay.
>
> Similarly, if a task is allowed to use reserved space on a filesystem, then
> the same task should be allowed to use reserved space on the underlying
> filesystem when using overlay.  Should it be overlay's job to prevent that?
>
> Maybe it is just me....
>

Vivek,

Since your question was not answered in this thread: IMO your concern
is justified, but in practice I think that:
1. Combining the mounter's creds with the calling process's capabilities
    for every operation is going to be harder to implement, and a bit weird.
2. The use case of ext4 reserved blocks is to allow the sysadmin some slack
    for disk allocations that are needed in order to free up disk space or for
    other critical tasks to prevent the system from hanging. It doesn't sound
    like this use case fits an overlayfs mount that well.
3. FYI, xfs project quota (which, as you know, can be applied to a docker
    overlayfs container) does not check CAP_SYS_RESOURCE at all.
    If and when ext4 project quotas can also be applied to a docker
    overlayfs container, I am sure that container admins will not appreciate
    a container exceeding its quota, even if that was a privileged process
    writing to that container.

So IMO the fix as it is is good for all practical purposes.

Cheers,
Amir.


^ permalink raw reply	[flat|nested] 16+ messages in thread
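
The difference between the two patch versions debated above can be
sketched as a small decision model. This is only an illustration under
the assumption that each operation boils down to choosing one of two saved
credential sets; the enum and the model_*() helpers are hypothetical names
and do not exist in overlayfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Which saved credential set an overlay operation would run with. */
enum model_cred_kind {
	MODEL_CRED_PRIV,	/* mounter's creds as saved at mount time */
	MODEL_CRED_UNPRIV,	/* same creds without CAP_SYS_RESOURCE */
};

/*
 * v1 idea: keep both copies and pick per operation, depending on whether
 * the calling task itself holds CAP_SYS_RESOURCE.
 */
static inline enum model_cred_kind
model_v1_pick(bool caller_has_cap_sys_resource)
{
	return caller_has_cap_sys_resource ? MODEL_CRED_PRIV
					   : MODEL_CRED_UNPRIV;
}

/* v2 idea: only the stripped copy exists, so the caller is irrelevant. */
static inline enum model_cred_kind
model_v2_pick(bool caller_has_cap_sys_resource)
{
	(void)caller_has_cap_sys_resource;
	return MODEL_CRED_UNPRIV;
}

/*
 * Limits can only be bypassed when the chosen copy is the privileged one
 * and the mounter actually had the capability to begin with.
 */
static inline bool
model_may_bypass_limits(enum model_cred_kind kind,
			bool mounter_has_cap_sys_resource)
{
	return kind == MODEL_CRED_PRIV && mounter_has_cap_sys_resource;
}
```

Under v1 the bypass is possible only when both the mounter and the caller
hold the capability; under v2 it is never possible through overlay, which
is the simplification Amir argues for.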

* Re: [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-07-22  9:30     ` Amir Goldstein
@ 2017-07-24  8:15       ` Miklos Szeredi
  2017-07-25  3:47         ` Greg KH
  0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2017-07-24  8:15 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Vivek Goyal, Konstantin Khlebnikov, linux-fsdevel, linux-kernel,
	overlayfs, stable [v4.8]

On Sat, Jul 22, 2017 at 11:30 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Bumped into this patch (now upstream commit 51f8f3c4e225) and realized
> it is missing Cc: stable # v4.8
>
> At least this Docker issue suggests that the regression introduced in v4.8
> will not be appreciated down the road:
> https://github.com/moby/moby/issues/29364

Greg,

Can you please queue 51f8f3c4e225 ("ovl: drop CAP_SYS_RESOURCE from
saved mounter's credentials") for 4.9.y?

Thanks,
Miklos



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-01-10 19:17   ` Vivek Goyal
  2017-07-22  9:30     ` Amir Goldstein
@ 2017-07-24  8:19     ` Miklos Szeredi
  1 sibling, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2017-07-24  8:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Konstantin Khlebnikov, linux-fsdevel, linux-kernel,
	linux-unionfs, Amir Goldstein

On Tue, Jan 10, 2017 at 8:17 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
>> If an overlay was mounted by root, a quota set for the upper layer does
>> not work because overlay now always uses the mounter's credentials for
>> operations. Overlay might also deplete reserved space and inodes in ext4.
>>
>> This patch drops the CAP_SYS_RESOURCE capability from the saved credentials.
>> This affects the creation of new files, whiteouts, and copy-up operations.
>>
>
> I am not an expert in this area, but I thought the previous patch was
> better. I am not sure why overlay internal operations should be
> done without CAP_SYS_RESOURCE when the caller has CAP_SYS_RESOURCE. That
> might be counter-intuitive.
>
> If a task is allowed to bypass quota limitations on a file system,
> then the same should be true when the task is working on overlay.
>
> Similarly, if a task is allowed to use reserved space on a filesystem, then
> the same task should be allowed to use reserved space on the underlying
> filesystem when using overlay.  Should it be overlay's job to prevent that?
>
> Maybe it is just me....

Well, it depends on how you look at it.  Overlayfs just uses the
underlying filesystem as storage.  So this patch basically asserts
that overlayfs can only use non-reserved space for its storage.  I
don't see a problem with that, and it's the simpler fix, but if real use
cases turn up then this can be revisited.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 16+ messages in thread
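
To illustrate the "non-reserved space only" point in the message above,
here is a deliberately simplified model of a reserved-block check. It is
not ext4's actual allocation code: struct model_fs and model_can_allocate()
are hypothetical, and real filesystems apply additional conditions. The
only assumption carried over from the thread is that CAP_SYS_RESOURCE is
what lets an allocation dip into the reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of a filesystem's block accounting. */
struct model_fs {
	uint64_t free_blocks;		/* blocks currently unallocated */
	uint64_t reserved_blocks;	/* portion of the free blocks held back */
};

/*
 * Decide whether an allocation of nblocks may proceed.
 * Without CAP_SYS_RESOURCE the reserve must stay intact; with it, any
 * free block may be used.
 */
static inline bool model_can_allocate(const struct model_fs *fs,
				      uint64_t nblocks,
				      bool has_cap_sys_resource)
{
	uint64_t left;

	if (nblocks > fs->free_blocks)
		return false;		/* not enough space even for root */

	left = fs->free_blocks - nblocks;
	if (left >= fs->reserved_blocks)
		return true;		/* reserve stays untouched */

	return has_cap_sys_resource;	/* only privileged users may dig in */
}
```

With the capability dropped from the saved credentials, every allocation
made on behalf of overlay takes the unprivileged path in this model, so
the reserve on the upper filesystem is left for the administrator.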

* Re: [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
  2017-07-24  8:15       ` Miklos Szeredi
@ 2017-07-25  3:47         ` Greg KH
  0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2017-07-25  3:47 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Vivek Goyal, Konstantin Khlebnikov,
	linux-fsdevel, linux-kernel, overlayfs, stable [v4.8]

On Mon, Jul 24, 2017 at 10:15:22AM +0200, Miklos Szeredi wrote:
> On Sat, Jul 22, 2017 at 11:30 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > Bumped into this patch (now upstream commit 51f8f3c4e225) and realized
> > it is missing Cc: stable # v4.8
> >
> > At least this Docker issue suggests that the regression introduced in v4.8
> > will not be appreciated down the road:
> > https://github.com/moby/moby/issues/29364
> 
> Greg,
> 
> Can you please queue 51f8f3c4e225 ("ovl: drop CAP_SYS_RESOURCE from
> saved mounter's credentials") for 4.9.y?

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-07-25  3:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-10 11:26 [PATCH] ovl: do not ignore disk quota if current task is not privileged Konstantin Khlebnikov
2017-01-10 14:46 ` Vivek Goyal
2017-01-10 15:57   ` Miklos Szeredi
2017-01-10 16:34     ` Konstantin Khlebnikov
2017-01-10 18:11       ` Amir Goldstein
2017-01-10 18:30         ` Konstantin Khlebnikov
2017-01-10 16:06 ` Vivek Goyal
2017-01-10 16:30   ` Konstantin Khlebnikov
2017-01-10 16:35   ` Vivek Goyal
2017-01-10 16:44     ` Konstantin Khlebnikov
2017-01-10 18:30 ` [PATCH v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials Konstantin Khlebnikov
2017-01-10 19:17   ` Vivek Goyal
2017-07-22  9:30     ` Amir Goldstein
2017-07-24  8:15       ` Miklos Szeredi
2017-07-25  3:47         ` Greg KH
2017-07-24  8:19     ` Miklos Szeredi
