linux-unionfs.vger.kernel.org archive mirror
* Inode limitation for overlayfs
@ 2020-03-26  5:45 Chengguang Xu
  2020-03-26  7:34 ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2020-03-26  5:45 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, cgxu519

Hello,

In the container use case, we would like to add an inode limit for containers, in order to prevent a particular container from exhausting inodes on the host file system.
However, the current solution for limiting inodes is based on project quota in the underlying filesystem, so it also counts deleted files (the char-device whiteout files) in overlayfs's upper layer.
Even worse, users may delete lower-layer files hoping to free up inodes, but the result is the opposite: each deletion consumes an additional inode for the whiteout.

This is somewhat different from the disk-size limit for overlayfs, so I think we could add a limit option that counts only new files in overlayfs. What do you think?
If you think this is a reasonable solution, I can post an RFC patch for review.
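
To make the idea more concrete, here is a rough sketch of the kind of
check I have in mind (all names are hypothetical, this is not from an
actual patch):

/*
 * Hypothetical sketch: charge the limit only for creation of new
 * upper files, not for copy-ups or whiteouts.  The inodes_used and
 * inodes_max fields and the helper name are invented for illustration.
 */
static int ovl_charge_new_inode(struct ovl_fs *ofs)
{
	if (!ofs->config.inodes_max)
		return 0;	/* no limit configured */

	if (atomic_long_inc_return(&ofs->inodes_used) >
	    ofs->config.inodes_max) {
		atomic_long_dec(&ofs->inodes_used);
		return -ENOSPC;
	}
	return 0;
}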


Thanks,
cgxu


* Re: Inode limitation for overlayfs
  2020-03-26  5:45 Inode limitation for overlayfs Chengguang Xu
@ 2020-03-26  7:34 ` Amir Goldstein
  2020-03-27  5:18   ` Chengguang Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2020-03-26  7:34 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: linux-unionfs, miklos

On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Hello,
>
> On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
> However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
> Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
>
> It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?

The questions are where we store the accounting and how we maintain it.
One answer to both could be: in the inode index.

Currently, with nfs_export=on, there is already an index dir containing:
- 1 hardlink per copied-up non-dir inode
- 1 directory per copied-up directory
- 1 whiteout per whiteout in upperdir (not a hardlink)

We can also make this behavior independent of nfs_export feature.
In the past, I proposed the option index=all for this behavior.

On mount, in ovl_indexdir_cleanup(), the index entries for file/dir/whiteout
can be counted and then maintained on index add/remove.
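
Something like this could do the classification during that walk (the
counter fields are hypothetical; d_is_dir() and ovl_is_whiteout() are
assumed from the current VFS/overlayfs helpers):

/*
 * Sketch: classify each index entry while ovl_indexdir_cleanup()
 * walks the index dir on mount; the same counters would then be
 * bumped/dropped on every index add/remove.
 */
static void ovl_count_index_entry(struct ovl_fs *ofs, struct dentry *index)
{
	if (d_is_dir(index))
		atomic_long_inc(&ofs->dir_index_count);
	else if (ovl_is_whiteout(index))
		atomic_long_inc(&ofs->whiteout_count);
	else
		atomic_long_inc(&ofs->nondir_index_count);
}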

Now if you combine that with project quotas on upper/work dir, you get:

<Total upper/work inodes> = <pure upper inodes> + <non-dir index count> +
                            2*<dir index count> + 2*<whiteout index count>

Assuming that you know the total from project quotas and the index counts
from overlayfs, you can calculate total pure upper.
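
Spelled out as arithmetic (a trivial userspace helper, just to
illustrate the bookkeeping, not overlayfs code):

/* Derive the pure upper inode count from the project quota total and
 * the overlayfs index counts, per the formula above. */
static long pure_upper_inodes(long total, long nondir_index,
			      long dir_index, long whiteout_index)
{
	return total - nondir_index - 2 * dir_index - 2 * whiteout_index;
}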

Now you *can* implement an upper inodes quota within overlayfs, but you
can also do that without changing overlayfs at all, assuming you can
allow some slack in quota enforcement -
periodically scan the index dir and adjust the project quota limits.

Note that if inodes are really expensive on your system, index_all
wastes 1 inode per whiteout + 1 inode per copied-up dir, but those
counts should be pretty small compared to the number of pure upper
inodes and copied-up files.

Thanks,
Amir.


* Re: Inode limitation for overlayfs
  2020-03-26  7:34 ` Amir Goldstein
@ 2020-03-27  5:18   ` Chengguang Xu
  2020-03-27  9:45     ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2020-03-27  5:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-unionfs, miklos

 ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@gmail.com> wrote ----
 > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > > Hello,
 > >
 > > On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
 > > However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
 > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
 > >
 > > It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
 > 
 > The questions are where do we store the accounting and how do we maintain them.
 > An answer to those questions could be - in the inode index:
 > 
 > Currently, with nfs_export=on, there is already an index dir containing:
 > - 1 hardlink per copied up non-dir inode
 > - 1 directory per copied-up directory
 > - 1 whiteout per whiteout in upperdir (not an hardlink)
 > 

Hi Amir,

Thanks for the quick response and detailed information.

I think the simplest way is to just store the accounting info in memory (maybe in s_fs_info).
At first, I only thought of doing it for the container use case; for containers it will be
enough, because the upper layer is always empty when the container starts and is destroyed
when the container ends.

Adding meta info to the index dir is a better solution for the general use case, but it seems
more complicated, and I'm not sure whether other use cases are concerned with this problem.
Suggestions?
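
For example, this is all I mean by keeping it in memory (a minimal
sketch; the fields are hypothetical):

/*
 * Hypothetical additions to struct ovl_fs (sb->s_fs_info).  Since a
 * container's upper layer starts out empty, the counters can simply
 * start from zero on every mount; nothing needs to be persisted.
 */
struct ovl_fs {
	/* ... existing fields ... */
	atomic_long_t inodes_used;	/* new files created in upper */
	unsigned long inodes_max;	/* from a mount option; 0 = no limit */
};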


 > We can also make this behavior independent of nfs_export feature.
 > In the past, I proposed the option index=all for this behavior.
 > 
 > On mount, in ovl_indexdir_cleanup(), the index entries for file/dir/whiteout
 > can be counted and then maintained on index add/remove.
 > 
 > Now if you combine that with project quotas on upper/work dir, you get:
 > <Total upper/work inodes> = <pure upper inodes> + <non-dir index count> +
 >                                            2*<dir index count> +
 > 2*<whiteout index count>

I'm not clear on the exact relationship between those index entries and nfs_export,
but if possible I would like separate switches for each index function, plus a total
switch (index=all) that enables all the index functions at the same time.

 > 
 > Assuming that you know the total from project quotas and the index counts
 > from overlayfs, you can calculate total pure upper.
 > 
 > Now you *can* implement upper inodes quota within overlayfs, but you
 > can also do that without changing overlayfs at all assuming you can
 > allow some slack in quota enforcement -
 > periodically scan the index dir and adjust project quota limits.

Dynamically changing the inode limit looks too complicated to implement in a management system,
and having different quota limits during the lifetime of the same container may confuse sysadmins.
So I still hope to solve this problem in the overlayfs layer.

 > 
 > Note that if inodes are really expensive on your system, index_all
 > wastes 1 inode per whiteout + 1 inode per copied up dir, but those
 > counts should be pretty small compared to number of pure upper inodes
 > and copied up files.
 > 


Thanks,
cgxu.



* Re: Inode limitation for overlayfs
  2020-03-27  5:18   ` Chengguang Xu
@ 2020-03-27  9:45     ` Amir Goldstein
  2020-03-29 14:19       ` Chengguang Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2020-03-27  9:45 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: linux-unionfs, miklos

On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@gmail.com> wrote ----
>  > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > > Hello,
>  > >
>  > > On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
>  > > However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
>  > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
>  > >
>  > > It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?

You are saying above that the goal is to prevent inode exhaustion on the
host file system, but you want to allow containers to modify and delete an
unlimited number of lower files, which still allows inode exhaustion.
I don't see the logic in that.

Even if we only count new files and present this information in df -i,
how would users be able to free up inodes when they hit the limit?
How would they know which inodes to delete?

>  >
>  > The questions are where do we store the accounting and how do we maintain them.
>  > An answer to those questions could be - in the inode index:
>  >
>  > Currently, with nfs_export=on, there is already an index dir containing:
>  > - 1 hardlink per copied up non-dir inode
>  > - 1 directory per copied-up directory
>  > - 1 whiteout per whiteout in upperdir (not an hardlink)
>  >
>
> Hi Amir,
>
> Thanks for quick response and detail information.
>
> I think the simplest way is just store accounting info in memory(maybe  in s_fs_info).
> At very first, I just thought  doing it for container use case, for container, it will be
> enough because the upper layer is always empty at starting time and will be destroyed
> at ending time.

That is not a concept that overlayfs is currently aware of.
*If* the concept is acceptable and you do implement a feature intended for this
special use case, you should verify at mount time that upperdir is empty.

>
> Adding a meta info to index dir is a  better solution for general use case but it seems
> more complicated and I'm not sure if there are other use cases concern with this problem.
> Suggestion?

docker already supports container storage quota using project quotas
on upperdir (I implemented it).
It seems like a very natural extension to also limit the number of inodes.
The problem, as you wrote above, is that project quota
"also counts deleted files (the char-device whiteout files) in overlayfs's upper layer."
My suggestion was a way to account for the whiteouts separately,
so you can deduct them from the total inode count.
If you are saying my suggestion is complicated, perhaps you did not
understand it.

>
>
>  > We can also make this behavior independent of nfs_export feature.
>  > In the past, I proposed the option index=all for this behavior.
>  >
>  > On mount, in ovl_indexdir_cleanup(), the index entries for file/dir/whiteout
>  > can be counted and then maintained on index add/remove.
>  >
>  > Now if you combine that with project quotas on upper/work dir, you get:
>  > <Total upper/work inodes> = <pure upper inodes> + <non-dir index count> +
>  >                                            2*<dir index count> +
>  > 2*<whiteout index count>
>
> I'm not clear what the exact relationships between those indexes and nfs_export

The nfs_export feature requires index_all, but we did not have a reason (yet)
to add an option to enable index_all without enabling nfs_export:

/* Index all files on copy up. For now only enabled for NFS export */
bool ovl_index_all(struct super_block *sb)
{
        struct ovl_fs *ofs = sb->s_fs_info;

        return ofs->config.nfs_export && ofs->config.index;
}

> but  if possible I hope having  separated switches for every index functions and a total
> switch(index=all) to enable all index functions at same time.
>

FYI, index_all stands for "index all modified/deleted lower files (and dirs)".
At the moment, the only limitation of nfs_export=on that could be relaxed with
index=all is that nfs_export=on is mutually exclusive with metacopy=on;
index=all will not have this limitation.

>  >
>  > Assuming that you know the total from project quotas and the index counts
>  > from overlayfs, you can calculate total pure upper.
>  >
>  > Now you *can* implement upper inodes quota within overlayfs, but you
>  > can also do that without changing overlayfs at all assuming you can
>  > allow some slack in quota enforcement -
>  > periodically scan the index dir and adjust project quota limits.
>
> Dynamically changing inode limit  looks  too complicated to implement in management system
> and having different quota limit during lifetime for same container may cause confusion to sys admins.
> So I still hope to solve this problem on overlayfs layer.
>

To me that sounds like shifting complexity from the system to the kernel
without a good enough reason, and losing flexibility.
You are proposing a heuristic solution anyway, because it is inherently
not immune to DoS by a malicious container that does rm -rf *.
So to me it makes more sense to deal with that logic at the container
management level, where more heuristics can be applied, for example:
allow adding up to X new files, modifying Y% of the files from lower and
deleting Z% of the files from lower.

Note that container management does not have to adjust project quota
limits periodically; it only needs to re-calculate and adjust project quota
limits when the user gets an out-of-quota warning.
I believe there are already mechanisms in Linux quota management to
notify management software of quota limit expiry so it can take action,
but I am not that familiar with those mechanisms.

Thanks,
Amir.


* Re: Inode limitation for overlayfs
  2020-03-27  9:45     ` Amir Goldstein
@ 2020-03-29 14:19       ` Chengguang Xu
  2020-03-29 15:06         ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2020-03-29 14:19 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-unionfs, miklos

 ---- On Friday, 2020-03-27 17:45:37 Amir Goldstein <amir73il@gmail.com> wrote ----
 > On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@gmail.com> wrote ----
 > >  > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >
 > >  > > Hello,
 > >  > >
 > >  > > On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
 > >  > > However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
 > >  > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
 > >  > >
 > >  > > It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
 > 
 > You are saying above that the goal is to prevent inode exhaustion on
 > host file system,
 > but you want to allow containers to modify and delete unlimited number
 > of lower files
 > thus allowing inode exhaustion. I don't see the logic is that.
 > 

End users do not understand kernel internals very well, so we just want to hide
overlayfs's behavioral differences from them as much as possible. From our point of view,
consuming an extra inode when deleting a lower file is a feature of overlayfs; it is not
caused by abnormal use. However, we do have to limit a malicious user
program that endlessly creates new files until the host's inodes are exhausted.


 > Even if we only count new files and present this information on df -i
 > how would users be able to free up inodes when they hit the limit?
 > How would they know which inodes to delete?
 > 
 > >  >
 > >  > The questions are where do we store the accounting and how do we maintain them.
 > >  > An answer to those questions could be - in the inode index:
 > >  >
 > >  > Currently, with nfs_export=on, there is already an index dir containing:
 > >  > - 1 hardlink per copied up non-dir inode
 > >  > - 1 directory per copied-up directory
 > >  > - 1 whiteout per whiteout in upperdir (not an hardlink)
 > >  >
 > >
 > > Hi Amir,
 > >
 > > Thanks for quick response and detail information.
 > >
 > > I think the simplest way is just store accounting info in memory(maybe  in s_fs_info).
 > > At very first, I just thought  doing it for container use case, for container, it will be
 > > enough because the upper layer is always empty at starting time and will be destroyed
 > > at ending time.
 > 
 > That is not a concept that overlayfs is currently aware of.
 > *If* the concept is acceptable and you do implement a feature intended for this
 > special use case, you should verify on mount time that upperdir is empty.
 > 
 > >
 > > Adding a meta info to index dir is a  better solution for general use case but it seems
 > > more complicated and I'm not sure if there are other use cases concern with this problem.
 > > Suggestion?
 > 
 > docker already supports container storage quota using project quotas
 > on upperdir (I implemented it).
 > Seems like a very natural extension to also limit no. of inodes.
 > The problem, as you wrote it above is that project quotas
 > "will also count deleted files(char type files) in overlay's upper layer."
 > My suggestion to you was a way to account for the whiteouts separately,
 > so you may deduct them from total inode count.
 > If you are saying my suggestion is complicated, perhaps you did not
 > understand it.
 > 

I think the key point here is the inode count of the whiteouts. I would like to
propose sharing the same inode among different whiteout files, so that we can save
inodes significantly for whiteouts. After that, I think we can just implement a
normal inode limit for the container, just like the block limit.
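
Roughly, the whiteout creation path could then do something like this
(ovl_do_whiteout() and ovl_do_link() are assumed from the current
overlayfs internals; the shared_whiteout field is hypothetical):

/*
 * Sketch: keep one real char 0:0 whiteout inode per upper fs and
 * hardlink every new whiteout to it, so N whiteouts cost one inode
 * plus N directory entries.  A real patch would also need to handle
 * the fs link-count limit (EMLINK) by falling back to creating a
 * fresh whiteout.
 */
static int ovl_create_shared_whiteout(struct ovl_fs *ofs,
				      struct inode *dir,
				      struct dentry *dentry)
{
	if (!ofs->shared_whiteout)	/* first whiteout on this fs */
		return ovl_do_whiteout(dir, dentry);

	return ovl_do_link(ofs->shared_whiteout, dir, dentry);
}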


 > >
 > >
 > >  > We can also make this behavior independent of nfs_export feature.
 > >  > In the past, I proposed the option index=all for this behavior.
 > >  >
 > >  > On mount, in ovl_indexdir_cleanup(), the index entries for file/dir/whiteout
 > >  > can be counted and then maintained on index add/remove.
 > >  >
 > >  > Now if you combine that with project quotas on upper/work dir, you get:
 > >  > <Total upper/work inodes> = <pure upper inodes> + <non-dir index count> +
 > >  >                                            2*<dir index count> +
 > >  > 2*<whiteout index count>
 > >
 > > I'm not clear what the exact relationships between those indexes and nfs_export
 > 
 > nfs_export feature reuiqres index_all, but we did not have a reason (yet) to
 > add an option to enable index_all without enabling nfs_export:
 > 
 > /* Index all files on copy up. For now only enabled for NFS export */
 > bool ovl_index_all(struct super_block *sb)
 > {
 >         struct ovl_fs *ofs = sb->s_fs_info;
 > 
 >         return ofs->config.nfs_export && ofs->config.index;
 > }
 > 
 > > but  if possible I hope having  separated switches for every index functions and a total
 > > switch(index=all) to enable all index functions at same time.
 > >
 > 
 > FYI, index_all stands for "index all modified/deleted lower files (and dirs)"
 > At the moment, the only limitation of nfs_export=on that could be relaxed with
 > index=all is that nfs_export=on is mutually exclusive with metacopy=on.
 > index=all will not have this limitation.
 > 
 > >  >
 > >  > Assuming that you know the total from project quotas and the index counts
 > >  > from overlayfs, you can calculate total pure upper.
 > >  >
 > >  > Now you *can* implement upper inodes quota within overlayfs, but you
 > >  > can also do that without changing overlayfs at all assuming you can
 > >  > allow some slack in quota enforcement -
 > >  > periodically scan the index dir and adjust project quota limits.
 > >
 > > Dynamically changing inode limit  looks  too complicated to implement in management system
 > > and having different quota limit during lifetime for same container may cause confusion to sys admins.
 > > So I still hope to solve this problem on overlayfs layer.
 > >
 > 
 > To me that sounds like shifting complexity from system to kernel
 > for not a good enough reason and with loosing flexibility.
 > You are proposing a heuristic solution anyway because it is inherently
 > not immune against DoS of a malicious container that does rm -rf *.
 > So to me it makes more sense to deal with that logic in container
 > management level, where more heuristics can be applied, for example:
 > Allow to add up to X new files, modify %Y files from lower and
 > delete %Z files from lower.
 > 
 > Note that container management does not have to adjust project quota
 > limits periodically, it only needs to re-calculate and adjust project quota
 > limits when user gets out of quota warning.
 > I believe there are already mechanisms in Linux quota management to
 > notify management software of quota limit expiry in order to take action,
 > but I am not that familiar with those mechanisms.
 > 

If overlayfs could export statistics (whiteout count, etc.) to user space,
that would be very helpful, and it would be easy to implement the limit based on that detailed information.
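
For instance, something along these lines would already be enough for
the management side (the interface and the counter fields are purely
hypothetical):

/*
 * Illustration only: expose the counters through a read-only file
 * (e.g. via debugfs); whiteout_count and dir_index_count are the
 * hypothetical per-sb counters discussed above.
 */
static int ovl_stats_show(struct seq_file *m, void *v)
{
	struct ovl_fs *ofs = m->private;

	seq_printf(m, "whiteouts:\t%ld\n",
		   atomic_long_read(&ofs->whiteout_count));
	seq_printf(m, "index_dirs:\t%ld\n",
		   atomic_long_read(&ofs->dir_index_count));
	return 0;
}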


Thanks,
cgxu




* Re: Inode limitation for overlayfs
  2020-03-29 14:19       ` Chengguang Xu
@ 2020-03-29 15:06         ` Amir Goldstein
  2020-03-30  2:00           ` Chengguang Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2020-03-29 15:06 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: linux-unionfs, miklos, Hou Tao

On Sun, Mar 29, 2020 at 5:19 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- On Friday, 2020-03-27 17:45:37 Amir Goldstein <amir73il@gmail.com> wrote ----
>  > On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > >  ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@gmail.com> wrote ----
>  > >  > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  > >
>  > >  > > Hello,
>  > >  > >
>  > >  > > On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
>  > >  > > However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
>  > >  > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
>  > >  > >
>  > >  > > It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
>  >
>  > You are saying above that the goal is to prevent inode exhaustion on
>  > host file system,
>  > but you want to allow containers to modify and delete unlimited number
>  > of lower files
>  > thus allowing inode exhaustion. I don't see the logic is that.
>  >
>
> End users do not understand kernel tech very well, so we just want to mitigate
> container's different user experience as much as possible. In our point of view,
> consuming more inode by deleting lower file is the feature of overlayfs, it's not
> caused by user's  abnormal using. However, we have to limit malicious user
> program which is endlessly creating new files until host inode exhausting.
>
>
>  > Even if we only count new files and present this information on df -i
>  > how would users be able to free up inodes when they hit the limit?
>  > How would they know which inodes to delete?
>  >
>  > >  >
>  > >  > The questions are where do we store the accounting and how do we maintain them.
>  > >  > An answer to those questions could be - in the inode index:
>  > >  >
>  > >  > Currently, with nfs_export=on, there is already an index dir containing:
>  > >  > - 1 hardlink per copied up non-dir inode
>  > >  > - 1 directory per copied-up directory
>  > >  > - 1 whiteout per whiteout in upperdir (not an hardlink)
>  > >  >
>  > >
>  > > Hi Amir,
>  > >
>  > > Thanks for quick response and detail information.
>  > >
>  > > I think the simplest way is just store accounting info in memory(maybe  in s_fs_info).
>  > > At very first, I just thought  doing it for container use case, for container, it will be
>  > > enough because the upper layer is always empty at starting time and will be destroyed
>  > > at ending time.
>  >
>  > That is not a concept that overlayfs is currently aware of.
>  > *If* the concept is acceptable and you do implement a feature intended for this
>  > special use case, you should verify on mount time that upperdir is empty.
>  >
>  > >
>  > > Adding a meta info to index dir is a  better solution for general use case but it seems
>  > > more complicated and I'm not sure if there are other use cases concern with this problem.
>  > > Suggestion?
>  >
>  > docker already supports container storage quota using project quotas
>  > on upperdir (I implemented it).
>  > Seems like a very natural extension to also limit no. of inodes.
>  > The problem, as you wrote it above is that project quotas
>  > "will also count deleted files(char type files) in overlay's upper layer."
>  > My suggestion to you was a way to account for the whiteouts separately,
>  > so you may deduct them from total inode count.
>  > If you are saying my suggestion is complicated, perhaps you did not
>  > understand it.
>  >
>
> I think the key point here is the count of whiteout inode. I would like to
> propose share same inode with different whiteout files so that we can save
> inode significantly for whiteout files. After this, I think we can just implement
> normal inode limit for container just like block limit.
>

Very good idea. See:
https://lore.kernel.org/linux-unionfs/20180301064526.17216-1-houtao1@huawei.com/

I don't think Tao ever followed up with a v3 patch.

Thanks,
Amir.


* Re: Inode limitation for overlayfs
  2020-03-29 15:06         ` Amir Goldstein
@ 2020-03-30  2:00           ` Chengguang Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Chengguang Xu @ 2020-03-30  2:00 UTC (permalink / raw)
  To: Amir Goldstein, Hou Tao; +Cc: linux-unionfs, miklos

 ---- On Sunday, 2020-03-29 23:06:42 Amir Goldstein <amir73il@gmail.com> wrote ----
 > On Sun, Mar 29, 2020 at 5:19 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
>  ---- On Friday, 2020-03-27 17:45:37 Amir Goldstein <amir73il@gmail.com> wrote ----
 > >  > On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >
>  > >  ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@gmail.com> wrote ----
 > >  > >  > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  > >
 > >  > >  > > Hello,
 > >  > >  > >
 > >  > >  > > On container use case, in order to prevent inode exhaustion on host file system by particular containers,  we would like to add inode limitation for containers.
 > >  > >  > > However,  current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files(char type files) in overlay's upper layer.
 > >  > >  > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
 > >  > >  > >
 > >  > >  > > It is somewhat different compare to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
 > >  >
 > >  > You are saying above that the goal is to prevent inode exhaustion on
 > >  > host file system,
 > >  > but you want to allow containers to modify and delete unlimited number
 > >  > of lower files
 > >  > thus allowing inode exhaustion. I don't see the logic is that.
 > >  >
 > >
 > > End users do not understand kernel tech very well, so we just want to mitigate
 > > container's different user experience as much as possible. In our point of view,
 > > consuming more inode by deleting lower file is the feature of overlayfs, it's not
 > > caused by user's  abnormal using. However, we have to limit malicious user
 > > program which is endlessly creating new files until host inode exhausting.
 > >
 > >
 > >  > Even if we only count new files and present this information on df -i
 > >  > how would users be able to free up inodes when they hit the limit?
 > >  > How would they know which inodes to delete?
 > >  >
 > >  > >  >
 > >  > >  > The questions are where do we store the accounting and how do we maintain them.
 > >  > >  > An answer to those questions could be - in the inode index:
 > >  > >  >
 > >  > >  > Currently, with nfs_export=on, there is already an index dir containing:
 > >  > >  > - 1 hardlink per copied up non-dir inode
 > >  > >  > - 1 directory per copied-up directory
 > >  > >  > - 1 whiteout per whiteout in upperdir (not an hardlink)
 > >  > >  >
 > >  > >
 > >  > > Hi Amir,
 > >  > >
 > >  > > Thanks for quick response and detail information.
 > >  > >
 > >  > > I think the simplest way is just store accounting info in memory(maybe  in s_fs_info).
 > >  > > At very first, I just thought  doing it for container use case, for container, it will be
 > >  > > enough because the upper layer is always empty at starting time and will be destroyed
 > >  > > at ending time.
 > >  >
 > >  > That is not a concept that overlayfs is currently aware of.
 > >  > *If* the concept is acceptable and you do implement a feature intended for this
 > >  > special use case, you should verify on mount time that upperdir is empty.
 > >  >
 > >  > >
 > >  > > Adding a meta info to index dir is a  better solution for general use case but it seems
 > >  > > more complicated and I'm not sure if there are other use cases concern with this problem.
 > >  > > Suggestion?
 > >  >
 > >  > docker already supports container storage quota using project quotas
 > >  > on upperdir (I implemented it).
 > >  > Seems like a very natural extension to also limit no. of inodes.
 > >  > The problem, as you wrote it above is that project quotas
 > >  > "will also count deleted files(char type files) in overlay's upper layer."
 > >  > My suggestion to you was a way to account for the whiteouts separately,
 > >  > so you may deduct them from total inode count.
 > >  > If you are saying my suggestion is complicated, perhaps you did not
 > >  > understand it.
 > >  >
 > >
 > > I think the key point here is the count of whiteout inode. I would like to
 > > propose share same inode with different whiteout files so that we can save
 > > inode significantly for whiteout files. After this, I think we can just implement
 > > normal inode limit for container just like block limit.
 > >
 > 
 > Very good idea. See:
 > https://lore.kernel.org/linux-unionfs/20180301064526.17216-1-houtao1@huawei.com/
 > 
 > I don't think Tao ever followed up with v3 patch.
 > 

Thanks Amir, the feature looks like exactly what we need.

Hi Tao,

Do you have plans to update the patch "overlay: hardlink all whiteout files into a single one" in the near future?
I'm trying to fix an issue regarding inode limits in containers, and sharing an inode among whiteout files is the key point of the solution.


Thanks,
cgxu

end of thread, other threads:[~2020-03-30  2:01 UTC | newest]

Thread overview: 7+ messages
2020-03-26  5:45 Inode limitation for overlayfs Chengguang Xu
2020-03-26  7:34 ` Amir Goldstein
2020-03-27  5:18   ` Chengguang Xu
2020-03-27  9:45     ` Amir Goldstein
2020-03-29 14:19       ` Chengguang Xu
2020-03-29 15:06         ` Amir Goldstein
2020-03-30  2:00           ` Chengguang Xu
