From: Greg Thelen <gthelen@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Cgroups <cgroups@vger.kernel.org>,
	"linux-mm\@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Li Zefan <lizefan@huawei.com>, Hugh Dickins <hughd@google.com>
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma
Date: Thu, 05 Feb 2015 14:05:19 -0800
Message-ID: <xr93siekt3p3.fsf@gthelen.mtv.corp.google.com>
In-Reply-To: <20150205131514.GD25736@htj.dyndns.org>


On Thu, Feb 05 2015, Tejun Heo wrote:

> Hello, Greg.
>
> On Wed, Feb 04, 2015 at 03:51:01PM -0800, Greg Thelen wrote:
>> I think the linux-next low (and the TBD min) limits also have this
>> problem for more than just the root memcg.  I'm thinking of a 2M
>> file shared between C and D below.  The file will be charged to the
>> common parent, B.
>> 
>> 	A
>> 	+-B    (usage=2M lim=3M min=2M)
>> 	  +-C  (usage=0  lim=2M min=1M shared_usage=2M)
>> 	  +-D  (usage=0  lim=2M min=1M shared_usage=2M)
>> 	  \-E  (usage=0  lim=2M min=0)
>> 
>> The problem arises if A/B/E allocates more than 1M of private
>> reclaimable file data.  This pushes A/B into reclaim, which will
>> reclaim both the shared file from A/B and the private file from
>> A/B/E.  In contrast, the current per-page memcg would've protected
>> the shared file in either C or D, leaving A/B reclaim to attack
>> only A/B/E.
>> 
>> Pinning the shared file to either C or D, using a TBD policy such
>> as a mount option, would solve this for tightly shared files.  But
>> for a widely shared file (e.g. libc) the admin would need to assign
>> a global bucket, and this would be a pain to size due to varying
>> job requirements.
>
> Shouldn't we be able to handle it the same way as I proposed for
> handling sharing?  The above would look like:
>
>  	A
>  	+-B    (usage=2M lim=3M min=2M hosted_usage=2M)
>  	  +-C  (usage=0  lim=2M min=1M shared_usage=2M)
>  	  +-D  (usage=0  lim=2M min=1M shared_usage=2M)
>  	  \-E  (usage=0  lim=2M min=0)
>
> Now, we don't wanna use B's min verbatim on the hosted inodes shared
> by the children.  But we're unconditionally charging the shared
> amount to all sharing children, which means we're eating into the
> min settings of every participating child, so we should be able to
> use the sum of all sharing children's min-covered amounts as the
> inode's min, which of course is to be contained inside the min of
> the parent.
>
> Above, we're charging 2M to both C and D, each of which has a 1M min
> that is being consumed by the shared charge (the shared part won't
> get reclaimed from the children's internal pressure, so we're really
> taking that part away from them).  Summing them up, the shared inode
> would have 2M of protection, which is honored as long as B as a
> whole is under its 3M limit.  This is similar to creating a
> dedicated child for each shared resource for low limits.  The
> downside is that we end up guarding the shared inodes more than
> non-shared ones, but, after all, we're charging it to everybody
> who's using it.
>
> Would something like this work?
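
If I'm following the arithmetic: each of C and D covers 1M of the 2M
shared charge with its min, so the shared inode would get
min(1M + 1M, B.min = 2M) = 2M of protection, which holds as long as B
as a whole stays under its 3M limit.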

Maybe, but I want to understand more about how pressure works in the
child.  As C (or D) allocates non-shared memory, does it perform
reclaim to ensure that (C.usage + C.shared_usage) < C.lim?  Given that
C's shared_usage is linked into B's LRU, it wouldn't be naturally
reclaimable by C.  Are you thinking that charge failures on cgroups
with non-zero shared_usage would, as needed, induce reclaim of the
parent's hosted_usage?
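
Concretely, I'm imagining the charge path would look something like
the sketch below (rough pseudo-code only; the field and helper names
are invented for illustration, not actual memcg code):

	/*
	 * Hypothetical sketch: on charge failure, a child with
	 * nonzero shared_usage pushes pressure up to the parent
	 * hosting the shared inodes, since those pages sit on the
	 * parent's LRU, not the child's.
	 */
	static int try_charge(struct mem_cgroup *memcg,
			      unsigned long nr_pages)
	{
		while (memcg->usage + memcg->shared_usage + nr_pages >
		       memcg->limit) {
			/* private pages on memcg's own LRU come first */
			if (reclaim_own_lru(memcg, nr_pages))
				continue;
			/*
			 * shared_usage is linked into the parent's LRU,
			 * so the child can't reclaim it directly; ask
			 * the hosting parent to shrink hosted_usage.
			 */
			if (memcg->shared_usage &&
			    reclaim_hosted(parent_of(memcg), nr_pages))
				continue;
			return -ENOMEM;
		}
		memcg->usage += nr_pages;
		return 0;
	}

If that's the idea, then a child's limit pressure effectively turns
into reclaim against B's hosted LRU, which the other children depend
on too.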


Thread overview: 74+ messages
2015-01-30  4:43 [RFC] Making memcg track ownership per address_space or anon_vma Tejun Heo
2015-01-30  5:55 ` Greg Thelen
2015-01-30  6:27   ` Tejun Heo
2015-01-30 16:07     ` Tejun Heo
2015-02-02 19:26       ` Konstantin Khlebnikov
2015-02-02 19:46         ` Tejun Heo
2015-02-03 23:30           ` Greg Thelen
2015-02-04 10:49             ` Konstantin Khlebnikov
2015-02-04 17:15               ` Tejun Heo
2015-02-04 17:58                 ` Konstantin Khlebnikov
2015-02-04 18:28                   ` Tejun Heo
2015-02-04 17:06             ` Tejun Heo
2015-02-04 23:51               ` Greg Thelen
2015-02-05 13:15                 ` Tejun Heo
2015-02-05 22:05                   ` Greg Thelen [this message]
2015-02-05 22:25                     ` Tejun Heo
2015-02-06  0:03                       ` Greg Thelen
2015-02-06 14:17                         ` Tejun Heo
2015-02-06 23:43                           ` Greg Thelen
2015-02-07 14:38                             ` Tejun Heo
2015-02-11  2:19                               ` Tejun Heo
2015-02-11  7:32                                 ` Jan Kara
2015-02-11 18:28                                 ` Greg Thelen
2015-02-11 20:33                                   ` Tejun Heo
2015-02-11 21:22                                     ` Konstantin Khlebnikov
2015-02-11 21:46                                       ` Tejun Heo
2015-02-11 21:57                                         ` Konstantin Khlebnikov
2015-02-11 22:05                                           ` Tejun Heo
2015-02-11 22:15                                             ` Konstantin Khlebnikov
2015-02-11 22:30                                               ` Tejun Heo
2015-02-12  2:10                                     ` Greg Thelen
