From: Greg Thelen <gthelen@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Cgroups <cgroups@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Li Zefan <lizefan@huawei.com>, Hugh Dickins <hughd@google.com>
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma
Date: Thu, 05 Feb 2015 16:03:34 -0800
Message-ID: <xr93pp9nucrt.fsf@gthelen.mtv.corp.google.com>
In-Reply-To: <20150205222522.GA10580@htj.dyndns.org>


On Thu, Feb 05 2015, Tejun Heo wrote:

> Hey,
>
> On Thu, Feb 05, 2015 at 02:05:19PM -0800, Greg Thelen wrote:
>> >  	A
>> >  	+-B    (usage=2M lim=3M min=2M hosted_usage=2M)
>> >  	  +-C  (usage=0  lim=2M min=1M shared_usage=2M)
>> >  	  +-D  (usage=0  lim=2M min=1M shared_usage=2M)
>> >  	  \-E  (usage=0  lim=2M min=0)
> ...
>> Maybe, but I want to understand more about how pressure works in the
>> child.  As C (or D) allocates non shared memory does it perform reclaim
>> to ensure that its (C.usage + C.shared_usage < C.lim).  Given C's
>
> Yes.
>
>> shared_usage is linked into B.LRU it wouldn't be naturally reclaimable
>> by C.  Are you thinking that charge failures on cgroups with non zero
>> shared_usage would, as needed, induce reclaim of parent's hosted_usage?
>
> Hmmm.... I'm not really sure but why not?  If we properly account for
> the low protection when pushing inodes to the parent, I don't think
> it'd break anything.  IOW, allow the amount beyond the sum of low
> limits to be reclaimed when one of the sharers is under pressure.
>
> Thanks.

I'm not saying that it'd break anything.  I think it's required that
children perform reclaim on shared data hosted in the parent.  The child
is limited by shared_usage, so it needs the ability to reclaim it.  So I
think we're in agreement.  Child will reclaim parent's hosted_usage when
the child is charged for shared_usage.  Ideally the only parental memory
reclaimed in this situation would be shared.  But I think (though I
can't claim to have followed the new memcg philosophy discussions) that
internal nodes in the cgroup tree (i.e. parents) do not have any
resources charged directly to them.  All resources are charged to leaf
cgroups which linger until resources are uncharged.  Thus the LRUs of
parent will only contain hosted (shared) memory.  This thankfully
focuses parental reclaim on shared pages.  Child pressure will,
unfortunately, reclaim shared pages used by any container.  But if
shared pages are charged to all sharing containers, then reclaiming
them will help relieve pressure in the caller.

So this is a system which charges all cgroups using a shared inode
(recharge on read) for all resident pages of that shared inode.  There's
only one copy of the page in memory on just one LRU, but the page may be
charged to multiple containers' (shared_)usage.
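
A minimal sketch of the charging rule described above, as a toy Python
model and not kernel code.  The names Cgroup, SharedInode and fault_in
are hypothetical, and the model assumes each cgroup shares at most one
inode.  It only illustrates that every sharer is charged for all
resident pages while the pages exist once.

```python
from dataclasses import dataclass, field

MB = 1 << 20


@dataclass(eq=False)  # compare cgroups by identity, not by field values
class Cgroup:
    name: str
    limit: int
    usage: int = 0         # bytes charged privately to this cgroup
    shared_usage: int = 0  # bytes charged for the shared inode (assumed: one inode)


@dataclass
class SharedInode:
    resident: int = 0                            # bytes resident, one copy on one LRU
    sharers: list = field(default_factory=list)  # cgroups that have read the inode

    def fault_in(self, cg: Cgroup, nbytes: int) -> None:
        """cg brings nbytes of new pages in; all sharers are recharged."""
        if cg not in self.sharers:
            self.sharers.append(cg)
        self.resident += nbytes
        for sharer in self.sharers:
            # Each sharer is charged for every resident page of the inode.
            sharer.shared_usage = self.resident


c = Cgroup("C", limit=2 * MB)
d = Cgroup("D", limit=2 * MB)
inode = SharedInode()
inode.fault_in(c, 1 * MB)
inode.fault_in(d, 1 * MB)

# One copy in memory (the parent's hosted_usage), charged to both sharers.
total_charged = sum(s.shared_usage for s in inode.sharers)
```

Here inode.resident is 2M while total_charged is 4M, which is the
"one copy, multiple charges" property.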

Perhaps I missed it, but what happens when a child's limit is
insufficient to accept all pages shared by its siblings?  Example
starting with 2M cached of a shared file:

	A
	+-B    (usage=2M lim=3M hosted_usage=2M)
	  +-C  (usage=0  lim=2M shared_usage=2M)
	  +-D  (usage=0  lim=2M shared_usage=2M)
	  \-E  (usage=0  lim=1M shared_usage=0)

If E faults in a new 4K page within the shared file, then E is a sharing
participant so it'd be charged the full 2M+4K, which pushes E over its
limit.
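
The arithmetic for E, worked through as a small Python sketch.  The
helper charge_if_sharer is hypothetical and assumes the "charge every
sharer for all resident pages" rule from above; the sizes are the ones
in the example.

```python
KB = 1024
MB = 1024 * KB
PAGE = 4 * KB

hosted_usage = 2 * MB  # shared file pages already resident, hosted in B
e_limit = 1 * MB       # E's limit from the example
e_usage = 0            # E has no private usage yet


def charge_if_sharer(limit, usage, hosted, fault):
    """Return (shared charge, bytes over limit) if the faulting cgroup
    becomes a sharer and is charged for all resident pages."""
    shared = hosted + fault
    over = max(0, usage + shared - limit)
    return shared, over


e_shared, e_over = charge_if_sharer(e_limit, e_usage, hosted_usage, PAGE)
e_over_pages = e_over // PAGE
```

Under these assumptions e_shared is 2M+4K (2101248 bytes) against a
limit of 1M (1048576 bytes), so E ends up 1052672 bytes, or 257 pages,
over its limit after faulting a single page.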

