All of lore.kernel.org
 help / color / mirror / Atom feed
From: Balbir Singh <balbir@in.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Herbert Poetzl <herbert@13thfloor.at>,
	Kirill Korotaev <dev@sw.ru>,
	containers@lists.osdl.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Menage <menage@google.com>, Pavel Emelianov <xemul@sw.ru>,
	Dave Hansen <hansendc@us.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Date: Wed, 14 Mar 2007 13:18:35 +0530	[thread overview]
Message-ID: <45F7A8D3.3010109@in.ibm.com> (raw)
In-Reply-To: <45F79CF3.1090302@yahoo.com.au>

Nick Piggin wrote:
> Balbir Singh wrote:
>> Nick Piggin wrote:
> 
>>> And strangely, this example does not go outside the parameters of
>>> what you asked for AFAIKS. In the worst case of one container getting
>>> _all_ the shared pages, they will still remain inside their maximum
>>> rss limit.
>>>
>>
>> When that does happen and if a container hits it limit, with a LRU
>> per-container, if the container is not actually using those pages,
>> they'll get thrown out of that container and get mapped into the
>> container that is using those pages most frequently.
> 
> Exactly. Statistically, first touch will work OK. It may mean some
> reclaim inefficiencies in corner cases, but things will tend to
> even out.
> 

Exactly!

>>> So they might get penalised a bit on reclaim, but maximum rss limits
>>> will work fine, and you can (almost) guarantee X amount of memory for
>>> a given container, and it will _work_.
>>>
>>> But I also take back my comments about this being the only design I
>>> have seen that gets everything, because the node-per-container idea
>>> is a really good one on the surface. And it could mean even less impact
>>> on the core VM than this patch. That is also a first-touch scheme.
>>>
>>
>> With the proposed node-per-container, we will need to make massive core
>> VM changes to reorganize zones and nodes. We would want to allow
>>
>> 1. For sharing of nodes
>> 2. Resizing nodes
>> 3. May be more
> 
> But a lot of that is happening anyway for other reasons (eg. memory
> plug/unplug). And I don't consider node/zone setup to be part of the
> "core VM" as such... it is _good_ if we can move extra work into setup
> rather than have it in the mm.
> 
> That said, I don't think this patch is terribly intrusive either.
> 

Thanks, thats one of our goals, to keep it simple, understandable and
non-intrusive.

> 
>> With the node-per-container idea, it will hard to control page cache
>> limits, independent of RSS limits or mlock limits.
>>
>> NOTE: page cache == unmapped page cache here.
> 
> I don't know that it would be particularly harder than any other
> first-touch scheme. If one container ends up being charged with too
> much pagecache, eventually they'll reclaim a bit of it and the pages
> will get charged to more frequent users.
> 
> 

Yes, true, but what if a user does not want to control the page
cache usage in a particular container or wants to turn off
RSS control.

>>>> However the messed up accounting that doesn't handle sharing between
>>>> groups of processes properly really bugs me.  Especially when we have
>>>> the infrastructure to do it right.
>>>>
>>>> Does that make more sense?
>>>
>>>
>>> I think it is simplistic.
>>>
>>> Sure you could probably use some of the rmap stuff to account shared
>>> mapped _user_ pages once for each container that touches them. And
>>> this patchset isn't preventing that.
>>>
>>> But how do you account kernel allocations? How do you account unmapped
>>> pagecache?
>>>
>>> What's the big deal so many accounting people have with just RSS? I'm
>>> not a container person, this is an honest question. Because from my
>>> POV if you conveniently ignore everything else... you may as well just
>>> not do any accounting at all.
>>>
>>
>> We decided to implement accounting and control in phases
>>
>> 1. RSS control
>> 2. unmapped page cache control
>> 3. mlock control
>> 4. Kernel accounting and limits
>>
>> This has several advantages
>>
>> 1. The limits can be individually set and controlled.
>> 2. The code is broken down into simpler chunks for review and merging.
> 
> But this patch gives the groundwork to handle 1-4, and it is in a small
> chunk, and one would be able to apply different limits to different types
> of pages with it. Just using rmap to handle 1 does not really seem like a
> viable alternative because it fundamentally isn't going to handle 2 or 4.
> 

For (2), we have the basic setup in the form of a per-container LRU list
and a pointer from struct page to the container that first brought in
the page.

> I'm not saying that you couldn't _later_ add something that uses rmap or
> our current RSS accounting to tweak container-RSS semantics. But isn't it
> sensible to lay the groundwork first? Get a clear path to something that
> is good (not perfect), but *works*?
> 

I agree with your development model suggestion. One of things we are going 
to do in the near future is to build (2) and then add (3) and (4). So far,
we've not encountered any difficulties on building on top of (1).

Vaidy, any comments?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

  reply	other threads:[~2007-03-14  8:37 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-06 14:42 [RFC][PATCH 0/7] Resource controllers based on process containers Pavel Emelianov
2007-03-06 14:49 ` [RFC][PATCH 1/7] Resource counters Pavel Emelianov
2007-03-07  4:03   ` Balbir Singh
2007-03-07  7:19     ` Pavel Emelianov
2007-03-09 16:37       ` Herbert Poetzl
2007-03-11  9:01         ` Pavel Emelianov
2007-03-11 19:00         ` Eric W. Biederman
2007-03-12  1:16           ` Herbert Poetzl
2007-03-13  9:09             ` Eric W. Biederman
2007-03-13  9:27               ` Pavel Emelianov
2007-03-13  9:49               ` [Devel] " Kirill Korotaev
2007-03-13 15:21               ` Herbert Poetzl
2007-03-13 15:41                 ` Pavel Emelianov
2007-03-13 16:07                   ` Srivatsa Vaddagiri
2007-03-14  7:12                     ` Pavel Emelianov
2007-03-15 16:51                       ` Eric W. Biederman
2007-03-13 16:32                   ` Herbert Poetzl
2007-03-06 14:55 ` [RFC][PATCH 2/7] RSS controller core Pavel Emelianov
2007-03-06 22:00   ` Andrew Morton
2007-03-09 16:48     ` Herbert Poetzl
2007-03-11  9:08       ` Pavel Emelianov
2007-03-11 14:32         ` Herbert Poetzl
2007-03-11 15:04           ` Pavel Emelianov
2007-03-12  0:41             ` Herbert Poetzl
2007-03-12  8:31               ` Pavel Emelianov
2007-03-12  9:55       ` Balbir Singh
2007-03-12 23:43         ` Herbert Poetzl
2007-03-13  1:57           ` Balbir Singh
2007-03-13  2:24             ` Srivatsa Vaddagiri
2007-03-13 16:06             ` Herbert Poetzl
2007-03-11 12:26     ` Kirill Korotaev
2007-03-11 12:51       ` Andrew Morton
2007-03-11 15:51         ` Balbir Singh
2007-03-11 19:34         ` Eric W. Biederman
2007-03-12  9:23           ` [Devel] " Kirill Korotaev
2007-03-13  9:26             ` Eric W. Biederman
2007-03-13 15:43               ` Kirill Korotaev
2007-03-12  1:00         ` Herbert Poetzl
2007-03-12  9:02           ` Pavel Emelianov
2007-03-12 21:11             ` Herbert Poetzl
2007-03-13  7:17               ` Pavel Emelianov
2007-03-13 15:05                 ` Herbert Poetzl
2007-03-13 15:32                   ` Pavel Emelianov
2007-03-13 15:10               ` Kirill Korotaev
2007-03-13 15:11                 ` Herbert Poetzl
2007-03-13 15:54                   ` Kirill Korotaev
2007-03-12 18:42           ` Dave Hansen
2007-03-12 22:41             ` Herbert Poetzl
2007-03-12 23:02               ` Dave Hansen
2007-03-18 16:58                 ` Eric W. Biederman
2007-03-13  6:04               ` Andrew Morton
2007-03-13 10:19                 ` [Devel] " Kirill Korotaev
2007-03-13 11:48                   ` Andrew Morton
2007-03-13 14:59                     ` Herbert Poetzl
2007-03-13 17:05                     ` Dave Hansen
2007-03-14 15:38                       ` Mel Gorman
2007-03-14 20:42                         ` Dave Hansen
2007-03-20 18:57                           ` Mel Gorman
2007-03-18 22:44                       ` [Devel] " Paul Menage
2007-03-19 17:41                         ` Eric W. Biederman
2007-03-13 17:26                 ` Dave Hansen
2007-03-13 19:09                   ` Alan Cox
2007-03-13 20:28                     ` Dave Hansen
2007-03-16  0:55                     ` Eric W. Biederman
2007-03-16 16:31                       ` Dave Hansen
2007-03-16 18:54                         ` Eric W. Biederman
2007-03-16 19:46                           ` Dave Hansen
2007-03-18 17:42                             ` Eric W. Biederman
2007-03-19 15:48                               ` Herbert Poetzl
2007-03-20 16:15                               ` controlling mmap()'d vs read/write() pages Dave Hansen
2007-03-20 21:19                                 ` Eric W. Biederman
2007-03-23  0:51                                   ` Herbert Poetzl
2007-03-23  5:57                                   ` Nick Piggin
2007-03-23 10:12                                     ` Eric W. Biederman
2007-03-23 10:47                                       ` Nick Piggin
2007-03-23 12:21                                         ` Eric W. Biederman
2007-03-28  7:33                                           ` Nick Piggin
2007-03-23 16:41                                       ` Dave Hansen
2007-03-23 18:16                                         ` Herbert Poetzl
2007-03-28  9:18                                           ` Balbir Singh
2007-03-14 16:47                   ` [RFC][PATCH 2/7] RSS controller core Mel Gorman
2007-03-07  5:37   ` Balbir Singh
2007-03-07  7:27     ` Pavel Emelianov
2007-03-06 14:58 ` [RFC][PATCH 3/7] Data structures changes for RSS accounting Pavel Emelianov
2007-03-11 19:13   ` Eric W. Biederman
2007-03-12 16:16     ` Kirill Korotaev
2007-03-12 16:48       ` Dave Hansen
2007-03-12 17:19         ` Pavel Emelianov
2007-03-12 17:27           ` Dave Hansen
2007-03-13  7:10             ` Pavel Emelianov
2007-03-12 17:21         ` Balbir Singh
2007-03-06 15:00 ` [RFC][PATCH 4/7] RSS accounting hooks over the code Pavel Emelianov
2007-03-11 19:14   ` Eric W. Biederman
2007-03-12 16:23     ` Kirill Korotaev
2007-03-12 16:50       ` Dave Hansen
2007-03-12 17:07         ` Kirill Korotaev
2007-03-12 17:33           ` Dave Hansen
2007-03-13  9:43             ` Eric W. Biederman
2007-03-12 23:54         ` Herbert Poetzl
2007-03-13  9:58           ` Eric W. Biederman
2007-03-13 10:25             ` Nick Piggin
2007-03-13 16:01               ` Eric W. Biederman
2007-03-14  3:51                 ` Nick Piggin
2007-03-14  6:42                   ` Balbir Singh
2007-03-14  6:57                     ` Nick Piggin
2007-03-14  7:48                       ` Balbir Singh [this message]
2007-03-14 13:25                         ` Vaidyanathan Srinivasan
2007-03-14 13:49                           ` Nick Piggin
2007-03-14 14:43                             ` Vaidyanathan Srinivasan
2007-03-14 16:16                             ` Kirill Korotaev
2007-03-15  5:01                               ` Nick Piggin
2007-03-15  5:44                                 ` Balbir Singh
2007-03-28 20:15               ` Ethan Solomita
2007-03-14 15:37   ` Cedric Le Goater
2007-03-14 15:45     ` Pavel Emelianov
2007-03-06 15:03 ` [RFC][PATCH 5/7] Per-container OOM killer and page reclamation Pavel Emelianov
2007-03-09 21:21   ` Balbir Singh
2007-03-11  8:41     ` Pavel Emelianov
2007-03-06 15:04 ` [RFC][PATCH 6/7] Account for the number of tasks within container Pavel Emelianov
2007-03-07  2:00   ` Paul Menage
2007-03-07  7:13     ` Pavel Emelianov
2007-03-08 13:49       ` Paul Menage
2007-03-11  8:36         ` Pavel Emelianov
2007-03-06 15:07 ` [RFC][PATCH 7/7] Account for the number of files opened " Pavel Emelianov
2007-03-07  2:02 ` [RFC][PATCH 0/7] Resource controllers based on process containers Paul Menage
2007-03-07  7:30   ` Pavel Emelianov
2007-03-07  6:52 ` Balbir Singh
2007-03-07  7:32   ` Pavel Emelianov
2007-03-07  9:43     ` Kirill Korotaev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45F7A8D3.3010109@in.ibm.com \
    --to=balbir@in.ibm.com \
    --cc=containers@lists.osdl.org \
    --cc=dev@sw.ru \
    --cc=ebiederm@xmission.com \
    --cc=hansendc@us.ibm.com \
    --cc=herbert@13thfloor.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=xemul@sw.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.