From: Balbir Singh <balbir@in.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Herbert Poetzl <herbert@13thfloor.at>,
containers@lists.osdl.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Paul Menage <menage@google.com>, Pavel Emelianov <xemul@sw.ru>,
Dave Hansen <hansendc@us.ibm.com>
Subject: Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Date: Thu, 15 Mar 2007 11:14:27 +0530 [thread overview]
Message-ID: <45F8DD3B.8070302@in.ibm.com> (raw)
In-Reply-To: <45F8D30F.7050900@yahoo.com.au>
Nick Piggin wrote:
> Kirill Korotaev wrote:
>
>>> The approaches I have seen that don't have a struct page pointer, do
>>> intrusive things like try to put hooks everywhere throughout the kernel
>>> where a userspace task can cause an allocation (and of course end up
>>> missing many, so they aren't secure anyway)... and basically just
>>> nasty stuff that will never get merged.
>>
>>
>> User beancounters patch has got through all these...
>> The approach where each charged object has a pointer to the owner
>> container,
>> who has charged it - is the most easy/clean way to handle
>> all the problems with dynamic context change, races, etc.
>> and 1 pointer in page struct is just 0.1% overehad.
>
> The pointer in struct page approach is a decent one, which I have
> liked since this whole container effort came up. IIRC Linus and Alan
> also thought that was a reasonable way to go.
>
> I haven't reviewed the rest of the beancounters patch since looking
> at it quite a few months ago... I probably don't have time for a
> good review at the moment, but I should eventually.
>
This patch is not really beancounters.
1. It uses the containers framework
2. It is similar to my RSS controller (http://lkml.org/lkml/2007/2/26/8)
I would say that beancounters are changing and evolving.
>>> Struct page overhead really isn't bad. Sure, nobody who doesn't use
>>> containers will want to turn it on, but unless you're using a big PAE
>>> system you're actually unlikely to notice.
>>
>>
>> big PAE doesn't make any difference IMHO
>> (until struct pages are not created for non-present physical memory
>> areas)
>
> The issue is just that struct pages use low memory, which is a really
> scarce commodity on PAE. One more pointer in the struct page means
> 64MB less lowmem.
>
> But PAE is crap anyway. We've already made enough concessions in the
> kernel to support it. I agree: struct page overhead is not really
> significant. The benefits of simplicity seems to outweigh the downside.
>
>>> But again, I'll say the node-container approach of course does avoid
>>> this nicely (because we already can get the node from the page). So
>>> definitely that approach needs to be discredited before going with this
>>> one.
>>
>>
>> But it lacks some other features:
>> 1. page can't be shared easily with another container
>
> I think they could be shared. You allocate _new_ pages from your own
> node, but you can definitely use existing pages allocated to other
> nodes.
>
>> 2. shared page can't be accounted honestly to containers
>> as fraction=PAGE_SIZE/containers-using-it
>
> Yes there would be some accounting differences. I think it is hard
> to say exactly what containers are "using" what page anyway, though.
> What do you say about unmapped pages? Kernel allocations? etc.
>
>> 3. It doesn't help accounting of kernel memory structures.
>> e.g. in OpenVZ we use exactly the same pointer on the page
>> to track which container owns it, e.g. pages used for page
>> tables are accounted this way.
>
> ?
> page_to_nid(page) ~= container that owns it.
>
>> 4. I guess container destroy requires destroy of memory zone,
>> which means write out of dirty data. Which doesn't sound
>> good for me as well.
>
> I haven't looked at any implementation, but I think it is fine for
> the zone to stay around.
>
>> 5. memory reclamation in case of global memory shortage
>> becomes a tricky/unfair task.
>
> I don't understand why? You can much more easily target a specific
> container for reclaim with this approach than with others (because
> you have an lru per container).
>
Yes, but we break the global LRU. With these RSS patches, reclaim not
triggered by containers still uses the global LRU, by using nodes,
we would have lost the global LRU.
>> 6. You cannot overcommit. AFAIU, the memory should be granted
>> to node exclusive usage and cannot be used by by another containers,
>> even if it is unused. This is not an option for us.
>
> I'm not sure about that. If you have a larger number of nodes, then
> you could assign more free nodes to a container on demand. But I
> think there would definitely be less flexibility with nodes...
>
> I don't know... and seeing as I don't really know where the google
> guys are going with it, I won't misrepresent their work any further ;)
>
>
>>> Everyone seems to have a plan ;) I don't read the containers list...
>>> does everyone still have *different* plans, or is any sort of consensus
>>> being reached?
>>
>>
>> hope we'll have it soon :)
>
> Good luck ;)
>
I think we have made some forward progress on the consensus.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
next prev parent reply other threads:[~2007-03-15 5:44 UTC|newest]
Thread overview: 129+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-06 14:42 [RFC][PATCH 0/7] Resource controllers based on process containers Pavel Emelianov
2007-03-06 14:49 ` [RFC][PATCH 1/7] Resource counters Pavel Emelianov
2007-03-07 4:03 ` Balbir Singh
2007-03-07 7:19 ` Pavel Emelianov
2007-03-09 16:37 ` Herbert Poetzl
2007-03-11 9:01 ` Pavel Emelianov
2007-03-11 19:00 ` Eric W. Biederman
2007-03-12 1:16 ` Herbert Poetzl
2007-03-13 9:09 ` Eric W. Biederman
2007-03-13 9:27 ` Pavel Emelianov
2007-03-13 9:49 ` [Devel] " Kirill Korotaev
2007-03-13 15:21 ` Herbert Poetzl
2007-03-13 15:41 ` Pavel Emelianov
2007-03-13 16:07 ` Srivatsa Vaddagiri
2007-03-14 7:12 ` Pavel Emelianov
2007-03-15 16:51 ` Eric W. Biederman
2007-03-13 16:32 ` Herbert Poetzl
2007-03-06 14:55 ` [RFC][PATCH 2/7] RSS controller core Pavel Emelianov
2007-03-06 22:00 ` Andrew Morton
2007-03-09 16:48 ` Herbert Poetzl
2007-03-11 9:08 ` Pavel Emelianov
2007-03-11 14:32 ` Herbert Poetzl
2007-03-11 15:04 ` Pavel Emelianov
2007-03-12 0:41 ` Herbert Poetzl
2007-03-12 8:31 ` Pavel Emelianov
2007-03-12 9:55 ` Balbir Singh
2007-03-12 23:43 ` Herbert Poetzl
2007-03-13 1:57 ` Balbir Singh
2007-03-13 2:24 ` Srivatsa Vaddagiri
2007-03-13 16:06 ` Herbert Poetzl
2007-03-11 12:26 ` Kirill Korotaev
2007-03-11 12:51 ` Andrew Morton
2007-03-11 15:51 ` Balbir Singh
2007-03-11 19:34 ` Eric W. Biederman
2007-03-12 9:23 ` [Devel] " Kirill Korotaev
2007-03-13 9:26 ` Eric W. Biederman
2007-03-13 15:43 ` Kirill Korotaev
2007-03-12 1:00 ` Herbert Poetzl
2007-03-12 9:02 ` Pavel Emelianov
2007-03-12 21:11 ` Herbert Poetzl
2007-03-13 7:17 ` Pavel Emelianov
2007-03-13 15:05 ` Herbert Poetzl
2007-03-13 15:32 ` Pavel Emelianov
2007-03-13 15:10 ` Kirill Korotaev
2007-03-13 15:11 ` Herbert Poetzl
2007-03-13 15:54 ` Kirill Korotaev
2007-03-12 18:42 ` Dave Hansen
2007-03-12 22:41 ` Herbert Poetzl
2007-03-12 23:02 ` Dave Hansen
2007-03-18 16:58 ` Eric W. Biederman
2007-03-13 6:04 ` Andrew Morton
2007-03-13 10:19 ` [Devel] " Kirill Korotaev
2007-03-13 11:48 ` Andrew Morton
2007-03-13 14:59 ` Herbert Poetzl
2007-03-13 17:05 ` Dave Hansen
2007-03-14 15:38 ` Mel Gorman
2007-03-14 20:42 ` Dave Hansen
2007-03-20 18:57 ` Mel Gorman
2007-03-18 22:44 ` [Devel] " Paul Menage
2007-03-19 17:41 ` Eric W. Biederman
2007-03-13 17:26 ` Dave Hansen
2007-03-13 19:09 ` Alan Cox
2007-03-13 20:28 ` Dave Hansen
2007-03-16 0:55 ` Eric W. Biederman
2007-03-16 16:31 ` Dave Hansen
2007-03-16 18:54 ` Eric W. Biederman
2007-03-16 19:46 ` Dave Hansen
2007-03-18 17:42 ` Eric W. Biederman
2007-03-19 15:48 ` Herbert Poetzl
2007-03-20 16:15 ` controlling mmap()'d vs read/write() pages Dave Hansen
2007-03-20 21:19 ` Eric W. Biederman
2007-03-23 0:51 ` Herbert Poetzl
2007-03-23 5:57 ` Nick Piggin
2007-03-23 10:12 ` Eric W. Biederman
2007-03-23 10:47 ` Nick Piggin
2007-03-23 12:21 ` Eric W. Biederman
2007-03-28 7:33 ` Nick Piggin
2007-03-23 16:41 ` Dave Hansen
2007-03-23 18:16 ` Herbert Poetzl
2007-03-28 9:18 ` Balbir Singh
2007-03-14 16:47 ` [RFC][PATCH 2/7] RSS controller core Mel Gorman
2007-03-07 5:37 ` Balbir Singh
2007-03-07 7:27 ` Pavel Emelianov
2007-03-06 14:58 ` [RFC][PATCH 3/7] Data structures changes for RSS accounting Pavel Emelianov
2007-03-11 19:13 ` Eric W. Biederman
2007-03-12 16:16 ` Kirill Korotaev
2007-03-12 16:48 ` Dave Hansen
2007-03-12 17:19 ` Pavel Emelianov
2007-03-12 17:27 ` Dave Hansen
2007-03-13 7:10 ` Pavel Emelianov
2007-03-12 17:21 ` Balbir Singh
2007-03-06 15:00 ` [RFC][PATCH 4/7] RSS accounting hooks over the code Pavel Emelianov
2007-03-11 19:14 ` Eric W. Biederman
2007-03-12 16:23 ` Kirill Korotaev
2007-03-12 16:50 ` Dave Hansen
2007-03-12 17:07 ` Kirill Korotaev
2007-03-12 17:33 ` Dave Hansen
2007-03-13 9:43 ` Eric W. Biederman
2007-03-12 23:54 ` Herbert Poetzl
2007-03-13 9:58 ` Eric W. Biederman
2007-03-13 10:25 ` Nick Piggin
2007-03-13 16:01 ` Eric W. Biederman
2007-03-14 3:51 ` Nick Piggin
2007-03-14 6:42 ` Balbir Singh
2007-03-14 6:57 ` Nick Piggin
2007-03-14 7:48 ` Balbir Singh
2007-03-14 13:25 ` Vaidyanathan Srinivasan
2007-03-14 13:49 ` Nick Piggin
2007-03-14 14:43 ` Vaidyanathan Srinivasan
2007-03-14 16:16 ` Kirill Korotaev
2007-03-15 5:01 ` Nick Piggin
2007-03-15 5:44 ` Balbir Singh [this message]
2007-03-28 20:15 ` Ethan Solomita
2007-03-14 15:37 ` Cedric Le Goater
2007-03-14 15:45 ` Pavel Emelianov
2007-03-06 15:03 ` [RFC][PATCH 5/7] Per-container OOM killer and page reclamation Pavel Emelianov
2007-03-09 21:21 ` Balbir Singh
2007-03-11 8:41 ` Pavel Emelianov
2007-03-06 15:04 ` [RFC][PATCH 6/7] Account for the number of tasks within container Pavel Emelianov
2007-03-07 2:00 ` Paul Menage
2007-03-07 7:13 ` Pavel Emelianov
2007-03-08 13:49 ` Paul Menage
2007-03-11 8:36 ` Pavel Emelianov
2007-03-06 15:07 ` [RFC][PATCH 7/7] Account for the number of files opened " Pavel Emelianov
2007-03-07 2:02 ` [RFC][PATCH 0/7] Resource controllers based on process containers Paul Menage
2007-03-07 7:30 ` Pavel Emelianov
2007-03-07 6:52 ` Balbir Singh
2007-03-07 7:32 ` Pavel Emelianov
2007-03-07 9:43 ` Kirill Korotaev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45F8DD3B.8070302@in.ibm.com \
--to=balbir@in.ibm.com \
--cc=containers@lists.osdl.org \
--cc=dev@sw.ru \
--cc=ebiederm@xmission.com \
--cc=hansendc@us.ibm.com \
--cc=herbert@13thfloor.at \
--cc=linux-kernel@vger.kernel.org \
--cc=menage@google.com \
--cc=nickpiggin@yahoo.com.au \
--cc=svaidy@linux.vnet.ibm.com \
--cc=xemul@sw.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).