From: Balbir Singh <balbir@in.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Herbert Poetzl <herbert@13thfloor.at>,
	containers@lists.osdl.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Menage <menage@google.com>, Pavel Emelianov <xemul@sw.ru>,
	Dave Hansen <hansendc@us.ibm.com>
Subject: Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Date: Thu, 15 Mar 2007 11:14:27 +0530
Message-ID: <45F8DD3B.8070302@in.ibm.com>
In-Reply-To: <45F8D30F.7050900@yahoo.com.au>

Nick Piggin wrote:
> Kirill Korotaev wrote:
> 
>>> The approaches I have seen that don't have a struct page pointer, do
>>> intrusive things like try to put hooks everywhere throughout the kernel
>>> where a userspace task can cause an allocation (and of course end up
>>> missing many, so they aren't secure anyway)... and basically just
>>> nasty stuff that will never get merged.
>>
>>
>> User beancounters patch has got through all these...
>> The approach where each charged object has a pointer to the owner
>> container that charged it is the easiest/cleanest way to handle
>> all the problems with dynamic context change, races, etc.,
>> and 1 pointer in struct page is just 0.1% overhead.
> 
> The pointer in struct page approach is a decent one, which I have
> liked since this whole container effort came up. IIRC Linus and Alan
> also thought that was a reasonable way to go.
> 
> I haven't reviewed the rest of the beancounters patch since looking
> at it quite a few months ago... I probably don't have time for a
> good review at the moment, but I should eventually.
> 

This patch is not really beancounters.

1. It uses the containers framework
2. It is similar to my RSS controller (http://lkml.org/lkml/2007/2/26/8)

I would say that beancounters are changing and evolving.

>>> Struct page overhead really isn't bad. Sure, nobody who doesn't use
>>> containers will want to turn it on, but unless you're using a big PAE
>>> system you're actually unlikely to notice.
>>
>>
>> big PAE doesn't make any difference IMHO
>> (as long as struct pages are not created for non-present physical
>> memory areas)
> 
> The issue is just that struct pages use low memory, which is a really
> scarce commodity on PAE. One more pointer in the struct page means
> 64MB less lowmem.
> 
> But PAE is crap anyway. We've already made enough concessions in the
> kernel to support it. I agree: struct page overhead is not really
> significant. The benefits of simplicity seem to outweigh the downside.
> 
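As a rough check on the figures quoted above, the arithmetic can be sketched as follows. This assumes 4KB pages, a 4-byte pointer on 32-bit PAE, and the 64GB PAE physical limit; the function names are made up for illustration and are not from any patch.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages
GiB = 1 << 30
MiB = 1 << 20

def extra_struct_page_bytes(phys_mem_bytes, pointer_size):
    """Memory consumed by one extra pointer in every struct page."""
    nr_pages = phys_mem_bytes // PAGE_SIZE
    return nr_pages * pointer_size

def overhead_fraction(pointer_size):
    """Extra pointer size relative to the page it describes."""
    return pointer_size / PAGE_SIZE

# Assumption: a fully populated 64GB PAE box with 4-byte pointers.
print(extra_struct_page_bytes(64 * GiB, 4) // MiB, "MiB of lowmem")
print(f"{overhead_fraction(4):.2%} overhead per page")
```

Under those assumptions the extra pointer costs 64MB in total, which matters on PAE only because struct pages live in lowmem.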
>>> But again, I'll say the node-container approach of course does avoid
>>> this nicely (because we already can get the node from the page). So
>>> definitely that approach needs to be discredited before going with this
>>> one.
>>
>>
>> But it lacks some other features:
>> 1. page can't be shared easily with another container
> 
> I think they could be shared. You allocate _new_ pages from your own
> node, but you can definitely use existing pages allocated to other
> nodes.
> 
>> 2. shared page can't be accounted honestly to containers
>>    as fraction=PAGE_SIZE/containers-using-it
> 
> Yes there would be some accounting differences. I think it is hard
> to say exactly what containers are "using" what page anyway, though.
> What do you say about unmapped pages? Kernel allocations? etc.
> 
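For illustration only, the "fraction = PAGE_SIZE / containers-using-it" accounting from point 2 might look like the sketch below. The data layout (a map from page to the set of containers mapping it) is an assumption made for the example, not how any of the posted patches track pages.

```python
from collections import defaultdict
from fractions import Fraction

PAGE_SIZE = 4096  # assumption: 4KB pages

def fractional_charges(page_users):
    """Split each page's cost evenly among the containers mapping it.

    page_users: dict of page id -> set of container names (hypothetical).
    Returns a dict of container name -> bytes charged.
    """
    charges = defaultdict(Fraction)
    for users in page_users.values():
        if not users:
            continue  # unmapped page: nobody is charged under this scheme
        share = Fraction(PAGE_SIZE, len(users))
        for container in users:
            charges[container] += share
    return dict(charges)

pages = {0: {"a"}, 1: {"a", "b"}, 2: set()}
print(fractional_charges(pages))
```

Note that page 2, which no container maps, is charged to nobody here, which is the kind of case the question about unmapped pages points at.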
>> 3. It doesn't help accounting of kernel memory structures.
>>    e.g. in OpenVZ we use exactly the same pointer on the page
>>    to track which container owns it, e.g. pages used for page
>>    tables are accounted this way.
> 
> ?
> page_to_nid(page) ~= container that owns it.
> 
>> 4. I guess destroying a container requires destroying its memory
>>    zone, which means writing out dirty data. That doesn't sound
>>    good to me either.
> 
> I haven't looked at any implementation, but I think it is fine for
> the zone to stay around.
> 
>> 5. memory reclamation in case of global memory shortage
>>    becomes a tricky/unfair task.
> 
> I don't understand why? You can much more easily target a specific
> container for reclaim with this approach than with others (because
> you have an lru per container).
> 

Yes, but we break the global LRU. With these RSS patches, reclaim that
is not triggered by a container still uses the global LRU; by using
nodes, we would lose the global LRU.
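A toy model of the distinction, assuming a single global LRU in which every page carries an owner pointer. The class and method names are hypothetical and this is not the code from the patches; it only shows that global reclaim can ignore ownership while container-triggered reclaim filters on it.

```python
from collections import OrderedDict

class ToyLRU:
    """One global LRU; each page records its owning container."""

    def __init__(self):
        self.pages = OrderedDict()  # page id -> owner, oldest first

    def touch(self, page, owner):
        # Move (or add) the page to the most-recently-used end.
        self.pages.pop(page, None)
        self.pages[page] = owner

    def reclaim_global(self, nr):
        # Global memory shortage: evict the oldest pages of any owner.
        victims = list(self.pages)[:nr]
        for page in victims:
            del self.pages[page]
        return victims

    def reclaim_container(self, owner, nr):
        # Container over its limit: evict only that container's oldest pages.
        victims = [p for p, o in self.pages.items() if o == owner][:nr]
        for page in victims:
            del self.pages[page]
        return victims

lru = ToyLRU()
lru.touch(1, "a")
lru.touch(2, "b")
lru.touch(3, "a")
lru.touch(1, "a")                      # page 1 becomes most recent
print(lru.reclaim_container("a", 1))   # oldest page owned by "a"
print(lru.reclaim_global(1))           # oldest page overall
```

With one LRU per node there is no single list like `pages` to walk, so the global age ordering across containers is what gets lost.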

>> 6. You cannot overcommit. AFAIU, the memory must be granted
>>    to a node for exclusive use and cannot be used by other containers,
>>    even if it is unused. This is not an option for us.
> 
> I'm not sure about that. If you have a larger number of nodes, then
> you could assign more free nodes to a container on demand. But I
> think there would definitely be less flexibility with nodes...
> 
> I don't know... and seeing as I don't really know where the google
> guys are going with it, I won't misrepresent their work any further ;)
> 
> 
>>> Everyone seems to have a plan ;) I don't read the containers list...
>>> does everyone still have *different* plans, or is any sort of consensus
>>> being reached?
>>
>>
>> hope we'll have it soon :)
> 
> Good luck ;)
> 

I think we have made some forward progress on the consensus.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
