linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Korotaev <dev@sw.ru>
To: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	menage@google.com, pj@sgi.com,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	winget@google.com, containers@lists.osdl.org,
	akpm@linux-foundation.org, ckrm-tech@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, xemul@sw.ru
Subject: Re: [PATCH 1/2] rcfs core patch
Date: Tue, 13 Mar 2007 11:28:06 +0300	[thread overview]
Message-ID: <45F66096.20900@sw.ru> (raw)
In-Reply-To: <20070312230053.GD21258@MAIL.13thfloor.at>

>>>well, Linux-VServer is "working", "secure", "flexible"
>>>_and_ non-intrusive ... it is quite natural that less
>>>won't work for me ... and regarding patches, there
>>>will be a 2.2 release soon, with all the patches ...
> 
> 
>>ok. please check your dcache and slab accounting then
>>(analyzed according to patch-2.6.20.1-vs2.3.0.11.diff):
> 
> 
> development branch, good choice for new features
> and code which is currently tested ...
you know better than I that stable branch doesn't differ much,
especially in securiy (because it lacks these controls at all).

BTW, killing arbitrary task in case of RSS limit hit
doesn't look acceptable resource management approach, does it?

>>Both are full of races and problems. Some of them:
>>1. Slabs allocated from interrupt context are charged to 
>>   current context.
>>   So charged values contain arbitrary mess, since during
>>   interrupts context can be arbitrary.
> 
> 
>>2. Due to (1) I guess you do not make any limiting of slabs.
>>   So there are number of ways how to consume a lot of kernel
>>   memory from inside container and
>>   OOM killer will kill arbitrary tasks in case of 
>>   memory-shortage after that.
>>   Don't think it is secure... real DoS.
> 
> 
>>3. Dcache accounting simply doesn't work, since
>>   charges/uncharges are done on current context (sic!!!),
>>   which is arbitrary. i.e. lookup can be done in VE context,
>>   while dcache shrink can be done from another context.
>>   So the whole problem with dcache DoS is not solved at 
>>   all, it is just hard to trigger.
> 
> 
>>4. Dcache accounting is racy, since your checks look like:
>>   if (atomic_read(de->d_count))
>>      charge();
>>   which obviously races with other dput()'s/lookups.
> 
> 
>>5. Dcache accounting can be hit if someone does `find /`
>>   inside container.
>>   After that it is impossible to open something new,
>>   since all the dentries for directories in dcache will 
>>   have d_count > 0 (due it's children).
>>   It is a BUG.
> 
> 
>>6. Counters can be non-zero on container stop due to all
>>   of the above.
> 
> 
> looks like for the the first time you are actually
> looking at the code, or at least providing feedback
> and/or suggestions for improvements (well, not many
> of them, but hey, nobody is perfect :)
It's a pity, but it took me only 5 minutes of looking into the code,
so "not perfect" is a wrong word here, sorry.

>>There are more and more points which arise when such a 
>>non-intrusive accounting is concerned. 
> 
> 
> never claimed that Linux-VServer code is perfect,
> (the Linux accounting isn't perfect either in many
> ways) and Linux-VServer is constantly improving
> (see my other email) ... but IIRC, we are _not_
> discussing Linux-VServer code at all, we are talking
> about a superior solution, which combines the best
> of both worlds ...
Forget about Vserver and OpenVZ. It is not a war.
We are looking for something working, new and robust.
I'm just trying you to show that non-intrusive and pretty small
accounting/limiting code like in Vserver
simply doesn't work. The problem of resource controls is much more complicated.
So non-intrusiveness is a very weird argument from you (and the only).

>>I'm really suprised, that you don't see them
>>or try to behave as you don't see them :/
> 
> 
> all I'm saying is that there is no point in achieving
> perfect accounting and limits (and everything else)
> when all you get is Xen performance and resource usage
then please elaborate on what you mean by
perfect and non-perfect accounting and limits?
I would be happy to sent a patch with a "non-perfect"
accounting if it really works correct and good and suits all the people needs.

BTW, Xen overhead comes mostly from different things (not resource management) -
inability to share data effectively, emulation overhead etc.

>>And, please, believe me, I would not suggest so much 
>>complicated patches If everything was so easy and I 
>>had no reasons simply to accept vserver code.
> 
> 
> no, you are suggesting those patches, because that
> is what your company came up with after being confronted
> with the task (of creating OS-Level virtualization) and
> the arising problems ... so it definitely _is_ a
> solution to those problems, but not necessarily the
> best and definitely not the only one :)
You judge so because you want to.
Have you had some time to compare UBC patches from OVZ
and those sent to LKML (container + RSS)?
You would notice too litle in common.
Patches in LKML has non-OVZ interfaces, no shared pages accounting,
RSS accounting which is not used in OVZ at all.
So do you see any similarities except for stupid and simple
controls like numtask/numfile?

Thanks,
Kirill


  reply	other threads:[~2007-03-13  8:16 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-01 13:35 [PATCH 0/2] resource control file system - aka containers on top of nsproxy! Srivatsa Vaddagiri
2007-03-01 13:45 ` [PATCH 1/2] rcfs core patch Srivatsa Vaddagiri
2007-03-01 16:31   ` Serge E. Hallyn
2007-03-01 16:46     ` Srivatsa Vaddagiri
2007-03-02  5:06   ` [ckrm-tech] " Balbir Singh
2007-03-03  9:38     ` Srivatsa Vaddagiri
2007-03-08  3:12   ` Eric W. Biederman
2007-03-08  9:10     ` Paul Menage
2007-03-09  0:38       ` Herbert Poetzl
2007-03-09  9:07         ` Kirill Korotaev
2007-03-09 13:29           ` Herbert Poetzl
2007-03-09 17:57         ` Srivatsa Vaddagiri
2007-03-10  1:19           ` Herbert Poetzl
2007-03-11 16:36             ` Serge E. Hallyn
2007-03-12 23:16               ` Herbert Poetzl
2007-03-08 10:13     ` Srivatsa Vaddagiri
2007-03-09  0:48       ` Herbert Poetzl
2007-03-09  2:35         ` Paul Jackson
2007-03-09  9:23         ` Kirill Korotaev
2007-03-09  9:38           ` Paul Jackson
2007-03-09 13:21           ` Herbert Poetzl
2007-03-11 17:09             ` Kirill Korotaev
2007-03-12 23:00               ` Herbert Poetzl
2007-03-13  8:28                 ` Kirill Korotaev [this message]
2007-03-13 13:55                   ` Herbert Poetzl
2007-03-13 14:11                     ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-13 15:52                       ` Herbert Poetzl
2007-03-09 18:14         ` Srivatsa Vaddagiri
2007-03-09 19:25           ` Paul Jackson
2007-03-10  1:00             ` Herbert Poetzl
2007-03-10  1:31               ` Paul Jackson
2007-03-10  0:56           ` Herbert Poetzl
2007-03-09 16:16       ` Serge E. Hallyn
2007-03-01 13:50 ` [PATCH 2/2] cpu_accounting controller Srivatsa Vaddagiri
2007-03-01 19:39 ` [PATCH 0/2] resource control file system - aka containers on top of nsproxy! Paul Jackson
2007-03-02 15:45   ` Kirill Korotaev
2007-03-02 16:52     ` Andrew Morton
2007-03-02 17:25       ` Kirill Korotaev
2007-03-03 17:45     ` Herbert Poetzl
2007-03-03 21:22       ` Paul Jackson
2007-03-05 17:47         ` Srivatsa Vaddagiri
2007-03-03  9:36   ` Srivatsa Vaddagiri
2007-03-03 10:21     ` Paul Jackson
2007-03-05 17:02       ` Srivatsa Vaddagiri
2007-03-03 17:32     ` Herbert Poetzl
2007-03-05 17:34       ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-05 18:39         ` Herbert Poetzl
2007-03-06 10:39           ` Srivatsa Vaddagiri
2007-03-06 13:28             ` Herbert Poetzl
2007-03-06 16:21               ` Srivatsa Vaddagiri
2007-03-07  2:32 ` Paul Menage
2007-03-07 17:30   ` Srivatsa Vaddagiri
2007-03-07 17:29     ` Paul Menage
2007-03-07 17:52       ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-07 17:32     ` Srivatsa Vaddagiri
2007-03-07 17:43     ` Serge E. Hallyn
2007-03-07 17:46       ` Paul Menage
2007-03-07 23:16         ` Eric W. Biederman
2007-03-08 11:39           ` Srivatsa Vaddagiri
2007-03-07 18:00       ` Srivatsa Vaddagiri
2007-03-07 20:58         ` Serge E. Hallyn
2007-03-07 21:20           ` Paul Menage
2007-03-07 21:59             ` Serge E. Hallyn
2007-03-07 22:13               ` Dave Hansen
2007-03-07 23:13                 ` Eric W. Biederman
2007-03-12 14:11               ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-07 22:32             ` Eric W. Biederman
2007-03-07 23:18               ` Paul Menage
2007-03-08  0:35                 ` Sam Vilain
2007-03-08  0:42                   ` Paul Menage
2007-03-08  0:53                     ` Sam Vilain
2007-03-08  0:58                       ` [ckrm-tech] " Paul Menage
2007-03-08  1:32                         ` Eric W. Biederman
2007-03-08  1:35                           ` Paul Menage
2007-03-08  2:25                             ` Eric W. Biederman
2007-03-09  0:56                             ` Herbert Poetzl
2007-03-09  0:53                           ` Herbert Poetzl
2007-03-09 18:19                             ` Srivatsa Vaddagiri
2007-03-09 19:36                               ` Paul Jackson
2007-03-09 21:52                               ` Herbert Poetzl
2007-03-09 22:06                                 ` Paul Jackson
2007-03-12 14:01                                   ` Srivatsa Vaddagiri
2007-03-12 15:15                                     ` Srivatsa Vaddagiri
2007-03-12 20:26                                     ` Paul Jackson
2007-03-09  4:30                           ` Paul Jackson
2007-03-08  2:47                         ` Sam Vilain
2007-03-08  2:57                           ` Paul Menage
2007-03-08  3:32                             ` Sam Vilain
2007-03-08  6:10                               ` Matt Helsley
2007-03-08  6:44                                 ` Eric W. Biederman
2007-03-09  1:06                                   ` Herbert Poetzl
2007-03-10  9:06                                     ` Sam Vilain
2007-03-11 21:15                                       ` Paul Jackson
2007-03-12  9:35                                         ` Sam Vilain
2007-03-12 10:00                                         ` Paul Menage
2007-03-12 23:21                                           ` Herbert Poetzl
2007-03-13  2:25                                             ` Paul Menage
2007-03-13 15:57                                               ` Herbert Poetzl
2007-03-09  4:37                                 ` Paul Jackson
2007-03-08  6:32                               ` Eric W. Biederman
2007-03-08  9:10                               ` Paul Menage
2007-03-09 16:50                                 ` Serge E. Hallyn
2007-03-22 14:08                                   ` Srivatsa Vaddagiri
2007-03-22 14:39                                     ` Serge E. Hallyn
2007-03-22 14:56                                       ` Srivatsa Vaddagiri
2007-03-09  4:27                       ` Paul Jackson
2007-03-10  8:52                         ` Sam Vilain
2007-03-10  9:11                           ` Paul Jackson
2007-03-09 16:34             ` Srivatsa Vaddagiri
2007-03-09 16:41               ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-09 22:09               ` Paul Menage
2007-03-10  2:02                 ` Srivatsa Vaddagiri
2007-03-10  3:19                   ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-12 15:07                 ` Srivatsa Vaddagiri
2007-03-12 15:56                   ` Serge E. Hallyn
2007-03-12 16:20                     ` Srivatsa Vaddagiri
2007-03-12 17:25                       ` Serge E. Hallyn
2007-03-12 21:15                       ` Sam Vilain
2007-03-12 23:31                       ` Herbert Poetzl
2007-03-13  2:22                         ` Srivatsa Vaddagiri
2007-03-08  0:50     ` Sam Vilain
2007-03-08 11:30       ` Srivatsa Vaddagiri
2007-03-09  1:16         ` Herbert Poetzl
2007-03-09 18:41           ` Srivatsa Vaddagiri
2007-03-10  2:03             ` Herbert Poetzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45F66096.20900@sw.ru \
    --to=dev@sw.ru \
    --cc=akpm@linux-foundation.org \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=herbert@13thfloor.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=pj@sgi.com \
    --cc=vatsa@in.ibm.com \
    --cc=winget@google.com \
    --cc=xemul@sw.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).