From: ebiederm@xmission.com (Eric W. Biederman)
To: Nick Piggin
Cc: Dave Hansen, Alan Cox, containers@lists.osdl.org, linux-kernel@vger.kernel.org, menage@google.com, Andrew Morton, xemul@sw.ru
Subject: Re: controlling mmap()'d vs read/write() pages
Date: Fri, 23 Mar 2007 04:12:09 -0600
In-Reply-To: <46036C54.6030502@yahoo.com.au> (Nick Piggin's message of "Fri, 23 Mar 2007 16:57:40 +1100")

Nick Piggin writes:

> Eric W. Biederman wrote:
>> Dave Hansen writes:
>>
>>
>>>So, I think we have a difference of opinion.  I think it's _all_ about
>>>memory pressure, and you think it is _not_ about accounting for memory
>>>pressure. :)  Perhaps we mean different things, but we appear to
>>>disagree greatly on the surface.
>>
>>
>> I think it is about preventing a badly behaved container from having a
>> significant effect on the rest of the system, and in particular other
>> containers on the system.
>
> That's Dave's point, I believe.  Limiting mapped memory may be
> mostly OK for well behaved applications, but it doesn't do anything
> to stop bad ones from effectively DoSing the system or ruining any
> guarantees you might proclaim (not that hard guarantees are always
> possible without using virtualisation anyway).
>
> This is why I'm surprised at efforts that go to such great lengths
> to get accounting "just right" (but only for mmaped memory).  You
> may as well not even bother, IMO.
>
> Give me an RSS limit big enough to run a couple of system calls and
> a loop...

Would any of them work on a system where every filesystem was on ramfs
and there was no swap?  If not, then they are not memory attacks but
I/O attacks.

I completely concede that you can DoS the system with I/O if that is
not limited as well.  My point is that this is not a memory problem but
a disk I/O problem, which is much easier and cheaper to solve.  Disk
I/O is fundamentally a slow path, which makes it hard to modify in a
way that negatively affects system performance.

I don't think that, with a memory RSS limit, you can DoS the system in
a way that is purely about memory.  You have to pick a different kind
of DoS attack.
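To make the shape of that concrete, here is a rough sketch (my own
illustration, not anything Nick posted; the file name is an arbitrary
choice) of the couple-of-system-calls-and-a-loop program.  It stays
inside a tiny RSS, yet keeps redirtying page cache and generating
writeback:

/*
 * Illustration only (not from the thread).  The loop keeps a tiny
 * resident set, but endlessly redirties the same 64MB of page cache,
 * so on a normal system the damage is writeback and disk contention.
 * On a ramfs-only, swapless box the memory it can pin is bounded by
 * the 64MB file and there is no writeback to saturate; that is why
 * this is an I/O attack rather than a memory attack.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("/tmp/filler", O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 'x', sizeof(buf));

	for (;;) {
		if (write(fd, buf, sizeof(buf)) < 0) {
			perror("write");
			break;
		}
		/* keep the file (and hence pinnable memory) bounded at 64MB */
		if (lseek(fd, 0, SEEK_CUR) >= (64 << 20))
			lseek(fd, 0, SEEK_SET);
	}
	close(fd);
	return 0;
}

An RSS limit never even notices this; whatever catches it has to be
accounting dirty page cache or disk I/O, which is the cheaper problem
I am pointing at.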
As for virtualization, that is what a kernel is for: virtualizing its
resources so you can have multiple users accessing them at the same
time.  You don't need some hypervisor or virtual machine to give you
that.  That is where we start.  However, it was found long ago that
global optimization gives better system throughput than the rigid
partitioning you get with hypervisors.  Things are not quite as
deterministic when you optimize globally, but they should be
sufficiently deterministic that you can avoid the worst of the DoS
attacks.

The real practical problem with the current system is that nearly all
of our limits are per process, while applications now span more than
one process, so the limits Linux provides are generally useless for
limiting real-world applications (the little sketch at the end of this
mail illustrates the effect).  That isn't generally a problem until we
start trying to run multiple applications on the same system because
the hardware is so powerful.  The namespace work, which will allow
several different instances of user space to run simultaneously, is
likely to make that situation common.

At the moment I am very much in the position of doing review, not
implementing this part of it.  I'm trying to get the people doing the
implementation to make certain they have actually been paying
attention to how their proposed limits will interact with the rest of
the system.  So far the conversation has generally centered on memory
limits, because it seems that is where people have decided the
conversation should focus.

What I haven't seen is the people working on these limits coming back
to me, tearing my arguments apart, and showing or telling me where I'm
confused.  In general I can challenge even the simplest things and not
get a good response.  All of which tells me the implementations are
not ready.

I do have some practical use cases, I have some clue how these
subsystems work, and I do care.  That puts me in a decent position to
do at least high-level design review.

My biggest disappointment is that none of this is new, and that we
seem to have forgotten a lot of the lessons of the past.

Eric
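Here is the sketch referred to above.  It is only my illustration, not
code from any of the patches under discussion: a per-process RLIMIT_AS
looks respected from inside any single process, yet a trivial fork
loop multiplies the real allowance by the number of children.

/*
 * Illustration only (my sketch, not any patch under discussion).
 * RLIMIT_AS is enforced per process and inherited across fork(), so
 * a parent that dutifully caps itself at 64MB can still fork eight
 * children that each get the full 64MB allowance again.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	struct rlimit rl = { 64 << 20, 64 << 20 };	/* 64MB of address space */
	int i;

	if (setrlimit(RLIMIT_AS, &rl) < 0)
		perror("setrlimit");

	for (i = 0; i < 8; i++) {
		if (fork() == 0) {
			/* each child may allocate up to ~64MB on its own, so
			 * the eight of them together sit on several hundred MB */
			char *p = malloc(48 << 20);

			if (p)
				memset(p, 1, 48 << 20);
			sleep(60);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}

The limit only means something if it applies to the whole group of
processes, which is exactly what the container work has to provide if
it is to be useful for real applications.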