From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
  id S1752168Ab1JKVFA (ORCPT ); Tue, 11 Oct 2011 17:05:00 -0400
Received: from smtp-out.google.com ([74.125.121.67]:44037 "EHLO
  smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
  with ESMTP id S1751188Ab1JKVE7 (ORCPT );
  Tue, 11 Oct 2011 17:04:59 -0400
DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns;
  h=dkim-signature:date:from:x-x-sender:to:cc:subject:
  in-reply-to:message-id:references:user-agent:mime-version:content-type:x-system-of-record;
  b=RUDnhY1ec1VJzDX+6VK3mri7gTl4C3h8QlZHKgLYiDOUMFI1R9FhD+QJoQRnSdcXy
  bNJDwrfSxMQwZmC6cgLag==
Date: Tue, 11 Oct 2011 14:04:45 -0700 (PDT)
From: David Rientjes
X-X-Sender: rientjes@chino.kir.corp.google.com
To: Satoru Moriya
cc: Rik van Riel, Randy Dunlap, Satoru Moriya,
  linux-kernel@vger.kernel.org, linux-mm@kvack.org,
  "lwoodman@redhat.com", Seiji Aguchi, Andrew Morton, Hugh Dickins,
  "hannes@cmpxchg.org"
Subject: RE: [PATCH -v2 -mm] add extra free kbytes tunable
In-Reply-To: <65795E11DBF1E645A09CEC7EAEE94B9CB516CBBC@USINDEVS02.corp.hds.com>
Message-ID:
References: <20110901105208.3849a8ff@annuminas.surriel.com>
  <20110901100650.6d884589.rdunlap@xenotime.net>
  <20110901152650.7a63cb8b@annuminas.surriel.com>
  <65795E11DBF1E645A09CEC7EAEE94B9CB516CBBC@USINDEVS02.corp.hds.com>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-System-Of-Record: true
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 11 Oct 2011, Satoru Moriya wrote:

> > I also
> > think that it will cause regressions on other cpu intensive workloads
> > that don't require this extra freed memory because it works as a
> > global heuristic and is not tied to any specific application.
>
> It's yes and no. It may cause regressions on the workloads due to
> less amount of available memory. But it may improve the workloads'
> performance because they can avoid direct reclaim due to extra
> free memory.
>

There's only a memory-availability regression if background reclaim is
actually triggered in the first place, i.e. extra_free_kbytes doesn't
affect the watermarks themselves when reclaim is started but rather, when
set, causes it to reclaim more memory than otherwise.

That's not really what I was referring to; I was referring to cpu
intensive workloads that now incur a regression because kswapd is now
doing more work (potentially a significant amount of work since
extra_free_kbytes is unbounded) on shared machines. These applications
may not be allocating memory at all and now they incur a performance
penalty because kswapd is taking away one of their cores.

In other words, I think it's a fine solution if you're running a single
application with very bursty memory allocations so you need to reclaim
more memory when low, but that solution is troublesome if it comes at the
penalty of other applications, and that's a direct consequence of it
being a global tunable.

I'd much rather identify the memory allocations in the kernel that are
causing the pain here and mitigate it by (i) attempting to sanely rate
limit those allocations, (ii) preallocating at least a partial amount of
those allocations ahead of time to avoid significant reclaim all at once,
or (iii) annotating memory allocations with such potential so that the
page allocator can add this reclaim bonus itself only in these
conditions.

> Of course if one doesn't need extra free memory, one can turn it
> off. I think we can add this feature to cgroup if we want to set
> it for any specific process or process group. (Before that we
> need to implement min_free_kbytes for cgroup and the implementation
> of extra free kbytes strongly depends on it.)
>

That would allow you to only reclaim additional memory when certain
applications trigger it, but it's not actually a solution since another
task can hit a zone's low watermark and kick kswapd, and then the bursty
memory allocations happen immediately following that and the tunable
doesn't actually do anything because kswapd was already running.

So I disagree, as I did when per-cgroup watermark tunables were proposed,
that watermarks should be changed for a subset of applications unless you
guarantee memory isolation such that that subset of applications has
exclusive access to the memory zones being tuned.