From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 13 Oct 2011 13:48:13 -0700 (PDT)
From: David Rientjes
X-X-Sender: rientjes@chino.kir.corp.google.com
To: Satoru Moriya
Cc: Con Kolivas, Andrew Morton, Rik van Riel, Randy Dunlap,
    Satoru Moriya, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    "lwoodman@redhat.com", Seiji Aguchi, Hugh Dickins,
    "hannes@cmpxchg.org"
Subject: RE: [PATCH -v2 -mm] add extra free kbytes tunable
In-Reply-To: <65795E11DBF1E645A09CEC7EAEE94B9CB516D459@USINDEVS02.corp.hds.com>
References: <20110901105208.3849a8ff@annuminas.surriel.com>
    <20110901100650.6d884589.rdunlap@xenotime.net>
    <20110901152650.7a63cb8b@annuminas.surriel.com>
    <20111010153723.6397924f.akpm@linux-foundation.org>
    <65795E11DBF1E645A09CEC7EAEE94B9CB516CBC4@USINDEVS02.corp.hds.com>
    <65795E11DBF1E645A09CEC7EAEE94B9CB516D459@USINDEVS02.corp.hds.com>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 13 Oct 2011, Satoru Moriya wrote:

> My test case is just a simple one (maybe too simple), and I tried
> to demonstrate the following issues that the current kernel has
> with it.
>
> 1. Current kernel uses free memory as pagecache.
> 2. Applications may allocate memory in bursts and when it happens
>    they may get a latency issue because there is not enough free
>    memory. Also the amount of required memory is wide-ranging.

This is what the per-zone watermarks are intended to address and I
understand that they're not doing a good enough job for your particular
workloads. I'm trying to find a solution that mitigates that for all
threads that allocate faster than the kernel can reclaim, realtime or
otherwise, without requiring the admin to set those watermarks himself,
which is really what extra_free_kbytes is eventually leading to.

> 3. Some users would like to control the amount of free memory
>    to avoid the situation above.

The only possible way to do that is with min_free_kbytes right now and
that would increase the amount of memory that realtime threads have
exclusive access to. Let's try not to add additional tunables that
force admins to find their own optimal watermarks for every kernel
release. I see no reason why we can't add logic for rt-threads
triggering reclaim to either reclaim faster (Con's patch) or reclaim
more memory than normal (an ALLOC_HARDER type bonus in the reclaim path
to reclaim 1.25 * high_wmark, for example). We've had a rt-thread bonus
in the page allocator for a long time; I'm not saying we don't need more
elsewhere.

> 4. User can't set up the amount of free memory explicitly.
>    From user's point of view, the amount of free memory is the delta
>    between high watermark - min watermark because below min watermark
>    user applications incur a penalty (direct reclaim). The width of
>    delta depends on min_free_kbytes, actually min watermark / 2, and
>    so if we want to make free memory bigger, we must make
>    min_free_kbytes bigger. It's not intuitive and it introduces
>    another problem: the possibility of direct reclaim is increased.
>

So you're saying that we need to increase the space between high_wmark
and min_wmark anytime that min_free_kbytes changes? That certainly may
be true and would hopefully mitigate direct reclaim becoming too
intrusive for your workload. We _really_ don't want to cause regressions
for others, though, which extra_free_kbytes can easily do for
cpu-intensive workloads if nothing is currently requiring that extra
burst of memory (this occurs because extra_free_kbytes is a global
tunable and not tied to any specific application [like testing for
rt_task()] that we can identify when reclaiming).

> But my concern described above is still alive because whether
> latency issue happen or not depends on how heavily workloads
> allocate memory at a short time. Of course we can say same
> things for extra_free_kbytes, but we can change it and test
> an effect easily.
>

We'll never know the future and how much memory a latency-sensitive
application will require 100ms from now. The only thing that we can do
is (i) identify the latency-sensitive app, (ii) reclaim more
aggressively for them, and (iii) reclaim additional memory in
preparation for another burst. At some point, though, userspace needs to
take responsibility for not allocating enormous amounts of memory all at
once, and there's room for mitigation there too, by preallocating ahead
of what you actually need.
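
[Editorial illustration, not part of the original message.] The
watermark arithmetic discussed in this thread can be sketched in a few
lines of Python. This is only a rough model under stated assumptions:
the spacing low = min + min/4 and high = min + min/2 is assumed so that
the gap between high and min equals "min watermark / 2" as described in
the quoted text, and reclaim_target() is a hypothetical helper that
models the suggested "1.25 * high_wmark" bonus for rt tasks. Neither
function is kernel code or an existing kernel interface.

```python
# Illustrative arithmetic only; none of this is kernel code.

def watermarks(min_pages):
    """Return (min, low, high) watermarks, in pages, for one zone.

    Assumption: low = min + min/4 and high = min + min/2, so that
    high - min equals min/2 as described in the quoted message.
    """
    low = min_pages + min_pages // 4
    high = min_pages + min_pages // 2
    return min_pages, low, high


def reclaim_target(high_pages, is_rt_task):
    """Hypothetical reclaim goal in pages.

    Models the suggestion that reclaim triggered by an rt task could
    aim for 1.25 * high_wmark instead of stopping at high_wmark.
    """
    if is_rt_task:
        return high_pages + high_pages // 4
    return high_pages


if __name__ == "__main__":
    wmin, wlow, whigh = watermarks(4096)
    # The buffer above the direct-reclaim threshold is high - min,
    # i.e. half of the min watermark under the assumption above.
    print("min/low/high:", wmin, wlow, whigh)
    print("gap above min:", whigh - wmin)
    print("rt reclaim target:", reclaim_target(whigh, True))
```

Under these assumptions, doubling the min watermark doubles the gap as
well, which is why the only way to widen the buffer today is to raise
min_free_kbytes itself.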