From: Satoru Moriya <satoru.moriya@hds.com>
To: Seiji Aguchi <seiji.aguchi@hds.com>,
	dormando <dormando@rydia.net>, "Rik van Riel" <riel@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hughd@google.com" <hughd@google.com>
Subject: RE: extra free kbytes tunable
Date: Fri, 15 Feb 2013 22:49:32 +0000
Message-ID: <8631DC5930FA9E468F04F3FD3A5D007214B0CCF3@USINDEM103.corp.hds.com>
In-Reply-To: <A5ED84D3BB3A384992CBB9C77DEDA4D414A98EBF@USINDEM103.corp.hds.com>

On 02/15/2013 05:21 PM, Seiji Aguchi wrote:
> Rik, Satoru,
> 
> Do you have any comments?
> 
> Seiji

Hmm, this seems to be what we wanted to know in the previous thread.

Because extra_free_kbytes is quite simple and fixes the problem,
it should be merged upstream.

Regards,
Satoru
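
For readers new to the thread, here is a minimal sketch of what a tunable
like extra_free_kbytes would do to a zone's watermarks. It assumes the
proposed patch leaves the min watermark alone and adds the extra amount to
both the low and high watermarks, on top of the usual min/4 and min/2
offsets. The function name and the page counts are hypothetical and chosen
only for illustration; this is not the patch itself.

```python
def zone_watermarks(min_pages, extra_pages=0):
    """Return the min/low/high watermarks (in pages) for one zone.

    Assumption: low = min + extra + min/4 and high = min + extra + min/2,
    with 'extra' being the zone's share of extra_free_kbytes.
    """
    low = min_pages + extra_pages + min_pages // 4
    high = min_pages + extra_pages + min_pages // 2
    return {"min": min_pages, "low": low, "high": high}


# Without the tunable, the min-to-low gap is a quarter of min.
base = zone_watermarks(1000)
# With the tunable, the gap grows by the extra amount while min stays put.
tuned = zone_watermarks(1000, extra_pages=4000)

print(base["low"] - base["min"])    # 250
print(tuned["low"] - tuned["min"])  # 4250
```

Under these assumptions kswapd would be woken earlier, without also raising
the amount of memory held back below the min watermark.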


>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org 
>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of dormando
>> Sent: Monday, February 11, 2013 9:01 PM
>> To: Rik van Riel
>> Cc: Randy Dunlap; Satoru Moriya; linux-kernel@vger.kernel.org; 
>> linux-mm@kvack.org; lwoodman@redhat.com; Seiji Aguchi; 
>> akpm@linux-foundation.org; hughd@google.com
>> Subject: extra free kbytes tunable
>>
>> Hi,
>>
>> As discussed in this thread:
>> http://marc.info/?l=linux-mm&m=131490523222031&w=2
>> (with this cleanup as well: https://lkml.org/lkml/2011/9/2/225)
>>
>> A tunable was proposed to allow specifying the distance between
>> pages_min and the low watermark, the point at which kswapd is kicked
>> in to free up pages. I'd like to re-open this thread since the patch
>> did not appear to go anywhere.
>>
>> We have a server workload wherein machines with 100G+ of "free"
>> memory (used by page cache), scattered but frequent random I/O reads
>> from 12+ SSDs, and 5 Gbps+ of internet traffic will frequently hit
>> direct reclaim in a few different ways.
>>
>> 1) It'll run into small amounts of reclaim randomly (a few hundred thousand).
>>
>> 2) A burst of reads or traffic can cause extra pressure, which kswapd
>> occasionally responds to by freeing up 40G+ of the pagecache all at
>> once (!) while pausing the system (Argh).
>>
>> 3) A blip in an upstream provider or failover from a peer causes the
>> kernel to allocate massive amounts of memory for retransmission
>> queues/etc, potentially along with buffered IO reads and (some, but
>> not often a ton of) new allocations from an application. This paired
>> with 2) can cause the box to stall for 15+ seconds.
>>
>> We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass
>> reclaims are more common in newer kernels, but reclaims still happen
>> in all kernels unless min_free_kbytes is raised dramatically.
>>
>> I've found that setting "lowmem_reserve_ratio" to something like "1 1 32"
>> (thus protecting the DMA32 zone) causes 2) to happen less often, and
>> generally makes 1) less violent.
>>
>> Setting min_free_kbytes to 15G or more, paired with the above, has 
>> been the best at mitigating the issue. This is simply trying to raise 
>> the distance between the min and low watermarks. With min_free_kbytes 
>> set to 15000000, that gives us a whopping 1.8G (!!!) of leeway before 
>> slamming into direct reclaim.
>>
>> So, this patch is unfortunate but wonderful at letting us reclaim 
>> 10G+ of otherwise lost memory. Could we please revisit it?
>>
>> I saw a lot of discussion on doing this automatically, or making
>> kswapd more efficient at it, and I'd love to do that. Beyond making
>> kswapd psychic I haven't seen any better options yet.
>>
>> The issue is more complex than simply having an application warn of 
>> an impending allocation, since this can happen via read load on disk 
>> or from kernel page allocations for the network, or a combination of 
>> the two (or three, if you add the app back in).
>>
>> It's going to get worse as we push machines with faster SSDs and
>> bigger networks. I'm open to any ideas on how to make kswapd more
>> efficient in our case, or really anything at all that works.
>>
>> I have more details, but cut it down as much as I could for this mail.
>>
>> Thanks,
>> -Dormando
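
The watermark arithmetic in the quoted message can be sketched roughly as
follows. This assumes kernels of that era place the low watermark at
min + min/4 and the high watermark at min + min/2, so raising
min_free_kbytes is the only way to widen the min-to-low gap. The helper
name is hypothetical, and the message does not say how its 1.8G figure
was derived.

```python
def watermark_gaps_kb(min_free_kbytes):
    """Return (low - min, high - min) in kB, summed over all zones.

    Assumption: low = min + min/4 and high = min + min/2.
    """
    return min_free_kbytes // 4, min_free_kbytes // 2


low_gap_kb, high_gap_kb = watermark_gaps_kb(15_000_000)

print(low_gap_kb)   # 3750000
print(high_gap_kb)  # 7500000

# On a hypothetical machine with two equally sized NUMA nodes, each node
# would see half of the system-wide min-to-low gap.
print(low_gap_kb // 2)  # 1875000
```

Under these assumptions, reserving about 15 GB buys only a few GB of
headroom between kswapd waking up and allocations falling into direct
reclaim, which is the cost the proposed tunable is meant to avoid.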
Thread overview: 21+ messages
2013-02-12  2:01 extra free kbytes tunable dormando
2013-02-15 22:21 ` Seiji Aguchi
2013-02-15 22:25   ` Rik van Riel
2013-02-17 23:48     ` [PATCH] add " dormando
2013-02-19 23:29       ` Andrew Morton
2013-02-20  5:19         ` dormando
2013-02-22 17:56           ` Johannes Weiner
2013-02-26 10:47             ` Mel Gorman
2013-02-26 15:13               ` Johannes Weiner
2013-02-26 16:25                 ` Mel Gorman
2013-03-01  9:22             ` Simon Jeons
2013-03-01  9:31               ` Simon Jeons
2013-03-01 22:33                 ` Hugh Dickins
2013-03-02  0:10                   ` Simon Jeons
2013-03-02  1:42                     ` Hugh Dickins
2013-03-02  2:42                       ` Simon Jeons
2013-03-02  3:08                         ` Hugh Dickins
2013-03-02  4:06                           ` Simon Jeons
2013-03-09  1:08                           ` Simon Jeons
2013-02-17 23:54     ` dormando
2013-02-15 22:49   ` Satoru Moriya [this message]
