From: Christoph Lameter <cl@linux.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 02/15] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
Date: Fri, 10 Feb 2012 15:01:37 -0600 (CST)	[thread overview]
Message-ID: <alpine.DEB.2.00.1202101443570.31424@router.home> (raw)
In-Reply-To: <20120210102605.GO5938@suse.de>

On Fri, 10 Feb 2012, Mel Gorman wrote:

> I have an updated version of this 02/15 patch below. It passed testing
> and is a lot less invasive than the previous release. As you suggested,
> it uses page flags and the bulk of the complexity is only executed if
> someone is using network-backed storage.

Hmmm.. hmm... This still modifies the hot paths of the allocators for a
pretty exotic feature.

> > On top of that you want to add
> > special code in various subsystems to also do that over the network.
> > Sigh. I think we agreed a while back that we want to limit the amount of
> > I/O triggered from reclaim paths?
>
> Specifically we wanted to reduce or stop page reclaim calling ->writepage()
> for file-backed pages because it generated awful IO patterns and deep
> call stacks. We still write anonymous pages from page reclaim because we
> do not have a dedicated thread for writing to swap. It is expected that
> the call stack for writing to network storage would be shallower than a
> filesystem's.
>
> > AFAICT many filesystems do not support
> > writeout from reclaim anymore because of all the issues that arise at that
> > level.
> >
>
> NBD is a block device, so filesystem restrictions like the ones you mention
> do not apply. In NFS, the direct_IO paths are used to write pages, not
> ->writepage, so again the restriction does not apply.

Block devices are a little simpler, OK. But it is still not a desirable
thing to do (just think of RAID and other complex storage layers that may
also have to do allocations). I do not think that block device authors
write their code with the VM in mind. In the case of network devices used
as block devices we have a pretty serious problem, since the network
subsystem is certainly not designed to be called from VM reclaim code that
may be triggered arbitrarily from deeply nested code elsewhere in the
kernel. Implementing something like this invites breakage to show up all
over the place.

> index 8b3b8cf..6a3fa1c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -695,6 +695,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
>  	trace_mm_page_free(page, order);
>  	kmemcheck_free_shadow(page, order);
>
> +	page->pfmemalloc = false;
>  	if (PageAnon(page))
>  		page->mapping = NULL;
>  	for (i = 0; i < (1 << order); i++)
> @@ -1221,6 +1222,7 @@ void free_hot_cold_page(struct page *page, int cold)
>
>  	migratetype = get_pageblock_migratetype(page);
>  	set_page_private(page, migratetype);
> +	page->pfmemalloc = false;
>  	local_irq_save(flags);
>  	if (unlikely(wasMlocked))
>  		free_page_mlock(page);

Page allocator hot paths affected.

> diff --git a/mm/slab.c b/mm/slab.c
> index f0bd785..f322dc2 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -123,6 +123,8 @@
>
>  #include <trace/events/kmem.h>
>
> +#include	"internal.h"
> +
>  /*
>   * DEBUG	- 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
>   *		  0 for faster, smaller code (especially in the critical paths).
> @@ -151,6 +153,12 @@
>  #define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
>  #endif
>
> +/*
> + * true if a page was allocated from pfmemalloc reserves for network-based
> + * swap
> + */
> +static bool pfmemalloc_active;

Implying an additional cacheline touched in the critical slab paths?
Hopefully it at least ends up grouped with other variables that are already
cache hot.
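One way to make that more likely would be to mark the flag __read_mostly
(just a sketch, and it assumes <linux/cache.h> is already pulled in by
slab.c, which the hunk above does not show):

/*
 * Placed in .data..read_mostly together with other rarely written
 * globals, so hot-path reads do not share a cacheline with data that
 * is written frequently.
 */
static bool pfmemalloc_active __read_mostly;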

> @@ -3243,23 +3380,35 @@ static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
>  {
>  	void *objp;
>  	struct array_cache *ac;
> +	bool force_refill = false;

... hitting the hotpath here.

> @@ -3693,12 +3845,12 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
>
>  	if (likely(ac->avail < ac->limit)) {
>  		STATS_INC_FREEHIT(cachep);
> -		ac->entry[ac->avail++] = objp;
> +		ac_put_obj(cachep, ac, objp);
>  		return;
>  	} else {
>  		STATS_INC_FREEMISS(cachep);
>  		cache_flusharray(cachep, ac);
> -		ac->entry[ac->avail++] = objp;
> +		ac_put_obj(cachep, ac, objp);
>  	}
>  }

and here.
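Even in the best case that presumably costs a branch on pfmemalloc_active
for every free, something along the lines of the sketch below, where the
helper collapses to the old array store when the feature is unused. The
name ac_put_obj_pfmemalloc() and the exact layout are my assumptions, not
necessarily what the patch does:

static inline void ac_put_obj(struct kmem_cache *cachep,
			      struct array_cache *ac, void *objp)
{
	/* Rare case: network-backed swap is in use somewhere */
	if (unlikely(pfmemalloc_active))
		objp = ac_put_obj_pfmemalloc(cachep, ac, objp);

	/* Common case: the same store as ac->entry[ac->avail++] = objp */
	ac->entry[ac->avail++] = objp;
}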


> diff --git a/mm/slub.c b/mm/slub.c
> index 4907563..8eed0de 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c

> @@ -2304,8 +2327,8 @@ redo:
>  	barrier();
>
>  	object = c->freelist;
> -	if (unlikely(!object || !node_match(c, node)))
> -
> +	if (unlikely(!object || !node_match(c, node) ||
> +					!pfmemalloc_match(c, gfpflags)))
>  		object = __slab_alloc(s, gfpflags, node, addr, c);
>
>  	else {


Another modification to the hot path. That could be avoided here by forcing
pfmemalloc allocations (like the debug allocations) to always take the slow
path and doing the check in there instead. Just keep c->freelist == NULL.
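Roughly like the sketch below. This is purely illustrative: the helper
names (e.g. PageSlabPfmemalloc()) are my guess at what a page-flag based
version looks like, __GFP_MEMALLOC only exists as of patch 03/15, and the
exact placement inside __slab_alloc() is hand-waved. The point is that the
fastpath test quoted above then stays untouched:

static inline bool may_use_pfmemalloc(gfp_t gfpflags)
{
	return (gfpflags & __GFP_MEMALLOC) ||
		(current->flags & PF_MEMALLOC);
}

	/* in __slab_alloc(), after a new slab page has been acquired: */
	if (unlikely(PageSlabPfmemalloc(page))) {
		if (!may_use_pfmemalloc(gfpflags))
			goto new_slab;	/* illustrative: pick another slab */

		/*
		 * Hand out the object from the slow path but leave
		 * c->freelist NULL, so the fastpath keeps seeing !object
		 * and comes back here for every allocation from this slab.
		 */
		object = page->freelist;
		page->freelist = get_freepointer(s, object);
		c->freelist = NULL;
		return object;
	}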




Thread overview: 37+ messages
2012-02-06 22:56 [PATCH 00/15] Swap-over-NBD without deadlocking V8 Mel Gorman
2012-02-06 22:56 ` [PATCH 01/15] mm: Serialize access to min_free_kbytes Mel Gorman
2012-02-08 18:47   ` Rik van Riel
2012-02-06 22:56 ` [PATCH 02/15] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages Mel Gorman
2012-02-07 16:27   ` Christoph Lameter
2012-02-08 14:45     ` Mel Gorman
2012-02-08 15:14       ` Christoph Lameter
2012-02-08 16:34         ` Mel Gorman
2012-02-08 19:49           ` Christoph Lameter
2012-02-08 21:23             ` Mel Gorman
2012-02-08 22:13               ` Christoph Lameter
2012-02-09 12:50                 ` Mel Gorman
2012-02-09 19:53                   ` Christoph Lameter
2012-02-10 10:26                     ` Mel Gorman
2012-02-10 21:01                       ` Christoph Lameter [this message]
2012-02-10 22:07                         ` Christoph Lameter
2012-02-13 10:12                           ` Mel Gorman
2012-02-13 11:10                         ` Mel Gorman
2012-02-06 22:56 ` [PATCH 03/15] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves Mel Gorman
2012-02-06 22:56 ` [PATCH 04/15] mm: allow PF_MEMALLOC from softirq context Mel Gorman
2012-02-06 22:56 ` [PATCH 05/15] mm: Ignore mempolicies when using ALLOC_NO_WATERMARK Mel Gorman
2012-02-06 22:56 ` [PATCH 06/15] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket Mel Gorman
2012-02-06 22:56 ` [PATCH 07/15] netvm: Allow the use of __GFP_MEMALLOC by specific sockets Mel Gorman
2012-02-06 22:56 ` [PATCH 08/15] netvm: Allow skb allocation to use PFMEMALLOC reserves Mel Gorman
2012-02-06 22:56 ` [PATCH 09/15] netvm: Propagate page->pfmemalloc to skb Mel Gorman
2012-02-06 22:56 ` [PATCH 10/15] netvm: Propagate page->pfmemalloc from netdev_alloc_page " Mel Gorman
2012-02-07 23:38   ` Alexander Duyck
2012-02-08 15:23     ` Mel Gorman
2012-02-06 22:56 ` [PATCH 11/15] netvm: Set PF_MEMALLOC as appropriate during SKB processing Mel Gorman
2012-02-06 22:56 ` [PATCH 12/15] mm: Micro-optimise slab to avoid a function call Mel Gorman
2012-02-06 22:56 ` [PATCH 13/15] nbd: Set SOCK_MEMALLOC for access to PFMEMALLOC reserves Mel Gorman
2012-02-06 22:56 ` [PATCH 14/15] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage Mel Gorman
2012-02-06 22:56 ` [PATCH 15/15] mm: Account for the number of times direct reclaimers get throttled Mel Gorman
2012-02-07 12:45 ` [PATCH 00/15] Swap-over-NBD without deadlocking V8 Hillf Danton
2012-02-07 13:27   ` Mel Gorman
2012-02-08 12:51     ` Hillf Danton
2012-02-08 15:26       ` Mel Gorman
