linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH 2/2] mm: make compound_head() robust
Date: Sun, 16 Aug 2015 18:40:16 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1508161751150.1299@eggly.anvils> (raw)
In-Reply-To: <1439481286-81093-3-git-send-email-kirill.shutemov@linux.intel.com>

On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:

> Hugh has pointed that compound_head() call can be unsafe in some
> context. There's one example:
> 
> 	CPU0					CPU1
> 
> isolate_migratepages_block()
>   page_count()
>     compound_head()
>       !!PageTail() == true
> 					put_page()
> 					  tail->first_page = NULL
>       head = tail->first_page
> 					alloc_pages(__GFP_COMP)
> 					   prep_compound_page()
> 					     tail->first_page = head
> 					     __SetPageTail(p);
>       !!PageTail() == true
>     <head == NULL dereferencing>
> 
> The race is pure theoretical. I don't it's possible to trigger it in
> practice. But who knows.
> 
> We can fix the race by chaging how encode PageTail() and compound_head()
> within struct page to be able to update them in one shot.
> 
> The patch introduces page->compound_head into union with
> page->mem_cgroup.
> 
> Set bit 0 of page->compound_head means that the page is tail. If the bit
> set, rest of the page->compound_head is pointer to head page. Otherwise,
> the field is NULL or pointer to memory cgroup.
> 
> page->mem_cgroup currenly only used for small or head pages, so there
> shouldn't be any conflicts.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  Documentation/vm/split_page_table_lock |  4 +-
>  arch/xtensa/configs/iss_defconfig      |  1 -
>  include/linux/mm.h                     | 53 ++--------------------
>  include/linux/mm_types.h               | 15 ++++---
>  include/linux/page-flags.h             | 80 ++++++++--------------------------
>  mm/Kconfig                             | 12 -----
>  mm/debug.c                             |  7 ---
>  mm/hugetlb.c                           |  8 +---
>  mm/internal.h                          |  4 +-
>  mm/memory-failure.c                    |  7 ---
>  mm/page_alloc.c                        | 36 +++++++--------
>  mm/swap.c                              |  4 +-
>  12 files changed, 56 insertions(+), 175 deletions(-)

Mostly I like this: especially those deletions,
and removing the unloved CONFIG_PAGEFLAGS_EXTENDED.

But I do disagree with:

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0038ac7466fd..e0c4c0a8ec3d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -172,13 +172,17 @@ struct page {
>  #endif
>  #endif
>  		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
> -		struct page *first_page;	/* Compound tail pages */
>  	};
>  
> -#ifdef CONFIG_MEMCG
> -	struct mem_cgroup *mem_cgroup;
> -#endif
> +	union {
> +		/* Bit zero of the word encode PageTail() */
> +		struct mem_cgroup *mem_cgroup;	/* If bit zero is clear */
> +		unsigned long compound_head;	/* If bit zero is set */
> +	};

On 32-bit, I think the addition of that mem_cgroup pointer enlarged
struct page from 32 bytes to 36 (SLAB or SLOB) or 40 (SLUB) bytes.
I can well imagine people wanting to cut that bloat by turning off
CONFIG_MEMCG, but now you would be thwarting them.  (I can also
imagine memcg people might want to add flag bits of their own to it.)

My own preference (Andrew might disagree) would be to give up on
compound_page_dtor *compound_dtor: in all the years it's been there,
it has only been assigned two possibilities, and I think you'd do
well to hard code those in the one or two places they're needed -
moving PageHuge to be a second bit alongside your PageTail (but of
course it could only be set in the first tail page, not the head).

But as far as fixing the isolate_migratepages_block() bug you
discovered, I think your original atomic_read(&page->_count) fix
was good enough, or a page_count_raw(page) if Andrew prefers -
though I'm not so keen on dressing these things up myself, such
raw scans are exceptional and face very special problems, I like
to see exactly what's going on in them.

And even when a race between PageTail and first_page is resolved
by your READ_ONCE, it still leaves races of whether the head page
still agrees with the tail.  Safer races now because first_page is
reliably a recent first_page pointer: which might be all that's needed
to make your refcounting patches' wider use of compound_head() safe -
this patch does look like a good step for those.

Hugh

      reply	other threads:[~2015-08-17  1:41 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-13 15:54 [PATCH 0/2] Fix compound_head() race Kirill A. Shutemov
2015-08-13 15:54 ` [PATCH 1/2] zsmalloc: use page->private instead of page->first_page Kirill A. Shutemov
2015-08-13 15:54 ` [PATCH 2/2] mm: make compound_head() robust Kirill A. Shutemov
2015-08-17  1:40   ` Hugh Dickins [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1508161751150.1299@eggly.anvils \
    --to=hughd@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).