All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	David Rientjes <rientjes@google.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCHv5 5/7] mm: make compound_head() robust
Date: Mon, 14 Sep 2015 15:31:07 +0200	[thread overview]
Message-ID: <55F6CC1B.8020304@suse.cz> (raw)
In-Reply-To: <20150911133501.GA9129@node.dhcp.inet.fi>

On 09/11/2015 03:35 PM, Kirill A. Shutemov wrote:
>>> index 097c7a4bfbd9..330377f83ac7 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
>>>   				      (1L << PG_unevictable)));
>>>   		page_tail->flags |= (1L << PG_dirty);
>>>
>>> -		/* clear PageTail before overwriting first_page */
>>> -		smp_wmb();
>>> +		clear_compound_head(page_tail);
>>
>> I would sleep better if this was done before setting all the page->flags above,
>> previously, PageTail was cleared by the first operation which is
>> "page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;"
>> I do realize that it doesn't use WRITE_ONCE, so it might have been theoretically
>> broken already, if it does matter.
>
> Right. Nothing enforces particular order. If we really need to have some
> ordering on PageTail() vs. page->flags let's define it, but so far I
> don't see a reason to change this part.

OK.

>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 36b23f1e2ca6..89e21a07080a 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
>>>   	 * speculative page access (like in
>>>   	 * page_cache_get_speculative()) on tail pages.
>>>   	 */
>>> -	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
>>> +	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
>>>   	if (get_page_head)
>>> -		atomic_inc(&page->first_page->_count);
>>> +		atomic_inc(&compound_head(page)->_count);
>>
>> Doing another compound_head() seems like overkill when this code already assumes
>> PageTail.
>
> "Overkill"? It's too strong wording for re-read hot cache line.

Hm good point.

>> All callers do it after if (PageTail()) which means they already did
>> READ_ONCE(page->compound_head) and here they do another one. Potentially with
>> different result in bit 0, which would be a subtle bug, that could be
>> interesting to catch with some VM_BUG_ON. I don't know if a direct plain
>> page->compound_head access here instead of compound_head() would also result in
>> a re-read, since the previous access did use READ_ONCE(). Maybe it would be best
>> to reorganize the code here and in the 3 call sites so that the READ_ONCE() used
>> to determine PageTail also obtains the compound head pointer.
>
> All we would possbily win by the change is few bytes in code. Additional
> READ_ONCE() only affect code generation. It doesn't introduce any cpu
> barriers. The cache line with compound_head is in L1 anyway.
>
> I don't see justification to change this part too. If you think we can
> gain something by reworking this code, feel free to propose patch on top.

OK, it's probably not worth:

add/remove: 0/0 grow/shrink: 1/3 up/down: 7/-66 (-59)
function                                     old     new   delta
follow_trans_huge_pmd                        516     523      +7
follow_page_pte                              905     893     -12
follow_hugetlb_page                          943     919     -24
__get_page_tail                              440     410     -30


>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index a3a0a2f1f7c3..faa9e1687dea 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -200,7 +200,7 @@ out_put_single:
>>>   				__put_single_page(page);
>>>   			return;
>>>   		}
>>> -		VM_BUG_ON_PAGE(page_head != page->first_page, page);
>>> +		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
>>>   		/*
>>>   		 * We can release the refcount taken by
>>>   		 * get_page_unless_zero() now that
>>> @@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
>>>   	 *  Case 3 is possible, as we may race with
>>>   	 *  __split_huge_page_refcount tearing down a THP page.
>>>   	 */
>>> -	page_head = compound_head_by_tail(page);
>>> +	page_head = compound_head(page);
>>
>> This is also in a path after PageTail() is true.
>
> We can only save one branch here: other PageTail() is most likely in other
> compilation unit and compiler would not be able to eliminate additional
> load.
> Why bother?

Hmm, right. Bah.

add/remove: 0/0 grow/shrink: 1/0 up/down: 8/0 (8)
function                                     old     new   delta
put_compound_page                            500     508      +8




WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	David Rientjes <rientjes@google.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCHv5 5/7] mm: make compound_head() robust
Date: Mon, 14 Sep 2015 15:31:07 +0200	[thread overview]
Message-ID: <55F6CC1B.8020304@suse.cz> (raw)
In-Reply-To: <20150911133501.GA9129@node.dhcp.inet.fi>

On 09/11/2015 03:35 PM, Kirill A. Shutemov wrote:
>>> index 097c7a4bfbd9..330377f83ac7 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
>>>   				      (1L << PG_unevictable)));
>>>   		page_tail->flags |= (1L << PG_dirty);
>>>
>>> -		/* clear PageTail before overwriting first_page */
>>> -		smp_wmb();
>>> +		clear_compound_head(page_tail);
>>
>> I would sleep better if this was done before setting all the page->flags above,
>> previously, PageTail was cleared by the first operation which is
>> "page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;"
>> I do realize that it doesn't use WRITE_ONCE, so it might have been theoretically
>> broken already, if it does matter.
>
> Right. Nothing enforces particular order. If we really need to have some
> ordering on PageTail() vs. page->flags let's define it, but so far I
> don't see a reason to change this part.

OK.

>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 36b23f1e2ca6..89e21a07080a 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
>>>   	 * speculative page access (like in
>>>   	 * page_cache_get_speculative()) on tail pages.
>>>   	 */
>>> -	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
>>> +	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
>>>   	if (get_page_head)
>>> -		atomic_inc(&page->first_page->_count);
>>> +		atomic_inc(&compound_head(page)->_count);
>>
>> Doing another compound_head() seems like overkill when this code already assumes
>> PageTail.
>
> "Overkill"? It's too strong wording for re-read hot cache line.

Hm good point.

>> All callers do it after if (PageTail()) which means they already did
>> READ_ONCE(page->compound_head) and here they do another one. Potentially with
>> different result in bit 0, which would be a subtle bug, that could be
>> interesting to catch with some VM_BUG_ON. I don't know if a direct plain
>> page->compound_head access here instead of compound_head() would also result in
>> a re-read, since the previous access did use READ_ONCE(). Maybe it would be best
>> to reorganize the code here and in the 3 call sites so that the READ_ONCE() used
>> to determine PageTail also obtains the compound head pointer.
>
> All we would possbily win by the change is few bytes in code. Additional
> READ_ONCE() only affect code generation. It doesn't introduce any cpu
> barriers. The cache line with compound_head is in L1 anyway.
>
> I don't see justification to change this part too. If you think we can
> gain something by reworking this code, feel free to propose patch on top.

OK, it's probably not worth:

add/remove: 0/0 grow/shrink: 1/3 up/down: 7/-66 (-59)
function                                     old     new   delta
follow_trans_huge_pmd                        516     523      +7
follow_page_pte                              905     893     -12
follow_hugetlb_page                          943     919     -24
__get_page_tail                              440     410     -30


>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index a3a0a2f1f7c3..faa9e1687dea 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -200,7 +200,7 @@ out_put_single:
>>>   				__put_single_page(page);
>>>   			return;
>>>   		}
>>> -		VM_BUG_ON_PAGE(page_head != page->first_page, page);
>>> +		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
>>>   		/*
>>>   		 * We can release the refcount taken by
>>>   		 * get_page_unless_zero() now that
>>> @@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
>>>   	 *  Case 3 is possible, as we may race with
>>>   	 *  __split_huge_page_refcount tearing down a THP page.
>>>   	 */
>>> -	page_head = compound_head_by_tail(page);
>>> +	page_head = compound_head(page);
>>
>> This is also in a path after PageTail() is true.
>
> We can only save one branch here: other PageTail() is most likely in other
> compilation unit and compiler would not be able to eliminate additional
> load.
> Why bother?

Hmm, right. Bah.

add/remove: 0/0 grow/shrink: 1/0 up/down: 8/0 (8)
function                                     old     new   delta
put_compound_page                            500     508      +8



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-14 13:31 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-03 12:35 [PATCHv5 0/7] Fix compound_head() race Kirill A. Shutemov
2015-09-03 12:35 ` Kirill A. Shutemov
2015-09-03 12:35 ` [PATCHv5 1/7] mm: drop page->slab_page Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-07  5:00   ` Alexander Duyck
2015-09-07  5:00     ` Alexander Duyck
2015-09-07 12:19     ` Kirill A. Shutemov
2015-09-07 12:19       ` Kirill A. Shutemov
2015-09-03 12:35 ` [PATCHv5 2/7] slub: use page->rcu_head instead of page->lru plus cast Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-03 12:35 ` [PATCHv5 3/7] zsmalloc: use page->private instead of page->first_page Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-03 12:35 ` [PATCHv5 4/7] mm: pack compound_dtor and compound_order into one word in struct page Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-10  9:41   ` Vlastimil Babka
2015-09-10  9:41     ` Vlastimil Babka
2015-09-03 12:35 ` [PATCHv5 5/7] mm: make compound_head() robust Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-10 10:54   ` Vlastimil Babka
2015-09-10 10:54     ` Vlastimil Babka
2015-09-11 13:35     ` Kirill A. Shutemov
2015-09-11 13:35       ` Kirill A. Shutemov
2015-09-14 13:31       ` Vlastimil Babka [this message]
2015-09-14 13:31         ` Vlastimil Babka
2015-09-03 12:35 ` [PATCHv5 6/7] mm: use 'unsigned int' for page order Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-10 11:04   ` Vlastimil Babka
2015-09-10 11:04     ` Vlastimil Babka
2015-09-03 12:35 ` [PATCHv5 7/7] mm: use 'unsigned int' for compound_dtor/compound_order on 64BIT Kirill A. Shutemov
2015-09-03 12:35   ` Kirill A. Shutemov
2015-09-10 11:34   ` Vlastimil Babka
2015-09-10 11:34     ` Vlastimil Babka
2015-09-04 13:43 ` [PATCHv5 0/7] Fix compound_head() race Andrea Arcangeli
2015-09-04 13:43   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F6CC1B.8020304@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.