All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count
Date: Fri, 2 Nov 2018 16:27:19 -0700	[thread overview]
Message-ID: <8a9d3a4e-a5d6-3d2d-008f-dc2d6cfd94ed@nvidia.com> (raw)
In-Reply-To: <20181024110031.GM8537@350D>

On 10/24/18 4:00 AM, Balbir Singh wrote:
> On Fri, Oct 12, 2018 at 05:15:51PM -0700, John Hubbard wrote:
>> On 10/12/18 3:56 AM, Balbir Singh wrote:
>>> On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>> [...]
>>>> + * Because page->dma_pinned_flags is unioned with page->lru, any page that
>>>> + * uses these flags must NOT be on an LRU. That's partly enforced by
>>>> + * ClearPageDmaPinned, which gives the page back to LRU.
>>>> + *
>>>> + * PageDmaPinned also corresponds to PageTail (the 0th bit in the first union
>>>> + * of struct page), and this flag is checked without knowing whether it is a
>>>> + * tail page or a PageDmaPinned page. Therefore, start the flags at bit 1 (0x2),
>>>> + * rather than bit 0.
>>>> + */
>>>> +#define PAGE_DMA_PINNED		0x2
>>>> +#define PAGE_DMA_PINNED_FLAGS	(PAGE_DMA_PINNED)
>>>> +
>>>
>>> This is really subtle, additional changes to compound_head will need to coordinate
>>> with these flags? Also doesn't this bit need to be unique across all structs in
>>> the union? I guess that is guaranteed by the fact that page == compound_head(page)
>>> as per your assertion, but I've forgotten why that is true. Could you please
>>> add some commentary on that
>>>
>>
>> Yes, agreed. I've rewritten and augmented that comment block, plus removed the 
>> PAGE_DMA_PINNED_FLAGS (there are no more bits available, so it's just misleading 
>> to even have it). So now it looks like this:
>>
>> /*
>>  * Because page->dma_pinned_flags is unioned with page->lru, any page that
>>  * uses these flags must NOT be on an LRU. That's partly enforced by
>>  * ClearPageDmaPinned, which gives the page back to LRU.
>>  *
>>  * PageDmaPinned is checked without knowing whether it is a tail page or a
>>  * PageDmaPinned page. For that reason, PageDmaPinned avoids PageTail (the 0th
>>  * bit in the first union of struct page), and instead uses bit 1 (0x2),
>>  * rather than bit 0.
>>  *
>>  * PageDmaPinned can only be used if no other systems are using the same bit
>>  * across the first struct page union. In this regard, it is similar to
>>  * PageTail, and in fact, because of PageTail's constraint that bit 0 be left
>>  * alone, bit 1 is also left alone so far: other union elements (ignoring tail
>>  * pages) put pointers there, and pointer alignment leaves the lower two bits
>>  * available.
>>  *
>>  * So, constraints include:
>>  *
>>  *     -- Only use PageDmaPinned on non-tail pages.
>>  *     -- Remove the page from any LRU list first.
>>  */
>> #define PAGE_DMA_PINNED		0x2
>>
>> /*
>>  * Because these flags are read outside of a lock, ensure visibility between
>>  * different threads, by using READ|WRITE_ONCE.
>>  */
>> static __always_inline int PageDmaPinned(struct page *page)
>> {
>> 	VM_BUG_ON(page != compound_head(page));
>> 	return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED) != 0;
>> }
>>
>> [...]
>>>> +static __always_inline void SetPageDmaPinned(struct page *page)
>>>> +{
>>>> +	VM_BUG_ON(page != compound_head(page));
>>>
>>> VM_BUG_ON(!list_empty(&page->lru))
>>
>>
>> There is only one place where we set this flag, and that is when (in patch 6/6)
>> transitioning from a page that might (or might not) have been
>> on an LRU. In that case, the calling code has already corrupted page->lru, by
>> writing to page->dma_pinned_count, which is unions with page->lru:
>>
>> 		atomic_set(&page->dma_pinned_count, 1);
>> 		SetPageDmaPinned(page);
>>
>> ...so it would be inappropriate to call a list function, such as 
>> list_empty(), on that field.  Let's just leave it as-is.
>>
>>
>>>
>>>> +	WRITE_ONCE(page->dma_pinned_flags, PAGE_DMA_PINNED);
>>>> +}
>>>> +
>>>> +static __always_inline void ClearPageDmaPinned(struct page *page)
>>>> +{
>>>> +	VM_BUG_ON(page != compound_head(page));
>>>> +	VM_BUG_ON_PAGE(!PageDmaPinnedFlags(page), page);
>>>> +
>>>> +	/* This does a WRITE_ONCE to the lru.next, which is also the
>>>> +	 * page->dma_pinned_flags field. So in addition to restoring page->lru,
>>>> +	 * this provides visibility to other threads.
>>>> +	 */
>>>> +	INIT_LIST_HEAD(&page->lru);
>>>
>>> This assumes certain things about list_head, why not use the correct
>>> initialization bits.
>>>
>>
>> Yes, OK, changed to:
>>
>> static __always_inline void ClearPageDmaPinned(struct page *page)
>> {
>> 	VM_BUG_ON(page != compound_head(page));
>> 	VM_BUG_ON_PAGE(!PageDmaPinned(page), page);
>>
>> 	/* Provide visibility to other threads: */
>> 	WRITE_ONCE(page->dma_pinned_flags, 0);
>>
>> 	/*
>> 	 * Safety precaution: restore the list head, before possibly returning
>> 	 * the page to other subsystems.
>> 	 */
>> 	INIT_LIST_HEAD(&page->lru);
>> }
>>
>>
> 
> Sorry, I've been distracted with other things
> 
> This looks better, do we still need the INIT_LIST_HEAD?
> 

Good point. I guess not. I was getting a little too fancy, and it's better 
for ClearPageDmaPinned to be true to its name, and just only do that.

(Sorry for the delayed response.)

thanks,
-- 
John Hubbard
NVIDIA

WARNING: multiple messages have this Message-ID (diff)
From: John Hubbard <jhubbard@nvidia.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	<linux-mm@kvack.org>, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count
Date: Fri, 2 Nov 2018 16:27:19 -0700	[thread overview]
Message-ID: <8a9d3a4e-a5d6-3d2d-008f-dc2d6cfd94ed@nvidia.com> (raw)
In-Reply-To: <20181024110031.GM8537@350D>

On 10/24/18 4:00 AM, Balbir Singh wrote:
> On Fri, Oct 12, 2018 at 05:15:51PM -0700, John Hubbard wrote:
>> On 10/12/18 3:56 AM, Balbir Singh wrote:
>>> On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>> [...]
>>>> + * Because page->dma_pinned_flags is unioned with page->lru, any page that
>>>> + * uses these flags must NOT be on an LRU. That's partly enforced by
>>>> + * ClearPageDmaPinned, which gives the page back to LRU.
>>>> + *
>>>> + * PageDmaPinned also corresponds to PageTail (the 0th bit in the first union
>>>> + * of struct page), and this flag is checked without knowing whether it is a
>>>> + * tail page or a PageDmaPinned page. Therefore, start the flags at bit 1 (0x2),
>>>> + * rather than bit 0.
>>>> + */
>>>> +#define PAGE_DMA_PINNED		0x2
>>>> +#define PAGE_DMA_PINNED_FLAGS	(PAGE_DMA_PINNED)
>>>> +
>>>
>>> This is really subtle, additional changes to compound_head will need to coordinate
>>> with these flags? Also doesn't this bit need to be unique across all structs in
>>> the union? I guess that is guaranteed by the fact that page == compound_head(page)
>>> as per your assertion, but I've forgotten why that is true. Could you please
>>> add some commentary on that
>>>
>>
>> Yes, agreed. I've rewritten and augmented that comment block, plus removed the 
>> PAGE_DMA_PINNED_FLAGS (there are no more bits available, so it's just misleading 
>> to even have it). So now it looks like this:
>>
>> /*
>>  * Because page->dma_pinned_flags is unioned with page->lru, any page that
>>  * uses these flags must NOT be on an LRU. That's partly enforced by
>>  * ClearPageDmaPinned, which gives the page back to LRU.
>>  *
>>  * PageDmaPinned is checked without knowing whether it is a tail page or a
>>  * PageDmaPinned page. For that reason, PageDmaPinned avoids PageTail (the 0th
>>  * bit in the first union of struct page), and instead uses bit 1 (0x2),
>>  * rather than bit 0.
>>  *
>>  * PageDmaPinned can only be used if no other systems are using the same bit
>>  * across the first struct page union. In this regard, it is similar to
>>  * PageTail, and in fact, because of PageTail's constraint that bit 0 be left
>>  * alone, bit 1 is also left alone so far: other union elements (ignoring tail
>>  * pages) put pointers there, and pointer alignment leaves the lower two bits
>>  * available.
>>  *
>>  * So, constraints include:
>>  *
>>  *     -- Only use PageDmaPinned on non-tail pages.
>>  *     -- Remove the page from any LRU list first.
>>  */
>> #define PAGE_DMA_PINNED		0x2
>>
>> /*
>>  * Because these flags are read outside of a lock, ensure visibility between
>>  * different threads, by using READ|WRITE_ONCE.
>>  */
>> static __always_inline int PageDmaPinned(struct page *page)
>> {
>> 	VM_BUG_ON(page != compound_head(page));
>> 	return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED) != 0;
>> }
>>
>> [...]
>>>> +static __always_inline void SetPageDmaPinned(struct page *page)
>>>> +{
>>>> +	VM_BUG_ON(page != compound_head(page));
>>>
>>> VM_BUG_ON(!list_empty(&page->lru))
>>
>>
>> There is only one place where we set this flag, and that is when (in patch 6/6)
>> transitioning from a page that might (or might not) have been
>> on an LRU. In that case, the calling code has already corrupted page->lru, by
>> writing to page->dma_pinned_count, which is unions with page->lru:
>>
>> 		atomic_set(&page->dma_pinned_count, 1);
>> 		SetPageDmaPinned(page);
>>
>> ...so it would be inappropriate to call a list function, such as 
>> list_empty(), on that field.  Let's just leave it as-is.
>>
>>
>>>
>>>> +	WRITE_ONCE(page->dma_pinned_flags, PAGE_DMA_PINNED);
>>>> +}
>>>> +
>>>> +static __always_inline void ClearPageDmaPinned(struct page *page)
>>>> +{
>>>> +	VM_BUG_ON(page != compound_head(page));
>>>> +	VM_BUG_ON_PAGE(!PageDmaPinnedFlags(page), page);
>>>> +
>>>> +	/* This does a WRITE_ONCE to the lru.next, which is also the
>>>> +	 * page->dma_pinned_flags field. So in addition to restoring page->lru,
>>>> +	 * this provides visibility to other threads.
>>>> +	 */
>>>> +	INIT_LIST_HEAD(&page->lru);
>>>
>>> This assumes certain things about list_head, why not use the correct
>>> initialization bits.
>>>
>>
>> Yes, OK, changed to:
>>
>> static __always_inline void ClearPageDmaPinned(struct page *page)
>> {
>> 	VM_BUG_ON(page != compound_head(page));
>> 	VM_BUG_ON_PAGE(!PageDmaPinned(page), page);
>>
>> 	/* Provide visibility to other threads: */
>> 	WRITE_ONCE(page->dma_pinned_flags, 0);
>>
>> 	/*
>> 	 * Safety precaution: restore the list head, before possibly returning
>> 	 * the page to other subsystems.
>> 	 */
>> 	INIT_LIST_HEAD(&page->lru);
>> }
>>
>>
> 
> Sorry, I've been distracted with other things
> 
> This looks better, do we still need the INIT_LIST_HEAD?
> 

Good point. I guess not. I was getting a little too fancy, and it's better 
for ClearPageDmaPinned to be true to its name, and just only do that.

(Sorry for the delayed response.)

thanks,
-- 
John Hubbard
NVIDIA

  reply	other threads:[~2018-11-02 23:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-12  6:00 [PATCH 0/6] RFC: gup+dma: tracking dma-pinned pages john.hubbard
2018-10-12  6:00 ` [PATCH 1/6] mm: get_user_pages: consolidate error handling john.hubbard
2018-10-12  6:30   ` Balbir Singh
2018-10-12 22:45     ` John Hubbard
2018-10-12 22:45       ` John Hubbard
2018-10-12  6:00 ` [PATCH 2/6] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-10-12  7:35   ` Balbir Singh
2018-10-12 22:31     ` John Hubbard
2018-10-12 22:31       ` John Hubbard
2018-10-12  6:00 ` [PATCH 3/6] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-10-12  6:00 ` [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count john.hubbard
2018-10-12 10:56   ` Balbir Singh
2018-10-13  0:15     ` John Hubbard
2018-10-13  0:15       ` John Hubbard
2018-10-24 11:00       ` Balbir Singh
2018-11-02 23:27         ` John Hubbard [this message]
2018-11-02 23:27           ` John Hubbard
2018-10-13  3:55   ` Dave Chinner
2018-10-13  7:34     ` John Hubbard
2018-10-13  7:34       ` John Hubbard
2018-10-13 16:47       ` Christoph Hellwig
2018-10-13 21:19         ` John Hubbard
2018-10-13 21:19           ` John Hubbard
2018-11-05  7:10         ` John Hubbard
2018-11-05  7:10           ` John Hubbard
2018-11-05  9:54           ` Jan Kara
2018-11-06  0:26             ` John Hubbard
2018-11-06  0:26               ` John Hubbard
2018-11-06  2:47               ` Dave Chinner
2018-11-06 11:00                 ` Jan Kara
2018-11-06 20:41                   ` Dave Chinner
2018-11-07  6:36                     ` John Hubbard
2018-11-07  6:36                       ` John Hubbard
2018-10-13 23:01       ` Dave Chinner
2018-10-16  8:51         ` Jan Kara
2018-10-17  1:48           ` John Hubbard
2018-10-17  1:48             ` John Hubbard
2018-10-17 11:09             ` Michal Hocko
2018-10-18  0:03               ` John Hubbard
2018-10-18  0:03                 ` John Hubbard
2018-10-19  8:11                 ` Michal Hocko
2018-10-12  6:00 ` [PATCH 5/6] mm: introduce zone_gup_lock, for dma-pinned pages john.hubbard
2018-10-12  6:00 ` [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields john.hubbard
2018-10-12 11:07   ` Balbir Singh
2018-10-13  0:33     ` John Hubbard
2018-10-13  0:33       ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a9d3a4e-a5d6-3d2d-008f-dc2d6cfd94ed@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.