All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields
Date: Fri, 12 Oct 2018 17:33:51 -0700	[thread overview]
Message-ID: <af901585-3f9a-da35-81a4-50ea341844c4@nvidia.com> (raw)
In-Reply-To: <20181012110728.GL8537@350D>

On 10/12/18 4:07 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:14PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
[...]
>> +static int pin_page_for_dma(struct page *page)
>> +{
>> +	int ret = 0;
>> +	struct zone *zone;
>> +
>> +	page = compound_head(page);
>> +	zone = page_zone(page);
>> +
>> +	spin_lock(zone_gup_lock(zone));
>> +
>> +	if (PageDmaPinned(page)) {
>> +		/* Page was not on an LRU list, because it was DMA-pinned. */
>> +		VM_BUG_ON_PAGE(PageLRU(page), page);
>> +
>> +		atomic_inc(&page->dma_pinned_count);
>> +		goto unlock_out;
>> +	}
>> +
>> +	/*
>> +	 * Note that page->dma_pinned_flags is unioned with page->lru.
>> +	 * Therefore, the rules are: checking if any of the
>> +	 * PAGE_DMA_PINNED_FLAGS bits are set may be done while page->lru
>> +	 * is in use. However, setting those flags requires that
>> +	 * the page is both locked, and also, removed from the LRU.
>> +	 */
>> +	ret = isolate_lru_page(page);
>> +
> 
> isolate_lru_page() can be expensive and in terms of the overall locking order
> sounds like zone_gup_lock is higher in the hierarchy than the locks taken
> inside isolate_lru_page()
 
As for the expensive part, that is a concern. But I do think we need some lock
here. The hierarchy shouldn't be a problem, given that this is a new lock. But
I'm not sure how to make this better. In any case, I think it should work--note that
the zone_lru_lock, within isolate_lru_page(), is of similar use, and is held
for a similar duration, so...maybe not really a problem?


>> +	if (ret == 0) {
>> +		/* Avoid problems later, when freeing the page: */
>> +		ClearPageActive(page);
>> +		ClearPageUnevictable(page);
>> +
>> +		/* counteract isolate_lru_page's effects: */
>> +		put_page(page);
> 
> Can the page get reaped here? What's the expected page count?

Nope. The page_count is at least one, because get_user_pages() incremented it.

 
>> +
>> +		atomic_set(&page->dma_pinned_count, 1);
>> +		SetPageDmaPinned(page);
>> +	}
>> +
>> +unlock_out:
>> +	spin_unlock(zone_gup_lock(zone));
>> +
>> +	return ret;
>> +}
>> +
>>  static struct page *no_page_table(struct vm_area_struct *vma,
>>  		unsigned int flags)
>>  {
>> @@ -659,7 +704,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>  		unsigned int gup_flags, struct page **pages,
>>  		struct vm_area_struct **vmas, int *nonblocking)
>>  {
>> -	long i = 0;
>> +	long i = 0, j;
>>  	int err = 0;
>>  	unsigned int page_mask;
>>  	struct vm_area_struct *vma = NULL;
>> @@ -764,6 +809,10 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>  	} while (nr_pages);
>>  
>>  out:
>> +	if (pages)
>> +		for (j = 0; j < i; j++)
>> +			pin_page_for_dma(pages[j]);
>> +
> 
> Why does get_user_pages() unconditionally pin_page_for_dma?

That's the grand plan here: get_user_pages() now means "unconditionally pin the page for dma".
If you didn't want that, then either release it quickly (many callers do), or use a different
way of pinning or acquiring the page.

> 
>>  	return i ? i : err;
>>  }
>>  
>> @@ -1841,7 +1890,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>  			struct page **pages)
>>  {
>>  	unsigned long addr, len, end;
>> -	int nr = 0, ret = 0;
>> +	int nr = 0, ret = 0, i;
>>  
>>  	start &= PAGE_MASK;
>>  	addr = start;
>> @@ -1862,6 +1911,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>  		ret = nr;
>>  	}
>>  
>> +	for (i = 0; i < nr; i++)
>> +		pin_page_for_dma(pages[i]);
> 
> Why does get_user_pages_fast() unconditionally pin_page_for_dma?

All of the get_user_pages*() variants need to follow the same rules, so the same 
explanation as above, applies here also.

>> +
>>  	if (nr < nr_pages) {
>>  		/* Try to get the remaining pages with get_user_pages */
>>  		start += nr << PAGE_SHIFT;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index e79cb59552d9..af9719756081 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2335,6 +2335,11 @@ static void lock_page_lru(struct page *page, int *isolated)
>>  	if (PageLRU(page)) {
>>  		struct lruvec *lruvec;
>>  
>> +		/* LRU and PageDmaPinned are mutually exclusive: they use the
>> +		 * same fields in struct page, but for different purposes.
>> +		 */
> 
> Comment style needs fixing
 
oops, thanks for spotting those, will fix.


-- 
thanks,
John Hubbard
NVIDIA

WARNING: multiple messages have this Message-ID (diff)
From: John Hubbard <jhubbard@nvidia.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	<linux-mm@kvack.org>, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields
Date: Fri, 12 Oct 2018 17:33:51 -0700	[thread overview]
Message-ID: <af901585-3f9a-da35-81a4-50ea341844c4@nvidia.com> (raw)
In-Reply-To: <20181012110728.GL8537@350D>

On 10/12/18 4:07 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:14PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
[...]
>> +static int pin_page_for_dma(struct page *page)
>> +{
>> +	int ret = 0;
>> +	struct zone *zone;
>> +
>> +	page = compound_head(page);
>> +	zone = page_zone(page);
>> +
>> +	spin_lock(zone_gup_lock(zone));
>> +
>> +	if (PageDmaPinned(page)) {
>> +		/* Page was not on an LRU list, because it was DMA-pinned. */
>> +		VM_BUG_ON_PAGE(PageLRU(page), page);
>> +
>> +		atomic_inc(&page->dma_pinned_count);
>> +		goto unlock_out;
>> +	}
>> +
>> +	/*
>> +	 * Note that page->dma_pinned_flags is unioned with page->lru.
>> +	 * Therefore, the rules are: checking if any of the
>> +	 * PAGE_DMA_PINNED_FLAGS bits are set may be done while page->lru
>> +	 * is in use. However, setting those flags requires that
>> +	 * the page is both locked, and also, removed from the LRU.
>> +	 */
>> +	ret = isolate_lru_page(page);
>> +
> 
> isolate_lru_page() can be expensive and in terms of the overall locking order
> sounds like zone_gup_lock is higher in the hierarchy than the locks taken
> inside isolate_lru_page()
 
As for the expensive part, that is a concern. But I do think we need some lock
here. The hierarchy shouldn't be a problem, given that this is a new lock. But
I'm not sure how to make this better. In any case, I think it should work--note that
the zone_lru_lock, within isolate_lru_page(), is of similar use, and is held
for a similar duration, so...maybe not really a problem?


>> +	if (ret == 0) {
>> +		/* Avoid problems later, when freeing the page: */
>> +		ClearPageActive(page);
>> +		ClearPageUnevictable(page);
>> +
>> +		/* counteract isolate_lru_page's effects: */
>> +		put_page(page);
> 
> Can the page get reaped here? What's the expected page count?

Nope. The page_count is at least one, because get_user_pages() incremented it.

 
>> +
>> +		atomic_set(&page->dma_pinned_count, 1);
>> +		SetPageDmaPinned(page);
>> +	}
>> +
>> +unlock_out:
>> +	spin_unlock(zone_gup_lock(zone));
>> +
>> +	return ret;
>> +}
>> +
>>  static struct page *no_page_table(struct vm_area_struct *vma,
>>  		unsigned int flags)
>>  {
>> @@ -659,7 +704,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>  		unsigned int gup_flags, struct page **pages,
>>  		struct vm_area_struct **vmas, int *nonblocking)
>>  {
>> -	long i = 0;
>> +	long i = 0, j;
>>  	int err = 0;
>>  	unsigned int page_mask;
>>  	struct vm_area_struct *vma = NULL;
>> @@ -764,6 +809,10 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>  	} while (nr_pages);
>>  
>>  out:
>> +	if (pages)
>> +		for (j = 0; j < i; j++)
>> +			pin_page_for_dma(pages[j]);
>> +
> 
> Why does get_user_pages() unconditionally pin_page_for_dma?

That's the grand plan here: get_user_pages() now means "unconditionally pin the page for dma".
If you didn't want that, then either release it quickly (many callers do), or use a different
way of pinning or acquiring the page.

> 
>>  	return i ? i : err;
>>  }
>>  
>> @@ -1841,7 +1890,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>  			struct page **pages)
>>  {
>>  	unsigned long addr, len, end;
>> -	int nr = 0, ret = 0;
>> +	int nr = 0, ret = 0, i;
>>  
>>  	start &= PAGE_MASK;
>>  	addr = start;
>> @@ -1862,6 +1911,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>  		ret = nr;
>>  	}
>>  
>> +	for (i = 0; i < nr; i++)
>> +		pin_page_for_dma(pages[i]);
> 
> Why does get_user_pages_fast() unconditionally pin_page_for_dma?

All of the get_user_pages*() variants need to follow the same rules, so the same 
explanation as above, applies here also.

>> +
>>  	if (nr < nr_pages) {
>>  		/* Try to get the remaining pages with get_user_pages */
>>  		start += nr << PAGE_SHIFT;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index e79cb59552d9..af9719756081 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2335,6 +2335,11 @@ static void lock_page_lru(struct page *page, int *isolated)
>>  	if (PageLRU(page)) {
>>  		struct lruvec *lruvec;
>>  
>> +		/* LRU and PageDmaPinned are mutually exclusive: they use the
>> +		 * same fields in struct page, but for different purposes.
>> +		 */
> 
> Comment style needs fixing
 
oops, thanks for spotting those, will fix.


-- 
thanks,
John Hubbard
NVIDIA


  reply	other threads:[~2018-10-13  0:33 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-12  6:00 [PATCH 0/6] RFC: gup+dma: tracking dma-pinned pages john.hubbard
2018-10-12  6:00 ` [PATCH 1/6] mm: get_user_pages: consolidate error handling john.hubbard
2018-10-12  6:30   ` Balbir Singh
2018-10-12 22:45     ` John Hubbard
2018-10-12 22:45       ` John Hubbard
2018-10-12  6:00 ` [PATCH 2/6] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-10-12  7:35   ` Balbir Singh
2018-10-12 22:31     ` John Hubbard
2018-10-12 22:31       ` John Hubbard
2018-10-12  6:00 ` [PATCH 3/6] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-10-12  6:00 ` [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count john.hubbard
2018-10-12 10:56   ` Balbir Singh
2018-10-13  0:15     ` John Hubbard
2018-10-13  0:15       ` John Hubbard
2018-10-24 11:00       ` Balbir Singh
2018-11-02 23:27         ` John Hubbard
2018-11-02 23:27           ` John Hubbard
2018-10-13  3:55   ` Dave Chinner
2018-10-13  7:34     ` John Hubbard
2018-10-13  7:34       ` John Hubbard
2018-10-13 16:47       ` Christoph Hellwig
2018-10-13 21:19         ` John Hubbard
2018-10-13 21:19           ` John Hubbard
2018-11-05  7:10         ` John Hubbard
2018-11-05  7:10           ` John Hubbard
2018-11-05  9:54           ` Jan Kara
2018-11-06  0:26             ` John Hubbard
2018-11-06  0:26               ` John Hubbard
2018-11-06  2:47               ` Dave Chinner
2018-11-06 11:00                 ` Jan Kara
2018-11-06 20:41                   ` Dave Chinner
2018-11-07  6:36                     ` John Hubbard
2018-11-07  6:36                       ` John Hubbard
2018-10-13 23:01       ` Dave Chinner
2018-10-16  8:51         ` Jan Kara
2018-10-17  1:48           ` John Hubbard
2018-10-17  1:48             ` John Hubbard
2018-10-17 11:09             ` Michal Hocko
2018-10-18  0:03               ` John Hubbard
2018-10-18  0:03                 ` John Hubbard
2018-10-19  8:11                 ` Michal Hocko
2018-10-12  6:00 ` [PATCH 5/6] mm: introduce zone_gup_lock, for dma-pinned pages john.hubbard
2018-10-12  6:00 ` [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields john.hubbard
2018-10-12 11:07   ` Balbir Singh
2018-10-13  0:33     ` John Hubbard [this message]
2018-10-13  0:33       ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af901585-3f9a-da35-81a4-50ea341844c4@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.