intel-gfx.lists.freedesktop.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Jan Kara" <jack@suse.cz>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"John Hubbard" <jhubbard@nvidia.com>,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, "Jérôme Glisse" <jglisse@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Claudio Imbrenda" <imbrenda@linux.ibm.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [Intel-gfx] [PATCH] mm: Skip opportunistic reclaim for dma pinned pages
Date: Thu, 25 Jun 2020 09:57:25 +0200
Message-ID: <20200625075725.GC1320@dhcp22.suse.cz>
In-Reply-To: <20200624191417.16735-1-chris@chris-wilson.co.uk>

On Wed 24-06-20 20:14:17, Chris Wilson wrote:
> A general rule of thumb is that shrinkers should be fast and effective.
> They are called from direct reclaim at the most inconvenient of times when
> the caller is waiting for a page. If we attempt to reclaim a page being
> pinned for active dma [pin_user_pages()], we will incur far greater
> latency than a normal anonymous page mapped multiple times. Worse the
> page may be in use indefinitely by the HW and unable to be reclaimed
> in a timely manner.
> 
> A side effect of the LRU shrinker not being dma aware is that we will
> often attempt to perform direct reclaim on the persistent group of dma
> pages while continuing to use the dma HW (an issue as the HW may already
> be actively waiting for the next user request), and even attempt to
> reclaim a partially allocated dma object in order to satisfy pinning
> the next user page for that object.

You talk about direct reclaim, but this path is shared with background
reclaim, which is a bit confusing. Maybe you just want to point out that
the reclaim latency is more noticeable to userspace in direct reclaim.
It would be good to clarify this.

How much memory are we talking about here btw?

> It is to be expected that such pages are made available for reclaim at
> the end of the dma operation [unpin_user_pages()], and for truly
> longterm pins to be proactively recovered via device specific shrinkers
> [i.e. stop the HW, allow the pages to be returned to the system, and
> then compete again for the memory].

Is the latter implemented?

Btw. the overall intention of the patch is not really clear to me. Do I
get it right that this is going to reduce the latency of reclaim for
pages that are not reclaimable anyway because they are pinned? If so, do
we have any numbers for that?

It would also be good to explain why the bail out is implemented in
try_to_unmap rather than shrink_page_list.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> ---
> This seems perhaps a little devious and overzealous. Is there a more
> appropriate TTU flag? Would there be a way to limit its effect to say
> FOLL_LONGTERM? Doing the migration first would seem to be sensible if
> we disable opportunistic migration for the duration of the pin.
> ---
>  mm/rmap.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5fe2dedce1fc..374c6e65551b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1393,6 +1393,22 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	    is_zone_device_page(page) && !is_device_private_page(page))
>  		return true;
>  
> +	/*
> +	 * Try and fail early to revoke a costly DMA pinned page.
> +	 *
> +	 * Reclaiming an active DMA page requires stopping the hardware
> +	 * and flushing access. [Hardware that does support pagefaulting,
> +	 * and so can quickly revoke DMA pages at any time, does not need
> +	 * to pin the DMA page.] At worst, the page may be indefinitely in
> +	 * use by the hardware. Even at best it will take far longer to
> +	 * revoke the access via the mmu notifier, forcing that latency
> +	 * onto our callers rather than the consumer of the HW. As we are
> +	 * called during opportunistic direct reclaim, declare the
> +	 * opportunity cost too high and ignore the page.
> +	 */
> +	if (page_maybe_dma_pinned(page))
> +		return true;

I do not understand why the page table walk needs to be done. The page
is going to be pinned no matter how many page tables are mapping it,
right?
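
To illustrate the question, here is a minimal userspace sketch, not
kernel code. It assumes the pin test depends only on the page refcount
(the kernel's GUP_PIN_COUNTING_BIAS is 1 << 10) and that the rmap walk
calls the callback once per mapping. The toy_* names and the struct are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors GUP_PIN_COUNTING_BIAS, the amount a pin adds to the refcount. */
#define TOY_PIN_BIAS (1 << 10)

/* Hypothetical stand-in for struct page, reduced to what the sketch needs. */
struct toy_page {
	int refcount;	/* all references, including any pin bias */
	int mapcount;	/* number of page tables mapping the page */
};

/* Per-page test: looks at the refcount only, never at the mappings. */
static bool toy_page_maybe_dma_pinned(const struct toy_page *page)
{
	return page->refcount >= TOY_PIN_BIAS;
}

/* Check placed in the per-mapping callback: every mapping is still visited. */
static int toy_visits_check_in_walk(const struct toy_page *page)
{
	int visits = 0;

	assert(page->mapcount >= 0);
	for (int i = 0; i < page->mapcount; i++) {
		visits++;		/* callback entered for mapping i */
		if (toy_page_maybe_dma_pinned(page))
			continue;	/* give up on this mapping, walk goes on */
		/* the real unmapping work would happen here */
	}
	return visits;
}

/* Check placed once per page, before the walk starts. */
static int toy_visits_check_before_walk(const struct toy_page *page)
{
	if (toy_page_maybe_dma_pinned(page))
		return 0;		/* no mapping is visited at all */
	return toy_visits_check_in_walk(page);
}
```

For a page with refcount TOY_PIN_BIAS + 3 and mapcount 3, the first
variant still enters the callback three times while the second enters it
zero times. Both reach the same verdict because the answer does not
depend on which mapping is being looked at.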

> +
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
>  				flags & TTU_SPLIT_FREEZE, page);
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs

