From: John Hubbard <jhubbard@nvidia.com>
To: David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>,
	Steven Sistare <steven.sistare@oracle.com>,
	<akpm@linux-foundation.org>, <ying.huang@intel.com>,
	<mgorman@techsingularity.net>, <baolin.wang@linux.alibaba.com>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	Khalid Aziz <khalid@kernel.org>
Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan
Date: Fri, 26 May 2023 19:11:05 -0700
Message-ID: <846b770c-9f63-90a2-0435-ec82484e3f74@nvidia.com>
In-Reply-To: <e31cd404-56ce-4cad-fcc3-3a6695f750fa@redhat.com>

On 5/26/23 11:50, David Hildenbrand wrote:
> On 26.05.23 20:46, Matthew Wilcox wrote:
>> On Fri, May 26, 2023 at 06:46:15PM +0200, David Hildenbrand wrote:
>>> On 26.05.23 18:44, Matthew Wilcox wrote:
>>>> On Fri, May 26, 2023 at 09:44:34AM -0600, Khalid Aziz wrote:
>>>>>> Oh, I think I found it!  pin_user_pages_remote() is called by
>>>>>> vaddr_get_pfns().  If these are the pages you're concerned about,
>>>>>> then the efficient way to do what you want is simply to call
>>>>>> folio_maybe_dma_pinned().  Far more efficient than the current mess
>>>>>> of total_mapcount().
>>>>>
>>>>> vfio pinned pages triggered this change. Wouldn't checking refcounts against
>>>>> mapcount provide a more generalized way of detecting non-migratable pages?
>>>>
>>>> Well, you changed the comment to say that we were concerned about
>>>> long-term pins.  If we are, then folio_maybe_dma_pinned() is how to test
>>>> for long-term pins.  If we want to skip pages which are short-term pinned,
>>>> then we need to not change the comment, and keep using mapcount/refcount
>>>> differences.
>>>>
>>>
>>> folio_maybe_dma_pinned() is all about FOLL_PIN, not FOLL_LONGTERM.
>>
>> But according to our documentation, FOLL_LONGTERM implies FOLL_PIN.
> 
> Yes. But folio_maybe_dma_pinned() will report both long-term pins and short-term pins. There really is no way to distinguish the two, unfortunately.

Not yet, anyway. :)

> 
>> Anyway, right now, the code skips any pages which are merely FOLL_GET,
>> so we'll skip fewer pages if we do only skip the FOLL_PIN ones,
>> regardless of whether we'd prefer to only skip the FOLL_LONGTERM ones.
>>
>>> folio_maybe_dma_pinned() would skip migrating any page that has more than
>>> 1024 references. (shared libraries?)
>>
>> True, but maybe we should be skipping any page with that many mappings,
>> given how disruptive it is to the rest of the system to unmap a page
>> from >1024 processes.
>>
> 
> So any user with 1024 processes can fragment physical memory? :/
> 
> Sorry, I'd like to minimize the usage of folio_maybe_dma_pinned().
> 

I was actually thinking that we should avoid adding any more cases of
fragile mapcount and refcount comparisons, which then leads to
Matthew's approach here!
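
For anyone following the two checks being compared above, here is a
minimal userspace sketch in C. It is not kernel code: struct model_folio
and the model_*() helpers are hypothetical names made up for
illustration. It assumes an order-0 anonymous page where each mapping
holds one reference, and the small-folio behaviour in which a FOLL_PIN
adds GUP_PIN_COUNTING_BIAS (1024) to the refcount while a FOLL_GET adds 1.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define MODEL_PIN_BIAS 1024

/* Hypothetical stand-in for a small (order-0) folio, not the kernel struct. */
struct model_folio {
	int refcount;	/* all references, including mappings */
	int mapcount;	/* page-table mappings only */
};

/* A FOLL_GET-style reference: refcount goes up by one. */
static void model_get(struct model_folio *f)
{
	f->refcount += 1;
}

/* A FOLL_PIN-style reference on a small folio: refcount goes up by the bias. */
static void model_pin(struct model_folio *f)
{
	f->refcount += MODEL_PIN_BIAS;
}

/* Mapping the page into one more process: both counters go up by one. */
static void model_map(struct model_folio *f)
{
	f->refcount += 1;
	f->mapcount += 1;
}

/* Heuristic in the style of folio_maybe_dma_pinned() for small folios. */
static bool model_maybe_dma_pinned(const struct model_folio *f)
{
	return f->refcount >= MODEL_PIN_BIAS;
}

/* The refcount/mapcount comparison: any reference beyond the mappings. */
static bool model_has_extra_refs(const struct model_folio *f)
{
	return f->refcount > f->mapcount;
}
```

In this model, one mapping plus one FOLL_GET trips only the
refcount/mapcount comparison, while 1024 plain mappings with no pin at
all trip only the pin heuristic, which is the false positive raised
earlier in the thread. Neither check can tell a long-term pin from a
short-term one.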


thanks,
-- 
John Hubbard
NVIDIA

