linux-mm.kvack.org archive mirror
From: Alistair Popple <apopple@nvidia.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	David Hildenbrand <david@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Mon, 21 Aug 2023 19:02:28 +1000
Message-ID: <87h6oswysm.fsf@nvdebian.thelocal>
In-Reply-To: <IA0PR11MB71857FDD99CAC23C88C9F27CF815A@IA0PR11MB7185.namprd11.prod.outlook.com>


"Kasireddy, Vivek" <vivek.kasireddy@intel.com> writes:

> Hi Jason,
>
>> > >
>> > > > No, adding HMM_PFN_REQ_WRITE still doesn't help in fixing the issue.
>> > > > Although, I do not have THP enabled (or built-in), shmem does not evict
>> > > > the pages after hole punch as noted in the comment in
>> shmem_fallocate():
>> > >
>> > > This is the source of all your problems.
>> > >
>> > > Things that are mm-centric are supposed to track the VMAs and changes
>> to
>> > > the PTEs. If you do something in userspace and it doesn't cause the
>> > > CPU page tables to change then it certainly shouldn't cause any mmu
>> > > notifiers or hmm_range_fault changes.
>> > I am not doing anything out of the blue in userspace. I think the
>> behavior
>> > I am seeing with shmem (where an invalidation event
>> (MMU_NOTIFY_CLEAR)
>> > does occur because of a hole punch but the PTEs don't really get updated)
>> > can arguably be considered an optimization.
>> 
>> Your explanations don't make sense.
>> 
>> If MMU_NOTIFER_CLEAR was sent but the PTEs were left present then:
>> 
>> > > There should still be an invalidation notifier at some point when the
>> > > CPU tables do eventually change, whenever that is. Missing that
>> > > notification would be a bug.
>> > I clearly do not see any notification getting triggered (from both
>> shmem_fault()
>> > and hugetlb_fault()) when the PTEs do get updated as the hole is refilled
>> > due to writes. Are you saying that there needs to be an invalidation event
>> > (MMU_NOTIFY_CLEAR?) dispatched at this point?
>> 
>> You don't get to get shmem_fault in the first place.
> What I am observing is that even after MMU_NOTIFY_CLEAR (hole punch) is sent,
> hmm_range_fault() finds that the PTEs associated with the hole are still pte_present().
> I think it remains this way as long as there are reads on the hole. Once there are
> writes, it triggers shmem_fault() which results in PTEs getting updated but without
> any notification.

Oh wait, this is shmem. The read from hmm_range_fault() (assuming you
specified HMM_PFN_REQ_FAULT) will trigger shmem_fault() due to the
missing PTE. Subsequent writes will just upgrade PTE permissions
assuming the read didn't map them RW to begin with. If you want to
actually see the hole with hmm_range_fault() don't specify
HMM_PFN_REQ_FAULT (or _WRITE).

>> 
>> If they were marked non-present during the CLEAR then the shadow side
>> remains non-present until it gets its own fault.
>> 
>> If they were made non-present without an invalidation then that is a
>> bug.
>> 
>> > > hmm_range_fault() is the correct API to use if you are working with
>> > > notifiers. Do not hack something together using pin_user_pages.
>> 
>> > I noticed that hmm_range_fault() does not seem to be working as expected
>> > given that it gets stuck (hangs) while walking hugetlb pages.
>> 
>> You are the first to report that, it sounds like a serious bug. Please
>> try to fix it.
>> 
>> > Regardless, as I mentioned above, the lack of notification when PTEs
>> > do get updated due to writes is the crux of the issue
>> > here. Therefore, AFAIU, triggering an invalidation event or some
>> > other kind of notification would help in fixing this issue.
>> 
>> You seem to be facing some kind of bug in the mm, it sounds pretty
>> serious, and it almost certainly is a missing invalidation.
>> 
>> Basically, anything that changes a PTE must eventually trigger an
>> invalidation. It is illegal to change a PTE from one present value to
>> another present value without invalidation notification.
>> 
>> It is not surprising something would be missed here.
> As you suggest, it looks like the root cause of this issue is the missing
> invalidation notification when the PTEs are changed from one present

I don't think there's a missing invalidation here. You say you're seeing
the MMU_NOTIFY_CLEAR when hole punching, which is when the PTE is
cleared. When else do you expect a notification?
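To make the expected contract concrete: the canonical way for a driver
to consume those invalidations is the mmu_interval_notifier sequence
count paired with an hmm_range_fault() retry loop. The following is a
kernel-side sketch, not a compilable unit; notifier registration, the
driver lock, and names like `interval_notifier`, `mm`, `addr` and
`NPAGES` are placeholders for whatever the driver already has. Any PTE
change, hole punch included, bumps the sequence and forces a retry:

```c
/* Sketch of the hmm_range_fault() retry loop that pairs with
 * mmu_interval_notifier invalidations (cf. Documentation/mm/hmm.rst). */
unsigned long pfns[NPAGES];
struct hmm_range range = {
	.notifier = &interval_notifier,	/* registered elsewhere */
	.start    = addr,
	.end      = addr + NPAGES * PAGE_SIZE,
	.hmm_pfns = pfns,
	/* No HMM_PFN_REQ_FAULT: holes are reported as non-present
	 * instead of being faulted in by the walk itself. */
	.default_flags = 0,
};
int ret;

while (1) {
	range.notifier_seq = mmu_interval_read_begin(range.notifier);

	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			continue;	/* an invalidation raced us, retry */
		return ret;
	}

	/* Take the driver lock, then program device PTEs from pfns[];
	 * if an invalidation slipped in meanwhile, go around again. */
	if (!mmu_interval_read_retry(range.notifier, range.notifier_seq))
		break;
}
```

The point of the pattern is that the driver never needs a "PTE changed"
event beyond the invalidation: it drops its mirror on MMU_NOTIFY_CLEAR
and re-reads the new PTEs on its next fault, inside this loop.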

> value to another. I'd like to fix this issue eventually but I first need to
> focus on addressing udmabuf page migration (out of movable zone)
> and also look into the locking concerns Daniel mentioned about pairing
> static and dynamic dmabuf exporters and importers.
>
> Thanks,
> Vivek



