linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Tue, 25 Jul 2023 13:14:22 +1000	[thread overview]
Message-ID: <875y68zkvu.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <IA0PR11MB71855B4A7D0F428C9E8771D9F802A@IA0PR11MB7185.namprd11.prod.outlook.com>


"Kasireddy, Vivek" <vivek.kasireddy@intel.com> writes:

> Hi Alistair,

Hi Vivek,

>> I wonder if we actually need the flag? IIUC it is already used for more
>> than just KSM. For example it can be called as part of fault handling by
>> set_pte_at_notify() in in wp_page_copy().
> Yes, I noticed that but what I really meant is I'd put all these prior instances
> of change_pte in one category using the flag. Without the flag, KVM, the only
> user that currently has a callback for change_pte would get notified which
> may not be appropriate. Note that the change_pte callback for KVM was
> added (based on Git log) for KSM updates and it is not clear to me if that
> is still the case.

It is certainly now called from contexts other than KSM though. I have
no idea whether that's a problem, nor if adding more callers would
actually be an issue though so understand the motivation for the flag.

>> > Would that work for your HMM use-case -- assuming we call change_pte
>> after
>> > taking PTL?
>> 
>> I suspect being called under the PTL could be an issue. For HMM we use a
>> combination of sequence numbers and a mutex to synchronise PTEs. To
>> avoid calling the notifier while holding PTL we might be able to record
>> the sequence number (subscriptions->invalidate_seq) while holding PTL,
>> release the PTL and provide that sequence number to the notifier
>> callback along with the PTE.
>> 
>> Some form of mmu_interval_read_retry() could then be used to detect if
>> the PTE has changed between dropping the PTL and calling the
>> update_pte()/change_pte() notifier.
>> 
>> Of course if your design can handle being called while holding the PTL
>> then the above is probably unnecessarily complex for your use-case.
> Yes, I believe we can handle it while holding the PTL. 
>
>> 
>> My primary issue with this patch is the notifier is called without the
>> PTL while providing a PTE value. Without some form of synchronisation it
>> isn't safe to use the result of eg. pte_page(pte) or pte_write(pte) in
>> the notifier callback. Based on your comments it seems udmabuf might
>> have some other synchronisation that makes it safe, but being external
>> to the notifier calls make it's hard to reason about.
> I intend to fix the PTL issue in v2 but I am still not sure what is the best
> thing to do as far as the notifier is concerned given the following options:
> - Keep this patch (and notifier name) but ensure that it is called under PTL

I think that's preferable to adding a flag so long as it's implemented
and documented that this is called whenever a PTE is updated. Otherwise
a third user will come along and have the same problem we've currently
got with KVMs usage.

> - Drop this patch and expand the use of change_pte but add the flag to
>   distinguish between prior usage and new usage
> - Keep this patch but don't include the PTE or the pfn of the new page as
>   part of the notifier. In other words, just have this:
> mmu_notifier_update_mapping(struct mm_struct *mm, unsigned long address)
> This way, in udmabuf driver, we could get the new page from the page cache
> as soon as we get notified:
> 	mapoff = linear_page_index(vma, address);
> 	new_page = find_get_page(vma->vm_file->f_mapping, mapoff);
> This last option would probably limit the new notifier to the udmabuf
> use-case but I do not intend to pursue it as you suggested that you are
> also interested in a new notifier associated with PTE updates.

Actually the last option isn't limiting assuming it's sent whenever the
PTE is updated. It just means users have to use hmm_range_fault() or
some equivalent that already enforces proper synchronisation if they
need the actual PTE value. That seems fine to me.

> Thanks,
> Vivek
>
>> 
>>  - Alistair
>> 
>> > Thanks,
>> > Vivek
>> >
>> >>
>> >> Thanks.
>> >>
>> >> > Thanks,
>> >> > Vivek
>> >> >
>> >> >>
>> >> >> > +		return ret;
>> >> >> >
>> >> >> >  	ret = 0;
>> >> >> >
>> >> >> > @@ -6223,6 +6227,9 @@ vm_fault_t hugetlb_fault(struct mm_struct
>> >> *mm,
>> >> >> struct vm_area_struct *vma,
>> >> >> >  	 */
>> >> >> >  	if (need_wait_lock)
>> >> >> >  		folio_wait_locked(folio);
>> >> >> > +	if (!ret)
>> >> >> > +		mmu_notifier_update_mapping(vma->vm_mm,
>> address,
>> >> >> > +					    pte_pfn(*ptep));
>> >> >> >  	return ret;
>> >> >> >  }
>> >> >> >
>> >> >> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
>> >> >> > index 50c0dde1354f..6421405334b9 100644
>> >> >> > --- a/mm/mmu_notifier.c
>> >> >> > +++ b/mm/mmu_notifier.c
>> >> >> > @@ -441,6 +441,23 @@ void __mmu_notifier_change_pte(struct
>> >> >> mm_struct *mm, unsigned long address,
>> >> >> >  	srcu_read_unlock(&srcu, id);
>> >> >> >  }
>> >> >> >
>> >> >> > +void __mmu_notifier_update_mapping(struct mm_struct *mm,
>> >> unsigned
>> >> >> long address,
>> >> >> > +				   unsigned long pfn)
>> >> >> > +{
>> >> >> > +	struct mmu_notifier *subscription;
>> >> >> > +	int id;
>> >> >> > +
>> >> >> > +	id = srcu_read_lock(&srcu);
>> >> >> > +	hlist_for_each_entry_rcu(subscription,
>> >> >> > +				 &mm->notifier_subscriptions->list,
>> hlist,
>> >> >> > +				 srcu_read_lock_held(&srcu)) {
>> >> >> > +		if (subscription->ops->update_mapping)
>> >> >> > +			subscription->ops-
>> >update_mapping(subscription,
>> >> >> mm,
>> >> >> > +							  address,
>> pfn);
>> >> >> > +	}
>> >> >> > +	srcu_read_unlock(&srcu, id);
>> >> >> > +}
>> >> >> > +
>> >> >> >  static int mn_itree_invalidate(struct mmu_notifier_subscriptions
>> >> >> *subscriptions,
>> >> >> >  			       const struct mmu_notifier_range *range)
>> >> >> >  {
>> >> >> > diff --git a/mm/shmem.c b/mm/shmem.c
>> >> >> > index 2f2e0e618072..e59eb5fafadb 100644
>> >> >> > --- a/mm/shmem.c
>> >> >> > +++ b/mm/shmem.c
>> >> >> > @@ -77,6 +77,7 @@ static struct vfsmount *shm_mnt;
>> >> >> >  #include <linux/fcntl.h>
>> >> >> >  #include <uapi/linux/memfd.h>
>> >> >> >  #include <linux/rmap.h>
>> >> >> > +#include <linux/mmu_notifier.h>
>> >> >> >  #include <linux/uuid.h>
>> >> >> >
>> >> >> >  #include <linux/uaccess.h>
>> >> >> > @@ -2164,8 +2165,12 @@ static vm_fault_t shmem_fault(struct
>> >> vm_fault
>> >> >> *vmf)
>> >> >> >  				  gfp, vma, vmf, &ret);
>> >> >> >  	if (err)
>> >> >> >  		return vmf_error(err);
>> >> >> > -	if (folio)
>> >> >> > +	if (folio) {
>> >> >> >  		vmf->page = folio_file_page(folio, vmf->pgoff);
>> >> >> > +		if (ret == VM_FAULT_LOCKED)
>> >> >> > +			mmu_notifier_update_mapping(vma-
>> >vm_mm, vmf-
>> >> >> >address,
>> >> >> > +						    page_to_pfn(vmf-
>> >page));
>> >> >> > +	}
>> >> >> >  	return ret;
>> >> >> >  }



  reply	other threads:[~2023-07-25  4:16 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18  8:28 [RFC v1 0/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-07-18  8:28 ` [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages) Vivek Kasireddy
2023-07-18 15:36   ` Jason Gunthorpe
2023-07-19  0:05     ` Kasireddy, Vivek
2023-07-19  0:24       ` Jason Gunthorpe
2023-07-19  6:19         ` Kasireddy, Vivek
2023-07-19  2:08   ` Alistair Popple
2023-07-20  7:43     ` Kasireddy, Vivek
2023-07-20  9:00       ` Alistair Popple
2023-07-24  7:54         ` Kasireddy, Vivek
2023-07-24 13:35           ` Jason Gunthorpe
2023-07-24 20:32             ` Kasireddy, Vivek
2023-07-25  4:30               ` Hugh Dickins
2023-07-25 22:24                 ` Kasireddy, Vivek
2023-07-27 21:43                   ` Peter Xu
2023-07-29  0:08                     ` Kasireddy, Vivek
2023-07-31 17:05                       ` Peter Xu
2023-08-01  7:11                         ` Kasireddy, Vivek
2023-08-01 21:57                           ` Peter Xu
2023-08-03  8:08                             ` Kasireddy, Vivek
2023-08-03 13:02                               ` Peter Xu
2023-07-25 12:36               ` Jason Gunthorpe
2023-07-25 22:44                 ` Kasireddy, Vivek
2023-07-25 22:53                   ` Jason Gunthorpe
2023-07-27  7:34                     ` Kasireddy, Vivek
2023-07-27 11:58                       ` Jason Gunthorpe
2023-07-29  0:46                         ` Kasireddy, Vivek
2023-07-30 23:09                           ` Jason Gunthorpe
2023-08-01  5:32                             ` Kasireddy, Vivek
2023-08-01 12:19                               ` Jason Gunthorpe
2023-08-01 12:22                                 ` David Hildenbrand
2023-08-01 12:23                                   ` Jason Gunthorpe
2023-08-01 12:26                                     ` David Hildenbrand
2023-08-01 12:26                                       ` Jason Gunthorpe
2023-08-01 12:28                                         ` David Hildenbrand
2023-08-01 17:53                                           ` Kasireddy, Vivek
2023-08-01 18:19                                             ` Jason Gunthorpe
2023-08-03  7:35                                               ` Kasireddy, Vivek
2023-08-03 12:14                                                 ` Jason Gunthorpe
2023-08-03 12:32                                                   ` David Hildenbrand
2023-08-04  0:14                                                     ` Alistair Popple
2023-08-04  6:39                                                       ` Kasireddy, Vivek
2023-08-04  7:23                                                         ` David Hildenbrand
2023-08-04 21:53                                                           ` Kasireddy, Vivek
2023-08-04 12:49                                                         ` Jason Gunthorpe
2023-08-08  7:37                                                           ` Kasireddy, Vivek
2023-08-08 12:42                                                             ` Jason Gunthorpe
2023-08-16  6:43                                                               ` Kasireddy, Vivek
2023-08-21  9:02                                                                 ` Alistair Popple
2023-08-22  6:14                                                                   ` Kasireddy, Vivek
2023-08-22  8:15                                                                     ` Alistair Popple
2023-08-24  6:48                                                                       ` Kasireddy, Vivek
2023-08-28  4:38                                                                         ` Kasireddy, Vivek
2023-08-30 16:02                                                                           ` Jason Gunthorpe
2023-07-25  3:38             ` Alistair Popple
2023-07-24 13:36           ` Alistair Popple
2023-07-24 13:37             ` Jason Gunthorpe
2023-07-24 20:42             ` Kasireddy, Vivek
2023-07-25  3:14               ` Alistair Popple [this message]
2023-07-18  8:28 ` [RFC v1 2/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-08-02 12:40   ` Daniel Vetter
2023-08-03  8:24     ` Kasireddy, Vivek
2023-08-03  8:32       ` Daniel Vetter
2023-07-18  8:28 ` [RFC v1 3/3] selftests/dma-buf/udmabuf: Add tests for huge pages and FALLOC_FL_PUNCH_HOLE Vivek Kasireddy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875y68zkvu.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=dongwon.kim@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hughd@google.com \
    --cc=jgg@nvidia.com \
    --cc=junxiao.chang@intel.com \
    --cc=kraxel@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=vivek.kasireddy@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).