linux-mm.kvack.org archive mirror
From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: RE: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Mon, 24 Jul 2023 07:54:38 +0000
Message-ID: <IA0PR11MB7185EA5ABD21EE7DA900B481F802A@IA0PR11MB7185.namprd11.prod.outlook.com>
In-Reply-To: <87pm4nj6s5.fsf@nvdebian.thelocal>

Hi Alistair,

> 
> 
> "Kasireddy, Vivek" <vivek.kasireddy@intel.com> writes:
> 
> > Hi Alistair,
> >
> >>
> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> > index 64a3239b6407..1f2f0209101a 100644
> >> > --- a/mm/hugetlb.c
> >> > +++ b/mm/hugetlb.c
> >> > @@ -6096,8 +6096,12 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >> >  		 * hugetlb_no_page will drop vma lock and hugetlb fault
> >> >  		 * mutex internally, which make us return immediately.
> >> >  		 */
> >> > -		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> >> > +		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> >> >  				      entry, flags);
> >> > +		if (!ret)
> >> > +			mmu_notifier_update_mapping(vma->vm_mm, address,
> >> > +						    pte_pfn(*ptep));
> >>
> >> The next patch ends up calling pfn_to_page() on the result of
> >> pte_pfn(*ptep). I don't think that's safe because couldn't the PTE have
> >> already changed and/or the new page have been freed?
> > Yeah, that might be possible; I believe the right thing to do would be:
> > -               return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> > +               ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> >                                       entry, flags);
> > +               if (!ret) {
> > +                       ptl = huge_pte_lock(h, mm, ptep);
> > +                       mmu_notifier_update_mapping(vma->vm_mm, address,
> > +                                                    pte_pfn(*ptep));
> > +                       spin_unlock(ptl);
> > +               }
> 
> Yes, although obviously as I think you point out below you wouldn't be
> able to take any sleeping locks in mmu_notifier_update_mapping().
Yes, I understand that, but I am not sure how we can prevent any potential
notifier callback from taking sleeping locks other than adding clear comments.
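
One pattern a callback could follow to stay non-sleeping is to only record the
update while the lock is held and do the heavy work later from a context that
may sleep. Below is a minimal userspace sketch of that idea, not kernel code;
every model_* name, the fixed-size queue, and the drop-on-overflow policy are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PENDING_MAX 8

/* One recorded mapping update (hypothetical layout). */
struct model_update {
	unsigned long address;
	unsigned long pfn;
};

/* Hypothetical subscriber state: a bounded queue plus counters. */
struct model_subscriber {
	struct model_update pending[MODEL_PENDING_MAX];
	size_t npending;
	size_t dropped;
	size_t processed;
};

/*
 * Models the callback invoked with the page table lock held: it does only
 * bounded, non-blocking work (no allocation, no sleeping locks).
 * Returns false if the queue is full and the update had to be dropped.
 */
static bool model_update_atomic(struct model_subscriber *s,
				unsigned long address, unsigned long pfn)
{
	assert(s != NULL);
	if (s->npending == MODEL_PENDING_MAX) {
		s->dropped++;
		return false;
	}
	s->pending[s->npending].address = address;
	s->pending[s->npending].pfn = pfn;
	s->npending++;
	return true;
}

/*
 * Models the deferred half, run later where sleeping would be allowed.
 * Returns the number of updates consumed.
 */
static size_t model_drain(struct model_subscriber *s)
{
	size_t n;

	assert(s != NULL);
	n = s->npending;
	s->processed += n;
	s->npending = 0;
	return n;
}
```

This only shows the split between the two halves; a real driver would also
need to decide what to do when updates are dropped.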

> 
> > In which case I'd need to make a similar change in the shmem path as well.
> > And, also redo (or eliminate) the locking in udmabuf (patch) which seems a
> > bit excessive on a second look given our use-case (where reads and writes do
> > not happen simultaneously due to fence synchronization in the guest driver).
> 
> I'm not at all familiar with the udmabuf use case but that sounds
> brittle and effectively makes this notifier udmabuf specific right?
Oh, QEMU uses the udmabuf driver to provide Host Graphics components
(such as Spice, GStreamer, UI, etc.) zero-copy access to Guest-created
buffers. In other words, from a core mm standpoint, udmabuf just
collects a bunch of pages (associated with buffers) scattered inside
the memfd (Guest RAM backed by shmem or hugetlbfs) and wraps
them in a dmabuf fd. And, since we provide zero-copy access, we
use DMA fences to ensure that the components on the Host and
Guest do not access the buffer simultaneously.

> 
> The name gives the impression it is more general though. I have
I'd like to make it suitable for general usage.

> contemplated adding a notifier for PTE updates for drivers using
> hmm_range_fault() as it would save some expensive device faults and
> this could be useful for that.
> 
> So if we're adding a notifier for PTE updates I think it would be good
> if it covered all cases and was robust enough to allow mirroring of the
> correct PTE value (ie. by being called under PTL or via some other
> synchronisation like hmm_range_fault()).
Ok; in order to make it clear that the notifier is associated with PTE updates,
I think it needs to have a more suitable name such as mmu_notifier_update_pte()
or mmu_notifier_new_pte(). But we already have mmu_notifier_change_pte,
which IIUC is used mainly for PTE updates triggered by KSM. So, I am leaning
towards dropping this new notifier and instead adding a new flag to change_pte
to distinguish between KSM-triggered notifications and others. Something along
the lines of:
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 218ddc3b4bc7..6afce2287143 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -129,7 +129,8 @@ struct mmu_notifier_ops {
        void (*change_pte)(struct mmu_notifier *subscription,
                           struct mm_struct *mm,
                           unsigned long address,
-                          pte_t pte);
+                          pte_t pte,
+                          bool ksm_update);
@@ -658,7 +659,7 @@ static inline void mmu_notifier_range_init_owner(
        unsigned long ___address = __address;                           \
        pte_t ___pte = __pte;                                           \
                                                                        \
-       mmu_notifier_change_pte(___mm, ___address, ___pte);             \
+       mmu_notifier_change_pte(___mm, ___address, ___pte, true);       \

And replace mmu_notifier_update_mapping(vma->vm_mm, address, pte_pfn(*ptep))
in the current patch with
mmu_notifier_change_pte(vma->vm_mm, address, *ptep, false);

Would that work for your HMM use-case -- assuming we call change_pte after
taking PTL?
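
To make the idea concrete, here is a minimal userspace sketch (not kernel code)
of how a subscriber might tell the two kinds of notifications apart once
change_pte carries the extra flag. All model_* names and types are hypothetical
stand-ins made up for illustration; only the ksm_update parameter comes from
the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for pte_t; not the real kernel definition. */
typedef unsigned long model_pte_t;

struct model_notifier;

struct model_notifier_ops {
	/* Mirrors the proposed change_pte signature with the extra flag. */
	void (*change_pte)(struct model_notifier *sub, unsigned long address,
			   model_pte_t pte, bool ksm_update);
};

struct model_notifier {
	const struct model_notifier_ops *ops;
	struct model_notifier *next;	/* singly linked subscription list */
	unsigned long last_address;
	model_pte_t last_pte;
	int ksm_events;
	int fault_events;
};

/* Walk all subscriptions and call change_pte where it is implemented. */
static void model_change_pte(struct model_notifier *head,
			     unsigned long address, model_pte_t pte,
			     bool ksm_update)
{
	struct model_notifier *sub;

	for (sub = head; sub; sub = sub->next)
		if (sub->ops && sub->ops->change_pte)
			sub->ops->change_pte(sub, address, pte, ksm_update);
}

/* A udmabuf-like subscriber that only acts on non-KSM updates. */
static void model_udmabuf_change_pte(struct model_notifier *sub,
				     unsigned long address, model_pte_t pte,
				     bool ksm_update)
{
	assert(sub != NULL);
	if (ksm_update) {
		/* KSM-triggered: counted here, otherwise ignored. */
		sub->ksm_events++;
		return;
	}
	sub->fault_events++;
	sub->last_address = address;
	sub->last_pte = pte;
}

static const struct model_notifier_ops model_udmabuf_ops = {
	.change_pte = model_udmabuf_change_pte,
};
```

Existing change_pte users would keep passing true from the KSM path, while the
fault paths in this series would pass false.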

Thanks,
Vivek

> 
> Thanks.
> 
> > Thanks,
> > Vivek
> >
> >>
> >> > +		return ret;
> >> >
> >> >  	ret = 0;
> >> >
> >> > @@ -6223,6 +6227,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >> >  	 */
> >> >  	if (need_wait_lock)
> >> >  		folio_wait_locked(folio);
> >> > +	if (!ret)
> >> > +		mmu_notifier_update_mapping(vma->vm_mm, address,
> >> > +					    pte_pfn(*ptep));
> >> >  	return ret;
> >> >  }
> >> >
> >> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> >> > index 50c0dde1354f..6421405334b9 100644
> >> > --- a/mm/mmu_notifier.c
> >> > +++ b/mm/mmu_notifier.c
> >> > @@ -441,6 +441,23 @@ void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
> >> >  	srcu_read_unlock(&srcu, id);
> >> >  }
> >> >
> >> > +void __mmu_notifier_update_mapping(struct mm_struct *mm, unsigned long address,
> >> > +				   unsigned long pfn)
> >> > +{
> >> > +	struct mmu_notifier *subscription;
> >> > +	int id;
> >> > +
> >> > +	id = srcu_read_lock(&srcu);
> >> > +	hlist_for_each_entry_rcu(subscription,
> >> > +				 &mm->notifier_subscriptions->list, hlist,
> >> > +				 srcu_read_lock_held(&srcu)) {
> >> > +		if (subscription->ops->update_mapping)
> >> > +			subscription->ops->update_mapping(subscription, mm,
> >> > +							  address, pfn);
> >> > +	}
> >> > +	srcu_read_unlock(&srcu, id);
> >> > +}
> >> > +
> >> >  static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions,
> >> >  			       const struct mmu_notifier_range *range)
> >> >  {
> >> > diff --git a/mm/shmem.c b/mm/shmem.c
> >> > index 2f2e0e618072..e59eb5fafadb 100644
> >> > --- a/mm/shmem.c
> >> > +++ b/mm/shmem.c
> >> > @@ -77,6 +77,7 @@ static struct vfsmount *shm_mnt;
> >> >  #include <linux/fcntl.h>
> >> >  #include <uapi/linux/memfd.h>
> >> >  #include <linux/rmap.h>
> >> > +#include <linux/mmu_notifier.h>
> >> >  #include <linux/uuid.h>
> >> >
> >> >  #include <linux/uaccess.h>
> >> > @@ -2164,8 +2165,12 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
> >> >  				  gfp, vma, vmf, &ret);
> >> >  	if (err)
> >> >  		return vmf_error(err);
> >> > -	if (folio)
> >> > +	if (folio) {
> >> >  		vmf->page = folio_file_page(folio, vmf->pgoff);
> >> > +		if (ret == VM_FAULT_LOCKED)
> >> > +			mmu_notifier_update_mapping(vma->vm_mm, vmf->address,
> >> > +						    page_to_pfn(vmf->page));
> >> > +	}
> >> >  	return ret;
> >> >  }


