linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bharat Bhushan <bbhushan2@marvell.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Sunil Kovvuri Goutham <sgoutham@marvell.com>,
	"terminus@gmail.com" <terminus@gmail.com>
Subject: RE: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous calls
Date: Mon, 8 Mar 2021 07:03:03 +0000	[thread overview]
Message-ID: <DM6PR18MB269865BC107C37F100743427E3939@DM6PR18MB2698.namprd18.prod.outlook.com> (raw)
In-Reply-To: <9ece62e2-df7b-09b0-d96a-8105048c0ed0@oracle.com>



> -----Original Message-----
> From: Ankur Arora <ankur.a.arora@oracle.com>
> Sent: Monday, March 8, 2021 12:29 PM
> To: Bharat Bhushan <bbhushan2@marvell.com>
> Cc: alex.williamson@redhat.com; linux-kernel@vger.kernel.org; Sunil Kovvuri
> Goutham <sgoutham@marvell.com>; terminus@gmail.com
> Subject: Re: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
> calls
> 
> On 2021-03-02 4:47 a.m., Bharat Bhushan wrote:
> > Hi Ankur,
> >
> >> -----Original Message-----
> >> From: Ankur Arora <ankur.a.arora@oracle.com>
> >> Sent: Friday, February 26, 2021 6:24 AM
> >> To: Bharat Bhushan <bbhushan2@marvell.com>
> >> Cc: alex.williamson@redhat.com; ankur.a.arora@oracle.com; linux-
> >> kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgoutham@marvell.com>;
> >> terminus@gmail.com
> >> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from
> >> simultaneous calls
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> -
> >> Hi Bharat,
> >>
> >> Can you test the patch below to see if it works for you?
> >
> > Sorry for late reply, I actually missed this email.
> >
> > Reproducibility of the issue was low in my test scenario, one out of ~15 runs. I
> run it multiple times, overnight and observed no issues.
> 
> Awesome. Would you mind giving me your Tested-by for the patch?

Sure, please point if this is already sent for review.

> 
> >> Also could you add some more detail to your earlier description of
> >> the bug?
> >
> > Our test case is running ODP multi-threaded application, where parent process
> maps (yes it uses MAP_DMA) the region and then child processes access same.
> As a workaround we tried accessing the region once by parent process before
> creating other accessor threads and it worked as expected.
> 
> Thanks for the detail. So if the child processes start early -- they might fault while
> the VFIO_IOMMU_MAP_DMA was going on. And, since they only acquire
> mmap_lock in RO mode, both paths would end up calling io_remap_pfn_range()
> via the fault handler.

Yes, that's correct.

Thanks
-Bharat

> 
> Thanks
> Ankur
> 
> >
> > Thanks
> > -Bharat
> >
> >> In particular, AFAICS you are using ODP (-DPDK?) with multiple
> >> threads touching this region. From your stack, it looks like the
> >> fault was user-space generated, and I'm guessing you were not using
> >> the VFIO_IOMMU_MAP_DMA.
> >>
> >> Ankur
> >>
> >> -- >8 --
> >>
> >> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from
> >> simultaneous calls
> >>
> >> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
> >> faults, this would result in multiple calls to io_remap_pfn_range(),
> >> where it would hit a BUG_ON(!pte_none(*pte)) in remap_pte_range().
> >> (It would also link the same VMA multiple times in vdev->vma_list but
> >> given the BUG_ON that is less serious.)
> >>
> >> Normally, however, this won't happen -- at least with
> >> vfio_iommu_type1 -- the VFIO_IOMMU_MAP_DMA path is protected by
> iommu->lock.
> >>
> >> If, however, we are using some kind of parallelization mechanism like
> >> this one with ktask under discussion [1], we would hit this.
> >> Even if we were doing this serially, given that vfio-pci remaps a
> >> larger extent than strictly necessary it should internally enforce
> >> coherence of its data structures.
> >>
> >> Handle this by using the VMA's presence in the vdev->vma_list as
> >> indicative of a fully mapped VMA and returning success early to all
> >> but the first VMA fault. Note that this is clearly optimstic given
> >> that the mapping is ongoing, and might mean that the caller sees more
> >> faults until the remap is done.
> >>
> >> [1]
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_
> >> linux-
> >> 2Dmm_20181105145141.6f9937f6-
> >>
> 40w520.home_&d=DwIDAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=PAAlWswPe7d8gHl
> >>
> GbCLmy2YezyK7O3Hv_t2heGnouBw&m=3ZDXqnn9xNUCjgXwN9mHIKT7oyXu55P
> >> U7yV2j0b-5hw&s=hiICkNtrcH4AbAWRrbkvMUylp7Bv0YHFCjxNGC6CGOk&e=
> >>
> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >> ---
> >>   drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
> >>   1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c
> >> b/drivers/vfio/pci/vfio_pci.c index 65e7e6b44578..b9f509863db1 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct
> >> vfio_pci_device *vdev,
> >>   {
> >>   	struct vfio_pci_mmap_vma *mmap_vma;
> >>
> >> +	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
> >> +		if (mmap_vma->vma == vma)
> >> +			return 1;
> >> +	}
> >> +
> >>   	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
> >>   	if (!mmap_vma)
> >>   		return -ENOMEM;
> >> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault
> >> *vmf)
> >>   	struct vm_area_struct *vma = vmf->vma;
> >>   	struct vfio_pci_device *vdev = vma->vm_private_data;
> >>   	vm_fault_t ret = VM_FAULT_NOPAGE;
> >> +	int vma_present;
> >>
> >>   	mutex_lock(&vdev->vma_lock);
> >>   	down_read(&vdev->memory_lock);
> >> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault *vmf)
> >>   		goto up_out;
> >>   	}
> >>
> >> -	if (__vfio_pci_add_vma(vdev, vma)) {
> >> +	/*
> >> +	 * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
> >> +	 * (vma_present == 0), or indicates that the vma is already present
> >> +	 * on the list (vma_present == 1).
> >> +	 *
> >> +	 * Overload the meaning of this flag to also imply that the vma is
> >> +	 * fully mapped. This allows us to serialize the mapping -- ensuring
> >> +	 * that simultaneous faults will not both try to call
> >> +	 * io_remap_pfn_range().
> >> +	 *
> >> +	 * However, this might mean that callers to which we returned success
> >> +	 * optimistically will see more faults until the remap is complete.
> >> +	 */
> >> +	vma_present = __vfio_pci_add_vma(vdev, vma);
> >> +	if (vma_present < 0) {
> >>   		ret = VM_FAULT_OOM;
> >>   		mutex_unlock(&vdev->vma_lock);
> >>   		goto up_out;
> >> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault
> >> *vmf)
> >>
> >>   	mutex_unlock(&vdev->vma_lock);
> >>
> >> +	if (vma_present)
> >> +		goto up_out;
> >> +
> >>   	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >>   			       vma->vm_end - vma->vm_start, vma-
> >>> vm_page_prot))
> >>   		ret = VM_FAULT_SIGBUS;
> >> --
> >> 2.29.2
> >

      reply	other threads:[~2021-03-08  7:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06 16:17 vfio-pci: protect remap_pfn_range() from simultaneous calls Bharat Bhushan
2021-01-06 18:13 ` Ankur Arora
2021-01-07  4:57   ` [EXT] " Bharat Bhushan
2021-01-19  8:51   ` Bharat Bhushan
2021-01-21  4:39     ` Ankur Arora
2021-02-26  0:53       ` Ankur Arora
2021-03-02 12:47         ` [EXT] " Bharat Bhushan
2021-03-08  6:59           ` Ankur Arora
2021-03-08  7:03             ` Bharat Bhushan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR18MB269865BC107C37F100743427E3939@DM6PR18MB2698.namprd18.prod.outlook.com \
    --to=bbhushan2@marvell.com \
    --cc=alex.williamson@redhat.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sgoutham@marvell.com \
    --cc=terminus@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).