linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	David Woodhouse <dwmw2@infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Joerg Roedel <joro@8bytes.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Eric Auger <eric.auger@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	"Martins, Joao" <joao.m.martins@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v4 09/17] iommufd: Data structure to provide IOVA to PFN mapping
Date: Mon, 14 Nov 2022 14:43:59 -0400	[thread overview]
Message-ID: <Y3KMbyVwS6D505cA@nvidia.com> (raw)
In-Reply-To: <BN9PR11MB527638FCF4A1351DBA1A644E8C059@BN9PR11MB5276.namprd11.prod.outlook.com>

On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, November 8, 2022 8:49 AM
> > 
> > +
> > +/*
> > + * Automatically find a block of IOVA that is not being used and not reserved.
> > + * Does not return a 0 IOVA even if it is valid.
> 
> what is the problem with 0? should this be documented in uAPI?

0 is commonly used as an errant value for uninitialized things. We
don't automatically map it into a process mm because it can cause
security problems if we don't trap a bogus 0/NULL pointer reference.

The same logic applies here too, the allocator should not return 0 to
reserve it as an unmapped IOVA page to catch bugs.

I don't think it needs to be documented

> > +	interval_tree_for_each_span(&allowed_span, &iopt->allowed_itree,
> > +				    PAGE_SIZE, ULONG_MAX - PAGE_SIZE) {
> > +		if (RB_EMPTY_ROOT(&iopt->allowed_itree.rb_root)) {
> > +			allowed_span.start_used = PAGE_SIZE;
> > +			allowed_span.last_used = ULONG_MAX - PAGE_SIZE;
> > +			allowed_span.is_hole = false;
> > +		}
> 
> statically initialize it when iopt is created?

allowed_span is a stack variable?

> > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > +					     iova_alignment, page_offset))
> > +			continue;
> > +
> > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > +					    allowed_span.start_used,
> > +					    allowed_span.last_used) {
> > +			if (!__alloc_iova_check_hole(&area_span, length,
> > +						     iova_alignment,
> > +						     page_offset))
> > +				continue;
> > +
> > +			interval_tree_for_each_span(&reserved_span,
> > +						    &iopt->reserved_itree,
> > +						    area_span.start_used,
> > +						    area_span.last_used) {
> > +				if (!__alloc_iova_check_hole(
> > +					    &reserved_span, length,
> > +					    iova_alignment, page_offset))
> > +					continue;
> 
> this could be simplified by double span.

It is subtly not compatible, the double span looks for used areas.
This is looking for a used area in the allowed_itree, a hole in the
area_itree, and a hole in the reserved_itree.

I don't think IOVA allocation should be a fast path so it is not worth
alot of effort to micro-optimize this.

> > +static int iopt_check_iova(struct io_pagetable *iopt, unsigned long iova,
> > +			   unsigned long length)
> > +{
> > +	unsigned long last;
> > +
> > +	lockdep_assert_held(&iopt->iova_rwsem);
> > +
> > +	if ((iova & (iopt->iova_alignment - 1)))
> > +		return -EINVAL;
> > +
> > +	if (check_add_overflow(iova, length - 1, &last))
> > +		return -EOVERFLOW;
> > +
> > +	/* No reserved IOVA intersects the range */
> > +	if (iopt_reserved_iter_first(iopt, iova, last))
> > +		return -ENOENT;
> 
> vfio type1 returns -EINVAL
> 
> > +
> > +	/* Check that there is not already a mapping in the range */
> > +	if (iopt_area_iter_first(iopt, iova, last))
> > +		return -EADDRINUSE;
> 
> vfio type1 returns -EEXIST

Hum I guess we can change them here, it is a bit annoying for the test suite
though.

> > +static int iopt_unmap_iova_range(struct io_pagetable *iopt, unsigned long
> > start,
> > +				 unsigned long end, unsigned long
> 
> s/end/last/
> 
> > +int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> > +		    unsigned long length, unsigned long *unmapped)
> > +{
> > +	unsigned long iova_end;
> 
> s/iova_end/iova_last/

yep
 
> > +static int iopt_calculate_iova_alignment(struct io_pagetable *iopt)
> > +{
> > +	unsigned long new_iova_alignment;
> > +	struct iommufd_access *access;
> > +	struct iommu_domain *domain;
> > +	unsigned long index;
> > +
> > +	lockdep_assert_held_write(&iopt->iova_rwsem);
> > +	lockdep_assert_held(&iopt->domains_rwsem);
> > +
> > +	if (iopt->disable_large_pages)
> > +		new_iova_alignment = PAGE_SIZE;
> > +	else
> > +		new_iova_alignment = 1;
> 
> I didn't understand why we start searching alignment from a
> smaller value when large pages is enabled. what is the
> connection here?

'disable_large_pages' is a tiny bit misnamed, what it really does is
ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
pages) and not less (what this is protecting against).

So if a domain has less than PAGE_SIZE we upgrade to
PAGE_SIZE. Otherwise we allow using the lowest possible alignment.

This allows userspace to always work in PAGE_SIZE units without fear
of problems, eg with sub-page-size units becoming weird or something.

> > +	interval_tree_remove(&area->node, &iopt->area_itree);
> > +	rc = iopt_insert_area(iopt, lhs, area->pages, start_iova,
> > +			      iopt_area_start_byte(area, start_iova),
> > +			      (new_start - 1) - start_iova + 1,
> > +			      area->iommu_prot);
> > +	if (WARN_ON(rc))
> > +		goto err_insert;
> > +
> > +	rc = iopt_insert_area(iopt, rhs, area->pages, new_start,
> > +			      iopt_area_start_byte(area, new_start),
> > +			      last_iova - new_start + 1, area->iommu_prot);
> > +	if (WARN_ON(rc))
> > +		goto err_remove_lhs;
> > +
> > +	lhs->storage_domain = area->storage_domain;
> > +	lhs->num_accesses = area->num_accesses;
> > +	lhs->pages = area->pages;
> > +	rhs->storage_domain = area->storage_domain;
> > +	rhs->num_accesses = area->num_accesses;
> 
> if an access only spans one side, is it correct to have both split sides
> keep the access number?

Er, this is acatually completely broken, woops. A removal of an access
will trigger a WARN_ON since the access_itree element is very likely
no longer correct.

Ah.. So the only use case here is unmapping and you can't unmap
something that has an access established, except in some pathalogical
case where the access does not intersect with what is being mapped.

There is no way to tell which iopt_pages_access are connected to which
areas, so without spending some memory this can't be fixed up. I think
it is not a real issue as mdev plus this ancient VFIO interface is
probably not something that exists in the real world..

+       /*
+        * Splitting is not permitted if an access exists, we don't track enough
+        * information to split existing accesses.
+        */
+       if (area->num_accesses) {
+               rc = -EINVAL;
+               goto err_unlock;
+       }
+
@@ -1041,10 +1050,8 @@ static int iopt_area_split(struct iopt_area *area, unsigned long iova)
                goto err_remove_lhs;
 
        lhs->storage_domain = area->storage_domain;
-       lhs->num_accesses = area->num_accesses;
        lhs->pages = area->pages;
        rhs->storage_domain = area->storage_domain;
-       rhs->num_accesses = area->num_accesses;
        rhs->pages = area->pages;
        kref_get(&rhs->pages->kref);
        kfree(area);

Thanks,
Jason

  reply	other threads:[~2022-11-14 18:44 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08  0:48 [PATCH v4 00/17] IOMMUFD Generic interface Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 01/17] iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 02/17] iommu: Add device-centric DMA ownership interfaces Jason Gunthorpe
2022-11-11  5:37   ` Tian, Kevin
2022-11-14 16:44     ` Jason Gunthorpe
2022-11-14 13:33   ` Eric Auger
2022-11-14 16:58     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 03/17] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-11-15 14:14   ` Eric Auger
2022-11-15 16:44     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 04/17] iommufd: Document overview of iommufd Jason Gunthorpe
2022-11-08  3:45   ` Bagas Sanjaya
2022-11-08 17:10   ` [PATCH v4 4/17] " Jason Gunthorpe
2022-11-11  5:59     ` Tian, Kevin
2022-11-14 15:14       ` Jason Gunthorpe
2022-11-10  9:30   ` [PATCH v4 04/17] " Bagas Sanjaya
2022-11-10 14:49     ` Jonathan Corbet
2022-11-10 14:54       ` Jason Gunthorpe
2022-11-10 15:10         ` Jonathan Corbet
2022-11-10 15:23           ` Jason Gunthorpe
2022-11-10 15:28             ` Jonathan Corbet
2022-11-10 15:29               ` Jason Gunthorpe
2022-11-10 15:52                 ` Jonathan Corbet
2022-11-10 16:54                   ` Jason Gunthorpe
2022-11-11  1:46       ` Bagas Sanjaya
2022-11-14 20:50   ` Eric Auger
2022-11-15  0:52     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 05/17] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-11-11  6:07   ` Tian, Kevin
2022-11-08  0:48 ` [PATCH v4 06/17] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 07/17] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-11-11  9:56   ` Tian, Kevin
2022-11-14 17:20     ` Jason Gunthorpe
2022-11-11 11:09   ` Tian, Kevin
2022-11-14 17:24     ` Jason Gunthorpe
2022-11-15  2:59       ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 08/17] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-11-14  5:50   ` Tian, Kevin
2022-11-14 18:02     ` Jason Gunthorpe
2022-11-15  3:06       ` Tian, Kevin
2022-11-15 14:49         ` Jason Gunthorpe
2022-11-14 19:19   ` [PATCH v4 8/17] " Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 09/17] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-11-14  7:28   ` Tian, Kevin
2022-11-14 18:43     ` Jason Gunthorpe [this message]
2022-11-15  3:13       ` Tian, Kevin
2022-11-15 15:05         ` Jason Gunthorpe
2022-11-16  0:09           ` Tian, Kevin
2022-11-16  0:32             ` Jason Gunthorpe
2022-11-16  2:30               ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 10/17] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-11-08 13:27   ` Bagas Sanjaya
2022-11-08 17:01     ` Jason Gunthorpe
2022-11-14  7:46   ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 11/17] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 12/17] iommufd: Add kAPI toward external drivers for physical devices Jason Gunthorpe
2022-11-08 14:34   ` Yi Liu
2022-11-08 17:57     ` Jason Gunthorpe
2022-11-14  7:59   ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 13/17] iommufd: Add kAPI toward external drivers for kernel access Jason Gunthorpe
2022-11-14  8:25   ` Tian, Kevin
2022-11-14 19:05     ` Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 14/17] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 16/17] iommufd: Add some fault injection points Jason Gunthorpe
2022-11-08  7:25   ` Nicolin Chen
2022-11-08 12:37     ` Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 17/17] iommufd: Add additional invariant assertions Jason Gunthorpe
     [not found] ` <15-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com>
2022-11-08  1:01   ` [PATCH v4 15/17] iommufd: Add a selftest Jason Gunthorpe
2022-11-08  5:48   ` Nicolin Chen
2022-11-08 13:27     ` Jason Gunthorpe
2022-11-09 23:51   ` Matthew Rosato
2022-11-11 15:51 ` [PATCH v4 00/17] IOMMUFD Generic interface Shameerali Kolothum Thodi
2022-11-12 12:44   ` Yi Liu
2023-01-10 11:35     ` Shameerali Kolothum Thodi
2023-01-10 13:49       ` Jason Gunthorpe
2023-01-10 15:16         ` Joao Martins
2023-01-10 15:18           ` Jason Gunthorpe
2023-01-10 15:30           ` Shameerali Kolothum Thodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3KMbyVwS6D505cA@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chaitanyak@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=corbet@lwn.net \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mjrosato@linux.ibm.com \
    --cc=mst@redhat.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shuah@kernel.org \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).