IOMMU Archive on lore.kernel.org
 help / color / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: kevin.tian@intel.com, Yi Sun <yi.y.sun@linux.intel.com>,
	ashok.raj@intel.com, kvm@vger.kernel.org,
	sanjay.k.kumar@intel.com, iommu@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>,
	yi.y.sun@intel.com
Subject: Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
Date: Fri, 27 Sep 2019 13:34:49 +0800
Message-ID: <20190927053449.GA9412@xz-x1> (raw)
In-Reply-To: <52778812-129b-0fa7-985d-5814e9d84047@linux.intel.com>

Hi, Baolu,

On Fri, Sep 27, 2019 at 10:27:24AM +0800, Lu Baolu wrote:
> > > > > +	spin_lock(&(domain)->page_table_lock);				\
> > > > 
> > > > Is this intended to lock here instead of taking the lock during the
> > > > whole page table walk?  Is it safe?
> > > > 
> > > > Taking the example where nm==PTE: when we reach here how do we
> > > > guarantee that the PMD page that has this PTE is still valid?
> > > 
> > > We will always keep the non-leaf pages in the table,
> > 
> > I see.  Though, could I ask why?  It seems to me that the existing 2nd
> > level page table does not keep these when unmap, and it's not even use
> > locking at all by leveraging cmpxchg()?
> 
> I still need some time to understand how cmpxchg() solves the race issue
> when reclaims pages. For example.
> 
> Thread A				Thread B
> -A1: check all PTE's empty		-B1: up-level PDE valid
> -A2: clear the up-level PDE
> -A3: reclaim the page			-B2: populate the PTEs
> 
> Both (A1,A2) and (B1,B2) should be atomic. Otherwise, race could happen.

I'm not sure of this, but IMHO it is similarly because we need to
allocate the iova ranges from iova allocator first, so thread A (who's
going to unmap pages) and thread B (who's going to map new pages)
should never have collapsed regions if happening concurrently.  I'm
referring to intel_unmap() in which we won't free the iova region
before domain_unmap() completes (which should cover the whole process
of A1-A3) so the same iova range to be unmapped won't be allocated to
any new pages in some other thread.

There's also a hint in domain_unmap():

  /* we don't need lock here; nobody else touches the iova range */

> 
> Actually, the iova allocator always packs IOVA ranges close to the top
> of the address space. This results in requiring a minimal number of
> pages to map the allocated IOVA ranges, which makes memory onsumption
> by IOMMU page tables tolerable. Hence, we don't need to reclaim the
> pages until the whole page table is about to tear down. The real data
> on my test machine also improves this.

Do you mean you have run the code with a 1st-level-supported IOMMU
hardware?  IMHO any data point would be good to be in the cover letter
as reference.

[...]

> > > > > +static struct page *
> > > > > +mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
> > > > > +		  unsigned long addr, unsigned long end,
> > > > > +		  struct page *freelist, bool reclaim)
> > > > > +{
> > > > > +	int i;
> > > > > +	unsigned long start;
> > > > > +	pte_t *pte, *first_pte;
> > > > > +
> > > > > +	start = addr;
> > > > > +	pte = pte_offset_kernel(pmd, addr);
> > > > > +	first_pte = pte;
> > > > > +	do {
> > > > > +		set_pte(pte, __pte(0));
> > > > > +	} while (pte++, addr += PAGE_SIZE, addr != end);
> > > > > +
> > > > > +	domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
> > > > > +
> > > > > +	/* Add page to free list if all entries are empty. */
> > > > > +	if (reclaim) {
> > > > 
> > > > Shouldn't we know whether to reclaim if with (addr, end) specified as
> > > > long as they cover the whole range of this PMD?
> > > 
> > > Current policy is that we don't reclaim any pages until the whole page
> > > table will be torn down.
> > 
> > Ah OK.  But I saw that you're passing in relaim==!start_addr.
> > Shouldn't that errornously trigger if one wants to unmap the 1st page
> > as well even if not the whole address space?
> 
> IOVA 0 is assumed to be reserved by the allocator. Otherwise, we have no
> means to check whether a IOVA is valid.

Is this an assumption of the allocator?  Could that change in the future?

IMHO that's not necessary if so, after all it's as simple as replacing
(!start_addr) with (start == 0 && end == END).  I see that in
domain_unmap() it has a similar check when freeing pgd:

  if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw))

Thanks,

-- 
Peter Xu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply index

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-23 12:24 [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 1/4] iommu/vt-d: Move domain_flush_cache helper into header Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces Lu Baolu
2019-09-23 20:31   ` Raj, Ashok
2019-09-24  1:38     ` Lu Baolu
2019-09-25  4:30       ` Peter Xu
2019-09-25  4:38         ` Tian, Kevin
2019-09-25  5:24           ` Peter Xu
2019-09-25  6:52             ` Lu Baolu
2019-09-25  7:32               ` Tian, Kevin
2019-09-25  8:35                 ` Peter Xu
2019-09-26  1:42                 ` Lu Baolu
2019-09-25  5:21   ` Peter Xu
2019-09-26  2:35     ` Lu Baolu
2019-09-26  3:49       ` Peter Xu
2019-09-27  2:27         ` Lu Baolu
2019-09-27  5:34           ` Peter Xu [this message]
2019-09-28  8:23             ` Lu Baolu
2019-09-29  5:25               ` Peter Xu
2019-10-08  2:20                 ` Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 3/4] iommu/vt-d: Map/unmap domain with mmmap/mmunmap Lu Baolu
2019-09-25  5:00   ` Tian, Kevin
2019-09-25  7:06     ` Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 4/4] iommu/vt-d: Identify domains using first level page table Lu Baolu
2019-09-25  6:50   ` Peter Xu
2019-09-25  7:35     ` Tian, Kevin
2019-09-23 19:27 ` [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest Jacob Pan
2019-09-23 20:25   ` Raj, Ashok
2019-09-24  4:40     ` Lu Baolu
2019-09-24  7:00     ` Tian, Kevin
2019-09-25  2:48       ` Lu Baolu
2019-09-25  6:56         ` Peter Xu
2019-09-25  7:21           ` Tian, Kevin
2019-09-25  7:45             ` Peter Xu
2019-09-25  8:02               ` Tian, Kevin
2019-09-25  8:52                 ` Peter Xu
2019-09-26  1:37                   ` Lu Baolu
2019-09-24  4:27   ` Lu Baolu

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190927053449.GA9412@xz-x1 \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sanjay.k.kumar@intel.com \
    --cc=yi.y.sun@intel.com \
    --cc=yi.y.sun@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

IOMMU Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-iommu/0 linux-iommu/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-iommu linux-iommu/ https://lore.kernel.org/linux-iommu \
		iommu@lists.linux-foundation.org
	public-inbox-index linux-iommu

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.linux-foundation.lists.iommu


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git