linux-kernel.vger.kernel.org archive mirror
From: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"  <longpeng2@huawei.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	chenjiashang <chenjiashang@huawei.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"will@kernel.org" <will@kernel.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>
Subject: RE: A problem of Intel IOMMU hardware ?
Date: Sun, 21 Mar 2021 23:51:26 +0000
Message-ID: <ac1e9b4699c4438f80ab771e5fbb4ee9@huawei.com>
In-Reply-To: <55E334BA-C6D2-4892-9207-32654FBF4360@gmail.com>

Hi Nadav,

> -----Original Message-----
> From: Nadav Amit [mailto:nadav.amit@gmail.com]
> Sent: Friday, March 19, 2021 12:46 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; chenjiashang
> <chenjiashang@huawei.com>; David Woodhouse <dwmw2@infradead.org>;
> iommu@lists.linux-foundation.org; LKML <linux-kernel@vger.kernel.org>;
> alex.williamson@redhat.com; Gonglei (Arei) <arei.gonglei@huawei.com>;
> will@kernel.org
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> 
> 
> > On Mar 18, 2021, at 2:25 AM, Longpeng (Mike, Cloud Infrastructure Service
> > Product Dept.) <longpeng2@huawei.com> wrote:
> >
> >
> >
> >> -----Original Message-----
> >> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> >> Sent: Thursday, March 18, 2021 4:56 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> <longpeng2@huawei.com>; Nadav Amit <nadav.amit@gmail.com>
> >> Cc: chenjiashang <chenjiashang@huawei.com>; David Woodhouse
> >> <dwmw2@infradead.org>; iommu@lists.linux-foundation.org; LKML
> >> <linux-kernel@vger.kernel.org>; alex.williamson@redhat.com; Gonglei
> >> (Arei) <arei.gonglei@huawei.com>; will@kernel.org
> >> Subject: RE: A problem of Intel IOMMU hardware ?
> >>
> >>> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>> <longpeng2@huawei.com>
> >>>
> >>>> -----Original Message-----
> >>>> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> >>>> Sent: Thursday, March 18, 2021 4:27 PM
> >>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>>> <longpeng2@huawei.com>; Nadav Amit <nadav.amit@gmail.com>
> >>>> Cc: chenjiashang <chenjiashang@huawei.com>; David Woodhouse
> >>>> <dwmw2@infradead.org>; iommu@lists.linux-foundation.org; LKML
> >>>> <linux-kernel@vger.kernel.org>; alex.williamson@redhat.com; Gonglei
> >>>> (Arei) <arei.gonglei@huawei.com>; will@kernel.org
> >>>> Subject: RE: A problem of Intel IOMMU hardware ?
> >>>>
> >>>>> From: iommu <iommu-bounces@lists.linux-foundation.org> On Behalf
> >>>>> Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>>>>
> >>>>>> 2. Consider ensuring that the problem is not somehow related to
> >>>>>> queued invalidations. Try to use __iommu_flush_iotlb() instead of
> >>>>>> qi_flush_iotlb().
> >>>>>>
> >>>>>
> >>>>> I tried to force to use __iommu_flush_iotlb(), but maybe something
> >>>>> wrong, the system crashed, so I prefer to lower the priority of
> >>>>> this operation.
> >>>>>
> >>>>
> >>>> The VT-d spec clearly says that register-based invalidation can be
> >>>> used only when queued-invalidations are not enabled. Intel-IOMMU
> >>>> driver doesn't provide an option to disable queued-invalidation
> >>>> though, when the hardware is capable. If you really want to try,
> >>>> tweak the code in intel_iommu_init_qi.
> >>>>
> >>>
> >>> Hi Kevin,
> >>>
> >>> Thanks for pointing this out. Do you have any ideas about this problem?
> >>> I tried to describe the problem more clearly in my reply to Alex; I hope
> >>> you can have a look if you're interested.
> >>>
> >>
> >> btw I saw you used 4.18 kernel in this test. What about latest kernel?
> >>
> >
> > Not tested yet. It's hard to upgrade the kernel in our environment.
> >
> >> Also one way to separate sw/hw bug is to trace the low level
> >> interface (e.g.,
> >> qi_flush_iotlb) which actually sends invalidation descriptors to the
> >> IOMMU hardware. Check the window between b) and c) and see whether
> >> the software does the right thing as expected there.
> >>
> >
> > We added some logging to the iommu driver over the past few days, and the
> > software seems fine. But we haven't looked inside qi_submit_sync yet; I'll
> > try that tonight.
> 
> So here is my guess:
> 
> Intel probably used as a basis for the IOTLB an implementation of some other
> (regular) TLB design.
> 
> Intel SDM says regarding TLBs (4.10.4.2 "Recommended Invalidation"):
> 
> "Software wishing to prevent this uncertainty should not write to a
> paging-structure entry in a way that would change, for any linear address, both the
> page size and either the page frame, access rights, or other attributes."
> 
> 
> Now the aforementioned uncertainty is a bit different (multiple
> *valid* translations of a single address). Yet, perhaps this is yet another thing that
> might happen.
> 
> From a brief look at the handling of MMU (not IOMMU) hugepages in Linux, indeed
> the PMD is first cleared and flushed before a new valid PMD is set. This is possible
> for MMUs since they allow the software to handle spurious page-faults gracefully.
> This is not the case for the IOMMU though (without PRI).
> 

But in my case, flush_iotlb is called after the range (0x0, 0xa0000) is unmapped.
I have no idea why this invalidation isn't effective. I haven't looked inside the qi
code yet, but there are no complaints from the driver.

Could you please point me to the MMU code you mentioned above? In the MMU code, is it
possible for all the entries of a PTE table to be not-present while the PMD entry that
points to it is still present?

*Page table after (0x0, 0xa0000) is unmapped:
PML4: 0x      1a34fbb003
  PDPE: 0x      1a34fbb003
   PDE: 0x      1a34fbf003
    PTE: 0x               0

*Page table after (0x0, 0xc0000000) is mapped:
PML4: 0x      1a34fbb003
  PDPE: 0x      1a34fbb003
   PDE: 0x       15ec00883
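
To make the two dumps easier to compare, here is a minimal sketch that decodes
the entries. The bit positions are meant to follow the DMA_PTE_* definitions in
the Linux intel-iommu driver (read = bit 0, write = bit 1, large page = bit 7,
snoop = bit 11). The helper name decode() and the 51:12 address mask are
assumptions for illustration only, not something taken from the driver.

```python
# Bit layout of a second-level entry, as I understand the DMA_PTE_*
# definitions in the Linux intel-iommu driver.
DMA_PTE_READ = 1 << 0
DMA_PTE_WRITE = 1 << 1
DMA_PTE_LARGE_PAGE = 1 << 7
DMA_PTE_SNP = 1 << 11

# Assumption: treat bits 51:12 as the address field. The real width
# depends on the host address width of the platform.
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF


def decode(entry):
    """Hypothetical helper: split one entry into its interesting fields."""
    return {
        # The driver treats an entry as present if read or write is set.
        "present": bool(entry & (DMA_PTE_READ | DMA_PTE_WRITE)),
        "large_page": bool(entry & DMA_PTE_LARGE_PAGE),
        "snoop": bool(entry & DMA_PTE_SNP),
        "addr": entry & ADDR_MASK,
    }


if __name__ == "__main__":
    dumps = [
        ("PDE after unmap", 0x1A34FBF003),
        ("PTE after unmap", 0x0),
        ("PDE after map", 0x15EC00883),
    ]
    for name, value in dumps:
        fields = decode(value)
        print(name, hex(value), fields["present"],
              fields["large_page"], hex(fields["addr"]))
```

Read this way, the PDE after the unmap is still present and points to a PTE
table whose entry is 0, while the PDE after the map has the large-page bit set,
so the page size for that range changes between the two dumps.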

> Not sure this explains everything though. If that is the problem, then during a
> mapping that changes page-sizes, a TLB flush is needed, similarly to the one
> Longpeng did manually.
> 
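
To check that I understand the hypothesis, here is a toy model of it. It is
only a sketch and says nothing about what the real hardware does: ToyIommu,
cached_pde and run() are made-up names, and the model assumes that the hardware
may keep a copy of a non-leaf PDE and that some access re-caches it in the
window between the unmap and the new mapping.

```python
PAGE_4K = 1 << 12
PAGE_2M = 1 << 21


class ToyIommu:
    """Toy model: one in-memory PDE plus a cached copy of it."""

    def __init__(self):
        self.pde = None         # ("table", {index: base}) or ("huge", base)
        self.cached_pde = None  # hypothetical paging-structure cache

    def flush(self):
        self.cached_pde = None

    def translate(self, iova):
        # The walk reads the PDE from memory only if no copy is cached.
        if self.cached_pde is None:
            self.cached_pde = self.pde
        if self.cached_pde is None:
            return None
        kind, payload = self.cached_pde
        if kind == "huge":
            return payload + (iova % PAGE_2M)
        base = payload.get(iova // PAGE_4K)
        return None if base is None else base + (iova % PAGE_4K)


def run(flush_before_remap):
    iommu = ToyIommu()
    pte_table = {0: 0x1000}
    iommu.pde = ("table", pte_table)
    iommu.translate(0x0)

    # Unmap: the PTE is cleared, the PDE still points to the table,
    # and the invalidation is issued as usual.
    pte_table.clear()
    iommu.flush()

    # Assumption: an access in this window faults, but the walk
    # re-caches the PDE that points to the now-empty table.
    iommu.translate(0x0)

    if flush_before_remap:
        iommu.flush()
    iommu.pde = ("huge", 0x15EC00000)  # map with a 2M page
    return iommu.translate(0x1234)
```

In this model run(False) still walks the old, empty PTE table and fails to
translate, while run(True) picks up the new 2M entry. That would be consistent
with the manual flush making the problem go away, if the assumptions above hold.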


