xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Julien Grall <julien.grall@arm.com>
To: Oleksandr Tyshchenko <olekstysh@gmail.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Subject: Re: [Xen-devel] [PATCH V2 6/6] iommu/arm: Add Renesas IPMMU-VMSA support
Date: Fri, 9 Aug 2019 10:56:46 +0100	[thread overview]
Message-ID: <a2e19875-5c22-bdc0-0a84-5a983921fbd7@arm.com> (raw)
In-Reply-To: <CAPD2p-nv27=4RH5RVMyzBdAfTXeSArW3nvKEydhKWgm-mRoBmg@mail.gmail.com>

(+ Robin)

On 09/08/2019 00:32, Oleksandr Tyshchenko wrote:
> Hi Julien.

Hi,

> 
> Sorry for the possible format issues.
> 
> чт, 8 авг. 2019 г., 23:32 Julien Grall <julien.grall@arm.com 
> <mailto:julien.grall@arm.com>>:
> 
>     Hi Oleksandr,
> 
>     On 8/8/19 8:29 PM, Oleksandr wrote:
>      >>>>>>
>      >>>>>>> Sorry for the possible format issues.
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>      > No, it is not disabled. But, ipmmu_irq() uses another
>      >>>>>>> mmu->lock. So, I
>      >>>>>>>      > think, there won't be a deadlock.
>      >>>>>>>      >
>      >>>>>>>      > Or I really missed something?
>      >>>>>>>      >
>      >>>>>>>      > If we worry about ipmmu_tlb_invalidate() which is called
>      >>>>>>> here (to
>      >>>>>>>      > perform a flush by request from P2M code, which manages a
>      >>>>>>> page table)
>      >>>>>>>      > and from the irq handler (to perform a flush to resume
>      >>>>>>> address
>      >>>>>>>      > translation), I could use a tasklet to schedule
>      >>>>>>> ipmmu_tlb_invalidate()
>      >>>>>>>      > from the irq handler then. This way we would get this
>      >>>>>>> serialized. What
>      >>>>>>>      > do you think?
>      >>>>>>>
>      >>>>>>>     I am afraid a tasklet is not an option. You need to perform
>      >>>>>>> the TLB
>      >>>>>>>     flush when requested otherwise you are introducing a security
>      >>>>>>> issue.
>      >>>>>>>
>      >>>>>>>     This is because as soon as a region is unmapped in the page
>      >>>>>>> table, we
>      >>>>>>>     remove the drop the reference on any page backing that
>      >>>>>>> region. When the
>      >>>>>>>     reference is dropped to zero, the page can be reallocated to
>      >>>>>>> another
>      >>>>>>>     domain or even Xen. If the TLB flush happen after, then the
>      >>>>>>> guest may
>      >>>>>>>     still be able to access the page for a short time if the
>      >>>>>>> translation has
>      >>>>>>>     been cached by the any TLB (IOMMU, Processor).
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> I understand this. I am not proposing to delay a requested by P2M
>      >>>>>>> code TLB flush in any case. I just propose to issue TLB flush
>      >>>>>>> (which we have to perform in case of page faults, to resolve
>      >>>>>>> error condition and resume translations) from a tasklet rather
>      >>>>>>> than from interrupt handler directly. This is the TLB flush I am
>      >>>>>>> speaking about:
>      >>>>>>>
>      >>>>>>>
>     https://github.com/otyshchenko1/xen/blob/ipmmu_upstream2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L598
> 
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> Sorry if I was unclear.
>      >>>>>>
>      >>>>>> My mistake, I misread what you wrote.
>      >>>>>>
>      >>>>>> I found the flush in the renesas-bsp and not Linux upstream but it
>      >>>>>> is not clear why this is actually required. You are not fixing any
>      >>>>>> translation error. So what this flush will do?
>      >>>>>>
>      >>>>>> Regarding the placement of the flush, then if you execute in a
>      >>>>>> tasklet it will likely be done later on when the IRQ has been
>      >>>>>> acknowledge. What's the implication to delay it?
>      >>>>>
>      >>>>>
>      >>>>> Looks like, there is no need to put this flush into a tasklet. As I
>      >>>>> understand from Shimoda-san's answer it is OK to call flush here.
>      >>>>>
>      >>>>> So, my worry about calling ipmmu_tlb_invalidate() directly from the
>      >>>>> interrupt handler is not actual anymore.
>      >>>>> ----------
>      >>>>> This is my understanding regarding the flush purpose here. This
>      >>>>> code, just follows the TRM, no more no less,
>      >>>>> which mentions about a need to flush TLB after clearing error
>      >>>>> status register and updating a page table entries (which, I assume,
>      >>>>> means to resolve a reason why translation/page fault error actually
>      >>>>> have happened) to resume address translation request.
>      >>>>
>      >>>> Well, I don't have the TRM... so my point of reference is Linux. Why
>      >>>> does upstream not do the TLB flush?
>      >>>
>      >>> I have no idea regarding that. >
>      >>>
>      >>>>
>      >>>>
>      >>>> I have been told this is an errata on the IPMMU. Is it related to
>      >>>> the series posted on linux-iommu [1]?
>      >>>
>      >>> I don't think, the TLB flush we are speaking about, is related to
>      >>> that series [1] somehow. This TLB flush, I think, is just the last
>      >>> step in a sequence of actions which should be performed when the
>      >>> error occurs, no more no less. This is how I understand this.
>      >>
>      >> If you have to flush the TLBs in the IRQ context then something has
>      >> gone really wrong.
>      >>
>      >> I don't deny that Break-Before-Make is an issue. However, if it is
>      >> handled correctly in the P2M code. You should only be there because
>      >> there are no mapping in the TLBs for the address accessed. So flushing
>      >> the TLBs should be unnecessary, unless your TLB is also caching
>      >> invalid entry?
>      >
>      > Sorry, I don't quite understand why we need to worry about this flush
>      > too much for a case which won't occur in normal condition (if everything
>      > is correct). Why we can't just consider this flush as a required action,
> 
>     A translation error can be easy to reach. For instance if the guest does
>     not program the Device correctly and request to access an address that
>     is not mapped.
> 
> 
> Yes, I understand these bits. But, I wrote that error wouldn't occur in normal 
> condition (if everything was correct).

I don't understand your point here. Whether this is in an error path or correct 
path, we should be able to understand the reason behind it. Otherwise, error 
path would become the wild west...

> 
> 
> 
> 
> 
>      > which needed to exit from the error state and resume stopped address
>      > translation request. The same required action as "clearing error status
>      > flags" before. We are not trying to understand, why is it so necessary
>      > to clear error flags when error happens, or can we end up without
>      > clearing it, for example. We just follow what described in document. The
>      > same, I think, we have for that flush, if described, then should be
>      > followed. Looks like this flush acts as a trigger to unblock stopped
>      > transaction in that particular case.
> 
>     What will actually happen if the transaction fail again? For instance,
>     if the IOVA was not mapped. Will you receive the interrupt again?
>     If so, are you going to make the flush again and again until the guest
>     is killed?
> 
> 
> This is a good question. I think, if address is not mapped, the transaction will 
> fail again and we will get the interrupt again. Not sure, until the guest is 
> killed or until the driver in the guest detects timeout and cancels DMA. Let's 
> consider the worst case, until the guest is killed.
> 
> So my questions are what do you think would be the proper driver's behavior in 
> that case? Do nothing and don't even try to resolve error condition/unblock 
> translation at the first page fault, or give it a few attempts, or unblock every 
> time.

I will answer back with a question here. How is the TLB flush is going to 
unblock anything? The more you are not fixing any error condition here... And 
the print "Unhandled fault" just afterwards clearly leads to think that there 
are very little chance the fault has been resolved.

> How does the SMMU driver act in such situation?

I have CCed Robin who knows better than me the SMMU driver. Though it is the 
Linux one but Xen is based on it.

 From my understanding, it is implementation defined whether the SMMU supports 
stalling a transaction on fault. AFAICT, the current Xen driver will just 
terminate the transaction and therefore the client transaction behave as RAZ/WI.

> 
> Quite clear, if we get a fault, then address is not mapped. I think, it can be 
> both: by issuing wrong address (baggy driver, malicious driver) or by race 
> (unlikely). If this is the real race (device hits brake-before-make, for 
> example), we could give it another attempt, for example. Looks like we need some 
> mechanism to deploy faulted address to P2M code (which manages page table) to 
> analyze? Or it is not worth doing that?

You seem to speak about break-before-make as it was an error. Break-Before-Make 
is just a sequence to prevent the TLB walker to cache both old and new mapping 
at the same time. At a given point the IOVA translation can only be:
    1) The old physical address
    2) No address -> result to a fault
    3) The new physical address

1) and 3) should not result to a fault. 2) will result to a fault but then the 
TLB should not cache invalid entry, right?

In order to see 2), we always flush the TLBs after removing the old physical 
address.

Unfortunately, some of the IOMMUs are not able to restart transactions, Xen 
currently avoids to flush the TLBs after 2). So you may be able to see both 
mapping at the same time.

Looking at your driver, I believe you would have the flag IMSTR.MHIT (multiple 
tlb hits) set because this is the condition we are trying to prevent with 
break-before-make. The comment in the code leads to think this is a fault error, 
so I am not sure why you would recover here...

If your IOMMU is able to stall transaction, then it would be best if we properly 
handle break-before-make with it.

Overall, it feels to me the TLB flush is here for a different reason.

> 
> 
>      >
>      > Different H/W could have different restoring sequences. Some H/W
>      > requires just clearing error status, other H/W requires full
>      > re-initialization in a specific order to recover from the error state.
>      >
>      > Please correct me if I am wrong.
> 
>     I am not confident to accept any code that I don't understand or I don't
>     find sensible. As I pointed out in my previous e-mail, this hasn't
>     reached upstream so something looks quite fishy here.
> 
> 
> As I answered in previous e-mail, I hope, we will be able to clarify a reason 
> why this hasn't reached upstream.

Thank you.

Cheers,


-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-08-09  9:57 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 16:39 [Xen-devel] [PATCH V2 0/6] iommu/arm: Add Renesas IPMMU-VMSA support + Linux's iommu_fwspec Oleksandr Tyshchenko
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 1/6] iommu/arm: Add iommu_helpers.c file to keep common for IOMMUs stuff Oleksandr Tyshchenko
2019-08-09 17:35   ` Julien Grall
2019-08-09 18:10     ` Oleksandr
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 2/6] iommu/arm: Add ability to handle deferred probing request Oleksandr Tyshchenko
2019-08-12 11:11   ` Julien Grall
2019-08-12 12:01     ` Oleksandr
2019-08-12 19:46       ` Julien Grall
2019-08-13 12:35         ` Oleksandr
2019-08-14 17:34           ` Julien Grall
2019-08-14 19:25             ` Stefano Stabellini
2019-08-15  9:29               ` Julien Grall
2019-08-15 12:54                 ` Julien Grall
2019-08-15 13:14                   ` Oleksandr
2019-08-15 16:39                     ` Oleksandr
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 3/6] [RFC] xen/common: Introduce _xrealloc function Oleksandr Tyshchenko
2019-08-05 10:02   ` Jan Beulich
2019-08-06 18:50     ` Oleksandr
2019-08-07  6:22       ` Jan Beulich
2019-08-07 17:31         ` Oleksandr
2019-08-06 19:51     ` Volodymyr Babchuk
2019-08-07  6:26       ` Jan Beulich
2019-08-07 18:36         ` Oleksandr
2019-08-08  6:08           ` Jan Beulich
2019-08-08  7:05           ` Jan Beulich
2019-08-08 11:05             ` Oleksandr
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 4/6] iommu/arm: Add lightweight iommu_fwspec support Oleksandr Tyshchenko
2019-08-13 12:39   ` Julien Grall
2019-08-13 15:17     ` Oleksandr
2019-08-13 15:28       ` Julien Grall
2019-08-13 16:18         ` Oleksandr
2019-08-13 13:40   ` Julien Grall
2019-08-13 16:28     ` Oleksandr
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 5/6] iommu/arm: Introduce iommu_add_dt_device API Oleksandr Tyshchenko
2019-08-13 13:49   ` Julien Grall
2019-08-13 16:05     ` Oleksandr
2019-08-13 17:13       ` Julien Grall
2019-08-02 16:39 ` [Xen-devel] [PATCH V2 6/6] iommu/arm: Add Renesas IPMMU-VMSA support Oleksandr Tyshchenko
2019-08-07  2:41   ` Yoshihiro Shimoda
2019-08-07 16:01     ` Oleksandr
2019-08-07 19:15       ` Julien Grall
2019-08-07 20:28         ` Oleksandr Tyshchenko
2019-08-08  9:05           ` Julien Grall
2019-08-08 10:14             ` Oleksandr
2019-08-08 12:44               ` Julien Grall
2019-08-08 15:04                 ` Oleksandr
2019-08-08 17:16                   ` Julien Grall
2019-08-08 19:29                     ` Oleksandr
2019-08-08 20:32                       ` Julien Grall
2019-08-08 23:32                         ` Oleksandr Tyshchenko
2019-08-09  9:56                           ` Julien Grall [this message]
2019-08-09 18:38                             ` Oleksandr
2019-08-08 12:28         ` Oleksandr
2019-08-08 14:23         ` Lars Kurth
2019-08-08  4:05       ` Yoshihiro Shimoda
2019-08-14 17:38   ` Julien Grall
2019-08-14 18:45     ` Oleksandr
2019-08-05  7:58 ` [Xen-devel] [PATCH V2 0/6] iommu/arm: Add Renesas IPMMU-VMSA support + Linux's iommu_fwspec Oleksandr
2019-08-05  8:29   ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2e19875-5c22-bdc0-0a84-5a983921fbd7@arm.com \
    --to=julien.grall@arm.com \
    --cc=oleksandr_tyshchenko@epam.com \
    --cc=olekstysh@gmail.com \
    --cc=robin.murphy@arm.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    --cc=yoshihiro.shimoda.uh@renesas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).