Another idea is seperate remote TLB invalidate into two instructions:

 - sfence.vma.b.asyc
 - sfence.vma.b.barrier // wait all async TLB invalidate operations
finished for all harts.

(I remember who mentioned me separate them into two instructions after
session. Anup? Is the idea right ?)

Actually, I never consider asyc TLB invalidate before, because current our
light iommu did not need it.

Thx all people attend the session :) Let's continue the talk.


Guo Ren <guoren@kernel.org> 于 2019年9月12日周四 22:59写道：

> Thx Will for reply.
>
> On Thu, Sep 12, 2019 at 3:03 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Sun, Sep 08, 2019 at 07:52:55AM +0800, Guo Ren wrote:
> > > On Mon, Jun 24, 2019 at 6:40 PM Will Deacon <will@kernel.org> wrote:
> > > > > I'll keep my system use the same ASID for SMP + IOMMU :P
> > > >
> > > > You will want a separate allocator for that:
> > > >
> > > >
> https://lkml.kernel.org/r/20190610184714.6786-2-jean-philippe.brucker@arm.com
> > >
> > > Yes, it is hard to maintain ASID between IOMMU and CPUMMU or different
> > > system, because it's difficult to synchronize the IO_ASID when the CPU
> > > ASID is rollover.
> > > But we could still use hardware broadcast TLB invalidation instruction
> > > to uniformly manage the ASID and IO_ASID, or OTHER_ASID in our IOMMU.
> >
> > That's probably a bad idea, because you'll likely stall execution on the
> > CPU until the IOTLB has completed invalidation. In the case of ATS, I
> think
> > an endpoint ATC is permitted to take over a minute to respond. In
> reality, I
> > suspect the worst you'll ever see would be in the msec range, but that's
> > still an unacceptable period of time to hold a CPU.
> Just as I've said in the session that IOTLB invalidate delay is
> another topic, My main proposal is to introduce stage1.pgd and
> stage2.pgd as address space identifiers between different TLB systems
> based on vmid, asid. My last part of sildes will show you how to
> translate stage1/2.pgd to as/vmid in PCI ATS system and the method
> could work with SMMU-v3 and intel Vt-d. (It's regret for me there is
> no time to show you the whole slides.)
>
> In our light IOMMU implementation, there's no IOTLB invalidate delay
> problem. Becasue IOMMU is very close to CPU MMU and interconnect's
> delay is the same with SMP CPUs MMU (no PCI, VM supported).
>
> To solve the problem, we could define a async mode in sfence.vma.b to
> slove the problem and finished with per_cpu_irq/exception.
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
>