From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42051) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fCFoj-0007XH-9b for qemu-devel@nongnu.org; Fri, 27 Apr 2018 22:42:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fCFog-0007zJ-2O for qemu-devel@nongnu.org; Fri, 27 Apr 2018 22:42:29 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59872 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fCFof-0007z2-SV for qemu-devel@nongnu.org; Fri, 27 Apr 2018 22:42:25 -0400 References: <20180425045129.17449-1-peterx@redhat.com> <20180425045129.17449-4-peterx@redhat.com> <2492f7d8-ed63-6f1c-773f-baa223020022@redhat.com> <20180427062615.GY9036@xz-mi> <20180428022407.GG13269@xz-mi> From: Jason Wang Message-ID: <635e37b2-30a6-b204-3005-e3e098cb38f8@redhat.com> Date: Sat, 28 Apr 2018 10:42:11 +0800 MIME-Version: 1.0 In-Reply-To: <20180428022407.GG13269@xz-mi> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 03/10] intel-iommu: add iommu lock List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Xu Cc: Fam Zheng , "Michael S . Tsirkin" , qemu-devel@nongnu.org, Alex Williamson , Stefan Hajnoczi , Paolo Bonzini , Jintack Lim , David Gibson On 2018=E5=B9=B404=E6=9C=8828=E6=97=A5 10:24, Peter Xu wrote: > On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote: >> >> On 2018=E5=B9=B404=E6=9C=8827=E6=97=A5 14:26, Peter Xu wrote: >>> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote: >>>> On 2018=E5=B9=B404=E6=9C=8825=E6=97=A5 12:51, Peter Xu wrote: >>>>> Add a per-iommu big lock to protect IOMMU status. Currently the on= ly >>>>> thing to be protected is the IOTLB cache, since that can be accesse= d >>>>> even without BQL, e.g., in IO dataplane. >>>>> >>>>> Note that device page tables should not need any protection. The s= afety >>>>> of that should be provided by guest OS. E.g., when a page entry is >>>>> freed, the guest OS should be responsible to make sure that no devi= ce >>>>> will be using that page any more. >>>>> >>>>> Reported-by: Fam Zheng >>>>> Signed-off-by: Peter Xu >>>>> --- >>>>> include/hw/i386/intel_iommu.h | 8 ++++++++ >>>>> hw/i386/intel_iommu.c | 31 ++++++++++++++++++++++++++++= +-- >>>>> 2 files changed, 37 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_= iommu.h >>>>> index 220697253f..1a8ba8e415 100644 >>>>> --- a/include/hw/i386/intel_iommu.h >>>>> +++ b/include/hw/i386/intel_iommu.h >>>>> @@ -262,6 +262,14 @@ struct IntelIOMMUState { >>>>> uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) b= ytes */ >>>>> uint8_t womask[DMAR_REG_SIZE]; /* WO (write only - read re= turns 0) */ >>>>> uint32_t version; >>>>> + /* >>>>> + * Protects IOMMU states in general. Normally we don't need t= o >>>>> + * take this lock when we are with BQL held. However we have = code >>>>> + * paths that may run even without BQL. In those cases, we ne= ed >>>>> + * to take the lock when we have access to IOMMU state >>>>> + * informations, e.g., the IOTLB. >>>>> + */ >>>>> + QemuMutex iommu_lock; >>>> Some questions: >>>> >>>> 1) Do we need to protect context cache too? >>> IMHO the context cache entry should work even without lock. That's a >>> bit trickly since we have two cases that this cache will be updated: >>> >>> (1) first translation of the address space of a device >>> (2) invalidation of context entries >>> >>> For (2) IMHO we don't need to worry about since guest OS should be >>> controlling that part, say, device should not be doing any translatio= n >>> (DMA operations) when the context entry is invalidated. >>> >>> For (1) the worst case is that the context entry cache be updated >>> multiple times with the same value by multiple threads. IMHO that'll >>> be fine too. >>> >>> But yes for sure we can protect that too with the iommu lock. >>> >>>> 2) Can we just reuse qemu BQL here? >>> I would prefer not. As I mentioned, at least I have spent too much >>> time on fighting BQL already. I really hope we can start to use >>> isolated locks when capable. BQL is always the worst choice to me. >> Just a thought, using BQL may greatly simplify the code actually (cons= ider >> we don't plan to remove BQL now). > Frankly speaking I don't understand why using BQL may greatly simplify > the code... :( IMHO the lock here is really not a complicated one. > > Note that IMO BQL is mostly helpful when we really want something to > be run sequentially with some other things _already_ protected by BQL. Except for the translate path from dataplane, I belive all other codes=20 were already protected by BQL. > In this case, all the stuff is inside VT-d code itself (or other > IOMMUs), why bother taking the BQL to make our life harder? It looks to me it was as simple as: @@ -494,6 +494,7 @@ static MemoryRegionSection=20 flatview_do_translate(FlatView *fv, =C2=A0=C2=A0=C2=A0=C2=A0 IOMMUMemoryRegionClass *imrc; =C2=A0=C2=A0=C2=A0=C2=A0 hwaddr page_mask =3D (hwaddr)(-1); =C2=A0=C2=A0=C2=A0=C2=A0 hwaddr plen =3D (hwaddr)(-1); +=C2=A0=C2=A0=C2=A0 int locked =3D false; =C2=A0=C2=A0=C2=A0=C2=A0 if (plen_out) { =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 plen =3D *plen_out; @@ -510,8 +511,15 @@ static MemoryRegionSection=20 flatview_do_translate(FlatView *fv, =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 imrc =3D memory_region_= get_iommu_class_nocheck(iommu_mr); +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!qemu_mutex_iothread_lock= ed()) { +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 locke= d =3D true; +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu_= mutex_lock_iothread(); +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 iotlb =3D imrc->transla= te(iommu_mr, addr, is_write ? =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IOMMU_WO : IOMMU_RO); +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (locked) { +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu_= mutex_unlock_iothread(); +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 addr =3D ((iotlb.transl= ated_addr & ~iotlb.addr_mask) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 | (addr & iotlb.addr_mask)); =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page_mask &=3D iotlb.ad= dr_mask; > > So, even if we want to provide a general lock for the translation > procedure, I would prefer we add a per AddressSpace lock but not BQL. It could be, but it needs more work on each specific IOMMU codes. > However still that will need some extra flag showing that whether we > need the protection of not. For example, we may need to expliclitly > turn that off for Power and s390. Would that really worth it? It would cost just several lines of code, anything wrong with this? > > So my final preference is still current patch - we solve thread-safety > problems in VT-d and IOMMU code. Again, we really should make sure > all IOMMUs work with multithreads. > >>>> 3) I think the issue is common to all other kinds of IOMMU, so can w= e simply >>>> synchronize before calling ->translate() in memory.c. This seems a m= ore >>>> common solution. >>> I suspect Power and s390 live well with that. I think it mean at >>> least these platforms won't have problem in concurrency. I'm adding >>> DavidG in loop in case there is further comment. IMHO we should just >>> make sure IOMMU code be thread safe, and we fix problem if there is. >>> >>> Thanks, >>> >> Yes, it needs some investigation, but we have other IOMMUs like AMD, a= nd we >> could have a flag to bypass BQL if IOMMU can synchronize by itself. > AMD is still only for experimental. If we really want to use it in > production IMHO it'll need more testings and tunings not only on > thread-safety but on other stuffs too. So again, we can just fix them > when needed. I still don't see it a reason to depend on BQL here. Well, it's not about BQL specifically, it's about whether we have or=20 need a generic thread safety solution for all IOMMUs. We have more IOMMUs than just AMD, s390 and ppc: # git grep imrc-\>translate\ =3D hw/alpha/typhoon.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D typhoon_transla= te_iommu; hw/dma/rc4030.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D rc4030_dma_transla= te; hw/i386/amd_iommu.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D amdvi_translat= e; hw/i386/intel_iommu.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D vtd_iommu_tr= anslate; hw/ppc/spapr_iommu.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D spapr_tce_tra= nslate_iommu; hw/s390x/s390-pci-bus.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D s390_trans= late_iommu; hw/sparc/sun4m_iommu.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D sun4m_trans= late_iommu; hw/sparc64/sun4u_iommu.c:=C2=A0=C2=A0=C2=A0 imrc->translate =3D sun4u_tra= nslate_iommu; And we know there will be more in the near future. Thanks > > I'll see what others think about it. > > CCing Paolo, Stefan and Fam too. > > Thanks, >