Subject: Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
From: Lu Baolu
To: Peter Xu
Cc: kevin.tian@intel.com, Yi Sun, ashok.raj@intel.com, kvm@vger.kernel.org,
    sanjay.k.kumar@intel.com, iommu@lists.linux-foundation.org,
    linux-kernel@vger.kernel.org, Alex Williamson, David Woodhouse,
    yi.y.sun@intel.com
Date: Sat, 28 Sep 2019 16:23:16 +0800
Message-ID: <66823e27-aa33-5968-b5fd-e5221fb1fffe@linux.intel.com>
In-Reply-To: <20190927053449.GA9412@xz-x1>
References: <20190923122454.9888-1-baolu.lu@linux.intel.com>
    <20190923122454.9888-3-baolu.lu@linux.intel.com>
    <20190925052157.GL28074@xz-x1> <20190926034905.GW28074@xz-x1>
    <52778812-129b-0fa7-985d-5814e9d84047@linux.intel.com>
    <20190927053449.GA9412@xz-x1>

Hi Peter,

On 9/27/19 1:34 PM, Peter Xu wrote:
> Hi, Baolu,
>
> On Fri, Sep 27, 2019 at 10:27:24AM +0800, Lu Baolu wrote:
>>>>>> +	spin_lock(&(domain)->page_table_lock);			\
>>>>>
>>>>> Is this intended to lock here instead of taking the lock during the
>>>>> whole page table walk? Is it safe?
>>>>>
>>>>> Taking the example where nm==PTE: when we reach here, how do we
>>>>> guarantee that the PMD page that holds this PTE is still valid?
>>>>
>>>> We will always keep the non-leaf pages in the table,
>>>
>>> I see. Though, could I ask why? It seems to me that the existing 2nd
>>> level page table does not keep these when unmapping, and it doesn't
>>> even use locking at all, by leveraging cmpxchg()?
>>
>> I still need some time to understand how cmpxchg() solves the race
>> issue when reclaiming pages. For example:
>>
>> Thread A                          Thread B
>> -A1: check all PTEs are empty     -B1: up-level PDE valid
>> -A2: clear the up-level PDE
>> -A3: reclaim the page             -B2: populate the PTEs
>>
>> Both (A1,A2) and (B1,B2) should be atomic. Otherwise, a race could
>> happen.
>
> I'm not sure of this, but IMHO it is similarly because we need to
> allocate the iova ranges from the iova allocator first, so thread A
> (who's going to unmap pages) and thread B (who's going to map new
> pages) should never have overlapping regions if happening
> concurrently. I'm

Although the regions don't overlap, they might share the same pmd
entry. If A clears the pmd entry while B goes ahead with populating
the pte's, it will crash.
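For reference, the lock-free install in the existing second-level code
looks roughly like this (a simplified sketch of pfn_to_dma_pte() in
intel-iommu.c, not the exact code; install_next_level() is a
hypothetical name, and error and superpage handling are omitted):

static struct dma_pte *install_next_level(struct dmar_domain *domain,
					  struct dma_pte *pte)
{
	if (!dma_pte_present(pte)) {
		void *tmp_page = alloc_pgtable_page(domain->nid);
		u64 pteval;

		if (!tmp_page)
			return NULL;

		pteval = virt_to_phys(tmp_page) | DMA_PTE_READ | DMA_PTE_WRITE;
		if (cmpxchg64(&pte->val, 0ULL, pteval))
			/* Lost the populate race; another thread won. */
			free_pgtable_page(tmp_page);
		else
			domain_flush_cache(domain, pte, sizeof(*pte));
	}

	/* Descend into the (now present) next-level table. */
	return phys_to_virt(pte->val & VTD_PAGE_MASK);
}

The cmpxchg64() only serializes two concurrent populates of the same
entry; by itself it does not protect B's later pte stores against A
reclaiming that table page, which is exactly the race sketched above.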
> referring to intel_unmap() in which we won't free the iova region
> before domain_unmap() completes (which should cover the whole process
> of A1-A3), so the same iova range to be unmapped won't be allocated
> to any new pages in some other thread.
>
> There's also a hint in domain_unmap():
>
>   /* we don't need lock here; nobody else touches the iova range */
>
>>
>> Actually, the iova allocator always packs IOVA ranges close to the
>> top of the address space. This results in requiring a minimal number
>> of pages to map the allocated IOVA ranges, which makes memory
>> consumption by IOMMU page tables tolerable. Hence, we don't need to
>> reclaim the pages until the whole page table is about to be torn
>> down. The real data on my test machine also confirms this.
>
> Do you mean you have run the code with 1st-level-capable IOMMU
> hardware? IMHO any data point would be good to have in the cover
> letter as a reference.

Yes. Sure! Let me do this starting from the next version.

>
> [...]
>
>>>>>> +static struct page *
>>>>>> +mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
>>>>>> +		  unsigned long addr, unsigned long end,
>>>>>> +		  struct page *freelist, bool reclaim)
>>>>>> +{
>>>>>> +	int i;
>>>>>> +	unsigned long start;
>>>>>> +	pte_t *pte, *first_pte;
>>>>>> +
>>>>>> +	start = addr;
>>>>>> +	pte = pte_offset_kernel(pmd, addr);
>>>>>> +	first_pte = pte;
>>>>>> +	do {
>>>>>> +		set_pte(pte, __pte(0));
>>>>>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>>>>>> +
>>>>>> +	domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
>>>>>> +
>>>>>> +	/* Add page to free list if all entries are empty. */
>>>>>> +	if (reclaim) {
>>>>>
>>>>> Shouldn't we know whether to reclaim from the (addr, end) specified,
>>>>> as long as they cover the whole range of this PMD?
>>>>
>>>> Current policy is that we don't reclaim any pages until the whole
>>>> page table is torn down.
>>>
>>> Ah OK. But I saw that you're passing in reclaim == !start_addr.
>>> Shouldn't that erroneously trigger if one wants to unmap the 1st
>>> page as well, even if not the whole address space?
>>
>> IOVA 0 is assumed to be reserved by the allocator. Otherwise, we have
>> no means to check whether an IOVA is valid.
>
> Is this an assumption of the allocator? Could that change in the
> future?

Yes. And I think it should be kept as long as no consumer depends on
this optimization.

>
> IMHO that's not necessary if so; after all, it's as simple as replacing
> (!start_addr) with (start == 0 && end == END). I see that in
> domain_unmap() it has a similar check when freeing the pgd:
>
>   if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw))
>

Yours looks better. Thank you!
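For the first level that could look like below (just a sketch;
FIRST_LEVEL_MAX_ADDR is a placeholder for whatever the real upper
bound of the first-level address space ends up being called):

	/*
	 * Reclaim page-table pages only when the unmap covers the
	 * whole first-level address space, instead of keying off
	 * !start_addr. FIRST_LEVEL_MAX_ADDR is a placeholder name.
	 */
	bool reclaim = (start == 0 && end == FIRST_LEVEL_MAX_ADDR);

	freelist = mmunmap_pte_range(domain, pmd, start, end,
				     freelist, reclaim);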
> Thanks,
>

Best regards,
Baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu