From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2033C433E6 for ; Fri, 12 Mar 2021 20:06:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5922B64F84 for ; Fri, 12 Mar 2021 20:06:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5922B64F84 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=deltatee.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BFF716B0070; Fri, 12 Mar 2021 15:06:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BD7116B0071; Fri, 12 Mar 2021 15:06:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A77616B0072; Fri, 12 Mar 2021 15:06:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207]) by kanga.kvack.org (Postfix) with ESMTP id 891EE6B0070 for ; Fri, 12 Mar 2021 15:06:53 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 14D021837AE56 for ; Fri, 12 Mar 2021 20:06:53 +0000 (UTC) X-FDA: 77912305506.01.E933946 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by imf29.hostedemail.com (Postfix) with ESMTP id B15919016E for ; Fri, 12 Mar 2021 20:06:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:In-Reply-To:MIME-Version:Date: Message-ID:From:References:Cc:To:content-disposition; bh=/dKXn2MTgnalq4INDTL+cUYvr798DsS+qKv7RTLo0q8=; b=da62C93BS1h63ogYf0+1s+anR6 jE7LGT3lzTTjYIIqb+PgW/fdS/Tx6fEntgD51NJlTprHY0a60/LT/QP4Fc5Ai6ePuzSg+5NF+L6py nRbrs8LRxsjgEQ3bbreB2q1tcLRZSbwiR6N63c8IPAj25MKLjTff5d3tDmA5GNH69iLyo75HdTRQL icsGfRW5L4uGM435GQLPMFZysIjMtxvto4KAjvE2O5c0vlurrQcxvNGeRxzpZnms87a0en6OmRz0y 4VE4+Oyatorjzq2kdC3StQcBh4Yfq+TM0mZnbCUm+7sKjhSlsjY53WT5kVLKDNyivZA8UJ6mwJmxv bJOEOTPw==; Received: from s01060023bee90a7d.cg.shawcable.net ([24.64.145.4] helo=[192.168.0.10]) by ale.deltatee.com with esmtpsa (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1lKo3U-00023x-1V; Fri, 12 Mar 2021 13:06:41 -0700 To: Robin Murphy , linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org, iommu@lists.linux-foundation.org Cc: Minturn Dave B , John Hubbard , Dave Hansen , Ira Weiny , Matthew Wilcox , =?UTF-8?Q?Christian_K=c3=b6nig?= , Jason Gunthorpe , Jason Ekstrand , Daniel Vetter , Dan Williams , Stephen Bates , Jakowski Andrzej , Christoph Hellwig , Xiong Jianxin References: <20210311233142.7900-1-logang@deltatee.com> <20210311233142.7900-9-logang@deltatee.com> <45701356-ee41-1ad2-0e06-ca74af87b05a@deltatee.com> <76cc1c82-3cf4-92d3-992f-5c876ed30523@arm.com> From: Logan Gunthorpe Message-ID: <6dd679b3-e38b-2c18-1990-bfc96bb4d971@deltatee.com> Date: Fri, 12 Mar 2021 13:06:34 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: <76cc1c82-3cf4-92d3-992f-5c876ed30523@arm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-SA-Exim-Connect-IP: 24.64.145.4 X-SA-Exim-Rcpt-To: jianxin.xiong@intel.com, hch@lst.de, andrzej.jakowski@intel.com, sbates@raithlin.com, dan.j.williams@intel.com, daniel.vetter@ffwll.ch, jason@jlekstrand.net, jgg@ziepe.ca, christian.koenig@amd.com, willy@infradead.org, iweiny@intel.com, dave.hansen@linux.intel.com, jhubbard@nvidia.com, dave.b.minturn@intel.com, iommu@lists.linux-foundation.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, robin.murphy@arm.com X-SA-Exim-Mail-From: logang@deltatee.com Subject: Re: [RFC PATCH v2 08/11] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg X-SA-Exim-Version: 4.2.1 (built Wed, 08 May 2019 21:11:16 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) X-Stat-Signature: 5ni1seug441na7rrgxhmthr965ax6rre X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B15919016E Received-SPF: none (deltatee.com>: No applicable sender policy available) receiver=imf29; identity=mailfrom; envelope-from=""; helo=ale.deltatee.com; client-ip=204.191.154.188 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615579608-644404 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2021-03-12 12:47 p.m., Robin Murphy wrote: >>>> =C2=A0=C2=A0 { >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct scatterlist *s, *cur =3D= sg; >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 unsigned long seg_mask =3D dma_= get_seg_boundary(dev); >>>> @@ -864,6 +865,20 @@ static int __finalise_sg(struct device *dev, >>>> struct scatterlist *sg, int nents, >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sg_dma_= address(s) =3D DMA_MAPPING_ERROR; >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sg_dma_= len(s) =3D 0; >>>> =C2=A0=C2=A0 +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (is_pci_= p2pdma_page(sg_page(s)) && !s_iova_len) { >>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = if (i > 0) >>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 cur =3D sg_next(cur); >>>> + >>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = sg_dma_address(cur) =3D sg_phys(s) + s->offset - >>> >>> Are you sure about that? ;) >> >> Do you see a bug? I don't follow you... >=20 > sg_phys() already accounts for the offset, so you're adding it twice. Ah, oops. Nice catch. I missed that. >=20 >>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 pci_p2pdma_bus_offset(sg_page(s)); >>> >>> Can the bus offset make P2P addresses overlap with regions of mem spa= ce >>> that we might use for regular IOVA allocation? That would be very bad= ... >> >> No. IOMMU drivers already disallow all PCI addresses from being used a= s >> IOVA addresses. See, for example,=C2=A0 dmar_init_reserved_ranges(). I= t would >> be a huge problem for a whole lot of other reasons if it didn't. >=20 > I know we reserve the outbound windows (largely *because* some host=20 > bridges will consider those addresses as attempts at unsupported P2P an= d=20 > prevent them working), I just wanted to confirm that this bus offset is= =20 > always something small that stays within the relevant window, rather=20 > than something that might make a BAR appear in a completely different=20 > place for P2P purposes. If so, that's good. Yes, well if an IOVA overlaps with any PCI bus address there's going to=20 be some difficult brokenness because when the IOVA is used it might be=20 directed to the a PCI device and not the IOMMU. I fixed a bug like that=20 once. >>> I'm not really thrilled about the idea of passing zero-length segment= s >>> to iommu_map_sg(). Yes, it happens to trick the concatenation logic i= n >>> the current implementation into doing what you want, but it feels=20 >>> fragile. >> >> We're not passing zero length segments to iommu_map_sg() (or any >> function). This loop is just scanning to calculate the length of the >> required IOVA. __finalise_sg() (which is intimately tied to this loop) >> then needs a way to determine which segments were P2P segments. The >> existing code already overwrites s->length with an aligned length and >> stores the original length in sg_dma_len. So we're not relying on >> tricking any logic here. >=20 > Yes, we temporarily shuffle in page-aligned quantities to satisfy the=20 > needs of the iommu_map_sg() call, before unpacking things again in=20 > __finalise_sg(). It's some disgusting trickery that I'm particularly=20 > proud of. My point is that if you have a mix of both p2p and normal=20 > segments - which seems to be a case you want to support - then the=20 > length of 0 that you set to flag p2p segments here will be seen by=20 > iommu_map_sg() (as it walks the list to map the other segments) before=20 > you then use it as a key to override the DMA address in the final step.= =20 > It's not a concern if you have a p2p-only list and short-circuit=20 > straight to that step (in which case all the shuffling was wasted effor= t=20 > anyway), but since it's not entirely clear what a segment with zero=20 > length would mean in general, it seems like a good idea to avoid passin= g=20 > the list across a public boundary in that state, if possible. Ok, well, I mean the iommu_map_sg() does the right thing as is without=20 changing it and IMO sg->length set to zero does make sense. Supporting=20 mixed P2P and normal segments is really the whole point of this series=20 (the current kernel supports homogeneous SGLs with a specialized path --=20 see pci_p2pdma_unmap_sg_attrs()). But do you have an alternate solution=20 for sg->length? Logan