From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69805C433EF for ; Fri, 1 Oct 2021 13:49:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4A5BD619F9 for ; Fri, 1 Oct 2021 13:49:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231537AbhJANuo (ORCPT ); Fri, 1 Oct 2021 09:50:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231502AbhJANun (ORCPT ); Fri, 1 Oct 2021 09:50:43 -0400 Received: from mail-qk1-x731.google.com (mail-qk1-x731.google.com [IPv6:2607:f8b0:4864:20::731]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 611BAC06177D for ; Fri, 1 Oct 2021 06:48:59 -0700 (PDT) Received: by mail-qk1-x731.google.com with SMTP id x12so9123006qkf.9 for ; Fri, 01 Oct 2021 06:48:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LG5nuVM12UKCUvMUz/H7oZ+JnPtb7xeOrg+cdikjdUw=; b=i2nZSJalGl2HFR2lljteQga9LLzznj1VY4sKfMFrOuI5TPw+9fxa9Y7VF6ORCDu6gf s1VZcbmu8prmAySSW0muDV9sPFy+ocsOAHMu7czDjdTka8/r9gDirvuLv+DjNcMP+yu2 2vpgG0jpIzFJO3o2EESr3Knc9sAVcu/3zrGpfyryk0rXsa3CM7UVEgJZI2lyiRkkfNpM 06X0/la9ZF1/JOw1nIH1hT6QVKTk7TJuv33hiHw5ih9p7MWcfmzNHEd0oxDuij3diKPb ruwnbMTM1vlfPcAVocSCOhTaVtV1FJ3tIhW47hPjbC6RUYlSTDx8/p4BvGKL4lQu/579 YBXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LG5nuVM12UKCUvMUz/H7oZ+JnPtb7xeOrg+cdikjdUw=; b=Vb2tFaGftm4Io76EnBMJ3InE+BFtDVJYisvOuQ0/evh4UEfQmLcijr6jrlgrbVgdKe MQDhNXoI8Uo4kzYxH9FMtYEuGpUoWzNILvWsacytjifojZXNHpIKTBGUVX06z187k7/j FRQRLkVARNjdlSbNFKhZqCLKOtbcznvamNjkRWvw+9pDNEhdO0E4/gcXyC8fwYIC33Lj mtk1895MPb3/V1SKs2enH8zRo9zQ4zUSV5VwHCUU2ve2fdEc7cfkX3AGSCkLwjw96/hZ nN/pgE572JAjCUYJbn2M6nM1eu/IMhx8CMyPvOlhW78myOPx3iKzTjeH8JR52sR37tBL 3udQ== X-Gm-Message-State: AOAM533VIYtCVZguNId2kvmJq/ykjRwF/VMIXLCckEAPbG8UYUWuupIE ZUTZImjy3lX0/llcHhqVF6GnuQ== X-Google-Smtp-Source: ABdhPJxAWiRAix1MOnP+fusB3RWFs4aefuUjIqiau0+pjC7MwPUEHTq7EjNfCISJpSPUQC5qQMYTaw== X-Received: by 2002:a37:65d6:: with SMTP id z205mr9907719qkb.522.1633096138367; Fri, 01 Oct 2021 06:48:58 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-162-113-129.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.162.113.129]) by smtp.gmail.com with ESMTPSA id u19sm3747206qtx.40.2021.10.01.06.48.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Oct 2021 06:48:57 -0700 (PDT) Received: from jgg by mlx with local (Exim 4.94) (envelope-from ) id 1mWIuG-008R2P-Oj; Fri, 01 Oct 2021 10:48:56 -0300 Date: Fri, 1 Oct 2021 10:48:56 -0300 From: Jason Gunthorpe To: Logan Gunthorpe , Alistair Popple , Felix Kuehling , Christoph Hellwig , Dan Williams Cc: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org, iommu@lists.linux-foundation.org, Stephen Bates , Christian =?utf-8?B?S8O2bmln?= , John Hubbard , Don Dutile , Matthew Wilcox , Daniel Vetter , Jakowski Andrzej , Minturn Dave B , Jason Ekstrand , Dave Hansen , Xiong Jianxin , Bjorn Helgaas , Ira Weiny , Robin Murphy , Martin Oliveira , Chaitanya Kulkarni Subject: Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem() Message-ID: <20211001134856.GN3544071@ziepe.ca> References: <20210916234100.122368-1-logang@deltatee.com> <20210916234100.122368-20-logang@deltatee.com> <20210928195518.GV3544071@ziepe.ca> <8d386273-c721-c919-9749-fc0a7dc1ed8b@deltatee.com> <20210929230543.GB3544071@ziepe.ca> <32ce26d7-86e9-f8d5-f0cf-40497946efe9@deltatee.com> <20210929233540.GF3544071@ziepe.ca> <20210930003652.GH3544071@ziepe.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210930003652.GH3544071@ziepe.ca> Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On Wed, Sep 29, 2021 at 09:36:52PM -0300, Jason Gunthorpe wrote: > Why would DAX want to do this in the first place?? This means the > address space zap is much more important that just speeding up > destruction, it is essential for correctness since the PTEs are not > holding refcounts naturally... It is not really for this series to fix, but I think the whole thing is probably racy once you start allowing pte_special pages to be accessed by GUP. If we look at unmapping the PTE relative to GUP fast the important sequence is how the TLB flushing doesn't decrement the page refcount until after it knows any concurrent GUP fast is completed. This is arch specific, eg it could be done async through a call_rcu handler. This ensures that pages can't cross back into the free pool and be reallocated until we know for certain that nobody is walking the PTEs and could potentially take an additional reference on it. The scheme cannot rely on the page refcount being 0 because oce it goes into the free pool it could be immeidately reallocated back to a non-zero refcount. A DAX user that simply does an address space invalidation doesn't sequence itself with any of this mechanism. So we can race with a thread doing GUP fast and another thread re-cycling the page into another use - creating a leakage of the page from one security context to another. This seems to be made worse for the pgmap stuff due to the wonky refcount usage - at least if the refcount had dropped to zero gup fast would be blocked for a time, but even that doesn't happen. In short, I think using pg special for anything that can be returned by gup fast (and maybe even gup!) is racy/wrong. We must have the normal refcount mechanism work for correctness of the recycling flow. I don't know why DAX did this, I think we should be talking about udoing it all of it, not just the wonky refcounting Alistair and Felix are working on, but also the use of MIXEDMAP and pte special for struct page backed memory. Jason