From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20201002175303.390363-1-daniel.vetter@ffwll.ch>
 <20201002175303.390363-2-daniel.vetter@ffwll.ch>
 <20201002180603.GL9916@ziepe.ca>
 <20201002233118.GM9916@ziepe.ca>
 <20201004125059.GP9916@ziepe.ca>
 <20201005172854.GA5177@ziepe.ca>
In-Reply-To: <20201005172854.GA5177@ziepe.ca>
From: Daniel Vetter
Date: Mon, 5 Oct 2020 20:16:33 +0200
Subject: Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM
To: Jason Gunthorpe
Cc: DRI Development, LKML, Daniel Vetter, Andrew Morton, John Hubbard,
 =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?=, Jan Kara, Dan Williams, Linux MM,
 Linux ARM, Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
 Inki Dae, Joonyoung Shim, Seung-Woo Kim, linux-samsung-soc,
 "open list:DMA BUFFER SHARING FRAMEWORK", Oded Gabbay
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 5, 2020 at 7:28 PM Jason Gunthorpe wrote:
>
> On Sun, Oct 04, 2020 at 06:09:29PM +0200, Daniel Vetter wrote:
> > On Sun, Oct 4, 2020 at 2:51 PM Jason Gunthorpe wrote:
> > >
> > > On Sat, Oct 03, 2020 at 11:40:22AM +0200, Daniel Vetter wrote:
> > >
> > > > > That leaves the only interesting places as vb2_dc_get_userptr() and
> > > > > vb2_vmalloc_get_userptr() which both completely fail to follow the
> > > > > REQUIRED behavior in the function's comment about checking PTEs. It
> > > > > just DMA maps them. Badly broken.
> > > > >
> > > > > Guessing this hackery is for some embedded P2P DMA transfer?
> > > >
> > > > Yeah, see also the follow_pfn trickery in
> > > > videobuf_dma_contig_user_get(), I think this is fully intentional and
> > > > userspace abi we can't break :-/
> > >
> > > We don't need to break uABI, it just needs to work properly in the
> > > kernel:
> > >
> > >   vma = find_vma_intersection()
> > >   dma_buf = dma_buf_get_from_vma(vma)
> > >   sg = dma_buf_p2p_dma_map(dma_buf)
> > >   [.. do dma ..]
> > >   dma_buf_unmap(sg)
> > >   dma_buf_put(dma_buf)
> > >
> > > It is as we discussed before, dma buf needs to be discoverable from a
> > > VMA, at least for users doing this kind of stuff.
> >
> > I'm not a big fan of magic behaviour like this, there's more to
> > dma-buf buffer sharing than just "how do I get at the backing
> > storage". Thus far we've done everything rather explicitly. Plus with
> > exynos and habanalabs converted there's only v4l left over, and that
> > has a proper dma-buf import path already.
>
> Well, any VA approach like this has to access some backing refcount
> via the VMA. Not really any way to avoid something like that
>
> > > A VM flag doesn't help - we need to introduce some kind of lifetime,
> > > and that has to be derived from the VMA. It needs data not just a flag
> >
> > I don't want to make it work, I just want to make it fail. Rough idea
> > I have in mind is to add a follow_pfn_longterm, for all callers which
> > aren't either synchronized through mmap_sem or an mmu_notifier.
>
> follow_pfn() doesn't work outside the pagetable locks or mmu notifier
> protection. Can't be fixed.
>
> We only have a few users:
>
> arch/s390/pci/pci_mmio.c: ret = follow_pfn(vma, user_addr, pfn);
> drivers/media/v4l2-core/videobuf-dma-contig.c: ret = follow_pfn(vma, user_address, &this_pfn);
> drivers/vfio/vfio_iommu_type1.c: ret = follow_pfn(vma, vaddr, pfn);
> drivers/vfio/vfio_iommu_type1.c: ret = follow_pfn(vma, vaddr, pfn);
> mm/frame_vector.c: err = follow_pfn(vma, start, &nums[ret]);
> virt/kvm/kvm_main.c: r = follow_pfn(vma, addr, &pfn);
> virt/kvm/kvm_main.c: r = follow_pfn(vma, addr, &pfn);
>
> VFIO is broken like media, but I saw patches fixing the vfio cases
> using the VMA and a vfio specific refcount.
>
> media & frame_vector we are talking about here.
>
> kvm is some similar hack added for P2P DMA, see commit
> add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by notifiers..

Yeah my thinking is that kvm (and I think also vfio, also seems to
have mmu notifier nearby) are ok because of the mmu notifier. Assuming
that one works correctly.

> s390 looks broken too, needs to hold the page table locks.

Hm yeah I guess that looks fairly reasonable to fix too.

> So, the answer really is that s390 and media need fixing, and this API
> should go away (or become kvm specific)

I'm still not clear how you want to fix this, since your vma->dma_buf
idea is kinda a decade long plan and so just not going to happen:
- v4l used this mostly (afaik the lore at least) for buffer sharing
  with v4l itself, and also a bit with fbdev. Neither even has any
  dma-buf exporter code as-is.
- like I said, there's no central dma-buf instance, it was fairly
  intentionally created as an all-to-all abstraction. Which means you
  either have to roll out a vm_ops->gimme_the_dmabuf or, even more
  work, refactor all the dma-buf exporters to go through the same
  things
- even where we have dma-buf, most mmaps of buffer objects aren't a
  dma-buf. Those are only set up when userspace explicitly asks for
  one, so we'd also need to change the mmap code of all drivers
  involved to make sure the dma-buf is always created when we do any
  kind of mmap.

I don't see that as a realistic thing to ever happen, and meanwhile we
can't leave the gap open for a few years.

> > If this really breaks anyone's use-case we can add a tainting kernel
> > option which re-enables this (we've done something similar for
> > phys_addr_t based buffer sharing in fbdev, entirely unfixable since
> > the other driver has to just blindly trust that what userspace
> > passes around is legit). This here isn't unfixable, but if v4l
> > people want to keep it without a big "security hole here" sticker,
> > they should do the work, not me :-)
>
> This seems fairly reasonable..
>
> So after frame_vec is purged and we have the one caller in media, move
> all this stuff to media and taint the kernel if it goes down the
> follow_pfn path

Yeah I think moving frame_vec back to media sounds like a good idea,
it should stop new users like habanalabs/exynos from popping up at
least. It's follow_pfn that freaks me out more.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch