Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()
To: Jason Gunthorpe, Jan Kara
CC: Andrea Arcangeli, Andrew Morton, Christoph Hellwig, Hugh Dickins,
    Jann Horn, Kirill Shutemov, Kirill Tkhai, Linux-MM, Michal Hocko,
    Oleg Nesterov, Peter Xu
References: <1-v1-281e425c752f+2df-gup_fork_jgg@nvidia.com>
    <16c50bb0-431d-5bfb-7b80-a8af0b4da90f@nvidia.com>
    <20201027093301.GA16090@quack2.suse.cz>
    <20201027131509.GU36674@ziepe.ca>
From: John Hubbard
Message-ID: <0e04a7ed-a3e6-76f6-0fa8-26e2bbf10ee3@nvidia.com>
Date: Tue, 27 Oct 2020 23:00:22 -0700
In-Reply-To: <20201027131509.GU36674@ziepe.ca>

On 10/27/20 6:15 AM, Jason Gunthorpe wrote:
> On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
>> On Fri 23-10-20 21:44:17, John Hubbard wrote:
>>> On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
>>>> +        start += (unsigned long)nr_pinned << PAGE_SHIFT;
>>>> +        pages += nr_pinned;
>>>> +        ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
>>>> +                                      pages);
>>>> +        if (ret < 0) {
>>>>                 /* Have to be a bit careful with return values */
>>>
>>> ...and can we move that comment up one level, so that it reads:
>>>
>>>         /* Have to be a bit careful with return values */
>>>         if (ret < 0) {
>>>                 if (nr_pinned)
>>>                         return nr_pinned;
>>>                 return ret;
>>>         }
>>>         return ret + nr_pinned;
>>>
>>> Thinking about this longer term, it would be nice if the whole gup/pup API
>>> set just stopped pretending that anyone cares about partial success, because
>>> they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
>>> additional set of API wrappers that did some sort of limited retry just like
>>> some of the callers do, that would be a happier story.
>>
>> Actually there are callers that care about partial success. See e.g.
>> iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
>> bio_iov_iter_get_pages(). These places handle partial success just fine and
>> not allowing partial success from GUP could regress things...
>
> I looked through a bunch of call sites, and there are a wack that

So did I! :)

> actually do only want a complete return and are carrying a bunch of
> code to fix it:
>
>         pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
>         if (!pvec)
>                 return -ENOMEM;
>
>         do {
>                 unsigned num_pages = npages - pinned;
>                 uint64_t ptr = userptr->ptr + pinned * PAGE_SIZE;
>                 struct page **pages = pvec + pinned;
>
>                 ret = pin_user_pages_fast(ptr, num_pages,
>                                           !userptr->ro ? FOLL_WRITE : 0, pages);
>                 if (ret < 0) {
>                         unpin_user_pages(pvec, pinned);
>                         kvfree(pvec);
>                         return ret;
>                 }
>
>                 pinned += ret;
>
>         } while (pinned < npages);
>
> Is really a lot better if written as:
>
>         pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
>         if (!pvec)
>                 return -ENOMEM;
>         ret = pin_user_pages_fast(userptr->ptr, npages, FOLL_COMPLETE |
>                                   (!userptr->ro ? FOLL_WRITE : 0),
>                                   pvec);
>         if (ret) {
>                 kvfree(pvec);
>                 return ret;
>         }
>
> (eg FOLL_COMPLETE says to return exactly npages or fail)

Yes, exactly. And if I reverse the polarity (to Christoph's FOLL_PARTIAL,
instead of FOLL_COMPLETE) it's even smaller, slightly. Which is where I am
leaning now.
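
To make that concrete, here is a rough sketch of how the same call site could
look with a hypothetical FOLL_PARTIAL flag. None of this exists yet: the flag
name, and the assumption that pin_user_pages_fast() without the flag would
return "0 or -ERRNO" and never leave a partial pin behind, are illustrative
only:

        pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
        if (!pvec)
                return -ENOMEM;

        /* Default (no FOLL_PARTIAL): pin all npages, or fail cleanly. */
        ret = pin_user_pages_fast(userptr->ptr, npages,
                                  !userptr->ro ? FOLL_WRITE : 0, pvec);
        if (ret) {
                kvfree(pvec);
                return ret;
        }

...while a caller that genuinely can make progress with fewer pages, such as
dio_refill_pages(), would opt in and keep the current "number pinned"
semantics:

        /* FOLL_PARTIAL: may return fewer pages than requested (>0), or -ERRNO. */
        ret = pin_user_pages_fast(addr, nr_requested,
                                  FOLL_PARTIAL | FOLL_WRITE, pages);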
>
> Some code assumes things work that way already anyhow:
>
>         /* Pin user pages for DMA Xfer */
>         err = pin_user_pages_unlocked(user_dma.uaddr, user_dma.page_count,
>                         dma->map, FOLL_FORCE);
>
>         if (user_dma.page_count != err) {
>                 IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
>                                 err, user_dma.page_count);
>                 if (err >= 0) {
>                         unpin_user_pages(dma->map, err);
>                         return -EINVAL;
>                 }
>                 return err;
>         }
>
> Actually I'm quite surprised I didn't find too many missing the tricky
> unpin_user_pages() on the error path - eg
> videobuf_dma_init_user_locked() is wrong.
>

Well. That's not accidental. "Some People" (much thanks to Souptick Joarder,
btw) have been fixing up many of those sites, during the pin_user_pages()
conversions. Otherwise you would have found about 10 or 15 more.

I'll fix up that one above (using your Reported-by, I assume), unless someone
else is already taking care of it.

thanks,
-- 
John Hubbard
NVIDIA