Subject: Re: [PATCH v2 12/18] mm/gup: track FOLL_PIN pages
From: John Hubbard
To: Jerome Glisse
CC: Andrew Morton, Al Viro, Alex Williamson, Benjamin Herrenschmidt,
    Björn Töpel, Christoph Hellwig, Dan Williams, Daniel Vetter,
    Dave Chinner, David Airlie, "David S . Miller", Ira Weiny, Jan Kara,
    Jason Gunthorpe, Jens Axboe, Jonathan Corbet, Magnus Karlsson,
    Mauro Carvalho Chehab, Michael Ellerman, Michal Hocko, Mike Kravetz,
    Paul Mackerras, Shuah Khan, Vlastimil Babka, LKML
Date: Mon, 4 Nov 2019 16:18:33 -0800
In-Reply-To: <20191104234920.GA18515@redhat.com>
References: <20191103211813.213227-1-jhubbard@nvidia.com>
    <20191103211813.213227-13-jhubbard@nvidia.com>
    <20191104185238.GG5134@redhat.com>
    <7821cf87-75a8-45e2-cf28-f85b62192416@nvidia.com>
    <20191104234920.GA18515@redhat.com>
X-Mailing-List: linux-rdma@vger.kernel.org

Hi Dan, there is a question for you further down:


On 11/4/19 3:49 PM, Jerome Glisse wrote:
> On Mon, Nov 04, 2019 at 02:49:18PM -0800, John Hubbard wrote:
...
>>> Maybe add a small comment about wrap around :)
>>
>>
>> I don't *think* the count can wrap around, due to the checks in user_page_ref_inc().
>>
>> But it's true that the documentation is a little light here...What did you have
>> in mind?
>
> About the false positive cases (and how unlikely they are) and that wrap
> around is properly handled. Maybe just a pointer to the documentation
> so that people know they can go look there for details. I know my
> brain tends to forget where to look for things so I like to be constantly
> reminded hey the doc is Documentations/foobar :)
>

I see. OK, here's a version with a thoroughly overhauled comment header:

/**
 * page_dma_pinned() - report if a page is pinned for DMA.
 *
 * This function checks if a page has been pinned via a call to
 * pin_user_pages*() or pin_longterm_pages*().
 *
 * The return value is partially fuzzy: false is not fuzzy, because it means
 * "definitely not pinned for DMA", but true means "probably pinned for DMA, but
 * possibly a false positive due to having at least GUP_PIN_COUNTING_BIAS worth
 * of normal page references".
 *
 * False positives are OK, because: a) it's unlikely for a page to get that many
 * refcounts, and b) all the callers of this routine are expected to be able to
 * deal gracefully with a false positive.
 *
 * For more information, please see Documentation/vm/pin_user_pages.rst.
 *
 * @page: pointer to page to be queried.
 *
 * Return: True, if it is likely that the page has been "dma-pinned".
 *         False, if the page is definitely not dma-pinned.
 */
static inline bool page_dma_pinned(struct page *page)


>>> [...]
>>>
>>>> @@ -1930,12 +2028,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  
>>>>                  pgmap = get_dev_pagemap(pfn, pgmap);
>>>>                  if (unlikely(!pgmap)) {
>>>> -                        undo_dev_pagemap(nr, nr_start, pages);
>>>> +                        undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>                          return 0;
>>>>                  }
>>>>                  SetPageReferenced(page);
>>>>                  pages[*nr] = page;
>>>> -                get_page(page);
>>>> +
>>>> +                if (flags & FOLL_PIN) {
>>>> +                        if (unlikely(!user_page_ref_inc(page))) {
>>>> +                                undo_dev_pagemap(nr, nr_start, flags, pages);
>>>> +                                return 0;
>>>> +                        }
>>>
>>> Maybe add a comment about a case that should never happen, ie
>>> user_page_ref_inc() fails after the second iteration of the
>>> loop, as it would be broken and a bug to call undo_dev_pagemap()
>>> after the first iteration of that loop.
>>>
>>> Also I believe that this should never happen, as if the first
>>> iteration succeeds then __page_cache_add_speculative() will
>>> succeed for all the iterations.
>>>
>>> Note that the pgmap case above follows that too, ie the call to
>>> get_dev_pagemap() can only fail on the first iteration of the loop,
>>> well I assume you can never have a huge device page that spans
>>> different pgmaps ie different devices (which is a reasonable
>>> assumption). So maybe this code needs fixing ie :
>>>
>>>                 pgmap = get_dev_pagemap(pfn, pgmap);
>>>                 if (unlikely(!pgmap))
>>>                         return 0;
>>>
>>>
>>
>> OK, yes that does make sense. And I think a comment is adequate,
>> no need to check for bugs during every tail page iteration. So how
>> about this, as a preliminary patch:
>
> Actually I thought about it and I think that there is a pgmap
> per section and thus maybe one device can have multiple pgmaps
> and that would be an issue for pages bigger than section size
> (ie bigger than 128MB iirc). I will go double check that, but
> maybe Dan can chime in.
>
> In any case my comment above is correct for the page ref
> increment: if the first one succeeds then the others will too,
> or otherwise it means someone is doing too many put_page()/
> put_user_page() calls, which is _bad_ :)
>

I'll wait to hear from Dan before doing anything rash. :)


thanks,

John Hubbard
NVIDIA