BPF Archive on lore.kernel.org
 help / color / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Dave Chinner" <david@fromorbit.com>,
	"David Airlie" <airlied@linux.ie>,
	"David S . Miller" <davem@davemloft.net>,
	"Ira Weiny" <ira.weiny@intel.com>, "Jan Kara" <jack@suse.cz>,
	"Jason Gunthorpe" <jgg@ziepe.ca>, "Jens Axboe" <axboe@kernel.dk>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Paul Mackerras" <paulus@samba.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	bpf@vger.kernel.org, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-block@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	linux-rdma@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org, linux-mm@kvack.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 12/18] mm/gup: track FOLL_PIN pages
Date: Mon, 4 Nov 2019 16:18:33 -0800
Message-ID: <f587647d-83dc-5bde-d244-f522ec5bda60@nvidia.com> (raw)
In-Reply-To: <20191104234920.GA18515@redhat.com>

Hi Dan, there is a question for you further down:


On 11/4/19 3:49 PM, Jerome Glisse wrote:
> On Mon, Nov 04, 2019 at 02:49:18PM -0800, John Hubbard wrote:
...
>>> Maybe add a small comment about wrap around :)
>>
>>
>> I don't *think* the count can wrap around, due to the checks in user_page_ref_inc().
>>
>> But it's true that the documentation is a little light here...What did you have 
>> in mind?
> 
> About false positive case (and how unlikely they are) and that wrap
> around is properly handle. Maybe just a pointer to the documentation
> so that people know they can go look there for details. I know my
> brain tend to forget where to look for things so i like to be constantly
> reminded hey the doc is Documentations/foobar :)
> 

I see. OK, here's a version with a thoroughly overhauled comment header:

/**
 * page_dma_pinned() - report if a page is pinned for DMA.
 *
 * This function checks if a page has been pinned via a call to
 * pin_user_pages*() or pin_longterm_pages*().
 *
 * The return value is partially fuzzy: false is not fuzzy, because it means
 * "definitely not pinned for DMA", but true means "probably pinned for DMA, but
 * possibly a false positive due to having at least GUP_PIN_COUNTING_BIAS worth
 * of normal page references".
 *
 * False positives are OK, because: a) it's unlikely for a page to get that many
 * refcounts, and b) all the callers of this routine are expected to be able to
 * deal gracefully with a false positive.
 *
 * For more information, please see Documentation/vm/pin_user_pages.rst.
 *
 * @page:	pointer to page to be queried.
 * @Return:	True, if it is likely that the page has been "dma-pinned".
 *		False, if the page is definitely not dma-pinned.
 */
static inline bool page_dma_pinned(struct page *page)


>>> [...]
>>>
>>>> @@ -1930,12 +2028,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  
>>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
>>>>  		if (unlikely(!pgmap)) {
>>>> -			undo_dev_pagemap(nr, nr_start, pages);
>>>> +			undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>  			return 0;
>>>>  		}
>>>>  		SetPageReferenced(page);
>>>>  		pages[*nr] = page;
>>>> -		get_page(page);
>>>> +
>>>> +		if (flags & FOLL_PIN) {
>>>> +			if (unlikely(!user_page_ref_inc(page))) {
>>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
>>>> +				return 0;
>>>> +			}
>>>
>>> Maybe add a comment about a case that should never happens ie
>>> user_page_ref_inc() fails after the second iteration of the
>>> loop as it would be broken and a bug to call undo_dev_pagemap()
>>> after the first iteration of that loop.
>>>
>>> Also i believe that this should never happens as if first
>>> iteration succeed than __page_cache_add_speculative() will
>>> succeed for all the iterations.
>>>
>>> Note that the pgmap case above follows that too ie the call to
>>> get_dev_pagemap() can only fail on first iteration of the loop,
>>> well i assume you can never have a huge device page that span
>>> different pgmap ie different devices (which is a reasonable
>>> assumption). So maybe this code needs fixing ie :
>>>
>>> 		pgmap = get_dev_pagemap(pfn, pgmap);
>>> 		if (unlikely(!pgmap))
>>> 			return 0;
>>>
>>>
>>
>> OK, yes that does make sense. And I think a comment is adequate,
>> no need to check for bugs during every tail page iteration. So how 
>> about this, as a preliminary patch:
> 
> Actualy i thought about it and i think that there is pgmap
> per section and thus maybe one device can have multiple pgmap
> and that would be an issue for page bigger than section size
> (ie bigger than 128MB iirc). I will go double check that, but
> maybe Dan can chime in.
> 
> In any case my comment above is correct for the page ref
> increment, if the first one succeed than others will too
> or otherwise it means someone is doing too many put_page()/
> put_user_page() which is _bad_ :)
> 

I'll wait to hear from Dan before doing anything rash. :)


thanks,

John Hubbard
NVIDIA

  reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-03 21:17 [PATCH v2 00/18] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM John Hubbard
2019-11-03 21:17 ` [PATCH v2 01/18] mm/gup: pass flags arg to __gup_device_* functions John Hubbard
2019-11-04 16:39   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 02/18] mm/gup: factor out duplicate code from four routines John Hubbard
2019-11-04 16:51   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 03/18] goldish_pipe: rename local pin_user_pages() routine John Hubbard
2019-11-04 16:52   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 04/18] media/v4l2-core: set pages dirty upon releasing DMA buffers John Hubbard
2019-11-10 10:10   ` Hans Verkuil
2019-11-11 21:46     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 05/18] mm/gup: introduce pin_user_pages*() and FOLL_PIN John Hubbard
2019-11-04 17:33   ` Jerome Glisse
2019-11-04 19:04     ` John Hubbard
2019-11-04 19:18       ` Jerome Glisse
2019-11-04 19:30         ` John Hubbard
2019-11-04 19:52           ` Jerome Glisse
2019-11-04 20:09             ` John Hubbard
2019-11-04 20:31               ` Jason Gunthorpe
2019-11-04 20:40                 ` John Hubbard
2019-11-04 20:31               ` Jerome Glisse
2019-11-04 20:37                 ` Jason Gunthorpe
2019-11-04 20:57                   ` John Hubbard
2019-11-04 21:15                     ` Jason Gunthorpe
2019-11-04 21:34                       ` John Hubbard
2019-11-04 20:33   ` David Rientjes
2019-11-04 20:48     ` Jerome Glisse
2019-11-05 13:10   ` Mike Rapoport
2019-11-05 19:00     ` John Hubbard
2019-11-07  2:25       ` Ira Weiny
2019-11-07  8:07       ` Mike Rapoport
2019-11-03 21:18 ` [PATCH v2 06/18] goldish_pipe: convert to pin_user_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 07/18] infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*() John Hubbard
2019-11-04 20:33   ` Jason Gunthorpe
2019-11-04 20:48     ` John Hubbard
2019-11-04 20:57       ` Jason Gunthorpe
2019-11-04 22:03         ` John Hubbard
2019-11-05  2:32           ` Jason Gunthorpe
2019-11-07  2:26         ` Ira Weiny
2019-11-03 21:18 ` [PATCH v2 08/18] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() John Hubbard
2019-11-04 17:41   ` Jerome Glisse
2019-11-03 21:18 ` [PATCH v2 09/18] drm/via: set FOLL_PIN via pin_user_pages_fast() John Hubbard
2019-11-04 17:44   ` Jerome Glisse
2019-11-04 18:22     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 10/18] fs/io_uring: set FOLL_PIN via pin_user_pages() John Hubbard
2019-11-03 21:18 ` [PATCH v2 11/18] net/xdp: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 12/18] mm/gup: track FOLL_PIN pages John Hubbard
2019-11-04 18:52   ` Jerome Glisse
2019-11-04 22:49     ` John Hubbard
2019-11-04 23:49       ` Jerome Glisse
2019-11-05  0:18         ` John Hubbard [this message]
2019-11-03 21:18 ` [PATCH v2 13/18] media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion John Hubbard
2019-11-10 10:11   ` Hans Verkuil
2019-11-03 21:18 ` [PATCH v2 14/18] vfio, mm: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 15/18] powerpc: book3s64: convert to pin_longterm_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 16/18] mm/gup_benchmark: support pin_user_pages() and related calls John Hubbard
2019-11-03 21:18 ` [PATCH v2 17/18] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage John Hubbard
2019-11-03 21:18 ` [PATCH v2 18/18] mm/gup: remove support for gup(FOLL_LONGTERM) John Hubbard

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f587647d-83dc-5bde-d244-f522ec5bda60@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=davem@davemloft.net \
    --cc=david@fromorbit.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mchehab@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mpe@ellerman.id.au \
    --cc=netdev@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=shuah@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git