From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>,
	Juergen Gross <jgross@suse.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	linux-mm@kvack.org, Peter Zijlstra <peterz@infradead.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Minchan Kim <minchan@kernel.org>,
	dri-devel@lists.freedesktop.org, xen-devel@lists.xenproject.org,
	Andrew Morton <akpm@linux-foundation.org>,
	intel-gfx@lists.freedesktop.org, Nitin Gupta <ngupta@vflare.org>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Matthew Auld <matthew.auld@intel.com>
Subject: Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map
Date: Tue, 22 Sep 2020 18:04:37 +0100
Message-ID: <1b05b9d6-a14c-85cd-0728-d0d40c9ff84b@linux.intel.com>
In-Reply-To: <20200922163346.GA1701@lst.de>


On 22/09/2020 17:33, Christoph Hellwig wrote:
> On Tue, Sep 22, 2020 at 05:13:45PM +0100, Tvrtko Ursulin wrote:
>>>    void *shmem_pin_map(struct file *file)
>>>    {
>>> -	const size_t n_pte = shmem_npte(file);
>>> -	pte_t *stack[32], **ptes, **mem;
>>
>> Chris can comment how much he'd miss the 32 page stack shortcut.
> 
> I'd like to see a profile that claims that kmalloc matters in a
> path that does a vmap and reads pages through the page cache.
> Especially when the kmalloc saves doing another page cache lookup
> on the free side.

The only reason I can come up with now is if the mapping side is on a
latency-sensitive path, while un-mapping is lazy/delayed and so can afford
to be more costly. In that case a fast map with extra cost on unmap may
make sense.

This applies more to the other i915 patch, which implements a much more
heavily used API, but whether we could demonstrate any difference in the
perf profiles I couldn't tell you without actually collecting some.

>> Is there something in vmap() preventing us from freeing the pages array
>> here? I can't spot anything that is holding on to the pointer. Or was it
>> just a sketch from before you realized we could walk the vm_area?
>>
>> Also, I may be totally misunderstanding something, but I think you need to
>> assign area->pages manually so shmem_unpin_map can access it below.
> 
> We need area->pages to hold the pages for the free side.  That being
> said, the patch I posted is broken because it never assigned to that.
> As said, it was a sketch.  This is the patch I just rebooted into on
> my laptop:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/048522dfa26b6667adfb0371ff530dc263abe829
> 
> it needs extra prep patches from the series:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/alloc_vm_area
> 
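For clarity, the shape I have in mind for the pin side is roughly the
below -- only an untested sketch, reusing shmem_npages() and the page read
loop from the existing code, with find_vm_area() used purely to illustrate
one place the pages array could be stashed for the unpin side:

	void *shmem_pin_map(struct file *file)
	{
		size_t n_pages = shmem_npages(file), i;
		struct page **pages;
		void *vaddr;

		pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
		if (!pages)
			return NULL;

		for (i = 0; i < n_pages; i++) {
			pages[i] = shmem_read_mapping_page_gfp(file->f_mapping,
							       i, GFP_KERNEL);
			if (IS_ERR(pages[i]))
				goto err_page;
		}

		vaddr = vmap(pages, n_pages, 0, PAGE_KERNEL);
		if (!vaddr)
			goto err_page;

		/*
		 * Keep the pages array alive for the whole pin so the unpin
		 * side can put_page() each page; stashing it in the vm_area
		 * is one way to make it reachable from there.
		 */
		find_vm_area(vaddr)->pages = pages;

		mapping_set_unevictable(file->f_mapping);
		return vaddr;

	err_page:
		while (i--)
			put_page(pages[i]);
		kvfree(pages);
		return NULL;
	}
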
>>>    	mapping_clear_unevictable(file->f_mapping);
>>> -	__shmem_unpin_map(file, ptr, shmem_npte(file));
>>> +	for (i = 0; i < shmem_npages(file); i++)
>>> +		put_page(area->pages[i]);
>>> +	kvfree(area->pages);
>>> +	vunmap(ptr);
>>
>> Is the verdict from mm experts that we can't use vfree due to __free_pages
>> vs put_page differences?
> 
> Switched to vfree now.
> 
>> Could we get from ptes to pages, so that we don't have to keep the
>> area->pages array allocated for the duration of the pin?
> 
> We could do vmalloc_to_page, but that is fairly expensive (not as bad
> as reading from the page cache..).  Are you really worried about the
> allocation?

Not so much, given that we don't even use shmem_pin_map outside of selftests.

If we start using it I expect it will be for tiny objects anyway. Only
if they end up being pinned for the lifetime of the driver could it become
a pointless waste of memory compared with the downsides of vmalloc_to_page.
But we can revisit this particular edge-case optimization if the need
arises.
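
For reference, the vmalloc_to_page alternative on the unpin side would look
roughly like the below (again only an untested sketch) -- no pages array is
kept for the duration of the pin, at the cost of a page-table walk per page
when unpinning:

	void shmem_unpin_map(struct file *file, void *ptr)
	{
		size_t i;

		mapping_clear_unevictable(file->f_mapping);
		/* Must resolve the pages before vunmap() tears down the PTEs. */
		for (i = 0; i < shmem_npages(file); i++)
			put_page(vmalloc_to_page(ptr + i * PAGE_SIZE));
		vunmap(ptr);
	}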

I'll look at your other i915 patch tomorrow.

Regards,

Tvrtko

