From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org, Vladimir Davydov <vdavydov@virtuozzo.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	brouer@redhat.com
Subject: Re: [RFC PATCH V2 8/9] slab: implement bulk free in SLAB allocator
Date: Mon, 14 Dec 2015 16:19:58 +0100	[thread overview]
Message-ID: <20151214161958.1a8edf79@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1512090945570.30894@east.gentwo.org>


On Wed, 9 Dec 2015 10:06:39 -0600 (CST) Christoph Lameter <cl@linux.com> wrote:

> From: Christoph Lameter <cl@linux.com>
> Subject: slab bulk api: Remove the kmem_cache parameter from kmem_cache_bulk_free()
> 
> It is desirable and necessary to free objects from different kmem_caches.
> It is required in order to support memcg object freeing across different
> cgroups.
> 
> So drop the pointless parameter and allow freeing of arbitrary lists
> of slab allocated objects.
> 
> This patch also does the proper compound page handling so that
> arbitrary objects allocated via kmalloc() can be handled by
> kmem_cache_bulk_free().
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

I've modified this patch to instead introduce a kfree_bulk() and keep
the old behavior of kmem_cache_free_bulk().  This allows us to more
easily compare the two implementation approaches.
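
For reference, the two entry points being compared look roughly like
this (just a sketch of the API shapes, not the final code):

  /* Original API kept as-is: the caller supplies the kmem_cache, and
   * any memcg lookup via cache_from_obj() happens per object.
   */
  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);

  /* New API: no kmem_cache argument; the cache is derived from
   * page->slab_cache, and kmalloc()'ed compound pages are handled too.
   */
  void kfree_bulk(size_t size, void **p);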

[...]
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -2887,23 +2887,30 @@ static int build_detached_freelist(struc
> 
> 
>  /* Note that interrupts must be enabled when calling this function. */
> -void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p) 
> +void kmem_cache_free_bulk(size_t size, void **p)

Renamed to kfree_bulk(size_t size, void **p)

>  {
>  	if (WARN_ON(!size))
>  		return;
> 
>  	do {
>  		struct detached_freelist df;
> -		struct kmem_cache *s;
> +		struct page *page;
> 
> -		/* Support for memcg */
> -		s = cache_from_obj(orig_s, p[size - 1]);
> +		page = virt_to_head_page(p[size - 1]);
> 
> -		size = build_detached_freelist(s, size, p, &df);
> +		if (unlikely(!PageSlab(page))) {
> +			BUG_ON(!PageCompound(page));
> +			kfree_hook(p[size - 1]);
> +			__free_kmem_pages(page, compound_order(page));
> +			p[--size] = NULL;
> +			continue;
> +		}
> +
> +		size = build_detached_freelist(page->slab_cache, size, p, &df);
> 		if (unlikely(!df.page))
>  			continue;
> 
> -		slab_free(s, df.page, df.freelist, df.tail, df.cnt, _RET_IP_);
> +		slab_free(page->slab_cache, df.page, df.freelist,
> + 			df.tail, df.cnt, _RET_IP_);
> 	} while (likely(size));
>  }
>  EXPORT_SYMBOL(kmem_cache_free_bulk);

This specific implementation was too slow, mostly because we call
virt_to_head_page() both in this function and inside
build_detached_freelist().

After integrating the check directly into build_detached_freelist() I'm
getting more comparable results.
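
One way the check can be folded in looks roughly like this (a sketch
only, reusing the names from the diff above; not the exact code I'm
benchmarking):

  /*
   * Sketch: do the page lookup once in build_detached_freelist() and
   * let a NULL kmem_cache mean "derive the cache from the page"
   * (the kfree_bulk() case).
   */
  static inline int build_detached_freelist(struct kmem_cache *s, size_t size,
                                            void **p, struct detached_freelist *df)
  {
          struct page *page = virt_to_head_page(p[size - 1]);

          if (!s) {
                  if (unlikely(!PageSlab(page))) {
                          /* Large kmalloc() object backed by a compound page */
                          BUG_ON(!PageCompound(page));
                          kfree_hook(p[size - 1]);
                          __free_kmem_pages(page, compound_order(page));
                          p[--size] = NULL;
                          df->page = NULL;  /* caller skips slab_free() */
                          return size;
                  }
                  s = page->slab_cache;
          }

          /*
           * ... existing code: build the detached freelist from objects
           * on 'page' and return the number of objects left in p[] ...
           */
  }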

Results with CONFIG_MEMCG_KMEM disabled, using SLUB (and slab_nomerge
for more accurate results between runs); numbers are per-object cost:

 bulk - fallback             - kmem_cache_free_bulk - kfree_bulk
    1 -  58 cycles 14.735 ns -  55 cycles 13.880 ns -  59 cycles 14.843 ns
    2 -  53 cycles 13.298 ns -  32 cycles  8.037 ns -  34 cycles  8.592 ns
    3 -  51 cycles 12.837 ns -  25 cycles  6.442 ns -  27 cycles  6.794 ns
    4 -  50 cycles 12.514 ns -  23 cycles  5.952 ns -  23 cycles  5.958 ns
    8 -  48 cycles 12.097 ns -  20 cycles  5.160 ns -  22 cycles  5.505 ns
   16 -  47 cycles 11.888 ns -  19 cycles  4.900 ns -  19 cycles  4.969 ns
   30 -  47 cycles 11.793 ns -  18 cycles  4.688 ns -  18 cycles  4.682 ns
   32 -  47 cycles 11.926 ns -  18 cycles  4.674 ns -  18 cycles  4.702 ns
   34 -  95 cycles 23.823 ns -  24 cycles  6.068 ns -  24 cycles  6.058 ns
   48 -  81 cycles 20.258 ns -  21 cycles  5.360 ns -  21 cycles  5.338 ns
   64 -  73 cycles 18.414 ns -  20 cycles  5.160 ns -  20 cycles  5.140 ns
  128 -  90 cycles 22.563 ns -  27 cycles  6.765 ns -  27 cycles  6.801 ns
  158 -  99 cycles 24.831 ns -  30 cycles  7.625 ns -  30 cycles  7.720 ns
  250 - 104 cycles 26.173 ns -  37 cycles  9.271 ns -  37 cycles  9.371 ns

As can be seen, the old kmem_cache_free_bulk() is faster than the new
kfree_bulk() (which omits the kmem_cache pointer and needs to derive it
from page->slab_cache).  The base (bulk=1) extra cost is 4 cycles,
which then gets amortized as build_detached_freelist() combines objects
belonging to the same page.

This is likely because, with CONFIG_MEMCG_KMEM=n, the compiler can
optimize away the lookup of the kmem_cache structure.
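
To illustrate the point (a rough sketch, not the actual mm/slab.h
code): with memcg compiled out, the per-object cache lookup in
kmem_cache_free_bulk() collapses to the argument the caller already
passed, while kfree_bulk() always has to touch struct page to find the
cache:

  /* Rough sketch of the effect, not the real mm/slab.h implementation */
  static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
  {
  #ifndef CONFIG_MEMCG_KMEM
          return s;       /* no per-memcg caches: no page lookup at all */
  #else
          /* memcg case: may need to map back to the object's actual cache */
          return virt_to_head_page(x)->slab_cache;
  #endif
  }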

I'll start doing testing with CONFIG_MEMCG_KMEM enabled...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


