RCU Archive on lore.kernel.org
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, byungchul.park@lge.com,
	Davidlohr Bueso <dave@stgolabs.net>,
	Josh Triplett <josh@joshtriplett.org>,
	kernel-team@android.com, kernel-team@lge.com,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	max.byungchul.park@gmail.com, Rao Shoaib <rao.shoaib@oracle.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v4 2/2] rcuperf: Add kfree_rcu() performance Tests
Date: Tue, 20 Aug 2019 20:27:05 -0400
Message-ID: <20190821002705.GA212946@google.com>
In-Reply-To: <20190820025056.GL28441@linux.ibm.com>

On Mon, Aug 19, 2019 at 07:50:56PM -0700, Paul E. McKenney wrote:
 
> > > > > > +	do {
> > > > > > +		for (i = 0; i < kfree_alloc_num; i++) {
> > > > > > +			alloc_ptrs[i] = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
> > > > > > +			if (!alloc_ptrs[i])
> > > > > > +				return -ENOMEM;
> > > > > > +		}
> > > > > > +
> > > > > > +		for (i = 0; i < kfree_alloc_num; i++) {
> > > > > > +			if (!kfree_no_batch) {
> > > > > > +				kfree_rcu(alloc_ptrs[i], rh);
> > > > > > +			} else {
> > > > > > +				rcu_callback_t cb;
> > > > > > +
> > > > > > +				cb = (rcu_callback_t)(unsigned long)offsetof(struct kfree_obj, rh);
> > > > > > +				kfree_call_rcu_nobatch(&(alloc_ptrs[i]->rh), cb);
> > > > > > +			}
> > > > > > +		}
> > > > > 
> > > > > The point of allocating a large batch and then kfree_rcu()ing them in a
> > > > > loop is to defeat the per-CPU pool optimization?  Either way, a comment
> > > > > would be very good!
> > > > 
> > > > It was a reasoning like this, added it as a comment:
> > > > 
> > > > 	/* While measuring kfree_rcu() time, we also end up measuring kmalloc()
> > > > 	 * time. So the strategy here is to do a few (kfree_alloc_num) number
> > > > 	 * of kmalloc() and kfree_rcu() every loop so that the current loop's
> > > > 	 * deferred kfree()ing overlaps with the next loop's kmalloc().
> > > > 	 */
> > > 
> > > The thought being that the CPU will be executing the two loops
> > > concurrently?  Up to a point, agreed, but how much of an effect is
> > > that, really?
> > 
> > Yes it may not matter much. It was just a small thought when I added the
> > loop, I had to start somewhere, so I did it this way.
> > 
> > > Or is the idea to time the kfree_rcu() loop separately?  (I don't see
> > > any such separate timing, though.)
> > 
> > The kmalloc() times are included within the kfree loop. The timing of
> > kfree_rcu() is not separate in my patch.
> 
> You lost me on this one.  What happens when you just interleave the
> kmalloc() and kfree_rcu(), without looping, compared to the looping
> above?  Does this get more expensive?  Cheaper?  More vulnerable to OOM?
> Something else?

You mean pairing a single kmalloc() with a single kfree_rcu() and doing this
several times? The results are very similar to doing kfree_alloc_num
kmalloc()s, then doing kfree_alloc_num kfree_rcu()s, and repeating the whole
thing kfree_loops times (as done by the rcuperf patch we are reviewing).
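
To make the two loop shapes concrete, here is a minimal userspace sketch of
both. It is not the rcuperf code: struct obj, defer_free() and drain() are
hypothetical stand-ins for struct kfree_obj, kfree_rcu() and the end of a
grace period, and malloc()/free() stand in for kmalloc()/kfree(). It only
shows the ordering of allocations and deferred frees, not any timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct kfree_obj. */
struct obj {
	struct obj *next;	/* links objects on the deferred-free list */
	char payload[16];
};

static struct obj *deferred;	/* objects queued but not yet freed */
static size_t deferred_len;

/* Stand-in for kfree_rcu(): queue the object, free it later. */
static void defer_free(struct obj *o)
{
	o->next = deferred;
	deferred = o;
	deferred_len++;
}

/* Stand-in for a grace period ending: free everything queued so far. */
static size_t drain(void)
{
	size_t n = 0;

	while (deferred) {
		struct obj *o = deferred;

		deferred = o->next;
		free(o);
		n++;
	}
	assert(n == deferred_len);
	deferred_len = 0;
	return n;
}

/*
 * "Not fully interleaved": alloc_num allocations, then alloc_num deferred
 * frees, repeated loops times. Returns the number of objects queued, or -1
 * on allocation failure (anything already allocated is still queued).
 */
static long run_grouped(size_t loops, size_t alloc_num)
{
	struct obj **ptrs = malloc(alloc_num * sizeof(*ptrs));
	long queued = 0;

	if (!ptrs)
		return -1;

	for (size_t l = 0; l < loops; l++) {
		size_t i, got;

		for (i = 0; i < alloc_num; i++) {
			ptrs[i] = malloc(sizeof(struct obj));
			if (!ptrs[i])
				break;
		}
		got = i;

		for (i = 0; i < got; i++) {
			defer_free(ptrs[i]);
			queued++;
		}

		if (got < alloc_num) {
			free(ptrs);
			return -1;
		}
	}

	free(ptrs);
	return queued;
}

/*
 * "Fully interleaved": one allocation immediately followed by one deferred
 * free, repeated loops * alloc_num times.
 */
static long run_interleaved(size_t loops, size_t alloc_num)
{
	long queued = 0;

	for (size_t n = 0; n < loops * alloc_num; n++) {
		struct obj *o = malloc(sizeof(*o));

		if (!o)
			return -1;
		defer_free(o);
		queued++;
	}
	return queued;
}
```

Both variants queue the same total number of objects (loops * alloc_num);
only the ordering differs. In this sketch nothing is freed until drain() is
called, whereas in the kernel the deferred frees run as grace periods end.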

Some numbers follow. One difference: the no-batching case completes faster
when kmalloc() is fully interleaved with kfree_rcu() (about 10.9 s versus
12.2 s of total kfree'er time), while the batching case takes about the same
time in both scenarios (about 14.3 s versus 14.2 s). However, the reduction in
grace periods and the chances of OOM'ing are pretty much the same in either
case.

Fully interleaved: a single kmalloc() followed by a single kfree_rcu(), repeated kfree_alloc_num * kfree_loops times.
=======================
(1) Batching
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_no_batch=0 rcuperf.kfree_rcu_test=1

root@(none):/# free -m
              total        used        free      shared  buff/cache   available
Mem:            977         261         675           0          39         674

[   15.635620] Total time taken by all kfree'ers: 14255673998 ns, loops: 20000, batches: 1596

(2) No Batching
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_no_batch=1 rcuperf.kfree_rcu_test=1

root@(none):/# free -m
              total        used        free      shared  buff/cache   available
Mem:            977          67         870           0          39         869
Swap:             0           0           0

[   12.365872] Total time taken by all kfree'ers: 10902137101 ns, loops: 20000, batches: 6893


Not fully interleaved: do kfree_alloc_num kmalloc()s, then kfree_alloc_num kfree_rcu()s, and repeat this kfree_loops times.
=======================
(1) Batching
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_no_batch=0 rcuperf.kfree_rcu_test=1

root@(none):/# free -m
              total        used        free      shared  buff/cache   available
Mem:            977         251         686           0          39         684
Swap:             0           0           0

[   15.574402] Total time taken by all kfree'ers: 14185970787 ns, loops: 20000, batches: 1548

(2) No Batching
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_no_batch=1 rcuperf.kfree_rcu_test=1

root@(none):/# free -m
              total        used        free      shared  buff/cache   available
Mem:            977          82         855           0          39         853
Swap:             0           0           0

[   13.724554] Total time taken by all kfree'ers: 12246217291 ns, loops: 20000, batches: 7262
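
To compare the four runs at a glance, the ratios can be worked out with a
small helper like the one below. This is only a sketch: the figures are
copied from the log lines and free -m outputs above, the struct and function
names are made up for illustration, and it assumes the "batches" field counts
completed grace periods. The "used" column is a single snapshot, so treat the
memory figures as rough.

```c
#include <stddef.h>
#include <stdio.h>

/* One benchmark run, with figures copied from the output above. */
struct result {
	const char *scenario;
	unsigned long long total_ns;	/* "Total time taken by all kfree'ers" */
	unsigned int batches;		/* "batches:" field of the same line */
	unsigned int used_mb;		/* "used" column of free -m */
};

/* Each batching run is followed by its no-batching counterpart. */
static const struct result results[] = {
	{ "fully interleaved, batching",        14255673998ULL, 1596, 261 },
	{ "fully interleaved, no batching",     10902137101ULL, 6893,  67 },
	{ "not fully interleaved, batching",    14185970787ULL, 1548, 251 },
	{ "not fully interleaved, no batching", 12246217291ULL, 7262,  82 },
};

static double ratio(double num, double den)
{
	return num / den;
}

/* Print batching vs. no-batching ratios for each scenario pair. */
static void print_summary(void)
{
	for (size_t i = 0; i + 1 < sizeof(results) / sizeof(results[0]); i += 2) {
		const struct result *b = &results[i];
		const struct result *nb = &results[i + 1];

		printf("%s vs. %s:\n", b->scenario, nb->scenario);
		printf("  batches, no-batch / batch: %.2f\n",
		       ratio(nb->batches, b->batches));
		printf("  time,    batch / no-batch: %.2f\n",
		       ratio((double)b->total_ns, (double)nb->total_ns));
		printf("  used MB, batch / no-batch: %.2f\n",
		       ratio(b->used_mb, nb->used_mb));
	}
}
```

Calling print_summary() from main() shows that batching cuts the batch count
by a factor of roughly 4.3 (fully interleaved) and 4.7 (not fully
interleaved), at the cost of roughly 1.3x and 1.2x the total kfree'er time
and a noticeably higher "used" figure.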


thanks,

 - Joel




Thread overview: 25+ messages
2019-08-14 16:04 [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching Joel Fernandes (Google)
2019-08-14 16:04 ` [PATCH v4 2/2] rcuperf: Add kfree_rcu() performance Tests Joel Fernandes (Google)
2019-08-14 22:58   ` Paul E. McKenney
2019-08-19 19:33     ` Joel Fernandes
2019-08-19 22:23       ` Paul E. McKenney
2019-08-19 23:51         ` Joel Fernandes
2019-08-20  2:50           ` Paul E. McKenney
2019-08-21  0:27             ` Joel Fernandes [this message]
2019-08-21  0:31               ` Joel Fernandes
2019-08-21  0:44                 ` Paul E. McKenney
2019-08-21  0:51                   ` Joel Fernandes
2019-08-16 16:43 ` [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching Paul E. McKenney
2019-08-16 17:44   ` Joel Fernandes
2019-08-16 19:16     ` Paul E. McKenney
2019-08-17  1:32       ` Joel Fernandes
2019-08-17  3:56         ` Paul E. McKenney
2019-08-17  4:30           ` Joel Fernandes
2019-08-17  5:20             ` Paul E. McKenney
2019-08-17  5:53               ` Joel Fernandes
2019-08-17 21:45                 ` Paul E. McKenney
2019-09-18  9:58 ` Uladzislau Rezki
2019-09-30 20:16   ` Joel Fernandes
2019-10-01 11:27     ` Uladzislau Rezki
2019-10-04 17:20       ` Joel Fernandes
2019-10-08 16:23         ` Uladzislau Rezki
