From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org, byungchul.park@lge.com,
	Davidlohr Bueso <dave@stgolabs.net>,
	Josh Triplett <josh@joshtriplett.org>,
	kernel-team@android.com, kernel-team@lge.com,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	max.byungchul.park@gmail.com, Rao Shoaib <rao.shoaib@oracle.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v4 2/2] rcuperf: Add kfree_rcu() performance Tests
Date: Mon, 19 Aug 2019 19:50:56 -0700
Message-ID: <20190820025056.GL28441@linux.ibm.com>
In-Reply-To: <20190819235123.GA185164@google.com>

On Mon, Aug 19, 2019 at 07:51:23PM -0400, Joel Fernandes wrote:
> On Mon, Aug 19, 2019 at 03:23:30PM -0700, Paul E. McKenney wrote:
> [snip]
> > > [snip]
> > > > > @@ -592,6 +593,175 @@ rcu_perf_shutdown(void *arg)
> > > > >  	return -EINVAL;
> > > > >  }
> > > > >  
> > > > > +/*
> > > > > + * kfree_rcu performance tests: Start a kfree_rcu loop on all CPUs for number
> > > > > + * of iterations and measure total time and number of GP for all iterations to complete.
> > > > > + */
> > > > > +
> > > > > +torture_param(int, kfree_nthreads, -1, "Number of threads running loops of kfree_rcu().");
> > > > > +torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees done in an iteration.");
> > > > > +torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num allocations and frees.");
> > > > > +torture_param(int, kfree_no_batch, 0, "Use the non-batching (slower) version of kfree_rcu.");
> > > > > +
> > > > > +static struct task_struct **kfree_reader_tasks;
> > > > > +static int kfree_nrealthreads;
> > > > > +static atomic_t n_kfree_perf_thread_started;
> > > > > +static atomic_t n_kfree_perf_thread_ended;
> > > > > +
> > > > > +struct kfree_obj {
> > > > > +	char kfree_obj[8];
> > > > > +	struct rcu_head rh;
> > > > > +};
> > > > 
> > > > (Aside from above, no need to change this part of the patch, at least not
> > > > that I know of at the moment.)
> > > > 
> > > > 24 bytes on a 64-bit system, 16 on a 32-bit system.  So there might
> > > > have been 10 million extra objects awaiting free in the batching case
> > > > given the 400M-50M=350M excess for the batching approach.  If freeing
> > > > each object took about 100ns, that could account for the additional
> > > > wall-clock time for the batching approach.
> > > 
> > > Makes sense, and this comes down to 200-220MB range with the additional list.
> > 
> > Which might even match the observed numbers?
> 
> Yes, they would. Since those *are* the observed numbers :-D ;-) ;-)

;-)
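
For what it is worth, the back-of-the-envelope estimate quoted above can
be rechecked with a few lines of userspace C.  This is only a sketch
under stated assumptions: struct rcu_head_model is a stand-in for the
kernel's two-pointer struct rcu_head, the helper names are made up for
illustration, the 350M excess is taken to be 350,000,000 bytes, and any
rounding up by the slab allocator is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in (assumption) for the kernel's struct rcu_head. */
struct rcu_head_model {
	void *next;
	void (*func)(struct rcu_head_model *head);
};

/* Mirrors struct kfree_obj from the patch: 8 bytes of payload plus rcu_head. */
struct kfree_obj_model {
	char kfree_obj[8];
	struct rcu_head_model rh;
};

/* Number of whole objects of size obj_size that fit in excess_bytes. */
static uint64_t excess_objects(uint64_t excess_bytes, size_t obj_size)
{
	return excess_bytes / obj_size;
}

/* Milliseconds needed to free nobjs objects at ns_per_free nanoseconds each. */
static uint64_t free_time_ms(uint64_t nobjs, uint64_t ns_per_free)
{
	return nobjs * ns_per_free / 1000000;
}
```

With these assumptions, excess_objects(350000000, 24) comes to roughly
14.6 million objects, and free_time_ms() for that many objects at 100ns
apiece comes to about 1.46 seconds, which is in the same ballpark as the
10-million-object estimate.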

> > > > > +	do {
> > > > > +		for (i = 0; i < kfree_alloc_num; i++) {
> > > > > +			alloc_ptrs[i] = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
> > > > > +			if (!alloc_ptrs[i])
> > > > > +				return -ENOMEM;
> > > > > +		}
> > > > > +
> > > > > +		for (i = 0; i < kfree_alloc_num; i++) {
> > > > > +			if (!kfree_no_batch) {
> > > > > +				kfree_rcu(alloc_ptrs[i], rh);
> > > > > +			} else {
> > > > > +				rcu_callback_t cb;
> > > > > +
> > > > > +				cb = (rcu_callback_t)(unsigned long)offsetof(struct kfree_obj, rh);
> > > > > +				kfree_call_rcu_nobatch(&(alloc_ptrs[i]->rh), cb);
> > > > > +			}
> > > > > +		}
> > > > 
> > > > The point of allocating a large batch and then kfree_rcu()ing them in a
> > > > loop is to defeat the per-CPU pool optimization?  Either way, a comment
> > > > would be very good!
> > > 
> > > It was a reasoning like this, added it as a comment:
> > > 
> > > 	/* While measuring kfree_rcu() time, we also end up measuring kmalloc()
> > > 	 * time. So the strategy here is to do a few (kfree_alloc_num) number
> > > 	 * of kmalloc() and kfree_rcu() every loop so that the current loop's
> > > 	 * deferred kfree()ing overlaps with the next loop's kmalloc().
> > > 	 */
> > 
> > The thought being that the CPU will be executing the two loops
> > concurrently?  Up to a point, agreed, but how much of an effect is
> > that, really?
> 
> Yes it may not matter much. It was just a small thought when I added the
> loop, I had to start somewhere, so I did it this way.
> 
> > Or is the idea to time the kfree_rcu() loop separately?  (I don't see
> > any such separate timing, though.)
> 
> The kmalloc() times are included within the kfree loop. The timing of
> kfree_rcu() is not separate in my patch.

You lost me on this one.  What happens when you just interleave the
kmalloc() and kfree_rcu(), without looping, compared to the looping
above?  Does this get more expensive?  Cheaper?  More vulnerable to OOM?
Something else?
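
To make the memory-footprint part of the question concrete, here is a toy
userspace model of the two loop shapes.  It is not kernel code and it
measures nothing real: it only counts objects that are allocated but not
yet freed, and it assumes (purely for illustration) that a grace period
ends after every gp_every deferred frees, at which point everything
pending is released.  All names in it are hypothetical.

```c
#include <stddef.h>

/* Toy bookkeeping for allocated-but-not-yet-freed objects. */
struct model {
	size_t live;      /* allocated, not yet handed to the deferred free */
	size_t pending;   /* handed to the deferred free, awaiting a grace period */
	size_t queued;    /* deferred frees since the last modeled grace period */
	size_t peak;      /* largest live + pending observed at allocation time */
	size_t gp_every;  /* assumed grace-period interval, in deferred frees */
};

static void model_alloc(struct model *m)
{
	m->live++;
	if (m->live + m->pending > m->peak)
		m->peak = m->live + m->pending;
}

static void model_defer_free(struct model *m)
{
	m->live--;
	m->pending++;
	if (++m->queued == m->gp_every) {
		/* Modeled grace period ends: release everything pending. */
		m->pending = 0;
		m->queued = 0;
	}
}

/* Loop shape from the patch: allocate a batch, then defer-free the batch. */
static size_t peak_batched(size_t loops, size_t batch, size_t gp_every)
{
	struct model m = { .gp_every = gp_every };
	size_t l, i;

	for (l = 0; l < loops; l++) {
		for (i = 0; i < batch; i++)
			model_alloc(&m);
		for (i = 0; i < batch; i++)
			model_defer_free(&m);
	}
	return m.peak;
}

/* Interleaved shape: each allocation is followed at once by its deferred free. */
static size_t peak_interleaved(size_t loops, size_t batch, size_t gp_every)
{
	struct model m = { .gp_every = gp_every };
	size_t n;

	for (n = 0; n < loops * batch; n++) {
		model_alloc(&m);
		model_defer_free(&m);
	}
	return m.peak;
}
```

In this model the batched shape never peaks below the batch size, while
the interleaved shape's peak is set by the assumed grace-period interval.
Whether anything like that holds on a real system, and what it does to
the timings, is exactly what an actual measurement would have to show.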

							Thanx, Paul
