linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] padata: section cleanup
@ 2010-03-27  6:53 Henrik Kretzschmar
  2010-03-29  7:42 ` Steffen Klassert
  2010-03-29  8:15 ` Herbert Xu
  0 siblings, 2 replies; 17+ messages in thread
From: Henrik Kretzschmar @ 2010-03-27  6:53 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, linux-kernel, Henrik Kretzschmar

This patch removes the __cupinit from padata_cpu_callback(),
which is refered by the exportet function padata_alloc().

This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
which should happen very often.

WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
The function padata_alloc() references
the function __cpuinit padata_cpu_callback().
This is often because padata_alloc lacks a __cpuinit
annotation or the annotation of padata_cpu_callback is wrong.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
---
 kernel/padata.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 93caf65..32fdd5f 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -569,7 +569,7 @@ void padata_stop(struct padata_instance *pinst)
 }
 EXPORT_SYMBOL(padata_stop);
 
-static int __cpuinit padata_cpu_callback(struct notifier_block *nfb,
+static int padata_cpu_callback(struct notifier_block *nfb,
 					 unsigned long action, void *hcpu)
 {
 	int err;
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-27  6:53 [PATCH] padata: section cleanup Henrik Kretzschmar
@ 2010-03-29  7:42 ` Steffen Klassert
  2010-03-29  8:15 ` Herbert Xu
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2010-03-29  7:42 UTC (permalink / raw)
  To: Henrik Kretzschmar; +Cc: herbert, linux-kernel

On Sat, Mar 27, 2010 at 07:53:46AM +0100, Henrik Kretzschmar wrote:
> This patch removes the __cupinit from padata_cpu_callback(),
> which is refered by the exportet function padata_alloc().
> 
> This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
> which should happen very often.
> 
> WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
> The function padata_alloc() references
> the function __cpuinit padata_cpu_callback().
> This is often because padata_alloc lacks a __cpuinit
> annotation or the annotation of padata_cpu_callback is wrong.
> 
> Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>

Thanks,

Steffen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-27  6:53 [PATCH] padata: section cleanup Henrik Kretzschmar
  2010-03-29  7:42 ` Steffen Klassert
@ 2010-03-29  8:15 ` Herbert Xu
  2010-03-31 21:58   ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2010-03-29  8:15 UTC (permalink / raw)
  To: Henrik Kretzschmar; +Cc: steffen.klassert, linux-kernel

On Sat, Mar 27, 2010 at 07:53:46AM +0100, Henrik Kretzschmar wrote:
> This patch removes the __cupinit from padata_cpu_callback(),
> which is refered by the exportet function padata_alloc().
> 
> This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
> which should happen very often.
> 
> WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
> The function padata_alloc() references
> the function __cpuinit padata_cpu_callback().
> This is often because padata_alloc lacks a __cpuinit
> annotation or the annotation of padata_cpu_callback is wrong.
> 
> Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>

Patch applied.  Thanks!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-29  8:15 ` Herbert Xu
@ 2010-03-31 21:58   ` Andrew Morton
  2010-04-01  0:24     ` Herbert Xu
  2010-04-01  8:11     ` Steffen Klassert
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2010-03-31 21:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Henrik Kretzschmar, steffen.klassert, linux-kernel

On Mon, 29 Mar 2010 16:15:50 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sat, Mar 27, 2010 at 07:53:46AM +0100, Henrik Kretzschmar wrote:
> > This patch removes the __cupinit from padata_cpu_callback(),
> > which is refered by the exportet function padata_alloc().
> > 
> > This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
> > which should happen very often.
> > 
> > WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
> > The function padata_alloc() references
> > the function __cpuinit padata_cpu_callback().
> > This is often because padata_alloc lacks a __cpuinit
> > annotation or the annotation of padata_cpu_callback is wrong.
> > 
> > Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
> 
> Patch applied.  Thanks!

(wtf?)

OK, on behalf of thousands I ask: what the heck is kernel/padata.c? 

Seems to have popped up in 2.6.34, positioned as generic kernel-wide
code only it has been secreted away on the linux-crypto list.  Please
don't do this.

What is it for, how does it work, where might it otherwise be used,
what are the dynamics, why does it use softirqs rather than (say)
kernel threads and how do we stop it from using yield()?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  0:24     ` Herbert Xu
@ 2010-03-31 22:04       ` Andrew Morton
  2010-04-01  1:38         ` Herbert Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-03-31 22:04 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Henrik Kretzschmar, steffen.klassert, linux-kernel

On Thu, 1 Apr 2010 08:24:40 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Mar 31, 2010 at 02:58:17PM -0700, Andrew Morton wrote:
> >
> > (wtf?)
> > 
> > OK, on behalf of thousands I ask: what the heck is kernel/padata.c? 
> > 
> > Seems to have popped up in 2.6.34, positioned as generic kernel-wide
> > code only it has been secreted away on the linux-crypto list.  Please
> > don't do this.
> 
> It was posted to linux-kernel multiple times.
> 

I can find it once, in December 2009, in the middle of a massive thread
called "workqueue thing".  It had no replies.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-31 21:58   ` Andrew Morton
@ 2010-04-01  0:24     ` Herbert Xu
  2010-03-31 22:04       ` Andrew Morton
  2010-04-01  8:11     ` Steffen Klassert
  1 sibling, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2010-04-01  0:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Henrik Kretzschmar, steffen.klassert, linux-kernel

On Wed, Mar 31, 2010 at 02:58:17PM -0700, Andrew Morton wrote:
>
> (wtf?)
> 
> OK, on behalf of thousands I ask: what the heck is kernel/padata.c? 
> 
> Seems to have popped up in 2.6.34, positioned as generic kernel-wide
> code only it has been secreted away on the linux-crypto list.  Please
> don't do this.

It was posted to linux-kernel multiple times.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  1:38         ` Herbert Xu
@ 2010-04-01  0:29           ` Andrew Morton
  2010-04-01  7:15             ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-04-01  0:29 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Henrik Kretzschmar, steffen.klassert, linux-kernel, David S. Miller

On Thu, 1 Apr 2010 09:38:45 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Mar 31, 2010 at 06:04:45PM -0400, Andrew Morton wrote:
> >
> > I can find it once, in December 2009, in the middle of a massive thread
> > called "workqueue thing".  It had no replies.
> 
> OK you're right.  It's only been posted to lkml once.  However,
> this has been discussed on netdev since 2008 at least.  The patch
> in its current form has also been acked by David Miller.
> 

Thing is, this isn't net code and it isn't crypto code - it's a
kernel-wide utility.  Everyone needs to know at least a bit about it
and code-reviewers such as myself need to know a lot about it, so we
can tell people "hey dummy, shouldn't you use padata".

So please, can we belatedly go through the process of educating the
developers about this thing you guys have written for us?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-31 22:04       ` Andrew Morton
@ 2010-04-01  1:38         ` Herbert Xu
  2010-04-01  0:29           ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2010-04-01  1:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Henrik Kretzschmar, steffen.klassert, linux-kernel, David S. Miller

On Wed, Mar 31, 2010 at 06:04:45PM -0400, Andrew Morton wrote:
>
> I can find it once, in December 2009, in the middle of a massive thread
> called "workqueue thing".  It had no replies.

OK you're right.  It's only been posted to lkml once.  However,
this has been discussed on netdev since 2008 at least.  The patch
in its current form has also been acked by David Miller.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  0:29           ` Andrew Morton
@ 2010-04-01  7:15             ` David Miller
  2010-04-01 13:59               ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-04-01  7:15 UTC (permalink / raw)
  To: akpm; +Cc: herbert, henne, steffen.klassert, linux-kernel

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 31 Mar 2010 20:29:31 -0400

> Thing is, this isn't net code and it isn't crypto code - it's a
> kernel-wide utility.

Thanks goodness, because if we had put a private copy in the
networking or the crypto code someone would have given us a hard time.

Wait a second... we're being given a hard time anyways. :-)

Kidding aside, why didn't anyone show any interest in the patch when
it was posted to lkml?  You can say it was smoothered together with
some crypto stuff, but that was absolutely the logical thing to do
because the follow-on crypto patches showed what the thing was going
to be used for.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-03-31 21:58   ` Andrew Morton
  2010-04-01  0:24     ` Herbert Xu
@ 2010-04-01  8:11     ` Steffen Klassert
  2010-04-01 21:59       ` Andrew Morton
  2010-04-06 18:00       ` Jonathan Corbet
  1 sibling, 2 replies; 17+ messages in thread
From: Steffen Klassert @ 2010-04-01  8:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Herbert Xu, Henrik Kretzschmar, linux-kernel

On Wed, Mar 31, 2010 at 02:58:17PM -0700, Andrew Morton wrote:
> 
> What is it for, how does it work, where might it otherwise be used,
> what are the dynamics, why does it use softirqs rather than (say)
> kernel threads and how do we stop it from using yield()?
> 

padata can be used to spread cpu intensive work over multiple cpus. It came
into being with the idea to parallelize IPsec. We separated the generic
parts out to padata in the hope it can be useful to others too.
So padata is used to parallelize cpu intensive codepaths, like crypto
operations. As soon as the cpu intensive work is done, it is possible to
serialize again without getting reorder of the parallelized objects.
This is in particular important for IPsec to keep the network packets
in the right order.

To achieve a parallelization of the cpu intensive parts of IPsec, we
created the pcrypt parallel crypto template (crypto/pcrypt.c). This
wraps around an existing crypto algorithm and does the
parallelization/serialization by using padata. So pcrypt is an example on
how to use padata.

I'm about to write some documentation for padata/pcrypt. The big picture
is as follows. A user can allocate a padata instance by using padata_alloc().
The user can specify the cpumask of the cpus and the workqueue he wants to
use for the parallel codepath on allocation time. After allocation it is
possible to start/stop the padata instance with padata_start() padata_stop().
The actual parallel codepath is entered by calling padata_do_parallel().
The parallelized objects, in case of pcrypt the crypto requests, need to
embed padata's control structure of type struct padata_priv. This structure
is the anchor for padata to handle the object. padata_do_parallel() takes
three parameters, the padata instance on which to parallelize, the control
structure that is embedded in the object and a callback cpu on which the
object will appear after serialization. Once the cpu intensive work is done,
it is possible to serialize by calling padata_do_serial(). This function
takes the padata control structure as parameter. It brings the parallelized
objects back to the order they were before the parallelization and sends
them to the cpu which was specified as the callback cpu.

Internally padata uses a workqueue to do the parallelization/serialization,
not softirqs (some earlier versions used softirqs).

I was not aware that using yield() is considered to be a bad thing, it is
of course possible to do it without yield() if this is wanted.

Steffen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  7:15             ` David Miller
@ 2010-04-01 13:59               ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2010-04-01 13:59 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, henne, steffen.klassert, linux-kernel

On Thu, 01 Apr 2010 00:15:56 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 31 Mar 2010 20:29:31 -0400
> 
> > Thing is, this isn't net code and it isn't crypto code - it's a
> > kernel-wide utility.
> 
> Thanks goodness, because if we had put a private copy in the
> networking or the crypto code someone would have given us a hard time.
> 
> Wait a second... we're being given a hard time anyways. :-)

Because the target audience for this work weren't informed about it,
and it would have been better if they had been.

> Kidding aside, why didn't anyone show any interest in the patch when
> it was posted to lkml?  You can say it was smoothered together with
> some crypto stuff, but that was absolutely the logical thing to do
> because the follow-on crypto patches showed what the thing was going
> to be used for.

I dunno - these things happen.  The padata patch appears to have been
sent once to netdev as an RFC back in 2008 and didn't get any replies
there either.  Personally I pay less attention to patches which appear
in the middle of some discussion because I assume they're for
discussion purposes and will come back later.

If I'd known this code was coming I would have reviewed it at an
appropriate time.  Probably others are interested also.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  8:11     ` Steffen Klassert
@ 2010-04-01 21:59       ` Andrew Morton
  2010-04-02  1:09         ` Herbert Xu
  2010-04-02 11:23         ` Steffen Klassert
  2010-04-06 18:00       ` Jonathan Corbet
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2010-04-01 21:59 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Herbert Xu, Henrik Kretzschmar, linux-kernel

On Thu, 1 Apr 2010 10:11:26 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:

> On Wed, Mar 31, 2010 at 02:58:17PM -0700, Andrew Morton wrote:
> > 
> > What is it for, how does it work, where might it otherwise be used,
> > what are the dynamics, why does it use softirqs rather than (say)
> > kernel threads and how do we stop it from using yield()?
> > 
> 
> padata can be used to spread cpu intensive work over multiple cpus. It came
> into being with the idea to parallelize IPsec. We separated the generic
> parts out to padata in the hope it can be useful to others too.
> So padata is used to parallelize cpu intensive codepaths, like crypto
> operations. As soon as the cpu intensive work is done, it is possible to
> serialize again without getting reorder of the parallelized objects.
> This is in particular important for IPsec to keep the network packets
> in the right order.
> 
> To achieve a parallelization of the cpu intensive parts of IPsec, we
> created the pcrypt parallel crypto template (crypto/pcrypt.c). This
> wraps around an existing crypto algorithm and does the
> parallelization/serialization by using padata. So pcrypt is an example on
> how to use padata.
> 
> I'm about to write some documentation for padata/pcrypt.

Documentation is always appreciated.

> The big picture
> is as follows. A user can allocate a padata instance by using padata_alloc().
> The user can specify the cpumask of the cpus and the workqueue he wants to
> use for the parallel codepath on allocation time. After allocation it is
> possible to start/stop the padata instance with padata_start() padata_stop().
> The actual parallel codepath is entered by calling padata_do_parallel().
> The parallelized objects, in case of pcrypt the crypto requests, need to
> embed padata's control structure of type struct padata_priv. This structure
> is the anchor for padata to handle the object. padata_do_parallel() takes
> three parameters, the padata instance on which to parallelize, the control
> structure that is embedded in the object and a callback cpu on which the
> object will appear after serialization. Once the cpu intensive work is done,
> it is possible to serialize by calling padata_do_serial(). This function
> takes the padata control structure as parameter. It brings the parallelized
> objects back to the order they were before the parallelization and sends
> them to the cpu which was specified as the callback cpu.
> 
> Internally padata uses a workqueue to do the parallelization/serialization,
> not softirqs (some earlier versions used softirqs).

OK.  Doing it in threads is better because it lets the CPU scheduler
regulate things and the load becomes visible and correctly attributed
in per-process accounting.

> I was not aware that using yield() is considered to be a bad thing, it is
> of course possible to do it without yield() if this is wanted.

yield() is a bit problematic - it can sometimes take enormous amounts
of time.  It wasn't always that way - it changed vastly in 2002 and has
since got a bit better (I think).  But generally a yield-based busywait
is a concern and it'd be better to use some more well-defined primitive
such as a lock or wait_for_completion(), etc.

I'd suggest at least loading the system up with 100 busywait processes
and verify that the padata code still behaves appropriately.


Some other stuff:

- The code does

	might_sleep()
	mutex_lock()

  in a lot of places.  But mutex_lock() does might_sleep() too, so
  it's a waste of space and will cause a double-warning if it triggers.

- The code does local_bh_disable() and spin_trylock_bh().  I assume
  that this is to support this code being used from networking
  softirqs.  So the code is usable frmo softirq context and from
  process context but not from hard IRQ context?

  It'd be useful if these designed decisions were described somewhere:
  what's the thinking behind it and what are the implications.

- padata_reorder() does a trylock.  It's quite unobvious to the
  reader why it didn't just do spin_lock().  Needs a code comment.

- the try-again loop in that function would benefit from a comment
  too.  Why is it there, and in what circumstances will the goto be
  taken?

  Once that's understood, we can see under which conditions the code
  will livelock ;)

- did __padata_add_cpu() need to test cpu_active_mask?  Wouldn't it
  be a bug for this to be called against an inactive CPU?

- similarly, does __padata_remove_cpu() need to test cpu_online_mask?

- It appears that the cpu-hotplug support in this code will be
  compiled-in even if the kernel doesn't support CPU hotplug.  It's a
  sneaky feature of the hotcpu_notifier()->cpu_notifier() macros that
  (when used with a modern gcc), the notifier block and the (static)
  notifier handler all get stripped away by the compiler/linker.  I
  suspect the way padata is organized doesn't permit that.  Fixable
  with ifdefs if once wishes to.

- It'd be nice if the internal functions had a bit of documentation. 
  I'm sitting here trying to work out why padata_alloc_pd() goes and
  touches all possible CPUs, and whether it could only touch online
  CPUs.  But I don't really know what padata_alloc_pd() _does_, in the
  overall scheme of things.

- It's expecially useful to document the data structures and the
  relationships between them.  Particularly when they are attached
  together via anonymous list_head's rather than via typed C pointers. 
  What are the structural relationships between the various structs in
  padata.h?  Needs a bit of head-scratching to work out.

- Example: parallel_data.seq_nr.  What does it actually do, and how
  is it managed and how does it tie in to padata_priv.seq_nr?  This is
  all pretty key to the implementation and reverse-engineering your
  intent from the implementation isn't trivial, and can lead to errors.

- What's all this reordering stuff?

- The distinction between serial work and parallel work is somewhat
  lost on me.  I guess that'd be covered in overall documentation.

- Please add comments to padata_get_next() explaining what's
  happening when this returns -ENODATA or -EINPROGRESS.

- If the padata is in the state PADATA_INIT, padata_do_parallel()
  returns 0, which seems odd.  Shouldn't it tell the caller that he
  goofed?

- padata_do_parallel() returns -EINPROGRESS on success, which is
  either a bug, or is peculiar.

- If I have one set of kernel threads and I use then to process
  multiple separate apdata's, it seems that the code will schedule my
  work in a FIFO, run-to-completion fashion?  So I might have been
  better off creating separate workqueues and letting the CPU scheduler
  work it out?  Worthy of discussion in the padata doc?

- Why is parallel work hashed onto a random CPU?  For crude
  load-balancing, I guess?

- Why would I want to specify which CPU the parallel completion
  callback is to be executed on?

  - What happens if that CPU isn't online any more?

- The code looks generally a bit fragile against CPU hotpug.  Maybe
  sprinkling get_online_cpus()/put_online_cpus() in strategic places
  would make things look better ;)

  You can manually online and offline CPUs via
  /sys/devices/system/cpu/cpu*/online (I think).  Thrashing away on
  those files provides a good way to test for hotplug racinesses.



I guess the major question in my mind is: in what other kernel-coding
scenarios might this code be reused?  What patterns should reviwers be
looking out for?  Thoughts on that?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01 21:59       ` Andrew Morton
@ 2010-04-02  1:09         ` Herbert Xu
  2010-04-02 11:23         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2010-04-02  1:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Steffen Klassert, Henrik Kretzschmar, linux-kernel

On Thu, Apr 01, 2010 at 02:59:53PM -0700, Andrew Morton wrote:
>
> - What's all this reordering stuff?

Padata must guarantee that even though the work may occur in
parallel, anyone submitting work to it in serial must observe
the result strictly in the order that they were submitted.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01 21:59       ` Andrew Morton
  2010-04-02  1:09         ` Herbert Xu
@ 2010-04-02 11:23         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2010-04-02 11:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Herbert Xu, Henrik Kretzschmar, linux-kernel

On Thu, Apr 01, 2010 at 02:59:53PM -0700, Andrew Morton wrote:
> 
> yield() is a bit problematic - it can sometimes take enormous amounts
> of time.  It wasn't always that way - it changed vastly in 2002 and has
> since got a bit better (I think).  But generally a yield-based busywait
> is a concern and it'd be better to use some more well-defined primitive
> such as a lock or wait_for_completion(), etc.

Yes, wait_for_completion() is probably the better way to do the
busywait.

> 
> I'd suggest at least loading the system up with 100 busywait processes
> and verify that the padata code still behaves appropriately.
> 
> 
> Some other stuff:
> 
> - The code does
> 
> 	might_sleep()
> 	mutex_lock()
> 
>   in a lot of places.  But mutex_lock() does might_sleep() too, so
>   it's a waste of space and will cause a double-warning if it triggers.

Right, the mutex_sleep() can be removed in these cases. Will do that.

> 
> - The code does local_bh_disable() and spin_trylock_bh().  I assume
>   that this is to support this code being used from networking
>   softirqs.  So the code is usable frmo softirq context and from
>   process context but not from hard IRQ context?

Right, can be used from softirq and process, but not from hardirq context.

> 
>   It'd be useful if these designed decisions were described somewhere:
>   what's the thinking behind it and what are the implications.

Ok, will place some description to the code.

> 
> - padata_reorder() does a trylock.  It's quite unobvious to the
>   reader why it didn't just do spin_lock().  Needs a code comment.

Ok.

> 
> - the try-again loop in that function would benefit from a comment
>   too.  Why is it there, and in what circumstances will the goto be
>   taken?

The try-again loop it to handle a corner case that appears with the trylock.
Will add some comments on this too.

> 
>   Once that's understood, we can see under which conditions the code
>   will livelock ;)
> 
> - did __padata_add_cpu() need to test cpu_active_mask?  Wouldn't it
>   be a bug for this to be called against an inactive CPU?
> 
> - similarly, does __padata_remove_cpu() need to test cpu_online_mask?
> 

Well, the idea behind that was to maintain a cpumask the user wishes to
use for parallelization. The cpumask that is actually used, is the logical and
between user's cpumask and the cpu_active_mask. The reason for maintaining
such a cpumask was to keep the user supplied cpumask untouched on hotplug
events. If a cpu goes down and then up again padata can simply reuse this cpu
if it is in the user's cpumask. So this made it possible to add even offline
cpus to the user's cpumask. Once the cpu comes online, it will be used.
I guess this need a code comment too.
	
> - It appears that the cpu-hotplug support in this code will be
>   compiled-in even if the kernel doesn't support CPU hotplug.  It's a
>   sneaky feature of the hotcpu_notifier()->cpu_notifier() macros that
>   (when used with a modern gcc), the notifier block and the (static)
>   notifier handler all get stripped away by the compiler/linker.  I
>   suspect the way padata is organized doesn't permit that.  Fixable
>   with ifdefs if once wishes to.

Yes, I noticed that already. A patch that ifdef the cpu hotplug code is
already in my queue.

> 
> - It'd be nice if the internal functions had a bit of documentation. 
>   I'm sitting here trying to work out why padata_alloc_pd() goes and
>   touches all possible CPUs, and whether it could only touch online
>   CPUs.  But I don't really know what padata_alloc_pd() _does_, in the
>   overall scheme of things.

Hm, good point. For the moment I think it would be even sufficient to touch
just the logical and between the supplied and the active mask. I'll look
into this.

> 
> - It's expecially useful to document the data structures and the
>   relationships between them.  Particularly when they are attached
>   together via anonymous list_head's rather than via typed C pointers. 
>   What are the structural relationships between the various structs in
>   padata.h?  Needs a bit of head-scratching to work out.

Ok, will do.

> 
> - Example: parallel_data.seq_nr.  What does it actually do, and how
>   is it managed and how does it tie in to padata_priv.seq_nr?  This is
>   all pretty key to the implementation and reverse-engineering your
>   intent from the implementation isn't trivial, and can lead to errors.

The sequence numbers are in fact the key to maintain the order of the
parallelized objects. padata_priv.seq_nr is the sequence number that
identifies a object unique. The object is equipped with this number before
it is queued for the parallel codepath. On exit of the parallel codepath
this number is used to bring the objects back to the right order.
parallel_data.seq_nr maintains the 'next free sequence number' of the
padata instance. The next object that appears will be equipped with this
number as it's private sequence number (padata_priv.seq_nr).
parallel_data.seq_nr is incremented by one and then again contains the
number for the next object.

> 
> - What's all this reordering stuff?
> 
> - The distinction between serial work and parallel work is somewhat
>   lost on me.  I guess that'd be covered in overall documentation.
> 
> - Please add comments to padata_get_next() explaining what's
>   happening when this returns -ENODATA or -EINPROGRESS.

Ok.

> 
> - If the padata is in the state PADATA_INIT, padata_do_parallel()
>   returns 0, which seems odd.  Shouldn't it tell the caller that he
>   goofed?

It is possible to start/stop the padata instance. PADATA_INIT maintains
whether the instance is running. padata_do_parallel() returns 0 if the
instance is not initialized or stopped. It's up to the caller what to
do in this case. Maybe 0 is not the right thing to return in this case,
I'll think about it.

> 
> - padata_do_parallel() returns -EINPROGRESS on success, which is
>   either a bug, or is peculiar.

It returns -EINPROGRESS because the object is queued to a workqueue and
it returns asynchronous. It's at least in the crypto layer quite common
to do this if a crypto request is queued to a workqueue for further
processing.

> 
> - If I have one set of kernel threads and I use then to process
>   multiple separate apdata's, it seems that the code will schedule my
>   work in a FIFO, run-to-completion fashion?  So I might have been
>   better off creating separate workqueues and letting the CPU scheduler
>   work it out?  Worthy of discussion in the padata doc?

Yes, exactly. A separate workqueue should be created for each padata instance.
I'll add this to the documentation.

> 
> - Why is parallel work hashed onto a random CPU?  For crude
>   load-balancing, I guess?

It is not entirely random :)
The parallelized objects are send round robin to the cpus. Starting with
the object with sequence number 0 which is send to the cpu with index 0.
So the objects are sent to the cpus round robin, modulus the number
of cpus in use. In fact this makes it possible to calculate which object
appear on which reorder queue. That's quite important to be able to
bring the objects back to the right order.

> 
> - Why would I want to specify which CPU the parallel completion
>   callback is to be executed on?

Well, for IPsec for example it is quite interesting to separate the
different flows to different cpus. pcrypt does this by choosing different
callback cpus for the requests belonging to different transforms.
Others might want to see the object on the same cpu as it was before
the parallel codepath.

> 
>   - What happens if that CPU isn't online any more?

The cpu won't go away, that's what the yield() thing is doing.

> 
> - The code looks generally a bit fragile against CPU hotpug.  Maybe
>   sprinkling get_online_cpus()/put_online_cpus() in strategic places
>   would make things look better ;)
> 
>   You can manually online and offline CPUs via
>   /sys/devices/system/cpu/cpu*/online (I think).  Thrashing away on
>   those files provides a good way to test for hotplug racinesses.

Yes, I know. I used this to test the cpu hotplug code. I used a script
that takes cpus online and offline as fast as possible and I sent
with a traffic generator bidirectional traffic at maximum rate to
put the system under pressure during the cpu hotplugs. I ran this
for several hours, so I think there are at least no obvious races.

> 
> I guess the major question in my mind is: in what other kernel-coding
> scenarios might this code be reused?  What patterns should reviwers be
> looking out for?  Thoughts on that?
> 

I thought that somebody will raise this question.
When we decided to move the hooks for the parallelization from the
networking to the crypto layer I asked whether padata should be kept
generic or moved too. Nobody had a strong opinion on this, so I decided
to keep it generic.

OTOH, if I had moved it to the crypto layer, I'm sure somebody would have
asked why this code is local to a certain subsystem.

Anyway I had to decide for the one or the other option, knowingly that
both decisions could be wrong :)

For the moment I don't know about potential other users. But I haven't
searched for it so far. Anyway, it can be used whenever a stream of data
(like network packets) needs to be processed by something cpu intensive
but also need to be kept in the original order.

Steffen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-01  8:11     ` Steffen Klassert
  2010-04-01 21:59       ` Andrew Morton
@ 2010-04-06 18:00       ` Jonathan Corbet
  2010-04-07 13:34         ` Steffen Klassert
  2010-04-29 12:14         ` Steffen Klassert
  1 sibling, 2 replies; 17+ messages in thread
From: Jonathan Corbet @ 2010-04-06 18:00 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Andrew Morton, Herbert Xu, Henrik Kretzschmar, linux-kernel

On Thu, 1 Apr 2010 10:11:26 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:

> I'm about to write some documentation for padata/pcrypt.

I hope that nobody minds if I went ahead and bashed something out?

	http://lwn.net/SubscriberLink/382257/fe415ce242a12c15/

Comments welcome.  I'd be happy to rework it into a Documentation/ file
if that would be useful.

jon

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-06 18:00       ` Jonathan Corbet
@ 2010-04-07 13:34         ` Steffen Klassert
  2010-04-29 12:14         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2010-04-07 13:34 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Herbert Xu, Henrik Kretzschmar, linux-kernel

On Tue, Apr 06, 2010 at 12:00:15PM -0600, Jonathan Corbet wrote:
> On Thu, 1 Apr 2010 10:11:26 +0200
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
> 
> > I'm about to write some documentation for padata/pcrypt.
> 
> I hope that nobody minds if I went ahead and bashed something out?
> 
> 	http://lwn.net/SubscriberLink/382257/fe415ce242a12c15/
> 
> Comments welcome.  

I think your article reflects patada and it's API as it
is quite good.

> I'd be happy to rework it into a Documentation/ file
> if that would be useful.
> 

That would be great! I'd be glad to provide you with further
information if you need it.

Thanks a lot,

Steffen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] padata: section cleanup
  2010-04-06 18:00       ` Jonathan Corbet
  2010-04-07 13:34         ` Steffen Klassert
@ 2010-04-29 12:14         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2010-04-29 12:14 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Herbert Xu, Henrik Kretzschmar, linux-kernel

On Tue, Apr 06, 2010 at 12:00:15PM -0600, Jonathan Corbet wrote:
> On Thu, 1 Apr 2010 10:11:26 +0200
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
> 
> > I'm about to write some documentation for padata/pcrypt.
> 
> I hope that nobody minds if I went ahead and bashed something out?
> 
> 	http://lwn.net/SubscriberLink/382257/fe415ce242a12c15/
> 
> Comments welcome.  I'd be happy to rework it into a Documentation/ file
> if that would be useful.
> 

I wrote an article about parallelizing IPsec for the Linuxtag Conference
proceedings. This article covers also a section that describes the ideas of
padata on a formal level. I can send it if anybody is interested.

I'll send out the patches based on Andrew's review later today.

Steffen

PS: I'll be off for a week starting tomorrow evening, so I might not
respond during this time.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-04-30 18:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-27  6:53 [PATCH] padata: section cleanup Henrik Kretzschmar
2010-03-29  7:42 ` Steffen Klassert
2010-03-29  8:15 ` Herbert Xu
2010-03-31 21:58   ` Andrew Morton
2010-04-01  0:24     ` Herbert Xu
2010-03-31 22:04       ` Andrew Morton
2010-04-01  1:38         ` Herbert Xu
2010-04-01  0:29           ` Andrew Morton
2010-04-01  7:15             ` David Miller
2010-04-01 13:59               ` Andrew Morton
2010-04-01  8:11     ` Steffen Klassert
2010-04-01 21:59       ` Andrew Morton
2010-04-02  1:09         ` Herbert Xu
2010-04-02 11:23         ` Steffen Klassert
2010-04-06 18:00       ` Jonathan Corbet
2010-04-07 13:34         ` Steffen Klassert
2010-04-29 12:14         ` Steffen Klassert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).