All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] percpu fixes for 2.6.32-rc6
@ 2009-11-10  6:04 Tejun Heo
  2009-11-10 17:10 ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-10  6:04 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel; +Cc: Yinghai Lu, Ingo Molnar

Hello, Linus.

Please pull from the following percpu fix branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-linus

It fixes a possible deadlock caused by lock ordering inversion through
irq.

Thanks.
---
Tejun Heo (1):
      percpu: fix possible deadlock via irq lock inversion

 mm/percpu.c |   17 +++++++++++++++--

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..30cd343 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -372,7 +372,7 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 {
 	int new_alloc;
-	int *new;
+	int *new, *old = NULL;
 	size_t size;

 	/* has enough? */
@@ -407,10 +407,23 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
+	 * held.  Release and reacquire pcpu_lock if old map needs to
+	 * be freed.
+	 */
+	if (old) {
+		spin_unlock_irqrestore(&pcpu_lock, *flags);
+		pcpu_mem_free(old, size);
+		spin_lock_irqsave(&pcpu_lock, *flags);
+	}
+
 	return 0;
 }

-- 
tejun

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10  6:04 [GIT PULL] percpu fixes for 2.6.32-rc6 Tejun Heo
@ 2009-11-10 17:10 ` Linus Torvalds
  2009-11-10 18:33   ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2009-11-10 17:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel, Yinghai Lu, Ingo Molnar



On Tue, 10 Nov 2009, Tejun Heo wrote:
> 
> Please pull from the following percpu fix branch.

No way in hell.

> It fixes a possible deadlock caused by lock ordering inversion through
> irq.

.. and it does so by introducing a new bug. No thank you.

> +
> +	/*
> +	 * pcpu_mem_free() might end up calling vfree() which uses
> +	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
> +	 * held.  Release and reacquire pcpu_lock if old map needs to
> +	 * be freed.
> +	 */
> +	if (old) {
> +		spin_unlock_irqrestore(&pcpu_lock, *flags);
> +		pcpu_mem_free(old, size);
> +		spin_lock_irqsave(&pcpu_lock, *flags);
> +	}

Routines that drop and then re-take the lock should be banned, as it's 
almost always a bug waiting to happen. As it is this time:

>  	return 0;

Now the caller will happily continue to traverse a list that may no longer 
be valid, because you dropped the lock.

Really. This thing is total sh*t. It was misdesigned to start with, and 
the calling convention is wrong. That 'pcpu_extend_area_map()' function 
should be split up into two functions: 'pcpu_needs_to_extend()' that never 
drops the lock, and 'pcpu_extend_area()' that _always_ drops the lock 
(and then returns an error if it can't allocate the memory).

Not that shit-for-brains that may or may not drop the lock, and then 
returns an incorrect error return depending on whether it did.

In other words: fix the sh*t, don't add even more to it. That 'return 0' 
was and is wrong. It should have been a 'return 1'. And thank the Gods 
that I looked at it, 

Sure, you can fix the bug by just returning 1. But you can't fix the total 
crap of a calling convention that way. Fix it properly as outlined above, 
and remember: functions that drop locks that were held when called are 
EVIL and almost always the source of really subtle races.

As it was in this case.

			Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 17:10 ` Linus Torvalds
@ 2009-11-10 18:33   ` Tejun Heo
  2009-11-10 18:54     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-10 18:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, Yinghai Lu, Ingo Molnar

Hello, Linus.

Linus Torvalds wrote:
>> +	if (old) {
>> +		spin_unlock_irqrestore(&pcpu_lock, *flags);
>> +		pcpu_mem_free(old, size);
>> +		spin_lock_irqsave(&pcpu_lock, *flags);
>> +	}
> 
> Routines that drop and then re-take the lock should be banned, as it's 
> almost always a bug waiting to happen. As it is this time:

Yeap, they usually are.  In this case, it's a little bit different.
There are two locks - pcpu_alloc_mutex and pcpu_lock.
pcpu_alloc_mutex protects the whole alloc and reclaim paths while
pcpu_lock protects the part used by free path - area maps in chunks
and pcpu_slot array pointing to chunks.

So, under pcpu_alloc_mutex which pcpu_extend_area_map() is called with
and never drops, the chunk never goes away nor can anything be
allocated from it.  Dropping pcpu_lock only allows free path to come
inbetween and mark areas in the area map of the chunk and maybe
relocate the chunk to another pcpu_slot according to new free area
size - both operations should be safe.  IOW, it's not really dropping
the main lock here.

At first, both alloc and free paths were protected by single mutex.
The original percpu allocator always assumed GFP_KERNEL allocation, so
I thought that should do it.  Unfortunately, I was forgetting about
the free path which was allowed to be called from atomic context, so
the lock was split into two so that the free path can be called from
atomic context.

The reason why this somewhat convoluted locking was chosen over making
both alloc and free paths irq-safe was partly because it was easier
but mostly because percpu allocator's dependence on vmalloc allocator.
All vmalloc area related lockings assume not to be called from irq
context which goes back to having to make cross cpu invalidate calls,
which make it difficult to make percpu allocator irqsafe as a whole.
So, the locking impedance matching was done in the percpu free path
and the temporary inner lock droppings in pcpu_extend_are_map() are
the byproducts.

Christoph Lameter has been saying that atomic percpu allocations would
be beneficial for certain callers.  Pushing the problem down to the
vmalloc layer where the problem originates and making percpu locking
more usual would be nice too.  I'm still not sure whether it would
justify the added complexity (it will probably require putting vmalloc
reclamation path to a workqueue) but that could be the ultimate right
thing to do here.

If I'm missing something, I'm sure you'll hammer it into me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 18:33   ` Tejun Heo
@ 2009-11-10 18:54     ` Linus Torvalds
  2009-11-10 19:25       ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2009-11-10 18:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel, Yinghai Lu, Ingo Molnar



On Wed, 11 Nov 2009, Tejun Heo wrote:
> 
> If I'm missing something, I'm sure you'll hammer it into me.

Here's from the comments on that function:

 * RETURNS:
 * 0 if noop, 1 if successfully extended, -errno on failure.

and here's from one of the main callers:

                list_for_each_entry(chunk, &pcpu_slot[slot], list) {
		...
                        switch (pcpu_extend_area_map(chunk, &flags)) {
                        case 0:
                                break;
                        case 1:
                                goto restart;   /* pcpu_lock dropped, restart */

where that '&pcpu_slot[slot]' list is protected by the pcpu_lock, and 
nothing else. At least according to all the _other_ comments in that file. 
Including the one that very much tries to _explain_ the locking at the 
top, quote:

  "The latter is a spinlock and protects the index data structures - chunk 
   slots, chunks and area maps in chunks."

So as far as I can tell, either the comments are all crap, the whole 
restart code is pointless and in fact the whole spin-lock is seemingly 
almost entirely pointless to begin with (since pcpu_alloc_mutex is the 
only thing that matters), or the code is buggy.

Also, quite frankly, even if the code _isn't_ buggy, it's still wrong to 
release a lock that somebody else took. It's a sure-fire way to write 
unmaintainable code with bugs almost guaranteed in the future. Yes, it 
happens, and sometimes it's the only sane way to do it, but in this case 
that really isn't true as far as I can tell.

>From my (admittedly fairly quick) look, my suggested split-up really would 
make the code _more_ readable (no need for that subtle "negative, zero or 
positive all mean different things" logic), and hopefully avoid the whole 
"drop the lock that somebody else took", because we could drop it in the 
caller where it was taken.

So it all boils down to: the code is unquestionably ugly and almost 
certainly broken. And if it isn't broken, then _all_ the comments are 
total crap. 

Your choice. 

			Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 18:54     ` Linus Torvalds
@ 2009-11-10 19:25       ` Tejun Heo
  2009-11-10 19:37         ` Ingo Molnar
  2009-11-10 19:44         ` [GIT PULL] percpu fixes for 2.6.32-rc6 Tejun Heo
  0 siblings, 2 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-10 19:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, Yinghai Lu, Ingo Molnar

Hello,

Linus Torvalds wrote:
> On Wed, 11 Nov 2009, Tejun Heo wrote:
>> If I'm missing something, I'm sure you'll hammer it into me.
> 
> Here's from the comments on that function:
> 
>  * RETURNS:
>  * 0 if noop, 1 if successfully extended, -errno on failure.
> 
> and here's from one of the main callers:
> 
>                 list_for_each_entry(chunk, &pcpu_slot[slot], list) {
> 		...
>                         switch (pcpu_extend_area_map(chunk, &flags)) {
>                         case 0:
>                                 break;
>                         case 1:
>                                 goto restart;   /* pcpu_lock dropped, restart */
> 
> where that '&pcpu_slot[slot]' list is protected by the pcpu_lock, and 
> nothing else. At least according to all the _other_ comments in that file. 
> Including the one that very much tries to _explain_ the locking at the 
> top, quote:

Oh, yeah, right.  I was too fixated on the part modified by the patch.

>   "The latter is a spinlock and protects the index data structures - chunk 
>    slots, chunks and area maps in chunks."
> 
> So as far as I can tell, either the comments are all crap, the whole 
> restart code is pointless and in fact the whole spin-lock is seemingly 
> almost entirely pointless to begin with (since pcpu_alloc_mutex is the 
> only thing that matters), or the code is buggy.

The return value is wrong but it wouldn't lead to oops.  There's a
very slight chance that it might end up creating extra chunk when not
necessary - probably why it went unnoticed all this time.  The
spin-lock is only to allow free_percpu() to be called from atomic
context, so its usefulness would only be visible if you look at
free_percpu() too.

> Also, quite frankly, even if the code _isn't_ buggy, it's still wrong to 
> release a lock that somebody else took. It's a sure-fire way to write 
> unmaintainable code with bugs almost guaranteed in the future. Yes, it 
> happens, and sometimes it's the only sane way to do it, but in this case 
> that really isn't true as far as I can tell.
> 
> From my (admittedly fairly quick) look, my suggested split-up really would 
> make the code _more_ readable (no need for that subtle "negative, zero or 
> positive all mean different things" logic), and hopefully avoid the whole 
> "drop the lock that somebody else took", because we could drop it in the 
> caller where it was taken.
> 
> So it all boils down to: the code is unquestionably ugly and almost 
> certainly broken. And if it isn't broken, then _all_ the comments are 
> total crap. 

Yeap, the return value definitely is broken and the rather ugly
calling convention is remanant from the days when there was only
single mutex protecting the whole thing.

I think this type of function is a bit special in locking requirement
tho.  The initial step - checking whether the operation is necessary -
requires lock and the final step - copying over to the new thing and
installing it - also requires the lock, so unless there's one
unnecessary unlock/lock pair, the second function would be called
without lock but return with lock, which probably is safer than
releasing and regrabbing lock in the middle but still not quite
pretty.

In this case, as the second function needs to release to free the old
map, the extra unlock/lock pair is actually necessary.  Splitting into
two functions is fine but I think it would be better to fix it first
and then split them in following patches so that it can be bisected if
I screw up while splitting, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 19:25       ` Tejun Heo
@ 2009-11-10 19:37         ` Ingo Molnar
  2009-11-10 19:50           ` Tejun Heo
  2009-11-11  8:49           ` [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability Tejun Heo
  2009-11-10 19:44         ` [GIT PULL] percpu fixes for 2.6.32-rc6 Tejun Heo
  1 sibling, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2009-11-10 19:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> In this case, as the second function needs to release to free the old 
> map, the extra unlock/lock pair is actually necessary.  Splitting into 
> two functions is fine but I think it would be better to fix it first 
> and then split them in following patches so that it can be bisected if 
> I screw up while splitting, right?

i think Linus's point was that this hack was the last straw that broke 
the camel's back, and that we are better off with a clean solution 
straight away. If you send the clean approach i can help test it on a 
good number of boxes.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 19:25       ` Tejun Heo
  2009-11-10 19:37         ` Ingo Molnar
@ 2009-11-10 19:44         ` Tejun Heo
  1 sibling, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-10 19:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, Yinghai Lu, Ingo Molnar

Tejun Heo wrote:
> The return value is wrong but it wouldn't lead to oops.

Correction: It may lead to oops due to wrong end-of-list condition and
I just remembered that I was thinking about it when I was modifying
the locking and adding the switch block there and then I forgot to
actually update the return value of the extend function.  So, yeah,
three way return value sucks big time.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 19:37         ` Ingo Molnar
@ 2009-11-10 19:50           ` Tejun Heo
  2009-11-10 21:42             ` Linus Torvalds
  2009-11-11  8:49           ` [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability Tejun Heo
  1 sibling, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-10 19:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

Ingo Molnar wrote:
> * Tejun Heo <tj@kernel.org> wrote:
> 
>> In this case, as the second function needs to release to free the old 
>> map, the extra unlock/lock pair is actually necessary.  Splitting into 
>> two functions is fine but I think it would be better to fix it first 
>> and then split them in following patches so that it can be bisected if 
>> I screw up while splitting, right?
> 
> i think Linus's point was that this hack was the last straw that broke 
> the camel's back, and that we are better off with a clean solution 
> straight away. If you send the clean approach i can help test it on a 
> good number of boxes.

If you're talking about the locking itself, there really is no clean
solution at this point.  Until vmalloc locking is updated, we'll have
to do locking impedance matching in percpu code.  I'm still not quite
sure whether we really need to update vmalloc code to be irqsafe.

If you're talking about the three way return value, which I do agree
to be quite ugly, I think it will be a lot safer to have three patches
- one to fix the deadlock, another to fix the return value and the
final one to de-uglify the function, especially as we're pretty late
in the release cycle.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 19:50           ` Tejun Heo
@ 2009-11-10 21:42             ` Linus Torvalds
  2009-11-11  3:55               ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2009-11-10 21:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu



On Wed, 11 Nov 2009, Tejun Heo wrote:
> 
> If you're talking about the three way return value, which I do agree
> to be quite ugly, I think it will be a lot safer to have three patches
> - one to fix the deadlock, another to fix the return value and the
> final one to de-uglify the function, especially as we're pretty late
> in the release cycle.

I'm certainly ok with doing it in stages if that is how you want to do it.

That said, I'm not entirely sure it's _worthwhile_, since the "return 1" 
case has apparently never ever actually worked. From a bisect standpoint, 
what's the difference between seeing

 - oh, now that we made it return the documented code and actually re-try 
   properly when dropping the lock, it turns out that the re-try code was 
   always buggy and we just hadn't noticed before because it didn't 
   trigger

or

 - oh, now that we rewrote the function to be cleaner and do the lock 
   dropping and retry more obviously, it turns out that the retry doesn't 
   actually work and leads to deadlocks.

but I don't care deeply.

I want the cleanup because I think that the code sucks from a "future 
proofing" and readability standpoint, but I really don't mind one way or 
the other whether you want to finally do that one "return 1" correctly for 
one commit, only to then fix it to not do that three-way test of a single 
function in the next one.

So whatever works - as long as the end result both looks sane and doesn't 
have the bug we clearly have now.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-10 21:42             ` Linus Torvalds
@ 2009-11-11  3:55               ` Tejun Heo
  2009-11-11 11:31                 ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-11  3:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu

Hello,

Linus Torvalds wrote:
> I want the cleanup because I think that the code sucks from a "future 
> proofing" and readability standpoint, but I really don't mind one way or 
> the other whether you want to finally do that one "return 1" correctly for 
> one commit, only to then fix it to not do that three-way test of a single 
> function in the next one.
> 
> So whatever works - as long as the end result both looks sane and doesn't 
> have the bug we clearly have now.

I was mostly worrying about introducing unrelated bug while changing.
Anyways, one patch it is.  I'll route it through tip as suggested by
Ingo in a few hours.

Thanks a lot.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability
  2009-11-10 19:37         ` Ingo Molnar
  2009-11-10 19:50           ` Tejun Heo
@ 2009-11-11  8:49           ` Tejun Heo
  2009-11-11 19:25             ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-11  8:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

pcpu_extend_area_map() had the following two bugs.

* It should return 1 if pcpu_lock was dropped and reacquired but it
  returned 0.  This could lead to oops if free_percpu() races with
  area map extension.

* pcpu_mem_free() was called under pcpu_lock.  pcpu_mem_free() might
  end up calling vfree() which isn't IRQ safe.  This could lead to
  deadlock through lock order inversion via IRQ.

In addition, Linus pointed out that the temporary lock dropping and
subtle three-way return value of pcpu_extend_area_map() was very ugly
and suggested to split the function into two - pcpu_need_to_extend()
and pcpu_extend_area_map().

This patch restructures pcpu_extend_area_map() as suggested and fixes
the two bugs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
I've also put this patch on percpu#for-linus but routing this through
tip would be safer.  Went through usual battery of tests and also
verified both bugs are fixed.  Ingo, can you please put this into tip?

Thanks.

 mm/percpu.c |  122 +++++++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..46f037a 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -355,62 +355,86 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 }

 /**
- * pcpu_extend_area_map - extend area map for allocation
- * @chunk: target chunk
+ * pcpu_need_to_extend - determine whether chunk area map needs to be extended
+ * @chunk: chunk of interest
  *
- * Extend area map of @chunk so that it can accomodate an allocation.
- * A single allocation can split an area into three areas, so this
- * function makes sure that @chunk->map has at least two extra slots.
+ * Determine whether area map of @chunk needs to be extended to
+ * accomodate a new allocation.
  *
  * CONTEXT:
- * pcpu_alloc_mutex, pcpu_lock.  pcpu_lock is released and reacquired
- * if area map is extended.
+ * pcpu_lock.
  *
  * RETURNS:
- * 0 if noop, 1 if successfully extended, -errno on failure.
+ * New target map allocation length if extension is necessary, 0
+ * otherwise.
  */
-static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
+static int pcpu_need_to_extend(struct pcpu_chunk *chunk)
 {
 	int new_alloc;
-	int *new;
-	size_t size;

-	/* has enough? */
 	if (chunk->map_alloc >= chunk->map_used + 2)
 		return 0;

-	spin_unlock_irqrestore(&pcpu_lock, *flags);
-
 	new_alloc = PCPU_DFL_MAP_ALLOC;
 	while (new_alloc < chunk->map_used + 2)
 		new_alloc *= 2;

-	new = pcpu_mem_alloc(new_alloc * sizeof(new[0]));
-	if (!new) {
-		spin_lock_irqsave(&pcpu_lock, *flags);
+	return new_alloc;	
+}
+
+/**
+ * pcpu_extend_area_map - extend area map of a chunk
+ * @chunk: chunk of interest
+ * @new_alloc: new target allocation length of the area map
+ *
+ * Extend area map of @chunk to have @new_alloc entries.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.  Grabs and releases pcpu_lock.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc)
+{
+	int *old = NULL, *new = NULL;
+	size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
+	unsigned long flags;
+
+	new = pcpu_mem_alloc(new_size);
+	if (!new)
 		return -ENOMEM;
-	}

-	/*
-	 * Acquire pcpu_lock and switch to new area map.  Only free
-	 * could have happened inbetween, so map_used couldn't have
-	 * grown.
-	 */
-	spin_lock_irqsave(&pcpu_lock, *flags);
-	BUG_ON(new_alloc < chunk->map_used + 2);
+	/* acquire pcpu_lock and switch to new area map */
+	spin_lock_irqsave(&pcpu_lock, flags);
+
+	if (new_alloc <= chunk->map_alloc)
+		goto out_unlock;

-	size = chunk->map_alloc * sizeof(chunk->map[0]);
-	memcpy(new, chunk->map, size);
+	old_size = chunk->map_alloc * sizeof(chunk->map[0]);
+	memcpy(new, chunk->map, old_size);

 	/*
 	 * map_alloc < PCPU_DFL_MAP_ALLOC indicates that the chunk is
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+	new = NULL;
+
+out_unlock:
+	spin_unlock_irqrestore(&pcpu_lock, flags);
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called under pcpu_lock.
+	 */
+	pcpu_mem_free(old, old_size);
+	pcpu_mem_free(new, new_size);
+
 	return 0;
 }

@@ -1049,7 +1073,7 @@ static void *pcpu_alloc(size_t size, size_t align, bool reserved)
 	static int warn_limit = 10;
 	struct pcpu_chunk *chunk;
 	const char *err;
-	int slot, off;
+	int slot, off, new_alloc;
 	unsigned long flags;

 	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
@@ -1064,14 +1088,26 @@ static void *pcpu_alloc(size_t size, size_t align, bool reserved)
 	/* serve reserved allocations from the reserved chunk if available */
 	if (reserved && pcpu_reserved_chunk) {
 		chunk = pcpu_reserved_chunk;
-		if (size > chunk->contig_hint ||
-		    pcpu_extend_area_map(chunk, &flags) < 0) {
-			err = "failed to extend area map of reserved chunk";
+
+		if (size > chunk->contig_hint) {
+			err = "alloc from reserved chunk failed";
 			goto fail_unlock;
 		}
+
+		while ((new_alloc = pcpu_need_to_extend(chunk))) {
+			spin_unlock_irqrestore(&pcpu_lock, flags);
+			if (pcpu_extend_area_map(chunk, new_alloc) < 0) {
+				err = "failed to extend area map of "
+					"reserved chunk";
+				goto fail_unlock_mutex;
+			}
+			spin_lock_irqsave(&pcpu_lock, flags);
+		}
+
 		off = pcpu_alloc_area(chunk, size, align);
 		if (off >= 0)
 			goto area_found;
+
 		err = "alloc from reserved chunk failed";
 		goto fail_unlock;
 	}
@@ -1083,14 +1119,20 @@ restart:
 			if (size > chunk->contig_hint)
 				continue;

-			switch (pcpu_extend_area_map(chunk, &flags)) {
-			case 0:
-				break;
-			case 1:
-				goto restart;	/* pcpu_lock dropped, restart */
-			default:
-				err = "failed to extend area map";
-				goto fail_unlock;
+			new_alloc = pcpu_need_to_extend(chunk);
+			if (new_alloc) {
+				spin_unlock_irqrestore(&pcpu_lock, flags);
+				if (pcpu_extend_area_map(chunk,
+							 new_alloc) < 0) {
+					err = "failed to extend area map";
+					goto fail_unlock_mutex;
+				}
+				spin_lock_irqsave(&pcpu_lock, flags);
+				/*
+				 * pcpu_lock has been dropped, need to
+				 * restart cpu_slot list walking.
+				 */
+				goto restart;
 			}

 			off = pcpu_alloc_area(chunk, size, align);
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-11  3:55               ` Tejun Heo
@ 2009-11-11 11:31                 ` Ingo Molnar
  2009-11-11 12:21                   ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2009-11-11 11:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> Linus Torvalds wrote:
> > I want the cleanup because I think that the code sucks from a "future 
> > proofing" and readability standpoint, but I really don't mind one way or 
> > the other whether you want to finally do that one "return 1" correctly for 
> > one commit, only to then fix it to not do that three-way test of a single 
> > function in the next one.
> > 
> > So whatever works - as long as the end result both looks sane and doesn't 
> > have the bug we clearly have now.
> 
> I was mostly worrying about introducing unrelated bug while changing. 
> Anyways, one patch it is.  I'll route it through tip as suggested by 
> Ingo in a few hours.

Btw., i'd suggest you keep it in your percpu tree as usual and send it 
to Linus directly - i offered testing for the cleanup patch (and can 
pull your patch for such a purpose), it doesnt 'have' to go via -tip.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-11 11:31                 ` Ingo Molnar
@ 2009-11-11 12:21                   ` Tejun Heo
  2009-11-11 19:57                     ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-11 12:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

Hello, Ingo.

Ingo Molnar wrote:
>> I was mostly worrying about introducing unrelated bug while changing. 
>> Anyways, one patch it is.  I'll route it through tip as suggested by 
>> Ingo in a few hours.
> 
> Btw., i'd suggest you keep it in your percpu tree as usual and send it 
> to Linus directly - i offered testing for the cleanup patch (and can 
> pull your patch for such a purpose), it doesnt 'have' to go via -tip.

Can you please then pull from the following tree for testing?  I'll
push it to Linus after a couple of days if nothing explodes.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-linus

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability
  2009-11-11  8:49           ` [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability Tejun Heo
@ 2009-11-11 19:25             ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2009-11-11 19:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu



On Wed, 11 Nov 2009, Tejun Heo wrote:
>
> I've also put this patch on percpu#for-linus but routing this through
> tip would be safer.  Went through usual battery of tests and also
> verified both bugs are fixed.  Ingo, can you please put this into tip?

Looks good to me. Ack.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-11 12:21                   ` Tejun Heo
@ 2009-11-11 19:57                     ` Ingo Molnar
  2009-11-12 10:11                       ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2009-11-11 19:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> Hello, Ingo.
> 
> Ingo Molnar wrote:
> >> I was mostly worrying about introducing unrelated bug while changing. 
> >> Anyways, one patch it is.  I'll route it through tip as suggested by 
> >> Ingo in a few hours.
> > 
> > Btw., i'd suggest you keep it in your percpu tree as usual and send it 
> > to Linus directly - i offered testing for the cleanup patch (and can 
> > pull your patch for such a purpose), it doesnt 'have' to go via -tip.
> 
> Can you please then pull from the following tree for testing?  I'll 
> push it to Linus after a couple of days if nothing explodes.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-linus
> 
> Thanks.

Sure - pulled it into tip:master for testing earlier today and after a 
few hours of it's looking good so far in x86 runtime tests. I also did 
cross-build testing to a dozen non-x86 architectures and it was fine 
there too.

btw., there's some 80-cols checkpatch warning artifacts in the commit:

+                       if (pcpu_extend_area_map(chunk, new_alloc) < 0) {
+                               err = "failed to extend area map of "
+                                       "reserved chunk";

which suggest that the logic here is perhaps nested a bit too deep. It 
could be improved by moving the reserved allocation branch of 
pcpu_alloc():

        if (reserved && pcpu_reserved_chunk) {

into a helper inline function, something like __pcpu_alloc_reserved().

It's a rare special case anyway. It could be changed to return with the 
pcpu_lock always taken, so the above branch would look like this:

	if (unlikely(reserved)) {
		off = __pcpu_alloc_reserved(&chunk, size, align, &err);
		if (off < 0)
			goto fail_unlock;
		goto area_found;
	}

Which is a cleaner flow IMO, and which simplifes pcpu_alloc().

And the error string should be:

		err = "failed to extend area map of reserved chunk";

(Ignore the checkpatch complaint.)

What do you think?

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-11 19:57                     ` Ingo Molnar
@ 2009-11-12 10:11                       ` Tejun Heo
  2009-11-12 10:36                         ` Ingo Molnar
  2009-11-12 11:07                         ` Ingo Molnar
  0 siblings, 2 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-12 10:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

Hello, Ingo.

11/12/2009 04:57 AM, Ingo Molnar wrote:
> Sure - pulled it into tip:master for testing earlier today and after a 
> few hours of it's looking good so far in x86 runtime tests. I also did 
> cross-build testing to a dozen non-x86 architectures and it was fine 
> there too.

Great.

> btw., there's some 80-cols checkpatch warning artifacts in the commit:
> 
> +                       if (pcpu_extend_area_map(chunk, new_alloc) < 0) {
> +                               err = "failed to extend area map of "
> +                                       "reserved chunk";
> 
> which suggest that the logic here is perhaps nested a bit too deep. It 
> could be improved by moving the reserved allocation branch of 
> pcpu_alloc():

Strange, although the line break isn't the prettiest thing, the only
checkpatch problem I can see is the following.

  > scripts/checkpatch.pl 0001-percpu-restructure-pcpu_extend_area_map-to-fix-bugs-.patch 
  ERROR: trailing whitespace
  #80: FILE: mm/percpu.c:382:
  +^Ireturn new_alloc;^I$

  total: 1 errors, 0 warnings, 179 lines checked

  0001-percpu-restructure-pcpu_extend_area_map-to-fix-bugs-.patch has style problems, please review.  If any of these errors
  are false positives report them to the maintainer, see
  CHECKPATCH in MAINTAINERS.

The patch adds a trailing tab.  I'll fix that up (I usually catch
these while using quilt but this one didn't go through quilt and I
forgot to run checkpatch).

>         if (reserved && pcpu_reserved_chunk) {
> 
> into a helper inline function, something like __pcpu_alloc_reserved().
> 
> It's a rare special case anyway. It could be changed to return with the 
> pcpu_lock always taken, so the above branch would look like this:
> 
> 	if (unlikely(reserved)) {
> 		off = __pcpu_alloc_reserved(&chunk, size, align, &err);
> 		if (off < 0)
> 			goto fail_unlock;
> 		goto area_found;
> 	}
> 
> Which is a cleaner flow IMO, and which simplifes pcpu_alloc().

Hmmm... The thing is that the nesting isn't that deep there and
breaking string in the middle is something we do quite often.  What
checkpatch warning did you see?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:11                       ` Tejun Heo
@ 2009-11-12 10:36                         ` Ingo Molnar
  2009-11-12 10:58                           ` Tejun Heo
  2009-11-12 11:07                         ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2009-11-12 10:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> Hmmm... The thing is that the nesting isn't that deep there and 
> breaking string in the middle is something we do quite often.  What 
> checkpatch warning did you see?

( i did not run checkpatch over your commit - i just assumed that the 
  ugliness was a checkpatch artifact. )

Breaking strings mid-sentence is something we try not to do. (If you 
know about places that do it 'quite often' then those places need fixing 
too.)

The biggest problem with it s that it breaks git grep. If someone sees 
this message in the log:

  PERCPU: allocation failed, size=1024 align=32, failed to extend area map of reserved chunk

and types the obvious:

  git grep 'failed to extend area map of reserved chunk'

the git-grep comes up empty because the string was needlessly broken in 
mid-sentence. Which is a confusing result and which causes people to 
waste time trying to figure out where the message came from.

The other messages in this function are fine btw, for example:

  git grep 'failed to populate'

will come up with the right place.

( There are also other reasons why we dont break strings mid-sentence - 
  it's also less readable to have it on two lines. )

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:36                         ` Ingo Molnar
@ 2009-11-12 10:58                           ` Tejun Heo
  2009-11-12 11:25                             ` Ingo Molnar
                                               ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-12 10:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

Hello, Ingo.

11/12/2009 07:36 PM, Ingo Molnar wrote:
> 
> * Tejun Heo <tj@kernel.org> wrote:
> 
>> Hmmm... The thing is that the nesting isn't that deep there and 
>> breaking string in the middle is something we do quite often.  What 
>> checkpatch warning did you see?
> 
> ( i did not run checkpatch over your commit - i just assumed that the 
>   ugliness was a checkpatch artifact. )
> 
> Breaking strings mid-sentence is something we try not to do. (If you 
> know about places that do it 'quite often' then those places need fixing 
> too.)

Oh... I do that all the time and I see a lot of them around too.

> the git-grep comes up empty because the string was needlessly broken in 
> mid-sentence. Which is a confusing result and which causes people to 
> waste time trying to figure out where the message came from.
> 
> The other messages in this function are fine btw, for example:
> 
>   git grep 'failed to populate'
> 
> will come up with the right place.

While I agree this is a valid reason, I really don't think we should
be restructuring whole code to accomodate long strings on single line.
I think a better way would be to teach grepping tool to match those
broken lines.  It shouldn't be too difficult to put this into ack[1]
and maybe we can have git-ack (that's a bad name for git tho).  I'll
ask ack author nicely.

> ( There are also other reasons why we dont break strings mid-sentence - 
>   it's also less readable to have it on two lines. )

This really depends on personal tastes.  When trying to use long
string literals, there are several choices.

1. Use broken strings.

				printk("blah blah blah blah "
				       "blah blah blah blah\n");

2. Push it into new line and unindent it.

				printk(
	"blah blah blah blah blah blah blah blah\n");

3. Restructure code so that the literal ends up in outer block.

	printk("blah blah blah blah blah blah blah blah\n");

I prefer the first choice.  The third would be nice if it's trivial to
do but I don't think it should dictate the code structure.  The second
one, I don't know.  Some people like that and grep will be happy with
it but it just seems very disturbing to my eyes.

Thanks.

-- 
tejun

[1] http://betterthangrep.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:11                       ` Tejun Heo
  2009-11-12 10:36                         ` Ingo Molnar
@ 2009-11-12 11:07                         ` Ingo Molnar
  2009-11-12 11:29                           ` Tejun Heo
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2009-11-12 11:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> >         if (reserved && pcpu_reserved_chunk) {
> > 
> > into a helper inline function, something like __pcpu_alloc_reserved().
> > 
> > It's a rare special case anyway. It could be changed to return with the 
> > pcpu_lock always taken, so the above branch would look like this:
> > 
> > 	if (unlikely(reserved)) {
> > 		off = __pcpu_alloc_reserved(&chunk, size, align, &err);
> > 		if (off < 0)
> > 			goto fail_unlock;
> > 		goto area_found;
> > 	}
> > 
> > Which is a cleaner flow IMO, and which simplifes pcpu_alloc().
> 
> Hmmm... The thing is that the nesting isn't that deep there [...]

Well, the pcpu_alloc() function is 115 lines which is a bit long. It 
does 2-3 things while a function should try to do one thing.

Putting the reserved allocation into a separate function also makes the 
'main' path of logic more visible and obstructed less by rare details.

The indentation i pinpointed is 4 levels deep:

                                err = "failed to extend area map of "
                                        "reserved chunk";

which is a bit too much IMO - the code starts in the middle of the 
screen, there's barely any space to do anything meaningful.

But there's other line wrap artifacts as well further down:

                                if (pcpu_extend_area_map(chunk,
                                                         new_alloc) < 0) {

But ... there's no hard rules here and i've seen functions where 4 
levels of indentation were just ok. Anyway, i just gave you my opinion, 
and i'm definitely more on the nitpicky side of the code quality 
equilibrium, YMMV.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:58                           ` Tejun Heo
@ 2009-11-12 11:25                             ` Ingo Molnar
  2009-11-12 14:26                             ` Oliver Neukum
  2009-11-12 15:17                             ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2009-11-12 11:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu


* Tejun Heo <tj@kernel.org> wrote:

> This really depends on personal tastes.  When trying to use long 
> string literals, there are several choices.
> 
> 1. Use broken strings.
> 
> 				printk("blah blah blah blah "
> 				       "blah blah blah blah\n");
> 
> 2. Push it into new line and unindent it.
> 
> 				printk(
> 	"blah blah blah blah blah blah blah blah\n");
> 
> 3. Restructure code so that the literal ends up in outer block.
> 
> 	printk("blah blah blah blah blah blah blah blah\n");
> 
> I prefer the first choice.  The third would be nice if it's trivial to 
> do but I don't think it should dictate the code structure.  The second 
> one, I don't know.  Some people like that and grep will be happy with 
> it but it just seems very disturbing to my eyes.

My preferred choice is:

  4. Change "blah blah blah blah blah blah blah blah\n" to "blah\n" - 
     the user really does not win much from the repetition.

Seriously, if you _ever_ get into a 'hm, the string is too long' 
situation, you should ask yourself two fundamental questions:

  a) Is the user really interested in this small novel?

  b) Is this a side-effect of a bloated function having too deep
     indentation?

IMHO, in the specific case at hand, both a) and b) apply:

                                err = "failed to extend area map of "
                                        "reserved chunk";

the indentation is a bit too deep, and the message is too chatty - the 
output itself is:

 PERCPU: allocation failed, size=1024 align=32, failed to extend area map of reserved chunk

A better/shorter message would be:

 percpu: 1024 bytes allocation failed: could not extend reserved chunk area map

This formulation is a bit shorter and i doubt the align parameter really 
needs printing - it's almost always the same and other context will tell 
us what it is anyway.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 11:07                         ` Ingo Molnar
@ 2009-11-12 11:29                           ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-12 11:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel, Yinghai Lu

Hello, Ingo.

11/12/2009 08:07 PM, Ingo Molnar wrote:
> Well, the pcpu_alloc() function is 115 lines which is a bit long. It 
> does 2-3 things while a function should try to do one thing.

I agree for low level / utility functions but for top level functions
which direct the flow of the whole logic, I usually prefer to put them
flat.  To me, that way things seem less obfuscated.

> Putting the reserved allocation into a separate function also makes the 
> 'main' path of logic more visible and obstructed less by rare details.
> 
> The indentation i pinpointed is 4 levels deep:
> 
>                                 err = "failed to extend area map of "
>                                         "reserved chunk";
> 
> which is a bit too much IMO - the code starts in the middle of the 
> screen, there's barely any space to do anything meaningful.

Well, all that's there is error exit.  Surrounding code segment is,

			if (pcpu_extend_area_map(chunk, new_alloc) < 0) {
				err = "failed to extend area map of "
					"reserved chunk";
				goto fail_unlock_mutex;
			}

So, we might as well just do

			err = "failed to extend area map of reserved chunk";
			if (pcpu_extend_area_map(chunk, new_alloc) < 0)
				goto fail_unlock_mutex;

> But there's other line wrap artifacts as well further down:
> 
>                                 if (pcpu_extend_area_map(chunk,
>                                                          new_alloc) < 0) {

This one is uglier and one level deeper than the previous one.  The
resulting nesting was one of the reasons why I factored out
pcpu_extend_area_map() as a whole and switched on the return value but
that obfuscated locking.  Although it nests quite a bit, I don't think
the loop there is too bad.  It shows what it does pretty well.

> But ... there's no hard rules here and i've seen functions where 4 
> levels of indentation were just ok. Anyway, i just gave you my opinion, 
> and i'm definitely more on the nitpicky side of the code quality 
> equilibrium, YMMV.

Indentation and code style are actually something I end up spending
quite some time on and I did think about the second one.  Factoring
out without hiding locking is a bit difficult but if I rename
new_alloc to new_len, I can fit that thing onto a single line but that
would probably require renaming matching local variable in
pcpu_extend_area_map() which will end up generating unnecessary amount
of diff obfuscating the real change.  At that point, I just thought we
could live with one slightly ugly line break.

So, I don't know.  Pros and cons on these things depend too much on
personal tastes (and even mood at the time of writing) to form uniform
standard to follow.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:58                           ` Tejun Heo
  2009-11-12 11:25                             ` Ingo Molnar
@ 2009-11-12 14:26                             ` Oliver Neukum
  2009-11-12 15:17                             ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Oliver Neukum @ 2009-11-12 14:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ingo Molnar, Linus Torvalds, Linux Kernel, Yinghai Lu

Am Donnerstag, 12. November 2009 11:58:19 schrieb Tejun Heo:

> While I agree this is a valid reason, I really don't think we should
> be restructuring whole code to accomodate long strings on single line.
> I think a better way would be to teach grepping tool to match those
> broken lines.  It shouldn't be too difficult to put this into ack[1]
> and maybe we can have git-ack (that's a bad name for git tho).  I'll
> ask ack author nicely.

There's a point where following style guidelines turns into a fetish.
Plain simple "grep" should work.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 10:58                           ` Tejun Heo
  2009-11-12 11:25                             ` Ingo Molnar
  2009-11-12 14:26                             ` Oliver Neukum
@ 2009-11-12 15:17                             ` Linus Torvalds
  2009-11-12 15:30                               ` Tejun Heo
                                                 ` (2 more replies)
  2 siblings, 3 replies; 31+ messages in thread
From: Linus Torvalds @ 2009-11-12 15:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu



On Thu, 12 Nov 2009, Tejun Heo wrote:
> 
> 11/12/2009 07:36 PM, Ingo Molnar wrote:
> > 
> > Breaking strings mid-sentence is something we try not to do. (If you 
> > know about places that do it 'quite often' then those places need fixing 
> > too.)
> 
> Oh... I do that all the time and I see a lot of them around too.

I hate them. I do greps for error messages, and it's annoying as hell if 
it's hard to find.

'checkpatch' is the major reason for them, but I think we've fixed 
checkpath long ago to not warn about long lines if they are due to a long 
string.

Strings should basically be broken up only at '\n' characters, so

	printk("This is a made-up example.\n"
		"Ok like this\n");

is fine, because you can expect to grep for "made-up example", but not 
over the newline.

				Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 15:17                             ` Linus Torvalds
@ 2009-11-12 15:30                               ` Tejun Heo
  2009-11-12 15:45                                 ` Tejun Heo
  2009-11-12 17:04                               ` Andres Baldrich
  2009-11-12 18:14                               ` Andi Kleen
  2 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-12 15:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu

Hello,

11/13/2009 12:17 AM, Linus Torvalds wrote:
> On Thu, 12 Nov 2009, Tejun Heo wrote:
>> 11/12/2009 07:36 PM, Ingo Molnar wrote:
>>> Breaking strings mid-sentence is something we try not to do. (If you 
>>> know about places that do it 'quite often' then those places need fixing 
>>> too.)
>>
>> Oh... I do that all the time and I see a lot of them around too.
> 
> I hate them. I do greps for error messages, and it's annoying as hell if 
> it's hard to find.
> 
> 'checkpatch' is the major reason for them, but I think we've fixed 
> checkpath long ago to not warn about long lines if they are due to a long 
> string.
> 
> Strings should basically be broken up only at '\n' characters, so
> 
> 	printk("This is a made-up example.\n"
> 		"Ok like this\n");
> 
> is fine, because you can expect to grep for "made-up example", but not 
> over the newline.

If the consensus is to allow long string literals to overrun 80 column
limit, I have no objection.  It makes code slightly more difficult to
read for some people but well we can't make everyone happy.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 15:30                               ` Tejun Heo
@ 2009-11-12 15:45                                 ` Tejun Heo
  2009-11-12 15:52                                   ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2009-11-12 15:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu

11/13/2009 12:30 AM, Tejun Heo wrote:
>> 'checkpatch' is the major reason for them, but I think we've fixed 
>> checkpath long ago to not warn about long lines if they are due to a long 
>> string.

It still warns...  :-(

 WARNING: line over 80 characters
 #11: FILE: mm/percpu.c:1100:
 +                               err = "failed to extend area map of reserved chunk";

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 15:45                                 ` Tejun Heo
@ 2009-11-12 15:52                                   ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2009-11-12 15:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu



On Fri, 13 Nov 2009, Tejun Heo wrote:

> 11/13/2009 12:30 AM, Tejun Heo wrote:
> >> 'checkpatch' is the major reason for them, but I think we've fixed 
> >> checkpath long ago to not warn about long lines if they are due to a long 
> >> string.
> 
> It still warns...  :-(
> 
>  WARNING: line over 80 characters
>  #11: FILE: mm/percpu.c:1100:
>  +                               err = "failed to extend area map of reserved chunk";

Ok, just ignore it. The 80-column warning has always been sh*t anyway.

It may be that the "long line" warning is suppressed only for actual 
'printk()' cases, but it may also be that it looks at the intendation and 
only suppresses it for low levels of indents. I'm allergic to perl, so I'm 
not even going to look to try to find out the details.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 15:17                             ` Linus Torvalds
  2009-11-12 15:30                               ` Tejun Heo
@ 2009-11-12 17:04                               ` Andres Baldrich
  2009-11-12 17:18                                 ` Linus Torvalds
  2009-11-12 18:14                               ` Andi Kleen
  2 siblings, 1 reply; 31+ messages in thread
From: Andres Baldrich @ 2009-11-12 17:04 UTC (permalink / raw)
  To: Tejun Heo, Linus Torvalds; +Cc: Ingo Molnar, Linux Kernel, Yinghai Lu



El jue 12-nov-09, Linus Torvalds <torvalds@linux-foundation.org> escribió:
> On Thu, 12 Nov 2009, Tejun Heo wrote:
> > 
> > 11/12/2009 07:36 PM, Ingo Molnar wrote:
> > > 
> > > Breaking strings mid-sentence is something we try
> not to do. (If you 
> > > know about places that do it 'quite often' then
> those places need fixing 
> > > too.)
> > 
> > Oh... I do that all the time and I see a lot of them
> around too.
> 
> I hate them. I do greps for error messages, and it's
> annoying as hell if 
> it's hard to find.
> 
> 'checkpatch' is the major reason for them, but I think
> we've fixed 
> checkpath long ago to not warn about long lines if they are
> due to a long 
> string.
> 
> Strings should basically be broken up only at '\n'
> characters, so
> 
>     printk("This is a made-up example.\n"
>         "Ok like this\n");
> 
> is fine, because you can expect to grep for "made-up
> example", but not 
> over the newline.
> 
>            
>     Linus


(Kernel)/Documentation/CodingStyle
line 83:
"Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right. The same applies to function headers with a long
argument list. Long strings are as well broken into shorter strings. The
only exception to this is where exceeding 80 columns significantly increases
readability and does not hide information.

void fun(int a, int b, int c)
{
	if (condition)
		printk(KERN_WARNING "Warning this is a long printk with "
						"3 parameters a: %u b: %u "
						"c: %u \n", a, b, c);
	else
		next_statement;
}"

Perhaps it should be changed?
I read the list's faq, it asked useful code or "diff" outputs for proposed changes, but i am an *absolute* beginner with < 6 months of college. 'Never' used FOSS before.
I hope it was good imput.
Andrés Baldrich   <abaldric@alu.itba.edu.ar>


      Yahoo! Cocina

Encontra las mejores recetas con Yahoo! Cocina.


http://ar.mujer.yahoo.com/cocina/


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 17:04                               ` Andres Baldrich
@ 2009-11-12 17:18                                 ` Linus Torvalds
  2009-11-12 18:04                                   ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2009-11-12 17:18 UTC (permalink / raw)
  To: Andres Baldrich; +Cc: Tejun Heo, Ingo Molnar, Linux Kernel, Yinghai Lu



On Thu, 12 Nov 2009, Andres Baldrich wrote:
> 
> (Kernel)/Documentation/CodingStyle
> line 83:

A lot of people have added code to CodingStyle. That doesn't make it 
final. For example, that 80-column thing never existed in my original 
coding style, for a reason.

I'm really inclined to just remove the stupid thing entirely both from 
coding-style and from checkpatch.

80 columns do not matter. What matters is:
 - indentation
 - complex expressions and statements

and those two issues _together_ means that 80+ columns should be damn 
rare, but the 80 columns itself is not at all that important.

Much more important than 80 columns should be the general guideline that a 
"terminal window" may be as small as 80x24. But notice how 80 is just a 
small part of that limitation - the 24 is as important as the 80. We have 
a guideline that functions should fit on a screenful or two, ie we should 
generally aim for functions to be <50 lines long.

And the 80-column thing is EXACTLY THE SAME THING. We should remember that 
people may read the code using a roughly 80x24 screen size, but the same 
way that nobody sane thinks that "24" is some hard limit on number of 
lines, why do people suddenly think that "80" is a hard limit on the 
number of columns?

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 17:18                                 ` Linus Torvalds
@ 2009-11-12 18:04                                   ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2009-11-12 18:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andres Baldrich, Tejun Heo, Linux Kernel, Yinghai Lu


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm really inclined to just remove the stupid thing entirely both from 
> coding-style and from checkpatch.
> 
> 80 columns do not matter. What matters is:
>  - indentation
>  - complex expressions and statements
> 
> and those two issues _together_ means that 80+ columns should be damn 
> rare, but the 80 columns itself is not at all that important.

i'd love to have that coupled with some check for too deep indentation. 
About half of the col-80 warnings i see are due to genuinely 
over-complex (and to-be-fixed) code.

Too bad that _those_ get ignored (because it's hard to fix) - while 
strings get broken up (because it's easy to 'fix') ;-)

So i'd very very much love to see some common-sense check for excessive 
indentation, and drop the col-80 warning altogether.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [GIT PULL] percpu fixes for 2.6.32-rc6
  2009-11-12 15:17                             ` Linus Torvalds
  2009-11-12 15:30                               ` Tejun Heo
  2009-11-12 17:04                               ` Andres Baldrich
@ 2009-11-12 18:14                               ` Andi Kleen
  2 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2009-11-12 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tejun Heo, Ingo Molnar, Linux Kernel, Yinghai Lu

Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> 'checkpatch' is the major reason for them, but I think we've fixed 
> checkpath long ago to not warn about long lines if they are due to a long 
> string.

It still warns as of 2.6.36rc6 (I hit that all the time):

WARNING: line over 80 characters
#14: FILE: kernel/sys.c:1659:
+"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [GIT PULL] percpu fixes for 2.6.32-rc6
@ 2009-11-13  3:53 Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2009-11-13  3:53 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel; +Cc: Ingo Molnar

Hello, Linus.

Please pull from the following percpu fix branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-linus

It restructures pcpu_extend_area_map() to de-obfuscate locking and
fixes the following two bugs.

* wrong return value which makes the caller continue walking a
  modified list and may lead to oops.

* possible deadlock caused by lock ordering inversion through irq.

Thanks.
---
Tejun Heo (1):
      percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability

 mm/percpu.c |  121 +++++++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 81 insertions(+), 40 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..5adfc26 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -355,62 +355,86 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 }

 /**
- * pcpu_extend_area_map - extend area map for allocation
- * @chunk: target chunk
+ * pcpu_need_to_extend - determine whether chunk area map needs to be extended
+ * @chunk: chunk of interest
  *
- * Extend area map of @chunk so that it can accomodate an allocation.
- * A single allocation can split an area into three areas, so this
- * function makes sure that @chunk->map has at least two extra slots.
+ * Determine whether area map of @chunk needs to be extended to
+ * accomodate a new allocation.
  *
  * CONTEXT:
- * pcpu_alloc_mutex, pcpu_lock.  pcpu_lock is released and reacquired
- * if area map is extended.
+ * pcpu_lock.
  *
  * RETURNS:
- * 0 if noop, 1 if successfully extended, -errno on failure.
+ * New target map allocation length if extension is necessary, 0
+ * otherwise.
  */
-static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
+static int pcpu_need_to_extend(struct pcpu_chunk *chunk)
 {
 	int new_alloc;
-	int *new;
-	size_t size;

-	/* has enough? */
 	if (chunk->map_alloc >= chunk->map_used + 2)
 		return 0;

-	spin_unlock_irqrestore(&pcpu_lock, *flags);
-
 	new_alloc = PCPU_DFL_MAP_ALLOC;
 	while (new_alloc < chunk->map_used + 2)
 		new_alloc *= 2;

-	new = pcpu_mem_alloc(new_alloc * sizeof(new[0]));
-	if (!new) {
-		spin_lock_irqsave(&pcpu_lock, *flags);
+	return new_alloc;
+}
+
+/**
+ * pcpu_extend_area_map - extend area map of a chunk
+ * @chunk: chunk of interest
+ * @new_alloc: new target allocation length of the area map
+ *
+ * Extend area map of @chunk to have @new_alloc entries.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.  Grabs and releases pcpu_lock.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc)
+{
+	int *old = NULL, *new = NULL;
+	size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
+	unsigned long flags;
+
+	new = pcpu_mem_alloc(new_size);
+	if (!new)
 		return -ENOMEM;
-	}

-	/*
-	 * Acquire pcpu_lock and switch to new area map.  Only free
-	 * could have happened inbetween, so map_used couldn't have
-	 * grown.
-	 */
-	spin_lock_irqsave(&pcpu_lock, *flags);
-	BUG_ON(new_alloc < chunk->map_used + 2);
+	/* acquire pcpu_lock and switch to new area map */
+	spin_lock_irqsave(&pcpu_lock, flags);
+
+	if (new_alloc <= chunk->map_alloc)
+		goto out_unlock;

-	size = chunk->map_alloc * sizeof(chunk->map[0]);
-	memcpy(new, chunk->map, size);
+	old_size = chunk->map_alloc * sizeof(chunk->map[0]);
+	memcpy(new, chunk->map, old_size);

 	/*
 	 * map_alloc < PCPU_DFL_MAP_ALLOC indicates that the chunk is
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+	new = NULL;
+
+out_unlock:
+	spin_unlock_irqrestore(&pcpu_lock, flags);
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called under pcpu_lock.
+	 */
+	pcpu_mem_free(old, old_size);
+	pcpu_mem_free(new, new_size);
+
 	return 0;
 }

@@ -1049,7 +1073,7 @@ static void *pcpu_alloc(size_t size, size_t align, bool reserved)
 	static int warn_limit = 10;
 	struct pcpu_chunk *chunk;
 	const char *err;
-	int slot, off;
+	int slot, off, new_alloc;
 	unsigned long flags;

 	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
@@ -1064,14 +1088,25 @@ static void *pcpu_alloc(size_t size, size_t align, bool reserved)
 	/* serve reserved allocations from the reserved chunk if available */
 	if (reserved && pcpu_reserved_chunk) {
 		chunk = pcpu_reserved_chunk;
-		if (size > chunk->contig_hint ||
-		    pcpu_extend_area_map(chunk, &flags) < 0) {
-			err = "failed to extend area map of reserved chunk";
+
+		if (size > chunk->contig_hint) {
+			err = "alloc from reserved chunk failed";
 			goto fail_unlock;
 		}
+
+		while ((new_alloc = pcpu_need_to_extend(chunk))) {
+			spin_unlock_irqrestore(&pcpu_lock, flags);
+			if (pcpu_extend_area_map(chunk, new_alloc) < 0) {
+				err = "failed to extend area map of reserved chunk";
+				goto fail_unlock_mutex;
+			}
+			spin_lock_irqsave(&pcpu_lock, flags);
+		}
+
 		off = pcpu_alloc_area(chunk, size, align);
 		if (off >= 0)
 			goto area_found;
+
 		err = "alloc from reserved chunk failed";
 		goto fail_unlock;
 	}
@@ -1083,14 +1118,20 @@ restart:
 			if (size > chunk->contig_hint)
 				continue;

-			switch (pcpu_extend_area_map(chunk, &flags)) {
-			case 0:
-				break;
-			case 1:
-				goto restart;	/* pcpu_lock dropped, restart */
-			default:
-				err = "failed to extend area map";
-				goto fail_unlock;
+			new_alloc = pcpu_need_to_extend(chunk);
+			if (new_alloc) {
+				spin_unlock_irqrestore(&pcpu_lock, flags);
+				if (pcpu_extend_area_map(chunk,
+							 new_alloc) < 0) {
+					err = "failed to extend area map";
+					goto fail_unlock_mutex;
+				}
+				spin_lock_irqsave(&pcpu_lock, flags);
+				/*
+				 * pcpu_lock has been dropped, need to
+				 * restart cpu_slot list walking.
+				 */
+				goto restart;
 			}

 			off = pcpu_alloc_area(chunk, size, align);

-- 
tejun

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2009-11-13  3:54 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-10  6:04 [GIT PULL] percpu fixes for 2.6.32-rc6 Tejun Heo
2009-11-10 17:10 ` Linus Torvalds
2009-11-10 18:33   ` Tejun Heo
2009-11-10 18:54     ` Linus Torvalds
2009-11-10 19:25       ` Tejun Heo
2009-11-10 19:37         ` Ingo Molnar
2009-11-10 19:50           ` Tejun Heo
2009-11-10 21:42             ` Linus Torvalds
2009-11-11  3:55               ` Tejun Heo
2009-11-11 11:31                 ` Ingo Molnar
2009-11-11 12:21                   ` Tejun Heo
2009-11-11 19:57                     ` Ingo Molnar
2009-11-12 10:11                       ` Tejun Heo
2009-11-12 10:36                         ` Ingo Molnar
2009-11-12 10:58                           ` Tejun Heo
2009-11-12 11:25                             ` Ingo Molnar
2009-11-12 14:26                             ` Oliver Neukum
2009-11-12 15:17                             ` Linus Torvalds
2009-11-12 15:30                               ` Tejun Heo
2009-11-12 15:45                                 ` Tejun Heo
2009-11-12 15:52                                   ` Linus Torvalds
2009-11-12 17:04                               ` Andres Baldrich
2009-11-12 17:18                                 ` Linus Torvalds
2009-11-12 18:04                                   ` Ingo Molnar
2009-11-12 18:14                               ` Andi Kleen
2009-11-12 11:07                         ` Ingo Molnar
2009-11-12 11:29                           ` Tejun Heo
2009-11-11  8:49           ` [PATCH percpu#for-linus] percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability Tejun Heo
2009-11-11 19:25             ` Linus Torvalds
2009-11-10 19:44         ` [GIT PULL] percpu fixes for 2.6.32-rc6 Tejun Heo
2009-11-13  3:53 Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.