linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* namespaces?: bug at mm/slub.c:2750
@ 2009-02-06 11:35 Eric Sesterhenn
  2009-02-07  0:15 ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Sesterhenn @ 2009-02-06 11:35 UTC (permalink / raw)
  To: linux-kernel

Hi,

with todays -git i get the following bug when i reboot the machine

[   94.369135] ------------[ cut here ]------------
[   94.369344] kernel BUG at mm/slub.c:2750!
[   94.369463] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[   94.369828] last sysfs file: /sys/devices/pnp0/00:0c/id
[   94.369952] Modules linked in:
[   94.370035] 
[   94.370035] Pid: 0, comm: swapper Not tainted
(2.6.29-rc3-00653-g8285bbf #240) System Name
[   94.370035] EIP: 0060:[<c018340b>] EFLAGS: 00010246 CPU: 0
[   94.370035] EIP is at kfree+0x3c/0xe4
[   94.370035] EAX: 00000400 EBX: c1139000 ECX: 00000000 EDX: c09fb204
[   94.370035] ESI: c114cf60 EDI: c09fb204 EBP: c0ab0f94 ESP: c0ab0f80
[   94.370035]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[   94.370035] Process swapper (pid: 0, ti=c0ab0000 task=c09f23ac
task.ti=c0a50000)
[   94.370035] Stack:
[   94.370035]  c0ab0f94 00000286 c09fb204 c014e784 00000020 c0ab0fa0
c014e79c c09fb204
[   94.370035]  c0ab0fb0 c04dca32 cede1240 00000282 c0ab0fc4 c01297d1
c012c8aa cf9e4dc0
[   94.370035]  cf9e4e20 c0ab0fd0 c0135b0d 00000000 c0ab0fe0 c015a783
00000100 00000001
[   94.370035] Call Trace:
[   94.370035]  [<c014e784>] ? free_user_ns+0x0/0x1b
[   94.370035]  [<c014e79c>] ? free_user_ns+0x18/0x1b
[   94.370035]  [<c04dca32>] ? kref_put+0x3b/0x49
[   94.370035]  [<c01297d1>] ? free_uid+0x49/0xa4
[   94.370035]  [<c012c8aa>] ? groups_free+0x31/0x35
[   94.370035]  [<c0135b0d>] ? put_cred_rcu+0x52/0x63
[   94.370035]  [<c015a783>] ? rcu_process_callbacks+0x60/0x74
[   94.370035]  [<c0125af2>] ? __do_softirq+0x6a/0xf1
[   94.370035]  [<c0125a88>] ? __do_softirq+0x0/0xf1
[   94.370035]  <IRQ> <0> [<c01598a8>] ? handle_level_irq+0x0/0xbc
[   94.370035]  [<c0125a0a>] ? irq_exit+0x3b/0x77
[   94.370035]  [<c010411d>] ? do_IRQ+0xdc/0xf3
[   94.370035]  [<c010340c>] ? common_interrupt+0x2c/0x34
[   94.370035]  [<c013007b>] ? param_array_set+0x18/0xc6
[   94.370035]  [<c051f5d2>] ? acpi_idle_enter_simple+0x167/0x1d1
[   94.370035]  [<c062c69d>] ? cpuidle_idle_call+0x57/0x8e
[   94.370035]  [<c01019eb>] ? cpu_idle+0x59/0x86
[   94.370035]  [<c077a585>] ? rest_init+0x61/0x63
[   94.370035] Code: 00 00 00 8b 1d e4 0f 03 c1 e8 fb f3 f8 ff c1 e8 0c
c1 e0 05 8d 34 03 f6 46 01 40 74 03 8b 76 0c 8b 06 84 c0 78 15 f6 c4 60
75 04 <0f> 0b eb fe 89 f0 e8 90 7a fe ff e9 90 00 00 00 8b 45 04 89 45 
[   94.370035] EIP: [<c018340b>] kfree+0x3c/0xe4 SS:ESP 0068:c0ab0f80
[   94.383497] ---[ end trace 4779637014de8de6 ]---
[   94.383620] Kernel panic - not syncing: Fatal exception in interrupt

The bug triggered is BUG_ON(!PageCompound(page)) in kfree();

(gdb) l *(free_user_ns+0x0)
0xc014e784 is in free_user_ns (kernel/user_namespace.c:64).
59	
60		return 0;
61	}
62	
63	void free_user_ns(struct kref *kref)
64	{
65		struct user_namespace *ns;
66	
67		ns = container_of(kref, struct user_namespace, kref);
68		free_uid(ns->creator);
(gdb) l *(groups_free)
0xc012c879 is in groups_free (kernel/sys.c:1160).
1155	}
1156	
1157	EXPORT_SYMBOL(groups_alloc);
1158	
1159	void groups_free(struct group_info *group_info)
1160	{
1161		if (group_info->blocks[0] != group_info->small_block) {
1162			int i;
1163			for (i = 0; i < group_info->nblocks; i++)
1164				free_page((unsigned
long)group_info->blocks[i]);


Greetings, Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-06 11:35 namespaces?: bug at mm/slub.c:2750 Eric Sesterhenn
@ 2009-02-07  0:15 ` Andrew Morton
  2009-02-09 11:53   ` Eric Sesterhenn
  2009-02-11  7:55   ` Vegard Nossum
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2009-02-07  0:15 UTC (permalink / raw)
  To: Eric Sesterhenn; +Cc: linux-kernel

On Fri, 6 Feb 2009 12:35:56 +0100
Eric Sesterhenn <snakebyte@gmx.de> wrote:

> Hi,
> 
> with todays -git i get the following bug when i reboot the machine
> 
> [   94.369135] ------------[ cut here ]------------
> [   94.369344] kernel BUG at mm/slub.c:2750!
> [   94.369463] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [   94.369828] last sysfs file: /sys/devices/pnp0/00:0c/id
> [   94.369952] Modules linked in:
> [   94.370035] 
> [   94.370035] Pid: 0, comm: swapper Not tainted
> (2.6.29-rc3-00653-g8285bbf #240) System Name
> [   94.370035] EIP: 0060:[<c018340b>] EFLAGS: 00010246 CPU: 0
> [   94.370035] EIP is at kfree+0x3c/0xe4
> [   94.370035] EAX: 00000400 EBX: c1139000 ECX: 00000000 EDX: c09fb204
> [   94.370035] ESI: c114cf60 EDI: c09fb204 EBP: c0ab0f94 ESP: c0ab0f80
> [   94.370035]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [   94.370035] Process swapper (pid: 0, ti=c0ab0000 task=c09f23ac
> task.ti=c0a50000)
> [   94.370035] Stack:
> [   94.370035]  c0ab0f94 00000286 c09fb204 c014e784 00000020 c0ab0fa0
> c014e79c c09fb204
> [   94.370035]  c0ab0fb0 c04dca32 cede1240 00000282 c0ab0fc4 c01297d1
> c012c8aa cf9e4dc0
> [   94.370035]  cf9e4e20 c0ab0fd0 c0135b0d 00000000 c0ab0fe0 c015a783
> 00000100 00000001
> [   94.370035] Call Trace:
> [   94.370035]  [<c014e784>] ? free_user_ns+0x0/0x1b
> [   94.370035]  [<c014e79c>] ? free_user_ns+0x18/0x1b
> [   94.370035]  [<c04dca32>] ? kref_put+0x3b/0x49
> [   94.370035]  [<c01297d1>] ? free_uid+0x49/0xa4
> [   94.370035]  [<c012c8aa>] ? groups_free+0x31/0x35
> [   94.370035]  [<c0135b0d>] ? put_cred_rcu+0x52/0x63
> [   94.370035]  [<c015a783>] ? rcu_process_callbacks+0x60/0x74
> [   94.370035]  [<c0125af2>] ? __do_softirq+0x6a/0xf1
> [   94.370035]  [<c0125a88>] ? __do_softirq+0x0/0xf1
> [   94.370035]  <IRQ> <0> [<c01598a8>] ? handle_level_irq+0x0/0xbc
> [   94.370035]  [<c0125a0a>] ? irq_exit+0x3b/0x77
> [   94.370035]  [<c010411d>] ? do_IRQ+0xdc/0xf3
> [   94.370035]  [<c010340c>] ? common_interrupt+0x2c/0x34
> [   94.370035]  [<c013007b>] ? param_array_set+0x18/0xc6
> [   94.370035]  [<c051f5d2>] ? acpi_idle_enter_simple+0x167/0x1d1
> [   94.370035]  [<c062c69d>] ? cpuidle_idle_call+0x57/0x8e
> [   94.370035]  [<c01019eb>] ? cpu_idle+0x59/0x86
> [   94.370035]  [<c077a585>] ? rest_init+0x61/0x63
> [   94.370035] Code: 00 00 00 8b 1d e4 0f 03 c1 e8 fb f3 f8 ff c1 e8 0c
> c1 e0 05 8d 34 03 f6 46 01 40 74 03 8b 76 0c 8b 06 84 c0 78 15 f6 c4 60
> 75 04 <0f> 0b eb fe 89 f0 e8 90 7a fe ff e9 90 00 00 00 8b 45 04 89 45 
> [   94.370035] EIP: [<c018340b>] kfree+0x3c/0xe4 SS:ESP 0068:c0ab0f80
> [   94.383497] ---[ end trace 4779637014de8de6 ]---
> [   94.383620] Kernel panic - not syncing: Fatal exception in interrupt
> 
> The bug triggered is BUG_ON(!PageCompound(page)) in kfree();
> 
> (gdb) l *(free_user_ns+0x0)
> 0xc014e784 is in free_user_ns (kernel/user_namespace.c:64).
> 59	
> 60		return 0;
> 61	}
> 62	
> 63	void free_user_ns(struct kref *kref)
> 64	{
> 65		struct user_namespace *ns;
> 66	
> 67		ns = container_of(kref, struct user_namespace, kref);
> 68		free_uid(ns->creator);
> (gdb) l *(groups_free)
> 0xc012c879 is in groups_free (kernel/sys.c:1160).
> 1155	}
> 1156	
> 1157	EXPORT_SYMBOL(groups_alloc);
> 1158	
> 1159	void groups_free(struct group_info *group_info)
> 1160	{
> 1161		if (group_info->blocks[0] != group_info->small_block) {
> 1162			int i;
> 1163			for (i = 0; i < group_info->nblocks; i++)
> 1164				free_page((unsigned
> long)group_info->blocks[i]);
> 

Well that's ugly.  We seem to have passed a non-slab address into
kfree().

void kfree(const void *x)
{
	struct page *page;
	void *object = (void *)x;

	if (unlikely(ZERO_OR_NULL_PTR(x)))
		return;

	page = virt_to_head_page(x);
	if (unlikely(!PageSlab(page))) {
		BUG_ON(!PageCompound(page));


doing BUG_ON(!PageCompound) is a rather odd way of reporting that.

I'm unsure what could have caused this.  Could you have a play around
please?  Set all the memory debug options, try using slab instead of
slub, etc?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-07  0:15 ` Andrew Morton
@ 2009-02-09 11:53   ` Eric Sesterhenn
  2009-02-11  7:55   ` Vegard Nossum
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Sesterhenn @ 2009-02-09 11:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Fri, 6 Feb 2009 12:35:56 +0100
> Eric Sesterhenn <snakebyte@gmx.de> wrote:
> 
> > Hi,
> > 
> > with todays -git i get the following bug when i reboot the machine
> > 
> > [   94.369135] ------------[ cut here ]------------
> > [   94.369344] kernel BUG at mm/slub.c:2750!
> > [   94.369463] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> > [   94.369828] last sysfs file: /sys/devices/pnp0/00:0c/id
> > [   94.369952] Modules linked in:
> > [   94.370035] 
> > [   94.370035] Pid: 0, comm: swapper Not tainted
> > (2.6.29-rc3-00653-g8285bbf #240) System Name
> > [   94.370035] EIP: 0060:[<c018340b>] EFLAGS: 00010246 CPU: 0
> > [   94.370035] EIP is at kfree+0x3c/0xe4
> > [   94.370035] EAX: 00000400 EBX: c1139000 ECX: 00000000 EDX: c09fb204
> > [   94.370035] ESI: c114cf60 EDI: c09fb204 EBP: c0ab0f94 ESP: c0ab0f80
> > [   94.370035]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [   94.370035] Process swapper (pid: 0, ti=c0ab0000 task=c09f23ac
> > task.ti=c0a50000)
> > [   94.370035] Stack:
> > [   94.370035]  c0ab0f94 00000286 c09fb204 c014e784 00000020 c0ab0fa0
> > c014e79c c09fb204
> > [   94.370035]  c0ab0fb0 c04dca32 cede1240 00000282 c0ab0fc4 c01297d1
> > c012c8aa cf9e4dc0
> > [   94.370035]  cf9e4e20 c0ab0fd0 c0135b0d 00000000 c0ab0fe0 c015a783
> > 00000100 00000001
> > [   94.370035] Call Trace:
> > [   94.370035]  [<c014e784>] ? free_user_ns+0x0/0x1b
> > [   94.370035]  [<c014e79c>] ? free_user_ns+0x18/0x1b
> > [   94.370035]  [<c04dca32>] ? kref_put+0x3b/0x49
> > [   94.370035]  [<c01297d1>] ? free_uid+0x49/0xa4
> > [   94.370035]  [<c012c8aa>] ? groups_free+0x31/0x35
> > [   94.370035]  [<c0135b0d>] ? put_cred_rcu+0x52/0x63
> > [   94.370035]  [<c015a783>] ? rcu_process_callbacks+0x60/0x74
> > [   94.370035]  [<c0125af2>] ? __do_softirq+0x6a/0xf1
> > [   94.370035]  [<c0125a88>] ? __do_softirq+0x0/0xf1
> > [   94.370035]  <IRQ> <0> [<c01598a8>] ? handle_level_irq+0x0/0xbc
> > [   94.370035]  [<c0125a0a>] ? irq_exit+0x3b/0x77
> > [   94.370035]  [<c010411d>] ? do_IRQ+0xdc/0xf3
> > [   94.370035]  [<c010340c>] ? common_interrupt+0x2c/0x34
> > [   94.370035]  [<c013007b>] ? param_array_set+0x18/0xc6
> > [   94.370035]  [<c051f5d2>] ? acpi_idle_enter_simple+0x167/0x1d1
> > [   94.370035]  [<c062c69d>] ? cpuidle_idle_call+0x57/0x8e
> > [   94.370035]  [<c01019eb>] ? cpu_idle+0x59/0x86
> > [   94.370035]  [<c077a585>] ? rest_init+0x61/0x63
> > [   94.370035] Code: 00 00 00 8b 1d e4 0f 03 c1 e8 fb f3 f8 ff c1 e8 0c
> > c1 e0 05 8d 34 03 f6 46 01 40 74 03 8b 76 0c 8b 06 84 c0 78 15 f6 c4 60
> > 75 04 <0f> 0b eb fe 89 f0 e8 90 7a fe ff e9 90 00 00 00 8b 45 04 89 45 
> > [   94.370035] EIP: [<c018340b>] kfree+0x3c/0xe4 SS:ESP 0068:c0ab0f80
> > [   94.383497] ---[ end trace 4779637014de8de6 ]---
> > [   94.383620] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > The bug triggered is BUG_ON(!PageCompound(page)) in kfree();
> > 
> > (gdb) l *(free_user_ns+0x0)
> > 0xc014e784 is in free_user_ns (kernel/user_namespace.c:64).
> > 59	
> > 60		return 0;
> > 61	}
> > 62	
> > 63	void free_user_ns(struct kref *kref)
> > 64	{
> > 65		struct user_namespace *ns;
> > 66	
> > 67		ns = container_of(kref, struct user_namespace, kref);
> > 68		free_uid(ns->creator);
> > (gdb) l *(groups_free)
> > 0xc012c879 is in groups_free (kernel/sys.c:1160).
> > 1155	}
> > 1156	
> > 1157	EXPORT_SYMBOL(groups_alloc);
> > 1158	
> > 1159	void groups_free(struct group_info *group_info)
> > 1160	{
> > 1161		if (group_info->blocks[0] != group_info->small_block) {
> > 1162			int i;
> > 1163			for (i = 0; i < group_info->nblocks; i++)
> > 1164				free_page((unsigned
> > long)group_info->blocks[i]);
> > 
> 
> Well that's ugly.  We seem to have passed a non-slab address into
> kfree().
> 
> void kfree(const void *x)
> {
> 	struct page *page;
> 	void *object = (void *)x;
> 
> 	if (unlikely(ZERO_OR_NULL_PTR(x)))
> 		return;
> 
> 	page = virt_to_head_page(x);
> 	if (unlikely(!PageSlab(page))) {
> 		BUG_ON(!PageCompound(page));
> 
> 
> doing BUG_ON(!PageCompound) is a rather odd way of reporting that.
> 
> I'm unsure what could have caused this.  Could you have a play around
> please?  Set all the memory debug options, try using slab instead of
> slub, etc

Short story, I cant reproduce this anymore.

Long story:

I tried bisecting this. I tested each kernel with at least two
reboots, the ones marked bad paniced on both, all the good
ones rebooted cleanly.

root@whiterabbit:/usr/src/linux# git-bisect log
git-bisect start
# good: [97499aafffd20f018d719acc5ed73cf65705784d] Merge branch 'master' of
# git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
git-bisect good 97499aafffd20f018d719acc5ed73cf65705784d
# bad: [695874d11965eee158e9bf45807a7a2db3f652c9] Merge branch 'master' of
# git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
git-bisect bad 695874d11965eee158e9bf45807a7a2db3f652c9
# good: [67e70baf043cfdcdaf5972bc94be82632071536b] V4L/DVB (10411): s5h1409:
# Perform s5h1409 soft reset after tuning
git-bisect good 67e70baf043cfdcdaf5972bc94be82632071536b
# good: [5c350d93ff4736086a1b08fef1d0b5e22138d2e0] Merge branch 'for-linus' of
# git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
git-bisect good 5c350d93ff4736086a1b08fef1d0b5e22138d2e0
# bad: [93bfbd71db4d2e01c05e219f285249a74808b1d4] Merge branch 'merge' of
# git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
git-bisect bad 93bfbd71db4d2e01c05e219f285249a74808b1d4
# bad: [31c952dcf83d5b0fd57b514cbe8a1664647c26e7] Merge branch
# 'sched-fixes-for-linus' of
# git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git-bisect bad 31c952dcf83d5b0fd57b514cbe8a1664647c26e7
# good: [a9f3e2b549f83a9cdab873abf4140be27c05a3f2] sched: clear buddies more
# aggressively
git-bisect good a9f3e2b549f83a9cdab873abf4140be27c05a3f2
# good: [3d398703ef06fd97b4c28c86b580546d5b57e7b7] sched_rt: don't use
# first_cpu on cpumask created with cpumask_and
git-bisect good 3d398703ef06fd97b4c28c86b580546d5b57e7b7
# good: [9e6235e997bf091326b2f3ac92217c2ac2e27eb5] Merge branch 'for_linus' of
# git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git-bisect good 9e6235e997bf091326b2f3ac92217c2ac2e27eb5

root@whiterabbit:/usr/src/linux# git-bisect good
31c952dcf83d5b0fd57b514cbe8a1664647c26e7 is first bad commit

root@whiterabbit:/usr/src/linux# git-show 31c952dcf83d5b0fd57b514cbe8a1664647c26e7
commit 31c952dcf83d5b0fd57b514cbe8a1664647c26e7
Merge: 9e6235e... 3d39870...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Feb 2 19:26:29 2009 -0800

    Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
    
    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      sched_rt: don't use first_cpu on cpumask created with cpumask_and
      sched: fix buddie group latency
      sched: clear buddies more aggressively
      sched: symmetric sync vs avg_overlap
      sched: fix sync wakeups
      cpuset: fix possible deadlock in async_rebuild_sched_domains


This seemed bogus to me, as far as i understand git, the merge commits
do not actually change the code, they only server as a reference
that other commits where pulled in.

Tried a git-reset --hard 31c952dcf83d5b0fd57b514cbe8a1664647c26e7
to get to the always failing version again. Which turned out to
fail in the last of three reboots... :| With dmesg -n 8 I didnt get more
information out of this.

I enabled slab on this 2.6.29-rc3-00410-g31c952d and tested 10 
reboots. None showed anything strange :(

After this I tried with SLOB, but it often hang
during boot after

[    0.762297] Block layer SCSI generic (bsg) driver version 0.4 loaded (major
253)
[    0.762556] io scheduler noop registered
[    0.762662] io scheduler cfq registered (default)
[    0.767357] PCI: VIA PCI bridge detected. Disabling DAC.

So I checked out a fresh tree (2.6.29-rc3-00742-ge83102c) and build it
with the same config and SLUB enabled again. Just to see
if this was still reproducible. 16 reboots later
I didnt see this again. I then tested 2.6.29-rc4-00001-gd5b5623
but saw nothing in 13 reboots. Noticed CONFIG_SLUB_DEBUG_ON=n
and set it to y, tried another 12 boots and nothing happened.

Based on this I assume this was just a somehow miscompiled
kernel since being really really hard to reproduce on some kernels,
while always reproducible on others sounds somewhat strange.

Greetings, Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-07  0:15 ` Andrew Morton
  2009-02-09 11:53   ` Eric Sesterhenn
@ 2009-02-11  7:55   ` Vegard Nossum
  2009-02-11  8:07     ` Andrew Morton
  1 sibling, 1 reply; 14+ messages in thread
From: Vegard Nossum @ 2009-02-11  7:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Eric Sesterhenn, linux-kernel

On Sat, Feb 7, 2009 at 1:15 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 6 Feb 2009 12:35:56 +0100
> Eric Sesterhenn <snakebyte@gmx.de> wrote:
>>
>> with todays -git i get the following bug when i reboot the machine
>>
>> [   94.369135] ------------[ cut here ]------------
>> [   94.369344] kernel BUG at mm/slub.c:2750!

[...]

> Well that's ugly.  We seem to have passed a non-slab address into
> kfree().
>
> void kfree(const void *x)
> {
>        struct page *page;
>        void *object = (void *)x;
>
>        if (unlikely(ZERO_OR_NULL_PTR(x)))
>                return;
>
>        page = virt_to_head_page(x);
>        if (unlikely(!PageSlab(page))) {
>                BUG_ON(!PageCompound(page));
>
>
> doing BUG_ON(!PageCompound) is a rather odd way of reporting that.
>

This might be pointing out the obvious, but page allocator
pass-through means that we can't really tell slab pages from page
allocator pages. But since pass-through uses GFP_COMP, we know it'd be
a bug to pass a non-compound page into this function.

> I'm unsure what could have caused this.  Could you have a play around
> please?  Set all the memory debug options, try using slab instead of
> slub, etc?

I think it is trying to free the init_user_ns (because of a
refcounting bug). I am able to reproduce it with a clean stack trace:

[  648.864710] ------------[ cut here ]------------
[  648.865554] kernel BUG at mm/slub.c:2750!
[  648.865554] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  648.865554] last sysfs file: /sys/devices/pnp0/00:0d/id
[  648.865554] CPU 1
[  648.865554] Modules linked in:
[  648.865554] Pid: 917, comm: udevd Not tainted 2.6.29-rc3 #223
[  648.865554] RIP: 0010:[<ffffffff810b99c9>]  [<ffffffff810b99c9>]
kfree+0x29/0x87
[  648.865554] RSP: 0018:ffff88003f187e28  EFLAGS: 00010246
[  648.865554] RAX: 0100000000000400 RBX: ffffffff8171fe00 RCX: 0000000000000086
[  648.865554] RDX: ffffe20000050ec8 RSI: 0000000000000085 RDI: ffffe20000050ec8
[  648.865554] RBP: ffff88003f187e38 R08: 0000000000000000 R09: ffffffff81819cb0
[  648.865554] R10: 0000000000000002 R11: ffff88003f187e98 R12: ffffffff81072144
[  648.865554] R13: 00000000000000cf R14: ffffffff818b13e0 R15: 000000000000000a
[  648.865554] FS:  0000000000000000(0000) GS:ffff88003f156f80(0063)
knlGS:00000000f7fac700
[  648.865554] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  648.865554] CR2: 0000000009b2d630 CR3: 000000003e92c000 CR4: 00000000000006a0
[  648.865554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  648.865554] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  648.865554] Process udevd (pid: 917, threadinfo ffff88003e458000,
task ffff88003f1b8e80)
[  648.865554] Stack:
[  648.865554]  ffffffff8171fe00 ffffffff81072144 ffff88003f187e58
ffffffff81072169
[  648.865554]  ffffffff818b13e0 ffffffff8171fe00 ffff88003f187e78
ffffffff811b2f68
[  648.865554]  ffff88003dd4c500 0000000000000286 ffff88003f187e98
ffffffff81046712
[  648.865554] Call Trace:
[  648.865554]  <IRQ> <0> [<ffffffff81072144>] ? free_user_ns+0x0/0x29
[  648.865554]  [<ffffffff81072169>] free_user_ns+0x25/0x29
[  648.865554]  [<ffffffff811b2f68>] kref_put+0x66/0x72
[  648.865554]  [<ffffffff81046712>] free_uid+0x4f/0x90
[  648.865554]  [<ffffffff810555b8>] put_cred_rcu+0xa5/0xb9
[  648.865554]  [<ffffffff8107ea86>] __rcu_process_callbacks+0x1f4/0x263
[  648.865554]  [<ffffffff8107eb2c>] rcu_process_callbacks+0x37/0x5d
[  648.865554]  [<ffffffff81041bd3>] __do_softirq+0x88/0x149
[  648.865554]  [<ffffffff8100d53c>] call_softirq+0x1c/0x28
[  648.865554]  [<ffffffff8100e4e9>] do_softirq+0x39/0x7b
[  648.865554]  [<ffffffff81041946>] irq_exit+0x44/0x7e
[  648.865554]  [<ffffffff814c3d8f>] smp_apic_timer_interrupt+0x98/0xb1
[  648.865554]  [<ffffffff8100cf73>] apic_timer_interrupt+0x13/0x20
[  648.865554]  <EOI> <0>Code: c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00
00 48 83 ff 10 48 89 fb 7
6 6d e8 1d ed ff ff 48 89 c7 48 8b 00 84 c0 78 10 f6 c4 60 75 04 <0f>
0b eb fe e8 f4 cd fd ff e
b 4e 48 8b 4d 08 4c 8b 4f 10 9c 41
[  648.865554] RIP  [<ffffffff810b99c9>] kfree+0x29/0x87
[  648.865554]  RSP <ffff88003f187e28>
[  649.131912] ---[ end trace a48f49f69d2bcb3e ]---
[  649.136840] Kernel panic - not syncing: Fatal exception in interrupt
[  649.143558] Rebooting in 10 seconds..

And if you look closely in the stack dump, you will find init_user_ns:

ffffffff8171fe00 D init_user_ns

In case you missed it, KOSAKI Motohiro posted a similar stack-trace
(but not the same BUG) in this thread:
http://lkml.org/lkml/2009/2/10/7


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11  7:55   ` Vegard Nossum
@ 2009-02-11  8:07     ` Andrew Morton
  2009-02-11 10:48       ` Vegard Nossum
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2009-02-11  8:07 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Eric Sesterhenn, linux-kernel, David Howells, KOSAKI Motohiro,
	containers

(cc's added)

On Wed, 11 Feb 2009 08:55:08 +0100 Vegard Nossum <vegard.nossum@gmail.com> wrote:

> On Sat, Feb 7, 2009 at 1:15 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Fri, 6 Feb 2009 12:35:56 +0100
> > Eric Sesterhenn <snakebyte@gmx.de> wrote:
> >>
> >> with todays -git i get the following bug when i reboot the machine
> >>
> >> [   94.369135] ------------[ cut here ]------------
> >> [   94.369344] kernel BUG at mm/slub.c:2750!
> 
> [...]
> 
> > Well that's ugly.  We seem to have passed a non-slab address into
> > kfree().
> >
> > void kfree(const void *x)
> > {
> >        struct page *page;
> >        void *object = (void *)x;
> >
> >        if (unlikely(ZERO_OR_NULL_PTR(x)))
> >                return;
> >
> >        page = virt_to_head_page(x);
> >        if (unlikely(!PageSlab(page))) {
> >                BUG_ON(!PageCompound(page));
> >
> >
> > doing BUG_ON(!PageCompound) is a rather odd way of reporting that.
> >
> 
> This might be pointing out the obvious, but page allocator
> pass-through means that we can't really tell slab pages from page
> allocator pages. But since pass-through uses GFP_COMP, we know it'd be
> a bug to pass a non-compound page into this function.
> 
> > I'm unsure what could have caused this.  Could you have a play around
> > please?  Set all the memory debug options, try using slab instead of
> > slub, etc?
> 
> I think it is trying to free the init_user_ns (because of a
> refcounting bug). I am able to reproduce it with a clean stack trace:
> 
> [  648.864710] ------------[ cut here ]------------
> [  648.865554] kernel BUG at mm/slub.c:2750!
> [  648.865554] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [  648.865554] last sysfs file: /sys/devices/pnp0/00:0d/id
> [  648.865554] CPU 1
> [  648.865554] Modules linked in:
> [  648.865554] Pid: 917, comm: udevd Not tainted 2.6.29-rc3 #223
> [  648.865554] RIP: 0010:[<ffffffff810b99c9>]  [<ffffffff810b99c9>]
> kfree+0x29/0x87
> [  648.865554] RSP: 0018:ffff88003f187e28  EFLAGS: 00010246
> [  648.865554] RAX: 0100000000000400 RBX: ffffffff8171fe00 RCX: 0000000000000086
> [  648.865554] RDX: ffffe20000050ec8 RSI: 0000000000000085 RDI: ffffe20000050ec8
> [  648.865554] RBP: ffff88003f187e38 R08: 0000000000000000 R09: ffffffff81819cb0
> [  648.865554] R10: 0000000000000002 R11: ffff88003f187e98 R12: ffffffff81072144
> [  648.865554] R13: 00000000000000cf R14: ffffffff818b13e0 R15: 000000000000000a
> [  648.865554] FS:  0000000000000000(0000) GS:ffff88003f156f80(0063)
> knlGS:00000000f7fac700
> [  648.865554] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [  648.865554] CR2: 0000000009b2d630 CR3: 000000003e92c000 CR4: 00000000000006a0
> [  648.865554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  648.865554] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  648.865554] Process udevd (pid: 917, threadinfo ffff88003e458000,
> task ffff88003f1b8e80)
> [  648.865554] Stack:
> [  648.865554]  ffffffff8171fe00 ffffffff81072144 ffff88003f187e58
> ffffffff81072169
> [  648.865554]  ffffffff818b13e0 ffffffff8171fe00 ffff88003f187e78
> ffffffff811b2f68
> [  648.865554]  ffff88003dd4c500 0000000000000286 ffff88003f187e98
> ffffffff81046712
> [  648.865554] Call Trace:
> [  648.865554]  <IRQ> <0> [<ffffffff81072144>] ? free_user_ns+0x0/0x29
> [  648.865554]  [<ffffffff81072169>] free_user_ns+0x25/0x29
> [  648.865554]  [<ffffffff811b2f68>] kref_put+0x66/0x72
> [  648.865554]  [<ffffffff81046712>] free_uid+0x4f/0x90
> [  648.865554]  [<ffffffff810555b8>] put_cred_rcu+0xa5/0xb9
> [  648.865554]  [<ffffffff8107ea86>] __rcu_process_callbacks+0x1f4/0x263
> [  648.865554]  [<ffffffff8107eb2c>] rcu_process_callbacks+0x37/0x5d
> [  648.865554]  [<ffffffff81041bd3>] __do_softirq+0x88/0x149
> [  648.865554]  [<ffffffff8100d53c>] call_softirq+0x1c/0x28
> [  648.865554]  [<ffffffff8100e4e9>] do_softirq+0x39/0x7b
> [  648.865554]  [<ffffffff81041946>] irq_exit+0x44/0x7e
> [  648.865554]  [<ffffffff814c3d8f>] smp_apic_timer_interrupt+0x98/0xb1
> [  648.865554]  [<ffffffff8100cf73>] apic_timer_interrupt+0x13/0x20
> [  648.865554]  <EOI> <0>Code: c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00
> 00 48 83 ff 10 48 89 fb 7
> 6 6d e8 1d ed ff ff 48 89 c7 48 8b 00 84 c0 78 10 f6 c4 60 75 04 <0f>
> 0b eb fe e8 f4 cd fd ff e
> b 4e 48 8b 4d 08 4c 8b 4f 10 9c 41
> [  648.865554] RIP  [<ffffffff810b99c9>] kfree+0x29/0x87
> [  648.865554]  RSP <ffff88003f187e28>
> [  649.131912] ---[ end trace a48f49f69d2bcb3e ]---
> [  649.136840] Kernel panic - not syncing: Fatal exception in interrupt
> [  649.143558] Rebooting in 10 seconds..
> 
> And if you look closely in the stack dump, you will find init_user_ns:
> 
> ffffffff8171fe00 D init_user_ns
> 
> In case you missed it, KOSAKI Motohiro posted a similar stack-trace
> (but not the same BUG) in this thread:
> http://lkml.org/lkml/2009/2/10/7
> 

Both traces include the newly-added put_cred_rcu().  Suspicious.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11  8:07     ` Andrew Morton
@ 2009-02-11 10:48       ` Vegard Nossum
  2009-02-11 16:37         ` Serge E. Hallyn
  2009-02-11 17:02         ` David Howells
  0 siblings, 2 replies; 14+ messages in thread
From: Vegard Nossum @ 2009-02-11 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Sesterhenn, linux-kernel, David Howells, KOSAKI Motohiro,
	containers

On Wed, Feb 11, 2009 at 9:07 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>> In case you missed it, KOSAKI Motohiro posted a similar stack-trace
>> (but not the same BUG) in this thread:
>> http://lkml.org/lkml/2009/2/10/7
>>
>
> Both traces include the newly-added put_cred_rcu().  Suspicious.
>

I have this test case which triggers it regularly after some minutes:

#include <sys/wait.h>
#include <sys/types.h>

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
        unsigned int i;

        if (fork() == 0) {
                while (1)
                        system("echo -n .");
        }

        for (i = 0; i < 2; ++i) {
                if (fork() == 0) {
                        while (1) {
                                setreuid(0, 0xcafeba);
                                setreuid(0xcafeba, 0);

                                setreuid(0, 0xcafebb);
                                setreuid(0xcafebb, 0);
                        }

                        exit(EXIT_SUCCESS);
                }
        }

        while (!(wait(NULL) == -1 && errno == ECHILD))
                ;

        return 0;
}

It seems to be the combination of exec() and setreuid(), but I
couldn't get it to work with just exec() instead of system(). It is
possible that CONFIG_USER_SCHED must be =y for this to work. It can
probably be simplified too...


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 10:48       ` Vegard Nossum
@ 2009-02-11 16:37         ` Serge E. Hallyn
  2009-02-11 17:02         ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-02-11 16:37 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Andrew Morton, David Howells, Eric Sesterhenn, containers,
	linux-kernel, Dhaval Giani, Peter Zijlstra

Quoting Vegard Nossum (vegard.nossum@gmail.com):
> On Wed, Feb 11, 2009 at 9:07 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >> In case you missed it, KOSAKI Motohiro posted a similar stack-trace
> >> (but not the same BUG) in this thread:
> >> http://lkml.org/lkml/2009/2/10/7
> >>
> >
> > Both traces include the newly-added put_cred_rcu().  Suspicious.
> >
> 
> I have this test case which triggers it regularly after some minutes:
> 
> #include <sys/wait.h>
> #include <sys/types.h>
> 
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> 
> int main(int argc, char* argv[])
> {
>         unsigned int i;
> 
>         if (fork() == 0) {
>                 while (1)
>                         system("echo -n .");
>         }
> 
>         for (i = 0; i < 2; ++i) {
>                 if (fork() == 0) {
>                         while (1) {
>                                 setreuid(0, 0xcafeba);
>                                 setreuid(0xcafeba, 0);
> 
>                                 setreuid(0, 0xcafebb);
>                                 setreuid(0xcafebb, 0);
>                         }
> 
>                         exit(EXIT_SUCCESS);
>                 }
>         }
> 
>         while (!(wait(NULL) == -1 && errno == ECHILD))
>                 ;
> 
>         return 0;
> }
> 
> It seems to be the combination of exec() and setreuid(), but I
> couldn't get it to work with just exec() instead of system(). It is
> possible that CONFIG_USER_SCHED must be =y for this to work. It can
> probably be simplified too...
> 
> 
> Vegard

I haven't yet been able to reproduce it, but I'm wondering whether
something like the following is needed.

Note that free_user() still seems wrong there, as it won't call
uid_hash_remove(up) for a user which is not inthe init_user_ns at
all, right?


>From 2bc9da9728e0a4242bd8b1362d34e8d425be1b66 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@us.ibm.com>
Date: Wed, 11 Feb 2009 11:29:52 -0500
Subject: [PATCH 1/1] user namespaces: only put the userns when we unhash the uid

uids in namespaces other than init don't get a sysfs entry.

For those in the init namespace, while we're waiting to remove
the sysfs entry for the uid the uid is still hashed, and
alloc_uid() may re-grab that uid without getting a new
reference to the user_ns, which we've already put in free_user
before scheduling remove_user_sysfs_dir().

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
---
 kernel/user.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 477b666..bb401a5 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -71,6 +71,7 @@ static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
 
 static void uid_hash_remove(struct user_struct *up)
 {
+	put_user_ns(up->user_ns);
 	hlist_del_init(&up->uidhash_node);
 }
 
@@ -334,7 +335,6 @@ static void free_user(struct user_struct *up, unsigned long flags)
 	atomic_inc(&up->__count);
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 
-	put_user_ns(up->user_ns);
 	INIT_WORK(&up->work, remove_user_sysfs_dir);
 	schedule_work(&up->work);
 }
@@ -357,7 +357,6 @@ static void free_user(struct user_struct *up, unsigned long flags)
 	sched_destroy_user(up);
 	key_put(up->uid_keyring);
 	key_put(up->session_keyring);
-	put_user_ns(up->user_ns);
 	kmem_cache_free(uid_cachep, up);
 }
 
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 10:48       ` Vegard Nossum
  2009-02-11 16:37         ` Serge E. Hallyn
@ 2009-02-11 17:02         ` David Howells
  2009-02-11 17:24           ` Serge E. Hallyn
  2009-02-11 18:03           ` David Howells
  1 sibling, 2 replies; 14+ messages in thread
From: David Howells @ 2009-02-11 17:02 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: dhowells, Vegard Nossum, Andrew Morton, Eric Sesterhenn,
	containers, linux-kernel, Dhaval Giani, Peter Zijlstra

Serge E. Hallyn <serue@us.ibm.com> wrote:

>  static void uid_hash_remove(struct user_struct *up)
>  {
> +	put_user_ns(up->user_ns);
>  	hlist_del_init(&up->uidhash_node);
>  }

Don't you need to do the hlist_del_init() first?  Otherwise, mightn't the
put_user_ns() cause the namespace to be freed before hlist_del_init() removes
the user_struct from it?

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 17:02         ` David Howells
@ 2009-02-11 17:24           ` Serge E. Hallyn
  2009-02-11 18:00             ` Vegard Nossum
  2009-02-11 18:03           ` David Howells
  1 sibling, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2009-02-11 17:24 UTC (permalink / raw)
  To: David Howells
  Cc: Vegard Nossum, Andrew Morton, Eric Sesterhenn, containers,
	linux-kernel, Dhaval Giani, Peter Zijlstra

Quoting David Howells (dhowells@redhat.com):
> Serge E. Hallyn <serue@us.ibm.com> wrote:
> 
> >  static void uid_hash_remove(struct user_struct *up)
> >  {
> > +	put_user_ns(up->user_ns);
> >  	hlist_del_init(&up->uidhash_node);
> >  }
> 
> Don't you need to do the hlist_del_init() first?  Otherwise, mightn't the
> put_user_ns() cause the namespace to be freed before hlist_del_init() removes
> the user_struct from it?

It's called under uidhash_lock spinlock so should be ok, but in
principle you're right so it's probably a good idea.

The main point is that without this patch, put_user_ns is done before
the hlist_del_init and *not* atomically under uidhash_lock.

thanks,
-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 17:24           ` Serge E. Hallyn
@ 2009-02-11 18:00             ` Vegard Nossum
  0 siblings, 0 replies; 14+ messages in thread
From: Vegard Nossum @ 2009-02-11 18:00 UTC (permalink / raw)
  To: Serge E. Hallyn, KOSAKI Motohiro
  Cc: David Howells, Andrew Morton, Eric Sesterhenn, containers,
	linux-kernel, Dhaval Giani, Peter Zijlstra

On Wed, Feb 11, 2009 at 6:24 PM, Serge E. Hallyn <serue@us.ibm.com> wrote:
> Quoting David Howells (dhowells@redhat.com):
>> Serge E. Hallyn <serue@us.ibm.com> wrote:
>>
>> >  static void uid_hash_remove(struct user_struct *up)
>> >  {
>> > +   put_user_ns(up->user_ns);
>> >     hlist_del_init(&up->uidhash_node);
>> >  }
>>
>> Don't you need to do the hlist_del_init() first?  Otherwise, mightn't the
>> put_user_ns() cause the namespace to be freed before hlist_del_init() removes
>> the user_struct from it?
>
> It's called under uidhash_lock spinlock so should be ok, but in
> principle you're right so it's probably a good idea.
>
> The main point is that without this patch, put_user_ns is done before
> the hlist_del_init and *not* atomically under uidhash_lock.

Congrats, your (unmodified) patch made it through the first 20 minutes
of testing! :-D

(In comparison, the unpatched kernel would usually crash after ~3 minutes)

I wonder why you couldn't reproduce it, though.

KOSAKI Motohiro: You might want to see if this patch helps too. It is
here: http://lkml.org/lkml/2009/2/11/251


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 17:02         ` David Howells
  2009-02-11 17:24           ` Serge E. Hallyn
@ 2009-02-11 18:03           ` David Howells
  2009-02-11 19:38             ` Serge E. Hallyn
  2009-02-11 20:42             ` David Howells
  1 sibling, 2 replies; 14+ messages in thread
From: David Howells @ 2009-02-11 18:03 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: dhowells, Vegard Nossum, Andrew Morton, Eric Sesterhenn,
	containers, linux-kernel, Dhaval Giani, Peter Zijlstra

Serge E. Hallyn <serue@us.ibm.com> wrote:

> It's called under uidhash_lock spinlock so should be ok, but in
> principle you're right so it's probably a good idea.

The lock is nothing to do with it.  put_user_ns() may call kfree() on the
user_namespace, but the user_struct given to uid_hash_remove() may still be
attached to it.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 18:03           ` David Howells
@ 2009-02-11 19:38             ` Serge E. Hallyn
  2009-02-11 20:42             ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-02-11 19:38 UTC (permalink / raw)
  To: David Howells
  Cc: Vegard Nossum, Andrew Morton, Eric Sesterhenn, containers,
	linux-kernel, Dhaval Giani, Peter Zijlstra

Quoting David Howells (dhowells@redhat.com):
> Serge E. Hallyn <serue@us.ibm.com> wrote:
> 
> > It's called under uidhash_lock spinlock so should be ok, but in
> > principle you're right so it's probably a good idea.
> 
> The lock is nothing to do with it.  put_user_ns() may call kfree() on the
> user_namespace, but the user_struct given to uid_hash_remove() may still be
> attached to it.

Yes, but noone will pull the user_struct off the list without
taking the lock.

what am I missing?

Anyway, I do like swapping the lines (as below) better.

-serge


>From 8b83d11023c1064e99bffae3c2a05580b915de60 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@us.ibm.com>
Date: Wed, 11 Feb 2009 11:29:52 -0500
Subject: [PATCH 1/1] user namespaces: only put the userns when we unhash the uid

uids in namespaces other than init don't get a sysfs entry.

For those in the init namespace, while we're waiting to remove
the sysfs entry for the uid the uid is still hashed, and
alloc_uid() may re-grab that uid without getting a new
reference to the user_ns, which we've already put in free_user
before scheduling remove_user_sysfs_dir().

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
---
 kernel/user.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 477b666..3551ac7 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -72,6 +72,7 @@ static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
 static void uid_hash_remove(struct user_struct *up)
 {
 	hlist_del_init(&up->uidhash_node);
+	put_user_ns(up->user_ns);
 }
 
 static struct user_struct *uid_hash_find(uid_t uid, struct hlist_head *hashent)
@@ -334,7 +335,6 @@ static void free_user(struct user_struct *up, unsigned long flags)
 	atomic_inc(&up->__count);
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 
-	put_user_ns(up->user_ns);
 	INIT_WORK(&up->work, remove_user_sysfs_dir);
 	schedule_work(&up->work);
 }
@@ -357,7 +357,6 @@ static void free_user(struct user_struct *up, unsigned long flags)
 	sched_destroy_user(up);
 	key_put(up->uid_keyring);
 	key_put(up->session_keyring);
-	put_user_ns(up->user_ns);
 	kmem_cache_free(uid_cachep, up);
 }
 
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 18:03           ` David Howells
  2009-02-11 19:38             ` Serge E. Hallyn
@ 2009-02-11 20:42             ` David Howells
  2009-02-11 21:21               ` Serge E. Hallyn
  1 sibling, 1 reply; 14+ messages in thread
From: David Howells @ 2009-02-11 20:42 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: dhowells, Vegard Nossum, Andrew Morton, Eric Sesterhenn,
	containers, linux-kernel, Dhaval Giani, Peter Zijlstra

Serge E. Hallyn <serue@us.ibm.com> wrote:

> Yes, but noone will pull the user_struct off the list without
> taking the lock.
> 
> what am I missing?

I believe that the hash link (uidhash_node) in the user_struct that is passed
to uid_hash_remove() points to, and is pointed to by the user_namespace to
which the user_struct belongs.

In which case calling put_user_ns() may kfree the head pointer of the list
_before_ hlist_del_init() is invoked - in which case hlist_del_init() will act
upon freed memory.

At least, I think it works like this.


Anyway, I have no objection to your new patch.

Acked-by: David Howells <dhowells@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: namespaces?: bug at mm/slub.c:2750
  2009-02-11 20:42             ` David Howells
@ 2009-02-11 21:21               ` Serge E. Hallyn
  0 siblings, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-02-11 21:21 UTC (permalink / raw)
  To: David Howells
  Cc: Vegard Nossum, Andrew Morton, Eric Sesterhenn, containers,
	linux-kernel, Dhaval Giani, Peter Zijlstra

Quoting David Howells (dhowells@redhat.com):
> Serge E. Hallyn <serue@us.ibm.com> wrote:
> 
> > Yes, but noone will pull the user_struct off the list without
> > taking the lock.
> > 
> > what am I missing?
> 
> I believe that the hash link (uidhash_node) in the user_struct that is passed
> to uid_hash_remove() points to, and is pointed to by the user_namespace to
> which the user_struct belongs.
> 
> In which case calling put_user_ns() may kfree the head pointer of the list
> _before_ hlist_del_init() is invoked - in which case hlist_del_init() will act
> upon freed memory.
> 
> At least, I think it works like this.

Yikes, you're right.  I was thinking there was on hash table with
the key calculated from ns+uid, but instead each ns has its own
hash table keyed on uid.

> Anyway, I have no objection to your new patch.
> 
> Acked-by: David Howells <dhowells@redhat.com>

thanks,
-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-02-11 21:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-06 11:35 namespaces?: bug at mm/slub.c:2750 Eric Sesterhenn
2009-02-07  0:15 ` Andrew Morton
2009-02-09 11:53   ` Eric Sesterhenn
2009-02-11  7:55   ` Vegard Nossum
2009-02-11  8:07     ` Andrew Morton
2009-02-11 10:48       ` Vegard Nossum
2009-02-11 16:37         ` Serge E. Hallyn
2009-02-11 17:02         ` David Howells
2009-02-11 17:24           ` Serge E. Hallyn
2009-02-11 18:00             ` Vegard Nossum
2009-02-11 18:03           ` David Howells
2009-02-11 19:38             ` Serge E. Hallyn
2009-02-11 20:42             ` David Howells
2009-02-11 21:21               ` Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).