* [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next
@ 2015-12-18 20:08 ` Alex Ng (LIS)
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Ng (LIS) @ 2015-12-18 20:08 UTC (permalink / raw)
  To: tj, lizefan, hannes, cgroups; +Cc: linux-kernel

Hi,

I was running a "git clone" of the linux-next source tree and hit the following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-20151217-52.27. Any ideas on how to pin down the cause?

The trace indicates that the following condition in compare_css_sets() triggered the oops:

	BUG_ON(cgrp1->root != cgrp2->root);
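
A simplified userspace model may help show what this assertion protects. The loop in compare_css_sets() walks the cgroup links of the candidate css_set and of the old css_set in lockstep, and expects the n-th link of each to belong to the same hierarchy. The sketch below is not kernel code: struct model_cgroup, enum cmp_result and model_compare() are hypothetical names, each css_set is reduced to an array of cgroup pointers in link order, and where the kernel calls BUG_ON() the model returns CMP_INVARIANT_BROKEN instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup: only the owning hierarchy
 * matters here. Real css_sets keep their cgroups on the cgrp_links list;
 * this model uses an array in link order instead. */
struct model_cgroup {
	int root_id;	/* which hierarchy (cgroup_root) the cgroup belongs to */
};

enum cmp_result {
	CMP_INVARIANT_BROKEN = -1,	/* where the kernel would BUG_ON() */
	CMP_MISMATCH = 0,
	CMP_MATCH = 1,
};

/* Walks the candidate and old link arrays in lockstep, following the
 * shape of the loop in compare_css_sets(). */
enum cmp_result model_compare(const struct model_cgroup **cand, size_t ncand,
			      const struct model_cgroup **old, size_t nold,
			      const struct model_cgroup *new_cgrp)
{
	assert(cand && old && new_cgrp);
	/* The kernel loop assumes both lists have the same length. */
	if (ncand != nold)
		return CMP_INVARIANT_BROKEN;

	for (size_t i = 0; i < ncand; i++) {
		const struct model_cgroup *c1 = cand[i], *c2 = old[i];

		/* "Hierarchies should be linked in the same order." */
		if (c1->root_id != c2->root_id)
			return CMP_INVARIANT_BROKEN;

		if (c1->root_id == new_cgrp->root_id) {
			/* Hierarchy being changed: must point at new_cgrp. */
			if (c1 != new_cgrp)
				return CMP_MISMATCH;
		} else if (c1 != c2) {
			/* Any other hierarchy: must match the old css_set. */
			return CMP_MISMATCH;
		}
	}
	return CMP_MATCH;
}
```

If the two arrays list the same hierarchies in different orders, model_compare() returns CMP_INVARIANT_BROKEN; in the kernel, that case is exactly the BUG_ON reported here.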

[ 1859.800805] ------------[ cut here ]------------
[ 1859.804082] kernel BUG at kernel/cgroup.c:834!
[ 1859.804082] invalid opcode: 0000 [#1] SMP
[ 1859.804082] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng aesni_intel i2c_piix4 hv_netvsc serio_raw pcspkr hyperv_keyboard aes_x86_64 lrw hyperv_fb joydev gf128mul glue_helper ablk_helper hv_utils acpi_cpufreq cryptd processor button dm_mod xfs libcrc32c sd_mod hid_generic sr_mod cdrom ata_generic ata_piix hid_hyperv hv_storvsc ahci libahci crc32c_intel hv_vmbus libata floppy sg scsi_mod autofs4
[ 1859.804082] CPU: 2 PID: 1 Comm: systemd Not tainted 4.4.0-rc5-next-20151217-52.27-default+ #2
[ 1859.804082] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[ 1859.804082] task: ffff880101c54040 ti: ffff880101c58000 task.ti: ffff880101c58000
[ 1859.804082] RIP: 0010:[<ffffffff810f108d>]  [<ffffffff810f108d>] find_css_set+0x3ad/0x3e0
[ 1859.804082] RSP: 0018:ffff880101c5bc38  EFLAGS: 00010207
[ 1859.804082] RAX: ffff88003694b238 RBX: ffff8800f10d0638 RCX: ffff8800eefa8220
[ 1859.804082] RDX: ffff8800f14b5a20 RSI: ffff88003694b250 RDI: ffff880101c5bc48
[ 1859.804082] RBP: ffff880101c5bcc0 R08: 0000000000000000 R09: ffff8800f12efc00
[ 1859.804082] R10: ffff8800f18e3800 R11: 0000000000000000 R12: ffff8800f3938400
[ 1859.804082] R13: ffff880101c5bc48 R14: ffff8800f10d0600 R15: ffff88003694b200
[ 1859.804082] FS:  00007f994345a880(0000) GS:ffff880102e40000(0000) knlGS:0000000000000000
[ 1859.804082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1859.804082] CR2: 00007fc829d19000 CR3: 0000000036e46000 CR4: 00000000000006e0
[ 1859.804082] Stack:
[ 1859.804082]  ffff880101c5bc88 ffffffff810c3970 ffffffff81a74b00 ffffffff81dcc380
[ 1859.804082]  ffffffff81a4d100 ffffffff81f5c660 ffff8801023df800 ffff8801023db500
[ 1859.804082]  ffff8801023d7400 ffff8801023d7340 ffff8801023d7280 ffff8801023db400
[ 1859.804082] Call Trace:
[ 1859.804082]  [<ffffffff810c3970>] ? __wait_rcu_gp+0xd0/0xf0
[ 1859.804082]  [<ffffffff810f115a>] cgroup_migrate_prepare_dst+0x9a/0x200
[ 1859.804082]  [<ffffffff810f2065>] cgroup_attach_task+0x65/0xd0
[ 1859.804082]  [<ffffffff810abf1d>] ? percpu_down_write+0x5d/0xd0
[ 1859.804082]  [<ffffffff810f2348>] __cgroup_procs_write.isra.22+0x1b8/0x2d0
[ 1859.804082]  [<ffffffff810f2493>] cgroup_procs_write+0x13/0x20
[ 1859.804082]  [<ffffffff810edb28>] cgroup_file_write+0x38/0xf0
[ 1859.804082]  [<ffffffff81250380>] kernfs_fop_write+0x120/0x170
[ 1859.804082]  [<ffffffff811daf08>] __vfs_write+0x28/0xe0
[ 1859.804082]  [<ffffffff8129a618>] ? apparmor_file_permission+0x18/0x20
[ 1859.804082]  [<ffffffff81273dbd>] ? security_file_permission+0x3d/0xc0
[ 1859.804082]  [<ffffffff810abe47>] ? percpu_down_read+0x17/0x50
[ 1859.804082]  [<ffffffff811db7c2>] vfs_write+0xa2/0x1a0
[ 1859.804082]  [<ffffffff81051310>] ? __do_page_fault+0x1a0/0x3f0
[ 1859.804082]  [<ffffffff811dc726>] SyS_write+0x46/0xa0
[ 1859.804082]  [<ffffffff815aafee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1859.804082] Code: 03 10 48 8b 72 08 48 89 4a 08 48 89 11 48 89 71 08 48 89 0e f6 40 74 01 75 c3 48 8b 50 18 f6 c2 03 75 22 65 48 ff 02 eb b4 0f 0b <0f> 0b 31 c0 e9 b0 fd ff ff 4c 89 ff e8 72 92 0c 00 31 c0 e9 a1
[ 1860.196107] RIP  [<ffffffff810f108d>] find_css_set+0x3ad/0x3e0
[ 1860.196107]  RSP <ffff880101c5bc38>
[ 1860.199742] ---[ end trace 3a415fee224c72a3 ]---
[ 1860.199744] Kernel panic - not syncing: Fatal exception in interrupt
[ 1860.203733] Kernel Offset: disabled
[ 1860.203733] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
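
The "invalid opcode" line is how x86 reports a BUG(): the macro executes the ud2 instruction (encoded as 0f 0b), and in the Code: line the byte at RIP is printed in angle brackets. The sketch below uses hypothetical helpers, parse_code_line() and is_ud2_at(), to parse such a line and check whether RIP sits on a ud2. Applied to the Code: line above, it finds RIP at byte index 43 of 64, on the second of two back-to-back ud2 instructions (0f 0b <0f> 0b).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses the hex bytes of an oops "Code:" line into
 * buf. The byte RIP points at is printed as "<xx>"; its index is stored
 * in *rip_idx. Returns the number of bytes parsed, or -1 if the line is
 * malformed, too long for buf, or has no "<xx>" marker. */
int parse_code_line(const char *line, unsigned char *buf, size_t cap,
		    size_t *rip_idx)
{
	const char *p;
	size_t n = 0;
	int found = 0;

	assert(line && buf && rip_idx);
	p = strstr(line, "Code:");
	if (!p)
		return -1;
	p += strlen("Code:");

	for (;;) {
		unsigned int byte;
		int consumed = 0, marked = 0;

		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		if (*p == '<') {	/* this byte is the one at RIP */
			marked = 1;
			p++;
		}
		if (sscanf(p, "%2x%n", &byte, &consumed) != 1 || consumed != 2)
			return -1;
		p += consumed;
		if (marked) {
			if (*p != '>')
				return -1;
			p++;
			*rip_idx = n;
			found = 1;
		}
		if (n == cap)
			return -1;
		buf[n++] = (unsigned char)byte;
	}
	return found ? (int)n : -1;
}

/* x86 BUG() executes ud2, encoded as 0f 0b, which raises "invalid opcode". */
int is_ud2_at(const unsigned char *buf, size_t n, size_t idx)
{
	return idx + 1 < n && buf[idx] == 0x0f && buf[idx + 1] == 0x0b;
}
```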

--
Alex Ng

* Re: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next
@ 2015-12-21 21:56   ` tj-DgEjT+Ai2ygdnm+yROfE0A
  0 siblings, 0 replies; 7+ messages in thread
From: tj @ 2015-12-21 21:56 UTC (permalink / raw)
  To: Alex Ng (LIS); +Cc: lizefan, hannes, cgroups, linux-kernel

Hello, Alex.

On Fri, Dec 18, 2015 at 08:08:03PM +0000, Alex Ng (LIS) wrote:
> Hi,
> 
> I was running a "git clone" of the linux-next source tree and hit the following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-20151217-52.27. Any ideas on how to pin down the cause?
> 
> The trace indicates that the following condition in compare_css_sets() triggered the oops:

Can you please let me know the steps to reproduce the bug?

Thanks.

-- 
tejun

* RE: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next
@ 2015-12-22 19:06   ` Alex Ng (LIS)
  1 reply; 7+ messages in thread
From: Alex Ng (LIS) @ 2015-12-22 19:06 UTC (permalink / raw)
  To: tj; +Cc: lizefan, hannes, cgroups, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

> Hello, Alex.
> 
> On Fri, Dec 18, 2015 at 08:08:03PM +0000, Alex Ng (LIS) wrote:
> > Hi,
> >
> > I was running a "git clone" of the linux-next source tree and hit the following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-20151217-52.27. Any ideas on how to pin down the cause?
> >
> > The trace indicates that the following condition in compare_css_sets() triggered the oops:
> 
> Can you please let me know the steps to reproduce the bug?

I tried this on a Hyper-V VM hosted on Windows Server 2012 R2 and ran the attached script.
The script clones the linux-next tree into a randomly named directory under /tmp and then deletes it; uncommenting its while loop repeats this in a tight loop (as attached, it performs a single clone).

This particular panic is not reliably reproducible: I have hit it only once in about 10 runs of the script. Each run ends in a kernel panic, but a different one each time, and the panics always happen during the first clone.

Let me know if you need more information.

Hope this helps,
Alex

[-- Attachment #2: test.sh --]
[-- Type: application/octet-stream, Size: 207 bytes --]

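#!/bin/bash

# Clone linux-next into a fresh, randomly named directory under /tmp,
# then delete it. Uncomment the loop to repeat this indefinitely.
function clonetree
{
	local clonedir
	#while true; do
		clonedir=/tmp/$(uuidgen)
		git clone https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git "$clonedir"
		rm -rf "$clonedir"
	#done
}

clonetree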

* Re: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next
@ 2015-12-23 16:54       ` tj-DgEjT+Ai2ygdnm+yROfE0A
  0 siblings, 0 replies; 7+ messages in thread
From: tj @ 2015-12-23 16:54 UTC (permalink / raw)
  To: Alex Ng (LIS); +Cc: lizefan, hannes, cgroups, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 756 bytes --]

Hello, Alex.

On Tue, Dec 22, 2015 at 07:06:41PM +0000, Alex Ng (LIS) wrote:
> > Can you please let me know the steps to reproduce the bug?
> 
> I tried this on a Hyper-V VM hosted in Windows Server 2012R2 and ran
> the attached script.  The script clones the linux-next tree in a
> random directory under /tmp in a tight loop.
>
> This panic is not always reproducible, and I have only hit it once
> after running the script about 10 times. A different kernel panic
> happens each time I run this script; and the panics always happen
> during the first iteration of the loop.

Heh, I don't get it.  The script doesn't do anything cgroup-specific.
Can you please apply the attached patch, reproduce the issue and
report the kernel log?
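
Although the script itself does nothing cgroup-specific, the call trace shows where the cgroup operation came from: PID 1 (systemd) was inside write(2) on a kernfs file, passing through cgroup_procs_write() and cgroup_attach_task() into find_css_set(). That is the path taken when a PID is written to a cgroup.procs file, which is presumably systemd moving a process into one of its units' cgroups. A minimal sketch of that userspace operation follows; move_pid_to_cgroup() is a hypothetical helper and the path in its comment is only an example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes pid into a cgroup.procs file, which asks the
 * kernel to migrate that process (all of its threads) into the cgroup that
 * owns the file. procs_path is an assumed example path, e.g.
 * "/sys/fs/cgroup/<hierarchy>/<group>/cgroup.procs".
 * Returns 0 on success, -1 on error. */
int move_pid_to_cgroup(const char *procs_path, pid_t pid)
{
	FILE *f;

	assert(procs_path != NULL);
	f = fopen(procs_path, "w");
	if (!f)
		return -1;
	/* cgroup.procs takes a single PID per write. */
	if (fprintf(f, "%d\n", (int)pid) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the text; the actual write(2), which is what reaches
	 * cgroup_procs_write() in the kernel, happens at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}
```

On a real system this needs write permission on the target cgroup.procs file. Which PID and which cgroup systemd was handling at the time of the crash is not visible in the log.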

Thanks.

-- 
tejun

[-- Attachment #2: dbg --]
[-- Type: text/plain, Size: 1137 bytes --]

---
 kernel/cgroup.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -779,6 +779,22 @@ static inline void get_css_set(struct cs
 	atomic_inc(&cset->refcount);
 }
 
+static void dump_cset(struct css_set *cset)
+{
+	struct cgrp_cset_link *link;
+
+	printk("XXX dumping cset %p\n", cset);
+	list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
+		struct cgroup *cgrp = link->cgrp;
+		struct cgroup_root *root = cgrp->root;
+
+		printk("root %d:0x%04x:%s ",
+		       root->hierarchy_id, root->subsys_mask, root->name);
+		pr_cont_cgroup_path(cgrp);
+		pr_cont("\n");
+	}
+}
+
 /**
  * compare_css_sets - helper function for find_existing_css_set().
  * @cset: candidate css_set being tested
@@ -831,7 +847,10 @@ static bool compare_css_sets(struct css_
 		cgrp1 = link1->cgrp;
 		cgrp2 = link2->cgrp;
 		/* Hierarchies should be linked in the same order. */
-		BUG_ON(cgrp1->root != cgrp2->root);
+		if (WARN_ON(cgrp1->root != cgrp2->root)) {
+			dump_cset(cset);
+			dump_cset(old_cset);
+		}
 
 		/*
 		 * If this hierarchy is the hierarchy of the cgroup
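
Each link printed by the dump_cset() helper in this patch has the form "root <hierarchy_id>:0x<subsys_mask>:<root name> <cgroup path>". When reading the two dumps that follow a WARN_ON, it may help to parse those lines and compare the hierarchy_id sequences of the two css_sets position by position; the first position where they differ is where the "same order" assumption breaks. A sketch of such a parser follows; struct cset_link_line, parse_cset_link() and the buffer sizes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line printed by dump_cset():
 *   "root <hierarchy_id>:0x<subsys_mask>:<root name> <cgroup path>"
 * The buffer sizes are arbitrary choices for this sketch. */
struct cset_link_line {
	int hierarchy_id;
	unsigned int subsys_mask;
	char root_name[64];
	char path[256];
};

/* Returns 0 if line contains a dump_cset() link line, -1 otherwise.
 * Any prefix before "root " (e.g. a dmesg timestamp) is skipped. */
int parse_cset_link(const char *line, struct cset_link_line *out)
{
	const char *start, *name, *sp;
	size_t name_len, path_len;
	int pos = -1;

	assert(line && out);
	start = strstr(line, "root ");
	if (!start)
		return -1;
	/* %n records where the name starts; it stays -1 if matching stops early. */
	if (sscanf(start, "root %d:0x%x:%n", &out->hierarchy_id,
		   &out->subsys_mask, &pos) != 2 || pos < 0)
		return -1;
	name = start + pos;
	sp = strchr(name, ' ');		/* dump_cset() prints one space after the name */
	if (!sp)
		return -1;
	name_len = (size_t)(sp - name);
	path_len = strcspn(sp + 1, "\n");	/* pr_cont("\n") ends the line */
	if (name_len >= sizeof(out->root_name) || path_len >= sizeof(out->path))
		return -1;
	memcpy(out->root_name, name, name_len);
	out->root_name[name_len] = '\0';
	memcpy(out->path, sp + 1, path_len);
	out->path[path_len] = '\0';
	return 0;
}
```

The parser also accepts an empty root name, since the format string in dump_cset() places nothing between the second colon and the space when root->name is empty.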
