* [PATCH] cgroup: fix cgroup_path() vs rename() race
@ 2013-01-25 7:09 Li Zefan
2013-01-25 16:42 ` Tejun Heo
2013-02-08 18:46 ` Sasha Levin
0 siblings, 2 replies; 5+ messages in thread
From: Li Zefan @ 2013-01-25 7:09 UTC (permalink / raw)
To: Tejun Heo; +Cc: Al Viro, LKML, Cgroups
rename() will change dentry->d_name. The result of this race can
be worse than just seeing a partially rewritten name: we might access
a stale pointer, because rename() will re-allocate memory to hold
a longer name.
Use dentry_path_raw() instead; this VFS API takes care of the locking.
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
kernel/cgroup.c | 22 +++-------------------
1 file changed, 3 insertions(+), 19 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5d4c92e..776ff75 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1792,26 +1792,10 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 		return 0;
 	}
 
-	start = buf + buflen - 1;
+	start = dentry_path_raw(dentry, buf, buflen);
+	if (IS_ERR(start))
+		return PTR_ERR(start);
 
-	*start = '\0';
-	for (;;) {
-		int len = dentry->d_name.len;
-
-		if ((start -= len) < buf)
-			return -ENAMETOOLONG;
-		memcpy(start, dentry->d_name.name, len);
-		cgrp = cgrp->parent;
-		if (!cgrp)
-			break;
-
-		dentry = cgrp->dentry;
-		if (!cgrp->parent)
-			continue;
-		if (--start < buf)
-			return -ENAMETOOLONG;
-		*start = '/';
-	}
 	memmove(buf, start, buf + buflen - start);
 	return 0;
 }
--
1.8.0.2
* Re: [PATCH] cgroup: fix cgroup_path() vs rename() race
2013-01-25 7:09 [PATCH] cgroup: fix cgroup_path() vs rename() race Li Zefan
@ 2013-01-25 16:42 ` Tejun Heo
2013-01-26 0:20 ` Li Zefan
2013-02-08 18:46 ` Sasha Levin
1 sibling, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2013-01-25 16:42 UTC (permalink / raw)
To: Li Zefan; +Cc: Al Viro, LKML, Cgroups
On Fri, Jan 25, 2013 at 03:09:59PM +0800, Li Zefan wrote:
> rename() will change dentry->d_name. The result of this race can
> be worse than just seeing a partially rewritten name: we might access
> a stale pointer, because rename() will re-allocate memory to hold
> a longer name.
>
> Use dentry_path_raw() instead; this VFS API takes care of the locking.
>
> Signed-off-by: Li Zefan <lizefan@huawei.com>
Urgh... why do we even support rename? :(
Applying to cgroup/for-3.8-fixes w/ stable cc'd.
Thanks.
--
tejun
* Re: [PATCH] cgroup: fix cgroup_path() vs rename() race
2013-01-25 16:42 ` Tejun Heo
@ 2013-01-26 0:20 ` Li Zefan
0 siblings, 0 replies; 5+ messages in thread
From: Li Zefan @ 2013-01-26 0:20 UTC (permalink / raw)
To: Tejun Heo; +Cc: Al Viro, LKML, Cgroups
On 2013/1/26 0:42, Tejun Heo wrote:
> On Fri, Jan 25, 2013 at 03:09:59PM +0800, Li Zefan wrote:
>> rename() will change dentry->d_name. The result of this race can
>> be worse than just seeing a partially rewritten name: we might access
>> a stale pointer, because rename() will re-allocate memory to hold
>> a longer name.
>>
>> Use dentry_path_raw() instead; this VFS API takes care of the locking.
>>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>
> Urgh... why do we even support rename? :(
>
Added in this commit many years ago:
commit 18a19cb3047e454ee5ecbc35d7acf3f8e09e0466
Author: Paul Jackson <pj@sgi.com>
Date: Sun Oct 30 15:02:31 2005 -0800
[PATCH] cpusets: simple rename
Add support for renaming cpusets. Only allow simple rename of cpuset
directories in place. Don't allow moving cpusets elsewhere in hierarchy or
renaming the special cpuset files in each cpuset directory.
The usefulness of this simple rename became apparent when developing task
migration facilities. It allows building a second cpuset hierarchy using
new names and containing new CPUs and Memory Nodes, moving tasks from the
old to the new cpusets, removing the old cpusets, and then renaming the new
cpusets to be just like the old names, so that any knowledge that the tasks
had of their cpuset names will still be valid.
Leaf node cpusets can be migrated to other CPUs or Memory Nodes by just
updating their 'cpus' and 'mems' files, but because no cpuset can contain
CPUs or Nodes not in its parent cpuset, one cannot do this in a cpuset
hierarchy without first expanding all the non-leaf cpusets to contain the
union of both the old and new CPUs and Nodes, which would obfuscate the
one-to-one migration of a task from one cpuset to another required to
correctly migrate the physical page frames currently allocated to that
task.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Re: [PATCH] cgroup: fix cgroup_path() vs rename() race
2013-01-25 7:09 [PATCH] cgroup: fix cgroup_path() vs rename() race Li Zefan
2013-01-25 16:42 ` Tejun Heo
@ 2013-02-08 18:46 ` Sasha Levin
2013-02-16 7:59 ` Li Zefan
1 sibling, 1 reply; 5+ messages in thread
From: Sasha Levin @ 2013-02-08 18:46 UTC (permalink / raw)
To: Li Zefan; +Cc: Tejun Heo, Al Viro, LKML, Cgroups
On 01/25/2013 02:09 AM, Li Zefan wrote:
> rename() will change dentry->d_name. The result of this race can
> be worse than just seeing a partially rewritten name: we might access
> a stale pointer, because rename() will re-allocate memory to hold
> a longer name.
>
> Use dentry_path_raw() instead; this VFS API takes care of the locking.
>
> Signed-off-by: Li Zefan <lizefan@huawei.com>
Hi Li,
I was fuzzing with trinity inside a KVM tools guest, and stumbled on
a lockdep spew related to this patch.
Here's the spew (brace yourself):
[ 313.262599] ======================================================
[ 313.271340] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[ 313.277542] 3.8.0-rc6-next-20130208-sasha-00028-ge4e162d #278 Tainted: G W
[ 313.277542] ------------------------------------------------------
[ 313.277542] kworker/u:3/4490 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 313.277542] (rename_lock){+.+...}, at: [<ffffffff812a11f9>] dentry_path_raw+0x29/0x70
[ 313.277542]
[ 313.277542] and this task is already holding:
[ 313.277542] (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff819e78f3>] put_io_context_active+0x63/0x100
[ 313.277542] which would create a new lock dependency:
[ 313.277542] (&(&q->__queue_lock)->rlock){-.-...} -> (rename_lock){+.+...}
[ 313.277542]
[ 313.277542] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 313.277542] (&(&q->__queue_lock)->rlock){-.-...}
... which became HARDIRQ-irq-safe at:
[ 313.277542] [<ffffffff8118452e>] mark_irqflags+0x7e/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d74749>] _raw_spin_lock_irqsave+0x79/0xc0
[ 313.277542] [<ffffffff81f4a1b1>] virtblk_done+0x51/0x2d0
[ 313.277542] [<ffffffff81bf1f34>] vring_interrupt+0x94/0xc0
[ 313.277542] [<ffffffff811c7909>] handle_irq_event_percpu+0x139/0x420
[ 313.277542] [<ffffffff811c7c33>] handle_irq_event+0x43/0x70
[ 313.277542] [<ffffffff811ca638>] handle_edge_irq+0xe8/0x120
[ 313.277542] [<ffffffff810706b4>] handle_irq+0x164/0x190
[ 313.277542] [<ffffffff810701e5>] do_IRQ+0x55/0xd0
[ 313.277542] [<ffffffff83d74af2>] ret_from_intr+0x0/0x1a
[ 313.277542] [<ffffffff811db968>] rcu_note_context_switch+0xc8/0x1e0
[ 313.277542] [<ffffffff83d72363>] __schedule+0x53/0x3b0
[ 313.277542] [<ffffffff83d72825>] schedule+0x55/0x60
[ 313.277542] [<ffffffff83d728ba>] io_schedule+0x8a/0xd0
[ 313.277542] [<ffffffff8120f7f9>] sleep_on_page+0x9/0x10
[ 313.277542] [<ffffffff83d70db8>] __wait_on_bit_lock+0x48/0xa0
[ 313.277542] [<ffffffff8120f7e2>] __lock_page+0x62/0x70
[ 313.277542] [<ffffffff81210968>] do_read_cache_page+0xf8/0x190
[ 313.277542] [<ffffffff81210a47>] read_cache_page_async+0x17/0x20
[ 313.277542] [<ffffffff81210a59>] read_cache_page+0x9/0x20
[ 313.277542] [<ffffffff819efb3b>] read_dev_sector+0x2b/0x90
[ 313.277542] [<ffffffff819f1812>] adfspart_check_ICS+0x52/0x2a0
[ 313.277542] [<ffffffff819f0d45>] check_partition+0xf5/0x1f0
[ 313.277542] [<ffffffff819f09ad>] rescan_partitions+0x8d/0x2b0
[ 313.277542] [<ffffffff812ca178>] __blkdev_get+0x1d8/0x470
[ 313.277542] [<ffffffff812ca605>] blkdev_get+0x1f5/0x200
[ 313.277542] [<ffffffff819ed832>] register_disk+0x102/0x170
[ 313.277542] [<ffffffff819ee034>] add_disk+0xf4/0x1f0
[ 313.277542] [<ffffffff81f4b719>] virtblk_probe+0x5b9/0x700
[ 313.277542] [<ffffffff81bf19cb>] virtio_dev_probe+0xeb/0x160
[ 313.277542] [<ffffffff81ef9f0f>] really_probe+0x10f/0x2e0
[ 313.277542] [<ffffffff81efa28b>] driver_probe_device+0x7b/0xa0
[ 313.277542] [<ffffffff81efa313>] __driver_attach+0x63/0xa0
[ 313.277542] [<ffffffff81ef8269>] bus_for_each_dev+0x59/0x90
[ 313.277542] [<ffffffff81ef9b39>] driver_attach+0x19/0x20
[ 313.277542] [<ffffffff81ef9513>] bus_add_driver+0xf3/0x270
[ 313.277542] [<ffffffff81efa978>] driver_register+0xa8/0x150
[ 313.277542] [<ffffffff81bf1c0d>] register_virtio_driver+0x2d/0x30
[ 313.277542] [<ffffffff861c0d74>] init+0x59/0x83
[ 313.277542] [<ffffffff810020ca>] do_one_initcall+0x8a/0x180
[ 313.277542] [<ffffffff8616df6a>] do_basic_setup+0x96/0xb4
[ 313.277542] [<ffffffff8616e05a>] kernel_init_freeable+0xd2/0x14c
[ 313.277542] [<ffffffff83cca109>] kernel_init+0x9/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542]
[ 313.277542] to a HARDIRQ-irq-unsafe lock:
[ 313.277542] (rename_lock){+.+...}
... which became HARDIRQ-irq-unsafe at:
[ 313.277542] ... [<ffffffff811845c0>] mark_irqflags+0x110/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a4a21>] sys_getcwd+0xa1/0x200
[ 313.277542] [<ffffffff83d75718>] tracesys+0xe1/0xe6
[ 313.277542]
[ 313.277542] other info that might help us debug this:
[ 313.277542]
[ 313.277542] Possible interrupt unsafe locking scenario:
[ 313.277542]
[ 313.277542] CPU0 CPU1
[ 313.277542] ---- ----
[ 313.277542] lock(rename_lock);
[ 313.277542] local_irq_disable();
[ 313.277542] lock(&(&q->__queue_lock)->rlock);
[ 313.277542] lock(rename_lock);
[ 313.277542] <Interrupt>
[ 313.277542] lock(&(&q->__queue_lock)->rlock);
[ 313.277542]
[ 313.277542] *** DEADLOCK ***
[ 313.277542]
[ 313.277542] 3 locks held by kworker/u:3/4490:
[ 313.277542] #0: (&(&ioc->lock)->rlock/1){......}, at: [<ffffffff819e78be>] put_io_context_active+0x2e/0x100
[ 313.277542] #1: (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff819e78f3>] put_io_context_active+0x63/0x100
[ 313.277542] #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81a03def>] cfq_put_queue+0x4f/0x320
[ 313.277542]
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
[ 313.277542] -> (&(&q->__queue_lock)->rlock){-.-...} ops: 1166 {
[ 313.277542] IN-HARDIRQ-W at:
[ 313.277542] [<ffffffff8118452e>] mark_irqflags+0x7e/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d74749>] _raw_spin_lock_irqsave+0x79/0xc0
[ 313.277542] [<ffffffff81f4a1b1>] virtblk_done+0x51/0x2d0
[ 313.277542] [<ffffffff81bf1f34>] vring_interrupt+0x94/0xc0
[ 313.277542] [<ffffffff811c7909>] handle_irq_event_percpu+0x139/0x420
[ 313.277542] [<ffffffff811c7c33>] handle_irq_event+0x43/0x70
[ 313.277542] [<ffffffff811ca638>] handle_edge_irq+0xe8/0x120
[ 313.277542] [<ffffffff810706b4>] handle_irq+0x164/0x190
[ 313.277542] [<ffffffff810701e5>] do_IRQ+0x55/0xd0
[ 313.277542] [<ffffffff83d74af2>] ret_from_intr+0x0/0x1a
[ 313.277542] [<ffffffff811db968>] rcu_note_context_switch+0xc8/0x1e0
[ 313.277542] [<ffffffff83d72363>] __schedule+0x53/0x3b0
[ 313.277542] [<ffffffff83d72825>] schedule+0x55/0x60
[ 313.277542] [<ffffffff83d728ba>] io_schedule+0x8a/0xd0
[ 313.277542] [<ffffffff8120f7f9>] sleep_on_page+0x9/0x10
[ 313.277542] [<ffffffff83d70db8>] __wait_on_bit_lock+0x48/0xa0
[ 313.277542] [<ffffffff8120f7e2>] __lock_page+0x62/0x70
[ 313.277542] [<ffffffff81210968>] do_read_cache_page+0xf8/0x190
[ 313.277542] [<ffffffff81210a47>] read_cache_page_async+0x17/0x20
[ 313.277542] [<ffffffff81210a59>] read_cache_page+0x9/0x20
[ 313.277542] [<ffffffff819efb3b>] read_dev_sector+0x2b/0x90
[ 313.277542] [<ffffffff819f1812>] adfspart_check_ICS+0x52/0x2a0
[ 313.277542] [<ffffffff819f0d45>] check_partition+0xf5/0x1f0
[ 313.277542] [<ffffffff819f09ad>] rescan_partitions+0x8d/0x2b0
[ 313.277542] [<ffffffff812ca178>] __blkdev_get+0x1d8/0x470
[ 313.277542] [<ffffffff812ca605>] blkdev_get+0x1f5/0x200
[ 313.277542] [<ffffffff819ed832>] register_disk+0x102/0x170
[ 313.277542] [<ffffffff819ee034>] add_disk+0xf4/0x1f0
[ 313.277542] [<ffffffff81f4b719>] virtblk_probe+0x5b9/0x700
[ 313.277542] [<ffffffff81bf19cb>] virtio_dev_probe+0xeb/0x160
[ 313.277542] [<ffffffff81ef9f0f>] really_probe+0x10f/0x2e0
[ 313.277542] [<ffffffff81efa28b>] driver_probe_device+0x7b/0xa0
[ 313.277542] [<ffffffff81efa313>] __driver_attach+0x63/0xa0
[ 313.277542] [<ffffffff81ef8269>] bus_for_each_dev+0x59/0x90
[ 313.277542] [<ffffffff81ef9b39>] driver_attach+0x19/0x20
[ 313.277542] [<ffffffff81ef9513>] bus_add_driver+0xf3/0x270
[ 313.277542] [<ffffffff81efa978>] driver_register+0xa8/0x150
[ 313.277542] [<ffffffff81bf1c0d>] register_virtio_driver+0x2d/0x30
[ 313.277542] [<ffffffff861c0d74>] init+0x59/0x83
[ 313.277542] [<ffffffff810020ca>] do_one_initcall+0x8a/0x180
[ 313.277542] [<ffffffff8616df6a>] do_basic_setup+0x96/0xb4
[ 313.277542] [<ffffffff8616e05a>] kernel_init_freeable+0xd2/0x14c
[ 313.277542] [<ffffffff83cca109>] kernel_init+0x9/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542] IN-SOFTIRQ-W at:
[ 313.277542] [<ffffffff81184560>] mark_irqflags+0xb0/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff82025925>] scsi_device_unbusy+0x95/0xd0
[ 313.277542] [<ffffffff8201c172>] scsi_finish_command+0x32/0x110
[ 313.277542] [<ffffffff82025bc5>] scsi_softirq_done+0x135/0x150
[ 313.277542] [<ffffffff819e9b84>] blk_done_softirq+0xb4/0xd0
[ 313.277542] [<ffffffff81117315>] __do_softirq+0x1e5/0x490
[ 313.277542] [<ffffffff811175fd>] run_ksoftirqd+0x3d/0xa0
[ 313.277542] [<ffffffff8114815e>] smpboot_thread_fn+0x2ae/0x2c0
[ 313.277542] [<ffffffff8113dcc3>] kthread+0xe3/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542] INITIAL USE at:
[ 313.277542] [<ffffffff81186ff4>] __lock_acquire+0x8a4/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d743e9>] _raw_spin_lock_irq+0x59/0x90
[ 313.277542] [<ffffffff819dfa36>] blk_queue_bypass_start+0x16/0xc0
[ 313.277542] [<ffffffff819fbbb8>] blkcg_activate_policy+0x68/0x3d0
[ 313.277542] [<ffffffff819fe4a5>] blk_throtl_init+0xf5/0x130
[ 313.277542] [<ffffffff819fbfb7>] blkcg_init_queue+0x27/0x30
[ 313.277542] [<ffffffff819de428>] blk_alloc_queue_node+0x278/0x2b0
[ 313.277542] [<ffffffff819e3a86>] blk_init_queue_node+0x26/0x70
[ 313.277542] [<ffffffff819e3ade>] blk_init_queue+0xe/0x10
[ 313.277542] [<ffffffff861bfa80>] do_floppy_init+0xaf/0x6a4
[ 313.277542] [<ffffffff861c007e>] floppy_async_init+0x9/0xb
[ 313.277542] [<ffffffff811468fb>] async_run_entry_fn+0x6b/0x130
[ 313.277542] [<ffffffff81132b08>] process_one_work+0x388/0x670
[ 313.277542] [<ffffffff811337df>] worker_thread+0x1df/0x2e0
[ 313.277542] [<ffffffff8113dcc3>] kthread+0xe3/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542] }
[ 313.277542] ... key at: [<ffffffff86f022d0>] __key.38539+0x0/0x8
[ 313.277542] ... acquired at:
[ 313.277542] [<ffffffff81182f5d>] check_usage+0x1bd/0x1e0
[ 313.277542] [<ffffffff81182fea>] check_irq_usage+0x6a/0xe0
[ 313.277542] [<ffffffff811831ab>] check_prev_add+0x14b/0x640
[ 313.277542] [<ffffffff8118375a>] check_prevs_add+0xba/0x1a0
[ 313.277542] [<ffffffff81183ee0>] validate_chain.isra.21+0x6a0/0x7b0
[ 313.277542] [<ffffffff81187163>] __lock_acquire+0xa13/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a11f9>] dentry_path_raw+0x29/0x70
[ 313.277542] [<ffffffff811a3e2e>] cgroup_path+0xbe/0xf0
[ 313.277542] [<ffffffff81a03e71>] cfq_put_queue+0xd1/0x320
[ 313.277542] [<ffffffff81a047d8>] cfq_exit_cfqq+0x58/0x70
[ 313.277542] [<ffffffff81a04838>] cfq_exit_icq+0x48/0x60
[ 313.277542] [<ffffffff819e7919>] put_io_context_active+0x89/0x100
[ 313.277542] [<ffffffff819e79de>] exit_io_context+0x4e/0x60
[ 313.277542] [<ffffffff811147fe>] do_exit+0x4be/0x590
[ 313.277542] [<ffffffff8113dccb>] kthread+0xeb/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542]
[ 313.277542]
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
[ 313.277542] -> (rename_lock){+.+...} ops: 2972 {
[ 313.277542] HARDIRQ-ON-W at:
[ 313.277542] [<ffffffff811845c0>] mark_irqflags+0x110/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a4a21>] sys_getcwd+0xa1/0x200
[ 313.277542] [<ffffffff83d75718>] tracesys+0xe1/0xe6
[ 313.277542] SOFTIRQ-ON-W at:
[ 313.277542] [<ffffffff811845e3>] mark_irqflags+0x133/0x1a0
[ 313.277542] [<ffffffff81186fce>] __lock_acquire+0x87e/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a4a21>] sys_getcwd+0xa1/0x200
[ 313.277542] [<ffffffff83d75718>] tracesys+0xe1/0xe6
[ 313.277542] INITIAL USE at:
[ 313.277542] [<ffffffff81186ff4>] __lock_acquire+0x8a4/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a4a21>] sys_getcwd+0xa1/0x200
[ 313.277542] [<ffffffff83d75718>] tracesys+0xe1/0xe6
[ 313.277542] }
[ 313.277542] ... key at: [<ffffffff8540e020>] rename_lock+0x20/0x1000
[ 313.277542] ... acquired at:
[ 313.277542] [<ffffffff81182f5d>] check_usage+0x1bd/0x1e0
[ 313.277542] [<ffffffff81182fea>] check_irq_usage+0x6a/0xe0
[ 313.277542] [<ffffffff811831ab>] check_prev_add+0x14b/0x640
[ 313.277542] [<ffffffff8118375a>] check_prevs_add+0xba/0x1a0
[ 313.277542] [<ffffffff81183ee0>] validate_chain.isra.21+0x6a0/0x7b0
[ 313.277542] [<ffffffff81187163>] __lock_acquire+0xa13/0xb00
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a11f9>] dentry_path_raw+0x29/0x70
[ 313.277542] [<ffffffff811a3e2e>] cgroup_path+0xbe/0xf0
[ 313.277542] [<ffffffff81a03e71>] cfq_put_queue+0xd1/0x320
[ 313.277542] [<ffffffff81a047d8>] cfq_exit_cfqq+0x58/0x70
[ 313.277542] [<ffffffff81a04838>] cfq_exit_icq+0x48/0x60
[ 313.277542] [<ffffffff819e7919>] put_io_context_active+0x89/0x100
[ 313.277542] [<ffffffff819e79de>] exit_io_context+0x4e/0x60
[ 313.277542] [<ffffffff811147fe>] do_exit+0x4be/0x590
[ 313.277542] [<ffffffff8113dccb>] kthread+0xeb/0xf0
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542]
[ 313.277542]
[ 313.277542] stack backtrace:
[ 313.277542] Pid: 4490, comm: kworker/u:3 Tainted: G W 3.8.0-rc6-next-20130208-sasha-00028-ge4e162d #278
[ 313.277542] Call Trace:
[ 313.277542] [<ffffffff81182d7a>] print_bad_irq_dependency+0x2ea/0x310
[ 313.277542] [<ffffffff81182f5d>] check_usage+0x1bd/0x1e0
[ 313.277542] [<ffffffff81156e65>] ? sched_clock_local+0x25/0x90
[ 313.277542] [<ffffffff81182fea>] check_irq_usage+0x6a/0xe0
[ 313.277542] [<ffffffff811831ab>] check_prev_add+0x14b/0x640
[ 313.277542] [<ffffffff810a42b8>] ? kvm_clock_read+0x38/0x70
[ 313.277542] [<ffffffff8118375a>] check_prevs_add+0xba/0x1a0
[ 313.277542] [<ffffffff810774a5>] ? sched_clock+0x15/0x20
[ 313.277542] [<ffffffff81183ee0>] validate_chain.isra.21+0x6a0/0x7b0
[ 313.277542] [<ffffffff81187163>] __lock_acquire+0xa13/0xb00
[ 313.277542] [<ffffffff81156e65>] ? sched_clock_local+0x25/0x90
[ 313.277542] [<ffffffff81157098>] ? sched_clock_cpu+0x108/0x120
[ 313.277542] [<ffffffff81187b3a>] lock_acquire+0x1ca/0x270
[ 313.277542] [<ffffffff812a11f9>] ? dentry_path_raw+0x29/0x70
[ 313.277542] [<ffffffff810774a5>] ? sched_clock+0x15/0x20
[ 313.277542] [<ffffffff83d7396b>] _raw_spin_lock+0x3b/0x70
[ 313.277542] [<ffffffff812a11f9>] ? dentry_path_raw+0x29/0x70
[ 313.277542] [<ffffffff812a11f9>] dentry_path_raw+0x29/0x70
[ 313.277542] [<ffffffff811a3e2e>] cgroup_path+0xbe/0xf0
[ 313.277542] [<ffffffff81a03e71>] cfq_put_queue+0xd1/0x320
[ 313.277542] [<ffffffff81a03def>] ? cfq_put_queue+0x4f/0x320
[ 313.277542] [<ffffffff819e78f3>] ? put_io_context_active+0x63/0x100
[ 313.277542] [<ffffffff819e78be>] ? put_io_context_active+0x2e/0x100
[ 313.277542] [<ffffffff81a047d8>] cfq_exit_cfqq+0x58/0x70
[ 313.277542] [<ffffffff81a04838>] cfq_exit_icq+0x48/0x60
[ 313.277542] [<ffffffff819e7919>] put_io_context_active+0x89/0x100
[ 313.277542] [<ffffffff819e79de>] exit_io_context+0x4e/0x60
[ 313.277542] [<ffffffff811147fe>] do_exit+0x4be/0x590
[ 313.277542] [<ffffffff81133600>] ? manage_workers+0x110/0x110
[ 313.277542] [<ffffffff8113dccb>] kthread+0xeb/0xf0
[ 313.277542] [<ffffffff83d721f6>] ? wait_for_common+0x106/0x170
[ 313.277542] [<ffffffff8113dbe0>] ? flush_kthread_worker+0x190/0x190
[ 313.277542] [<ffffffff83d7543c>] ret_from_fork+0x7c/0xb0
[ 313.277542] [<ffffffff8113dbe0>] ? flush_kthread_worker+0x190/0x190
Thanks,
Sasha
* Re: [PATCH] cgroup: fix cgroup_path() vs rename() race
2013-02-08 18:46 ` Sasha Levin
@ 2013-02-16 7:59 ` Li Zefan
0 siblings, 0 replies; 5+ messages in thread
From: Li Zefan @ 2013-02-16 7:59 UTC (permalink / raw)
To: Sasha Levin; +Cc: Tejun Heo, Al Viro, LKML, Cgroups
(sorry for the late reply, just came back from holiday)
On 2013/2/9 2:46, Sasha Levin wrote:
> On 01/25/2013 02:09 AM, Li Zefan wrote:
>> rename() will change dentry->d_name. The result of this race can
>> be worse than just seeing a partially rewritten name: we might access
>> a stale pointer, because rename() will re-allocate memory to hold
>> a longer name.
>>
>> Use dentry_path_raw() instead; this VFS API takes care of the locking.
>>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>
> Hi Li,
>
> I was fuzzing with trinity inside a KVM tools guest, and stumbled on
> a lockdep spew related to this patch.
>
> Here's the spew (brace yourself):
>
dentry_path_raw() will grab rename_lock and dentry->d_lock without disabling
irqs, which means cgroup_path() can't be called if the caller is already
holding a spinlock with irqs disabled.
Both the blkio cgroup and the cpu cgroup have this locking issue... The only
fix is to make a copy of dentry->d_name and save it in cgrp->name.
A patch will follow.
> [ 313.262599] ======================================================
> [ 313.271340] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> [ 313.277542] 3.8.0-rc6-next-20130208-sasha-00028-ge4e162d #278 Tainted: G W
> [ 313.277542] ------------------------------------------------------
> [ 313.277542] kworker/u:3/4490 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 313.277542] (rename_lock){+.+...}, at: [<ffffffff812a11f9>] dentry_path_raw+0x29/0x70
> [ 313.277542]
> [ 313.277542] and this task is already holding:
> [ 313.277542] (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff819e78f3>] put_io_context_active+0x63/0x100
> [ 313.277542] which would create a new lock dependency:
> [ 313.277542] (&(&q->__queue_lock)->rlock){-.-...} -> (rename_lock){+.+...}
> [ 313.277542]
> [ 313.277542] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 313.277542] (&(&q->__queue_lock)->rlock){-.-...}
> ... which became HARDIRQ-irq-safe at:
>
...