* Re: hit a KASan bug related to Perf during stress test [not found] <318B87A793BE164187D8851D6CE09D64371C8811@shsmsx102.ccr.corp.intel.com> @ 2016-10-24 9:53 ` Peter Zijlstra 2016-10-24 11:15 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 9:53 UTC (permalink / raw) To: Ni, BaoleX Cc: mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng, Oleg Nesterov On Mon, Oct 24, 2016 at 09:35:46AM +0000, Ni, BaoleX wrote: > > [32736.018823] BUG: KASan: use after free in task_tgid_nr_ns+0x35/0xb0 at addr ffff8800265568c0 > [32736.028309] Read of size 8 by task dumpsys/11268 > [32736.033511] ============================================================================= > [32736.042700] BUG task_struct (Tainted: G W O): kasan: bad access detected 'W' this wasn't the first WARN you got, this means this might be the result of prior borkage. Also, it says: "BUG task_struct", does that mean task_struct was the object accessed after free? > [32736.051002] ----------------------------------------------------------------------------- > [32736.051002] > [32736.061840] Disabling lock debugging due to kernel taint > [32736.067830] INFO: Slab 0xffffea0000995400 objects=5 used=3 fp=0xffff880026550000 flags=0x4000000000004080 > [32736.078572] INFO: Object 0xffff880026556440 @offset=25664 fp=0x (null) > ... > [32738.776936] CPU: 0 PID: 11268 Comm: dumpsys Tainted: G B W O 3.14.70-x86_64-02260-g162539f #1 > [32738.787092] Hardware name: Insyde CherryTrail/T3 MRD, BIOS CHTMRD.A6.002.016 09/20/2016 > [32738.796082] ffff880026550000 0000000000000086 0000000000000000 ffff880065e05a70 > [32738.796215] ffffffff81fc9427 ffff880065803b40 ffff880026556440 ffff880065e05aa0 > [32738.796345] ffffffff8123fe2d ffff880065803b40 ffffea0000995400 ffff880026556440 > [32738.796475] Call Trace: > [32738.796510] <NMI> > [32738.796585] [<ffffffff81fc9427>] dump_stack+0x67/0x90 > [32738.802404] [<ffffffff8123fe2d>] print_trailer+0xfd/0x170 > [32738.808603] [<ffffffff81244f26>] object_err+0x36/0x40 > [32738.814417] [<ffffffff812467ed>] kasan_report_error+0x1fd/0x3d0 > [32738.821193] [<ffffffff81131b84>] ? __rcu_read_unlock+0x24/0x90 > [32738.827881] [<ffffffff81fe0888>] ? preempt_count_sub+0x18/0xf0 > [32738.834565] [<ffffffff811db32c>] ? perf_output_put_handle+0x5c/0x170 > [32738.841833] [<ffffffff81246e70>] kasan_report+0x40/0x50 > [32738.847838] [<ffffffff810d9975>] ? task_tgid_nr_ns+0x35/0xb0 > [32738.854327] [<ffffffff81245d59>] __asan_load8+0x69/0xa0 > [32738.860333] [<ffffffff811dba18>] ? perf_output_copy+0x88/0x120 > [32738.867020] [<ffffffff810d9975>] task_tgid_nr_ns+0x35/0xb0 So here we did: perf_event_[pt]id(event, current); How can _current_ not be valid anymore? > [32738.873319] [<ffffffff811cd5d8>] __perf_event_header__init_id+0xb8/0x200 > [32738.880970] [<ffffffff811d6f19>] perf_prepare_sample+0xa9/0x4a0 > [32738.887754] [<ffffffff811d7700>] __perf_event_overflow+0x3f0/0x460 > [32738.894835] [<ffffffff81022998>] ? x86_perf_event_set_period+0x128/0x210 > [32738.902496] [<ffffffff811d8494>] perf_event_overflow+0x14/0x20 > [32738.909180] [<ffffffff8102cabc>] intel_pmu_handle_irq+0x25c/0x520 > [32738.916156] [<ffffffff81245945>] ? __asan_store8+0x15/0xa0 > [32738.922460] [<ffffffff81fddb8b>] perf_event_nmi_handler+0x2b/0x50 > [32738.929437] [<ffffffff81fdd4a8>] nmi_handle+0x88/0x230 > [32738.935346] [<ffffffff81009873>] do_nmi+0x193/0x490 > [32738.940963] [<ffffffff81fdc6d6>] end_repeat_nmi+0x1a/0x1e > [32738.947163] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0 > [32738.953358] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0 > [32738.959554] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0 > [32738.965718] <<EOE>> > [32738.965787] [<ffffffff811065a2>] ? check_preempt_wakeup+0x1a2/0x3a0 > [32738.972970] [<ffffffff810f4618>] check_preempt_curr+0xf8/0x120 > [32738.979658] [<ffffffff810f465d>] ttwu_do_wakeup+0x1d/0x1b0 > [32738.985953] [<ffffffff810f4909>] ttwu_do_activate.constprop.105+0x89/0x90 > [32738.993710] [<ffffffff810f87fe>] try_to_wake_up+0x29e/0x4e0 > [32739.000100] [<ffffffff810f8aaf>] default_wake_function+0x2f/0x40 > [32739.006979] [<ffffffff81114338>] autoremove_wake_function+0x18/0x50 > [32739.014149] [<ffffffff81fe0888>] ? preempt_count_sub+0x18/0xf0 > [32739.020836] [<ffffffff81113ab9>] __wake_up_common+0x79/0xb0 > [32739.027232] [<ffffffff81113d69>] __wake_up+0x39/0x50 > [32739.032945] [<ffffffff81135918>] __call_rcu_nocb_enqueue+0x158/0x160 > [32739.040207] [<ffffffff81135a4c>] __call_rcu+0x12c/0x450 And while we just called release_task(), that call_rcu() should still be pending at this point, also I don't think that can be current until after do_task_dead() where we schedule away from the dead task and change current. > [32739.046207] [<ffffffff81135dcd>] call_rcu+0x1d/0x20 > [32739.051821] [<ffffffff810ae2da>] release_task+0x6aa/0x8d0 > [32739.058022] [<ffffffff8111e86f>] ? do_raw_write_unlock+0x6f/0xd0 > [32739.064900] [<ffffffff810b1002>] do_exit+0xe52/0x1020 > [32739.070712] [<ffffffff810b1222>] SyS_exit+0x22/0x30 > [32739.076328] [<ffffffff81fe9063>] sysenter_dispatch+0x7/0x1f > [32739.082725] [<ffffffff8152f33b>] ? trace_hardirqs_on_thunk+0x3a/0x3c Oleg, any idea? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 9:53 ` hit a KASan bug related to Perf during stress test Peter Zijlstra @ 2016-10-24 11:15 ` Oleg Nesterov 2016-10-24 11:24 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 11:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > > [32738.867020] [<ffffffff810d9975>] task_tgid_nr_ns+0x35/0xb0 > > So here we did: perf_event_[pt]id(event, current); > > How can _current_ not be valid anymore? ... > > [32739.040207] [<ffffffff81135a4c>] __call_rcu+0x12c/0x450 > > And while we just called release_task(), that call_rcu() should still be > pending at this point, Yes, current is still valid. But nothing protects current->group_leader or parent/real_parent, they can point to the exited/freed task. We really need to nullify them in __unhash_process() to catch the problems like this, I wanted to do this many times... So you simply can't know your tgid or even tid after release_task() calls __unhash_process(). Actually after exit_notify() unless the exiting task autoreaps itself. How about the trivial fix below? Oleg. --- x/kernel/events/core.c +++ x/kernel/events/core.c @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev if (event->parent) event = event->parent; - return task_tgid_nr_ns(p, event->ns); + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; } static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:15 ` Oleg Nesterov @ 2016-10-24 11:24 ` Peter Zijlstra 2016-10-24 12:02 ` Oleg Nesterov 2016-10-24 11:27 ` Peter Zijlstra 2016-10-24 12:11 ` Peter Zijlstra 2 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 11:24 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > On 10/24, Peter Zijlstra wrote: > > > > > [32738.867020] [<ffffffff810d9975>] task_tgid_nr_ns+0x35/0xb0 > > > > So here we did: perf_event_[pt]id(event, current); > > > > How can _current_ not be valid anymore? > > ... > > > > [32739.040207] [<ffffffff81135a4c>] __call_rcu+0x12c/0x450 > > > > And while we just called release_task(), that call_rcu() should still be > > pending at this point, > > Yes, current is still valid. > > But nothing protects current->group_leader or parent/real_parent, they > can point to the exited/freed task. We really need to nullify them in > __unhash_process() to catch the problems like this, I wanted to do this > many times... > > So you simply can't know your tgid or even tid after release_task() calls > __unhash_process(). Actually after exit_notify() unless the exiting task > autoreaps itself. > > How about the trivial fix below? > > Oleg. > > --- x/kernel/events/core.c > +++ x/kernel/events/core.c > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > } Hurm.. should we not push this into task_tgid_nr_ns() ? I mean, now the user needs to be aware of this dinky detail. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:24 ` Peter Zijlstra @ 2016-10-24 12:02 ` Oleg Nesterov 2016-10-24 12:10 ` Oleg Nesterov 2016-10-24 12:19 ` Peter Zijlstra 0 siblings, 2 replies; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 12:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > > > > --- x/kernel/events/core.c > > +++ x/kernel/events/core.c > > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > > } > > Hurm.. should we not push this into task_tgid_nr_ns() ? I mean, now the > user needs to be aware of this dinky detail. Perhaps. Or into task_tgid(). Or even the patch below, __task_pid_nr_ns() is always safe. This certainly needs some cleanups. Oleg. --- x/include/linux/pid.h +++ x/include/linux/pid.h @@ -8,7 +8,8 @@ enum pid_type PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, - PIDTYPE_MAX + PIDTYPE_MAX, + PIDTYPE_TGID /* do not use */ }; /* --- x/kernel/pid.c +++ x/kernel/pid.c @@ -538,7 +538,7 @@ EXPORT_SYMBOL(__task_pid_nr_ns); pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { - return pid_nr_ns(task_tgid(tsk), ns); + return __task_pid_nr_ns(tsk, PIDTYPE_TGID, ns); } EXPORT_SYMBOL(task_tgid_nr_ns); ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:02 ` Oleg Nesterov @ 2016-10-24 12:10 ` Oleg Nesterov 2016-10-24 12:22 ` Peter Zijlstra 2016-10-24 12:19 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 12:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Oleg Nesterov wrote: > > On 10/24, Peter Zijlstra wrote: > > > > On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > > > > > > --- x/kernel/events/core.c > > > +++ x/kernel/events/core.c > > > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > > > if (event->parent) > > > event = event->parent; > > > > > > - return task_tgid_nr_ns(p, event->ns); > > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > > > } > > > > Hurm.. should we not push this into task_tgid_nr_ns() ? I mean, now the > > user needs to be aware of this dinky detail. > > Perhaps. Or into task_tgid(). Or even the patch below, __task_pid_nr_ns() > is always safe. This certainly needs some cleanups. the patch was obviously incomplete. Oleg. --- x/include/linux/pid.h +++ x/include/linux/pid.h @@ -8,7 +8,8 @@ enum pid_type PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, - PIDTYPE_MAX + PIDTYPE_MAX, + PIDTYPE_TGID /* do not use */ }; /* --- x/kernel/pid.c +++ x/kernel/pid.c @@ -526,8 +526,11 @@ pid_t __task_pid_nr_ns(struct task_struc if (!ns) ns = task_active_pid_ns(current); if (likely(pid_alive(task))) { - if (type != PIDTYPE_PID) + if (type != PIDTYPE_PID) { + if (type == PIDTYPE_TGID) + type = PIDTYPE_PID; task = task->group_leader; + } nr = pid_nr_ns(rcu_dereference(task->pids[type].pid), ns); } rcu_read_unlock(); @@ -538,7 +541,7 @@ EXPORT_SYMBOL(__task_pid_nr_ns); pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { - return pid_nr_ns(task_tgid(tsk), ns); + return __task_pid_nr_ns(tsk, PIDTYPE_TGID, ns); } EXPORT_SYMBOL(task_tgid_nr_ns); ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:10 ` Oleg Nesterov @ 2016-10-24 12:22 ` Peter Zijlstra 2016-10-24 12:29 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:22 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 02:10:31PM +0200, Oleg Nesterov wrote: > --- x/kernel/pid.c > +++ x/kernel/pid.c > @@ -526,8 +526,11 @@ pid_t __task_pid_nr_ns(struct task_struc > if (!ns) > ns = task_active_pid_ns(current); > if (likely(pid_alive(task))) { > - if (type != PIDTYPE_PID) > + if (type != PIDTYPE_PID) { > + if (type == PIDTYPE_TGID) > + type = PIDTYPE_PID; > task = task->group_leader; > + } Aah, that makes much more sense ;-) > nr = pid_nr_ns(rcu_dereference(task->pids[type].pid), ns); > } > rcu_read_unlock(); Still, I wonder if returning 0 is the right thing. 0 is a 'valid' PID for the init/idle task. And we still have the re-use issue for the TID, because when we get here TID is already unhashed too afaict, it just doesn't explode because we don't deref freed memory. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:22 ` Peter Zijlstra @ 2016-10-24 12:29 ` Oleg Nesterov 2016-10-24 12:38 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 12:29 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 02:10:31PM +0200, Oleg Nesterov wrote: > > --- x/kernel/pid.c > > +++ x/kernel/pid.c > > @@ -526,8 +526,11 @@ pid_t __task_pid_nr_ns(struct task_struc > > if (!ns) > > ns = task_active_pid_ns(current); > > if (likely(pid_alive(task))) { > > - if (type != PIDTYPE_PID) > > + if (type != PIDTYPE_PID) { > > + if (type == PIDTYPE_TGID) > > + type = PIDTYPE_PID; > > task = task->group_leader; > > + } > > Aah, that makes much more sense ;-) > > > nr = pid_nr_ns(rcu_dereference(task->pids[type].pid), ns); > > } > > rcu_read_unlock(); > > > Still, I wonder if returning 0 is the right thing. 0 is a 'valid' PID > for the init/idle task. Yes, now I think that -1 would make more sense. Unfortunately we can't just change __task_pid_nr_ns(), it already has the users which assume it returns zero... attach_to_pi_state() for example. > And we still have the re-use issue for the TID, because when we get here > TID is already unhashed too afaict, Yes, so perf_event_tid() will report zero. Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:29 ` Oleg Nesterov @ 2016-10-24 12:38 ` Peter Zijlstra 2016-10-24 13:25 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:38 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 02:29:42PM +0200, Oleg Nesterov wrote: > On 10/24, Peter Zijlstra wrote: > > > > On Mon, Oct 24, 2016 at 02:10:31PM +0200, Oleg Nesterov wrote: > > > --- x/kernel/pid.c > > > +++ x/kernel/pid.c > > > @@ -526,8 +526,11 @@ pid_t __task_pid_nr_ns(struct task_struc > > > if (!ns) > > > ns = task_active_pid_ns(current); > > > if (likely(pid_alive(task))) { > > > - if (type != PIDTYPE_PID) > > > + if (type != PIDTYPE_PID) { > > > + if (type == PIDTYPE_TGID) > > > + type = PIDTYPE_PID; > > > task = task->group_leader; > > > + } > > > > Aah, that makes much more sense ;-) > > > > > nr = pid_nr_ns(rcu_dereference(task->pids[type].pid), ns); > > > } > > > rcu_read_unlock(); > > > > > > Still, I wonder if returning 0 is the right thing. 0 is a 'valid' PID > > for the init/idle task. > > Yes, now I think that -1 would make more sense. Unfortunately we can't > just change __task_pid_nr_ns(), it already has the users which assume > it returns zero... attach_to_pi_state() for example. Indeed. And I have a patch that assumes task_pid_vnr(&init_task) == 0, is that true because of this !alive case or true in general? No worries though, we can revert to your earlier explicit test and return -1 while adding a comment to explain details? I'll go write one up in a bit, but I need to run an errand first. > > And we still have the re-use issue for the TID, because when we get here > > TID is already unhashed too afaict, > > Yes, so perf_event_tid() will report zero. Ah, ok. So whould we change that to match pid and return (explicit) -1 there too? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:38 ` Peter Zijlstra @ 2016-10-24 13:25 ` Oleg Nesterov 2016-10-24 13:40 ` Oleg Nesterov 2016-10-24 14:36 ` Peter Zijlstra 0 siblings, 2 replies; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 13:25 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 02:29:42PM +0200, Oleg Nesterov wrote: > > On 10/24, Peter Zijlstra wrote: > > > > > > Still, I wonder if returning 0 is the right thing. 0 is a 'valid' PID > > > for the init/idle task. > > > > Yes, now I think that -1 would make more sense. Unfortunately we can't > > just change __task_pid_nr_ns(), it already has the users which assume > > it returns zero... attach_to_pi_state() for example. > > Indeed. And I have a patch that assumes task_pid_vnr(&init_task) == 0, > is that true because of this !alive case or true in general? This is true in general. Idle threads are always alive but they use the the special init_struct_pid with .nr == 0. > No worries though, we can revert to your earlier explicit test and > return -1 while adding a comment to explain details? ... > Ah, ok. So whould we change that to match pid and return (explicit) -1 > there too? Well, if we add that PIDTYPE_TGID hack, I think we can do something like below... Or do you think we should add a perf_alive() check into perf_event_pid() for a quick fix? Either way it's a pity we can't report at least the valid tid, perhaps perf_event_tid() could use task_pid_nr() if event->ns == init_pid_ns, I dunno. Oleg. --- x/kernel/events/core.c +++ x/kernel/events/core.c @@ -1249,26 +1249,30 @@ unclone_ctx(struct perf_event_context *c return parent_ctx; } -static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) +static u32 perf_event_xxx(struct perf_event *event, struct task_struct *p, + enum pid_type type) { + pid_t nr; /* * only top level events have the pid namespace they were created in */ if (event->parent) event = event->parent; - return task_tgid_nr_ns(p, event->ns); + nr = __task_pid_nr_ns(p, type, event->ns); + if (!nr && !is_idle_task(p)) + nr = -1; + return nr; } -static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) +static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) { - /* - * only top level events have the pid namespace they were created in - */ - if (event->parent) - event = event->parent; + return perf_event_xxx(p, event, PIDTYPE_TGID); +} - return task_pid_nr_ns(p, event->ns); +static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) +{ + return perf_event_xxx(p, event, PIDTYPE_PID); } /* ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 13:25 ` Oleg Nesterov @ 2016-10-24 13:40 ` Oleg Nesterov 2016-10-24 14:17 ` Peter Zijlstra 2016-10-24 14:36 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 13:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Oleg Nesterov wrote: > > -static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > +static u32 perf_event_xxx(struct perf_event *event, struct task_struct *p, > + enum pid_type type) > { > + pid_t nr; > /* > * only top level events have the pid namespace they were created in > */ > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + nr = __task_pid_nr_ns(p, type, event->ns); > + if (!nr && !is_idle_task(p)) > + nr = -1; > + return nr; And just in case... In any case __task_pid_nr_ns() and other similar helpers can also return zero if "p" runs in another namespace. Say, in the parent ns. Say, perf_event_switch_output(). What do we want to report in this case, zero or -1 ? Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 13:40 ` Oleg Nesterov @ 2016-10-24 14:17 ` Peter Zijlstra 0 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 14:17 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 03:40:13PM +0200, Oleg Nesterov wrote: > On 10/24, Oleg Nesterov wrote: > > > > -static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > > +static u32 perf_event_xxx(struct perf_event *event, struct task_struct *p, > > + enum pid_type type) > > { > > + pid_t nr; > > /* > > * only top level events have the pid namespace they were created in > > */ > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + nr = __task_pid_nr_ns(p, type, event->ns); > > + if (!nr && !is_idle_task(p)) > > + nr = -1; > > + return nr; > > And just in case... In any case __task_pid_nr_ns() and other similar helpers > can also return zero if "p" runs in another namespace. Say, in the parent ns. Right, I'm tempted to not change that. Its been the behaviour for a while and changing that will upset people. The unhash case is different in that its actively broken so we must do something. > Say, perf_event_switch_output(). What do we want to report in this case, zero > or -1 ? As with all switch_output() cases, the user had better know wth he's doing ;-) Doing a switch_output() on a running counter is dubious to begin with. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 13:25 ` Oleg Nesterov 2016-10-24 13:40 ` Oleg Nesterov @ 2016-10-24 14:36 ` Peter Zijlstra 2016-10-24 15:39 ` Oleg Nesterov 1 sibling, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 14:36 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 03:25:55PM +0200, Oleg Nesterov wrote: > Well, if we add that PIDTYPE_TGID hack, I think we can do something > like below... > > Or do you think we should add a perf_alive() check into perf_event_pid() > for a quick fix? That is what I was thinking. Then we don't need to do the TGID hack, I suspect some people might object to that. > Either way it's a pity we can't report at least the valid tid, perhaps > perf_event_tid() could use task_pid_nr() if event->ns == init_pid_ns, > I dunno. Right, but after unhash is there really still the notion of a valid TID? I mean, the TID can be reused, at which point you'll end up with two tasks etc.. But yes, very tedious. I was thinking something like so? --- kernel/events/core.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index c6e47e97b33f..2c9a22485e9e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) if (event->parent) event = event->parent; - return task_tgid_nr_ns(p, event->ns); + /* + * It is possible the task already got unhashed, in which case we + * cannot determine the current->group_leader/real_parent. + * + * Also, report -1 to indicate unhashed, so as not to confused with + * 0 for the idle task. + */ + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; } static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) @@ -1268,7 +1275,7 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) if (event->parent) event = event->parent; - return task_pid_nr_ns(p, event->ns); + return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0; } /* ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 14:36 ` Peter Zijlstra @ 2016-10-24 15:39 ` Oleg Nesterov 2016-10-24 15:53 ` Oleg Nesterov 2016-10-25 9:28 ` Peter Zijlstra 0 siblings, 2 replies; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 15:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + /* > + * It is possible the task already got unhashed, in which case we > + * cannot determine the current->group_leader/real_parent. > + * > + * Also, report -1 to indicate unhashed, so as not to confused with > + * 0 for the idle task. > + */ > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; > } Yes, but this _looks_ racy unless p == current. I mean, pid_alive() makes task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return zero _if_ it can race with the exiting task. > static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > @@ -1268,7 +1275,7 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > if (event->parent) > event = event->parent; > > - return task_pid_nr_ns(p, event->ns); > + return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0; The same. However. At first glance the only case when p != current is copy_process(), right? And in this case the new child can't go away. So I think this patch is fine. Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 15:39 ` Oleg Nesterov @ 2016-10-24 15:53 ` Oleg Nesterov 2016-10-25 6:55 ` Ni, BaoleX 2016-10-25 9:28 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 15:53 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Oleg Nesterov wrote: > > On 10/24, Peter Zijlstra wrote: > > > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + /* > > + * It is possible the task already got unhashed, in which case we > > + * cannot determine the current->group_leader/real_parent. > > + * > > + * Also, report -1 to indicate unhashed, so as not to confused with > > + * 0 for the idle task. > > + */ > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; > > } > > Yes, but this _looks_ racy unless p == current. I mean, pid_alive() makes > task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return zero _if_ > it can race with the exiting task. > > > static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > > @@ -1268,7 +1275,7 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > > if (event->parent) > > event = event->parent; > > > > - return task_pid_nr_ns(p, event->ns); > > + return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0; > > The same. > > However. At first glance the only case when p != current is copy_process(), > right? And in this case the new child can't go away. So I think this patch > is fine. Actually there is another case, comm_write() -> perf_event_comm_output(). It checks same_thread_group(current, p), so we can only race with the exiting sub-thread. perf_event_pid() can't return zero, perf_event_tid() can. And I personally think we do not care and your patch is fine anyway ;) Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: hit a KASan bug related to Perf during stress test 2016-10-24 15:53 ` Oleg Nesterov @ 2016-10-25 6:55 ` Ni, BaoleX 0 siblings, 0 replies; 27+ messages in thread From: Ni, BaoleX @ 2016-10-25 6:55 UTC (permalink / raw) To: Oleg Nesterov, Peter Zijlstra Cc: mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng Thanks a lot, guys. I will take Peter's patch to do stress test. -----Original Message----- From: Oleg Nesterov [mailto:oleg@redhat.com] Sent: Monday, October 24, 2016 11:53 PM To: Peter Zijlstra Cc: Ni, BaoleX; mingo@redhat.com; acme@kernel.org; linux-kernel@vger.kernel.org; alexander.shishkin@linux.intel.com; Liu, Chuansheng Subject: Re: hit a KASan bug related to Perf during stress test On 10/24, Oleg Nesterov wrote: > > On 10/24, Peter Zijlstra wrote: > > > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + /* > > + * It is possible the task already got unhashed, in which case we > > + * cannot determine the current->group_leader/real_parent. > > + * > > + * Also, report -1 to indicate unhashed, so as not to confused with > > + * 0 for the idle task. > > + */ > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; > > } > > Yes, but this _looks_ racy unless p == current. I mean, pid_alive() > makes > task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return zero > _if_ it can race with the exiting task. > > > static u32 perf_event_tid(struct perf_event *event, struct > > task_struct *p) @@ -1268,7 +1275,7 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > > if (event->parent) > > event = event->parent; > > > > - return task_pid_nr_ns(p, event->ns); > > + return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0; > > The same. > > However. At first glance the only case when p != current is > copy_process(), right? And in this case the new child can't go away. > So I think this patch is fine. Actually there is another case, comm_write() -> perf_event_comm_output(). It checks same_thread_group(current, p), so we can only race with the exiting sub-thread. perf_event_pid() can't return zero, perf_event_tid() can. And I personally think we do not care and your patch is fine anyway ;) Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 15:39 ` Oleg Nesterov 2016-10-24 15:53 ` Oleg Nesterov @ 2016-10-25 9:28 ` Peter Zijlstra 2016-10-25 14:41 ` Oleg Nesterov 1 sibling, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-25 9:28 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 05:39:08PM +0200, Oleg Nesterov wrote: > On 10/24, Peter Zijlstra wrote: > > > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + /* > > + * It is possible the task already got unhashed, in which case we > > + * cannot determine the current->group_leader/real_parent. > > + * > > + * Also, report -1 to indicate unhashed, so as not to confused with > > + * 0 for the idle task. > > + */ > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; > > } > > Yes, but this _looks_ racy unless p == current. I mean, pid_alive() makes > task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return zero _if_ > it can race with the exiting task. So what serialization would close that race? __task_pid_nr_ns() only seems to use RCU nothing more. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-25 9:28 ` Peter Zijlstra @ 2016-10-25 14:41 ` Oleg Nesterov 2016-10-26 9:03 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-25 14:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/25, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 05:39:08PM +0200, Oleg Nesterov wrote: > > On 10/24, Peter Zijlstra wrote: > > > > > > --- a/kernel/events/core.c > > > +++ b/kernel/events/core.c > > > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p) > > > if (event->parent) > > > event = event->parent; > > > > > > - return task_tgid_nr_ns(p, event->ns); > > > + /* > > > + * It is possible the task already got unhashed, in which case we > > > + * cannot determine the current->group_leader/real_parent. > > > + * > > > + * Also, report -1 to indicate unhashed, so as not to confused with > > > + * 0 for the idle task. > > > + */ > > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0; > > > } > > > > Yes, but this _looks_ racy unless p == current. I mean, pid_alive() makes > > task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return zero _if_ > > it can race with the exiting task. > > So what serialization would close that race? __task_pid_nr_ns() only > seems to use RCU nothing more. I do not see how can we close this race, we obviously do not want to use any locking. That is why I tried to suggest nr = __task_pid_nr_ns(p, type, event->ns); if (!nr && !is_idle_task(p)) nr = -1; return nr; but this will report -1 if p runs in another namespace, so perhaps we can do nr = __task_pid_nr_ns(p, type, event->ns); if (!nr && p->exit_state) // it has already called exit_notify nr = -1; return nr; Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-25 14:41 ` Oleg Nesterov @ 2016-10-26 9:03 ` Peter Zijlstra 2016-10-26 16:10 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-26 9:03 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Tue, Oct 25, 2016 at 04:41:26PM +0200, Oleg Nesterov wrote: > > > > So what serialization would close that race? __task_pid_nr_ns() only > > seems to use RCU nothing more. > > I do not see how can we close this race, we obviously do not want to use > any locking. > > That is why I tried to suggest > > nr = __task_pid_nr_ns(p, type, event->ns); > if (!nr && !is_idle_task(p)) > nr = -1; > return nr; > > but this will report -1 if p runs in another namespace, so perhaps we > can do > > nr = __task_pid_nr_ns(p, type, event->ns); > if (!nr && p->exit_state) > // it has already called exit_notify > nr = -1; > return nr; I think I'm asking how __task_pid_nr_ns() isn't susceptible to this race ;-) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-26 9:03 ` Peter Zijlstra @ 2016-10-26 16:10 ` Oleg Nesterov 0 siblings, 0 replies; 27+ messages in thread From: Oleg Nesterov @ 2016-10-26 16:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/26, Peter Zijlstra wrote: > > On Tue, Oct 25, 2016 at 04:41:26PM +0200, Oleg Nesterov wrote: > > > > > > So what serialization would close that race? __task_pid_nr_ns() only > > > seems to use RCU nothing more. > > > > I do not see how can we close this race, we obviously do not want to use > > any locking. > > > > That is why I tried to suggest > > > > nr = __task_pid_nr_ns(p, type, event->ns); > > if (!nr && !is_idle_task(p)) > > nr = -1; > > return nr; > > > > but this will report -1 if p runs in another namespace, so perhaps we > > can do > > > > nr = __task_pid_nr_ns(p, type, event->ns); > > if (!nr && p->exit_state) > > // it has already called exit_notify > > nr = -1; > > return nr; > > I think I'm asking how __task_pid_nr_ns() isn't susceptible to this race > ;-) which race ? ;) it seems that I confused you. Lets ignore the original problem with perf_event_pid()->task_tgid_nr_ns() which can access the freed memory. Lets suppose it is already fixed. Another problem, as you noted, is that task_tgid_nr_ns/task_pid_nr_ns returns zero if the task exits and this zero can be confused with the swapper's pid. return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0 still can return zero because pid_alive(p) == T is not stable if we can race with the exiting task, so it can't guarantee that task_pid_nr_ns() won't return 0. So we can check ->exit_state or, even better, that same pid_alive() after task_pid_nr_ns() returns 0. nr = task_pid_nr_ns(p); /* avoid -1 if it is idle thread or runs in another ns */ if (!nr && !pid_alive(p)) nr = -1; return nr; Or I misunderstood you? Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:02 ` Oleg Nesterov 2016-10-24 12:10 ` Oleg Nesterov @ 2016-10-24 12:19 ` Peter Zijlstra 1 sibling, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:19 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 02:02:32PM +0200, Oleg Nesterov wrote: > Perhaps. Or into task_tgid(). Or even the patch below, __task_pid_nr_ns() > is always safe. This certainly needs some cleanups. > --- x/include/linux/pid.h > +++ x/include/linux/pid.h > @@ -8,7 +8,8 @@ enum pid_type > PIDTYPE_PID, > PIDTYPE_PGID, > PIDTYPE_SID, > - PIDTYPE_MAX > + PIDTYPE_MAX, > + PIDTYPE_TGID /* do not use */ > }; > > /* > --- x/kernel/pid.c > +++ x/kernel/pid.c > @@ -538,7 +538,7 @@ EXPORT_SYMBOL(__task_pid_nr_ns); > > pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) > { > - return pid_nr_ns(task_tgid(tsk), ns); > + return __task_pid_nr_ns(tsk, PIDTYPE_TGID, ns); > } > EXPORT_SYMBOL(task_tgid_nr_ns); > > Right, that will return 0 on !alive. But I'm not seeing how PIDTYPE_TGID isn't an array bound violating of its own though. Then again, I didn't look to hard at the pid stuff. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:15 ` Oleg Nesterov 2016-10-24 11:24 ` Peter Zijlstra @ 2016-10-24 11:27 ` Peter Zijlstra 2016-10-24 11:29 ` Peter Zijlstra 2016-10-24 12:11 ` Peter Zijlstra 2 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 11:27 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng, Jiri Olsa On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > How about the trivial fix below? > > Oleg. > > --- x/kernel/events/core.c > +++ x/kernel/events/core.c > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > } Also, now we get a (few) sample(s) with a different pid:tid than prior samples and not matching the sched_switch() events. I can imagine that being somewhat confusing for people/tools. Acme/Jolsa, any idea if that will bugger perf-report? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:27 ` Peter Zijlstra @ 2016-10-24 11:29 ` Peter Zijlstra 2016-10-24 12:04 ` Jiri Olsa 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 11:29 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng, Jiri Olsa On Mon, Oct 24, 2016 at 01:27:32PM +0200, Peter Zijlstra wrote: > On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > > How about the trivial fix below? > > > > Oleg. > > > > --- x/kernel/events/core.c > > +++ x/kernel/events/core.c > > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > > } > > Also, now we get a (few) sample(s) with a different pid:tid than prior > samples and not matching the sched_switch() events. > > I can imagine that being somewhat confusing for people/tools. > > Acme/Jolsa, any idea if that will bugger perf-report? Hurm, then again, I imagine that after unhash_process the PID/TID could be instantly re-used and then we're still confused. Yuck. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:29 ` Peter Zijlstra @ 2016-10-24 12:04 ` Jiri Olsa 2016-10-24 12:12 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Jiri Olsa @ 2016-10-24 12:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng, Jiri Olsa On Mon, Oct 24, 2016 at 01:29:45PM +0200, Peter Zijlstra wrote: > On Mon, Oct 24, 2016 at 01:27:32PM +0200, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > > > How about the trivial fix below? > > > > > > Oleg. > > > > > > --- x/kernel/events/core.c > > > +++ x/kernel/events/core.c > > > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > > > if (event->parent) > > > event = event->parent; > > > > > > - return task_tgid_nr_ns(p, event->ns); > > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > > > } > > > > Also, now we get a (few) sample(s) with a different pid:tid than prior > > samples and not matching the sched_switch() events. > > > > I can imagine that being somewhat confusing for people/tools. > > > > Acme/Jolsa, any idea if that will bugger perf-report? > > Hurm, then again, I imagine that after unhash_process the PID/TID could > be instantly re-used and then we're still confused. sounds bad.. I haven't checked the related pid_alive code, but shouldn't we already get the EXIT event in this case? jirka ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:04 ` Jiri Olsa @ 2016-10-24 12:12 ` Peter Zijlstra 0 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:12 UTC (permalink / raw) To: Jiri Olsa Cc: Oleg Nesterov, Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng, Jiri Olsa On Mon, Oct 24, 2016 at 02:04:11PM +0200, Jiri Olsa wrote: > On Mon, Oct 24, 2016 at 01:29:45PM +0200, Peter Zijlstra wrote: > > Hurm, then again, I imagine that after unhash_process the PID/TID could > > be instantly re-used and then we're still confused. > > sounds bad.. I haven't checked the related pid_alive code, > but shouldn't we already get the EXIT event in this case? It has, perf_event_exit_task() happens before we unhash. But a per-cpu event that has PID/TID reporting on will run into this. We'll observe 'funny' values between the unhash and the next context switch. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 11:15 ` Oleg Nesterov 2016-10-24 11:24 ` Peter Zijlstra 2016-10-24 11:27 ` Peter Zijlstra @ 2016-10-24 12:11 ` Peter Zijlstra 2016-10-24 12:21 ` Oleg Nesterov 2 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:11 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > Yes, current is still valid. > > But nothing protects current->group_leader or parent/real_parent, they > can point to the exited/freed task. We really need to nullify them in > __unhash_process() to catch the problems like this, I wanted to do this > many times... > > So you simply can't know your tgid or even tid after release_task() calls > __unhash_process(). Actually after exit_notify() unless the exiting task > autoreaps itself. > > How about the trivial fix below? > > Oleg. > > --- x/kernel/events/core.c > +++ x/kernel/events/core.c > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > } > > static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) Should we do the same for perf_event_tid() and report -1 as the pid/tid in the !alive case? -1 should be an obvious invalid pid since we limit the pid-space to less than 32 bits. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:11 ` Peter Zijlstra @ 2016-10-24 12:21 ` Oleg Nesterov 2016-10-24 12:27 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2016-10-24 12:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On 10/24, Peter Zijlstra wrote: > > On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > > > > --- x/kernel/events/core.c > > +++ x/kernel/events/core.c > > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > > if (event->parent) > > event = event->parent; > > > > - return task_tgid_nr_ns(p, event->ns); > > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > > } > > > > static u32 perf_event_tid(struct perf_event *event, struct task_struct *p) > > Should we do the same for perf_event_tid() and report -1 as the pid/tid > in the !alive case? -1 should be an obvious invalid pid since we limit > the pid-space to less than 32 bits. task_pid_nr_ns() is always safe, it calls __task_pid_nr_ns(). But yes, it can return zero if called after exit_notify() and/or release_task(). And while zero is not a valid pid too, I guess it can be confused with the idle thread's "pid" ? Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: hit a KASan bug related to Perf during stress test 2016-10-24 12:21 ` Oleg Nesterov @ 2016-10-24 12:27 ` Peter Zijlstra 0 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2016-10-24 12:27 UTC (permalink / raw) To: Oleg Nesterov Cc: Ni, BaoleX, mingo, acme, linux-kernel, alexander.shishkin, Liu, Chuansheng On Mon, Oct 24, 2016 at 02:21:23PM +0200, Oleg Nesterov wrote: > > Should we do the same for perf_event_tid() and report -1 as the pid/tid > > in the !alive case? -1 should be an obvious invalid pid since we limit > > the pid-space to less than 32 bits. > > task_pid_nr_ns() is always safe, it calls __task_pid_nr_ns(). But yes, > it can return zero if called after exit_notify() and/or release_task(). > > And while zero is not a valid pid too, I guess it can be confused with > the idle thread's "pid" ? Right, 0 is the idle thread. ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2016-10-26 16:12 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <318B87A793BE164187D8851D6CE09D64371C8811@shsmsx102.ccr.corp.intel.com> 2016-10-24 9:53 ` hit a KASan bug related to Perf during stress test Peter Zijlstra 2016-10-24 11:15 ` Oleg Nesterov 2016-10-24 11:24 ` Peter Zijlstra 2016-10-24 12:02 ` Oleg Nesterov 2016-10-24 12:10 ` Oleg Nesterov 2016-10-24 12:22 ` Peter Zijlstra 2016-10-24 12:29 ` Oleg Nesterov 2016-10-24 12:38 ` Peter Zijlstra 2016-10-24 13:25 ` Oleg Nesterov 2016-10-24 13:40 ` Oleg Nesterov 2016-10-24 14:17 ` Peter Zijlstra 2016-10-24 14:36 ` Peter Zijlstra 2016-10-24 15:39 ` Oleg Nesterov 2016-10-24 15:53 ` Oleg Nesterov 2016-10-25 6:55 ` Ni, BaoleX 2016-10-25 9:28 ` Peter Zijlstra 2016-10-25 14:41 ` Oleg Nesterov 2016-10-26 9:03 ` Peter Zijlstra 2016-10-26 16:10 ` Oleg Nesterov 2016-10-24 12:19 ` Peter Zijlstra 2016-10-24 11:27 ` Peter Zijlstra 2016-10-24 11:29 ` Peter Zijlstra 2016-10-24 12:04 ` Jiri Olsa 2016-10-24 12:12 ` Peter Zijlstra 2016-10-24 12:11 ` Peter Zijlstra 2016-10-24 12:21 ` Oleg Nesterov 2016-10-24 12:27 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).