* tg_load_down NULL pointer dereference
From: AMIT NAGAL @ 2016-06-20 6:14 UTC
To: linux-kernel, mingo, peterz; +Cc: ajeet.y, pankaj.m
Hi,
I am using Linux kernel version 3.10.28 on an ARM platform.
I am getting a NULL pointer dereference in tg_load_down().
At the time of the error, tg->parent->cfs_rq is 0 and tg->se is 0x00000400 (see the register dump in 4) ).
1)
The faulting statement is line 5814 of tg_load_down():
line 5814 :: load = tg->parent->cfs_rq[cpu]->h_load;
tg->parent->cfs_rq is 0 (register r2), which causes the NULL pointer dereference.
line 5815 :: load *= tg->se[cpu]->load.weight;
tg->se is 0x00000400 (register r3). (tg->se is loaded early; see the disassembly below.)
PC = 0xc00741b0 at the time of the NULL pointer dereference.
(gdb) list *(0xc00741b0)
0xc00741b0 is in tg_load_down (kernel/sched/fair.c:5814).
5809 long cpu = (long)data;
5810
5811 if (!tg->parent) {
5812 load = cpu_rq(cpu)->load.weight;
5813 } else {
5814 load = tg->parent->cfs_rq[cpu]->h_load;
5815 load *= tg->se[cpu]->load.weight;
5816 load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
5817 }
(gdb) disas tg_load_down
Dump of assembler code for function tg_load_down:
0xc007418c <+0>: mov r12, sp
0xc0074190 <+4>: push {r11, r12, lr, pc}
0xc0074194 <+8>: sub r11, r12, #4
0xc0074198 <+12>: ldr r3, [r0, #80] ; 0x50
0xc007419c <+16>: cmp r3, #0
0xc00741a0 <+20>: beq 0xc00741e8 <tg_load_down+92>
0xc00741a4 <+24>: ldr r2, [r3, #36] ; 0x24
0xc00741a8 <+28>: lsl lr, r1, #2
0xc00741ac <+32>: ldr r3, [r0, #32]
0xc00741b0 <+36>: ldr r12, [r2, r1, lsl #2]
2)
The first argument of tg_load_down() (struct task_group *tg) is passed in r0.
Line 5811 : if (!tg->parent) {
First, tg->parent is loaded into r3:
c0074198: e5903050 ldr r3, [r0, #80] ; 0x50
and tg->parent is checked for NULL:
c007419c: e3530000 cmp r3, #0
Line 5814 : load = tg->parent->cfs_rq[cpu]
c00741a4: e5932024 ldr r2, [r3, #36] ; 0x24
Here tg->parent->cfs_rq is loaded into r2.
Line 5815 : load *= tg->se[cpu]
c00741ac: e5903020 ldr r3, [r0, #32]
Here tg->se is loaded into r3.
Both tg->parent->cfs_rq and tg->se are double pointers, each indexed by cpu.
However, the register dump taken at the time of the crash (see 4) below) shows these values for r2 (tg->parent->cfs_rq) and r3 (tg->se):
r2 = 00000000 r3 = 00000400
After this, ldr r12, [r2, r1, lsl #2] is executed, which crashes the kernel with a NULL pointer dereference because r2 is 0.
3)
RCU read-side protection is already in place while tg_load_down() executes:
static void update_h_load(long cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long now = jiffies;

	if (rq->h_load_throttle == now)
		return;
	rq->h_load_throttle = now;

	rcu_read_lock();
	walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
	rcu_read_unlock();
}
4)
The relevant register dump and backtrace are as follows:
pc : [<c00741b0> ( tg_load_down + 36 )] lr : [<00000008>] psr: a0070093
ip : dc4c3d08 fp : dc4c3d04
r10: c047e418 r9 : c007418c r8 : 00000000
r7 : c006f44c r6 : c069cb28 r5 : 00000002 r4 : d5d53a08
r3 : 00000400 r2 : 00000000 r1 : 00000002 r0 : d5d53a08
Function entered at [<c007418c>](tg_load_down) from [<c006f3c0>](walk_tg_tree_from + 48)
Function entered at [<c006f390>](walk_tg_tree_from) from [<c007a694>](load_balance + 668)
5)
Is there any scenario in which tg->parent->cfs_rq can be 0 (r2) and tg->se can get corrupted to the value 0x00000400 (r3)?
Regards
Amit Nagal
* Re: tg_load_down NULL pointer dereference
From: Peter Zijlstra @ 2016-06-20 8:34 UTC
To: AMIT NAGAL; +Cc: linux-kernel, mingo, ajeet.y, pankaj.m
On Mon, Jun 20, 2016 at 06:14:01AM +0000, AMIT NAGAL wrote:
> Hi
> I am using Linux kernel version 3.10.28 (ARM platform) .
> I am getting NULL pointer dereference in tg_load_down() .
> At the time of error , tg->parent->cfs_rq value is 0 and tg->se value is 0x00000400 . ( refer to backtrace in 5) ).
Were you destroying cgroups at the time?
If so, there were some problems with cgroup teardown recently, see
patches:
6fe1f348b3dd ("sched/cgroup: Fix cgroup entity load tracking tear-down")
2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
Which depend on:
aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children")
8bb5ef79bc0f ("cgroup: make sure a parent css isn't freed before its children")
I've no idea if any of that is relevant to your ancient kernel, let
alone applies, that's your problem for using ancient wares.
If you can reproduce with a current kernel (4.6+) then I might look more.