tg_load_down NULL pointer dereference

* tg_load_down NULL pointer dereference
@ 2016-06-20  6:14 AMIT NAGAL
  2016-06-20  8:34 ` Peter Zijlstra
From: AMIT NAGAL @ 2016-06-20  6:14 UTC
  To: linux-kernel, mingo, peterz; +Cc: ajeet.y, pankaj.m

Hi,
I am using Linux kernel version 3.10.28 (ARM platform).
I am getting a NULL pointer dereference in tg_load_down().
At the time of the error, tg->parent->cfs_rq is 0 and tg->se is 0x00000400 (refer to the backtrace in 4)).

1)
The problematic statement is at line 5814 in tg_load_down():
line 5814 :: load = tg->parent->cfs_rq[cpu]->h_load;
tg->parent->cfs_rq is 0 (register r2), which causes the NULL dereference.

line 5815 :: load *= tg->se[cpu]->load.weight;
The tg->se pointer value is 0x00000400 (register r3). (tg->se is loaded early; refer to the disassembly below.)

PC = 0xc00741b0 at the time of the NULL dereference.
(gdb) list *(0xc00741b0)
0xc00741b0 is in tg_load_down (kernel/sched/fair.c:5814).
5809            long cpu = (long)data;
5810
5811            if (!tg->parent) {
5812                    load = cpu_rq(cpu)->load.weight;
5813            } else {
5814                    load = tg->parent->cfs_rq[cpu]->h_load;
5815                    load *= tg->se[cpu]->load.weight;
5816                    load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
5817            } 

(gdb) disas tg_load_down
Dump of assembler code for function tg_load_down:
   0xc007418c <+0>:     mov     r12, sp
   0xc0074190 <+4>:     push    {r11, r12, lr, pc}
   0xc0074194 <+8>:     sub     r11, r12, #4
   0xc0074198 <+12>:    ldr     r3, [r0, #80]   ; 0x50
   0xc007419c <+16>:    cmp     r3, #0
   0xc00741a0 <+20>:    beq     0xc00741e8 <tg_load_down+92>
   0xc00741a4 <+24>:    ldr     r2, [r3, #36]   ; 0x24
   0xc00741a8 <+28>:    lsl     lr, r1, #2
   0xc00741ac <+32>:    ldr     r3, [r0, #32]
   0xc00741b0 <+36>:    ldr     r12, [r2, r1, lsl #2]

2) 
The first argument of tg_load_down() (struct task_group *tg) is passed in r0.
Line 5811  :   if (!tg->parent) {
So first, tg->parent is loaded into r3.
c0074198:       e5903050        ldr     r3, [r0, #80]   ; 0x50

Then tg->parent is checked for NULL.
c007419c:       e3530000        cmp     r3, #0

Line 5814  :   load = tg->parent->cfs_rq[cpu]
c00741a4:       e5932024        ldr     r2, [r3, #36]   ; 0x24
Here tg->parent->cfs_rq is loaded into r2.

Line 5815  :   load *= tg->se[cpu]
c00741ac:       e5903020        ldr     r3, [r0, #32]
Here tg->se is loaded into r3.

Both tg->parent->cfs_rq and tg->se are double pointers (arrays of pointers indexed by cpu).
However, in the register dump taken at the time of the kernel crash, r2 (tg->parent->cfs_rq) and r3 (tg->se) hold the following values, as shown in the backtrace in 4) below:
r2 = 00000000  r3 = 00000400

After this, ldr r12, [r2, r1, lsl #2] is executed, which crashes the kernel with a NULL pointer dereference because r2 is 0.
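
As a cross-check, the address touched by the faulting load can be worked out from the register dump. The sketch below is only an illustration: the helper names are hypothetical, and the register values are copied from the dump in 4). With r2 = 0 and r1 = 2 (the cpu index), the access lands at address 0x8, which also matches the value left in lr by the earlier "lsl lr, r1, #2".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an ARM register-offset load:
 *     ldr Rd, [Rn, Rm, lsl #shift]   ->   address = Rn + (Rm << shift)
 * Hypothetical helper, for illustration only.
 */
uint32_t arm_ldr_effective_address(uint32_t rn, uint32_t rm, unsigned shift)
{
    assert(shift < 32);
    return rn + (rm << shift);
}

/* Register values copied from the dump: r2 = tg->parent->cfs_rq, r1 = cpu. */
uint32_t faulting_address(void)
{
    const uint32_t r2 = 0x00000000u; /* base of the cfs_rq pointer array */
    const uint32_t r1 = 0x00000002u; /* cpu index */

    /* lsl #2 scales the index by 4, the pointer size on 32-bit ARM */
    return arm_ldr_effective_address(r2, r1, 2);
}
```

So the load of tg->parent->cfs_rq[2] reads from a NULL base plus 8, i.e. inside the zero page.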

3)
RCU read-side lock protection is already in place while tg_load_down() is executing.

static void update_h_load(long cpu)
{
        struct rq *rq = cpu_rq(cpu);
        unsigned long now = jiffies;

        if (rq->h_load_throttle == now)
                return;

        rq->h_load_throttle = now;

        rcu_read_lock();
        walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
        rcu_read_unlock();
}
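
To narrow down which pointer goes bad, the computation in tg_load_down() can be modelled with explicit checks. The following is a userspace sketch, not kernel code: the structs are heavily simplified stand-ins (an assumption), tg_load_down_checked() and its root_load parameter are hypothetical, and root_load stands in for cpu_rq(cpu)->load.weight. A NULL test like this would catch the r2 == 0 case, but not a garbage value such as tg->se == 0x00000400.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct load_weight { unsigned long weight; };
struct cfs_rq { struct load_weight load; unsigned long h_load; };
struct sched_entity { struct load_weight load; };
struct task_group {
    struct sched_entity **se;   /* indexed by cpu */
    struct cfs_rq **cfs_rq;     /* indexed by cpu */
    struct task_group *parent;
};

/*
 * Same arithmetic as lines 5811-5817, with NULL checks added.
 * Returns 0 and stores the result in *out, or -1 if a pointer is NULL.
 */
int tg_load_down_checked(const struct task_group *tg, long cpu,
                         unsigned long root_load, unsigned long *out)
{
    unsigned long load;

    if (!tg->parent) {
        load = root_load; /* stands in for cpu_rq(cpu)->load.weight */
    } else {
        /* the per-cpu pointer arrays themselves */
        if (!tg->parent->cfs_rq || !tg->se)
            return -1;
        /* the per-cpu entries */
        if (!tg->parent->cfs_rq[cpu] || !tg->se[cpu])
            return -1;

        load = tg->parent->cfs_rq[cpu]->h_load;
        load *= tg->se[cpu]->load.weight;
        load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
    }

    *out = load;
    return 0;
}
```

In the kernel, an equivalent check would be a temporary debugging aid to log the offending task_group; it would not explain why the pointers are wrong in the first place.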

4) The relevant backtrace is as follows:
pc : [<c00741b0> ( tg_load_down + 36 )]    lr : [<00000008>]    psr: a0070093
ip : dc4c3d08  fp : dc4c3d04
r10: c047e418  r9 : c007418c  r8 : 00000000
r7 : c006f44c  r6 : c069cb28  r5 : 00000002  r4 : d5d53a08
r3 : 00000400  r2 : 00000000  r1 : 00000002  r0 : d5d53a08

Function entered at [<c007418c>](tg_load_down) from  [<c006f3c0>](walk_tg_tree_from + 48)
Function entered at [<c006f390>](walk_tg_tree_from) from [<c007a694>](load_balance + 668)

5) Is there any scenario in which tg->parent->cfs_rq can be 0 (r2) and tg->se can get corrupted to the value 0x00000400 (r3)?

Regards
Amit Nagal
