* tg_load_down NULL pointer dereference
@ 2016-06-20  6:14 AMIT NAGAL
  2016-06-20  8:34 ` Peter Zijlstra
  0 siblings, 1 reply; 2+ messages in thread
From: AMIT NAGAL @ 2016-06-20  6:14 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz; +Cc: ajeet.y, pankaj.m

Hi,
I am using Linux kernel version 3.10.28 (ARM platform).
I am getting a NULL pointer dereference in tg_load_down().
At the time of the error, tg->parent->cfs_rq is 0 and tg->se is 0x00000400 (refer to the register dump in 4)).

1)
The problematic statement is at line 5814 in tg_load_down():
line 5814 :: load = tg->parent->cfs_rq[cpu]->h_load;
tg->parent->cfs_rq is 0 (register r2), which causes the NULL dereference.

line 5815 :: load *= tg->se[cpu]->load.weight;
The tg->se pointer value is 0x00000400 (register r3). (tg->se is loaded early; refer to the disassembly below.)

PC = 0xc00741b0 at the time of the NULL dereference.
(gdb) list *(0xc00741b0)
0xc00741b0 is in tg_load_down (kernel/sched/fair.c:5814).
5809            long cpu = (long)data;
5810
5811            if (!tg->parent) {
5812                    load = cpu_rq(cpu)->load.weight;
5813            } else {
5814                    load = tg->parent->cfs_rq[cpu]->h_load;
5815                    load *= tg->se[cpu]->load.weight;
5816                    load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
5817            } 

(gdb) disas tg_load_down
Dump of assembler code for function tg_load_down:
   0xc007418c <+0>:     mov     r12, sp
   0xc0074190 <+4>:     push    {r11, r12, lr, pc}
   0xc0074194 <+8>:     sub     r11, r12, #4
   0xc0074198 <+12>:    ldr     r3, [r0, #80]   ; 0x50
   0xc007419c <+16>:    cmp     r3, #0
   0xc00741a0 <+20>:    beq     0xc00741e8 <tg_load_down+92>
   0xc00741a4 <+24>:    ldr     r2, [r3, #36]   ; 0x24
   0xc00741a8 <+28>:    lsl     lr, r1, #2
   0xc00741ac <+32>:    ldr     r3, [r0, #32]
   0xc00741b0 <+36>:    ldr     r12, [r2, r1, lsl #2]
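For readers following the arithmetic in lines 5811-5816, here is a minimal user-space sketch of the same computation. The struct names (task_group_model, cfs_rq_model, sched_entity_model) and the function tg_h_load_model() are hypothetical stand-ins, not the kernel's definitions, and the added NULL checks are an illustration only, not a proposed fix. They would catch the r2 == 0 case but not a garbage pointer such as 0x00000400.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct load_weight_model { unsigned long weight; };
struct cfs_rq_model { struct load_weight_model load; unsigned long h_load; };
struct sched_entity_model { struct load_weight_model load; };
struct task_group_model {
	struct sched_entity_model **se;   /* per-CPU array, like tg->se */
	struct cfs_rq_model **cfs_rq;     /* per-CPU array, like tg->cfs_rq */
	struct task_group_model *parent;
};

/*
 * Mirrors the load computation at fair.c:5811-5816, with explicit NULL
 * checks added for illustration. root_weight stands in for
 * cpu_rq(cpu)->load.weight. Returns 0 on success, -1 if a pointer that
 * the original code dereferences unconditionally is NULL.
 */
int tg_h_load_model(const struct task_group_model *tg, long cpu,
		    unsigned long root_weight, unsigned long *out)
{
	unsigned long load;

	assert(tg != NULL && out != NULL);

	if (!tg->parent) {
		load = root_weight;
	} else {
		if (!tg->parent->cfs_rq || !tg->se)
			return -1;      /* the r2 == 0 case from the report */
		if (!tg->parent->cfs_rq[cpu] || !tg->se[cpu])
			return -1;
		load = tg->parent->cfs_rq[cpu]->h_load;
		load *= tg->se[cpu]->load.weight;
		load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
	}

	*out = load;
	return 0;
}
```

With a parent h_load of 1024, an entity weight of 1024 and a parent cfs_rq weight of 2047, the model yields 1024 * 1024 / 2048 = 512.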

2) 
The first argument of tg_load_down() (struct task_group *tg) is stored in r0.
Line 5811  :   if (!tg->parent) {
So first, tg->parent is loaded into r3.
c0074198:       e5903050        ldr     r3, [r0, #80]   ; 0x50

Then tg->parent is checked for NULL.
c007419c:       e3530000        cmp     r3, #0

Line 5814  :   load = tg->parent->cfs_rq[cpu]
c00741a4:       e5932024        ldr     r2, [r3, #36]   ; 0x24
Here tg->parent->cfs_rq is loaded into r2.

Line 5815  :   load *= tg->se[cpu]
c00741ac:       e5903020        ldr     r3, [r0, #32]
Here tg->se is loaded into r3.

Both tg->parent->cfs_rq and tg->se are double pointers.
However, the register dump taken at the time of the kernel crash (see 4) below) shows these values for r2 (tg->parent->cfs_rq) and r3 (tg->se):
r2 = 00000000  r3 = 00000400

After this, ldr r12, [r2, r1, lsl #2] is executed, which crashes the kernel with a NULL pointer dereference because r2 is 0.
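The address that the faulting load touches can be worked out from the register values quoted above. The sketch below is a plain C model of the register-offset addressing mode base + (index << shift) on a 32-bit core. The helper name arm_ldr_reg_offset_addr() is made up for illustration, and the fault address itself is not shown in the report, so the result is an inference from r1 and r2 only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of "ldr rd, [rn, rm, lsl #shift]" on 32-bit ARM:
 * rn + (rm << shift), with 32-bit wrap-around.
 * Hypothetical helper for illustration; not kernel code.
 */
uint32_t arm_ldr_reg_offset_addr(uint32_t rn, uint32_t rm, unsigned int shift)
{
	assert(shift < 32);
	return rn + (rm << shift);
}
```

Calling arm_ldr_reg_offset_addr(0x0, 0x2, 2) with r2 = 0 and r1 = 2 (cpu 2) gives 0x8, that is, slot 2 of a pointer array based at address 0. This agrees with lr = 00000008 in the register dump, since the preceding instruction lsl lr, r1, #2 computes the same offset.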

3)
RCU read-side protection is already in place while tg_load_down() is executing.

static void update_h_load(long cpu)
{
        struct rq *rq = cpu_rq(cpu);
        unsigned long now = jiffies;

        if (rq->h_load_throttle == now)
                return;

        rq->h_load_throttle = now;

        rcu_read_lock();
        walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
        rcu_read_unlock();
}
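Besides taking the RCU read lock, update_h_load() above also throttles itself so that the tree walk runs at most once per jiffy for a given runqueue. A minimal user-space model of that throttle is sketched here; rq_model, its walks counter and update_h_load_model() are hypothetical names, and the counter merely stands in for the walk_tg_tree() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct rq used by the throttle. */
struct rq_model {
	unsigned long h_load_throttle; /* jiffy of the last walk */
	unsigned long walks;           /* how many walks were performed */
};

/*
 * Models the early return in update_h_load(): skip the walk if one
 * already happened in the current jiffy ("now").
 * Returns true if a walk was performed.
 */
bool update_h_load_model(struct rq_model *rq, unsigned long now)
{
	assert(rq != NULL);

	if (rq->h_load_throttle == now)
		return false;

	rq->h_load_throttle = now;
	rq->walks++;    /* stands in for walk_tg_tree() under rcu_read_lock() */
	return true;
}
```

Two calls with the same jiffy value perform one walk; a call in the next jiffy performs another.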
 
4) The relevant backtrace for the problem is as follows:
pc : [<c00741b0> ( tg_load_down + 36 )]    lr : [<00000008>]    psr: a0070093
ip : dc4c3d08  fp : dc4c3d04
r10: c047e418  r9 : c007418c  r8 : 00000000
r7 : c006f44c  r6 : c069cb28  r5 : 00000002  r4 : d5d53a08
r3 : 00000400  r2 : 00000000  r1 : 00000002  r0 : d5d53a08

Function entered at [<c007418c>](tg_load_down) from  [<c006f3c0>](walk_tg_tree_from + 48)
Function entered at [<c006f390>](walk_tg_tree_from) from [< c007a694>](load_balance +668)

5) Is there any scenario in which tg->parent->cfs_rq can be 0 (r2) and tg->se can get corrupted to the value 00000400 (r3)?

Regards
Amit Nagal


* Re: tg_load_down NULL pointer dereference
  2016-06-20  6:14 tg_load_down NULL pointer dereference AMIT NAGAL
@ 2016-06-20  8:34 ` Peter Zijlstra
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Zijlstra @ 2016-06-20  8:34 UTC (permalink / raw)
  To: AMIT NAGAL; +Cc: linux-kernel, mingo, ajeet.y, pankaj.m

On Mon, Jun 20, 2016 at 06:14:01AM +0000, AMIT NAGAL wrote:
> Hi 
> I am using  Linux kernel version 3.10.28 (ARM platform) .
> I am getting NULL pointer dereference in tg_load_down() .
> At the time of error  , tg->parent->cfs_rq  value is 0 and  tg->se value is 0x00000400 . ( refer to backtrace in 5) ).

Were you destroying cgroups at the time?

If so, there were some problems with cgroup teardown recently, see
patches:

  6fe1f348b3dd ("sched/cgroup: Fix cgroup entity load tracking tear-down")
  2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")

Which depend on:

  aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children")
  8bb5ef79bc0f ("cgroup: make sure a parent css isn't freed before its children")

I've no idea if any of that is relevant to your ancient kernel, let
alone applies, that's your problem for using ancient wares.

If you can reproduce with a current kernel (4.6+) then I might look more.
