linux-kernel.vger.kernel.org archive mirror
* Structure clobbering causes timer oopses
@ 2002-10-13  0:59 Dave Hansen
  2002-10-13  1:07 ` Andrew Morton
From: Dave Hansen @ 2002-10-13  0:59 UTC
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List, Andrew Morton

I put some extra checks in timer_t, including a tripwire at the 
beginning and end, in case the timer was being trampled by 
something.  It was.

I added begin and end:
> struct timer_list {
>+        unsigned int begin;
>         struct list_head entry;
>         unsigned long expires;
> 
>         void (*function)(unsigned long);
>         unsigned long data;
> 
>         struct tvec_t_base_s *base;
>+        unsigned int end;
> };

> static inline void init_timer(struct timer_list * timer)
> {
>+        timer->begin = TIMER_BEG_MAGIC;
>+        timer->end = TIMER_END_MAGIC;
>         timer->base = NULL;
> }

Then this beast
(yeah, yeah, it ain't pretty, but it worked):
> #define CHECK_TIMER(timer) do {\
>                 if (((timer)->begin!=TIMER_BEG_MAGIC) || \
>                     ((timer)->end!=TIMER_END_MAGIC)) {\
>                         printk("timer magic check failed %s:%s():%d\n", \
>                                __stringify(KBUILD_BASENAME),__FUNCTION__,__LINE__);\
>                         printk("begin: 0x%x end:0x%x\n", (timer)->begin, \
>                                (timer)->end);\
>                         dump_stack();\
>                 }} while (0)
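For readers who want to play with the tripwire idea outside the kernel, here is a minimal user-space sketch of the same technique. It is an illustration under assumptions, not Dave's patch: the magic values are hypothetical (the message never shows the real TIMER_BEG_MAGIC / TIMER_END_MAGIC definitions), the struct is a trimmed stand-in for struct timer_list, and the check returns a flag instead of calling dump_stack(), which has no user-space equivalent.

```c
#include <stdio.h>

/* Hypothetical canary values; the real TIMER_BEG_MAGIC and
 * TIMER_END_MAGIC definitions are not shown in the thread. */
#define DEMO_BEG_MAGIC 0x4b87ad6eu
#define DEMO_END_MAGIC 0x6e2ac5e1u

/* Trimmed user-space stand-in for the instrumented struct timer_list:
 * the payload fields are bracketed by two canary words. */
struct demo_timer {
	unsigned int begin;
	unsigned long expires;
	void (*function)(unsigned long);
	unsigned long data;
	unsigned int end;
};

/* Arm both tripwires, as the patched init_timer() does. */
static void demo_init_timer(struct demo_timer *t)
{
	t->begin = DEMO_BEG_MAGIC;
	t->end = DEMO_END_MAGIC;
}

/* Return 1 if both canaries are intact; otherwise print where the
 * check ran plus the damaged values, and return 0. */
static int demo_check_timer(const struct demo_timer *t,
			    const char *func, int line)
{
	if (t->begin != DEMO_BEG_MAGIC || t->end != DEMO_END_MAGIC) {
		printf("timer magic check failed %s():%d\n", func, line);
		printf("begin: 0x%x end:0x%x\n", t->begin, t->end);
		return 0;
	}
	return 1;
}

/* Record the caller's location, like __FUNCTION__/__LINE__ in the macro. */
#define DEMO_CHECK_TIMER(t) demo_check_timer((t), __func__, __LINE__)
```

Sprinkling DEMO_CHECK_TIMER() at every place the structure is read narrows down the window in which the corruption happened, which is what the kernel macro does for __run_timers().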


Just before a crash, I got:

timer magic check failed timer:__run_timers():351
begin: 0xc035fbc8 end:0xc035fbe8
Call Trace:
  [<c0120d53>] run_timer_tasklet+0xf7/0x188
  [<c011d945>] tasklet_hi_action+0x85/0xe0
  [<c011d64a>] do_softirq+0x5a/0xac
  [<c01117ed>] smp_apic_timer_interrupt+0x111/0x118
  [<c0105334>] poll_idle+0x0/0x48
  [<c0107a7a>] apic_timer_interrupt+0x1a/0x20
  [<c0105334>] poll_idle+0x0/0x48
  [<c010535d>] poll_idle+0x29/0x48
  [<c01053b3>] cpu_idle+0x37/0x48
  [<c011898d>] printk+0x125/0x140


Then, the full crash:

general protection fault: fbe0

CPU:    4
EIP:    0060:[<c035fbe9>]    Not tainted
EFLAGS: 00010287
EIP is at tvec_bases+0x169/0x20400
eax: d18deac0   ebx: c035dbcc   ecx: c035fbe0   edx: c0363f70
esi: c035fbd8   edi: c0363b00   ebp: 00000001   esp: f77c7f1c
ds: 0068   es: 0068   ss: 0068
Process swapper (pid: 0, threadinfo=f77c6000 task=f77c5060)
Stack: c0120d9b c035fbe0 cb1101c8 00000000 f77c6000 c011d945 00000000
        00000001 c035f960 fffffffe 00000080 c03443e4 c03443e4 c011d64a
        c035f960 00000010 00000004 00000000 00000000 00000046 c01117ed
        f77c6000 c0105334 00000000
Call Trace:
  [<c0120d9b>] run_timer_tasklet+0x13f/0x188
  [<c011d945>] tasklet_hi_action+0x85/0xe0
  [<c011d64a>] do_softirq+0x5a/0xac
  [<c01117ed>] smp_apic_timer_interrupt+0x111/0x118
  [<c0105334>] poll_idle+0x0/0x48
  [<c0107a7a>] apic_timer_interrupt+0x1a/0x20
  [<c0105334>] poll_idle+0x0/0x48
  [<c010535d>] poll_idle+0x29/0x48
  [<c01053b3>] cpu_idle+0x37/0x48
  [<c011898d>] printk+0x125/0x140


APIC error on CPU4: 08(08)
APIC error on CPU4: 08(08)
APIC error on CPU4: 08(08)
APIC error on CPU4: 08(08)
...

Notice that the junk values written into begin, end, and function are 
all fairly close together, as if something were filling out an array.

Can anyone think of clever ways to figure out what is doing the 
trampling?

BTW, I found lots of users who aren't using init_timer().  Should I 
publicly humiliate them?
-- 
Dave Hansen
haveblue@us.ibm.com



* Re: Structure clobbering causes timer oopses
  2002-10-13  0:59 Structure clobbering causes timer oopses Dave Hansen
@ 2002-10-13  1:07 ` Andrew Morton
  2002-10-13  2:09   ` Dave Hansen
From: Andrew Morton @ 2002-10-13  1:07 UTC
  To: Dave Hansen; +Cc: Ingo Molnar, Linux Kernel Mailing List

Dave Hansen wrote:
> 
> ...
> timer magic check failed timer:__run_timers():351
> begin: 0xc035fbc8 end:0xc035fbe8

Can you look these up in System.map?

> ..
> 
> BTW, I found lots of users who aren't using init_timer().  Should I
> publicly humiliate them?

If they're initially using add_timer(), that works out
OK.  If they start out using mod_timer() (or del_timer()), then it's a bug.

I assume you tried all the memory debugging options?


* Re: Structure clobbering causes timer oopses
  2002-10-13  1:07 ` Andrew Morton
@ 2002-10-13  2:09   ` Dave Hansen
  2002-10-13 19:50     ` Dipankar Sarma
From: Dave Hansen @ 2002-10-13  2:09 UTC
  To: Andrew Morton; +Cc: Ingo Molnar, Linux Kernel Mailing List

Andrew Morton wrote:
> Dave Hansen wrote:
>>...
>>timer magic check failed timer:__run_timers():351
>>begin: 0xc035fbc8 end:0xc035fbe8
> 
> Can you look these up in System.map?

Both are inside tvec_bases, just like EIP, which ended up there 
because timer_t->function was clobbered too.
c035fa80 d tvec_bases
c037fe80 d pidmap_lock
c037fea0 D page_states
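The System.map lookup is simple subtraction, and a few lines of C make the arithmetic explicit. This is a sketch that hard-codes only the addresses quoted in this thread; the helper names are hypothetical, and the "size" of tvec_bases is assumed to be the distance to the next symbol in the map (pidmap_lock). With those numbers, begin and end land at tvec_bases+0x148 and +0x168 (0x20 bytes apart), EIP lands at +0x169, and the size comes out to 0x20400, matching the oops line "tvec_bases+0x169/0x20400".

```c
#include <stdio.h>

/* Addresses taken from the System.map excerpt in this message. */
#define TVEC_BASES_ADDR 0xc035fa80UL /* c035fa80 d tvec_bases  */
#define NEXT_SYM_ADDR   0xc037fe80UL /* c037fe80 d pidmap_lock */

/* Offset of addr from the start of tvec_bases, or -1 if addr falls
 * outside [tvec_bases, pidmap_lock). */
static long tvec_offset(unsigned long addr)
{
	if (addr < TVEC_BASES_ADDR || addr >= NEXT_SYM_ADDR)
		return -1;
	return (long)(addr - TVEC_BASES_ADDR);
}

/* Print addr in the symbol+offset/size form the oops uses, assuming
 * the symbol's size is the gap to the next System.map entry. */
static void print_symbolized(const char *label, unsigned long addr)
{
	long off = tvec_offset(addr);

	if (off < 0)
		printf("%s: 0x%08lx is outside tvec_bases\n", label, addr);
	else
		printf("%s: 0x%08lx = tvec_bases+0x%lx/0x%lx\n", label, addr,
		       (unsigned long)off, NEXT_SYM_ADDR - TVEC_BASES_ADDR);
}
```

For example, print_symbolized("EIP", 0xc035fbe9UL) prints "EIP: 0xc035fbe9 = tvec_bases+0x169/0x20400".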

>>BTW, I found lots of users who aren't using init_timer().  Should I
>>publicly humiliate them?
> 
> If they're initially using add_timer(), that works out
> OK.  If they start out using mod_timer() (or del_timer()), then it's a bug.

The init_timer() comment says otherwise, but I imagine that not using 
it shouldn't _cause_ any bugs.

* init_timer() must be done to a timer prior calling *any* of the
* other timer functions.

> I assume you tried all the memory debugging options?

No luck there.  I can't even get the oops to trigger with all the 
debugging on.

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: Structure clobbering causes timer oopses
  2002-10-13  2:09   ` Dave Hansen
@ 2002-10-13 19:50     ` Dipankar Sarma
From: Dipankar Sarma @ 2002-10-13 19:50 UTC
  To: Dave Hansen; +Cc: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List

On Sun, Oct 13, 2002 at 02:17:46AM +0000, Dave Hansen wrote:
> > If they're initially using add_timer(), that works out
> > OK.  If they start out using mod_timer() (or del_timer()), then it's a bug.
> 
> The init_timer() comment says otherwise, but I imagine that not using 
> it shouldn't _cause_ any bugs.
> 
> * init_timer() must be done to a timer prior calling *any* of the
> * other timer functions.

I am not sure about that. init_timer() initializes timer->base,
and timer_pending() checks for base == NULL. So it is illegal
to call timer_pending(), mod_timer(), or del_timer*() without a
prior init_timer() or add_timer(). But then, I presume this
was a requirement in the earlier timer interfaces too. No?
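The argument above can be illustrated with a deliberately simplified user-space model. Everything here is hypothetical and is not the 2.5 timer code: only the ->base field is modeled, "pending" means base != NULL as described above, and the add_timer()-style function is assumed to set ->base unconditionally, which is one way to get the behavior Andrew describes where starting with add_timer() "works out OK". A garbage ->base makes an uninitialized timer look pending, so a del_timer()-style call would go on to unlink it from a bogus list.

```c
#include <stddef.h>

/* Simplified model, NOT the kernel implementation. */
struct model_base { int dummy; };

struct model_timer {
	struct model_base *base; /* NULL means "not queued" */
	unsigned long expires;
};

static struct model_base the_base;

/* Models init_timer(): clear ->base so the timer reads as idle. */
static void model_init_timer(struct model_timer *t)
{
	t->base = NULL;
}

/* Models timer_pending(): queued iff ->base is non-NULL. */
static int model_timer_pending(const struct model_timer *t)
{
	return t->base != NULL;
}

/* add_timer()-style (assumed): attach unconditionally, so any junk
 * left in ->base is overwritten before anything reads it. */
static void model_add_timer(struct model_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->base = &the_base;
}

/* del_timer()-style: trusts ->base to decide whether to unlink.
 * Returns 1 if it believes it removed a queued timer. */
static int model_del_timer(struct model_timer *t)
{
	if (!model_timer_pending(t))
		return 0;
	/* Real code would unlink from t->base's lists here; with a
	 * garbage ->base that would be a wild memory access. */
	t->base = NULL;
	return 1;
}
```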

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
