linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible bug with the keyboard_tasklet?
@ 2001-11-30  6:12 Ryan Bradetich
  2001-11-30  7:48 ` Possible bug with the keyboard_tasklet? or is it softirq tasklet scheduling? george anzinger
  2001-11-30 14:56 ` Ryan Bradetich
  0 siblings, 2 replies; 3+ messages in thread
From: Ryan Bradetich @ 2001-11-30  6:12 UTC (permalink / raw)
  To: linux-kernel

Hello linux hackers,

I am a relatively new kernel hacker working on the parisc-linux port,
and I think I found a BUG in the arch independant code involving the
keyboard_tasklet.  The problem represented itself as a system hang 
when I enabled CONFIG_SMP on my C200+.  I am posting to this list, so
hopefully someone can verify the problem and help devise a solution to
fix the problem properly.


Problem Description:
------------------------------------------------------------------------
I tracking the problem down to the following infinate loop in
tasklet_action() from kernel/softirq.c. 

while (list) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
                if (!atomic_read(&t->count)) {
                        if (!test_and_clear_bit(TASKLET_STATE_SCHED,
&t->state))
                                BUG();
                        t->func(t->data);
                        tasklet_unlock(t);
                        continue;
                }
                tasklet_unlock(t);
        }

        local_irq_disable();
        t->next = tasklet_vec[cpu].list;
        tasklet_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
        local_irq_enable();
}

I eventually figured out that the if(!atomic_read(&t->count)) was
failing... and the task would be added back into the list via the
following two lines of code:

        t->next = tasklet_vec[cpu].list;
        tasklet_vec[cpu].list = t;

This loop would never end since the atomic_read(&t->count) was always
non-zero, and the task was always being re-added to the list.

Debugging the loop further showed me the keyboard_tasklet was
the culprit task causing the infinate loop.  I eventually figured out
that the keyboard_tasklet->count was being initialized to 1 by the
following macro from include/linux/interrupt.h:

        #define DECLARE_TASKLET_DISABLED(name, func, data) \
        struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func,
data }

I also figurd out that the keyboard_tasklet was being scheduled via the
schedule_tasklet() before the enable_tasklet() was called it.  (The
enable_tasklet() provides a memory barrior, then calls atomic_dec()
on the ->count of the tasklet, making it 0).


The following traces shows the paths to the first call of
schedule_tasklet(), and the first call of enable_tasklet() for the
keyboard_tasklet.

schedule_tasklet(keyboard_tasklet)
-------------------------
1. start_kernel()
2. console_init()
3. con_init()
4. vc_init()
5. reset_terminal()
6. set_leds()
7. schedule_tasklet()



enable_tasklet(keyboard_tasklet)
--------------------------------
1. start_kernel()
2. rest_init()
3. init() via kernel_thread.
4. do_basic_setup()
5. do_init_calls()
6. chr_dev_init()
7. tty_init()
8. kbd_init()
9. enable_tasklet()


Looking in the start_kernel(),  console_init() is the 9th function
called, where as rest_init() is the last function called.



Questions about the code:
------------------------------------------------------------------------
1. Why is the console code scheduling the keyboard_tasklet?  This
doesn't make much sense to me.

2. Is the member variable count in the tasklet_struct
(include/linux/interrupt.h) only used for flagging the enabled/disabled
state of the tasklet?  If so, is it possible to rename the member
variable to something more intuative like disabled?

3. What is the best way to fix this problem, assuming I actually did
find a bug (I know this fixes the problem I was seeing on my C200+).

Currently I have the following patch in my local tree for this problem:

--- drivers/char/console.c      2001/11/09 23:35:36     1.15
+++ drivers/char/console.c      2001/11/30 04:44:38
@@ -1420,7 +1420,10 @@ static void reset_terminal(int currcons,
        kbd_table[currcons].slockstate = 0;
        kbd_table[currcons].ledmode = LED_SHOW_FLAGS;
        kbd_table[currcons].ledflagstate =
kbd_table[currcons].default_ledflagstate;
-       set_leds();
+
+        /* Only schedule the keyboard_tasklet if it is enabled. */
+        if (!atomic_read(&keyboard_tasklet.count))
+               set_leds();
 
        cursor_type = CUR_DEFAULT;
        complement_mask = s_complement_mask;



Thanks for your time,

- Ryan
parisc-linux newbie kernel hacker.



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Possible bug with the keyboard_tasklet? or is it softirq tasklet  scheduling?
  2001-11-30  6:12 Possible bug with the keyboard_tasklet? Ryan Bradetich
@ 2001-11-30  7:48 ` george anzinger
  2001-11-30 14:56 ` Ryan Bradetich
  1 sibling, 0 replies; 3+ messages in thread
From: george anzinger @ 2001-11-30  7:48 UTC (permalink / raw)
  To: Ryan Bradetich; +Cc: linux-kernel

Ryan Bradetich wrote:
> 
> Hello linux hackers,
> 
> I am a relatively new kernel hacker working on the parisc-linux port,
> and I think I found a BUG in the arch independant code involving the
> keyboard_tasklet.  The problem represented itself as a system hang
> when I enabled CONFIG_SMP on my C200+.  I am posting to this list, so
> hopefully someone can verify the problem and help devise a solution to
> fix the problem properly.
> 
> Problem Description:
> ------------------------------------------------------------------------
> I tracking the problem down to the following infinate loop in
> tasklet_action() from kernel/softirq.c.
> 
> while (list) {
>         struct tasklet_struct *t = list;
> 
>         list = list->next;
> 
>         if (tasklet_trylock(t)) {
>                 if (!atomic_read(&t->count)) {
>                         if (!test_and_clear_bit(TASKLET_STATE_SCHED,
> &t->state))
>                                 BUG();
>                         t->func(t->data);
>                         tasklet_unlock(t);
>                         continue;
>                 }
>                 tasklet_unlock(t);
>         }
> 
>         local_irq_disable();
>         t->next = tasklet_vec[cpu].list;
>         tasklet_vec[cpu].list = t;
>         __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
>         local_irq_enable();
> }
> 
> I eventually figured out that the if(!atomic_read(&t->count)) was
> failing... and the task would be added back into the list via the
> following two lines of code:
> 
>         t->next = tasklet_vec[cpu].list;
>         tasklet_vec[cpu].list = t;
> 
> This loop would never end since the atomic_read(&t->count) was always
> non-zero, and the task was always being re-added to the list.
> 
> Debugging the loop further showed me the keyboard_tasklet was
> the culprit task causing the infinate loop.  I eventually figured out
> that the keyboard_tasklet->count was being initialized to 1 by the
> following macro from include/linux/interrupt.h:
> 
>         #define DECLARE_TASKLET_DISABLED(name, func, data) \
>         struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func,
> data }
> 
> I also figurd out that the keyboard_tasklet was being scheduled via the
> schedule_tasklet() before the enable_tasklet() was called it.  (The
> enable_tasklet() provides a memory barrior, then calls atomic_dec()
> on the ->count of the tasklet, making it 0).
> 
> The following traces shows the paths to the first call of
> schedule_tasklet(), and the first call of enable_tasklet() for the
> keyboard_tasklet.
> 
> schedule_tasklet(keyboard_tasklet)
> -------------------------
> 1. start_kernel()
> 2. console_init()
> 3. con_init()
> 4. vc_init()
> 5. reset_terminal()
> 6. set_leds()
> 7. schedule_tasklet()
> 
> enable_tasklet(keyboard_tasklet)
> --------------------------------
> 1. start_kernel()
> 2. rest_init()
> 3. init() via kernel_thread.
> 4. do_basic_setup()
> 5. do_init_calls()
> 6. chr_dev_init()
> 7. tty_init()
> 8. kbd_init()
> 9. enable_tasklet()
> 
> Looking in the start_kernel(),  console_init() is the 9th function
> called, where as rest_init() is the last function called.
> 
> Questions about the code:
> ------------------------------------------------------------------------
> 1. Why is the console code scheduling the keyboard_tasklet?  This
> doesn't make much sense to me.
> 
> 2. Is the member variable count in the tasklet_struct
> (include/linux/interrupt.h) only used for flagging the enabled/disabled
> state of the tasklet?  If so, is it possible to rename the member
> variable to something more intuative like disabled?
> 
> 3. What is the best way to fix this problem, assuming I actually did
> find a bug (I know this fixes the problem I was seeing on my C200+).
> 
> Currently I have the following patch in my local tree for this problem:
> 
> --- drivers/char/console.c      2001/11/09 23:35:36     1.15
> +++ drivers/char/console.c      2001/11/30 04:44:38
> @@ -1420,7 +1420,10 @@ static void reset_terminal(int currcons,
>         kbd_table[currcons].slockstate = 0;
>         kbd_table[currcons].ledmode = LED_SHOW_FLAGS;
>         kbd_table[currcons].ledflagstate =
> kbd_table[currcons].default_ledflagstate;
> -       set_leds();
> +
> +        /* Only schedule the keyboard_tasklet if it is enabled. */
> +        if (!atomic_read(&keyboard_tasklet.count))
> +               set_leds();
> 
>         cursor_type = CUR_DEFAULT;
>         complement_mask = s_complement_mask;
> 
> Thanks for your time,
> 
> - Ryan
> parisc-linux newbie kernel hacker.
> 
I found this same problem in while trying to run down timer bh issues. 
With out looking at the keyboard driver (since this is IMHO a tasklet
issue), I recommend that we not set the "pending" bit
(__cpu_raise_softirq()) if the tasklet fails because of count !=0, and
then modify the enable macro to, if the count is now 0, do the
__cpu_raise_softirq().  This still leaves the issue of the
tasklet_trylock(t), which will fail in the same way, but there we are
contending with another cpu and the rules say it can only run on one cpu
at a time.

On the other hand, why is this bothering you?  You don't say what kernel
version you are on, but the later versions push this sort of thing off
to ksoftirqd (a kernel thread) which allows the system to boot (even if
the thread doesn't exist yet, and it doesn't at this point).

The tasklet info suggests that it is ok for a tasklet to reschedule
itself, however, in the current system, this means that it will run each
interrupt.  Surly a timer would be a better answer, except we don't have
sub jiffie timers... yet.
-- 
George           george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Possible bug with the keyboard_tasklet? or is it softirq tasklet  scheduling?
  2001-11-30  6:12 Possible bug with the keyboard_tasklet? Ryan Bradetich
  2001-11-30  7:48 ` Possible bug with the keyboard_tasklet? or is it softirq tasklet scheduling? george anzinger
@ 2001-11-30 14:56 ` Ryan Bradetich
  1 sibling, 0 replies; 3+ messages in thread
From: Ryan Bradetich @ 2001-11-30 14:56 UTC (permalink / raw)
  To: george anzinger; +Cc: linux-kernel

> I found this same problem in while trying to run down timer bh issues. 
> With out looking at the keyboard driver (since this is IMHO a tasklet
> issue), I recommend that we not set the "pending" bit
> (__cpu_raise_softirq()) if the tasklet fails because of count !=0, and
> then modify the enable macro to, if the count is now 0, do the
> __cpu_raise_softirq().  This still leaves the issue of the
> tasklet_trylock(t), which will fail in the same way, but there we are
> contending with another cpu and the rules say it can only run on one cpu
> at a time.

hmm.. interesting.  I agree with you that this is a softirq tasklet
scheduling problem, not just related to the keyboard tasklet.  According
to the rules you stated, the tasklet_trylock(t) is bogus and can be
removed.  So what you suggest is to reschedule the tasklet, but not
raise the pending bit if the tasklet is disabled.  I can live with that
:)

I will produce and test a patch based on this tonight.

> On the other hand, why is this bothering you?  You don't say what kernel
> version you are on, but the later versions push this sort of thing off
> to ksoftirqd (a kernel thread) which allows the system to boot (even if
> the thread doesn't exist yet, and it doesn't at this point).

Sorry, I am running (available from cvs.parisc-linux.org)
Linux vega 2.4.16-pa5 #2 SMP Thu Nov 29 21:35:27 MST 2001 parisc unknown

Since this code is arch independent, it is very similar to the 2.4.16
kernel.  (Before posting to the list, I did verify that this problem
existed in the 2.4.16 kernel, and the code paths were the same.)


The reason I tracked this problem down was, if I enabled CONFIG_SMP
(C200+ is a single processor system), the system would "hang" after the
following bootup message:

Based upon Swansea University Computer Society NET3.039

UP kernels worked fine, but SMP kernels would "hang" every time.  I
still do not understand why toggling CONFIG_SMP triggered this "hang",
but it did.  Looks I need to keep digging further and try to understand
why CONFIG_SMP "hangs" the system.



> The tasklet info suggests that it is ok for a tasklet to reschedule
> itself, however, in the current system, this means that it will run each
> interrupt.  Surly a timer would be a better answer, except we don't have
> sub jiffie timers... yet.
> -- 
> George           george@mvista.com
> High-res-timers: http://sourceforge.net/projects/high-res-timers/
> Real time sched: http://sourceforge.net/projects/rtsched/
> 


Thanks for your feedback George!

- Ryan
parisc-linux newbie kernel hacker.



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2001-11-30 14:57 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-30  6:12 Possible bug with the keyboard_tasklet? Ryan Bradetich
2001-11-30  7:48 ` Possible bug with the keyboard_tasklet? or is it softirq tasklet scheduling? george anzinger
2001-11-30 14:56 ` Ryan Bradetich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).