linux-kernel.vger.kernel.org archive mirror
* 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)
@ 2012-01-28  0:43 Professor Berkley Shands
  2012-01-28  6:55 ` Michael Tokarev
From: Professor Berkley Shands @ 2012-01-28  0:43 UTC
  To: linux-kernel


typedef struct
{
    struct termios term;
} XKEY_DATA;

typedef XKEY_DATA *xkeyhandle;

static inline xkeyhandle xkeystart()
{

    // Turn off echo.
    struct termios temp;
    err = tcgetattr(0, &temp);
    if (err)
    {
       perror("tcgetattr failure");
    }

    XKEY_DATA *handle = new XKEY_DATA;
    handle->term = temp;

    temp.c_lflag &= ~ECHO;
    temp.c_lflag &= ~ICANON;

    err = tcsetattr(0, TCSANOW, &temp);        // this line causes the kernel to get very sick
    if (err)
    {
       perror("tcsetattr failure");
    }

    return handle;

}

The above code, called from main() will produce an error from tcsh:

/home/bshands> ./a.out > /dev/null &
[1] 3635
/home/bshands>
/home/bshands>
[1]  + Suspended (tty output) ./a.out > /dev/null
/home/bshands>

This does not appear on the Red Hat kernel, nor on 2.6.32.43, but it
appeared infrequently in 3.0.9.
In 3.0.18, doing this in the background does *EVIL* things.

ssh system "./a.out > /dev/null &" &

Now, when the code reaches the tcsetattr(), the system quits scheduling tasks.
top shows 100%sy on 4/12 cores, kernel threads blocked, and the count of
stalled tasks increasing.


Jan 27 18:58:50 system kernel: [ 1269.891610] kjournald       S ffff880321270a30     0   416      2 0x00000000
Jan 27 18:58:50 system kernel: [ 1269.891684]  ffff880321f5de50 0000000000000046 ffff880321270680 0000000000000000
Jan 27 18:58:50 system kernel: [ 1269.891816]  ffff880321270680 0000000000012ac0 ffff880321f5dfd8 ffff880321f5c010
Jan 27 18:58:50 system kernel: [ 1269.891948]  ffff880321f5dfd8 0000000000012ac0 ffff880327a61260 ffff880321270680
Jan 27 18:58:50 system kernel: [ 1269.892083] Call Trace:
Jan 27 18:58:50 system kernel: [ 1269.892144]  [<ffffffff824c126f>] schedule+0x3f/0x60
Jan 27 18:58:50 system kernel: [ 1269.892226]  [<ffffffffa0030f9d>] kjournald+0x20d/0x230 [jbd]
Jan 27 18:58:50 system kernel: [ 1269.892294]  [<ffffffff82082230>] ? wake_up_bit+0x40/0x40
Jan 27 18:58:50 system kernel: [ 1269.892362]  [<ffffffffa0030d90>] ? commit_timeout+0x10/0x10 [jbd]
Jan 27 18:58:50 system kernel: [ 1269.892431]  [<ffffffff82081bb6>] kthread+0x96/0xa0
Jan 27 18:58:50 system kernel: [ 1269.892497]  [<ffffffff824cbe04>] kernel_thread_helper+0x4/0x10
Jan 27 18:58:50 system kernel: [ 1269.892565]  [<ffffffff82081b20>] ? kthread_worker_fn+0x1a0/0x1a0
Jan 27 18:58:50 system kernel: [ 1269.892633]  [<ffffffff824cbe00>] ? gs_change+0x13/0x13

After a little while, the system locks up. RCU timeout errors may appear
in dmesg.
Dumps are available.

Note that this little bugger is a user-mode crash: once started, you are
done.
It appears that later use of termios is still sufficient to trigger the
lockup.
The key is being detached.

Berkley







* Re: 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)
  2012-01-28  0:43 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64) Professor Berkley Shands
@ 2012-01-28  6:55 ` Michael Tokarev
       [not found]   ` <4F283B40.1070200@seas.wustl.edu>
From: Michael Tokarev @ 2012-01-28  6:55 UTC
  To: Professor Berkley Shands; +Cc: linux-kernel

On 28.01.2012 04:43, Professor Berkley Shands wrote:
> 
> typedef struct
> {
>    struct termios term;
> } XKEY_DATA;
> 
> typedef XKEY_DATA *xkeyhandle;
> 
> static inline xkeyhandle xkeystart()
> {
> 
>    // Turn off echo.
>    struct termios temp;
>    err = tcgetattr(0, &temp);
>    if (err)
>    {
>       perror("tcgetattr failure");
>    }
> 
>    XKEY_DATA *handle = new XKEY_DATA;
>    handle->term = temp;
> 
>    temp.c_lflag &= ~ECHO;
>    temp.c_lflag &= ~ICANON;
> 
>    err = tcsetattr(0, TCSANOW, &temp);        // this line causes the kernel to get very sick
>    if (err)
>    {
>       perror("tcsetattr failure");
>    }
> 
>    return handle;
> 
> }
> 
> The above code, called from main() will produce an error from tcsh:
> 
> /home/bshands> ./a.out > /dev/null &
> [1] 3635
> /home/bshands>
> /home/bshands>
> [1]  + Suspended (tty output) ./a.out > /dev/null
> /home/bshands>
> 
> this does not appear on the redhat kernel, nor 2.6.32.43, but appeared infrequently in 3.0.9.
> in 3.0.18, doing this in the background does *EVIL* things.
> 
> ssh system "./a.out > /dev/null &" &
> 
> Now when the code reaches the tcsetattr() the system quits scheduling tasks.
> top shows 100%sy on 4/12 cores, kernel threads blocked, stalled tasks count increasing.

I used the following code:

=======================================
#include <termios.h>
#include <stdio.h>

int main() {
   struct termios temp;

   if (tcgetattr(0, &temp) != 0)
      perror("tcgetattr failure");

   temp.c_lflag &= ~ECHO;
   temp.c_lflag &= ~ICANON;

   if (tcsetattr(0, TCSANOW, &temp) != 0) // this line causes the kernel to get very sick
      perror("tcsetattr failure");

   return 0;
}
=======================================


But I can't reproduce what you're observing.  It prints

 tcgetattr failure: Inappropriate ioctl for device
 tcsetattr failure: Inappropriate ioctl for device

and does not do any evil things.  I tried it on 3.0.18
on x86, both 32-bit and 64-bit.

What am I doing wrong?
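
For context, "Inappropriate ioctl for device" is the message for ENOTTY, which means fd 0 was not a terminal in that run. Note also that the reproducer above still passes the unfilled `temp` to tcsetattr() after tcgetattr() has failed. A hedged variant that stops at the first failure might look like the sketch below; the function name try_raw_fd() is made up for illustration and does not appear in the thread.

```c
#include <errno.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>             /* STDIN_FILENO for callers */

/* Illustrative helper (not from the thread): returns 0 on success,
 * or the errno value of the first call that failed. */
static int try_raw_fd(int fd)
{
    struct termios temp;

    if (tcgetattr(fd, &temp) != 0) {
        int e = errno;          /* save before perror() can change it */
        perror("tcgetattr failure");
        return e;               /* temp is unset, so skip tcsetattr */
    }

    temp.c_lflag &= ~(ECHO | ICANON);

    if (tcsetattr(fd, TCSANOW, &temp) != 0) {
        int e = errno;
        perror("tcsetattr failure");
        return e;
    }
    return 0;
}
```

Calling try_raw_fd(STDIN_FILENO) from main() and returning its result would make it obvious from the exit status whether the tcsetattr() path was ever reached.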

Thanks,

/mjt


* Re: 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)
       [not found]   ` <4F283B40.1070200@seas.wustl.edu>
@ 2012-02-02  8:36     ` Michael Tokarev
  2012-02-02 22:09       ` Professor Berkley Shands
From: Michael Tokarev @ 2012-02-02  8:36 UTC
  To: Professor Berkley Shands; +Cc: Linux-kernel

On 31.01.2012 23:04, Professor Berkley Shands wrote:
> Very strange. Now boxes with NO OFED, no Intel-10 Gige and no special drivers are locking up.
> If your kernel is Red Hat compatible, could you please send me a copy of the .config so I can try to
> isolate this more?

Why are you writing to me personally?  I just
tried your code and couldn't find the problem you
see, and I replied to the list.  Cc'ing the
list now.

I don't understand what "redhat compatible" means.
I use a kernel from kernel.org, currently at version
3.0.18.

Did you try my small "reproducer"?  Does it lock your
machines too?  I provided complete code which is
compilable and runnable, unlike your version, which
lacked some context.

Thanks,

/mjt


* Re: 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)
  2012-02-02  8:36     ` Michael Tokarev
@ 2012-02-02 22:09       ` Professor Berkley Shands
From: Professor Berkley Shands @ 2012-02-02 22:09 UTC
  Cc: Linux-kernel

I built my .config from the Red Hat .config provided in 2.6.32-131 using
make oldconfig.
That failed miserably. I then used one based on 2.6.39.4, which actually
booted, but I get these lockup errors, RCU timeouts, ...

The system died right away on the tcsetattr() (which also did not
return any error).
And my simple test case crashed all the time. Looked rather suspicious...
Now, after a week, *ALL* my 3.0.18 boxes lock up (other than when sitting
IDLE, any load eventually causes the system to stop scheduling).
That is 32-core 6282s, 3.46 GHz Nehalems, 2.3 GHz 2374s...
I have to assume the tcsetattr() is an artifact at this point.
Without building all the kernels in between 2.6.32.55 and 3.0.18, I
needed a starting point for the .config that works. Usually it is
something unnoticed that needed to be updated and that make oldconfig
didn't point out. Things do change that I can't keep current on. :-)

So it appears that it has to be my configuration. Hence the request for
a .config I can compare against to see what is wrong / misconfigured /
not configured, etc.

Berkley


