linux-kernel.vger.kernel.org archive mirror
* BUG: soft lockup on all kernels after 2.6.3x
@ 2013-02-09 14:10 Alexey Vlasov
  2013-02-09 15:07 ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Vlasov @ 2013-02-09 14:10 UTC (permalink / raw)
  To: linux-kernel

Hello.

I used 2.6.2x kernel for a long time on my shared hosting and I didn't
have any problems. Kernels worked well and server uptime was about 2-3
years.

But while investigating some strange hangs of my clients' sites I came
across this:
http://bugs.mysql.com/bug.php?id=50399
From this bug it is clear that the mysql client hangs on kernels earlier
than 2.6.32 (unfortunately I can't remember whether this is also true of
2.6.30-31).

It is not clear whether this is a bug in the kernel, libc or the mysql
client; I didn't manage to find out. I decided to do the simpler thing
(as it seemed to me at the time) and start using 2.6.3x kernels. That
caused greater problems. Running the new kernels on my production
servers under peak load, I got uptimes ranging from an hour to 1-3 months.

I even have some statistics on how long each kernel from 2.6.32 onward
survives under peak load. It sounds funny, but my clients are not happy
with all these reboots.

Of all the servers running kernels from 2.6.32 to 3.7.4, I can say that
2.6.35 is the most stable; I have about 30 servers on it. But they still
usually hang once every 1-3 months.

Returning to the problem with kernels >= 2.6.32: as far as I have noticed,
they all hang in the same way, printing this on the console:

...
Feb  8 10:27:45 10.2.0.7 [470393.417168] BUG: soft lockup - CPU#2 stuck for 61s! [vsftpd:29013]
...
[see the attachment]

it doesn't happen on an empty server, only on loaded ones. Unfortunately
I don't know how to provoke such hanging artificially.

I've attached a trace. In fact I don't know what to do about all these
bugs: I can't use 2.6.2x because of the MySQL hang, and kernels >= 2.6.3x
hang themselves.

-- 
BRGDS. Alexey Vlasov.

[-- Attachment #2: bug_spinlock.txt.gz --]
[-- Type: application/octet-stream, Size: 247926 bytes --]
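
For readers collecting such reports from several servers, the watchdog line quoted in this message can be picked apart mechanically. The following is a minimal sketch, not something posted in the thread: the function name `parse_soft_lockup` and the dictionary keys are hypothetical, and the pattern only assumes the line format shown above.

```python
import re

# Matches the watchdog message format quoted in the report, e.g.
# "BUG: soft lockup - CPU#2 stuck for 61s! [vsftpd:29013]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return cpu, seconds, comm and pid from a soft lockup line, or None."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "seconds": int(m.group("secs")),
        "comm": m.group("comm"),   # process name reported by the watchdog
        "pid": int(m.group("pid")),
    }


if __name__ == "__main__":
    sample = ("Feb  8 10:27:45 10.2.0.7 [470393.417168] BUG: soft lockup - "
              "CPU#2 stuck for 61s! [vsftpd:29013]")
    print(parse_soft_lockup(sample))
```

Counting the results per process name or per CPU across many log files would be one way to build the kind of per-kernel statistics mentioned above.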


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-02-09 14:10 BUG: soft lockup on all kernels after 2.6.3x Alexey Vlasov
@ 2013-02-09 15:07 ` Eric Dumazet
  2013-02-09 15:30   ` Alexey Vlasov
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Eric Dumazet @ 2013-02-09 15:07 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Sat, 2013-02-09 at 18:10 +0400, Alexey Vlasov wrote:
> Hello.
> 
> I used 2.6.2x kernel for a long time on my shared hosting and I didn't
> have any problems. Kernels worked well and server uptime was about 2-3
> years.
> 
> But while investigating some strange hangs of my clients' sites I came
> across this:
> http://bugs.mysql.com/bug.php?id=50399
> From this bug it is clear that the mysql client hangs on kernels earlier
> than 2.6.32 (unfortunately I can't remember whether this is also true of
> 2.6.30-31).
> 
> It is not clear whether this is a bug in the kernel, libc or the mysql
> client; I didn't manage to find out. I decided to do the simpler thing
> (as it seemed to me at the time) and start using 2.6.3x kernels. That
> caused greater problems. Running the new kernels on my production
> servers under peak load, I got uptimes ranging from an hour to 1-3 months.
> 
> I even have some statistics on how long each kernel from 2.6.32 onward
> survives under peak load. It sounds funny, but my clients are not happy
> with all these reboots.
> 
> Of all the servers running kernels from 2.6.32 to 3.7.4, I can say that
> 2.6.35 is the most stable; I have about 30 servers on it. But they still
> usually hang once every 1-3 months.
> 
> Returning to the problem with kernels >= 2.6.32: as far as I have noticed,
> they all hang in the same way, printing this on the console:
> 
> ...
> Feb  8 10:27:45 10.2.0.7 [470393.417168] BUG: soft lockup - CPU#2 stuck for 61s! [vsftpd:29013]
> ...
> [see the attachment]
> 
> it doesn't happen on an empty server, only on loaded ones. Unfortunately
> I don't know how to provoke such hanging artificially.
> 
> I've attached a trace. In fact I don't know what to do about all these
> bugs: I can't use 2.6.2x because of the MySQL hang, and kernels >= 2.6.3x
> hang themselves.
> 

Did you compile the kernel yourself, or is it a standard kernel (distro
provided)?

Your traces don't contain symbols, it's quite hard to guess the issue.




* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-02-09 15:07 ` Eric Dumazet
@ 2013-02-09 15:30   ` Alexey Vlasov
  2013-03-07 12:54   ` Alexey Vlasov
  2013-03-07 13:34   ` Alexey Vlasov
  2 siblings, 0 replies; 11+ messages in thread
From: Alexey Vlasov @ 2013-02-09 15:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

On Sat, Feb 09, 2013 at 07:07:53AM -0800, Eric Dumazet wrote:
> Did you compile the kernel yourself, or is it a standard kernel (distro
> provided)?
> 
> Your traces don't contain symbols, it's quite hard to guess the issue.

I compile the kernel myself. Should I add CONFIG_DEBUG_INFO? OK, I'll
try it. Maybe I should switch on something else as well to get more
debugging info?
Thanks.

-- 
BRGDS. Alexey Vlasov.
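
A quick way to check which debugging options a self-built kernel actually has is to scan its `.config`. The sketch below is an illustration rather than advice given in the thread: `CONFIG_DEBUG_INFO` is the option named above, while `CONFIG_KALLSYMS` (commonly the option that lets the kernel print symbol names in traces) is added here as an assumption about what is relevant. The helper names `read_kconfig` and `missing_options` are hypothetical.

```python
def read_kconfig(text):
    """Parse .config-style text into {option: value}; unset options map to None."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Lines such as "# CONFIG_FOO is not set" mark a disabled option.
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, wanted):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    opts = read_kconfig(text)
    return [name for name in wanted if opts.get(name) not in ("y", "m")]


if __name__ == "__main__":
    sample = "CONFIG_KALLSYMS=y\n# CONFIG_DEBUG_INFO is not set\n"
    # Assumed list of interesting options; adjust to taste.
    print(missing_options(sample, ["CONFIG_KALLSYMS", "CONFIG_DEBUG_INFO"]))
```

In practice the text would come from the `.config` file in the kernel build directory instead of the inline sample.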


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-02-09 15:07 ` Eric Dumazet
  2013-02-09 15:30   ` Alexey Vlasov
@ 2013-03-07 12:54   ` Alexey Vlasov
  2013-03-07 16:20     ` Eric Dumazet
  2013-03-07 13:34   ` Alexey Vlasov
  2 siblings, 1 reply; 11+ messages in thread
From: Alexey Vlasov @ 2013-03-07 12:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

Hi,

On Sat, Feb 09, 2013 at 07:07:53AM -0800, Eric Dumazet wrote:
> >
> > I used 2.6.2x kernel for a long time on my shared hosting and I didn't
> > have any problems. Kernels worked well and server uptime was about 2-3
> > years.
> > 
> > ...
> > 
> > it doesn't happen on an empty server, only on loaded ones. Unfortunately
> > I don't know how to provoke such hanging artificially.
> > 
 
> Your traces don't contain symbols, it's quite hard to guess the issue.

Well, the server came under heavy load and began to crash almost once a day.

=====
BUG: soft lockup - CPU#1 stuck for 23s! [httpd:21686]
Call Trace:
[<ffffffff8110bba5>] ? mntput_no_expire+0x25/0x170
[<ffffffff810f9389>] ? path_lookupat+0x189/0x890
[<ffffffff810f9b67>] ? filename_lookup.clone.39+0xd7/0xe0
[<ffffffff810fc85c>] ? user_path_at_empty+0x5c/0xb0
[<ffffffff8102b5f9>] ? __do_page_fault+0x1b9/0x480
[<ffffffff810f146e>] ? vfs_fstatat+0x3e/0x90
[<ffffffff810c54bf>] ? remove_vma+0x5f/0x70
[<ffffffff810f168f>] ? sys_newstat+0x1f/0x50
[<ffffffff814b09c2>] ? page_fault+0x22/0x30
[<ffffffff814b0f49>] ? system_call_fastpath+0x18/0x1d
=====

There's a full trace in attachment. 

-- 
BRGDS. Alexey Vlasov.

[-- Attachment #2: bug_softlockup.txt.gz --]
[-- Type: application/octet-stream, Size: 12507 bytes --]
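
Now that the trace carries symbols, each frame can be split into its parts. This is a minimal sketch under the assumption that frames look exactly like the ones quoted above (`[<address>] ? symbol+0xoffset/0xsize`); the `Frame` type and `parse_frame` function are hypothetical names. The leading `?` is treated as "address found on the stack, not confirmed as part of the call chain", which is the usual reading of that marker.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    address: int
    reliable: bool  # False when the frame is prefixed with "?"
    symbol: str
    offset: int     # offset of the address inside the function
    size: int       # total size of the function


FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<q>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace line; return None for lines that are not frames."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=int(m.group("addr"), 16),
        reliable=m.group("q") is None,
        symbol=m.group("sym"),
        offset=int(m.group("off"), 16),
        size=int(m.group("size"), 16),
    )


if __name__ == "__main__":
    for text in ("Call Trace:",
                 "[<ffffffff8110bba5>] ? mntput_no_expire+0x25/0x170",
                 "[<ffffffff810f9b67>] ? filename_lookup.clone.39+0xd7/0xe0"):
        print(parse_frame(text))
```

Keeping the offset and size as integers makes it easy to compare the same frame across several lockup reports.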


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-02-09 15:07 ` Eric Dumazet
  2013-02-09 15:30   ` Alexey Vlasov
  2013-03-07 12:54   ` Alexey Vlasov
@ 2013-03-07 13:34   ` Alexey Vlasov
  2013-03-07 13:41     ` BUG: soft lockup on all kernels after 2.6.3x (include full log) Alexey Vlasov
  2 siblings, 1 reply; 11+ messages in thread
From: Alexey Vlasov @ 2013-03-07 13:34 UTC (permalink / raw)
  To: linux-kernel

Hi,

On Sat, Feb 09, 2013 at 07:07:53AM -0800, Eric Dumazet wrote:
> >
> > I used 2.6.2x kernel for a long time on my shared hosting and I
> > didn't
> > have any problems. Kernels worked well and server uptime was about
> > 2-3
> > years.
> >
> > ...
> >
> > it doesn't happen on an empty server, only on loaded ones.
> > Unfortunately
> > I don't know how to provoke such hanging artificially.
> >

> Your traces don't contain symbols, it's quite hard to guess the issue.

Well, the server came under heavy load and began to crash almost once a day.

=====
BUG: soft lockup - CPU#1 stuck for 23s! [httpd:21686]
Call Trace:
[<ffffffff8110bba5>] ? mntput_no_expire+0x25/0x170
[<ffffffff810f9389>] ? path_lookupat+0x189/0x890
[<ffffffff810f9b67>] ? filename_lookup.clone.39+0xd7/0xe0
[<ffffffff810fc85c>] ? user_path_at_empty+0x5c/0xb0
[<ffffffff8102b5f9>] ? __do_page_fault+0x1b9/0x480
[<ffffffff810f146e>] ? vfs_fstatat+0x3e/0x90
[<ffffffff810c54bf>] ? remove_vma+0x5f/0x70
[<ffffffff810f168f>] ? sys_newstat+0x1f/0x50
[<ffffffff814b09c2>] ? page_fault+0x22/0x30
[<ffffffff814b0f49>] ? system_call_fastpath+0x18/0x1d
=====

There's a full trace in attachment.

-- 
BRGDS. Alexey Vlasov.


* Re: BUG: soft lockup on all kernels after 2.6.3x (include full log)
  2013-03-07 13:34   ` Alexey Vlasov
@ 2013-03-07 13:41     ` Alexey Vlasov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexey Vlasov @ 2013-03-07 13:41 UTC (permalink / raw)
  To: linux-kernel

On Thu, Mar 07, 2013 at 05:34:14PM +0400, Alexey Vlasov wrote:
> 
> There's a full trace in attachment.
 
-- 
BRGDS. Alexey Vlasov.

[-- Attachment #2: bug_softlockup.txt.gz --]
[-- Type: application/octet-stream, Size: 12507 bytes --]


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-03-07 12:54   ` Alexey Vlasov
@ 2013-03-07 16:20     ` Eric Dumazet
  2013-03-07 16:37       ` Alexey Vlasov
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2013-03-07 16:20 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Thu, 2013-03-07 at 16:54 +0400, Alexey Vlasov wrote:
> Hi,
> 
> On Sat, Feb 09, 2013 at 07:07:53AM -0800, Eric Dumazet wrote:
> > >
> > > I used 2.6.2x kernel for a long time on my shared hosting and I didn't
> > > have any problems. Kernels worked well and server uptime was about 2-3
> > > years.
> > > 
> > > ...
> > > 
> > > it doesn't happen on an empty server, only on loaded ones. Unfortunately
> > > I don't know how to provoke such hanging artificially.
> > > 
>  
> > Your traces don't contain symbols, it's quite hard to guess the issue.
> 
> Well, the server came under heavy load and began to crash almost once a day.
> 
> =====
> BUG: soft lockup - CPU#1 stuck for 23s! [httpd:21686]
> Call Trace:
> [<ffffffff8110bba5>] ? mntput_no_expire+0x25/0x170
> [<ffffffff810f9389>] ? path_lookupat+0x189/0x890
> [<ffffffff810f9b67>] ? filename_lookup.clone.39+0xd7/0xe0
> [<ffffffff810fc85c>] ? user_path_at_empty+0x5c/0xb0
> [<ffffffff8102b5f9>] ? __do_page_fault+0x1b9/0x480
> [<ffffffff810f146e>] ? vfs_fstatat+0x3e/0x90
> [<ffffffff810c54bf>] ? remove_vma+0x5f/0x70
> [<ffffffff810f168f>] ? sys_newstat+0x1f/0x50
> [<ffffffff814b09c2>] ? page_fault+0x22/0x30
> [<ffffffff814b0f49>] ? system_call_fastpath+0x18/0x1d
> =====
> 
> There's a full trace in attachment. 
> 


Seems to be a VFS issue.

A "umount" is in progress, blocking almost all other CPUs in lg_local_lock().

What are gr_xxxx symbols ?

Mar  7 00:50:00 l25 [1735187.889877]  [<ffffffff8110e118>] ? is_path_reachable+0x48/0x60
Mar  7 00:50:00 l25 [1735187.889880]  [<ffffffff8110e163>] ? path_is_under+0x33/0x60
Mar  7 00:50:00 l25 [1735187.889887]  [<ffffffff812257a4>] ? gr_is_outside_chroot+0x54/0x70
Mar  7 00:50:00 l25 [1735187.889890]  [<ffffffff81225815>] ? gr_chroot_fchdir+0x55/0x80
Mar  7 00:50:00 l25 [1735187.889894]  [<ffffffff810f9b2e>] ? filename_lookup.clone.39+0x9e/0xe0
Mar  7 00:50:00 l25 [1735187.889897]  [<ffffffff810fc85c>] ? user_path_at_empty+0x5c/0xb0
Mar  7 00:50:00 l25 [1735187.889903]  [<ffffffff8102b5f9>] ? __do_page_fault+0x1b9/0x480
Mar  7 00:50:00 l25 [1735187.889907]  [<ffffffff814b09c2>] ? page_fault+0x22/0x30
Mar  7 00:50:00 l25 [1735187.889910]  [<ffffffff810f146e>] ? vfs_fstatat+0x3e/0x90
Mar  7 00:50:00 l25 [1735187.889914]  [<ffffffff812278cb>] ? gr_learn_resource+0x3b/0x1e0
Mar  7 00:50:00 l25 [1735187.889918]  [<ffffffff810f168f>] ? sys_newstat+0x1f/0x50
Mar  7 00:50:00 l25 [1735187.889922]  [<ffffffff810ea4b4>] ? filp_close+0x54/0x80
Mar  7 00:50:00 l25 [1735187.889925]  [<ffffffff814b09c2>] ? page_fault+0x22/0x30
Mar  7 00:50:00 l25 [1735187.889928]  [<ffffffff814b0f49>] ? system_call_fastpath+0x18/0x1d
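
Frames such as the gr_ ones above can be singled out automatically when a full log is long. A minimal sketch, assuming the frame format shown in this message; the function name `symbols_with_prefix` is hypothetical, and the prefix to look for is whatever the reader suspects does not belong to the mainline kernel.

```python
import re

# Captures the symbol name that follows "[<address>]" and an optional "?".
SYMBOL_RE = re.compile(
    r">\]\s+(?:\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def symbols_with_prefix(trace_text, prefix):
    """Return matching symbols in trace order, with duplicates dropped."""
    found = []
    for line in trace_text.splitlines():
        m = SYMBOL_RE.search(line)
        if m is None:
            continue
        symbol = m.group(1)
        if symbol.startswith(prefix) and symbol not in found:
            found.append(symbol)
    return found


if __name__ == "__main__":
    sample = """\
Mar  7 00:50:00 l25 [1735187.889880]  [<ffffffff8110e163>] ? path_is_under+0x33/0x60
Mar  7 00:50:00 l25 [1735187.889887]  [<ffffffff812257a4>] ? gr_is_outside_chroot+0x54/0x70
Mar  7 00:50:00 l25 [1735187.889890]  [<ffffffff81225815>] ? gr_chroot_fchdir+0x55/0x80
Mar  7 00:50:00 l25 [1735187.889914]  [<ffffffff812278cb>] ? gr_learn_resource+0x3b/0x1e0
"""
    print(symbols_with_prefix(sample, "gr_"))
```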




* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-03-07 16:20     ` Eric Dumazet
@ 2013-03-07 16:37       ` Alexey Vlasov
  2013-03-07 16:44         ` richard -rw- weinberger
  2013-03-07 16:57         ` Eric Dumazet
  0 siblings, 2 replies; 11+ messages in thread
From: Alexey Vlasov @ 2013-03-07 16:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

On Thu, Mar 07, 2013 at 08:20:23AM -0800, Eric Dumazet wrote:
>
> What are gr_xxxx symbols ?

These are the grsecurity patches ;)
 
> Mar  7 00:50:00 l25 [1735187.889877]  [<ffffffff8110e118>] ? is_path_reachable+0x48/0x60
> Mar  7 00:50:00 l25 [1735187.889880]  [<ffffffff8110e163>] ? path_is_under+0x33/0x60
> Mar  7 00:50:00 l25 [1735187.889887]  [<ffffffff812257a4>] ? gr_is_outside_chroot+0x54/0x70
> Mar  7 00:50:00 l25 [1735187.889890]  [<ffffffff81225815>] ? gr_chroot_fchdir+0x55/0x80
> Mar  7 00:50:00 l25 [1735187.889894]  [<ffffffff810f9b2e>] ? filename_lookup.clone.39+0x9e/0xe0
> Mar  7 00:50:00 l25 [1735187.889897]  [<ffffffff810fc85c>] ? user_path_at_empty+0x5c/0xb0
> Mar  7 00:50:00 l25 [1735187.889903]  [<ffffffff8102b5f9>] ? __do_page_fault+0x1b9/0x480
> Mar  7 00:50:00 l25 [1735187.889907]  [<ffffffff814b09c2>] ? page_fault+0x22/0x30
> Mar  7 00:50:00 l25 [1735187.889910]  [<ffffffff810f146e>] ? vfs_fstatat+0x3e/0x90
> Mar  7 00:50:00 l25 [1735187.889914]  [<ffffffff812278cb>] ? gr_learn_resource+0x3b/0x1e0
> Mar  7 00:50:00 l25 [1735187.889918]  [<ffffffff810f168f>] ? sys_newstat+0x1f/0x50
> Mar  7 00:50:00 l25 [1735187.889922]  [<ffffffff810ea4b4>] ? filp_close+0x54/0x80
> Mar  7 00:50:00 l25 [1735187.889925]  [<ffffffff814b09c2>] ? page_fault+0x22/0x30
> Mar  7 00:50:00 l25 [1735187.889928]  [<ffffffff814b0f49>] ? system_call_fastpath+0x18/0x1d


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-03-07 16:37       ` Alexey Vlasov
@ 2013-03-07 16:44         ` richard -rw- weinberger
  2013-03-07 16:57         ` Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: richard -rw- weinberger @ 2013-03-07 16:44 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: Eric Dumazet, linux-kernel

On Thu, Mar 7, 2013 at 5:37 PM, Alexey Vlasov <renton@renton.name> wrote:
> On Thu, Mar 07, 2013 at 08:20:23AM -0800, Eric Dumazet wrote:
>>
>> What are gr_xxxx symbols ?
>
> These are the grsecurity patches ;)

Please reproduce without grsec...

-- 
Thanks,
//richard


* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-03-07 16:37       ` Alexey Vlasov
  2013-03-07 16:44         ` richard -rw- weinberger
@ 2013-03-07 16:57         ` Eric Dumazet
  2013-03-09 19:11           ` Alexey Vlasov
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2013-03-07 16:57 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Thu, 2013-03-07 at 20:37 +0400, Alexey Vlasov wrote:
> On Thu, Mar 07, 2013 at 08:20:23AM -0800, Eric Dumazet wrote:
> >
> > What are gr_xxxx symbols ?
> 
> These are the grsecurity patches ;)
>  

Well, remove all alien patches and try to reproduce the bug with a
pristine linux kernel.





* Re: BUG: soft lockup on all kernels after 2.6.3x
  2013-03-07 16:57         ` Eric Dumazet
@ 2013-03-09 19:11           ` Alexey Vlasov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexey Vlasov @ 2013-03-09 19:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: richard -rw- weinberger, linux-kernel

On Thu, Mar 07, 2013 at 08:57:28AM -0800, Eric Dumazet wrote:
> 
> Well, remove all alien patches and try to reproduce the bug with a
> pristine linux kernel.

I wrote to Spender (the grsec developer) and he confirmed that the
problem may be in the grsec patch.

Thank you greatly for your answers!

-- 
BRGDS. Alexey Vlasov.

