linux-kernel.vger.kernel.org archive mirror
* kernel hangs up - possible sendfile() epoll() bug?
@ 2003-08-15 23:05 Yaoping Ruan
  2003-08-15 23:43 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Yaoping Ruan @ 2003-08-15 23:05 UTC
  To: linux-kernel

Hi,

Recently we updated a user-space web server to use the sendfile() and
epoll() interfaces, and tried to measure its performance with the
SpecWeb99 benchmark. As the load increases, e.g. at a SpecWeb99 target
score of 600 connections, the kernel sometimes hangs without logging
anything, and the only option left is to push the reset button to
reboot.

We also made similar updates to use sendfile() and kevent() on FreeBSD
and achieved a score of 1000 connections, so an application bug is
unlikely (and it is a user-space server in any case). Before the
sendfile() and epoll() change, the server also ran fine but could only
reach a SpecWeb99 score of 500.

The kernel we were using was 2.4.21 with the epoll patch applied. Since
the epoll man pages say the interface stabilized in 2.5.66, we also
tried 2.5.66, but it behaved no better. The machine is an Intel server
motherboard with a PIII Xeon processor (the board supports 2 CPUs but
only 1 is used), a Maxtor Diamond IDE disk, a Promise Ultra DMA 66
controller, and a single Netgear GA621 gigabit Ethernet adapter.

Is this possibly caused by the newly introduced interfaces?
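
For readers who have not used the two interfaces together, here is a minimal
sketch of the pattern described above: wait with epoll until a socket is
writable, then push file data to it with sendfile(). It is NOT Flash's code.
The function names (send_file_with_epoll, demo_roundtrip) are made up for
illustration, it uses the epoll API as found in current kernels (which may
differ in detail from the 2.4.21 patch), and it sends to a local socketpair
rather than a TCP connection, which assumes a kernel that allows sendfile()
to any socket type.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `count` bytes of file_fd to sock_fd, waiting
 * on epoll for writability before each sendfile() call.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file_with_epoll(int sock_fd, int file_fd, size_t count)
{
    assert(sock_fd >= 0 && file_fd >= 0);

    int ep = epoll_create(1);           /* size hint only, must be > 0 */
    if (ep < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLOUT;               /* wake up when the socket is writable */
    ev.data.fd = sock_fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sock_fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    off_t offset = 0;                   /* sendfile() advances this for us */
    size_t sent = 0;
    while (sent < count) {
        struct epoll_event ready;
        if (epoll_wait(ep, &ready, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            close(ep);
            return -1;
        }
        ssize_t r = sendfile(sock_fd, file_fd, &offset, count - sent);
        if (r < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;               /* not ready after all, wait again */
            close(ep);
            return -1;
        }
        if (r == 0)
            break;                      /* hit end of file early */
        sent += (size_t)r;
    }
    close(ep);
    return (ssize_t)sent;
}

/* Hypothetical demo: write a small temp file, send it over a socketpair,
 * and check that the same bytes arrive. Returns 0 on success, -1 otherwise. */
int demo_roundtrip(void)
{
    const char msg[] = "hello from sendfile";
    const size_t len = sizeof msg - 1;
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    char buf[64];
    int sv[2];
    int result = -1;

    int file_fd = mkstemp(path);
    if (file_fd < 0)
        return -1;
    unlink(path);                       /* file disappears once it is closed */

    if (write(file_fd, msg, len) == (ssize_t)len &&
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
        ssize_t sent = send_file_with_epoll(sv[0], file_fd, len);
        ssize_t got = read(sv[1], buf, sizeof buf);
        if (sent == (ssize_t)len && got == (ssize_t)len &&
            memcmp(buf, msg, len) == 0)
            result = 0;
        close(sv[0]);
        close(sv[1]);
    }
    close(file_fd);
    return result;
}
```

Calling demo_roundtrip() from a main() exercises the whole path once. A real
server would keep one epoll descriptor for all connections and use
non-blocking sockets instead of creating an epoll instance per transfer.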





* Re: kernel hangs up - possible sendfile() epoll() bug?
  2003-08-15 23:05 kernel hangs up - possible sendfile() epoll() bug? Yaoping Ruan
@ 2003-08-15 23:43 ` Andrew Morton
  2003-08-22  4:41   ` kernel hangs up running web server Yaoping Ruan
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2003-08-15 23:43 UTC
  To: Yaoping Ruan; +Cc: linux-kernel

Yaoping Ruan <yruan@cs.princeton.edu> wrote:
>
> Hi,
> 
> Recently we updated a user-space web server to use the sendfile() and
> epoll() interfaces, and tried to measure its performance with the
> SpecWeb99 benchmark. As the load increases, e.g. at a SpecWeb99 target
> score of 600 connections, the kernel sometimes hangs without logging
> anything, and the only option left is to push the reset button to
> reboot.
>
> We also made similar updates to use sendfile() and kevent() on FreeBSD
> and achieved a score of 1000 connections, so an application bug is
> unlikely (and it is a user-space server in any case). Before the
> sendfile() and epoll() change, the server also ran fine but could only
> reach a SpecWeb99 score of 500.
>
> The kernel we were using was 2.4.21 with the epoll patch applied. Since
> the epoll man pages say the interface stabilized in 2.5.66, we also
> tried 2.5.66, but it behaved no better. The machine is an Intel server
> motherboard with a PIII Xeon processor (the board supports 2 CPUs but
> only 1 is used), a Maxtor Diamond IDE disk, a Promise Ultra DMA 66
> controller, and a single Netgear GA621 gigabit Ethernet adapter.
> 

Definitely a kernel bug.

Could you please test 2.6.0-test3?  If that has the same problem then
some initial steps would be:


- Boot the kernel with the "nmi_watchdog=1" option on the kernel boot
  command line.  (This needs an SMP-compiled kernel, or one which has
  the local APIC enabled in its config.)

- Make sure that /proc/sys/kernel/sysrq is set to `1' after booting.

- Can you still ping the machine after it hangs up?

- Type ALT-SYSRQ-T and/or ALT-SYSRQ-P on the keyboard, see if you get a trace.

- ALT-SYSRQ-M may be interesting too (memory stats)

If the nmi watchdog doesn't generate a trace then the sysrq keys should do
so.
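
The first two items can be verified before starting a long benchmark run. The
following is only a sketch: the function names are hypothetical, and it simply
reads /proc/cmdline and /proc/sys/kernel/sysrq and compares them against the
values mentioned above. Newer kernels may also accept other sysrq values
(a bitmask), which this check does not interpret.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `opt` appears as a whole space-separated word in `cmdline`,
 * 0 otherwise. "nmi_watchdog=10" does not count as "nmi_watchdog=1". */
int cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return 0;
    while ((p = strstr(p, opt)) != NULL) {
        int starts_word = (p == cmdline) || (p[-1] == ' ');
        char end = p[len];
        int ends_word = (end == '\0' || end == ' ' || end == '\n');
        if (starts_word && ends_word)
            return 1;
        p += 1;                         /* keep searching after this hit */
    }
    return 0;
}

/* Read the first line of `path` into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print a short report and return how many of the two checks passed. */
int check_hang_debug_setup(void)
{
    char line[4096];
    int passed = 0;

    if (read_first_line("/proc/cmdline", line, sizeof line) == 0 &&
        cmdline_has_option(line, "nmi_watchdog=1")) {
        puts("nmi_watchdog=1: present on the boot command line");
        passed++;
    } else {
        puts("nmi_watchdog=1: not found on the boot command line");
    }

    if (read_first_line("/proc/sys/kernel/sysrq", line, sizeof line) == 0 &&
        strcmp(line, "1") == 0) {
        puts("sysrq: set to 1");
        passed++;
    } else {
        puts("sysrq: not set to 1");
    }
    return passed;
}
```

Calling check_hang_debug_setup() from a main() on the test machine prints one
line per check. It cannot tell whether the NMI watchdog actually works on the
hardware, only whether the option was passed at boot.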

If the above does not provide us with enough information to solve the bug
then the next step would be for you to provide sufficient material for a
kernel developer to reproduce the problem.

Thanks.


* kernel hangs up running web server
  2003-08-15 23:43 ` Andrew Morton
@ 2003-08-22  4:41   ` Yaoping Ruan
  2003-08-22  5:19     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Yaoping Ruan @ 2003-08-22  4:41 UTC
  To: linux-kernel

Hi,

I posted this kernel lockup problem last week. Although there were helpful
suggestions, I haven't been able to get any useful information out of the kernel.
I've tried 2.6.0-test3 and the same thing happened. The machine could not be
reached by "ping" after hanging. But the bug may not be in sendfile() or epoll(),
since a version of the server without these two system calls also locks up the
kernel.

So I've made both the server (Flash) and the workload generator (Flexiclient)
available at:
www.cs.princeton.edu/~yruan/flash
and would appreciate it if any of the developers could try them out. To run the test:

1. Generate the fileset on the server side by copying over fileset and running
   ./fileset -s zipfset -n #DIRS
2. Compile Flash (you may change options in config.h) and run
   ./flash -user YOUR_ACCOUNT
3. Run the client as:
   ./zipfgen -s spec -n #DIRS | ./batcher 5 | ./flexiclient -host HOST -time SECONDS -active 1000
(The bug happens rarely, so it is better to run for more than 1800 seconds with
1000 connections.)

Thanks a lot


- Yaoping





* Re: kernel hangs up running web server
  2003-08-22  4:41   ` kernel hangs up running web server Yaoping Ruan
@ 2003-08-22  5:19     ` Andrew Morton
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2003-08-22  5:19 UTC
  To: Yaoping Ruan; +Cc: linux-kernel

Yaoping Ruan <yruan@cs.princeton.edu> wrote:
>
> So I've made both the server (Flash) and the workload generator (Flexiclient)
> available at:
> www.cs.princeton.edu/~yruan/flash
> and would appreciate it if any of the developers could try them out. To run the test:
>
> 1. Generate the fileset on the server side by copying over fileset and running
>    ./fileset -s zipfset -n #DIRS
> 2. Compile Flash (you may change options in config.h) and run
>    ./flash -user YOUR_ACCOUNT
> 3. Run the client as:
>    ./zipfgen -s spec -n #DIRS | ./batcher 5 | ./flexiclient -host HOST -time SECONDS -active 1000
> (The bug happens rarely, so it is better to run for more than 1800 seconds with
> 1000 connections.)

How does one tell the server how to locate its fileset?


I get this.  What does it mean?

vmm:/home/akpm/flash/flexi-curr> ./zipfgen -s spec -n 100 | ./batcher 5 | ./flexiclient -host localhost -time 2000 -active 1000
-host localhost : name of machine/interface running server
-port 31415 : listen port # on server
-active 1000 : number of simultaneous outstanding requests
-maxconns 0 : max idle and active connections
-persist groups : enable persistent connections (off,groups,force)
-hash 0 : print hash mark for each completed request
-time 2000 : # seconds to run test
-sync 0 : synchronize clients on different machines using clientmaster
-output  : sends incoming data to this file name
-printint 20 : specifies seconds to print real summary every
-reqrate 0 : if set, max burst request rate
-rcvwin 48 : if set, new size of receive window in KB
-exhdrs  : if set, extra headers for each request
-xml 0 : if set, produce output in XML format
-statfile stdout : sends statistics to this file
-doskip 0 : if set, skips trace entries to avoid batching
-trace 0 : same as -doskip
-persec 1 : print statistics at per second level
file flexiclient.c, line 796, not being held

