* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
       [not found]     ` <20030718112758.1da7ab03.skraw@ithnet.com>
@ 2003-07-18 12:23       ` Marcelo Tosatti
  2003-07-18 12:50         ` Stephan von Krawczynski
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-18 12:23 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Chris Mason, Andrea Arcangeli, riel, lkml


CCed lkml for obvious reasons

On Fri, 18 Jul 2003, Stephan von Krawczynski wrote:

> On Wed, 16 Jul 2003 08:37:51 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
>
> >
> > Stephan, can you reproduce it easily?
>
> Hello,
>
> there is definitely something about it. pre6 froze after 2 days of
> testing. I guess I was unlucky this time with logfiles, no messages
> there.  There is something severe. You may call it reproducible, but not
> easily.

Stephan,

What is your workload?

I'll try to reproduce it.


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-18 12:23       ` Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd) Marcelo Tosatti
@ 2003-07-18 12:50         ` Stephan von Krawczynski
  2003-07-18 14:14           ` Marcelo Tosatti
  2003-07-18 17:18           ` Andrea Arcangeli
  0 siblings, 2 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-18 12:50 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mason, andrea, riel, linux-kernel

On Fri, 18 Jul 2003 09:23:10 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> 
> CCed lkml for obvious reasons
> 
> On Fri, 18 Jul 2003, Stephan von Krawczynski wrote:
> 
> > On Wed, 16 Jul 2003 08:37:51 -0300 (BRT)
> > Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> >
> > >
> > > Stephan, can you reproduce it easily?
> >
> > Hello,
> >
> > there is definitely something about it. pre6 froze after 2 days of
> > testing. I guess I was unlucky this time with logfiles, no messages
> > there.  There is something severe. You may call it reproducible, but not
> > easily.
> 
> Stephan,
> 
> What is your workload?
> 
> I'll try to reproduce it.

You need heavy NFS action and I/O load. It's the same box I use for
server-scenario tests. 3 GB RAM, SMP, 320 GB RAID5 (3ware), SDLT tape drive, 2
x 1000 TX. In detail:

00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
00:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV200 QW [Radeon 7500]
00:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
00:05.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
00:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
01:02.0 RAID bus controller: 3ware Inc 3ware 7000-series ATA-RAID (rev 01)
01:03.0 Network controller: AVM Audiovisuelles MKTG & Computer System GmbH Fritz!PCI v2.0 ISDN (rev 01)
01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
02:03.0 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
02:03.1 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)

Take several NFS clients and write some GBs to this box (all at the same
time), then copy these files around on the box or tar them. You should see
collapses ranging from the BUG I posted recently up to a complete freeze.
I have a continuous CPU load above 2.0, up to about 8.0.
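
The workload described above can be approximated with a small script. This
is only a rough sketch and not the test setup actually used on the box: the
function names, file names, and sizes are hypothetical, and local threads
stand in for the NFS clients. For a closer match the writers would have to
run on separate machines against an NFS mount of the target directory.

```python
import shutil
import tarfile
import threading
from pathlib import Path


def write_file(path: Path, size_bytes: int, chunk: int = 1 << 20) -> None:
    """Write size_bytes of zero-filled data to path in chunk-sized pieces."""
    block = b"\0" * chunk
    remaining = size_bytes
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk, remaining)
            fh.write(block[:n])
            remaining -= n


def run_workload(target: Path, writers: int = 4, size_bytes: int = 1 << 20) -> Path:
    """Write files in parallel, copy them around, then tar the copies.

    Hypothetical stand-in for "several NFS clients writing at the same
    time, then copy or tar on the box". Returns the path of the tar file.
    """
    incoming = target / "incoming"
    copies = target / "copies"
    incoming.mkdir(parents=True, exist_ok=True)
    copies.mkdir(parents=True, exist_ok=True)

    # Phase 1: all writers start together, like clients writing at once.
    threads = [
        threading.Thread(
            target=write_file, args=(incoming / f"client{i}.dat", size_bytes)
        )
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Phase 2: copy the files around on the same box.
    for src in sorted(incoming.iterdir()):
        shutil.copy2(src, copies / src.name)

    # Phase 3: tar the copies (to a file here; a tape device in the report).
    archive = target / "backup.tar"
    with tarfile.open(archive, "w") as tar:
        tar.add(copies, arcname="copies")
    return archive
```

Calling run_workload(Path("/some/test/dir"), writers=8, size_bytes=2 << 30)
in a loop would come closer to the reported load; the defaults are kept
small so the sketch is cheap to try.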

Regards,
Stephan



* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-18 12:50         ` Stephan von Krawczynski
@ 2003-07-18 14:14           ` Marcelo Tosatti
  2003-07-18 15:13             ` Stephan von Krawczynski
  2003-07-21  8:49             ` Stephan von Krawczynski
  2003-07-18 17:18           ` Andrea Arcangeli
  1 sibling, 2 replies; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-18 14:14 UTC (permalink / raw)
  To: Stephan von Krawczynski
  Cc: Chris Mason, Andrea Arcangeli, riel, lkml, Jim Gifford


I have just started stress testing an 8-way OSDL box to see if I can
reproduce the problem. I'm using pre6 + axboe's BH_Sync patch.

I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
DAC960 (ext3). Let's see what happens.

After lunch I'll keep looking at the oopses. During the morning I only had
time to set up the OSDL box and start the tests.

On Fri, 18 Jul 2003, Stephan von Krawczynski wrote:

> On Fri, 18 Jul 2003 09:23:10 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
>
> >
> > CCed lkml for obvious reasons
> >
> > On Fri, 18 Jul 2003, Stephan von Krawczynski wrote:
> >
> > > On Wed, 16 Jul 2003 08:37:51 -0300 (BRT)
> > > Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> > >
> > > >
> > > > Stephan, can you reproduce it easily?
> > >
> > > Hello,
> > >
> > > there is definitely something about it. pre6 froze after 2 days of
> > > testing. I guess I was unlucky this time with logfiles, no messages
> > > there.  There is something severe. You may call it reproducible, but not
> > > easily.
> >
> > Stephan,
> >
> > What is your workload?
> >
> > I'll try to reproduce it.


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-18 14:14           ` Marcelo Tosatti
@ 2003-07-18 15:13             ` Stephan von Krawczynski
  2003-07-21  8:49             ` Stephan von Krawczynski
  1 sibling, 0 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-18 15:13 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mason, andrea, riel, linux-kernel, maillist

On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> 
> I have just started stress testing a 8way OSDL box to see if I can
> reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> 
> I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> DAC960 (ext3). Lets see what happens.
> 
> After lunch I'll keep looking at the oopses. During the morning I only had
> time to setup the OSDL box and start the tests.

On my box it takes about 48 hours before the problem shows. But that may
heavily depend on the box I guess.

Regards,
Stephan



* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-18 12:50         ` Stephan von Krawczynski
  2003-07-18 14:14           ` Marcelo Tosatti
@ 2003-07-18 17:18           ` Andrea Arcangeli
  1 sibling, 0 replies; 15+ messages in thread
From: Andrea Arcangeli @ 2003-07-18 17:18 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, mason, riel, linux-kernel

On Fri, Jul 18, 2003 at 02:50:33PM +0200, Stephan von Krawczynski wrote:
> You need heavy NFS action and I/O load. It's the same box I use for

I wonder if it can be related to the nfs changes. I also had those nfs
changes in my tree previously, but most of them rejected (i.e. a -R
wouldn't clean it up) so there must be further or slightly different
changes in mainline pre6 compared to 21rc8aa1. It could be only an
editing thing though.

It would be very interesting if you could still reproduce w/o nfs (for
example replacing the nfs transfers temporarily with an rsync, that
would reduce the scope of the problem a lot).

Andrea


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-18 14:14           ` Marcelo Tosatti
  2003-07-18 15:13             ` Stephan von Krawczynski
@ 2003-07-21  8:49             ` Stephan von Krawczynski
  2003-07-21 11:51               ` Marcelo Tosatti
  2003-07-21 15:05               ` Stephan von Krawczynski
  1 sibling, 2 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-21  8:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mason, andrea, riel, linux-kernel, maillist

On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> 
> I have just started stress testing a 8way OSDL box to see if I can
> reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> 
> I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> DAC960 (ext3). Lets see what happens.
> 
> After lunch I'll keep looking at the oopses. During the morning I only had
> time to setup the OSDL box and start the tests.

Hello Marcelo,

have you seen anything in your tests? My box just froze again after 3 days
during NFS action. This was with pre6, I am switching over to pre7.

Regards,
Stephan




* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21  8:49             ` Stephan von Krawczynski
@ 2003-07-21 11:51               ` Marcelo Tosatti
  2003-07-21 15:05               ` Stephan von Krawczynski
  1 sibling, 0 replies; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-21 11:51 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: mason, andrea, riel, linux-kernel, maillist



On Mon, 21 Jul 2003, Stephan von Krawczynski wrote:

> On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
>
> >
> > I have just started stress testing a 8way OSDL box to see if I can
> > reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> >
> > I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> > DAC960 (ext3). Lets see what happens.
> >
> > After lunch I'll keep looking at the oopses. During the morning I only had
> > time to setup the OSDL box and start the tests.
>
> Hello Marcelo,
>
> have you seen anything in your tests? My box just froze again after 3 days
> during NFS action. This was with pre6, I am switching over to pre7.

No. I just checked it and the 8way is alive and well:

bash-2.05a$ uptime
  4:53am  up 2 days, 18:04,  2 users,  load average: 100.57, 96.27, 95.22


Could you try to reproduce the tests with something other than NFS
(local disk, SMB, ...), as Andrea suggested?


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21  8:49             ` Stephan von Krawczynski
  2003-07-21 11:51               ` Marcelo Tosatti
@ 2003-07-21 15:05               ` Stephan von Krawczynski
  2003-07-21 16:20                 ` Andrea Arcangeli
  2003-07-21 17:23                 ` Marcelo Tosatti
  1 sibling, 2 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-21 15:05 UTC (permalink / raw)
  To: marcelo; +Cc: mason, andrea, riel, linux-kernel, maillist

On Mon, 21 Jul 2003 10:49:06 +0200
Stephan von Krawczynski <skraw@ithnet.com> wrote:

> On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> 
> > 
> > I have just started stress testing a 8way OSDL box to see if I can
> > reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> > 
> > I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> > DAC960 (ext3). Lets see what happens.
> > 
> > After lunch I'll keep looking at the oopses. During the morning I only had
> > time to setup the OSDL box and start the tests.
> 
> Hello Marcelo,
> 
> have you seen anything in your tests? My box just froze again after 3 days
> during NFS action. This was with pre6, I am switching over to pre7.

I managed to freeze the pre7 box within these few hours. There was no NFS
involved, only tar-to-tape.
I switched back to 2.4.21 to see if it is still stable.
Is there a possibility that the I/O scheduler has another flaw somewhere (just
like during mount previously)?


Regards,
Stephan


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 15:05               ` Stephan von Krawczynski
@ 2003-07-21 16:20                 ` Andrea Arcangeli
  2003-07-21 19:24                   ` Stephan von Krawczynski
  2003-07-21 17:23                 ` Marcelo Tosatti
  1 sibling, 1 reply; 15+ messages in thread
From: Andrea Arcangeli @ 2003-07-21 16:20 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: marcelo, mason, riel, linux-kernel, maillist

On Mon, Jul 21, 2003 at 05:05:17PM +0200, Stephan von Krawczynski wrote:
> On Mon, 21 Jul 2003 10:49:06 +0200
> Stephan von Krawczynski <skraw@ithnet.com> wrote:
> 
> > On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
> > Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> > 
> > > 
> > > I have just started stress testing a 8way OSDL box to see if I can
> > > reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> > > 
> > > I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> > > DAC960 (ext3). Lets see what happens.
> > > 
> > > After lunch I'll keep looking at the oopses. During the morning I only had
> > > time to setup the OSDL box and start the tests.
> > 
> > Hello Marcelo,
> > 
> > have you seen anything in your tests? My box just froze again after 3 days
> > during NFS action. This was with pre6, I am switching over to pre7.
> 
> I managed to freeze the pre7 box within these few hours. There was no nfs
> involved, only tar-to-tape.
> I switched back to 2.4.21 to see if it is still stable.
> Is there a possibility that the i/o-scheduler has another flaw somewhere (just
> like during mount previously) ...

Is it a SCSI tape? Is the tape always involved? There are st.c updates
between 2.4.21 and 22pre7. You can try to back them out.

If only the BKCVS would provide the tags in all files and not only in
the file ChangeSets it would be very easy again to extract all the st.c
updates. What happened to the BKCVS, why aren't the tags present in all
the files anymore? Is it a mistake or intentional?

You should also provide a SYSRQ+P/T of the hang or we can't debug it at
all.
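
For reference, the same dumps that the SysRq P and T keys produce can also
be requested from userspace while the machine still schedules processes.
The following is a minimal sketch under several assumptions: it assumes a
kernel that provides /proc/sysrq-trigger with SysRq enabled, it must run as
root, and the function name is hypothetical. It cannot help once the box is
frozen hard enough that keyboard SysRq no longer responds.

```python
from pathlib import Path
from typing import List

# Assumption: the running kernel exposes this file and SysRq is enabled.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# 'p' dumps the current registers, 't' dumps the task list, to the kernel log.
ALLOWED_KEYS = ("p", "t")


def request_sysrq_dump(keys: str = "pt", trigger: Path = SYSRQ_TRIGGER) -> List[str]:
    """Write each requested SysRq key to the trigger file, one at a time.

    Only the harmless dump keys are accepted, so that a typo cannot send
    something destructive such as 'b' (immediate reboot).
    """
    for key in keys:
        if key not in ALLOWED_KEYS:
            raise ValueError(f"refusing SysRq key {key!r}")
    sent = []
    for key in keys:
        trigger.write_text(key)
        sent.append(key)
    return sent
```

The output lands in the kernel log, so it is only useful if that log is
captured somewhere that survives the hang, for example a serial console.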

Andrea


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 15:05               ` Stephan von Krawczynski
  2003-07-21 16:20                 ` Andrea Arcangeli
@ 2003-07-21 17:23                 ` Marcelo Tosatti
  2003-07-21 19:09                   ` Stephan von Krawczynski
  1 sibling, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-21 17:23 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: mason, andrea, riel, linux-kernel, maillist



On Mon, 21 Jul 2003, Stephan von Krawczynski wrote:

> On Mon, 21 Jul 2003 10:49:06 +0200
> Stephan von Krawczynski <skraw@ithnet.com> wrote:
>
> > On Fri, 18 Jul 2003 11:14:15 -0300 (BRT)
> > Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> >
> > >
> > > I have just started stress testing a 8way OSDL box to see if I can
> > > reproduce the problem. I'm using pre6+axboes BH_Sync patch.
> > >
> > > I'm running 50 dbench clients on aic7xxx (ext2) and 50 dbench clients on
> > > DAC960 (ext3). Lets see what happens.
> > >
> > > After lunch I'll keep looking at the oopses. During the morning I only had
> > > time to setup the OSDL box and start the tests.
> >
> > Hello Marcelo,
> >
> > have you seen anything in your tests? My box just froze again after 3 days
> > during NFS action. This was with pre6, I am switching over to pre7.
>
> I managed to freeze the pre7 box within these few hours. There was no nfs
> involved, only tar-to-tape.

You had NMI on, correct? Sysrq doesn't work, correct?

> I switched back to 2.4.21 to see if it is still stable. Is there a
> possibility that the i/o-scheduler has another flaw somewhere (just like
> during mount previously) ...

It might be a problem in the IO scheduler, yes.

Let's isolate the problems: if 2.4.21 doesn't lock up, try 2.4.22-pre7
without the drivers/block/ll_rw_blk{.c,.h} changes.



* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 17:23                 ` Marcelo Tosatti
@ 2003-07-21 19:09                   ` Stephan von Krawczynski
  0 siblings, 0 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-21 19:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mason, andrea, riel, linux-kernel, maillist

On Mon, 21 Jul 2003 14:23:53 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> > > Hello Marcelo,
> > >
> > > have you seen anything in your tests? My box just froze again after 3
> > > days during NFS action. This was with pre6, I am switching over to pre7.
> >
> > I managed to freeze the pre7 box within these few hours. There was no nfs
> > involved, only tar-to-tape.
> 
> You had NMI on, correct? Sysrq doesn't work, correct?

Yes, that's right.
 
> > I switched back to 2.4.21 to see if it is still stable. Is there a
> > possibility that the i/o-scheduler has another flaw somewhere (just like
> > during mount previously) ...
> 
> It might be a problem in the IO scheduler, yes.
> 
> Let's isolate the problems: if 2.4.21 doesn't lock up, try 2.4.22-pre7
> without the drivers/block/ll_rw_blk{.c,.h} changes.

I am pretty confident that 2.4.21 does not lock up. I tested it a long time
ago and, as far as I remember, it had no problems. Anyway, I will re-check to
make sure the box is still OK.

Can you send me the patches to reverse from -pre7 off-list? Just to make sure
we are talking about the same stuff...

Regards,
Stephan



* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 16:20                 ` Andrea Arcangeli
@ 2003-07-21 19:24                   ` Stephan von Krawczynski
  2003-07-21 19:40                     ` Marcelo Tosatti
  2003-07-21 21:05                     ` Marcelo Tosatti
  0 siblings, 2 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-21 19:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: marcelo, mason, riel, linux-kernel, maillist

On Mon, 21 Jul 2003 12:20:33 -0400
Andrea Arcangeli <andrea@suse.de> wrote:

> > I managed to freeze the pre7 box within these few hours. There was no nfs
> > involved, only tar-to-tape.
> > I switched back to 2.4.21 to see if it is still stable.
> > Is there a possibility that the i/o-scheduler has another flaw somewhere
> > (just like during mount previously) ...
> 
> is it a scsi tape?

yes.

> Is the tape always involved?

No, I see freezes both during NFS-only action and during tar-to-SCSI-tape.
My feeling is that the freeze does not happen during high load (at least in
the NFS case) but rather when load seems relatively light. Handwaving, one
could say it looks more like an I/O scheduler starvation issue than a
breakdown under high load. Similar to the last issue.

> there are st.c updates
> between 2.4.21 to 22pre7. you can try to back them out.

Hm, which?

> [...]
> You should also provide a SYSRQ+P/T of the hang or we can't debug it at
> all.

Well, I really tried hard to produce something, but have failed so far. If I
had more time I would try a serial console, hoping that it survives long
enough to show at least _something_.
The only thing I could ever see was the BUG in page_alloc from the
beginning of this thread.

Regards,
Stephan



* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 19:24                   ` Stephan von Krawczynski
@ 2003-07-21 19:40                     ` Marcelo Tosatti
  2003-07-21 20:12                       ` Stephan von Krawczynski
  2003-07-21 21:05                     ` Marcelo Tosatti
  1 sibling, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-21 19:40 UTC (permalink / raw)
  To: Stephan von Krawczynski
  Cc: Andrea Arcangeli, Chris Mason, riel, lkml, maillist



On Mon, 21 Jul 2003, Stephan von Krawczynski wrote:

> On Mon, 21 Jul 2003 12:20:33 -0400
> Andrea Arcangeli <andrea@suse.de> wrote:
>
> > > I managed to freeze the pre7 box within these few hours. There was no nfs
> > > involved, only tar-to-tape.
> > > I switched back to 2.4.21 to see if it is still stable.
> > > Is there a possibility that the i/o-scheduler has another flaw somewhere
> > > (just like during mount previously) ...
> >
> > is it a scsi tape?
>
> yes.
>
> > Is the tape always involved?
>
> No, I experience both freeze during nfs-only action and freeze during
> tar-to-scsi-tape.
> My feelings are that the freeze does (at least in the nfs case) not happen
> during high load but rather when load seems relatively light. Handwaving one
> could say it looks rather like an I/O sched starvation issue than breakdown
> during high load. Similar to the last issue.
>
> > there are st.c updates
> > between 2.4.21 to 22pre7. you can try to back them out.
>
> Hm, which?
>
> > [...]
> > You should also provide a SYSRQ+P/T of the hang or we can't debug it at
> > all.
>
> Well, I really tried hard to produce something, but failed so far, if I had
> more time I would try a serial console hoping that it survives long enough to
> show at least _something_.
> The only thing I ever could see was the BUG in page-alloc thing from the
> beginning of this thread.

Stephan,

I'm sending you the SCSI tape driver changes in 2.4.22-pre so you can
revert them (in private, in a few minutes).

If that doesn't help us spot the problem, can you PLEASE find out in which
-pre the problem starts?

Thank you
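
Finding the -pre in which the problem starts is a bisection over the release
list. The following is a minimal sketch of that search, not a tool anyone in
this thread used: the function name and the verdict strings are hypothetical,
and releases that cannot be tested at all (for example, ones that do not
boot) are handled by skipping them and reporting a range of suspects.

```python
from typing import Callable, List, Sequence


def suspect_releases(
    releases: Sequence[str], test: Callable[[str], str]
) -> List[str]:
    """Narrow down which release introduced a problem.

    Assumes the baseline before releases[0] is good and releases[-1] is bad.
    test(release) must return "good", "bad", or "skip" (untestable).
    Returns every release after the last known good one, up to and
    including the first known bad one.
    """
    lo = -1                      # index of last known good (-1 = baseline)
    hi = len(releases) - 1       # index of first known bad
    skipped = set()
    while True:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break
        mid = candidates[len(candidates) // 2]
        verdict = test(releases[mid])
        if verdict == "good":
            lo = mid
        elif verdict == "bad":
            hi = mid
        elif verdict == "skip":
            skipped.add(mid)
        else:
            raise ValueError(f"unexpected verdict {verdict!r}")
    return list(releases[lo + 1 : hi + 1])


# Hypothetical verdicts, only to exercise the function.
EXAMPLE_VERDICTS = {
    "2.4.22-pre1": "good",
    "2.4.22-pre2": "good",
    "2.4.22-pre3": "skip",
    "2.4.22-pre4": "skip",
    "2.4.22-pre5": "bad",
    "2.4.22-pre6": "bad",
    "2.4.22-pre7": "bad",
}

suspects = suspect_releases(list(EXAMPLE_VERDICTS), EXAMPLE_VERDICTS.get)
```

With these example verdicts the search cannot get past the two skipped
releases, so it returns pre3, pre4 and pre5 as the window in which the
change must have landed. Since each test run here can take days, the
number of runs matters more than the code.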


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 19:40                     ` Marcelo Tosatti
@ 2003-07-21 20:12                       ` Stephan von Krawczynski
  0 siblings, 0 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2003-07-21 20:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: andrea, mason, riel, linux-kernel, maillist

On Mon, 21 Jul 2003 16:40:27 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> If that doesnt make us spot the problem, can you PLEASE find out in which
> -pre the problem starts ?

Right away I can tell you there was no problem up to the pre that did not boot
on my box. I think it was pre3, right? Meaning pre1 and pre2 work.

pre5 was the first one that booted again - and the first I can tell has the
problem.

I can "port" the mini-patch from chris back to pre3 and try this one as next
step...

Regards,
Stephan


* Re: Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd)
  2003-07-21 19:24                   ` Stephan von Krawczynski
  2003-07-21 19:40                     ` Marcelo Tosatti
@ 2003-07-21 21:05                     ` Marcelo Tosatti
  1 sibling, 0 replies; 15+ messages in thread
From: Marcelo Tosatti @ 2003-07-21 21:05 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Andrea Arcangeli, Chris Mason, lkml, Jens Axboe


Just FYI, the 8-way box has been running for three days with LOTS of IO and
memory pressure:

hostname:  dev8-005 (dev8-005.pdx.osdl.net) running linux

bash-2.05a$ uptime
  2:03pm  up 3 days,  3:14,  2 users,  load average: 82.48, 91.67, 94.29
bash-2.05a$ vmstat 2
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1 77  2   3436   8232  77288 885880   0   0     3    12   13    16   4   9   8
 0 78  3   3436   7300  77448 886596   0   0   108 12184  619   448   0   9  90
 0 78  2   3436  11472  77760 880692   0   0   400 22922  836  2497   2  33  65
 0 77  2   3428   7292  78176 884640   6   0   414  7858  761   511   0  11  88
 0 77  3   3428   7392  78348 884776   0   0   238  9942  687   449   0   9  91
....


Interactivity under these extreme circumstances is impressive. Very good.

Great work Andrea, Mason and Jens. Thanks.
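
To put numbers on the vmstat output above, here is a small sketch that
parses the rows and averages a few columns. The function names are
hypothetical; the sample data is copied from the output above. The first
data row of vmstat reports averages since boot, so the summary leaves it
out and uses only the 2-second samples.

```python
from typing import Dict, List

VMSTAT_SAMPLE = """\
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1 77  2   3436   8232  77288 885880   0   0     3    12   13    16   4   9   8
 0 78  3   3436   7300  77448 886596   0   0   108 12184  619   448   0   9  90
 0 78  2   3436  11472  77760 880692   0   0   400 22922  836  2497   2  33  65
 0 77  2   3428   7292  78176 884640   6   0   414  7858  761   511   0  11  88
 0 77  3   3428   7392  78348 884776   0   0   238  9942  687   449   0   9  91
"""


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Turn vmstat output (column-name line first) into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # ignore anything that is not a full data row
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def summarize(rows: List[Dict[str, int]]) -> Dict[str, float]:
    """Average blocked processes, blocks written out, and idle CPU.

    Skips the first row, which holds averages since boot.
    """
    samples = rows[1:]
    n = len(samples)
    return {
        "blocked": sum(r["b"] for r in samples) / n,
        "blocks_out": sum(r["bo"] for r in samples) / n,
        "idle_pct": sum(r["id"] for r in samples) / n,
    }


summary = summarize(parse_vmstat(VMSTAT_SAMPLE))
```

For these four samples that gives about 77.5 processes blocked on I/O,
about 13226 blocks written out per second, and 83.5% idle CPU, which fits
a box that is waiting on disks rather than computing.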


On Mon, 21 Jul 2003, Stephan von Krawczynski wrote:

> On Mon, 21 Jul 2003 12:20:33 -0400
> Andrea Arcangeli <andrea@suse.de> wrote:
>
> > > I managed to freeze the pre7 box within these few hours. There was no nfs
> > > involved, only tar-to-tape.
> > > I switched back to 2.4.21 to see if it is still stable.
> > > Is there a possibility that the i/o-scheduler has another flaw somewhere
> > > (just like during mount previously) ...
> >
> > is it a scsi tape?
>
> yes.
>
> > Is the tape always involved?
>
> No, I experience both freeze during nfs-only action and freeze during
> tar-to-scsi-tape.
> My feelings are that the freeze does (at least in the nfs case) not happen
> during high load but rather when load seems relatively light. Handwaving one
> could say it looks rather like an I/O sched starvation issue than breakdown
> during high load. Similar to the last issue.
>
> > there are st.c updates
> > between 2.4.21 to 22pre7. you can try to back them out.
>
> Hm, which?
>
> > [...]
> > You should also provide a SYSRQ+P/T of the hang or we can't debug it at
> > all.
>
> Well, I really tried hard to produce something, but failed so far, if I had
> more time I would try a serial console hoping that it survives long enough to
> show at least _something_.
> The only thing I ever could see was the BUG in page-alloc thing from the
> beginning of this thread.
>
> Regards,
> Stephan
>


Thread overview: 15+ messages
     [not found] <Pine.LNX.4.55L.0307150859130.5146@freak.distro.conectiva>
     [not found] ` <1058297936.4016.86.camel@tiny.suse.com>
     [not found]   ` <Pine.LNX.4.55L.0307160836270.30825@freak.distro.conectiva>
     [not found]     ` <20030718112758.1da7ab03.skraw@ithnet.com>
2003-07-18 12:23       ` Bug Report: 2.4.22-pre5: BUG in page_alloc (fwd) Marcelo Tosatti
2003-07-18 12:50         ` Stephan von Krawczynski
2003-07-18 14:14           ` Marcelo Tosatti
2003-07-18 15:13             ` Stephan von Krawczynski
2003-07-21  8:49             ` Stephan von Krawczynski
2003-07-21 11:51               ` Marcelo Tosatti
2003-07-21 15:05               ` Stephan von Krawczynski
2003-07-21 16:20                 ` Andrea Arcangeli
2003-07-21 19:24                   ` Stephan von Krawczynski
2003-07-21 19:40                     ` Marcelo Tosatti
2003-07-21 20:12                       ` Stephan von Krawczynski
2003-07-21 21:05                     ` Marcelo Tosatti
2003-07-21 17:23                 ` Marcelo Tosatti
2003-07-21 19:09                   ` Stephan von Krawczynski
2003-07-18 17:18           ` Andrea Arcangeli
