linux-kernel.vger.kernel.org archive mirror
* Re: Nick's scheduler v18
       [not found] <1068589319.1557.1.camel@localhost.localdomain>
@ 2003-11-11 22:30 ` Tom Sightler
  2003-11-12  0:38   ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Sightler @ 2003-11-11 22:30 UTC (permalink / raw)
  To: LKML; +Cc: piggin

On Tue, 2003-11-11 at 17:22, Tom Sightler wrote:
> http://www.kerneltrap.org/~npiggin/v18/
> 
> Nothing exciting for desktop users. High end performance is now starting
> to get better.

Hey Nick,

Was this tested on a single processor?  On my Dell Latitude C810 I
can boot test9 and test9-mm2 without problems, but using the identical
config with this patch, my system will not even boot up all the way.  It
stops at various stages during the init scripts.  It seems to
consistently get further if I add elevator=deadline, but it never boots
all the way up in either case.

No messages or other good info, just hangs and won't go any further. 
Any thoughts?

Later,
Tom
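For readers unfamiliar with the workaround: elevator= is a boot-time option that selects the block I/O scheduler, which is separate from the CPU scheduler Nick's patch changes. Below is a minimal sketch of reading that option from a kernel command line string, such as the contents of /proc/cmdline. The helper name is hypothetical. Treating the anticipatory scheduler ("as") as the default when no option is given is an assumption about 2.6.0-test kernels.

```python
def requested_elevator(cmdline: str, default: str = "as") -> str:
    """Return the value of the last elevator= option on a kernel command line.

    Hypothetical helper. The default ("as", anticipatory) is an assumption
    about 2.6.0-test kernels and is used only when no elevator= option appears.
    """
    choice = default
    for token in cmdline.split():
        if token.startswith("elevator="):
            choice = token.split("=", 1)[1]
    return choice


print(requested_elevator("ro root=/dev/hda2 elevator=deadline"))  # deadline
print(requested_elevator("ro root=/dev/hda2"))                    # as
```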



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Nick's scheduler v18
  2003-11-11 22:30 ` Nick's scheduler v18 Tom Sightler
@ 2003-11-12  0:38   ` Nick Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2003-11-12  0:38 UTC (permalink / raw)
  To: Tom Sightler; +Cc: LKML



Tom Sightler wrote:

>On Tue, 2003-11-11 at 17:22, Tom Sightler wrote:
>
>>http://www.kerneltrap.org/~npiggin/v18/
>>
>>Nothing exciting for desktop users. High end performance is now starting
>>to get better.
>>
>
>Hey Nick,
>
>Was this tested on a single processor?  On my Dell Latitude C810 I
>can boot test9 and test9-mm2 without problems, but using the identical
>config with this patch, my system will not even boot up all the way.  It
>stops at various stages during the init scripts.  It seems to
>consistently get further if I add elevator=deadline, but it never boots
>all the way up in either case.
>
>No messages or other good info, just hangs and won't go any further. 
>Any thoughts?
>

Yeah, tested on UP. Sigh. Can I have a look at your .config? Do you have
preempt on?

Thanks
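Nick's question about preemption can be answered directly from the .config file. In that file, each option appears either as a CONFIG_NAME=value line or as a "# CONFIG_NAME is not set" comment. Here is a small sketch that reports CONFIG_PREEMPT and CONFIG_SMP. The helper name and the sample config text are made up for illustration.

```python
def config_option(config_text: str, name: str) -> str:
    """Return an option's value ('y', 'm', or a string) from .config text, or 'n' if unset/absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {name} is not set":
            return "n"
        if line.startswith(name + "="):  # the '=' avoids matching longer names such as NAME_DEBUG
            return line.split("=", 1)[1]
    return "n"


# Made-up sample resembling a uniprocessor config with preemption enabled.
SAMPLE_CONFIG = """\
CONFIG_X86=y
# CONFIG_SMP is not set
CONFIG_PREEMPT=y
"""

print(config_option(SAMPLE_CONFIG, "CONFIG_PREEMPT"))  # y
print(config_option(SAMPLE_CONFIG, "CONFIG_SMP"))      # n
```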




* Re: Nick's scheduler v18
  2003-11-13 22:27     ` Mike Fedyk
@ 2003-11-14 10:34       ` Sven Luther
  0 siblings, 0 replies; 10+ messages in thread
From: Sven Luther @ 2003-11-14 10:34 UTC (permalink / raw)
  To: Andrew Morton, Mary Edie Meredith, piggin, linux-kernel, jenny

On Thu, Nov 13, 2003 at 02:27:51PM -0800, Mike Fedyk wrote:
> On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> > What filesystem was being used?
> > 
> > If it was ext2 then perhaps you hit the recently-fixed block allocator
> > race.  That fix was merged after test9.  Please check the kernel logs for
> > any filesystem error messages.
> > 
> > Also, please retry the run, see if it is repeatable.
> 
> Did that hit ext3 also?  ISTR getting some "access beyond end of device"
> while running ext3.

BTW, I did encounter a problem with Amiga partitions that had bad values
due to a (now fixed) bug in libparted. The head size was counted twice or
something like that, which resulted in accesses beyond the end of the
device. It had a funny effect, though: the box would freeze, and the IDE
LED would flash at 1-second intervals. I'm not sure that is the expected
behavior. This is with a 2.4.22 kernel, on both x86 and ppc.

Friendly,

Sven Luther
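Sven's symptom follows from simple cylinder/head/sector arithmetic. A cylinder-aligned partition's sector range is its cylinder bounds multiplied by heads and sectors per track. Doubling the head count therefore doubles the computed offsets, which can push them past the end of the disk. The sketch below uses hypothetical geometry. It does not reproduce the real Amiga RDB fields or Linux's partition parser.

```python
def partition_extent(low_cyl, high_cyl, heads, sectors_per_track):
    """Return (first_sector, end_sector) of a cylinder-aligned partition; end is exclusive."""
    per_cylinder = heads * sectors_per_track
    start = low_cyl * per_cylinder
    end = (high_cyl + 1) * per_cylinder
    return start, end


def fits_on_device(device_sectors, low_cyl, high_cyl, heads, sectors_per_track):
    """True if the whole partition lies within the device."""
    _, end = partition_extent(low_cyl, high_cyl, heads, sectors_per_track)
    return end <= device_sectors


# Hypothetical disk: 3968 cylinders, 16 heads, 63 sectors per track.
DEVICE_SECTORS = 3968 * 16 * 63

print(partition_extent(2000, 3967, 16, 63))                # (2016000, 3999744)
print(fits_on_device(DEVICE_SECTORS, 2000, 3967, 16, 63))  # True
print(fits_on_device(DEVICE_SECTORS, 2000, 3967, 32, 63))  # False: head count doubled
```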


* Re: Nick's scheduler v18
  2003-11-13 19:39   ` Andrew Morton
  2003-11-13 22:27     ` Mike Fedyk
@ 2003-11-14  5:45     ` Nick Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2003-11-14  5:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mary Edie Meredith, linux-kernel, jenny



Andrew Morton wrote:

>Mary Edie Meredith <maryedie@osdl.org> wrote:
>
>>Nick,
>>
>>We ran your patch on STP against one of our database workloads (DBT3 on
>>PostgreSQL, which uses the file system rather than raw devices).
>>
>>The test was able to compile, start up the database, load the database
>>from the source file, and run the power test (single-stream
>>update/query/delete), all successfully.
>>
>>It failed, however, at the next stage, which starts 8 query streams and
>>one update/delete stream. It ran for approximately 40 minutes (it
>>usually takes over an hour to complete). The updates appear to have
>>completed, and only queries were active at the time of failure. See the
>>error message below from the database log.
>>
>>...
>>
>>PANIC:  fdatasync of log file 1, segment 81 failed: Input/output error
>>
>>
>
>It's hard to see how a CPU scheduler change could cause fdatasync() to
>return EIO.
>
>What filesystem was being used?
>
>If it was ext2 then perhaps you hit the recently-fixed block allocator
>race.  That fix was merged after test9.  Please check the kernel logs for
>any filesystem error messages.
>

The kernel tested was test9-bk14 + my patch.

I don't think it would be due to a problem with my patch. Perhaps different
scheduling patterns made some race more likely, though.
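Nick's point, that a change in scheduling patterns can make an existing race more likely, can be illustrated with a toy model. The sketch below is hypothetical and is not the ext2 block allocator. Two threads each allocate a block in two non-atomic steps. The code enumerates every interleaving a scheduler could produce and counts how many give both threads the same block. A scheduler that switches tasks more often between those steps lands in the bad interleavings more often.

```python
from itertools import combinations

# Toy model (hypothetical, not the real ext2 code): each thread allocates a
# block in two separate steps, "find the first free block" and "mark it used".
STEPS_PER_THREAD = 2


def schedules():
    """Yield every interleaving of threads A and B that keeps each thread's own step order."""
    total = 2 * STEPS_PER_THREAD
    for a_slots in combinations(range(total), STEPS_PER_THREAD):
        yield ["A" if i in a_slots else "B" for i in range(total)]


def double_allocates(schedule):
    """Run one interleaving; return True if both threads end up with the same block."""
    bitmap = [False] * 4          # four blocks, all free initially
    picked = {}                   # block chosen by each thread in its "find" step
    step = {"A": 0, "B": 0}       # next step for each thread
    for thread in schedule:
        if step[thread] == 0:
            picked[thread] = bitmap.index(False)  # find: first free block
        else:
            bitmap[picked[thread]] = True         # mark: claim the chosen block
        step[thread] += 1
    return picked["A"] == picked["B"]


outcomes = [double_allocates(s) for s in schedules()]
print(f"{sum(outcomes)} of {len(outcomes)} interleavings hand out the same block twice")
# prints: 4 of 6 interleavings hand out the same block twice
```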

>
>Also, please retry the run, see if it is repeatable.
>

I will let someone else take over from here ;) I'll run the test
again with the latest bk when I submit another round of STP tests
sometime.




* Re: Nick's scheduler v18
  2003-11-13 21:30 ` Martin J. Bligh
@ 2003-11-14  2:12   ` Nick Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2003-11-14  2:12 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel



Martin J. Bligh wrote:

>>Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>>         bk14  bk14-v18
>>real    83.5s     81.7s
>>user   987.6s    992.5s
>>sys    158.0s    142.3s
>>
>>Volanomark looks much better than mainline.
>>
>>More testing welcome.
>>
>
>-noint is just backing out the interactivity patch (part of your patch)
>Not sure that's helping you much really, but maybe it conflicts with
>your other stuff.
>
>Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
>                              Elapsed      System        User         CPU
>              2.6.0-test9       45.28      100.19      568.01     1474.75
>        2.6.0-test9-noint       48.20       99.05      567.26     1389.00
>       2.6.0-test9-nick18       45.06       91.56      568.77     1467.50
>
>Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
>                              Elapsed      System        User         CPU
>              2.6.0-test9       46.17      122.20      571.58     1501.00
>        2.6.0-test9-noint       46.43      117.96      577.60     1498.00
>       2.6.0-test9-nick18       46.90      109.05      589.77     1488.75
>
>Kernbench: (make -j vmlinux, maximal tasks)
>                              Elapsed      System        User         CPU
>              2.6.0-test9       45.84      120.14      570.93     1507.00
>        2.6.0-test9-noint       47.42      123.52      582.91     1488.75
>       2.6.0-test9-nick18       46.83      110.70      588.91     1494.00
>
>It seems that you're decreasing system time significantly, but increasing
>user time if you have lots of tasks ... context switch thrash, maybe?
>

OK, thanks for testing. Still not great.

My patchset does a _lot_ less SMP and NUMA balancing, although I think
that sometimes causes too much idle time. It might be doing more context
switching though.

>
>It would be interesting to know which of the many patches in there make
>the performance difference ... the whole thing is a bit too big to pick
>up and maintain easily ;-)
>

It's not well broken out, unfortunately. I really need to document and
comment it better.



* Re: Nick's scheduler v18
  2003-11-13 19:39   ` Andrew Morton
@ 2003-11-13 22:27     ` Mike Fedyk
  2003-11-14 10:34       ` Sven Luther
  2003-11-14  5:45     ` Nick Piggin
  1 sibling, 1 reply; 10+ messages in thread
From: Mike Fedyk @ 2003-11-13 22:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mary Edie Meredith, piggin, linux-kernel, jenny

On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> What filesystem was being used?
> 
> If it was ext2 then perhaps you hit the recently-fixed block allocator
> race.  That fix was merged after test9.  Please check the kernel logs for
> any filesystem error messages.
> 
> Also, please retry the run, see if it is repeatable.

Did that hit ext3 also?  ISTR getting some "access beyond end of device"
while running ext3.

Interestingly enough, I didn't get this while using reiserfs3...

And I'm still running 2.6.0-test6-mm4 :-/


* Re: Nick's scheduler v18
  2003-11-10 17:20 Nick Piggin
  2003-11-13 18:07 ` Mary Edie Meredith
@ 2003-11-13 21:30 ` Martin J. Bligh
  2003-11-14  2:12   ` Nick Piggin
  1 sibling, 1 reply; 10+ messages in thread
From: Martin J. Bligh @ 2003-11-13 21:30 UTC (permalink / raw)
  To: Nick Piggin, linux-kernel

> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>          bk14  bk14-v18
> real    83.5s     81.7s
> user   987.6s    992.5s
> sys    158.0s    142.3s
> 
> Volanomark looks much better than mainline.
> 
> More testing welcome.

-noint is just backing out the interactivity patch (part of your patch)
Not sure that's helping you much really, but maybe it conflicts with
your other stuff.

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                              Elapsed      System        User         CPU
              2.6.0-test9       45.28      100.19      568.01     1474.75
        2.6.0-test9-noint       48.20       99.05      567.26     1389.00
       2.6.0-test9-nick18       45.06       91.56      568.77     1467.50

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                              Elapsed      System        User         CPU
              2.6.0-test9       46.17      122.20      571.58     1501.00
        2.6.0-test9-noint       46.43      117.96      577.60     1498.00
       2.6.0-test9-nick18       46.90      109.05      589.77     1488.75

Kernbench: (make -j vmlinux, maximal tasks)
                              Elapsed      System        User         CPU
              2.6.0-test9       45.84      120.14      570.93     1507.00
        2.6.0-test9-noint       47.42      123.52      582.91     1488.75
       2.6.0-test9-nick18       46.83      110.70      588.91     1494.00

It seems that you're decreasing system time significantly, but increasing
user time if you have lots of tasks ... context switch thrash, maybe?
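Martin's reading of the maximal-tasks table can be checked with a little arithmetic. The sketch below computes percentage changes against 2.6.0-test9 and cross-checks the CPU column. It assumes CPU is computed the way time(1) computes %CPU, i.e. (user + system) / elapsed * 100. Because the table holds averages, that cross-check only matches approximately.

```python
# Kernbench "make -j vmlinux, maximal tasks" rows from the table above.
rows = {
    "2.6.0-test9":        dict(elapsed=45.84, system=120.14, user=570.93, cpu=1507.00),
    "2.6.0-test9-nick18": dict(elapsed=46.83, system=110.70, user=588.91, cpu=1494.00),
}


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def cpu_estimate(row):
    """Assumed %CPU definition, as in time(1): (user + system) / elapsed * 100."""
    return 100.0 * (row["user"] + row["system"]) / row["elapsed"]


base, nick = rows["2.6.0-test9"], rows["2.6.0-test9-nick18"]
for field in ("elapsed", "system", "user"):
    print(f"{field:8s} {pct_change(base[field], nick[field]):+.1f}%")

for name, row in rows.items():
    print(f"{name:20s} reported {row['cpu']:.0f}%  estimated {cpu_estimate(row):.0f}%")
```

The result is roughly an 8% cut in system time bought with a 3% increase in user time. That trade is what prompts the context-switch question.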

It would be interesting to know which of the many patches in there make
the performance difference ... the whole thing is a bit too big to pick
up and maintain easily ;-)

M.



* Re: Nick's scheduler v18
  2003-11-13 18:07 ` Mary Edie Meredith
@ 2003-11-13 19:39   ` Andrew Morton
  2003-11-13 22:27     ` Mike Fedyk
  2003-11-14  5:45     ` Nick Piggin
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-11-13 19:39 UTC (permalink / raw)
  To: Mary Edie Meredith; +Cc: piggin, linux-kernel, jenny

Mary Edie Meredith <maryedie@osdl.org> wrote:
>
> Nick,
> 
> We ran your patch on STP against one of our database workloads (DBT3 on
> PostgreSQL, which uses the file system rather than raw devices).
> 
> The test was able to compile, start up the database, load the database
> from the source file, and run the power test (single-stream
> update/query/delete), all successfully.
> 
> It failed, however, at the next stage, which starts 8 query streams and
> one update/delete stream. It ran for approximately 40 minutes (it
> usually takes over an hour to complete). The updates appear to have
> completed, and only queries were active at the time of failure. See the
> error message below from the database log.
>
> ...
>
> PANIC:  fdatasync of log file 1, segment 81 failed: Input/output error
>

It's hard to see how a CPU scheduler change could cause fdatasync() to
return EIO.
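Some context for Andrew's remark: EIO from fdatasync() means the kernel could not write the file's data back to stable storage. A database generally cannot continue safely after that, hence the PANIC. Below is a minimal sketch of that pattern. It uses Python's os.fdatasync, falling back to os.fsync where fdatasync is unavailable. The helper name is hypothetical, and this is not PostgreSQL's code.

```python
import os
import tempfile


def durable_append(fd: int, data: bytes) -> None:
    """Write data and force it to stable storage; treat any sync failure as fatal (hypothetical helper)."""
    os.write(fd, data)
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is not available on every platform
    try:
        sync(fd)
    except OSError as exc:
        # An error here (EIO included) means the data may not be on disk;
        # stopping, as the database did, is the conservative response.
        raise SystemExit(f"PANIC: fdatasync failed: {os.strerror(exc.errno)}")


# Usage: append a record to a temporary file and make it durable.
with tempfile.TemporaryFile() as f:
    durable_append(f.fileno(), b"wal record\n")
```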

What filesystem was being used?

If it was ext2 then perhaps you hit the recently-fixed block allocator
race.  That fix was merged after test9.  Please check the kernel logs for
any filesystem error messages.
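Here is a rough sketch of the log check Andrew asks for. It filters kernel log text, such as saved dmesg output, for lines that look like filesystem or block-layer errors. The patterns are illustrative assumptions, not an exhaustive list. The sample log lines are made up.

```python
import re
from typing import List

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPICIOUS = re.compile(
    r"EXT[23]-fs error|attempt to access beyond end of device|I/O error"
)


def filesystem_errors(log_text: str) -> List[str]:
    """Return the log lines that match any of the suspicious patterns."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


# Made-up sample lines, not taken from the OSDL run.
SAMPLE_LOG = """\
kjournald starting.  Commit interval 5 seconds
attempt to access beyond end of device
EXT2-fs error (device sdb1): example message
eth0: link up
"""

for line in filesystem_errors(SAMPLE_LOG):
    print(line)
```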

Also, please retry the run, see if it is repeatable.

Thanks.


* Re: Nick's scheduler v18
  2003-11-10 17:20 Nick Piggin
@ 2003-11-13 18:07 ` Mary Edie Meredith
  2003-11-13 19:39   ` Andrew Morton
  2003-11-13 21:30 ` Martin J. Bligh
  1 sibling, 1 reply; 10+ messages in thread
From: Mary Edie Meredith @ 2003-11-13 18:07 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, jenny

Nick,

We ran your patch on STP against one of our database workloads (DBT3 on
PostgreSQL, which uses the file system rather than raw devices).

The test was able to compile, start up the database, load the database
from the source file, and run the power test (single-stream
update/query/delete), all successfully.

It failed, however, at the next stage, which starts 8 query streams and
one update/delete stream. It ran for approximately 40 minutes (it
usually takes over an hour to complete). The updates appear to have
completed, and only queries were active at the time of failure. See the
error message below from the database log.

The queries did produce results and the query streams appear to have
failed at the same time.  


.config is at:

http://khack.osdl.org/stp/282959/environment/kernel-config

iostat,vmstat data at this location:
http://khack.osdl.org/stp/282959/results/

from the database log (normal until the line beginning with "PANIC")
...
LOG:  removing transaction log file 000000010000004D
LOG:  removing transaction log file 0000000100000050
PANIC:  fdatasync of log file 1, segment 81 failed: Input/output error
LOG:  statement:  update time_statistics set e_time=current_timestamp where task_name='PERF1.THRUPUT.QS6.Q11';
LOG:  server process (pid 23182) was terminated by signal 6
LOG:  terminating any other active server processes
WARNING:  Message from PostgreSQL backend:
	The Postmaster has informed me that some other backend
	died abnormally and possibly corrupted shared memory.
	I have rolled back the current transaction and am
	going to terminate your database system connection and exit.
	Please reconnect to the database system and repeat your query.
... 
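The PANIC line gives the failing segment in decimal ("log file 1, segment 81"), while the removal lines use hexadecimal file names. Assume PostgreSQL 7.x names WAL files as two 8-digit hex fields, log id followed by segment number; later releases prepend a timeline ID. Under that assumption, segment 81 is file 0000000100000051, numerically just after the segments being removed. A small sketch of the conversion:

```python
def wal_file_name(log_id: int, seg: int) -> str:
    # Assumption: 7.x-style naming, "%08X%08X" of log id and segment number.
    return f"{log_id:08X}{seg:08X}"


def parse_wal_file_name(name: str):
    """Split a 16-hex-digit WAL file name into (log_id, segment) as integers."""
    return int(name[:8], 16), int(name[8:], 16)


print(parse_wal_file_name("000000010000004D"))  # (1, 77)
print(parse_wal_file_name("0000000100000050"))  # (1, 80)
print(wal_file_name(1, 81))                     # 0000000100000051
```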

Jenny searched the PostgreSQL site for this error and so far can't find
any more details about it.  We are puzzled by the error in the log, since
at that time there should not have been any actual updates.  We will
forward this question to the PostgreSQL folks.


On Mon, 2003-11-10 at 09:20, Nick Piggin wrote:
> http://www.kerneltrap.org/~npiggin/v18/
> 
> Nothing exciting for desktop users. High end performance is now starting
> to get better.
> 
> It includes an (unimportant) accounting fix that shouldn't really be
> here, but the fix doesn't look like it will get in before 2.6.0.
> 
> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>          bk14  bk14-v18
> real    83.5s     81.7s
> user   987.6s    992.5s
> sys    158.0s    142.3s
> 
> Volanomark looks much better than mainline.
> 
> More testing welcome.
> 
-- 
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab



* Nick's scheduler v18
@ 2003-11-10 17:20 Nick Piggin
  2003-11-13 18:07 ` Mary Edie Meredith
  2003-11-13 21:30 ` Martin J. Bligh
  0 siblings, 2 replies; 10+ messages in thread
From: Nick Piggin @ 2003-11-10 17:20 UTC (permalink / raw)
  To: linux-kernel

http://www.kerneltrap.org/~npiggin/v18/

Nothing exciting for desktop users. High end performance is now starting
to get better.

It includes an (unimportant) accounting fix that shouldn't really be
here, but the fix doesn't look like it will get in before 2.6.0.

Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
         bk14  bk14-v18
real    83.5s     81.7s
user   987.6s    992.5s
sys    158.0s    142.3s
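The averages in the table are easier to compare as relative changes. Here is a short sketch that computes them from the numbers above:

```python
# Averages of 5 kernel compiles from the table above (seconds): bk14 vs bk14-v18.
results = {
    "real": (83.5, 81.7),
    "user": (987.6, 992.5),
    "sys": (158.0, 142.3),
}


def pct_change(before, after):
    """Percentage change from before to after."""
    return 100.0 * (after - before) / before


for name, (before, after) in results.items():
    print(f"{name:4s} {before:7.1f}s -> {after:7.1f}s  ({pct_change(before, after):+.1f}%)")
```

The biggest change is the roughly 10% drop in system time. Wall-clock time improves by about 2%, and user time is essentially flat.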

Volanomark looks much better than mainline.

More testing welcome.


