* Nick's scheduler v18
From: Nick Piggin @ 2003-11-10 17:20 UTC
To: linux-kernel

http://www.kerneltrap.org/~npiggin/v18/

Nothing exciting for desktop users. High end performance is now starting
to get better.

Has an (unimportant) accounting fix that shouldn't really be here, but
doesn't look like it will get in before 2.6.0.

Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
        bk14      bk14-v18
real    83.5s     81.7s
user    987.6s    992.5s
sys     158.0s    142.3s

Volanomark looks much better than mainline.

More testing welcome.

* Re: Nick's scheduler v18
From: Mary Edie Meredith @ 2003-11-13 18:07 UTC
To: Nick Piggin; +Cc: linux-kernel, jenny

Nick,

We ran your patch on STP against one of our database workloads (DBT3 on
postgreSQL which uses file system rather than raw).

The test was able to compile, successfully start up the database,
successfully load the database from source file, successfully run the
power test (single stream update/query/delete).

It failed, however at the next stage, where it starts 8 streams of query
and one stream of updates/deletes where it ran for approximately 40
minutes (usually takes over an hour to complete). The updates appear to
have completed and only queries were active at the time of failure. See
the error message below from the database log. The queries did produce
results and the query streams appear to have failed at the same time.

.config is at:
http://khack.osdl.org/stp/282959/environment/kernel-config

iostat,vmstat data at this location:
http://khack.osdl.org/stp/282959/results/

from the database log (normal until the line beginning with "PANIC")
...
LOG: removing transaction log file 000000010000004D
LOG: removing transaction log file 0000000100000050
PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
LOG: statement: update time_statistics set e_time=current_timestamp where task_name='PERF1.THRUPUT.QS6.Q11';
LOG: server process (pid 23182) was terminated by signal 6
LOG: terminating any other active server processes
WARNING: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
...

Jenny searched the postgreSQL site for this error and so far can't find
any more details about it. We are puzzled by the error on the log when
at the time there should not have been any actual updates. We will
forward this question to the PostgreSQL folks.

On Mon, 2003-11-10 at 09:20, Nick Piggin wrote:
> http://www.kerneltrap.org/~npiggin/v18/
>
> Nothing exciting for desktop users. High end performance is now starting
> to get better.
>
> Has an (unimportant) accounting fix that shouldn't really be here, but
> doesn't look like it will get in before 2.6.0.
>
> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>         bk14      bk14-v18
> real    83.5s     81.7s
> user    987.6s    992.5s
> sys     158.0s    142.3s
>
> Volanomark looks much better than mainline.
>
> More testing welcome.

--
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab

* Re: Nick's scheduler v18
From: Andrew Morton @ 2003-11-13 19:39 UTC
To: Mary Edie Meredith; +Cc: piggin, linux-kernel, jenny

Mary Edie Meredith <maryedie@osdl.org> wrote:
>
> Nick,
>
> We ran your patch on STP against one of our database workloads (DBT3 on
> postgreSQL which uses file system rather than raw).
>
> The test was able to compile, successfully start up the database,
> successfully load the database from source file, successfully run the
> power test (single stream update/query/delete).
>
> It failed, however at the next stage, where it starts 8 streams of query
> and one stream of updates/deletes where it ran for approximately 40
> minutes (usually takes over an hour to complete). The updates appear to
> have completed and only queries were active at the time of failure. See
> the error message below from the database log.
>
> ...
>
> PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
>

It's hard to see how a CPU scheduler change could cause fdatasync() to
return EIO.

What filesystem was being used?

If it was ext2 then perhaps you hit the recently-fixed block allocator
race. That fix was merged after test9. Please check the kernel logs for
any filesystem error messages.

Also, please retry the run, see if it is repeatable.

Thanks.

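For readers following the fdatasync() discussion, here is a minimal sketch of how a userspace program sees the kind of failure reported in the database log. It is not PostgreSQL's code; the helper names `append_and_sync` and `describe` are hypothetical, and the fallback to `fsync` on platforms without `fdatasync` is an assumption made only so the sketch runs widely.

```python
import errno
import os
import tempfile


def append_and_sync(path, payload):
    """Append payload to path and ask the kernel to flush it to disk.

    Returns None on success, or the errno value if the write or the
    sync call fails. (Hypothetical helper, for illustration only.)
    """
    # os.fdatasync is missing on some platforms; fall back to os.fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        sync(fd)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
    return None


def describe(err):
    """Turn the result of append_and_sync into a short message."""
    if err is None:
        return "synced"
    if err == errno.EIO:
        # EIO comes from the filesystem or block layer, so the kernel
        # log is the place to look for the underlying cause.
        return "I/O error reported by the kernel, check the kernel log"
    return "failed: " + os.strerror(err)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        segment = os.path.join(workdir, "segment")
        print(describe(append_and_sync(segment, b"log record\n")))
```

The point of the sketch is that EIO is handed up from below the application, which is why the reply asks about the filesystem and the kernel logs rather than about the database.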
* Re: Nick's scheduler v18
From: Mike Fedyk @ 2003-11-13 22:27 UTC
To: Andrew Morton; +Cc: Mary Edie Meredith, piggin, linux-kernel, jenny

On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> What filesystem was being used?
>
> If it was ext2 then perhaps you hit the recently-fixed block allocator
> race. That fix was merged after test9. Please check the kernel logs for
> any filesystem error messages.
>
> Also, please retry the run, see if it is repeatable.

Did that hit ext3 also? ISTR, getting some "access beyond end of device"
while running ext3.

Interestingly enough, I didn't get this while using reiserfs3...

And me still running 2.6.0-test6-mm4 :-/

* Re: Nick's scheduler v18
From: Sven Luther @ 2003-11-14 10:34 UTC
To: Andrew Morton, Mary Edie Meredith, piggin, linux-kernel, jenny

On Thu, Nov 13, 2003 at 02:27:51PM -0800, Mike Fedyk wrote:
> On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> > What filesystem was being used?
> >
> > If it was ext2 then perhaps you hit the recently-fixed block allocator
> > race. That fix was merged after test9. Please check the kernel logs for
> > any filesystem error messages.
> >
> > Also, please retry the run, see if it is repeatable.
>
> Did that hit ext3 also? ISTR, getting some "access beyond end of device"
> while running ext3.

BTW, I did encounter some problem with amiga partitions which had some
bad values due to a bug in libparted, now fixed. The head size was
counted double or something such, which resulted in accesses beyond the
end of the device.

It has a funny effect though. The box would freeze, and the IDE LED
would flash in 1 second intervals. Not sure it is the expected behavior.
This is with a 2.4.22 kernel, both on x86 and ppc.

Friendly,

Sven Luther

* Re: Nick's scheduler v18
From: Nick Piggin @ 2003-11-14 5:45 UTC
To: Andrew Morton; +Cc: Mary Edie Meredith, linux-kernel, jenny

Andrew Morton wrote:

>Mary Edie Meredith <maryedie@osdl.org> wrote:
>
>>Nick,
>>
>>We ran your patch on STP against one of our database workloads (DBT3 on
>>postgreSQL which uses file system rather than raw).
>>
>>The test was able to compile, successfully start up the database,
>>successfully load the database from source file, successfully run the
>>power test (single stream update/query/delete).
>>
>>It failed, however at the next stage, where it starts 8 streams of query
>>and one stream of updates/deletes where it ran for approximately 40
>>minutes (usually takes over an hour to complete). The updates appear to
>>have completed and only queries were active at the time of failure. See
>>the error message below from the database log.
>>
>>...
>>
>>PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
>>
>>
>
>It's hard to see how a CPU scheduler change could cause fdatasync() to
>return EIO.
>
>What filesystem was being used?
>
>If it was ext2 then perhaps you hit the recently-fixed block allocator
>race. That fix was merged after test9. Please check the kernel logs for
>any filesystem error messages.
>

The kernel tested was test9-bk14 + my patch. I don't think it would be
due to a problem in my patch. Perhaps different scheduling patterns made
some race more likely though.

>
>Also, please retry the run, see if it is repeatable.
>

I will let someone else take over from here ;) I'll run the test again with
the latest bk when I submit another round of STP tests sometime.

* Re: Nick's scheduler v18
From: Martin J. Bligh @ 2003-11-13 21:30 UTC
To: Nick Piggin, linux-kernel

> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>         bk14      bk14-v18
> real    83.5s     81.7s
> user    987.6s    992.5s
> sys     158.0s    142.3s
>
> Volanomark looks much better than mainline.
>
> More testing welcome.

-noint is just backing out the interactivity patch (part of your patch)
Not sure that's helping you much really, but maybe it conflicts with
your other stuff.

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                        Elapsed    System      User       CPU
    2.6.0-test9           45.28    100.19    568.01   1474.75
    2.6.0-test9-noint     48.20     99.05    567.26   1389.00
    2.6.0-test9-nick18    45.06     91.56    568.77   1467.50

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                        Elapsed    System      User       CPU
    2.6.0-test9           46.17    122.20    571.58   1501.00
    2.6.0-test9-noint     46.43    117.96    577.60   1498.00
    2.6.0-test9-nick18    46.90    109.05    589.77   1488.75

Kernbench: (make -j vmlinux, maximal tasks)
                        Elapsed    System      User       CPU
    2.6.0-test9           45.84    120.14    570.93   1507.00
    2.6.0-test9-noint     47.42    123.52    582.91   1488.75
    2.6.0-test9-nick18    46.83    110.70    588.91   1494.00

It seems that you're decreasing system time significantly, but increasing
user time if you have lots of tasks ... context switch thrash, maybe?

Would be interesting if you know which of the many patches in there make
the performance difference ... the whole thing is a bit too big to pick
up and maintain easily ;-)

M.

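The observation about system versus user time can be checked with a little arithmetic on the Kernbench figures above. The sketch below is only an illustration: the numbers are copied from the tables in this message (System and User columns for 2.6.0-test9 and 2.6.0-test9-nick18), while the names `KERNBENCH`, `pct_change` and `summarize` are made up for the example.

```python
# (system seconds, user seconds) per kernel, copied from the tables above.
KERNBENCH = {
    "2 x num_cpus": {"test9": (100.19, 568.01), "nick18": (91.56, 568.77)},
    "16 x num_cpus": {"test9": (122.20, 571.58), "nick18": (109.05, 589.77)},
    "maximal tasks": {"test9": (120.14, 570.93), "nick18": (110.70, 588.91)},
}


def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def summarize(results):
    """Percent change of nick18 against test9 for each load level."""
    summary = {}
    for load, runs in results.items():
        base_sys, base_user = runs["test9"]
        new_sys, new_user = runs["nick18"]
        summary[load] = {
            "system": pct_change(base_sys, new_sys),
            "user": pct_change(base_user, new_user),
        }
    return summary


if __name__ == "__main__":
    for load, delta in summarize(KERNBENCH).items():
        print("%-14s system %+6.2f%%  user %+6.2f%%"
              % (load, delta["system"], delta["user"]))
```

Run on these figures, system time drops at every load level, while user time is nearly flat at 2 x num_cpus and rises by roughly three percent at the two heavier loads, which matches the pattern described in the message.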
* Re: Nick's scheduler v18
From: Nick Piggin @ 2003-11-14 2:12 UTC
To: Martin J. Bligh; +Cc: linux-kernel

Martin J. Bligh wrote:

>>Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>>        bk14      bk14-v18
>>real    83.5s     81.7s
>>user    987.6s    992.5s
>>sys     158.0s    142.3s
>>
>>Volanomark looks much better than mainline.
>>
>>More testing welcome.
>>
>
>-noint is just backing out the interactivity patch (part of your patch)
>Not sure that's helping you much really, but maybe it conflicts with
>your other stuff.
>
>Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
>                        Elapsed    System      User       CPU
>    2.6.0-test9           45.28    100.19    568.01   1474.75
>    2.6.0-test9-noint     48.20     99.05    567.26   1389.00
>    2.6.0-test9-nick18    45.06     91.56    568.77   1467.50
>
>Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
>                        Elapsed    System      User       CPU
>    2.6.0-test9           46.17    122.20    571.58   1501.00
>    2.6.0-test9-noint     46.43    117.96    577.60   1498.00
>    2.6.0-test9-nick18    46.90    109.05    589.77   1488.75
>
>Kernbench: (make -j vmlinux, maximal tasks)
>                        Elapsed    System      User       CPU
>    2.6.0-test9           45.84    120.14    570.93   1507.00
>    2.6.0-test9-noint     47.42    123.52    582.91   1488.75
>    2.6.0-test9-nick18    46.83    110.70    588.91   1494.00
>
>It seems that you're decreasing system time significantly, but increasing
>user time if you have lots of tasks ... context switch thrash, maybe?
>

OK, thanks for testing. Still not great. My patchset does a _lot_ less
SMP and NUMA balancing, although I think that sometimes causes too much
idle time. It might be doing more context switching though.

>
>Would be interesting if you know which of the many patches in there make
>the performance difference ... the whole thing is a bit too big to pick
>up and maintain easily ;-)
>

It's not well broken out though unfortunately. I really need to document
and comment it better.

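One way to test the context-switching guess is to compare the kernel's context switch counter before and after a benchmark run. The sketch below assumes a Linux system whose /proc/stat contains a `ctxt` line; the function names are hypothetical, and this is not how the numbers in this thread were gathered. The counter is system-wide, so unrelated activity on the machine is included in the difference.

```python
def parse_ctxt(proc_stat_text):
    """Extract the cumulative context switch count from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def read_ctxt(path="/proc/stat"):
    """Read the current context switch count from the running kernel."""
    with open(path) as stat_file:
        return parse_ctxt(stat_file.read())


def switches_during(workload):
    """Run workload() and return how many context switches happened
    system-wide while it ran (other processes are counted too)."""
    before = read_ctxt()
    workload()
    return read_ctxt() - before


if __name__ == "__main__":
    import time
    print(switches_during(lambda: time.sleep(1)))
```

Comparing the count for the same kernel compile on two kernels would show whether one of them switches noticeably more often.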
[parent not found: <1068589319.1557.1.camel@localhost.localdomain>]
* Re: Nick's scheduler v18
From: Tom Sightler @ 2003-11-11 22:30 UTC
To: LKML; +Cc: piggin

On Tue, 2003-11-11 at 17:22, Tom Sightler wrote:
> http://www.kerneltrap.org/~npiggin/v18/
>
> Nothing exciting for desktop users. High end performance is now starting
> to get better.

Hey Nick,

Was this tested against single processor? On my Dell Latitude C810 I
can boot test9 and test9-mm2 without problems, but using the identical
config with this patch my system will not even boot up all the way. It
stops at various stages during the init scripts. It seems to
consistently get further if I add elevator=deadline but it never boots
all the way up in either case.

No messages or other good info, just hangs and won't go any further.
Any thoughts?

Later,
Tom

* Re: Nick's scheduler v18
From: Nick Piggin @ 2003-11-12 0:38 UTC
To: Tom Sightler; +Cc: LKML

Tom Sightler wrote:

>On Tue, 2003-11-11 at 17:22, Tom Sightler wrote:
>
>>http://www.kerneltrap.org/~npiggin/v18/
>>
>>Nothing exciting for desktop users. High end performance is now starting
>>to get better.
>>
>
>Hey Nick,
>
>Was this tested against single processor? On my Dell Latitude C810 I
>can boot test9 and test9-mm2 without problems, but using the identical
>config with this patch my system will not even boot up all the way. It
>stops at various stages during the init scripts. It seems to
>consistently get further if I add elevator=deadline but it never boots
>all the way up in either case.
>
>No messages or other good info, just hangs and won't go any further.
>Any thoughts?
>

Yeah, tested on UP. Sigh. Can I have a look at your .config? Do you
have preempt on?

Thanks

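The question about preempt can be answered by looking for CONFIG_PREEMPT in the kernel .config. The sketch below is a small illustration of reading such a file, assuming the usual `CONFIG_FOO=y` and `# CONFIG_FOO is not set` line forms; the helper name `config_value` and the sample text are made up, and the sample is not Tom's actual configuration.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', or a
    literal), or None if it is unset or absent from the text."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == "# %s is not set" % option:
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Made-up sample for illustration, not taken from this thread.
SAMPLE_CONFIG = """\
# CONFIG_SMP is not set
CONFIG_PREEMPT=y
CONFIG_X86_GOOD_APIC=y
"""

if __name__ == "__main__":
    for name in ("CONFIG_SMP", "CONFIG_PREEMPT"):
        print(name, "->", config_value(SAMPLE_CONFIG, name))
```

A uniprocessor build with preemption enabled, as in the sample, is the combination the reply is asking about.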