linux-kernel.vger.kernel.org archive mirror
* Panic on 2.6.0-test1-mm1
@ 2003-07-29 14:37 Martin J. Bligh
  2003-07-29 21:18 ` Martin J. Bligh
  2003-07-31 22:37 ` Panic on 2.6.0-test1-mm1 William Lee Irwin III
  0 siblings, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-29 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

The big box had this on the console ... looks like it was doing a
compile at the time ... sorry, only just noticed it after returning
from OLS, so don't have more context (2.6.0-test1-mm1).

kernel BUG at include/linux/list.h:149!
invalid operand: 0000 [#1]
SMP 
CPU:    3
EIP:    0060:[<c0117f98>]    Not tainted VLI
EFLAGS: 00010083
EIP is at pgd_dtor+0x64/0x8c
eax: c1685078   ebx: c1685060   ecx: c0288348   edx: c1685078
esi: 00000082   edi: c030b6a0   ebp: e9b9c000   esp: e9ea3ed0
ds: 007b   es: 007b   ss: 0068
Process cc1 (pid: 4439, threadinfo=e9ea2000 task=eac36690)
Stack: 00000000 f01fdecc c0139588 e9b9c1e0 f01fdecc 00000000 00000039 f01fdecc 
       0000000a f01fdf54 e9871000 c013a540 f01fdecc e9b9c000 f01e9000 00000024 
       f01e9010 f01fdecc c013ace8 f01fdecc f01e9010 00000024 f01fdecc f01fdfb8 
Call Trace:
 [<c0139588>] slab_destroy+0x40/0x124
 [<c013a540>] free_block+0xfc/0x13c
 [<c013ace8>] drain_array_locked+0x80/0xac
 [<c013adef>] reap_timer_fnc+0xdb/0x1e0
 [<c013ad14>] reap_timer_fnc+0x0/0x1e0
 [<c0125aa5>] run_timer_softirq+0x13d/0x170
 [<c0121f7c>] do_softirq+0x6c/0xcc
 [<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
 [<c023a752>] apic_timer_interrupt+0x1a/0x20

Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00 99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0 89 48 04 89 01 c7 43 18 00 01 10 00 c7 
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
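
The BUG at include/linux/list.h:149 corresponds to the list-debugging
checks carried in the -mm tree at the time; roughly, list_del() verifies
the entry's neighbours before unlinking it, along these lines (a sketch of
that debug variant, not the exact source):

	static inline void list_del(struct list_head *entry)
	{
		/* one of these consistency checks is what fires here:
		 * the entry being removed is not on a well-formed list */
		BUG_ON(entry->prev->next != entry);
		BUG_ON(entry->next->prev != entry);
		entry->next->prev = entry->prev;
		entry->prev->next = entry->next;
	}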



* Re: Panic on 2.6.0-test1-mm1
  2003-07-29 14:37 Panic on 2.6.0-test1-mm1 Martin J. Bligh
@ 2003-07-29 21:18 ` Martin J. Bligh
  2003-07-30 15:01   ` 2.6.0-test2-mm1 results Martin J. Bligh
  2003-07-31 22:37 ` Panic on 2.6.0-test1-mm1 William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-29 21:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> The big box had this on the console ... looks like it was doing a
> compile at the time ... sorry, only just noticed it after returning
> from OLS, so don't have more context (2.6.0-test1-mm1).
> 
> kernel BUG at include/linux/list.h:149!
> invalid operand: 0000 [#1]
> SMP 
> CPU:    3
> EIP:    0060:[<c0117f98>]    Not tainted VLI
> EFLAGS: 00010083
> EIP is at pgd_dtor+0x64/0x8c
> eax: c1685078   ebx: c1685060   ecx: c0288348   edx: c1685078
> esi: 00000082   edi: c030b6a0   ebp: e9b9c000   esp: e9ea3ed0
> ds: 007b   es: 007b   ss: 0068
> Process cc1 (pid: 4439, threadinfo=e9ea2000 task=eac36690)
> Stack: 00000000 f01fdecc c0139588 e9b9c1e0 f01fdecc 00000000 00000039 f01fdecc 
>        0000000a f01fdf54 e9871000 c013a540 f01fdecc e9b9c000 f01e9000 00000024 
>        f01e9010 f01fdecc c013ace8 f01fdecc f01e9010 00000024 f01fdecc f01fdfb8 
> Call Trace:
>  [<c0139588>] slab_destroy+0x40/0x124
>  [<c013a540>] free_block+0xfc/0x13c
>  [<c013ace8>] drain_array_locked+0x80/0xac
>  [<c013adef>] reap_timer_fnc+0xdb/0x1e0
>  [<c013ad14>] reap_timer_fnc+0x0/0x1e0
>  [<c0125aa5>] run_timer_softirq+0x13d/0x170
>  [<c0121f7c>] do_softirq+0x6c/0xcc
>  [<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
>  [<c023a752>] apic_timer_interrupt+0x1a/0x20
> 
> Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00 99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0 89 48 04 89 01 c7 43 18 00 01 10 00 c7 
>  <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing

Seems to be trivially reproducible by doing "make -j vmlinux".
I'll try your latest one to see if it's fixed already, I guess.

M.

kernel BUG at include/linux/list.h:149!
invalid operand: 0000 [#1]
SMP 
CPU:    3
EIP:    0060:[<c0117f98>]    Not tainted VLI
EFLAGS: 00010083
EIP is at pgd_dtor+0x64/0x8c
eax: c1573450   ebx: c1573438   ecx: c0288348   edx: c1573450
esi: 00000082   edi: c030b6a0   ebp: e2e1b000   esp: e2813ed4
ds: 007b   es: 007b   ss: 0068
Process cc1 (pid: 11439, threadinfo=e2812000 task=e4869980)
Stack: 00000000 f01fdecc c0139588 e2e1b1e0 f01fdecc 00000000 00000039 f01fdecc 
       00000017 f01fdf54 e05f4000 c013a540 f01fdecc e2e1b000 f01c6410 00000018 
       f01c6400 f01fdecc c013ac31 f01fdecc f01c6410 00000018 f01fdecc f01fdfb8 
Call Trace:
 [<c0139588>] slab_destroy+0x40/0x124
 [<c013a540>] free_block+0xfc/0x13c
 [<c013ac31>] drain_array+0x55/0x8c
 [<c013ad14>] reap_timer_fnc+0x0/0x1e0
 [<c013ada7>] reap_timer_fnc+0x93/0x1e0
 [<c013ad14>] reap_timer_fnc+0x0/0x1e0
 [<c0125aa5>] run_timer_softirq+0x13d/0x170
 [<c0121f7c>] do_softirq+0x6c/0xcc
 [<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
 [<c023a752>] apic_timer_interrupt+0x1a/0x20

Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00 99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0 89 48 04 89 01 c7 43 18 00 01 10 00 c7 
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing



* 2.6.0-test2-mm1 results
  2003-07-29 21:18 ` Martin J. Bligh
@ 2003-07-30 15:01   ` Martin J. Bligh
  2003-07-30 15:28     ` Con Kolivas
  0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-30 15:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
Only noticeable thing is that -mm tree is consistently a little slower 
at kernbench

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                              Elapsed      System        User         CPU
                   2.5.74       45.17       97.88      568.43     1474.75
               2.5.74-mm1       45.84      109.66      568.05     1477.50
              2.6.0-test1       45.25       98.63      568.45     1473.50
          2.6.0-test2-mm1       45.38      101.47      569.16     1476.25
         2.6.0-test2-mjb1       43.31       75.98      564.33     1478.00

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                              Elapsed      System        User         CPU
                   2.5.74       45.74      114.56      571.62     1500.00
               2.5.74-mm1       46.59      133.65      570.90     1511.50
              2.6.0-test1       45.68      114.68      571.70     1503.00
          2.6.0-test2-mm1       46.66      119.82      579.32     1497.25
         2.6.0-test2-mjb1       44.03       87.85      569.97     1493.75

Kernbench: (make -j vmlinux, maximal tasks)
                              Elapsed      System        User         CPU
                   2.5.74       46.11      115.86      571.77     1491.50
               2.5.74-mm1       47.13      139.07      571.52     1509.25
              2.6.0-test1       46.09      115.76      571.74     1491.25
          2.6.0-test2-mm1       46.95      121.18      582.00     1497.50
         2.6.0-test2-mjb1       44.08       85.54      570.57     1487.25


DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         3.7%
               2.5.74-mm1        88.5%        10.9%
              2.6.0-test1       103.0%         2.0%
          2.6.0-test2-mm1        99.7%         3.1%
         2.6.0-test2-mjb1       107.2%         3.6%

SDET 2  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%        53.7%
               2.5.74-mm1       133.9%         1.4%
              2.6.0-test1       136.4%         1.9%
          2.6.0-test2-mm1       132.1%         4.2%
         2.6.0-test2-mjb1       156.6%         1.1%

SDET 4  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         3.9%
               2.5.74-mm1        92.5%         2.5%
              2.6.0-test1        96.7%         5.7%
          2.6.0-test2-mm1        70.6%        49.1%
         2.6.0-test2-mjb1       134.2%         2.1%

SDET 8  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%        45.9%
               2.5.74-mm1       123.5%         0.6%
              2.6.0-test1        86.1%        70.7%
          2.6.0-test2-mm1       127.8%         0.4%
         2.6.0-test2-mjb1       158.6%         0.7%

SDET 16  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         0.3%
               2.5.74-mm1        92.8%         0.8%
              2.6.0-test1        99.3%         0.6%
          2.6.0-test2-mm1        97.9%         0.5%
         2.6.0-test2-mjb1       120.8%         0.6%

SDET 32  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         0.1%
               2.5.74-mm1        94.4%         0.4%
              2.6.0-test1       100.4%         0.2%
          2.6.0-test2-mm1        97.9%         0.2%
         2.6.0-test2-mjb1       123.2%         0.5%

SDET 64  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         0.4%
               2.5.74-mm1        95.6%         0.3%
              2.6.0-test1       101.1%         0.3%
          2.6.0-test2-mm1       100.3%         0.5%
         2.6.0-test2-mjb1       127.1%         0.2%

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.74       100.0%         0.1%
               2.5.74-mm1        97.6%         0.2%
              2.6.0-test1       100.6%         0.6%
          2.6.0-test2-mm1       101.8%         0.0%
         2.6.0-test2-mjb1       127.9%         0.3%

diffprofile for kernbench (from test1 to test2-mm1, so not really
fair, but might help):

      4383     2.6% total
      1600     6.8% page_remove_rmap
       934    61.6% do_no_page
       470    13.9% __copy_from_user_ll
       469    12.8% find_get_page
       373     0.0% pgd_ctor
       368     4.7% __d_lookup
       349     6.6% __copy_to_user_ll
       278    15.1% atomic_dec_and_lock
       273   154.2% may_open
       240    15.6% kmem_cache_free
       182    30.2% __wake_up
       163    11.2% schedule
       152    10.4% free_hot_cold_page
       148    21.2% pte_alloc_one
       123     6.5% path_lookup
       100     9.8% clear_page_tables
        77    12.5% copy_process
        76     4.2% buffered_rmqueue
        70    19.0% .text.lock.file_table
        70     5.8% release_pages
        66   825.0% free_percpu
        55    21.0% vfs_read
        54   300.0% cache_grow
        50     9.5% kmap_atomic
....
       -57   -38.0% __generic_file_aio_read
       -62  -100.0% free_pages_bulk
      -255   -77.5% dentry_open
      -316    -2.2% do_anonymous_page
      -415   -77.3% do_page_cache_readahead
      -562   -96.1% pgd_alloc
      -683   -68.9% filemap_nopage
     -1005    -2.0% default_idle

Someone messing with the pgd alloc stuff, perhaps?


* Re: 2.6.0-test2-mm1 results
  2003-07-30 15:01   ` 2.6.0-test2-mm1 results Martin J. Bligh
@ 2003-07-30 15:28     ` Con Kolivas
  2003-07-30 16:27       ` Martin J. Bligh
  2003-07-31 14:56       ` Martin J. Bligh
  0 siblings, 2 replies; 23+ messages in thread
From: Con Kolivas @ 2003-07-30 15:28 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton; +Cc: linux-kernel

On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
> Only noticeable thing is that -mm tree is consistently a little slower
> at kernbench

Could conceivably be my hacks throwing the cc cpu hogs onto the expired array 
more frequently.

Con



* Re: 2.6.0-test2-mm1 results
  2003-07-30 15:28     ` Con Kolivas
@ 2003-07-30 16:27       ` Martin J. Bligh
  2003-07-31 14:56       ` Martin J. Bligh
  1 sibling, 0 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-30 16:27 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton; +Cc: linux-kernel

> On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
>> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
>> Only noticeable thing is that -mm tree is consistently a little slower
>> at kernbench
> 
> Could conceivably be my hacks throwing the cc cpu hogs onto the expired array 
> more frequently.

OK, do you have that against straight mainline? I can try it broken
out if so ...

M.



* Re: 2.6.0-test2-mm1 results
  2003-07-30 15:28     ` Con Kolivas
  2003-07-30 16:27       ` Martin J. Bligh
@ 2003-07-31 14:56       ` Martin J. Bligh
  2003-07-31 15:13         ` Con Kolivas
  1 sibling, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-31 14:56 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton; +Cc: linux-kernel

--Con Kolivas <kernel@kolivas.org> wrote (on Thursday, July 31, 2003 01:28:49 +1000):

> On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
>> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
>> Only noticeable thing is that -mm tree is consistently a little slower
>> at kernbench
> 
> Could conceivably be my hacks throwing the cc cpu hogs onto the expired array 
> more frequently.

Kernbench: (make -j vmlinux, maximal tasks)
                              Elapsed      System        User         CPU
              2.6.0-test2       46.05      115.20      571.75     1491.25
          2.6.0-test2-con       46.98      121.02      583.55     1498.75
          2.6.0-test2-mm1       46.95      121.18      582.00     1497.50

Good guess ;-)

Does this help interactivity a lot, or was it just an experiment?
Perhaps it could be less aggressive or something?

M.



* Re: 2.6.0-test2-mm1 results
  2003-07-31 14:56       ` Martin J. Bligh
@ 2003-07-31 15:13         ` Con Kolivas
  2003-07-31 15:19           ` Martin J. Bligh
  2003-07-31 17:03           ` Bill Davidsen
  0 siblings, 2 replies; 23+ messages in thread
From: Con Kolivas @ 2003-07-31 15:13 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton; +Cc: linux-kernel

On Fri, 1 Aug 2003 00:56, Martin J. Bligh wrote:
> --Con Kolivas <kernel@kolivas.org> wrote (on Thursday, July 31, 2003 
01:28:49 +1000):
> > On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
> >> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
> >> Only noticeable thing is that -mm tree is consistently a little slower
> >> at kernbench
> >
> > Could conceivably be my hacks throwing the cc cpu hogs onto the expired
> > array more frequently.
>
> Kernbench: (make -j vmlinux, maximal tasks)
>                               Elapsed      System        User         CPU
>               2.6.0-test2       46.05      115.20      571.75     1491.25
>           2.6.0-test2-con       46.98      121.02      583.55     1498.75
>           2.6.0-test2-mm1       46.95      121.18      582.00     1497.50
>
> Good guess ;-)
>
> Does this help interactivity a lot, or was it just an experiment?
> Perhaps it could be less aggressive or something?

Well, basically this is a side effect of the interactivity estimator
correctly picking out the cpu hogs. It seems to be working ;-) The more of
a cpu hog a task is, the lower the dynamic priority (higher number) it gets,
and the more likely it is to be removed from the active array when it uses
up its full timeslice. The scheduler in its current form costs more to
resurrect things from the expired array and restart them, and the cpu hogs
have to wait until other, less cpu-hogging tasks have run.
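
For context, the relevant path is the timeslice-expiry branch of
scheduler_tick() in the 2.6.0 O(1) scheduler; a simplified sketch
(reconstructed, not the verbatim -mm code):

	/* scheduler_tick(): the running task has used up its slice */
	if (!--p->time_slice) {
		dequeue_task(p, rq->active);
		set_tsk_need_resched(p);
		p->prio = effective_prio(p);	/* estimator-driven dynamic priority */
		p->time_slice = task_timeslice(p);

		if (!rq->expired_timestamp)
			rq->expired_timestamp = jiffies;
		/* cpu hogs fail TASK_INTERACTIVE() and go to the expired
		 * array, where they wait until the arrays are switched */
		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq))
			enqueue_task(p, rq->expired);
		else
			enqueue_task(p, rq->active);
	}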

How do we get around this? I'll be brave here and say I'm not sure we need
to: cpu hogs have a knack of slowing things down for everyone, and it is
best for this to happen not just for interactivity, but for fairness.

I suspect a lot of people will have something to say on this one...

Con



* Re: 2.6.0-test2-mm1 results
  2003-07-31 15:13         ` Con Kolivas
@ 2003-07-31 15:19           ` Martin J. Bligh
  2003-07-31 15:35             ` Con Kolivas
  2003-07-31 21:19             ` William Lee Irwin III
  2003-07-31 17:03           ` Bill Davidsen
  1 sibling, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-31 15:19 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton; +Cc: linux-kernel

>> Does this help interactivity a lot, or was it just an experiment?
>> Perhaps it could be less aggressive or something?
> 
> Well basically this is a side effect of selecting out the correct cpu hogs in 
> the interactivity estimator. It seems to be working ;-) The more cpu hogs 
> they are the lower dynamic priority (higher number) they get, and the more 
> likely they are to be removed from the active array if they use up their full 
> timeslice. The scheduler in its current form costs more to resurrect things 
> from the expired array and restart them, and the cpu hogs will have to wait 
> till other less cpu hogging tasks run. 
> 
> How do we get around this? I'll be brave here and say I'm not sure we need to, 
> as cpu hogs have a knack of slowing things down for everyone, and it is best 
> not just for interactivity for this to happen, but for fairness.
> 
> I suspect a lot of people will have something to say on this one...

Well, what you want to do is prioritise interactive tasks over cpu hogs.
What *seems* to be happening is you're just switching between cpu hogs
more ... that doesn't help anyone really. I don't have an easy answer
for how to fix that, but it doesn't seem desirable to me - we need some
better way of working out what's interactive, and what's not.

M.



* Re: 2.6.0-test2-mm1 results
  2003-07-31 15:19           ` Martin J. Bligh
@ 2003-07-31 15:35             ` Con Kolivas
  2003-07-31 16:01               ` Martin J. Bligh
  2003-07-31 21:19             ` William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Con Kolivas @ 2003-07-31 15:35 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton; +Cc: linux-kernel

On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
> >> Does this help interactivity a lot, or was it just an experiment?
> >> Perhaps it could be less aggressive or something?
> >
> > Well basically this is a side effect of selecting out the correct cpu
> > hogs in the interactivity estimator. It seems to be working ;-) The more
> > cpu hogs they are the lower dynamic priority (higher number) they get,
> > and the more likely they are to be removed from the active array if they
> > use up their full timeslice. The scheduler in its current form costs
> > more to resurrect things from the expired array and restart them, and the
> > cpu hogs will have to wait till other less cpu hogging tasks run.
> >
> > How do we get around this? I'll be brave here and say I'm not sure we
> > need to, as cpu hogs have a knack of slowing things down for everyone,
> > and it is best not just for interactivity for this to happen, but for
> > fairness.
> >
> > I suspect a lot of people will have something to say on this one...
>
> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> What *seems* to be happening is you're just switching between cpu hogs
> more ... that doesn't help anyone really. I don't have an easy answer
> for how to fix that, but it doesn't seem desirable to me - we need some
> better way of working out what's interactive, and what's not.

Indeed, and now that I've thought about it some more, there are two other 
possible contributors:

1. Tasks also round-robin at 25ms. Ingo said he's not sure if that's too low, 
and it definitely drops throughput measurably, if slightly.
A simple experiment is to change the timeslice granularity in sched.c and 
see whether that is the cause.

2. Tasks waiting for 1 second are considered starved, so cpu hogs that have 
used up their full timeslice will be expired when something has been waiting 
that long. That limit used to be 10 seconds.
Changing the starvation limit will show whether it contributes (a sketch of 
that experiment follows below).
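
For point 2, the experiment would look something like this in kernel/sched.c
(illustrative only; the exact definition in the -mm source may differ, and
the one-second and ten-second values are taken from the description above):

	/* was roughly (HZ), i.e. ~1 second; restore the old ~10 second
	 * limit to see how much of the kernbench delta it accounts for */
	#define STARVATION_LIMIT	(10 * HZ)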

Con



* Re: 2.6.0-test2-mm1 results
  2003-07-31 15:35             ` Con Kolivas
@ 2003-07-31 16:01               ` Martin J. Bligh
  2003-07-31 16:11                 ` Con Kolivas
  0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-07-31 16:01 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton, Ingo Molnar; +Cc: linux-kernel

> On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
>> >> Does this help interactivity a lot, or was it just an experiment?
>> >> Perhaps it could be less aggressive or something?
>> > 
>> > Well basically this is a side effect of selecting out the correct cpu
>> > hogs in the interactivity estimator. It seems to be working ;-) The more
>> > cpu hogs they are the lower dynamic priority (higher number) they get,
>> > and the more likely they are to be removed from the active array if they
>> > use up their full timeslice. The scheduler in its current form costs
>> > more to resurrect things from the expired array and restart them, and the
>> > cpu hogs will have to wait till other less cpu hogging tasks run.
>> > 
>> > How do we get around this? I'll be brave here and say I'm not sure we
>> > need to, as cpu hogs have a knack of slowing things down for everyone,
>> > and it is best not just for interactivity for this to happen, but for
>> > fairness.
>> > 
>> > I suspect a lot of people will have something to say on this one...
>> 
>> Well, what you want to do is prioritise interactive tasks over cpu hogs.
>> What *seems* to be happening is you're just switching between cpu hogs
>> more ... that doesn't help anyone really. I don't have an easy answer
>> for how to fix that, but it doesn't seem desirable to me - we need some
>> better way of working out what's interactive, and what's not.
> 
> Indeed and now that I've thought about it some more, there are 2 other 
> possible contributors
> 
> 1. Tasks also round robin at 25ms. Ingo said he's not sure if that's too low, 
> and it definitely drops throughput measurably but slightly.
> A simple experiment is changing the timeslice granularity in sched.c and see 
> if that fixes it to see if that's the cause.
> 
> 2. Tasks waiting for 1 second are considered starved, so cpu hogs running with 
> their full timeslice used up when something is waiting that long will be 
> expired. That used to be 10 seconds.
> Changing starvation limit will show if that contributes.

Ah. If I'm doing a full "make -j" I have almost 100 tasks per cpu.
If it's a 25ms or 100ms timeslice, that's 2.5s or 10s for every task to get
through a timeslice. Won't that make *everyone* seem starved? Not sure that's
a good idea ... reminds me of Dilbert: "we're going to focus particularly
on ... everything!" ;-)

M.




* Re: 2.6.0-test2-mm1 results
  2003-07-31 16:01               ` Martin J. Bligh
@ 2003-07-31 16:11                 ` Con Kolivas
  0 siblings, 0 replies; 23+ messages in thread
From: Con Kolivas @ 2003-07-31 16:11 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton, Ingo Molnar; +Cc: linux-kernel

On Fri, 1 Aug 2003 02:01, Martin J. Bligh wrote:
> > On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
> >> >> Does this help interactivity a lot, or was it just an experiment?
> >> >> Perhaps it could be less aggressive or something?
> >> >
> >> > Well basically this is a side effect of selecting out the correct cpu
> >> > hogs in the interactivity estimator. It seems to be working ;-) The
> >> > more cpu hogs they are the lower dynamic priority (higher number) they
> >> > get, and the more likely they are to be removed from the active array
> >> > if they use up their full timeslice. The scheduler in its current
> >> > form costs more to resurrect things from the expired array and restart
> >> > them, and the cpu hogs will have to wait till other less cpu hogging
> >> > tasks run.
> >> >
> >> > How do we get around this? I'll be brave here and say I'm not sure we
> >> > need to, as cpu hogs have a knack of slowing things down for everyone,
> >> > and it is best not just for interactivity for this to happen, but for
> >> > fairness.
> >> >
> >> > I suspect a lot of people will have something to say on this one...
> >>
> >> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> >> What *seems* to be happening is you're just switching between cpu hogs
> >> more ... that doesn't help anyone really. I don't have an easy answer
> >> for how to fix that, but it doesn't seem desirable to me - we need some
> >> better way of working out what's interactive, and what's not.
> >
> > Indeed and now that I've thought about it some more, there are 2 other
> > possible contributors
> >
> > 1. Tasks also round robin at 25ms. Ingo said he's not sure if that's too
> > low, and it definitely drops throughput measurably but slightly.
> > A simple experiment is changing the timeslice granularity in sched.c and
> > see if that fixes it to see if that's the cause.
> >
> > 2. Tasks waiting for 1 second are considered starved, so cpu hogs running
> > with their full timeslice used up when something is waiting that long
> > will be expired. That used to be 10 seconds.
> > Changing starvation limit will show if that contributes.
>
> Ah. If I'm doing a full "make -j" I have almost 100 tasks per cpu.
> if it's 25ms or 100ms timeslice that's 2.5 or 10s to complete the
> timeslice. Won't that make *everyone* seem starved? Not sure that's
> a good idea ... reminds me of Dilbert: "we're going to focus particularly
> on ... everything!" ;-)

The starvation thingy is also dependent on the number of running tasks.

I quote from the master engineer Ingo's codebook:

#define EXPIRED_STARVING(rq) \
		(STARVATION_LIMIT && ((rq)->expired_timestamp && \
		(jiffies - (rq)->expired_timestamp >= \
			STARVATION_LIMIT * ((rq)->nr_running) + 1)))

Where STARVATION_LIMIT is 1 second.
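
Worked through for Martin's "make -j" case (assuming HZ=1000, so one second
is ~1000 jiffies): with roughly 100 runnable tasks on a runqueue, the forced
expiry only triggers once

	jiffies - rq->expired_timestamp >= STARVATION_LIMIT * rq->nr_running + 1
	                                ~= 1000 * 100 + 1 jiffies  (~100 seconds)

so the effective limit scales with the load rather than being a flat second.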

Con



* Re: 2.6.0-test2-mm1 results
  2003-07-31 15:13         ` Con Kolivas
  2003-07-31 15:19           ` Martin J. Bligh
@ 2003-07-31 17:03           ` Bill Davidsen
  1 sibling, 0 replies; 23+ messages in thread
From: Bill Davidsen @ 2003-07-31 17:03 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel

On Fri, 1 Aug 2003, Con Kolivas wrote:


> > Does this help interactivity a lot, or was it just an experiment?
> > Perhaps it could be less aggressive or something?
> 
> Well basically this is a side effect of selecting out the correct cpu hogs in 
> the interactivity estimator. It seems to be working ;-) The more cpu hogs 
> they are the lower dynamic priority (higher number) they get, and the more 
> likely they are to be removed from the active array if they use up their full 
> timeslice. The scheduler in its current form costs more to resurrect things 
> from the expired array and restart them, and the cpu hogs will have to wait 
> till other less cpu hogging tasks run. 

If that's what it really does, fine. I'm not sure it really finds hogs,
though, or rather "finds only true hogs."

> 
> How do we get around this? I'll be brave here and say I'm not sure we need to, 
> as cpu hogs have a knack of slowing things down for everyone, and it is best 
> not just for interactivity for this to happen, but for fairness.

While this does a good job, I'm still worried that we don't have a good
handle on which processes are really interactive in terms of interfacing
with a human. I don't think we can make the scheduler do the right thing
in every case unless it has better information.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: 2.6.0-test2-mm1 results
  2003-07-31 15:19           ` Martin J. Bligh
  2003-07-31 15:35             ` Con Kolivas
@ 2003-07-31 21:19             ` William Lee Irwin III
  1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-31 21:19 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Con Kolivas, Andrew Morton, linux-kernel

At some point in the past, Con Kolivas wrote:
>> How do we get around this? I'll be brave here and say I'm not sure
>> we need to, as cpu hogs have a knack of slowing things down for
>> everyone, and it is best not just for interactivity for this to
>> happen, but for fairness.

On Thu, Jul 31, 2003 at 08:19:01AM -0700, Martin J. Bligh wrote:
> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> What *seems* to be happening is you're just switching between cpu hogs
> more ... that doesn't help anyone really. I don't have an easy answer
> for how to fix that, but it doesn't seem desireable to me - we need some
> better way of working out what's interactive, and what's not.

I don't believe so. You're describing the precise effect of finite-
quantum FB (or tiny quantum RR) on long-running tasks. Generally
multilevel queues are used to back off to a service-time dependent
queueing discipline (e.g. use RR with increasing quanta for each level
and use level promotion and demotion to discriminate interactive tasks,
which remain higher-priority since overall policy is FB) with longer
timeslices for such beasts for less context-switching overhead. I say
lengthen timeslices with service time and make priority preemption work.
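
To make that concrete, a toy multilevel-feedback sketch (illustrative only;
the level count and quanta are made up, and this is not a proposal for
sched.c as-is): tasks that burn a full quantum are demoted to a longer-quantum
level, and tasks that sleep early are promoted back up.

	#define NLEVELS 4

	struct mlfq_task {
		int level;			/* 0 = highest priority, shortest quantum */
		struct mlfq_task *next;
	};

	static const int quantum_ms[NLEVELS] = { 10, 20, 40, 80 };
	static struct mlfq_task *queue[NLEVELS];	/* one queue per level */

	/* called when a task's quantum ends or it voluntarily sleeps */
	static void requeue(struct mlfq_task *t, int used_full_quantum)
	{
		if (used_full_quantum && t->level < NLEVELS - 1)
			t->level++;	/* long service time: demote, longer slice next */
		else if (!used_full_quantum && t->level > 0)
			t->level--;	/* slept early: promote, stays interactive */
		/* LIFO push keeps the sketch short; a real level would be a FIFO */
		t->next = queue[t->level];
		queue[t->level] = t;
	}

	/* run the highest non-empty level; the quantum grows with service time */
	static struct mlfq_task *pick_next(int *quantum)
	{
		int lvl;

		for (lvl = 0; lvl < NLEVELS; lvl++) {
			if (queue[lvl]) {
				struct mlfq_task *t = queue[lvl];

				queue[lvl] = t->next;
				*quantum = quantum_ms[lvl];
				return t;
			}
		}
		return NULL;
	}

Priority preemption on wakeup (running anything that lands in a higher level
than the current task) is what keeps the short-quantum levels responsive
despite the longer quanta below them.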

-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-07-29 14:37 Panic on 2.6.0-test1-mm1 Martin J. Bligh
  2003-07-29 21:18 ` Martin J. Bligh
@ 2003-07-31 22:37 ` William Lee Irwin III
  2003-07-31 22:41   ` William Lee Irwin III
  2003-08-01  0:47   ` Martin J. Bligh
  1 sibling, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-31 22:37 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel

On Tue, Jul 29, 2003 at 07:37:00AM -0700, Martin J. Bligh wrote:
> The big box had this on the console ... looks like it was doing a
> compile at the time ... sorry, only just noticed it after returning
> from OLS, so don't have more context (2.6.0-test1-mm1).
> kernel BUG at include/linux/list.h:149!
> invalid operand: 0000 [#1]
> SMP 
> CPU:    3
> EIP:    0060:[<c0117f98>]    Not tainted VLI
> EFLAGS: 00010083
> EIP is at pgd_dtor+0x64/0x8c

This is on PAE, so you're in far deeper trouble than I could have caused:

        pgd_cache = kmem_cache_create("pgd",
                                PTRS_PER_PGD*sizeof(pgd_t),
                                0,
                                SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
                                pgd_ctor,
                                PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
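
The constructor/destructor pair being registered there looks roughly like
this in arch/i386/mm/pgtable.c (a reconstruction, not the exact -mm source):
on non-PAE the ctor links each pgd page onto pgd_list and the dtor unlinks
it, while on PAE the ctor returns before the list_add(), so any pgd_dtor()
call on PAE trips the list_del() sanity check seen in the oops.

	void pgd_ctor(void *pgd, kmem_cache_t *cache, unsigned long unused)
	{
		unsigned long flags;

		if (PTRS_PER_PMD == 1)
			spin_lock_irqsave(&pgd_lock, flags);

		/* share the kernel half of swapper_pg_dir with the new pgd */
		memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
			swapper_pg_dir + USER_PTRS_PER_PGD,
			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));

		if (PTRS_PER_PMD > 1)	/* PAE: no pgd_list tracking */
			return;

		list_add(&virt_to_page(pgd)->lru, &pgd_list);
		spin_unlock_irqrestore(&pgd_lock, flags);
		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
	}

	void pgd_dtor(void *pgd, kmem_cache_t *cache, unsigned long unused)
	{
		unsigned long flags;

		spin_lock_irqsave(&pgd_lock, flags);
		list_del(&virt_to_page(pgd)->lru);	/* BUGs if never list_add()ed */
		spin_unlock_irqrestore(&pgd_lock, flags);
	}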

You've applied mingo's patch, which needs to check for PAE in certain
places like the above. Backing out highpmd didn't make this easier, it
just gave you performance problems, because now all your pmds are stuck
on node 0; another side-effect of those changes is that you're now
pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
altogether, if you're hellbent on backing out everyone else's patches
until your code has nothing to merge against.


-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-07-31 22:41   ` William Lee Irwin III
@ 2003-07-31 22:40     ` Andrew Morton
  2003-08-01  0:15       ` William Lee Irwin III
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2003-07-31 22:40 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: mbligh, linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> You may now put the "aggravated" magnet beneath the "wli" position on
> the fridge.

I never, ever, at any stage was told that highpmd.patch offered any
benefits wrt lock contention or node locality.  I was only told that it
saved a little bit of memory on highmem boxes.

It would be useful to actually tell me what your patches do.  And to
provide test results which demonstrate the magnitude of the performance
benefits.



* Re: Panic on 2.6.0-test1-mm1
  2003-07-31 22:37 ` Panic on 2.6.0-test1-mm1 William Lee Irwin III
@ 2003-07-31 22:41   ` William Lee Irwin III
  2003-07-31 22:40     ` Andrew Morton
  2003-08-01  0:47   ` Martin J. Bligh
  1 sibling, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-31 22:41 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton, linux-kernel

On Thu, Jul 31, 2003 at 03:37:10PM -0700, William Lee Irwin III wrote:
> You've applied mingo's patch, which needs to check for PAE in certain
> places like the above. Backing out highpmd didn't make this easier, it
> just gave you performance problems because now all your pmd's are stuck
> on node 0 and another side-effect of those changes is that you're now
> pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
> altogether, if you're hellbent on backing out everyone else's patches
> until your code has nothing to merge against.

I also did the merging of pgtable.c for highpmd and the preconstruction
code correctly, sent it upstream, and it got ignored in favor of code
that does it incorrectly, oopses, and by some voodoo gets something else
I wrote dropped while remaining incorrect.

You may now put the "aggravated" magnet beneath the "wli" position on
the fridge.


-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-07-31 22:40     ` Andrew Morton
@ 2003-08-01  0:15       ` William Lee Irwin III
  2003-08-01  0:18         ` Zwane Mwaikambo
  2003-08-01  0:20         ` William Lee Irwin III
  0 siblings, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-01  0:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>> You may now put the "aggravated" magnet beneath the "wli" position on
>> the fridge.

On Thu, Jul 31, 2003 at 03:40:20PM -0700, Andrew Morton wrote:
> I never, ever, at any stage was told that highpmd.patch offered any
> benefits wrt lock contention or node locality.  I was only told that it
> saved a little bit of memory on highmem boxes.

The lock contention is unrelated apart from the mangling of pgd_ctor().
The node locality is only important on systems with exaggerated NUMA
characteristics, such as the kind Martin and I bench on.


On Thu, Jul 31, 2003 at 03:40:20PM -0700, Andrew Morton wrote:
> It would be useful to actually tell me what your patches do.  And to
> provide test results which demonstrate the magnitude of the performance
> benefits.

I don't believe it would be valuable to push it on the grounds of
performance, as the performance characteristics of modern midrange i386
systems don't have such high remote access penalties.

The complaint was targeted more at errors in some new incoming patch
motivating mine being backed out.


-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-08-01  0:15       ` William Lee Irwin III
@ 2003-08-01  0:18         ` Zwane Mwaikambo
  2003-08-01  0:20         ` William Lee Irwin III
  1 sibling, 0 replies; 23+ messages in thread
From: Zwane Mwaikambo @ 2003-08-01  0:18 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, mbligh, linux-kernel

On Thu, 31 Jul 2003, William Lee Irwin III wrote:

> I don't believe it would be valuable to push it on the grounds of
> performance, as the performance characteristics of modern midrange i386
> systems don't have such high remote access penalties.

Others might be interested to know about the effects (performance, memory 
consumption etc) nonetheless, regardless of how large or negligible. It 
helps in finding out where to start looking when things improve (or regress).

Thanks for the work anyway,
	Zwane
-- 
function.linuxpower.ca


* Re: Panic on 2.6.0-test1-mm1
  2003-08-01  0:15       ` William Lee Irwin III
  2003-08-01  0:18         ` Zwane Mwaikambo
@ 2003-08-01  0:20         ` William Lee Irwin III
  1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-01  0:20 UTC (permalink / raw)
  To: Andrew Morton, mbligh, linux-kernel

On Thu, Jul 31, 2003 at 05:15:38PM -0700, William Lee Irwin III wrote:
> The complaint was targeted more at errors in some new incoming patch
> motivating mine being backed out.

Oh, and mbligh's inaccurate bug reporting (failure to report the XKVA
patch being applied).


-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-07-31 22:37 ` Panic on 2.6.0-test1-mm1 William Lee Irwin III
  2003-07-31 22:41   ` William Lee Irwin III
@ 2003-08-01  0:47   ` Martin J. Bligh
  2003-08-01  0:53     ` William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-08-01  0:47 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel

> On Tue, Jul 29, 2003 at 07:37:00AM -0700, Martin J. Bligh wrote:
>> The big box had this on the console ... looks like it was doing a
>> compile at the time ... sorry, only just noticed it after returning
>> from OLS, so don't have more context (2.6.0-test1-mm1).
>> kernel BUG at include/linux/list.h:149!
>> invalid operand: 0000 [#1]
>> SMP 
>> CPU:    3
>> EIP:    0060:[<c0117f98>]    Not tainted VLI
>> EFLAGS: 00010083
>> EIP is at pgd_dtor+0x64/0x8c
> 
> This is on PAE, so you're in far deeper trouble than I could have caused:
> 
>         pgd_cache = kmem_cache_create("pgd",
>                                 PTRS_PER_PGD*sizeof(pgd_t),
>                                 0,
>                                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>                                 pgd_ctor,
>                                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
> 
> You've applied mingo's patch, which needs to check for PAE in certain
> places like the above. Backing out highpmd didn't make this easier, it
> just gave you performance problems because now all your pmd's are stuck
> on node 0 and another side-effect of those changes is that you're now
> pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
> altogether, if you're hellbent on backing out everyone else's patches
> until your code has nothing to merge against.

I think this was just virgin -mm1, I can go back and double check ...
Not sure what the stuff about backing out other peoples patches was
all about, I just pointed out the crash.

Andrew had backed out highpmd for other reasons before I even mailed
this out, if that's what your knickers are all twisted about ... I have
no evidence that was causing the problem ... merely that it goes away
on -test2-mm1 ... it was Andrew's suggestion, not mine.

M.



* Re: Panic on 2.6.0-test1-mm1
  2003-08-01  0:47   ` Martin J. Bligh
@ 2003-08-01  0:53     ` William Lee Irwin III
  2003-08-01  0:57       ` Martin J. Bligh
  0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-01  0:53 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel

At some point in the past, I wrote:
>>         pgd_cache = kmem_cache_create("pgd",
>>                                 PTRS_PER_PGD*sizeof(pgd_t),
>>                                 0,
>>                                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>>                                 pgd_ctor,
>>                                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);

On Thu, Jul 31, 2003 at 05:47:55PM -0700, Martin J. Bligh wrote:
> I think this was just virgin -mm1, I can go back and double check ...
> Not sure what the stuff about backing out other peoples patches was
> all about, I just pointed out the crash.

pgd_dtor() will never be called on PAE due to the above code (thanks to
the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
out the PTRS_PER_PMD check).


-- wli


* Re: Panic on 2.6.0-test1-mm1
  2003-08-01  0:53     ` William Lee Irwin III
@ 2003-08-01  0:57       ` Martin J. Bligh
  2003-08-01  1:02         ` William Lee Irwin III
  0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-08-01  0:57 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel

> At some point in the past, I wrote:
>>>         pgd_cache = kmem_cache_create("pgd",
>>>                                 PTRS_PER_PGD*sizeof(pgd_t),
>>>                                 0,
>>>                                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>>>                                 pgd_ctor,
>>>                                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
> 
> On Thu, Jul 31, 2003 at 05:47:55PM -0700, Martin J. Bligh wrote:
>> I think this was just virgin -mm1, I can go back and double check ...
>> Not sure what the stuff about backing out other peoples patches was
>> all about, I just pointed out the crash.
> 
> pgd_dtor() will never be called on PAE due to the above code (thanks to
> the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
> out the PTRS_PER_PMD check).

OK, might have made a mistake ... I can rerun it if you want, but the 
latest kernel seems to work now.

M.



* Re: Panic on 2.6.0-test1-mm1
  2003-08-01  0:57       ` Martin J. Bligh
@ 2003-08-01  1:02         ` William Lee Irwin III
  0 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-01  1:02 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel

At some point in the past, I wrote:
>> pgd_dtor() will never be called on PAE due to the above code (thanks to
>> the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
>> out the PTRS_PER_PMD check).

On Thu, Jul 31, 2003 at 05:57:49PM -0700, Martin J. Bligh wrote:
> OK, might have made a mistake ... I can rerun it if you want, but the 
> latest kernel seems to work now.

There was a spinlock acquisition in there, too, so if you're seeing
weird performance effects in an update (not sure if there are any yet),
generating a patch to skip that, the list op, and not install pgd_dtor()
when PTRS_PER_PMD == 1 is in order.


-- wli


end of thread

Thread overview: 23+ messages
2003-07-29 14:37 Panic on 2.6.0-test1-mm1 Martin J. Bligh
2003-07-29 21:18 ` Martin J. Bligh
2003-07-30 15:01   ` 2.6.0-test2-mm1 results Martin J. Bligh
2003-07-30 15:28     ` Con Kolivas
2003-07-30 16:27       ` Martin J. Bligh
2003-07-31 14:56       ` Martin J. Bligh
2003-07-31 15:13         ` Con Kolivas
2003-07-31 15:19           ` Martin J. Bligh
2003-07-31 15:35             ` Con Kolivas
2003-07-31 16:01               ` Martin J. Bligh
2003-07-31 16:11                 ` Con Kolivas
2003-07-31 21:19             ` William Lee Irwin III
2003-07-31 17:03           ` Bill Davidsen
2003-07-31 22:37 ` Panic on 2.6.0-test1-mm1 William Lee Irwin III
2003-07-31 22:41   ` William Lee Irwin III
2003-07-31 22:40     ` Andrew Morton
2003-08-01  0:15       ` William Lee Irwin III
2003-08-01  0:18         ` Zwane Mwaikambo
2003-08-01  0:20         ` William Lee Irwin III
2003-08-01  0:47   ` Martin J. Bligh
2003-08-01  0:53     ` William Lee Irwin III
2003-08-01  0:57       ` Martin J. Bligh
2003-08-01  1:02         ` William Lee Irwin III
