* Re: 2.6.14-rc2-mm1
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
@ 2005-09-22 6:35 ` Joel Becker
2005-09-22 6:46 ` 2.6.14-rc2-mm1 Reuben Farrelly
` (5 subsequent siblings)
6 siblings, 0 replies; 37+ messages in thread
From: Joel Becker @ 2005-09-22 6:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
> git-ocfs2-prep.patch
> git-ocfs2.patch
As the truncate_inode_pages patch is now in Linus' git, it is
no longer in git-ocfs2.patch. -rc2-mm1 is effectively reverting it.
git-ocfs2-prep.patch should be removed.
Joel
--
"There is no sincerer love than the love of food."
- George Bernard Shaw
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
2005-09-22 6:35 ` 2.6.14-rc2-mm1 Joel Becker
@ 2005-09-22 6:46 ` Reuben Farrelly
2005-09-22 7:03 ` 2.6.14-rc2-mm1 Andrew Morton
2005-09-22 18:59 ` 2.6.14-rc2-mm1 Martin J. Bligh
` (4 subsequent siblings)
6 siblings, 1 reply; 37+ messages in thread
From: Reuben Farrelly @ 2005-09-22 6:46 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-ide
Hi,
On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
>
> - Various random other things - nothing major.
Overall it boots up and looks fine, but I'm still seeing this oops, which
comes up intermittently on warm reboot:
ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci(0000:00:1f.2) flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
scheduling while atomic: ksoftirqd/0/0x00000100/3
[<c0103ad0>] dump_stack+0x17/0x19
[<c031483a>] schedule+0x8ba/0xccb
[<c0315d17>] __down+0xe5/0x126
[<c0313f1a>] __down_failed+0xa/0x10
[<c0233f3d>] .text.lock.main+0x2b/0x3e
[<c022f90c>] device_del+0x35/0x5d
[<c025d71e>] scsi_target_reap+0x89/0xa3
[<c025ed5a>] scsi_device_dev_release+0x114/0x18b
[<c022f504>] device_release+0x1a/0x5a
[<c01e15c2>] kobject_cleanup+0x43/0x6b
[<c01e15f5>] kobject_release+0xb/0xd
[<c01e1e3c>] kref_put+0x2e/0x92
[<c01e160b>] kobject_put+0x14/0x16
[<c022f8d5>] put_device+0x11/0x13
[<c0256fd8>] scsi_put_command+0x7c/0x9e
[<c025b918>] scsi_next_command+0xf/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
sda: at virtual address 6b6b6b6b
printing eip:
c025b81f
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 0
EIP: 0060:[<c025b81f>] Not tainted VLI
EFLAGS: 00010292 (2.6.14-rc2-mm1)
EIP is at scsi_run_queue+0x12/0xb8
eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
Call Trace:
[<c0103a83>] show_stack+0x94/0xca
[<c0103c2c>] show_registers+0x15a/0x1ea
[<c0103e4a>] die+0x108/0x183
[<c03166cd>] do_page_fault+0x1ed/0x63d
[<c0103753>] error_code+0x4f/0x54
[<c025b91f>] scsi_next_command+0x16/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 60 seconds..
This is not new to this -mm release (I had a screen dump of it two weeks
ago, and I suspect it is actually a bit older than even that).
reuben
* Re: 2.6.14-rc2-mm1
2005-09-22 6:46 ` 2.6.14-rc2-mm1 Reuben Farrelly
@ 2005-09-22 7:03 ` Andrew Morton
0 siblings, 0 replies; 37+ messages in thread
From: Andrew Morton @ 2005-09-22 7:03 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: linux-kernel, linux-ide, linux-scsi, James Bottomley
Reuben Farrelly <reuben-lkml@reub.net> wrote:
>
> Hi,
>
> On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
> >
> > - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
> >
> > - Various random other things - nothing major.
>
> Overall boots up and looks fine, but still seeing this oops which comes up on
> warm reboot intermittently:
Nasty.
> ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
> ahci(0000:00:1f.2) flags: 64bit ncq led slum part
> ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
> ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
> ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
> ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
> ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata1: dev 0 configured for UDMA/133
> scsi0 : ahci
> ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata2: dev 0 configured for UDMA/133
> scsi1 : ahci
> ata3: no device found (phy stat 00000000)
> scsi2 : ahci
> ata4: no device found (phy stat 00000000)
> scsi3 : ahci
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> scheduling while atomic: ksoftirqd/0/0x00000100/3
> [<c0103ad0>] dump_stack+0x17/0x19
> [<c031483a>] schedule+0x8ba/0xccb
> [<c0315d17>] __down+0xe5/0x126
> [<c0313f1a>] __down_failed+0xa/0x10
> [<c0233f3d>] .text.lock.main+0x2b/0x3e
> [<c022f90c>] device_del+0x35/0x5d
> [<c025d71e>] scsi_target_reap+0x89/0xa3
> [<c025ed5a>] scsi_device_dev_release+0x114/0x18b
> [<c022f504>] device_release+0x1a/0x5a
> [<c01e15c2>] kobject_cleanup+0x43/0x6b
> [<c01e15f5>] kobject_release+0xb/0xd
> [<c01e1e3c>] kref_put+0x2e/0x92
> [<c01e160b>] kobject_put+0x14/0x16
> [<c022f8d5>] put_device+0x11/0x13
> [<c0256fd8>] scsi_put_command+0x7c/0x9e
> [<c025b918>] scsi_next_command+0xf/0x19
> [<c025b9db>] scsi_end_request+0x93/0xc5
> [<c025bdd4>] scsi_io_completion+0x281/0x46a
> [<c025c1c8>] scsi_generic_done+0x2d/0x3a
> [<c0257746>] scsi_finish_command+0x7f/0x93
> [<c025762b>] scsi_softirq+0xab/0x11c
> [<c0121952>] __do_softirq+0x72/0xdc
> [<c01219f3>] do_softirq+0x37/0x39
> [<c0121eeb>] ksoftirqd+0x9f/0xf4
> [<c012ff37>] kthread+0x99/0x9d
> [<c01010b5>] kernel_thread_helper+0x5/0xb
There's a whole bunch of reasons why we cannot call scsi_target_reap() from
softirq context. klist_del() locking and whatever semaphore that's taking
are amongst them...
> Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
> hdwr sectors (80026 MB)
> SCSI device sda: drive cache: write back
> SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> SCSI device sda: drive cache: write back
> sda: at virtual address 6b6b6b6b
> printing eip:
> c025b81f
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c025b81f>] Not tainted VLI
> EFLAGS: 00010292 (2.6.14-rc2-mm1)
> EIP is at scsi_run_queue+0x12/0xb8
> eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
> esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
> ds: 007b es: 007b ss: 0068
> Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
> Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
> c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
> 00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
> Call Trace:
> [<c0103a83>] show_stack+0x94/0xca
> [<c0103c2c>] show_registers+0x15a/0x1ea
> [<c0103e4a>] die+0x108/0x183
> [<c03166cd>] do_page_fault+0x1ed/0x63d
> [<c0103753>] error_code+0x4f/0x54
> [<c025b91f>] scsi_next_command+0x16/0x19
> [<c025b9db>] scsi_end_request+0x93/0xc5
> [<c025bdd4>] scsi_io_completion+0x281/0x46a
> [<c025c1c8>] scsi_generic_done+0x2d/0x3a
> [<c0257746>] scsi_finish_command+0x7f/0x93
> [<c025762b>] scsi_softirq+0xab/0x11c
> [<c0121952>] __do_softirq+0x72/0xdc
> [<c01219f3>] do_softirq+0x37/0x39
> [<c0121eeb>] ksoftirqd+0x9f/0xf4
> [<c012ff37>] kthread+0x99/0x9d
> [<c01010b5>] kernel_thread_helper+0x5/0xb
> Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
> b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
> 00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
> <0>Kernel panic - not syncing: Fatal exception in interrupt
> <0>Rebooting in 60 seconds..
>
It oopsed as well. That might be a second bug.
>
> This is not new to this -mm release (I had a screen dump of it 2 weeks ago but
> I suspect it is actually a bit older than that even).
>
Thanks.
* Re: 2.6.14-rc2-mm1
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
2005-09-22 6:35 ` 2.6.14-rc2-mm1 Joel Becker
2005-09-22 6:46 ` 2.6.14-rc2-mm1 Reuben Farrelly
@ 2005-09-22 18:59 ` Martin J. Bligh
2005-09-22 19:52 ` 2.6.14-rc2-mm1 Andrew Morton
2005-09-22 19:50 ` tty update speed regression (was: 2.6.14-rc2-mm1) Alexey Dobriyan
` (3 subsequent siblings)
6 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-22 18:59 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Build breaks with this config (x440/summit):
http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67
arch/i386/kernel/built-in.o(.init.text+0x389d): In function `set_nmi_ipi_callback':
/usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
arch/i386/kernel/built-in.o(.init.text+0x4ee0): In function `smp_read_mpc':
/usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'
Plus it panics on boot on Power-4 LPAR
Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
* Re: 2.6.14-rc2-mm1
2005-09-22 18:59 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-22 19:52 ` Andrew Morton
2005-09-22 20:14 ` 2.6.14-rc2-mm1 Martin J. Bligh
2005-09-22 22:28 ` 2.6.14-rc2-mm1 - ide problems ? Badari Pulavarty
0 siblings, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2005-09-22 19:52 UTC (permalink / raw)
To: Martin J. Bligh
Cc: linux-kernel, David Brownell, Paul Mackerras, Antonino A. Daplas,
Benjamin Herrenschmidt
"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> Build breaks with this config (x440/summit):
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67
>
> arch/i386/kernel/built-in.o(.init.text+0x389d): In function `set_nmi_ipi_callback':
> /usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
> arch/i386/kernel/built-in.o(.init.text+0x4ee0): In function `smp_read_mpc':
> /usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'
>
grr. David had a hack in there which caused my links to fail so I hacked
it out and broke yours.
> Plus it panics on boot on Power-4 LPAR
>
> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
> Mount-cache hash table entries: 256
> softlockup thread 0 started up.
> Processor 1 found.
> softlockup thread 1 started up.
> Processor 2 found.
> softlockup thread 2 started up.
> Processor 3 found.
> Brought up 4 CPUs
> softlockup thread 3 started up.
> NET: Registered protocol family 16
> PCI: Probing PCI hardware
> IOMMU table initialized, virtual merging disabled
> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>
> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
> ibm,os-term call failed -1
There are ppc64 IOMMU changes in Linus's tree...
* Re: 2.6.14-rc2-mm1
2005-09-22 19:52 ` 2.6.14-rc2-mm1 Andrew Morton
@ 2005-09-22 20:14 ` Martin J. Bligh
2005-09-23 0:28 ` 2.6.14-rc2-mm1 Martin J. Bligh
2005-09-22 22:28 ` 2.6.14-rc2-mm1 - ide problems ? Badari Pulavarty
1 sibling, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-22 20:14 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, David Brownell, Paul Mackerras, Antonino A. Daplas,
Benjamin Herrenschmidt
>> Plus it panics on boot on Power-4 LPAR
>>
>> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
>> Mount-cache hash table entries: 256
>> softlockup thread 0 started up.
>> Processor 1 found.
>> softlockup thread 1 started up.
>> Processor 2 found.
>> softlockup thread 2 started up.
>> Processor 3 found.
>> Brought up 4 CPUs
>> softlockup thread 3 started up.
>> NET: Registered protocol family 16
>> PCI: Probing PCI hardware
>> IOMMU table initialized, virtual merging disabled
>> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
>> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>>
>> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
>> ibm,os-term call failed -1
>
> There are ppc64 IOMMU changes in Linus's tree...
Thanks. Will retest with just linus.patch to confirm.
* Re: 2.6.14-rc2-mm1
2005-09-22 20:14 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-23 0:28 ` Martin J. Bligh
0 siblings, 0 replies; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-23 0:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, David Brownell, Paul Mackerras, Antonino A. Daplas,
Benjamin Herrenschmidt
--"Martin J. Bligh" <mbligh@mbligh.org> wrote (on Thursday, September 22, 2005 13:14:11 -0700):
>>> Plus it panics on boot on Power-4 LPAR
>>>
>>> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
>>> Mount-cache hash table entries: 256
>>> softlockup thread 0 started up.
>>> Processor 1 found.
>>> softlockup thread 1 started up.
>>> Processor 2 found.
>>> softlockup thread 2 started up.
>>> Processor 3 found.
>>> Brought up 4 CPUs
>>> softlockup thread 3 started up.
>>> NET: Registered protocol family 16
>>> PCI: Probing PCI hardware
>>> IOMMU table initialized, virtual merging disabled
>>> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
>>> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>>>
>>> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
>>> ibm,os-term call failed -1
>>
>> There are ppc64 IOMMU changes in Linus's tree...
>
> Thanks. will retest with just linus.patch to confirm
Yeah, it's broken there too. Borkage in mainline! ;-)
http://test.kernel.org/13316/debug/console.log
if someone wants to look ...
M.
* Re: 2.6.14-rc2-mm1 - ide problems ?
2005-09-22 19:52 ` 2.6.14-rc2-mm1 Andrew Morton
2005-09-22 20:14 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-22 22:28 ` Badari Pulavarty
2005-09-22 23:39 ` Andrew Morton
1 sibling, 1 reply; 37+ messages in thread
From: Badari Pulavarty @ 2005-09-22 22:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml
[-- Attachment #1: Type: text/plain, Size: 100 bytes --]
Hi Andrew,
My IDE-based AMD64 machine doesn't boot 2.6.14-rc2-mm1.
Known issue?
Thanks,
Badari
[-- Attachment #2: amd-log --]
[-- Type: text/plain, Size: 10429 bytes --]
Bootdata ok (command line is root=/dev/hda2 vga=0x314 selinux=0 splash=silent console=tty0 console=ttyS0,38400 resume=/dev/hda1 profile=2)
Linux version 2.6.14-rc2-mm1 (root@elm3b29) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Thu Sep 22 14:46:56 PDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfef0000 (usable)
BIOS-e820: 00000000dfef0000 - 00000000dfeff000 (ACPI data)
BIOS-e820: 00000000dfeff000 - 00000000dff00000 (ACPI NVS)
BIOS-e820: 00000000dff00000 - 00000000e0000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001e0000000 (usable)
Scanning NUMA topology in Northbridge 24
Number of nodes 4
Node 0 MemBase 0000000000000000 Limit 000000017fffffff
Node 1 MemBase 0000000180000000 Limit 000000019fffffff
Node 2 MemBase 00000001a0000000 Limit 00000001bfffffff
Node 3 MemBase 00000001c0000000 Limit 00000001dfffffff
Using node hash shift of 21
Bootmem setup node 0 0000000000000000-000000017fffffff
Bootmem setup node 1 0000000180000000-000000019fffffff
Bootmem setup node 2 00000001a0000000-00000001bfffffff
Bootmem setup node 3 00000001c0000000-00000001dfffffff
ACPI: PM-Timer IO Port: 0x8008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3 15:5 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x05] address[0xfa3e0000] gsi_base[24])
IOAPIC[1]: apic_id 5, version 17, address 0xfa3e0000, GSI 24-27
ACPI: IOAPIC (id[0x06] address[0xfa3e1000] gsi_base[28])
IOAPIC[2]: apic_id 6, version 17, address 0xfa3e1000, GSI 28-31
ACPI: IOAPIC (id[0x07] address[0xfa3e2000] gsi_base[32])
IOAPIC[3]: apic_id 7, version 17, address 0xfa3e2000, GSI 32-35
ACPI: IOAPIC (id[0x08] address[0xfa3e4000] gsi_base[36])
IOAPIC[4]: apic_id 8, version 17, address 0xfa3e4000, GSI 36-39
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at e2000000 (gap: e0000000:1ec00000)
Checking aperture...
CPU 0: aperture @ 0 size 32 MB
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 8000000
Built 4 zonelists
Initializing CPU#0
Kernel command line: root=/dev/hda2 vga=0x314 selinux=0 splash=silent console=tty0 console=ttyS0,38400 resume=/dev/hda1 profile=2
kernel profiling enabled (shift: 2)
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 1398.189 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 6110856k/7864320k available (3049k kernel code, 194612k reserved, 1612k data, 244k init)
Calibrating delay using timer specific routine.. 2801.62 BogoMIPS (lpj=5603254)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.483 MHz APIC timer.
setup_APIC_timer
done
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 2796.59 BogoMIPS (lpj=5593188)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1 -> Core 0
Opteron MP w/ 1MB stepping 00
setup_APIC_timer
done
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -1 cycles, maxerr 981 cycles)
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 2796.59 BogoMIPS (lpj=5593185)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2(1) -> Node 2 -> Core 0
Opteron MP w/ 1MB stepping 00
setup_APIC_timer
done
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -4 cycles, maxerr 976 cycles)
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 2796.59 BogoMIPS (lpj=5593186)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3(1) -> Node 3 -> Core 0
Opteron MP w/ 1MB stepping 00
setup_APIC_timer
done
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -2 cycles, maxerr 1606 cycles)
Brought up 4 CPUs
Disabling vsyscall due to use of PM timer
time.c: Using PM based timekeeping.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 *5 10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 10 *11)
ACPI: PCI Root Bridge [PCI1] (0000:08)
PCI: Probing PCI hardware (bus 08)
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 8000000 size 65536 KB
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
PCI: Bridge: 0000:00:06.0
IO window: 2000-2fff
MEM window: fa000000-fa0fffff
PREFETCH window: e2000000-e20fffff
PCI: Bridge: 0000:09:01.0
IO window: disabled.
MEM window: fa400000-faffffff
PREFETCH window: fc000000-fdffffff
PCI: Bridge: 0000:08:01.0
IO window: disabled.
MEM window: fa400000-faffffff
PREFETCH window: fc000000-fdffffff
PCI: Bridge: 0000:08:02.0
IO window: 3000-3fff
MEM window: fb000000-fb0fffff
PREFETCH window: e2100000-e21fffff
PCI: Bridge: 0000:08:03.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:08:04.0
IO window: 4000-4fff
MEM window: fb100000-fb1fffff
PREFETCH window: e2200000-e22fffff
ACPI: PCI Interrupt 0000:08:04.0[A] -> GSI 36 (level, low) -> IRQ 16
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1127427646.392:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
JFS: nTxBlock = 8192, nTxLock = 65536
Initializing Cryptographic API
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
vesafb: framebuffer at 0xfc000000, mapped to 0xffffc20000600000, using 1875k, total 16384k
vesafb: mode is 800x600x16, linelength=1600, pages=16
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
mtrr: type mismatch for fc000000,1000000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,800000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,400000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,200000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,100000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,80000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,40000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,20000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,10000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,8000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,4000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,2000 old: write-back new: write-combining
mtrr: type mismatch for fc000000,1000 old: write-back new: write-combining
vesafb: Mode is not VGA compatible
Console: switching to colour frame buffer device 100x37
fb0: VESA VGA frame buffer device
Real Time Clock Driver v1.12
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
mice: PS/2 mouse device common for all mice
input: PC Speaker
io scheduler noop registered
input: AT Translated Set 2 keyboard on isa0060/serio0
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize
loop: loaded (max 8 devices)
tg3.c:v3.40 (September 15, 2005)
ACPI: PCI Interrupt 0000:19:02.0[A] -> GSI 38 (level, low) -> IRQ 17
input: PS/2 Generic Mouse on isa0060/serio1
eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:04:76:f0:f9:aa
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
eth0: dma_rwctrl[76ff000f]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:DMA, hdd:pio
* tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
` (2 preceding siblings ...)
2005-09-22 18:59 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-22 19:50 ` Alexey Dobriyan
2005-09-22 21:49 ` Alexey Dobriyan
2005-09-24 17:43 ` 2.6.14-rc2-mm1 Mattia Dongili
` (2 subsequent siblings)
6 siblings, 1 reply; 37+ messages in thread
From: Alexey Dobriyan @ 2005-09-22 19:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
I see a regression in tty update speed with ADOM (an ncurses-based
roguelike) [1].
Messages at the top ("goblin hits you") are printed slowly; the eye can
see it drawing letter after letter.
2.6.14-rc2 is OK.
I'll try reverting the tty-layer-buffering-revamp*.patch pieces and see
if that changes anything.
[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-22 19:50 ` tty update speed regression (was: 2.6.14-rc2-mm1) Alexey Dobriyan
@ 2005-09-22 21:49 ` Alexey Dobriyan
2005-09-23 0:08 ` Nishanth Aravamudan
0 siblings, 1 reply; 37+ messages in thread
From: Alexey Dobriyan @ 2005-09-22 21:49 UTC (permalink / raw)
To: Andrew Morton, Nishanth Aravamudan; +Cc: linux-kernel
On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> I see regression in tty update speed with ADOM (ncurses based
> roguelike) [1].
>
> Messages at the top ("goblin hits you") are printed slowly. An eye can
> notice letter after letter printing.
>
> 2.6.14-rc2 is OK.
>
> I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> it'll change something.
>
> [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Scratch the TTY revamp; the sucker is
fix-sys_poll-large-timeout-handling.patch.
HZ=250 here.
------------------------------------------------------------------------
From: Nishanth Aravamudan <nacc@us.ibm.com>
The @timeout parameter to sys_poll() is in milliseconds but we compare it
to (MAX_SCHEDULE_TIMEOUT / HZ), which is (jiffies/jiffies-per-sec) or
seconds. That seems blatantly broken. This led to improper overflow
checking for @timeout. As Andrew Morton pointed out, the best fix is to
check for potential overflow first, then either select an indefinite value
or convert @timeout.
To achieve this and clean up the code, change the prototype of sys_poll
to make it clear that the parameter is in milliseconds, and introduce a
variable, timeout_jiffies, to hold the corresponding jiffies value.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
fs/select.c | 36 ++++++++++++++++++++++++++----------
include/linux/syscalls.h | 2 +-
2 files changed, 27 insertions(+), 11 deletions(-)
diff -puN fs/select.c~fix-sys_poll-large-timeout-handling fs/select.c
--- devel/fs/select.c~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/fs/select.c 2005-09-10 03:26:17.000000000 -0700
@@ -464,15 +464,18 @@ static int do_poll(unsigned int nfds, s
return count;
}
-asmlinkage long sys_poll(struct pollfd __user * ufds, unsigned int nfds, long timeout)
+asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
+ long timeout_msecs)
{
struct poll_wqueues table;
- int fdcount, err;
+ int fdcount, err;
+ int overflow;
unsigned int i;
struct poll_list *head;
struct poll_list *walk;
struct fdtable *fdt;
int max_fdset;
+ unsigned long timeout_jiffies;
/* Do a sanity check on nfds ... */
rcu_read_lock();
@@ -482,13 +485,26 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- if (timeout) {
- /* Careful about overflow in the intermediate values */
- if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
- timeout = (unsigned long)(timeout*HZ+999)/1000+1;
- else /* Negative or overflow */
- timeout = MAX_SCHEDULE_TIMEOUT;
- }
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to avoid
+ * converting any value to a numerically higher value, which
+ * could overflow.
+ */
+#if HZ > 1000
+ overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+#else
+ overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+#endif
+
+ /*
+ * If we would overflow in the conversion or a negative timeout
+ * is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
poll_initwait(&table);
@@ -519,7 +535,7 @@ asmlinkage long sys_poll(struct pollfd _
}
i -= pp->len;
}
- fdcount = do_poll(nfds, head, &table, timeout);
+ fdcount = do_poll(nfds, head, &table, timeout_jiffies);
/* OK, now copy the revents fields back to user space. */
walk = head;
diff -puN include/linux/syscalls.h~fix-sys_poll-large-timeout-handling include/linux/syscalls.h
--- devel/include/linux/syscalls.h~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/include/linux/syscalls.h 2005-09-10 02:35:19.000000000 -0700
@@ -420,7 +420,7 @@ asmlinkage long sys_socketpair(int, int,
asmlinkage long sys_socketcall(int call, unsigned long __user *args);
asmlinkage long sys_listen(int, int);
asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
- long timeout);
+ long timeout_msecs);
asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp);
asmlinkage long sys_epoll_create(int size);
_
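As a side note, the conversion logic the patch introduces can be sketched in plain userspace C. This is an illustrative stand-alone model, not the kernel code: HZ, MAX_SCHEDULE_TIMEOUT and the helper names below are stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <limits.h>

#define HZ 250                                  /* stand-in; kernel config value */
#define MAX_SCHEDULE_TIMEOUT ((unsigned long)LONG_MAX)  /* stand-in constant */

/* Simplified msecs_to_jiffies(): round up, as the kernel helper does. */
static unsigned long ms_to_jiffies(unsigned long msecs)
{
	return (msecs * HZ + 999) / 1000;
}

/*
 * Mirror of the patch's logic: detect overflow before converting, then
 * either sleep "indefinitely" or convert ms -> jiffies (+1 so we never
 * sleep shorter than requested).  Only the HZ <= 1000 branch is shown.
 */
static unsigned long poll_timeout_to_jiffies(long timeout_msecs)
{
	int overflow;

	overflow = ms_to_jiffies((unsigned long)timeout_msecs) >=
			MAX_SCHEDULE_TIMEOUT;
	if (overflow || timeout_msecs < 0)
		return MAX_SCHEDULE_TIMEOUT;
	return ms_to_jiffies((unsigned long)timeout_msecs) + 1;
}
```

Note that under this scheme a 0 ms request already comes out as 1 jiffy, which is exactly the behavior Alexey bisects to further down the thread.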
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-22 21:49 ` Alexey Dobriyan
@ 2005-09-23 0:08 ` Nishanth Aravamudan
2005-09-23 17:12 ` Nish Aravamudan
0 siblings, 1 reply; 37+ messages in thread
From: Nishanth Aravamudan @ 2005-09-23 0:08 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Andrew Morton, linux-kernel
On 23.09.2005 [01:49:26 +0400], Alexey Dobriyan wrote:
> On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> > I see regression in tty update speed with ADOM (ncurses based
> > roguelike) [1].
> >
> > Messages at the top ("goblin hits you") are printed slowly. An eye can
> > notice letter after letter printing.
> >
> > 2.6.14-rc2 is OK.
> >
> > I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> > it'll change something.
> >
> > [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
>
> Scratch TTY revamp, the sucker is
> fix-sys_poll-large-timeout-handling.patch
>
> HZ=250 here.
Alexey,
Thanks for the report. I will take a look on my Thinkpad with HZ=250
under -mm2. I have some ideas for debugging it if I see the same
problem.
Thanks,
Nish
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-23 0:08 ` Nishanth Aravamudan
@ 2005-09-23 17:12 ` Nish Aravamudan
2005-09-23 18:42 ` Alexey Dobriyan
0 siblings, 1 reply; 37+ messages in thread
From: Nish Aravamudan @ 2005-09-23 17:12 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Alexey Dobriyan, Andrew Morton, linux-kernel
On 9/22/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> On 23.09.2005 [01:49:26 +0400], Alexey Dobriyan wrote:
> > On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> > > I see regression in tty update speed with ADOM (ncurses based
> > > roguelike) [1].
> > >
> > > Messages at the top ("goblin hits you") are printed slowly. An eye can
> > > notice letter after letter printing.
> > >
> > > 2.6.14-rc2 is OK.
> > >
> > > I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> > > it'll change something.
> > >
> > > [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
> >
> > Scratch TTY revamp, the sucker is
> > fix-sys_poll-large-timeout-handling.patch
> >
> > HZ=250 here.
>
> Alexey,
>
> Thanks for the report. I will take a look on my Thinkpad with HZ=250
> under -mm2. I have some ideas for debugging it if I see the same
> problem.
I did not see any tty refresh problems on my TP with HZ=250 under
2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
adom binary you sent me. I even played two games just to make sure ;)
Is there any chance you can do an strace of the process while it is
slow to redraw your screen? Just to verify how poll() is being called
[if my patch is the problem, then poll() must be used somewhat
differently than I expected -- e.g. a dependency on the broken
behavior]. The only thing I can think of right now is that I made
timeout_jiffies unsigned, when schedule_timeout() will treat it as
signed, but I'm not sure if that is the problem.
We may want to contact the adom author eventually to figure out how
poll() is being used in the Linux port, if strace is unable to help
further.
Thanks,
Nish
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-23 17:12 ` Nish Aravamudan
@ 2005-09-23 18:42 ` Alexey Dobriyan
2005-09-23 19:07 ` Nishanth Aravamudan
0 siblings, 1 reply; 37+ messages in thread
From: Alexey Dobriyan @ 2005-09-23 18:42 UTC (permalink / raw)
To: Nish Aravamudan; +Cc: Andrew Morton, linux-kernel
On Fri, Sep 23, 2005 at 10:12:11AM -0700, Nish Aravamudan wrote:
> I did not see any tty refresh problems on my TP with HZ=250 under
> 2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
> adom binary you sent me. I even played two games just to make sure ;)
The slowdown is HZ dependent:
* HZ=1000 - game is playable. If I didn't know the slowdown was there I
wouldn't notice it.
* HZ=100 - messages at the top are printed r e a l l y s l o w.
* HZ=250 - somewhere in the middle.
> Is there any chance you can do an strace of the process while it is
> slow to redraw your screen?
Typical pattern is:
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
I can send full strace log if needed.
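The pattern above -- poll() on fd 0 with a 0 ms timeout -- is ncurses doing a non-blocking readiness check on stdin; a zero timeout means "return immediately". A minimal reproduction of that call shape (hypothetical helper, not taken from ADOM):

```c
#include <assert.h>
#include <poll.h>

/*
 * Non-blocking readiness check, as in the strace: a 0 ms timeout asks
 * poll() to return immediately.  If the kernel rounds 0 ms up to one
 * jiffy, every such call can sleep up to 10 ms at HZ=100 -- hence the
 * letter-by-letter redraw.
 */
int check_ready(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, 0);
}
```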
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-23 18:42 ` Alexey Dobriyan
@ 2005-09-23 19:07 ` Nishanth Aravamudan
2005-09-23 19:42 ` Alexey Dobriyan
0 siblings, 1 reply; 37+ messages in thread
From: Nishanth Aravamudan @ 2005-09-23 19:07 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Nish Aravamudan, Andrew Morton, linux-kernel
On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> On Fri, Sep 23, 2005 at 10:12:11AM -0700, Nish Aravamudan wrote:
> > I did not see any tty refresh problems on my TP with HZ=250 under
> > 2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
> > adom binary you sent me. I even played two games just to make sure ;)
>
> The slowdown is HZ dependent:
> * HZ=1000 - game is playable. If I didn't know the slowdown was there I
> wouldn't notice it.
> * HZ=100 - messages at the top are printed r e a l l y s l o w.
> * HZ=250 - somewhere in the middle.
>
> > Is there any chance you can do an strace of the process while it is
> > slow to redraw your screen?
>
> Typical pattern is:
>
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
>
> I can send full strace log if needed.
Nope, that helped tremendously! I think I know what the issue is (and
why it's HZ dependent).
In the current code (2.6.13.2, e.g.) we allow 0 timeout poll-requests to
be resolved as 0 jiffy requests. But in my patch, those requests become
1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
Care to try the following patch?
Note: I would be happy to not do the conditional and just have the patch
change the msecs_to_jiffies() line when assigning to timeout_jiffies.
But I figured it would be best to avoid *all* computations if we know
the resulting value is going to be 0. Hence all the tab changing.
Thanks,
Nish
Description: Modifying sys_poll() to handle large timeouts correctly
resulted in 0 being treated just like any other millisecond request,
while the current code treats it as an optimized case. Do the same in
the new code. Most of the code change is tabbing due to the inserted if.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
fs/select.c | 41 +++++++++++++++++++++++++----------------
1 files changed, 25 insertions(+), 16 deletions(-)
diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
--- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
+++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
@@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- /*
- * We compare HZ with 1000 to work out which side of the
- * expression needs conversion. Because we want to avoid
- * converting any value to a numerically higher value, which
- * could overflow.
- */
+ if (timeout_msecs) {
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to
+ * avoid converting any value to a numerically higher
+ * value, which could overflow.
+ */
#if HZ > 1000
- overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+ overflow = timeout_msecs >=
+ jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
#else
- overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+ overflow = msecs_to_jiffies(timeout_msecs) >=
+ MAX_SCHEDULE_TIMEOUT;
#endif
- /*
- * If we would overflow in the conversion or a negative timeout
- * is requested, sleep indefinitely.
- */
- if (overflow || timeout_msecs < 0)
- timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
- else
- timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ /*
+ * If we would overflow in the conversion or a negative
+ * timeout is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ } else {
+ /*
+ * 0 millisecond requests become 0 jiffy requests
+ */
+ timeout_jiffies = 0;
+ }
poll_initwait(&table);
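In other words, the fix restores the mapping 0 ms -> 0 jiffies while keeping the +1 rounding for positive timeouts. As a stand-alone sketch (stand-in HZ and constant, not the kernel function):

```c
#include <assert.h>
#include <limits.h>

#define HZ 100  /* stand-in; the report is worst at HZ=100 */

/* Fixed mapping: special-case 0 so non-blocking polls stay non-blocking. */
static unsigned long timeout_ms_to_jiffies(long timeout_msecs)
{
	if (timeout_msecs == 0)
		return 0;                       /* don't sleep at all */
	if (timeout_msecs < 0)
		return (unsigned long)LONG_MAX; /* MAX_SCHEDULE_TIMEOUT stand-in */
	return ((unsigned long)timeout_msecs * HZ + 999) / 1000 + 1;
}
```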
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-23 19:07 ` Nishanth Aravamudan
@ 2005-09-23 19:42 ` Alexey Dobriyan
2005-09-23 21:32 ` Nishanth Aravamudan
0 siblings, 1 reply; 37+ messages in thread
From: Alexey Dobriyan @ 2005-09-23 19:42 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Nish Aravamudan, Andrew Morton, linux-kernel
On Fri, Sep 23, 2005 at 12:07:49PM -0700, Nishanth Aravamudan wrote:
> On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> > poll([{fd=0, events=POLLIN}], 1, 0) = 0
> > I can send full strace log if needed.
>
> Nope, that helped tremendously! I think I know what the issue is (and
> why it's HZ dependent).
>
> > In the current code (2.6.13.2, e.g.) we allow 0 timeout poll-requests to
> be resolved as 0 jiffy requests. But in my patch, those requests become
> 1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
>
> Care to try the following patch?
It works! Now, even with HZ=100, gameplay is smooth.
Andrew, please, apply.
> Description: Modifying sys_poll() to handle large timeouts correctly
> resulted in 0 being treated just like any other millisecond request,
> while the current code treats it as an optimized case. Do the same in
> the new code. Most of the code change is tabbing due to the inserted if.
> diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
> --- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
> +++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
> @@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
> if (nfds > max_fdset && nfds > OPEN_MAX)
> return -EINVAL;
>
> - /*
> - * We compare HZ with 1000 to work out which side of the
> - * expression needs conversion. Because we want to avoid
> - * converting any value to a numerically higher value, which
> - * could overflow.
> - */
> + if (timeout_msecs) {
> + /*
> + * We compare HZ with 1000 to work out which side of the
> + * expression needs conversion. Because we want to
> + * avoid converting any value to a numerically higher
> + * value, which could overflow.
> + */
> #if HZ > 1000
> - overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
> + overflow = timeout_msecs >=
> + jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
> #else
> - overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
> + overflow = msecs_to_jiffies(timeout_msecs) >=
> + MAX_SCHEDULE_TIMEOUT;
> #endif
>
> - /*
> - * If we would overflow in the conversion or a negative timeout
> - * is requested, sleep indefinitely.
> - */
> - if (overflow || timeout_msecs < 0)
> - timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
> - else
> - timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
> + /*
> + * If we would overflow in the conversion or a negative
> + * timeout is requested, sleep indefinitely.
> + */
> + if (overflow || timeout_msecs < 0)
> + timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
> + else
> + timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
> + } else {
> + /*
> + * 0 millisecond requests become 0 jiffy requests
> + */
> + timeout_jiffies = 0;
> + }
>
> poll_initwait(&table);
>
* Re: tty update speed regression (was: 2.6.14-rc2-mm1)
2005-09-23 19:42 ` Alexey Dobriyan
@ 2005-09-23 21:32 ` Nishanth Aravamudan
0 siblings, 0 replies; 37+ messages in thread
From: Nishanth Aravamudan @ 2005-09-23 21:32 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Nish Aravamudan, Andrew Morton, linux-kernel
On 23.09.2005 [23:42:53 +0400], Alexey Dobriyan wrote:
> On Fri, Sep 23, 2005 at 12:07:49PM -0700, Nishanth Aravamudan wrote:
> > On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> > > poll([{fd=0, events=POLLIN}], 1, 0) = 0
>
> > > I can send full strace log if needed.
> >
> > Nope, that helped tremendously! I think I know what the issue is (and
> > why it's HZ dependent).
> >
> > > In the current code (2.6.13.2, e.g.) we allow 0 timeout poll-requests to
> > be resolved as 0 jiffy requests. But in my patch, those requests become
> > 1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
> >
> > Care to try the following patch?
>
> It works! Now, even with HZ=100, gameplay is smooth.
>
> Andrew, please, apply.
Great! Thanks for the testing, Alexey.
-Nish
* Re: 2.6.14-rc2-mm1
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
` (3 preceding siblings ...)
2005-09-22 19:50 ` tty update speed regression (was: 2.6.14-rc2-mm1) Alexey Dobriyan
@ 2005-09-24 17:43 ` Mattia Dongili
2005-09-24 17:58 ` 2.6.14-rc2-mm1 Mattia Dongili
2005-09-27 7:13 ` 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?) Reuben Farrelly
6 siblings, 0 replies; 37+ messages in thread
From: Mattia Dongili @ 2005-09-24 17:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
[...]
> +reiser4-ver_linux-dont-print-reiser4progs-version-if-none-found.patch
> +reiser4-atime-update-fix.patch
> +reiser4-use-try_to_freeze.patch
>
> reiser4 fixes
Runs well, except that reiser4 seems to do bad things in do_sendfile.
I have apache2 running here and it refuses to serve my ~/public_html
homepage. /home is running on a reiser4 partition, and while apache2
serves pages fine from other filesystems, stracing the process while
requesting my homepage, I get:
stat64("/home/mattia/public_html/index.html", {st_mode=S_IFREG|0644, st_size=2315, ...}) = 0
open("/home/mattia/public_html/index.html", O_RDONLY) = 12
setsockopt(11, SOL_TCP, TCP_NODELAY, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(11, [{"HTTP/1.1 200 OK\r\nDate: Sat, 24 S"..., 328}], 1) = 328
sendfile(11, 12, [0], 2315) = -1 EINVAL (Invalid argument)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
setsockopt(11, SOL_TCP, TCP_CORK, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
read(11, 0x82297f0, 8000) = -1 EAGAIN (Resource temporarily unavailable)
write(10, "127.0.0.1 - - [24/Sep/2005:10:13"..., 95) = 95
close(11) = 0
read(5, 0xbfe4c4e3, 1) = -1 EAGAIN (Resource temporarily unavailable)
close(12) = 0
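For what it's worth, a caller can defend against filesystems whose sendfile support is missing or broken by treating EINVAL/ENOSYS as "fall back to read()/write()". This is an illustrative sketch; send_file_fallback is a made-up helper, not an apache function:

```c
#include <assert.h>
#include <errno.h>
#include <sys/sendfile.h>
#include <unistd.h>

/*
 * Try sendfile() first; on EINVAL/ENOSYS (fd type or filesystem not
 * supported) fall back to a plain read()/write() loop.  Returns bytes
 * sent, or -1 with errno set.
 */
ssize_t send_file_fallback(int out_fd, int in_fd, off_t *off, size_t count)
{
	char buf[8192];
	ssize_t total = 0;
	ssize_t n = sendfile(out_fd, in_fd, off, count);

	if (n >= 0 || (errno != EINVAL && errno != ENOSYS))
		return n;

	while ((size_t)total < count) {
		size_t want = count - (size_t)total;
		ssize_t r = read(in_fd, buf, want < sizeof(buf) ? want : sizeof(buf));
		if (r < 0)
			return -1;
		if (r == 0)
			break;
		ssize_t w = write(out_fd, buf, (size_t)r);
		if (w < 0)
			return -1;
		total += w;
	}
	return total;
}
```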
--
mattia
:wq!
* Re: 2.6.14-rc2-mm1
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
` (4 preceding siblings ...)
2005-09-24 17:43 ` 2.6.14-rc2-mm1 Mattia Dongili
@ 2005-09-24 17:58 ` Mattia Dongili
2005-09-24 18:23 ` 2.6.14-rc2-mm1 Andrew Morton
2005-09-27 7:13 ` 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?) Reuben Farrelly
6 siblings, 1 reply; 37+ messages in thread
From: Mattia Dongili @ 2005-09-24 17:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
Herm... running almost fine :) I just got the allocation failure below
(including /proc/slabinfo and /proc/vmstat -- useful? I can provide more
info if it happens again; ah, exim is running for local delivery
purposes only). I did see it previously in .14-rc1-mm1, but I didn't
find enough time to report it properly.
Linux version 2.6.14-rc2-mm1-1 (mattia@inferi) (gcc version 4.0.1 (Debian 4.0.1-2)) #1 PREEMPT Fri Sep 23 20:56:05 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000c0000 - 00000000000d0000 (reserved)
BIOS-e820: 00000000000d8000 - 00000000000e0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
BIOS-e820: 000000000fef0000 - 000000000feff000 (ACPI data)
BIOS-e820: 000000000feff000 - 000000000ff00000 (ACPI NVS)
BIOS-e820: 000000000ff00000 - 000000000ff80000 (usable)
BIOS-e820: 000000000ff80000 - 0000000010000000 (reserved)
BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65408
DMA zone: 4096 pages, LIFO batch:2
DMA32 zone: 0 pages, LIFO batch:2
Normal zone: 61312 pages, LIFO batch:32
HighMem zone: 0 pages, LIFO batch:2
DMI present.
[...]
exim4: page allocation failure. order:1, mode:0x80000020
[<c0143698>] __alloc_pages+0x328/0x450
[<c0147150>] kmem_getpages+0x30/0xa0
[<c01480cf>] cache_grow+0xbf/0x1f0
[<c0148446>] cache_alloc_refill+0x246/0x280
[<c0148793>] __kmalloc+0x73/0x80
[<c0291cd8>] pskb_expand_head+0x58/0x150
[<c0297143>] skb_checksum_help+0x103/0x120
[<d0c6d1cc>] ip_nat_fn+0x1cc/0x240 [iptable_nat]
[<d0c763e8>] ip_conntrack_in+0x188/0x2c0 [ip_conntrack]
[<d0c6d45e>] ip_nat_local_fn+0x7e/0xc0 [iptable_nat]
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7c2b>] nf_iterate+0x6b/0xa0
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7cc4>] nf_hook_slow+0x64/0x140
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02b35ae>] ip_queue_xmit+0x23e/0x550
[<c02b2670>] dst_output+0x0/0x30
[<c01e1b9a>] __copy_to_user_ll+0x4a/0x90
[<c0293a6e>] memcpy_toiovec+0x6e/0x90
[<c02c4c75>] tcp_cwnd_restart+0x35/0xf0
[<c02c5276>] tcp_transmit_skb+0x426/0x780
[<c02c332e>] tcp_rcv_established+0x6e/0x8c0
[<c02c657d>] tcp_write_xmit+0x12d/0x3d0
[<c02c6855>] __tcp_push_pending_frames+0x35/0xb0
[<c02bad3c>] tcp_sendmsg+0xa3c/0xb50
[<c028c67f>] sock_aio_write+0xcf/0x120
[<c016029d>] do_sync_write+0xcd/0x130
[<c0131ed0>] autoremove_wake_function+0x0/0x60
[<c016047f>] vfs_write+0x17f/0x190
[<c016055b>] sys_write+0x4b/0x80
[<c01032a1>] syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 0, high 12, batch 2 used:8
cpu 0 cold: low 0, high 4, batch 1 used:3
DMA32 per-cpu: empty
Normal per-cpu:
cpu 0 hot: low 0, high 192, batch 32 used:14
cpu 0 cold: low 0, high 64, batch 16 used:51
HighMem per-cpu: empty
Free pages: 4112kB (0kB HighMem)
Active:46238 inactive:10857 dirty:16 writeback:0 unstable:0 free:1028 slab:4078 mapped:39343 pagetables:316
DMA free:1224kB min:128kB low:160kB high:192kB active:6812kB inactive:3684kB present:16384kB pages_scanned:36 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
Normal free:2888kB min:1916kB low:2392kB high:2872kB active:178140kB inactive:39744kB present:245248kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 300*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1224kB
DMA32: empty
Normal: 722*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2888kB
HighMem: empty
Swap cache: add 33, delete 33, find 0/0, race 0+0
Free swap = 248864kB
Total swap = 248996kB
Free swap: 248864kB
65408 pages of RAM
0 pages of HIGHMEM
1529 reserved pages
46307 pages shared
0 pages swap cached
16 pages dirty
0 pages writeback
39343 pages mapped
4078 pages slab
316 pages pagetables
cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_read_data 32 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_inode_cache 3 14 560 7 1 : tunables 54 27 0 : slabdata 2 2 0
nfs_page 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 20 192 20 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 8 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
ip_conntrack_expect 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
ip_conntrack 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
scsi_cmd_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0
d_cursor 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
file_fsdata 71 75 256 15 1 : tunables 120 60 0 : slabdata 5 5 0
dentry_fsdata 2188 3658 64 59 1 : tunables 120 60 0 : slabdata 62 62 0
fq 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
jnode 1869 4480 96 40 1 : tunables 120 60 0 : slabdata 112 112 0
txn_handle 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
txn_atom 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
plugin_set 73 118 64 59 1 : tunables 120 60 0 : slabdata 2 2 0
znode 4704 7888 224 17 1 : tunables 120 60 0 : slabdata 464 464 0
reiser4_inode 4057 4144 512 7 1 : tunables 54 27 0 : slabdata 592 592 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 32 60 128 30 1 : tunables 120 60 0 : slabdata 2 2 0
dm_tio 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
dm_io 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 1 92 40 92 1 : tunables 120 60 0 : slabdata 1 1 0
UNIX 77 77 352 11 1 : tunables 54 27 0 : slabdata 7 7 0
tcp_bind_bucket 15 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
inet_peer_cache 1 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_alias 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_hash 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 31 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
arp_cache 3 30 128 30 1 : tunables 120 60 0 : slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
UDP 8 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
tw_sock_TCP 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
request_sock_TCP 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
TCP 15 16 960 4 1 : tunables 54 27 0 : slabdata 4 4 0
cfq_ioc_pool 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
crq_pool 0 0 44 84 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 24 189 60 63 1 : tunables 120 60 0 : slabdata 3 3 0
mqueue_inode_cache 1 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
reiser_inode_cache 622 1450 392 10 1 : tunables 54 27 0 : slabdata 145 145 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 24 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
fasync_cache 2 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 748 756 408 9 1 : tunables 54 27 0 : slabdata 84 84 0
posix_timers_cache 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 6 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_ioc 51 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 2 10 380 10 1 : tunables 54 27 0 : slabdata 1 1 0
blkdev_requests 25 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0
biovec-(256) 260 260 3072 2 2 : tunables 24 12 0 : slabdata 130 130 0
biovec-128 264 265 1536 5 2 : tunables 24 12 0 : slabdata 53 53 0
biovec-64 272 275 768 5 1 : tunables 54 27 0 : slabdata 55 55 0
biovec-16 272 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 272 295 64 59 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 279 406 16 203 1 : tunables 120 60 0 : slabdata 2 2 0
bio 279 354 64 59 1 : tunables 120 60 0 : slabdata 6 6 0
file_lock_cache 21 44 88 44 1 : tunables 120 60 0 : slabdata 1 1 0
sock_inode_cache 110 110 352 11 1 : tunables 54 27 0 : slabdata 10 10 0
skbuff_fclone_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
skbuff_head_cache 696 696 160 24 1 : tunables 120 60 0 : slabdata 29 29 0
acpi_operand 828 828 40 92 1 : tunables 120 60 0 : slabdata 9 9 0
acpi_parse_ext 61 84 44 84 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_parse 41 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_state 28 78 48 78 1 : tunables 120 60 0 : slabdata 1 1 0
proc_inode_cache 215 360 332 12 1 : tunables 54 27 0 : slabdata 30 30 0
sigqueue 4 26 148 26 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 3568 4046 276 14 1 : tunables 54 27 0 : slabdata 289 289 0
bdev_cache 7 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
sysfs_dir_cache 4059 4140 40 92 1 : tunables 120 60 0 : slabdata 45 45 0
mnt_cache 27 40 96 40 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 1113 1272 316 12 1 : tunables 54 27 0 : slabdata 106 106 0
dentry_cache 5085 7569 136 29 1 : tunables 120 60 0 : slabdata 261 261 0
filp 1512 1632 160 24 1 : tunables 120 60 0 : slabdata 68 68 0
names_cache 11 11 4096 1 1 : tunables 24 12 0 : slabdata 11 11 0
idr_layer_cache 93 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 3942 20592 48 78 1 : tunables 120 60 0 : slabdata 264 264 0
mm_struct 77 77 576 7 1 : tunables 54 27 0 : slabdata 11 11 0
vm_area_struct 3512 3740 88 44 1 : tunables 120 60 0 : slabdata 85 85 0
fs_cache 77 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 78 99 448 9 1 : tunables 54 27 0 : slabdata 11 11 0
signal_cache 99 99 352 11 1 : tunables 54 27 0 : slabdata 9 9 0
sighand_cache 84 84 1312 3 1 : tunables 24 12 0 : slabdata 28 28 0
task_struct 93 93 1328 3 1 : tunables 24 12 0 : slabdata 31 31 0
anon_vma 1504 1695 8 339 1 : tunables 120 60 0 : slabdata 5 5 0
pgd 64 64 4096 1 1 : tunables 24 12 0 : slabdata 64 64 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 2 2 32768 1 8 : tunables 8 4 0 : slabdata 2 2 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 95 95 8192 1 2 : tunables 8 4 0 : slabdata 95 95 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 100 100 4096 1 1 : tunables 24 12 0 : slabdata 100 100 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 310 328 2048 2 1 : tunables 24 12 0 : slabdata 164 164 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 176 176 1024 4 1 : tunables 54 27 0 : slabdata 44 44 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 624 624 512 8 1 : tunables 54 27 0 : slabdata 78 78 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 150 150 256 15 1 : tunables 120 60 0 : slabdata 10 10 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1702 1800 128 30 1 : tunables 120 60 0 : slabdata 60 60 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 2641 2891 64 59 1 : tunables 120 60 0 : slabdata 49 49 0
size-32 3020 3616 32 113 1 : tunables 120 60 0 : slabdata 32 32 0
kmem_cache 160 160 96 40 1 : tunables 120 60 0 : slabdata 4 4 0
and
cat /proc/vmstat
nr_dirty 6
nr_writeback 0
nr_unstable 0
nr_page_table_pages 299
nr_mapped 39613
nr_slab 4128
pgpgin 853871
pgpgout 697604
pswpin 0
pswpout 33
pgalloc_high 0
pgalloc_normal 7729542
pgalloc_dma 739299
pgfree 8475900
pgactivate 194732
pgdeactivate 167948
pgfault 4652531
pgmajfault 2200
pgrefill_high 0
pgrefill_normal 921490
pgrefill_dma 53701
pgsteal_high 0
pgsteal_normal 225142
pgsteal_dma 32821
pgscan_kswapd_high 0
pgscan_kswapd_normal 218790
pgscan_kswapd_dma 31262
pgscan_direct_high 0
pgscan_direct_normal 63855
pgscan_direct_dma 10391
pginodesteal 888
slabs_scanned 1641984
kswapd_steal 196892
kswapd_inodesteal 17749
pageoutrun 5595
allocstall 1531
pgrotated 71
nr_bounce 0
--
mattia
:wq!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-24 17:58 ` 2.6.14-rc2-mm1 Mattia Dongili
@ 2005-09-24 18:23 ` Andrew Morton
2005-09-26 19:33 ` 2.6.14-rc2-mm1 Seth, Rohit
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2005-09-24 18:23 UTC (permalink / raw)
To: Mattia Dongili; +Cc: linux-kernel, Seth, Rohit
Mattia Dongili <malattia@linux.it> wrote:
>
> On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> Herm... running almost fine :) I just got the allocation failure below
> (including /proc/slabinfo and /proc/vmstat - useful? I can provide more
> info if it happens again - ah, exim is only running for local delivery).
> I did see it previously, in .14-rc1-mm1 only, but I didn't
> find enough time to report it properly.
>
> ...
> exim4: page allocation failure. order:1, mode:0x80000020
Yes, it's expected that
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
fragmentation and will hence cause higher-order allocation attempts to
fail.
I think I'll drop that one.
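For context on why that patch invites fragmentation: it made rmqueue_bulk refill a pcp batch with one higher-order block (split into order-0 pages) instead of `batch` separate order-0 pages. A rough illustration of the order such a refill needs, as plain userspace C (purely illustrative, not the kernel code):

```c
#include <assert.h>

/* Purely illustrative: a batch-sized pcp refill done as a single
 * higher-order allocation needs a contiguous block of 2^order pages,
 * where order is the smallest value with 2^order >= batch.  Every
 * refill then competes for the same contiguous runs that order-1
 * users (8K stacks, jumbo frames, skb reallocations) depend on. */
static unsigned int order_for_batch(unsigned int batch)
{
	unsigned int order = 0;

	while ((1u << order) < batch)
		order++;
	return order;
}
```

Assuming the refill rounds the batch up to a power of two as modeled here, a Normal-zone batch of 64 would contend for an order-6 (256KB contiguous) block on every refill, which is why reverting the patch relieves the order-1 failures reported above.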
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-24 18:23 ` 2.6.14-rc2-mm1 Andrew Morton
@ 2005-09-26 19:33 ` Seth, Rohit
2005-09-27 18:57 ` 2.6.14-rc2-mm1 Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Seth, Rohit @ 2005-09-26 19:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mattia Dongili, linux-kernel, Seth, Rohit
On Sat, Sep 24, 2005 at 11:23:39AM -0700, Andrew Morton wrote:
> >
> > ...
> > exim4: page allocation failure. order:1, mode:0x80000020
>
> Yes, it's expected that
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
> fragmentation and will hence cause higher-order allocation attempts to
> fail.
>
> I think I'll drop that one.
It seems from the log messages that quite a few pages are hanging in the CPU's cold pcp list even under low memory conditions. Below is a patch to reduce the upper bound of the cold pcp list (this got increased by my previous change).
I think we should also drain the CPU's hot and cold pcps for GFP_KERNEL page requests (in the event the higher-order request cannot be serviced otherwise). This will still only drain the current CPU's pcps in an MP environment (leaving the other CPUs' lists intact). I will send that patch later today.
[PATCH]: Reduce the high mark in cpu's cold pcp list.
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
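For readers without the surrounding code: the fields touched in the hunk above belong to the per-CPU pageset, and only the cold list's high mark changes. A minimal userspace model of the cold-list setup (field names mirror the 2.6-era struct per_cpu_pages, but this is an illustrative sketch, not kernel code):

```c
#include <assert.h>

/* Illustrative userspace model of the 2.6-era per-CPU pageset fields. */
struct pcp_model {
	int count;	/* pages currently on the list */
	int low;	/* refill when count drops below this */
	int high;	/* drain back to the buddy lists above this */
	int batch;	/* pages moved per refill/drain */
};

/* Cold-list init as patched above: high drops from 2*batch to batch/2,
 * so at most half a batch of cold pages can sit idle per CPU. */
static void init_cold_pcp(struct pcp_model *pcp, unsigned long batch)
{
	pcp->count = 0;
	pcp->low = 0;
	pcp->high = batch / 2;				/* was: 2 * batch */
	pcp->batch = batch / 2 > 1 ? batch / 2 : 1;	/* max(1UL, batch/2) */
}
```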
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-26 19:33 ` 2.6.14-rc2-mm1 Seth, Rohit
@ 2005-09-27 18:57 ` Martin J. Bligh
2005-09-27 20:05 ` 2.6.14-rc2-mm1 Rohit Seth
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-27 18:57 UTC (permalink / raw)
To: Seth, Rohit, Andrew Morton; +Cc: Mattia Dongili, linux-kernel
> It seems from the log messages that quite a few pages are hanging in the CPU's cold pcp list even under low memory conditions. Below is a patch to reduce the upper bound of the cold pcp list (this got increased by my previous change).
>
> I think we should also drain the CPU's hot and cold pcps for GFP_KERNEL page requests (in the event the higher-order request cannot be serviced otherwise). This will still only drain the current CPU's pcps in an MP environment (leaving the other CPUs' lists intact). I will send that patch later today.
>
> [PATCH]: Reduce the high mark in cpu's cold pcp list.
>
> Signed-off-by: Rohit Seth <rohit.seth@intel.com>
>
>
> --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
> +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
> @@ -1749,7 +1749,7 @@
> pcp = &p->pcp[1]; /* cold*/
> pcp->count = 0;
> pcp->low = 0;
> - pcp->high = 2 * batch;
> + pcp->high = batch / 2;
> pcp->batch = max(1UL, batch/2);
> INIT_LIST_HEAD(&pcp->list);
> }
> -
I don't understand. How can you set the high watermark at half the batch
size? Makes no sense to me.
And can you give a stricter definition of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... the answer lies
somewhere in between ... but where?
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 18:57 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-27 20:05 ` Rohit Seth
2005-09-27 21:18 ` 2.6.14-rc2-mm1 Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Rohit Seth @ 2005-09-27 20:05 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
On Tue, 2005-09-27 at 11:57 -0700, Martin J. Bligh wrote:
> > It seems from the log messages that quite a few pages are hanging in
> > the CPU's cold pcp list even under low memory conditions. Below is
> > a patch to reduce the upper bound of the cold pcp list (this got
> > increased by my previous change).
> >
> > I think we should also drain the CPU's hot and cold pcps for
> > GFP_KERNEL page requests (in the event the higher-order request
> > cannot be serviced otherwise). This will still only drain the
> > current CPU's pcps in an MP environment (leaving the other CPUs'
> > lists intact). I will send that patch later today.
>
> >
> > [PATCH]: Reduce the high mark in cpu's cold pcp list.
> >
> > Signed-off-by: Rohit Seth <rohit.seth@intel.com>
> >
> >
> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
> > @@ -1749,7 +1749,7 @@
> > pcp = &p->pcp[1]; /* cold*/
> > pcp->count = 0;
> > pcp->low = 0;
> > - pcp->high = 2 * batch;
> > + pcp->high = batch / 2;
> > pcp->batch = max(1UL, batch/2);
> > INIT_LIST_HEAD(&pcp->list);
> > }
> > -
>
> I don't understand. How can you set the high watermark at half the
> batch size? Makes no sense to me.
>
The batch size for the cold pcp list is getting initialized to batch/2
in the code snippet above. So this change is setting the high water mark
for the cold list to the same value as the pcp's batch number.
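Concretely, with the Normal zone's batch of 64 seen in the per-cpu dumps later in this thread (hot: batch 64; cold: high 128, batch 32), the cold list's own batch is max(1, 64/2) = 32 and the patched high mark 64/2 = 32 matches it, as Rohit says. A quick illustrative check in plain C:

```c
#include <assert.h>

/* Illustrative check of the arithmetic above, using the Normal zone's
 * batch of 64 from the per-cpu dumps later in this thread. */
static unsigned long cold_batch(unsigned long zone_batch)
{
	unsigned long half = zone_batch / 2;

	return half > 1 ? half : 1;	/* max(1UL, batch/2) */
}
```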
> And can you give a stricter definition of what you mean by "low memory
> conditions"? I agree we ought to empty the lists before going OOM or
> anything, but not at the slightest feather of pressure ... the answer lies
> somewhere in between ... but where?
>
In the specific case of the dump information that Mattia sent earlier,
there is only 4M of free memory available at the time the order-1 request
is failing.
In general, I think if a specific higher-order (> 0) request fails that
has GFP_KERNEL set, then at least we should drain the pcps.
-rohit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 20:05 ` 2.6.14-rc2-mm1 Rohit Seth
@ 2005-09-27 21:18 ` Martin J. Bligh
2005-09-27 21:51 ` 2.6.14-rc2-mm1 Rohit Seth
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-27 21:18 UTC (permalink / raw)
To: Rohit Seth; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
>> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000
>> -0700
>> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000
>> -0700
>> > @@ -1749,7 +1749,7 @@
>> > pcp = &p->pcp[1]; /* cold*/
>> > pcp->count = 0;
>> > pcp->low = 0;
>> > - pcp->high = 2 * batch;
>> > + pcp->high = batch / 2;
>> > pcp->batch = max(1UL, batch/2);
>> > INIT_LIST_HEAD(&pcp->list);
>> > }
>> > -
>>
>> I don't understand. How can you set the high watermark at half the
>> batch size? Makes no sense to me.
>>
>
> The batch size for the cold pcp list is getting initialized to batch/2
> in the code snip above. So, this change is setting the high water mark
> for cold list to same as pcp's batch number.
I must be being particularly dense today ... but:
pcp->high = batch / 2;
Looks like half the batch size to me, not the same?
>> And can you give a stricter definition of what you mean by "low memory
>> conditions"? I agree we ought to empty the lists before going OOM or
>> anything, but not at the slightest feather of pressure ... the answer lies
>> somewhere in between ... but where?
>>
>
> In the specific case of dump information that Mattia sent earlier, there
> is only 4M of free mem available at the time the order 1 request is
> failing.
>
> In general, I think if a specific higher order ( > 0) request fails that
> has GFP_KERNEL set then at least we should drain the pcps.
Mmmm. so every time we fork a process with 8K stacks, or allocate a frame
for jumbo ethernet, or NFS, you want to drain the lists? that seems to
wholly defeat the purpose.
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 21:18 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-27 21:51 ` Rohit Seth
2005-09-27 21:59 ` 2.6.14-rc2-mm1 Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Rohit Seth @ 2005-09-27 21:51 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
On Tue, 2005-09-27 at 14:18 -0700, Martin J. Bligh wrote:
> >> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000
> >> -0700
> >> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000
> >> -0700
> >> > @@ -1749,7 +1749,7 @@
> >> > pcp = &p->pcp[1]; /* cold*/
> >> > pcp->count = 0;
> >> > pcp->low = 0;
> >> > - pcp->high = 2 * batch;
> >> > + pcp->high = batch / 2;
> >> > pcp->batch = max(1UL, batch/2);
> >> > INIT_LIST_HEAD(&pcp->list);
> >> > }
> >> > -
> >>
> >> I don't understand. How can you set the high watermark at half the
> >> batch size? Makes no sense to me.
> >>
> >
> > The batch size for the cold pcp list is getting initialized to batch/2
> > in the code snippet above. So this change is setting the high water mark
> > for the cold list to the same value as the pcp's batch number.
>
> I must be being particularly dense today ... but:
>
> pcp->high = batch / 2;
>
> Looks like half the batch size to me, not the same?
pcp->batch = max(1UL, batch/2); is the line of code that sets the
batch value for the cold pcp list. batch is just a number that we
computed based on some parameters earlier.
>
> >> And can you give a stricter definition of what you mean by "low memory
> >> conditions"? I agree we ought to empty the lists before going OOM or
> >> anything, but not at the slightest feather of pressure ... the answer lies
> >> somewhere in between ... but where?
> >>
> >
> > In the specific case of dump information that Mattia sent earlier, there
> > is only 4M of free mem available at the time the order 1 request is
> > failing.
> >
> > In general, I think if a specific higher order ( > 0) request fails that
> > has GFP_KERNEL set then at least we should drain the pcps.
>
> Mmmm. so every time we fork a process with 8K stacks, or allocate a frame
> for jumbo ethernet, or NFS, you want to drain the lists? that seems to
> wholly defeat the purpose.
>
Not every time there is a request for higher-order pages. That surely
would defeat the purpose of pcps. But my suggestion is only to drain
when the global pool is not able to service the request. In the
pathological case where higher-order and zero-order requests are
alternating, you could have thrashing in terms of pages moving to the pcp
only to move back to the global list.
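The policy described here - try the global pool first, touch the pcps only once a multi-page request has actually failed, then retry once - can be sketched as a tiny userspace simulation (contiguity is ignored, the model only tracks counts; all names such as global_alloc and pcp_drain_local are hypothetical stand-ins, not kernel interfaces):

```c
#include <assert.h>
#include <stdbool.h>

/* Tiny userspace simulation of the drain-on-failure policy: pages sit
 * either in a global free pool or on this CPU's pcp list. */
static unsigned int global_free;	/* pages available in the global pool */
static unsigned int pcp_pages;		/* pages parked on this CPU's pcp */

static bool global_alloc(unsigned int pages)
{
	if (global_free < pages)
		return false;
	global_free -= pages;
	return true;
}

static void pcp_drain_local(void)
{
	global_free += pcp_pages;	/* hand the pcp pages back */
	pcp_pages = 0;
}

/* Drain only after the global pool has refused a multi-page request,
 * then retry exactly once.  The common case never touches the pcps,
 * avoiding the refill/drain thrashing mentioned above. */
static bool alloc_with_drain_fallback(unsigned int pages)
{
	if (global_alloc(pages))
		return true;
	if (pages <= 1)
		return false;	/* draining is modeled only for multi-page misses */
	pcp_drain_local();
	return global_alloc(pages);
}
```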
> Could you elaborate on what the benefits were from this change in the
> first place? Some page colouring thing on ia64? It seems to have way more
> downside than upside to me.
The original change was to try to allocate a higher-order page to
service a batch-sized bulk request. This was in the hope that better
physical contiguity would spread the data better across big caches.
-rohit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 21:51 ` 2.6.14-rc2-mm1 Rohit Seth
@ 2005-09-27 21:59 ` Martin J. Bligh
2005-09-27 22:49 ` 2.6.14-rc2-mm1 Rohit Seth
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-27 21:59 UTC (permalink / raw)
To: Rohit Seth; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
>> I must be being particularly dense today ... but:
>>
>> pcp->high = batch / 2;
>>
>> Looks like half the batch size to me, not the same?
>
> pcp->batch = max(1UL, batch/2); is the line of code that sets the
> batch value for the cold pcp list. batch is just a number that we
> computed based on some parameters earlier.
Ah, OK, so I am being dense. Fair enough. But if there's a reason to do
that max, perhaps:
pcp->batch = max(1UL, batch/2);
pcp->high = pcp->batch;
would be more appropriate? Tradeoff is more frequent dump / fill against
better frag, I suppose (at least if we don't refill using higher order
allocs ;-)) which seems fair enough.
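Martin's suggested ordering can be checked in isolation: clamp batch first, then derive high from it, so the two can never disagree even when batch/2 truncates to zero (illustrative userspace C, not the kernel function):

```c
#include <assert.h>

struct cold_init { unsigned long high, batch; };

/* The patched hunk computes high and batch independently:
 * high = batch/2 but batch = max(1, batch/2), which diverge for
 * zone batches below 2.  Martin's version derives high from the
 * already-clamped batch, so they always agree. */
static struct cold_init cold_init_suggested(unsigned long zone_batch)
{
	struct cold_init c;

	c.batch = zone_batch / 2 > 1 ? zone_batch / 2 : 1; /* max(1UL, batch/2) */
	c.high = c.batch;
	return c;
}
```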
>> > In general, I think if a specific higher order ( > 0) request fails that
>> > has GFP_KERNEL set then at least we should drain the pcps.
>>
>> Mmmm. so every time we fork a process with 8K stacks, or allocate a frame
>> for jumbo ethernet, or NFS, you want to drain the lists? that seems to
>> wholly defeat the purpose.
>
> Not every time there is a request for higher-order pages. That surely
> would defeat the purpose of pcps. But my suggestion is only to drain
> when the global pool is not able to service the request. In the
> pathological case where higher-order and zero-order requests are
> alternating, you could have thrashing in terms of pages moving to the pcp
> only to move back to the global list.
OK, seems fair enough. But there are multiple "harder and harder" attempts
within __alloc_pages to do that ... which one are you going for? Just
before we OOM / fail the alloc? That'd be hard to argue with, though I'm
unsure what the locking is to dump out other CPUs' queues - are you going
to global IPI and ask them to do it? That'd seem to cause a race to
refill (as you mention).
>> Could you elaborate on what the benefits were from this change in the
>> first place? Some page colouring thing on ia64? It seems to have way more
>> downside than upside to me.
>
> The original change was to try to allocate a higher-order page to
> service a batch-sized bulk request. This was in the hope that better
> physical contiguity would spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 21:59 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-27 22:49 ` Rohit Seth
2005-09-27 22:49 ` 2.6.14-rc2-mm1 Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Rohit Seth @ 2005-09-27 22:49 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
On Tue, 2005-09-27 at 14:59 -0700, Martin J. Bligh wrote:
> pcp->batch = max(1UL, batch/2);
> pcp->high = pcp->batch;
>
> would be more appropriate? Tradeoff is more frequent dump / fill against
> better frag, I suppose (at least if we don't refill using higher order
> allocs ;-)) which seems fair enough.
>
There are a couple of small changes, including this one, to this
initialization routine that I will be sending out.
> >
> > Not every time there is a request for higher-order pages. That surely
> > would defeat the purpose of pcps. But my suggestion is only to drain
> > when the global pool is not able to service the request. In the
> > pathological case where higher-order and zero-order requests are
> > alternating, you could have thrashing in terms of pages moving to the pcp
> > only to move back to the global list.
>
> OK, seems fair enough. But there are multiple "harder and harder" attempts
> within __alloc_pages to do that ... which one are you going for? Just
> before we OOM / fail the alloc? That'd be hard to argue with, though I'm
> unsure what the locking is to dump out other CPUs' queues - are you going
> to global IPI and ask them to do it? That'd seem to cause a race to
> refill (as you mention).
>
I'm thinking of initiating this drain operation after the swapper daemon
is woken up. Hopefully that will allow other pages to be put back
on the freelist and reduce the possible thrashing of pages between the
free memory pool and the pcps.
As a first step, I will be draining the local CPU's pcp. IPI or lazy
purging of pcps could be used as a very last resort to drain other
CPUs' pcps for the scenarios where nothing else has worked to get more
pages. For these extreme low memory conditions I'm not sure we
should worry about thrashing any more than about having free pages lying
around and not getting used.
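The escalation outlined here - local CPU first, the other CPUs only as a last resort - in a small userspace model (the array, pool, and function names are hypothetical stand-ins, not kernel interfaces):

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Userspace model of the drain escalation described above. */
static unsigned int pcp_pages[NR_CPUS_MODEL];	/* pages parked per CPU */
static unsigned int global_free;		/* shared free pool */

static void drain_cpu(int cpu)
{
	global_free += pcp_pages[cpu];
	pcp_pages[cpu] = 0;
}

/* Local CPU first; sweep the other CPUs (standing in for the IPI or
 * lazy purge) only while the request still cannot be met. */
static void drain_for_request(int this_cpu, unsigned int needed)
{
	drain_cpu(this_cpu);
	for (int cpu = 0; cpu < NR_CPUS_MODEL && global_free < needed; cpu++)
		if (cpu != this_cpu)
			drain_cpu(cpu);
}
```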
> >> Could you elaborate on what the benefits were from this change in the
> >> first place? Some page colouring thing on ia64? It seems to have way more
> >> downside than upside to me.
> >
> > The original change was to try to allocate a higher-order page to
> > service a batch-sized bulk request. This was in the hope that better
> > physical contiguity would spread the data better across big caches.
>
> OK ... but it has an impact on fragmentation. How much benefit are you
> getting?
>
The benefit is in terms of reduced performance variation (and expected
throughput) of certain workloads from run to run on the same kernel.
-rohit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 22:49 ` 2.6.14-rc2-mm1 Rohit Seth
@ 2005-09-27 22:49 ` Martin J. Bligh
2005-09-27 23:16 ` 2.6.14-rc2-mm1 Rohit Seth
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-27 22:49 UTC (permalink / raw)
To: Rohit Seth; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
>> > Not every time there is a request for higher-order pages. That surely
>> > would defeat the purpose of pcps. But my suggestion is only to drain
>> > when the global pool is not able to service the request. In the
>> > pathological case where higher-order and zero-order requests are
>> > alternating, you could have thrashing in terms of pages moving to the pcp
>> > only to move back to the global list.
>>
>> OK, seems fair enough. But there are multiple "harder and harder" attempts
>> within __alloc_pages to do that ... which one are you going for? Just
>> before we OOM / fail the alloc? That'd be hard to argue with, though I'm
>> unsure what the locking is to dump out other CPUs' queues - are you going
>> to global IPI and ask them to do it? That'd seem to cause a race to
>> refill (as you mention).
>>
>
> I'm thinking of initiating this drain operation after the swapper daemon
> is woken up. Hopefully that will allow other pages to be put back
> on the freelist and reduce the possible thrashing of pages between the
> free memory pool and the pcps.
OK, but waking up kswapd doesn't indicate a low memory condition.
It's standard procedure .... we'll have to wake it up whenever we dip
below the high watermarks. Perhaps before dropping into direct reclaim
would be more appropriate?
> As a first step, I will be draining the local CPU's pcp. IPI or lazy
> purging of pcps could be used as a very last resort to drain other
> CPUs' pcps for the scenarios where nothing else has worked to get more
> pages. For these extreme low memory conditions I'm not sure we
> should worry about thrashing any more than about having free pages lying
> around and not getting used.
Sounds fair.
>> >> Could you elaborate on what the benefits were from this change in the
>> >> first place? Some page colouring thing on ia64? It seems to have way more
>> >> downside than upside to me.
>> >
>> > The original change was to try to allocate a higher-order page to
>> > service a batch-sized bulk request. This was in the hope that better
>> > physical contiguity would spread the data better across big caches.
>>
>> OK ... but it has an impact on fragmentation. How much benefit are you
>> getting?
>
> The benefit is in terms of reduced performance variation (and expected
> throughput) of certain workloads from run to run on the same kernel.
Mmmm. how much are you talking about in terms of throughput, and on what
platforms? all previous attempts to measure page colouring seemed to
indicate it did nothing at all - maybe some specific types of h/w are
more susceptible?
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1
2005-09-27 22:49 ` 2.6.14-rc2-mm1 Martin J. Bligh
@ 2005-09-27 23:16 ` Rohit Seth
0 siblings, 0 replies; 37+ messages in thread
From: Rohit Seth @ 2005-09-27 23:16 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Mattia Dongili, linux-kernel
On Tue, 2005-09-27 at 15:49 -0700, Martin J. Bligh wrote:
> >
> > I'm thinking of initiating this drain operation after the swapper daemon
> > is woken up. Hopefully that will allow other pages to be put back
> > on the freelist and reduce the possible thrashing of pages between the
> > free memory pool and the pcps.
>
> OK, but waking up kswapd doesn't indicate a low memory condition.
> It's standard procedure .... we'll have to wake it up whenever we dip
> below the high watermarks. Perhaps before dropping into direct reclaim
> would be more appropriate?
>
Agreed. That is a better place.
> >> >> Could you elaborate on what the benefits were from this change in the
> >> >> first place? Some page colouring thing on ia64? It seems to have way more
> >> >> downside than upside to me.
> >> >
> >> > The original change was to try to allocate a higher-order page to
> >> > service a batch-sized bulk request. This was in the hope that better
> >> > physical contiguity would spread the data better across big caches.
> >>
> >> OK ... but it has an impact on fragmentation. How much benefit are you
> >> getting?
> >
> > The benefit is in terms of reduced performance variation (and expected
> > throughput) of certain workloads from run to run on the same kernel.
>
> Mmmm. how much are you talking about in terms of throughput, and on what
> platforms? all previous attempts to measure page colouring seemed to
> indicate it did nothing at all - maybe some specific types of h/w are
> more susceptible?
>
In terms of percentages, between 10-15% variation. Nothing out of the
ordinary about the platforms. Do you remember what workloads were run in
the previous attempts to see if there is any coloring effect? I agree that
with the 2.6.x-based kernel, there is a better handle on the variation (as
compared to 2.4). And the best results of 2.6 match the best results
of any coloring patch.
-rohit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-09-22 5:28 2.6.14-rc2-mm1 Andrew Morton
` (5 preceding siblings ...)
2005-09-24 17:58 ` 2.6.14-rc2-mm1 Mattia Dongili
@ 2005-09-27 7:13 ` Reuben Farrelly
2005-09-27 7:44 ` Andrew Morton
6 siblings, 1 reply; 37+ messages in thread
From: Reuben Farrelly @ 2005-09-27 7:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, netfilter-devel
Hi again,
On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
>
> - Various random other things - nothing major.
Just noticed this oops from about 4am this morning. This would have been at
about the time when the normal daily cronjobs are run, but shouldn't have been
doing much else.
Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
Sep 27 04:04:29 tornado kernel: [<c02a780d>] sys_send+0x36/0x38
Sep 27 04:04:29 tornado kernel: [<c02a7ef7>] sys_socketcall+0x134/0x251
Sep 27 04:04:29 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:04:29 tornado kernel: Mem-info:
Sep 27 04:04:29 tornado kernel: DMA per-cpu:
Sep 27 04:04:29 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:04:30 tornado kernel: Normal per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:346
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:324
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:112
Sep 27 04:04:30 tornado kernel: HighMem per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:38
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:27
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:36
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:5
Sep 27 04:04:30 tornado kernel: Free pages: 38404kB (2720kB HighMem)
Sep 27 04:04:31 tornado kernel: Active:139410 inactive:49515 dirty:135
writeback:1 unstable:0 free:9601 slab:54525 mapped:88304 pagetables:776
Sep 27 04:04:31 tornado kernel: DMA free:5828kB min:68kB low:84kB high:100kB
active:100kB inactive:944kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: Normal free:29856kB min:3756kB low:4692kB
high:5632kB active:446760kB inactive:188768kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:04:32 tornado kernel: HighMem free:2720kB min:128kB low:160kB
high:192kB active:110784kB inactive:8344kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:32 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:04:32 tornado kernel: DMA: 803*4kB 167*8kB 50*16kB 15*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5828kB
Sep 27 04:04:32 tornado kernel: DMA32: empty
Sep 27 04:04:32 tornado kernel: Normal: 6744*4kB 360*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 29856kB
Sep 27 04:04:32 tornado kernel: HighMem: 654*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2720kB
Sep 27 04:04:32 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:04:32 tornado kernel: Free swap = 497820kB
Sep 27 04:04:32 tornado kernel: Total swap = 497936kB
Sep 27 04:04:32 tornado kernel: Free swap: 497820kB
Sep 27 04:04:32 tornado kernel: 261679 pages of RAM
Sep 27 04:04:32 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:04:32 tornado kernel: 3160 reserved pages
Sep 27 04:04:32 tornado kernel: 160186 pages shared
Sep 27 04:04:32 tornado kernel: 0 pages swap cached
Sep 27 04:04:33 tornado kernel: 135 pages dirty
Sep 27 04:04:33 tornado kernel: 1 pages writeback
Sep 27 04:04:33 tornado kernel: 88304 pages mapped
Sep 27 04:04:33 tornado kernel: 54527 pages slab
Sep 27 04:04:33 tornado kernel: 776 pages pagetables
Sep 27 04:04:59 tornado kernel: smtpd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:59 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:59 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:59 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:59 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:59 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:59 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:59 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:59 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:59 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:59 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:59 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:59 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:05:00 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:05:00 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:05:00 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:05:00 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:05:01 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:05:01 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:05:01 tornado kernel: [<c02a69c6>] sock_aio_write+0xbd/0xf6
Sep 27 04:05:01 tornado kernel: [<c0159767>] do_sync_write+0xbb/0x10a
Sep 27 04:05:01 tornado kernel: [<c01598f7>] vfs_write+0x141/0x148
Sep 27 04:05:02 tornado kernel: [<c015999f>] sys_write+0x3d/0x64
Sep 27 04:05:02 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:05:02 tornado kernel: Mem-info:
Sep 27 04:05:02 tornado kernel: DMA per-cpu:
Sep 27 04:05:02 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:4
Sep 27 04:05:02 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:02 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:05:03 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:03 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:05:03 tornado kernel: Normal per-cpu:
Sep 27 04:05:03 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:23
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:05:04 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:383
Sep 27 04:05:04 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:120
Sep 27 04:05:04 tornado kernel: HighMem per-cpu:
Sep 27 04:05:04 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:89
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:3
Sep 27 04:05:05 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:5
Sep 27 04:05:05 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:27
Sep 27 04:05:05 tornado kernel: Free pages: 39608kB (2144kB HighMem)
Sep 27 04:05:05 tornado kernel: Active:132565 inactive:56281 dirty:100
writeback:1 unstable:0 free:9902 slab:54546 mapped:88341 pagetables:776
Sep 27 04:05:05 tornado kernel: DMA free:4704kB min:68kB low:84kB high:100kB
active:224kB inactive:948kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: Normal free:32760kB min:3756kB low:4692kB
high:5632kB active:418168kB inactive:216412kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:05:05 tornado kernel: HighMem free:2144kB min:128kB low:160kB
high:192kB active:111868kB inactive:7764kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:05:05 tornado kernel: DMA: 936*4kB 108*8kB 6*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4704kB
Sep 27 04:05:05 tornado kernel: DMA32: empty
Sep 27 04:05:05 tornado kernel: Normal: 7484*4kB 349*8kB 2*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32760kB
Sep 27 04:05:05 tornado kernel: HighMem: 510*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2144kB
Sep 27 04:05:05 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:05:05 tornado kernel: Free swap = 497820kB
Sep 27 04:05:05 tornado kernel: Total swap = 497936kB
Sep 27 04:05:05 tornado kernel: Free swap: 497820kB
Sep 27 04:05:05 tornado kernel: 261679 pages of RAM
Sep 27 04:05:05 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:05:05 tornado kernel: 3160 reserved pages
Sep 27 04:05:06 tornado kernel: 165825 pages shared
Sep 27 04:05:06 tornado kernel: 0 pages swap cached
Sep 27 04:05:06 tornado kernel: 100 pages dirty
Sep 27 04:05:06 tornado kernel: 1 pages writeback
Sep 27 04:05:06 tornado kernel: 88341 pages mapped
Sep 27 04:05:06 tornado kernel: 54546 pages slab
Sep 27 04:05:06 tornado kernel: 776 pages pagetables
reuben
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-09-27 7:13 ` 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?) Reuben Farrelly
@ 2005-09-27 7:44 ` Andrew Morton
2005-09-27 18:59 ` Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2005-09-27 7:44 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: linux-kernel, netfilter-devel, Seth, Rohit
Reuben Farrelly <reuben-lkml@reub.net> wrote:
>
> On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
> >
> > - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
> >
> > - Various random other things - nothing major.
>
> Just noticed this oops from about 4am this morning. This would have been at
> about the time when the normal daily cronjobs are run, but shouldn't have been
> doing much else.
>
>
> Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
> mode:0x80000020
> Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
> Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
> Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
> Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
> Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
> Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
> Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
> Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
> Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
> Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
> Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
> Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
> Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
> Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
> Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
> Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
> Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
> Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
> Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
> Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
No, this is simply a warning - the kernel ran out of 1-order pages in the
page allocator. There have been several reports of this after
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
which was rather expected.
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-09-27 7:44 ` Andrew Morton
@ 2005-09-27 18:59 ` Martin J. Bligh
2005-10-02 17:13 ` Paul Jackson
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-09-27 18:59 UTC (permalink / raw)
To: Andrew Morton, Reuben Farrelly; +Cc: linux-kernel, netfilter-devel, Seth, Rohit
> No, this is simply a warning - the kernel ran out of 1-order pages in the
> page allocator. There have been several reports of this after
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
> which was rather expected.
>
> I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> which address fragmentation at this level. If that code gets there then we
> can take another look at
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-09-27 18:59 ` Martin J. Bligh
@ 2005-10-02 17:13 ` Paul Jackson
2005-10-02 21:31 ` Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Paul Jackson @ 2005-10-02 17:13 UTC (permalink / raw)
To: Martin J. Bligh
Cc: akpm, reuben-lkml, linux-kernel, netfilter-devel, rohit.seth
Martin, responding to Andrew:
> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> > which address fragmentation at this level. If that code gets there then we
> > can take another look at
> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
>
> Me no understand. We're going to deliberately cause fragmentation in order
> to defragment it again later ???
I thought that the patches of Mel Gorman and Joel Schopp were reducing
fragmentation, not causing it.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-10-02 17:13 ` Paul Jackson
@ 2005-10-02 21:31 ` Martin J. Bligh
2005-10-03 17:20 ` Rohit Seth
0 siblings, 1 reply; 37+ messages in thread
From: Martin J. Bligh @ 2005-10-02 21:31 UTC (permalink / raw)
To: Paul Jackson; +Cc: akpm, linux-kernel, rohit.seth
--Paul Jackson <pj@sgi.com> wrote (on Sunday, October 02, 2005 10:13:19 -0700):
> Martin, responding to Andrew:
>> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
>> > which address fragmentation at this level. If that code gets there then we
>> > can take another look at
>> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
>>
>> Me no understand. We're going to deliberately cause fragmentation in order
>> to defragment it again later ???
>
> I thought that the patches of Mel Gorman and Joel Schopp were reducing
> fragmentation, not causing it.
They were, but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
seems to be going in the opposite direction.
M.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-10-02 21:31 ` Martin J. Bligh
@ 2005-10-03 17:20 ` Rohit Seth
2005-10-03 17:56 ` Martin J. Bligh
0 siblings, 1 reply; 37+ messages in thread
From: Rohit Seth @ 2005-10-03 17:20 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Paul Jackson, akpm, linux-kernel
On Sun, 2005-10-02 at 14:31 -0700, Martin J. Bligh wrote:
>
> --Paul Jackson <pj@sgi.com> wrote (on Sunday, October 02, 2005 10:13:19 -0700):
>
> > Martin, responding to Andrew:
> >> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> >> > which address fragmentation at this level. If that code gets there then we
> >> > can take another look at
> >> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
> >>
> >> Me no understand. We're going to deliberately cause fragmentation in order
> >> to defragment it again later ???
> >
> > I thought that the patches of Mel Gorman and Joel Schopp were reducing
> > fragmentation, not causing it.
>
> They were. but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
> seems to be going in the opposite direction.
The mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
allocate more physically contiguous pages for the pcp lists. This causes
some extra fragmentation at the higher orders, but has the potential
benefit of spreading more uniformly across caches. I agree, though, that
for this scheme to work nicely we should have the capability of draining
the pcps so that higher-order requests can be serviced whenever possible.
-rohit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.14-rc2-mm1 (Oops, possibly Netfilter related?)
2005-10-03 17:20 ` Rohit Seth
@ 2005-10-03 17:56 ` Martin J. Bligh
0 siblings, 0 replies; 37+ messages in thread
From: Martin J. Bligh @ 2005-10-03 17:56 UTC (permalink / raw)
To: Rohit Seth; +Cc: Paul Jackson, akpm, linux-kernel
> The mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
> allocate more physically contiguous pages for the pcp lists. This causes
> some extra fragmentation at the higher orders, but has the potential
> benefit of spreading more uniformly across caches. I agree, though, that
> for this scheme to work nicely we should have the capability of draining
> the pcps so that higher-order requests can be serviced whenever possible.
Unfortunately, I don't think it's that simple. We'll end up taking the
higher-order elements from the buddy allocator into the caches and using
them all piecemeal - i.e. fragmenting it all.
If we take lists of order-0 pages from the buddy allocator, we use up
whatever dross was left over in there (from a fragmentation point of view)
first, before breaking into the more precious stuff (physically contiguous
blocks). That was why I wrote it that way in the first place - it wasn't
accidental ;-)
From the direction the thread was going in previously, it sounded like
you were finding other ways to alleviate the colouring issue you were
seeing ... I was hoping that would fix things up enough that the desire
for higher-order allocations would disappear.
To be blunt about it ... making sure that we don't fall over on
higher-order allocs seems to me to be more important than a bit of
variability in benchmark runs ...
M.
^ permalink raw reply [flat|nested] 37+ messages in thread