* mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
@ 2010-03-04 21:21 Greg Thelen
  2010-03-05  3:21 ` Johannes Weiner
  0 siblings, 1 reply; 48+ messages in thread
From: Greg Thelen @ 2010-03-04 21:21 UTC
  To: linux-mm

On several systems I am seeing a boot panic if I use mmotm
(stamp-2010-03-02-18-38).  If I remove
bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
find that:
* 2.6.33 boots fine.
* 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
* 2.6.33 + mmotm (including
bootmem-avoid-dma32-zone-by-default.patch): panics.
Note: I had to enable earlyprintk to see the panic.  Without
earlyprintk no console output was seen.  The system appeared to hang
after the loader.

Here's the panic seen with earlyprintk using 2.6.33 + mmotm:

Starting up ...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.33-mm1+
(gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu
4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
[    0.000000] Command line:
root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0
console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
[    0.000000]  BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
[    0.000000] PAT not supported by CPU.
[    0.000000] CPU MTRRs all blank - virtualized system.
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] modified physical RAM map:
[    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
[    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
[    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
[    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
[    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
[    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
[    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
[    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
[    0.000000] RAMDISK: 0fd9d000 - 0ffdf539
[    0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU  )
[    0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU   QEMURSDT
00000001 QEMU 00000001)
[    0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU   QEMUFACP
00000001 QEMU 00000001)
[    0.000000] ACPI: DSDT 000000000fff0100 0089D (v01   BXPC   BXDSDT
00000001 INTL 20061109)
[    0.000000] ACPI: FACS 000000000fff00c0 00040
[    0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU   QEMUAPIC
00000001 QEMU 00000001)
[    0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU   QEMUSSDT
00000001 QEMU 00000001)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-000000000fff0000
[    0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
[    0.000000]   NODE_DATA [0000000001c4e040 - 0000000001c5303f]
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
[    0.000000] PGD 0
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] last sysfs file:
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
[    0.000000] RIP: 0010:[<ffffffff81b0f5f7>]  [<ffffffff81b0f5f7>]
memory_present+0x9a/0xbf
[    0.000000] RSP: 0000:ffffffff81a01e18  EFLAGS: 00010046
[    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[    0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
[    0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
[    0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff81adf000(0000)
knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000,
task ffffffff81a10020)
[    0.000000] Stack:
[    0.000000]  000000000fff0000 000000000000009f 0000000000000000
0000000000000000
[    0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000
0000000000000000
[    0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88
000000000fff0000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81b0dd0e>]
sparse_memory_present_with_active_regions+0x31/0x47
[    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
[    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
[    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
[    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
[    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
[    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
[    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
[    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
[    0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16
c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8
85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65
c8 72
[    0.000000] RIP  [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
[    0.000000]  RSP <ffffffff81a01e18>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Pid: 0, comm: swapper Tainted: G      D    2.6.33-mm1+ #1
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8103c78c>] panic+0x9e/0x113
[    0.000000]  [<ffffffff8103d3d6>] ? printk+0x67/0x69
[    0.000000]  [<ffffffff8105914e>] ? blocking_notifier_call_chain+0xf/0x11
[    0.000000]  [<ffffffff8103f8b4>] do_exit+0x78/0x70f
[    0.000000]  [<ffffffff8103ca2f>] ? spin_unlock_irqrestore+0x9/0xb
[    0.000000]  [<ffffffff8103dcde>] ? kmsg_dump+0x112/0x138
[    0.000000]  [<ffffffff81530061>] oops_end+0xb2/0xba
[    0.000000]  [<ffffffff810258d3>] no_context+0x1f5/0x204
[    0.000000]  [<ffffffff81025b1b>] __bad_area_nosemaphore+0x17f/0x1a2
[    0.000000]  [<ffffffff81025bb4>] bad_area_nosemaphore+0xe/0x10
[    0.000000]  [<ffffffff81531e36>] do_page_fault+0x122/0x24c
[    0.000000]  [<ffffffff8152f59f>] page_fault+0x1f/0x30
[    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
[    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
[    0.000000]  [<ffffffff81b0dd0e>]
sparse_memory_present_with_active_regions+0x31/0x47
[    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
[    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
[    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
[    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
[    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
[    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
[    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
[    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed

The kernel was built with 'make mrproper && make defconfig && make
ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
I can provide more diagnostics.

--
Greg

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-04 21:21 mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Greg Thelen
@ 2010-03-05  3:21 ` Johannes Weiner
  2010-03-05  5:00   ` Yinghai Lu
                     ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Johannes Weiner @ 2010-03-05  3:21 UTC
  To: Greg Thelen; +Cc: Yinghai Lu, linux-mm

Hello Greg,

On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
> On several systems I am seeing a boot panic if I use mmotm
> (stamp-2010-03-02-18-38).  If I remove
> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
> find that:
> * 2.6.33 boots fine.
> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
> * 2.6.33 + mmotm (including
> bootmem-avoid-dma32-zone-by-default.patch): panics.
> Note: I had to enable earlyprintk to see the panic.  Without
> earlyprintk no console output was seen.  The system appeared to hang
> after the loader.

Thanks for your report.  A few notes below.

> Here's the panic seen with earlyprintk using 2.6.33 + mmotm:
> 
> Starting up ...
> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 2.6.33-mm1+
> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu
> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
> [    0.000000] Command line:
> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0
> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
> [    0.000000]  BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
> [    0.000000] bootconsole [earlyser0] enabled
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI 2.4 present.
> [    0.000000] No AGP bridge found
> [    0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
> [    0.000000] PAT not supported by CPU.
> [    0.000000] CPU MTRRs all blank - virtualized system.
> [    0.000000] Scanning 1 areas for low memory corruption
> [    0.000000] modified physical RAM map:
> [    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
> [    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
> [    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
> [    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
> [    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
> [    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
> [    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
> [    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000

256MB of memory, right?
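
The arithmetic behind that estimate can be checked with a small userspace sketch.  It assumes 4 KiB pages (a PAGE_SHIFT of 12, as on x86_64); the helper names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86_64 */

/* Physical address at which the page with frame number 'pfn' starts. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Number of bytes by which the address 'end' falls short of 256 MiB. */
static uint64_t short_of_256m(uint64_t end)
{
	return (256ULL << 20) - end;
}
```

With last_pfn = 0xfff0 from the log, pfn_to_phys() gives 0x0fff0000, the end of the usable range, and short_of_256m() gives 0x10000: the box is 64 KiB short of 256 MiB, which matches the size of the ACPI data range that follows.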

> [    0.000000] RAMDISK: 0fd9d000 - 0ffdf539
> [    0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU  )
> [    0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU   QEMURSDT
> 00000001 QEMU 00000001)
> [    0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU   QEMUFACP
> 00000001 QEMU 00000001)
> [    0.000000] ACPI: DSDT 000000000fff0100 0089D (v01   BXPC   BXDSDT
> 00000001 INTL 20061109)
> [    0.000000] ACPI: FACS 000000000fff00c0 00040
> [    0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU   QEMUAPIC
> 00000001 QEMU 00000001)
> [    0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU   QEMUSSDT
> 00000001 QEMU 00000001)
> [    0.000000] No NUMA configuration found
> [    0.000000] Faking a node at 0000000000000000-000000000fff0000
> [    0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
> [    0.000000]   NODE_DATA [0000000001c4e040 - 0000000001c5303f]
> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
> [    0.000000] IP: [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
> [    0.000000] PGD 0
> [    0.000000] Oops: 0000 [#1] SMP
> [    0.000000] last sysfs file:
> [    0.000000] CPU 0
> [    0.000000] Modules linked in:
> [    0.000000]
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
> [    0.000000] RIP: 0010:[<ffffffff81b0f5f7>]  [<ffffffff81b0f5f7>]
> memory_present+0x9a/0xbf
> [    0.000000] RSP: 0000:ffffffff81a01e18  EFLAGS: 00010046
> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> [    0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
> [    0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
> [    0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
> [    0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81adf000(0000)
> knlGS:0000000000000000
> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000,
> task ffffffff81a10020)
> [    0.000000] Stack:
> [    0.000000]  000000000fff0000 000000000000009f 0000000000000000
> 0000000000000000
> [    0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000
> 0000000000000000
> [    0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88
> 000000000fff0000
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff81b0dd0e>]
> sparse_memory_present_with_active_regions+0x31/0x47
> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
> [    0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16
> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8
> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65
> c8 72
> [    0.000000] RIP  [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
> [    0.000000]  RSP <ffffffff81a01e18>
> [    0.000000] CR2: 0000000000000000
> [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] Pid: 0, comm: swapper Tainted: G      D    2.6.33-mm1+ #1
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff8103c78c>] panic+0x9e/0x113
> [    0.000000]  [<ffffffff8103d3d6>] ? printk+0x67/0x69
> [    0.000000]  [<ffffffff8105914e>] ? blocking_notifier_call_chain+0xf/0x11
> [    0.000000]  [<ffffffff8103f8b4>] do_exit+0x78/0x70f
> [    0.000000]  [<ffffffff8103ca2f>] ? spin_unlock_irqrestore+0x9/0xb
> [    0.000000]  [<ffffffff8103dcde>] ? kmsg_dump+0x112/0x138
> [    0.000000]  [<ffffffff81530061>] oops_end+0xb2/0xba
> [    0.000000]  [<ffffffff810258d3>] no_context+0x1f5/0x204
> [    0.000000]  [<ffffffff81025b1b>] __bad_area_nosemaphore+0x17f/0x1a2
> [    0.000000]  [<ffffffff81025bb4>] bad_area_nosemaphore+0xe/0x10
> [    0.000000]  [<ffffffff81531e36>] do_page_fault+0x122/0x24c
> [    0.000000]  [<ffffffff8152f59f>] page_fault+0x1f/0x30
> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
> [    0.000000]  [<ffffffff81b0dd0e>]
> sparse_memory_present_with_active_regions+0x31/0x47
> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
> 
> The kernel was built with 'make mrproper && make defconfig && make
> ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
> I can provide more diagnostics.

Okay, if you did defconfig and just hit enter to all questions, you
should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled.  This means that
the 'mem_section' is an array of pointers and the following happens in
memory_present():

	for_one_pfn_in_each_section() {
		sparse_index_init(); /* no return value check */
		ms = __nr_to_section();
		if (!ms->section_mem_map) /* bang */
			...;
	}

where sparse_index_init(), in the SPARSEMEM_EXTREME case, allocates
the mem_section descriptor with bootmem.  If that allocation failed
with real bootmem, the box would panic right there, but NO_BOOTMEM
does not seem to get this right.
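
To make the failure mode concrete, here is a userspace sketch of an array of pointers whose entries are allocated on first use.  Everything in it is a simplified model: the names, the table sizes and the fail_alloc switch are hypothetical and only mimic the shape of the pseudocode above.  The last function is the checked variant of the loop body, which reports the failure instead of dereferencing NULL.

```c
#include <stddef.h>
#include <stdlib.h>

#define SECTIONS_PER_ROOT 4	/* hypothetical, kept small for illustration */
#define NR_ROOTS          8	/* hypothetical */

struct mem_section_model {
	unsigned long section_mem_map;
};

/* Array of pointers: each root is allocated on first use. */
static struct mem_section_model *roots[NR_ROOTS];

/* Set to 1 to make the early allocator stand-in return NULL. */
static int fail_alloc;

static void *early_alloc_model(size_t size)
{
	return fail_alloc ? NULL : calloc(1, size);
}

/* Returns 0 on success, -1 if the root could not be allocated. */
static int index_init_model(unsigned long nr)
{
	unsigned long root = nr / SECTIONS_PER_ROOT;

	if (root >= NR_ROOTS)
		return -1;
	if (roots[root])
		return 0;
	roots[root] = early_alloc_model(SECTIONS_PER_ROOT * sizeof(**roots));
	return roots[root] ? 0 : -1;
}

/* Returns NULL when the root for section 'nr' was never allocated. */
static struct mem_section_model *nr_to_section_model(unsigned long nr)
{
	unsigned long root = nr / SECTIONS_PER_ROOT;

	if (root >= NR_ROOTS || !roots[root])
		return NULL;
	return &roots[root][nr % SECTIONS_PER_ROOT];
}

/* Checked loop body: bail out instead of dereferencing a NULL section. */
static int mark_present_model(unsigned long nr)
{
	struct mem_section_model *ms;

	if (index_init_model(nr) < 0)
		return -1;
	ms = nr_to_section_model(nr);	/* non-NULL after a successful init */
	if (!ms->section_mem_map)
		ms->section_mem_map = 1;
	return 0;
}
```

Without the return value check in mark_present_model(), a failed allocation leaves the root pointer NULL and the ms->section_mem_map access faults at address 0, which is consistent with the CR2 value in the oops.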

Greg, could you retry _with_ my bootmem patch applied, but with setting
CONFIG_NO_BOOTMEM=n up front?

I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?

1. It does not seem to handle the goal appropriately: bootmem falls
back to trying without the goal if the goal does not make sense.  And
in this case, the goal is 4G (above DMA32) and the amount of memory is
256M.

And if I did not miss something, this is the difference my patch makes:
without it, the default goal is 16M, which is no problem as it is well
within your available memory.  But changing the default goal moved it
outside the available memory, which the bootmem replacement cannot
handle.
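
The fallback behaviour described in point 1 can be sketched in userspace as follows.  This is a toy model under stated assumptions: a single free range, placement only (no reservation tracking), and hypothetical names throughout; it is not the bootmem or NO_BOOTMEM code.

```c
#include <stdint.h>

#define ALLOC_FAIL UINT64_MAX	/* sentinel: no suitable address found */

/* One free physical range [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Lowest address >= goal inside the range that satisfies size and align.
 * 'align' must be a power of two.
 */
static uint64_t alloc_at_or_above(const struct free_range *r, uint64_t size,
				  uint64_t align, uint64_t goal)
{
	uint64_t base = r->start > goal ? r->start : goal;
	uint64_t addr = (base + align - 1) & ~(align - 1);

	if (addr < base)			/* rounding up overflowed */
		return ALLOC_FAIL;
	if (addr >= r->end || r->end - addr < size)
		return ALLOC_FAIL;
	return addr;
}

/* Try the goal first; if that fails, retry once without a goal. */
static uint64_t alloc_with_goal_fallback(const struct free_range *r,
					 uint64_t size, uint64_t align,
					 uint64_t goal)
{
	uint64_t addr = alloc_at_or_above(r, size, align, goal);

	if (addr == ALLOC_FAIL && goal)
		addr = alloc_at_or_above(r, size, align, 0);
	return addr;
}
```

For a range like the usable one in the log (0x100000 to 0x0fff0000), a 16M goal is satisfied directly, a 4G goal fails in alloc_at_or_above(), and only the fallback variant still finds memory in that case.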

2. The early reservation code seems to return NULL, but call sites
assume that the bootmem interface never does that.  Okay, the result is
the same: we crash.  But it still moves error reporting to a possibly
much later point where somebody actually dereferences the returned
pointer.
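
A sketch of what reporting at the allocation site could look like, again in userspace and with hypothetical names.  The panic stand-in only records and prints so that the sketch can be exercised; a real panic would not return.

```c
#include <stdio.h>
#include <stdlib.h>

static int simulate_failure;	/* set to 1 to make the raw allocator fail */
static int panic_count;		/* how often the panic stand-in fired */

/* Stand-in for a raw early allocator that may return NULL. */
static void *raw_early_alloc(size_t size)
{
	return simulate_failure ? NULL : calloc(1, size);
}

/* Stand-in for panic(): records and prints instead of halting. */
static void panic_model(const char *what, size_t size)
{
	panic_count++;
	fprintf(stderr, "early allocation of %zu bytes for %s failed\n",
		size, what);
}

/* Wrapper that reports a failure where it happens, naming the requester. */
static void *early_alloc_or_panic(size_t size, const char *what)
{
	void *p = raw_early_alloc(size);

	if (!p)
		panic_model(what, size);
	return p;
}
```

The point of the wrapper is that the message names the failed request and its size, instead of leaving a bare NULL dereference to be decoded from an oops much later.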

	Hannes


* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  3:21 ` Johannes Weiner
@ 2010-03-05  5:00   ` Yinghai Lu
  2010-03-05  5:14     ` Yinghai Lu
  2010-03-05  5:17   ` Greg Thelen
  2010-03-05  9:04     ` Yinghai Lu
  2 siblings, 1 reply; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05  5:00 UTC
  To: Johannes Weiner; +Cc: Greg Thelen, linux-mm

On 03/04/2010 07:21 PM, Johannes Weiner wrote:
> Hello Greg,
> 
> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>> On several systems I am seeing a boot panic if I use mmotm
>> (stamp-2010-03-02-18-38).  If I remove
>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>> find that:
>> * 2.6.33 boots fine.
>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>> * 2.6.33 + mmotm (including
>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> Note: I had to enable earlyprintk to see the panic.  Without
>> earlyprintk no console output was seen.  The system appeared to hang
>> after the loader.
> 
> Thanks for your report.  A few notes below.
> 
>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm:
>>
>> Starting up ...
>> [    0.000000] Initializing cgroup subsys cpuset
>> [    0.000000] Initializing cgroup subsys cpu
>> [    0.000000] Linux version 2.6.33-mm1+
>> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu
>> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
>> [    0.000000] Command line:
>> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0
>> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
>> [    0.000000] BIOS-provided physical RAM map:
>> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>> [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>> [    0.000000]  BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
>> [    0.000000]  BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
>> [    0.000000]  BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
>> [    0.000000]  BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
>> [    0.000000] bootconsole [earlyser0] enabled
>> [    0.000000] NX (Execute Disable) protection: active
>> [    0.000000] DMI 2.4 present.
>> [    0.000000] No AGP bridge found
>> [    0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
>> [    0.000000] PAT not supported by CPU.
>> [    0.000000] CPU MTRRs all blank - virtualized system.
>> [    0.000000] Scanning 1 areas for low memory corruption
>> [    0.000000] modified physical RAM map:
>> [    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
>> [    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
>> [    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
>> [    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
>> [    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
>> [    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
>> [    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
>> [    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
> 
> 256MB of memory, right?
> 
>> [    0.000000] RAMDISK: 0fd9d000 - 0ffdf539
>> [    0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU  )
>> [    0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU   QEMURSDT
>> 00000001 QEMU 00000001)
>> [    0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU   QEMUFACP
>> 00000001 QEMU 00000001)
>> [    0.000000] ACPI: DSDT 000000000fff0100 0089D (v01   BXPC   BXDSDT
>> 00000001 INTL 20061109)
>> [    0.000000] ACPI: FACS 000000000fff00c0 00040
>> [    0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU   QEMUAPIC
>> 00000001 QEMU 00000001)
>> [    0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU   QEMUSSDT
>> 00000001 QEMU 00000001)
>> [    0.000000] No NUMA configuration found
>> [    0.000000] Faking a node at 0000000000000000-000000000fff0000
>> [    0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
>> [    0.000000]   NODE_DATA [0000000001c4e040 - 0000000001c5303f]
>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [    0.000000] IP: [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
>> [    0.000000] PGD 0
>> [    0.000000] Oops: 0000 [#1] SMP
>> [    0.000000] last sysfs file:
>> [    0.000000] CPU 0
>> [    0.000000] Modules linked in:
>> [    0.000000]
>> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
>> [    0.000000] RIP: 0010:[<ffffffff81b0f5f7>]  [<ffffffff81b0f5f7>]
>> memory_present+0x9a/0xbf
>> [    0.000000] RSP: 0000:ffffffff81a01e18  EFLAGS: 00010046
>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>> [    0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
>> [    0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
>> [    0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
>> [    0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81adf000(0000)
>> knlGS:0000000000000000
>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000,
>> task ffffffff81a10020)
>> [    0.000000] Stack:
>> [    0.000000]  000000000fff0000 000000000000009f 0000000000000000
>> 0000000000000000
>> [    0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000
>> 0000000000000000
>> [    0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88
>> 000000000fff0000
>> [    0.000000] Call Trace:
>> [    0.000000]  [<ffffffff81b0dd0e>]
>> sparse_memory_present_with_active_regions+0x31/0x47
>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
>> [    0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16
>> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8
>> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65
>> c8 72
>> [    0.000000] RIP  [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
>> [    0.000000]  RSP <ffffffff81a01e18>
>> [    0.000000] CR2: 0000000000000000
>> [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>> [    0.000000] Pid: 0, comm: swapper Tainted: G      D    2.6.33-mm1+ #1
>> [    0.000000] Call Trace:
>> [    0.000000]  [<ffffffff8103c78c>] panic+0x9e/0x113
>> [    0.000000]  [<ffffffff8103d3d6>] ? printk+0x67/0x69
>> [    0.000000]  [<ffffffff8105914e>] ? blocking_notifier_call_chain+0xf/0x11
>> [    0.000000]  [<ffffffff8103f8b4>] do_exit+0x78/0x70f
>> [    0.000000]  [<ffffffff8103ca2f>] ? spin_unlock_irqrestore+0x9/0xb
>> [    0.000000]  [<ffffffff8103dcde>] ? kmsg_dump+0x112/0x138
>> [    0.000000]  [<ffffffff81530061>] oops_end+0xb2/0xba
>> [    0.000000]  [<ffffffff810258d3>] no_context+0x1f5/0x204
>> [    0.000000]  [<ffffffff81025b1b>] __bad_area_nosemaphore+0x17f/0x1a2
>> [    0.000000]  [<ffffffff81025bb4>] bad_area_nosemaphore+0xe/0x10
>> [    0.000000]  [<ffffffff81531e36>] do_page_fault+0x122/0x24c
>> [    0.000000]  [<ffffffff8152f59f>] page_fault+0x1f/0x30
>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
>> [    0.000000]  [<ffffffff81b0dd0e>]
>> sparse_memory_present_with_active_regions+0x31/0x47
>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
>>
>> The kernel was built with 'make mrproper && make defconfig && make
>> ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
>> I can provide more diagnostics.
> 
> Okay, if you did defconfig and just hit enter to all questions, you
> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled.  This means that
> the 'mem_section' is an array of pointers and the following happens in
> memory_present():
> 
> 	for_one_pfn_in_each_section() {
> 		sparse_index_init(); /* no return value check */
> 		ms = __nr_to_section();
> 		if (!ms->section_mem_map) /* bang */
> 			...;
> 	}
> 
> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> the mem_section descriptor with bootmem.  If this would fail, the box
> would panic immediately earlier, but NO_BOOTMEM does not seem to get it
> right.
> 
> Greg, could you retry _with_ my bootmem patch applied, but with setting
> CONFIG_NO_BOOTMEM=n up front?
> 
> I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?
> 
> 1. It does not seem to handle goal appropriately: bootmem would try
> without the goal if it does not make sense.  And in this case, the
> goal is 4G (above DMA32) and the amount of memory is 256M.
> 
> And if I did not miss something, this is the difference with my patch:
> without it, the default goal is 16M, which is no problem as it is well
> within your available memory.  But the change of the default goal moved
> it outside it which the bootmem replacement can not handle.
> 
> 2. The early reservation stuff seems to return NULL but callsites assume
> that the bootmem interface never does that.  Okay, the result is the same,
> we crash.  But it still moves error reporting to a possibly much later
> point where somebody actually dereferences the returned pointer.

A related change could be in __alloc_bootmem_node_high():

void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
                                   unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
        unsigned long end_pfn;

        if (WARN_ON_ONCE(slab_is_available()))
                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

        /* update goal according ...MAX_DMA32_PFN */
        end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

        if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
            (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
                void *ptr; 
                unsigned long new_goal;
                                
                new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
                ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
                                                 new_goal, -1ULL);
#else
                ptr = alloc_bootmem_core(pgdat->bdata, size, align,
                                                 new_goal, 0);
#endif
                if (ptr)
                        return ptr;
        }
#endif

        return __alloc_bootmem_node(pgdat, size, align, goal);

}
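
As a worked check of the condition in that function, the following userspace sketch evaluates it for a given node.  It assumes PAGE_SHIFT is 12 and that MAX_DMA32_PFN is the 4G boundary expressed in pages; the function name is hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT    12	/* assumption: 4 KiB pages */
/* Assumption: the DMA32 boundary is 4G, expressed as a page frame number. */
#define MAX_DMA32_PFN ((4ULL * 1024 * 1024 * 1024) >> PAGE_SHIFT)

/*
 * Mirrors the condition as written in the quoted function.  Returns 1 if
 * the allocation above the DMA32 boundary would be attempted first.
 * Note: with PAGE_SHIFT == 12, 128 >> (20 - PAGE_SHIFT) evaluates to 0.
 */
static int tries_above_dma32(uint64_t node_start_pfn,
			     uint64_t node_spanned_pages, uint64_t goal)
{
	uint64_t end_pfn = node_start_pfn + node_spanned_pages;

	return end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
	       (goal >> PAGE_SHIFT) < MAX_DMA32_PFN;
}
```

For the node in the report (start pfn 0, 0xfff0 pages) the condition is false, so the function falls straight through to __alloc_bootmem_node() with whatever goal the caller passed in.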

Also, __alloc_bootmem_node() will not fall back if you specify a big goal.

static void * __init_refok __earlyonly_bootmem_alloc(int node,
                                unsigned long size,
                                unsigned long align,
                                unsigned long goal)
{       
        return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}
        
static void *vmemmap_buf;
static void *vmemmap_buf_end;
                
void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{               
        /* If the main allocator is up use that, fallback to bootmem. */
        if (slab_is_available()) {
                struct page *page;               

                if (node_state(node, N_HIGH_MEMORY))
                        page = alloc_pages_node(node,
                                GFP_KERNEL | __GFP_ZERO, get_order(size));
                else
                        page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
                                get_order(size));
                if (page)
                        return page_address(page);
                return NULL;
        } else
                return __earlyonly_bootmem_alloc(node, size, size,
                                __pa(MAX_DMA_ADDRESS));
}

So does your patch change the goal in vmemmap_alloc_block?

YH

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  5:00   ` Yinghai Lu
@ 2010-03-05  5:14     ` Yinghai Lu
  2010-03-05 12:51       ` Johannes Weiner
  0 siblings, 1 reply; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05  5:14 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Greg Thelen, linux-mm

On 03/04/2010 09:00 PM, Yinghai Lu wrote:
> On 03/04/2010 07:21 PM, Johannes Weiner wrote:
>> Hello Greg,
>>
>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>> On several systems I am seeing a boot panic if I use mmotm
>>> (stamp-2010-03-02-18-38).  If I remove
>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>>> find that:
>>> * 2.6.33 boots fine.
>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>> * 2.6.33 + mmotm (including
>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>>> Note: I had to enable earlyprintk to see the panic.  Without
>>> earlyprintk no console output was seen.  The system appeared to hang
>>> after the loader.
>>
>> Thanks for your report.  A few notes below.
>>
>>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm:
>>>
>>> Starting up ...
>>> [    0.000000] Initializing cgroup subsys cpuset
>>> [    0.000000] Initializing cgroup subsys cpu
>>> [    0.000000] Linux version 2.6.33-mm1+
>>> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu
>>> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
>>> [    0.000000] Command line:
>>> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0
>>> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
>>> [    0.000000] BIOS-provided physical RAM map:
>>> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>>> [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>>> [    0.000000]  BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
>>> [    0.000000]  BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
>>> [    0.000000]  BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
>>> [    0.000000]  BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
>>> [    0.000000] bootconsole [earlyser0] enabled
>>> [    0.000000] NX (Execute Disable) protection: active
>>> [    0.000000] DMI 2.4 present.
>>> [    0.000000] No AGP bridge found
>>> [    0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
>>> [    0.000000] PAT not supported by CPU.
>>> [    0.000000] CPU MTRRs all blank - virtualized system.
>>> [    0.000000] Scanning 1 areas for low memory corruption
>>> [    0.000000] modified physical RAM map:
>>> [    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
>>> [    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
>>> [    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
>>> [    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
>>> [    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
>>> [    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
>>> [    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
>>> [    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
>>
>> 256MB of memory, right?
>>
>>> [    0.000000] RAMDISK: 0fd9d000 - 0ffdf539
>>> [    0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU  )
>>> [    0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU   QEMURSDT
>>> 00000001 QEMU 00000001)
>>> [    0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU   QEMUFACP
>>> 00000001 QEMU 00000001)
>>> [    0.000000] ACPI: DSDT 000000000fff0100 0089D (v01   BXPC   BXDSDT
>>> 00000001 INTL 20061109)
>>> [    0.000000] ACPI: FACS 000000000fff00c0 00040
>>> [    0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU   QEMUAPIC
>>> 00000001 QEMU 00000001)
>>> [    0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU   QEMUSSDT
>>> 00000001 QEMU 00000001)
>>> [    0.000000] No NUMA configuration found
>>> [    0.000000] Faking a node at 0000000000000000-000000000fff0000
>>> [    0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
>>> [    0.000000]   NODE_DATA [0000000001c4e040 - 0000000001c5303f]
>>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
>>> [    0.000000] IP: [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
>>> [    0.000000] PGD 0
>>> [    0.000000] Oops: 0000 [#1] SMP
>>> [    0.000000] last sysfs file:
>>> [    0.000000] CPU 0
>>> [    0.000000] Modules linked in:
>>> [    0.000000]
>>> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
>>> [    0.000000] RIP: 0010:[<ffffffff81b0f5f7>]  [<ffffffff81b0f5f7>]
>>> memory_present+0x9a/0xbf
>>> [    0.000000] RSP: 0000:ffffffff81a01e18  EFLAGS: 00010046
>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>> [    0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
>>> [    0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
>>> [    0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
>>> [    0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81adf000(0000)
>>> knlGS:0000000000000000
>>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000,
>>> task ffffffff81a10020)
>>> [    0.000000] Stack:
>>> [    0.000000]  000000000fff0000 000000000000009f 0000000000000000
>>> 0000000000000000
>>> [    0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000
>>> 0000000000000000
>>> [    0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88
>>> 000000000fff0000
>>> [    0.000000] Call Trace:
>>> [    0.000000]  [<ffffffff81b0dd0e>]
>>> sparse_memory_present_with_active_regions+0x31/0x47
>>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
>>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
>>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
>>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
>>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
>>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
>>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
>>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
>>> [    0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16
>>> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8
>>> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65
>>> c8 72
>>> [    0.000000] RIP  [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
>>> [    0.000000]  RSP <ffffffff81a01e18>
>>> [    0.000000] CR2: 0000000000000000
>>> [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> [    0.000000] Pid: 0, comm: swapper Tainted: G      D    2.6.33-mm1+ #1
>>> [    0.000000] Call Trace:
>>> [    0.000000]  [<ffffffff8103c78c>] panic+0x9e/0x113
>>> [    0.000000]  [<ffffffff8103d3d6>] ? printk+0x67/0x69
>>> [    0.000000]  [<ffffffff8105914e>] ? blocking_notifier_call_chain+0xf/0x11
>>> [    0.000000]  [<ffffffff8103f8b4>] do_exit+0x78/0x70f
>>> [    0.000000]  [<ffffffff8103ca2f>] ? spin_unlock_irqrestore+0x9/0xb
>>> [    0.000000]  [<ffffffff8103dcde>] ? kmsg_dump+0x112/0x138
>>> [    0.000000]  [<ffffffff81530061>] oops_end+0xb2/0xba
>>> [    0.000000]  [<ffffffff810258d3>] no_context+0x1f5/0x204
>>> [    0.000000]  [<ffffffff81025b1b>] __bad_area_nosemaphore+0x17f/0x1a2
>>> [    0.000000]  [<ffffffff81025bb4>] bad_area_nosemaphore+0xe/0x10
>>> [    0.000000]  [<ffffffff81531e36>] do_page_fault+0x122/0x24c
>>> [    0.000000]  [<ffffffff8152f59f>] page_fault+0x1f/0x30
>>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
>>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
>>> [    0.000000]  [<ffffffff81b0dd0e>]
>>> sparse_memory_present_with_active_regions+0x31/0x47
>>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
>>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
>>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
>>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
>>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
>>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
>>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
>>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
>>>
>>> The kernel was built with 'make mrproper && make defconfig && make
>>> ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
>>> I can provide more diagnostics.
>>
>> Okay, if you did defconfig and just hit enter to all questions, you
>> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled.  This means that
>> the 'mem_section' is an array of pointers and the following happens in
>> memory_present():
>>
>> 	for_one_pfn_in_each_section() {
>> 		sparse_index_init(); /* no return value check */
>> 		ms = __nr_to_section();
>> 		if (!ms->section_mem_map) /* bang */
>> 			...;
>> 	}
>>
>> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
>> the mem_section descriptor with bootmem.  If this would fail, the box
>> would panic immediately earlier, but NO_BOOTMEM does not seem to get it
>> right.
>>
>> Greg, could you retry _with_ my bootmem patch applied, but with setting
>> CONFIG_NO_BOOTMEM=n up front?
>>
>> I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?
>>
>> 1. It does not seem to handle goal appropriately: bootmem would try
>> without the goal if it does not make sense.  And in this case, the
>> goal is 4G (above DMA32) and the amount of memory is 256M.
>>
>> And if I did not miss something, this is the difference with my patch:
>> without it, the default goal is 16M, which is no problem as it is well
>> within your available memory.  But the change of the default goal moved
>> it outside it which the bootmem replacement can not handle.
>>
>> 2. The early reservation stuff seems to return NULL but callsites assume
>> that the bootmem interface never does that.  Okay, the result is the same,
>> we crash.  But it still moves error reporting to a possibly much later
>> point where somebody actually dereferences the returned pointer.
> 
> related change could be: __alloc_bootmem_node_high...

No, it should be here:

static struct mem_section noinline __init_refok *sparse_index_alloc(int nid)
{                                     
        struct mem_section *section = NULL;
        unsigned long array_size = SECTIONS_PER_ROOT *
                                   sizeof(struct mem_section);

        if (slab_is_available()) {
                if (node_state(nid, N_HIGH_MEMORY))
                        section = kmalloc_node(array_size, GFP_KERNEL, nid);
                else
                        section = kmalloc(array_size, GFP_KERNEL);
        } else
                section = alloc_bootmem_node(NODE_DATA(nid), array_size);

and

#define alloc_bootmem_node(pgdat, x) \
        __alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))


Then you change that goal from MAX_DMA_ADDRESS to 4G, but the system only has 256M.
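
To make the numbers concrete, here is a small standalone sketch (userspace C,
not kernel code) that compares a goal address with the last PFN of the machine.
last_pfn = 0xfff0 comes from Greg's boot log; PAGE_SHIFT = 12 and the helper
name goal_beyond_memory() are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4K pages */

/*
 * Hypothetical helper: returns true if the goal address lies at or past
 * the end of RAM, i.e. nothing can be allocated at or above the goal.
 */
static bool goal_beyond_memory(uint64_t goal, uint64_t last_pfn)
{
        assert(last_pfn != 0); /* a machine without memory makes no sense */
        return (goal >> PAGE_SHIFT) >= last_pfn;
}
```

With last_pfn = 0xfff0, a 4G goal is beyond memory, while the old 16M goal is
well inside it.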

YH


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  3:21 ` Johannes Weiner
  2010-03-05  5:00   ` Yinghai Lu
@ 2010-03-05  5:17   ` Greg Thelen
  2010-03-05  5:34     ` Greg Thelen
  2010-03-05 18:41       ` Yinghai Lu
  2010-03-05  9:04     ` Yinghai Lu
  2 siblings, 2 replies; 48+ messages in thread
From: Greg Thelen @ 2010-03-05  5:17 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Yinghai Lu, linux-mm

On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>> On several systems I am seeing a boot panic if I use mmotm
>> (stamp-2010-03-02-18-38).  If I remove
>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>> find that:
>> * 2.6.33 boots fine.
>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>> * 2.6.33 + mmotm (including
>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm:
>> [    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
>> [    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
>> [    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
>> [    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
>> [    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
>> [    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
>> [    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
>> [    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
> 256MB of memory, right?

yes, I am testing in a 256MB VM.

>> The kernel was built with 'make mrproper && make defconfig && make
>> ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
>> I can provide more diagnostics.
>
> Okay, if you did defconfig and just hit enter to all questions, you
> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled.

Correct.

> This means that the 'mem_section' is an array of pointers and the following
> happens in memory_present():
>
>        for_one_pfn_in_each_section() {
>                sparse_index_init(); /* no return value check */
>                ms = __nr_to_section();
>                if (!ms->section_mem_map) /* bang */
>                        ...;
>        }
>
> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> the mem_section descriptor with bootmem.  If this would fail, the box
> would panic immediately earlier, but NO_BOOTMEM does not seem to get it
> right.
>
> Greg, could you retry _with_ my bootmem patch applied, but with setting
> CONFIG_NO_BOOTMEM=n up front?

Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
re-tested with 'make defconfig' to confirm the panic with this later
mmotm.

Then, as you suggested, I set CONFIG_NO_BOOTMEM=n.  The system booted
fine (no panic).

--
Greg


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  5:17   ` Greg Thelen
@ 2010-03-05  5:34     ` Greg Thelen
  2010-03-05 18:41       ` Yinghai Lu
  1 sibling, 0 replies; 48+ messages in thread
From: Greg Thelen @ 2010-03-05  5:34 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Yinghai Lu, linux-mm

On Thu, Mar 4, 2010 at 9:17 PM, Greg Thelen <gthelen@google.com> wrote:
> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> 256MB of memory, right?
>
> yes, I am testing in a 256MB VM.

I also performed a 6GB test and found that the system booted fine with
defconfig:
CONFIG_NO_BOOTMEM=y
CONFIG_SPARSEMEM_EXTREME=y

--
Greg


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  3:21 ` Johannes Weiner
@ 2010-03-05  9:04     ` Yinghai Lu
  2010-03-05  5:17   ` Greg Thelen
  2010-03-05  9:04     ` Yinghai Lu
  2 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05  9:04 UTC (permalink / raw)
  To: Johannes Weiner, Jiri Slaby
  Cc: Greg Thelen, linux-mm, linux-kernel, Andrew Morton

On 03/04/2010 07:21 PM, Johannes Weiner wrote:
> Hello Greg,
> 
> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>> On several systems I am seeing a boot panic if I use mmotm
>> (stamp-2010-03-02-18-38).  If I remove
>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>> find that:
>> * 2.6.33 boots fine.
>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>> * 2.6.33 + mmotm (including
>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> Note: I had to enable earlyprintk to see the panic.  Without
>> earlyprintk no console output was seen.  The system appeared to hang
>> after the loader.
> 
> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> the mem_section descriptor with bootmem.  If this would fail, the box
> would panic immediately earlier, but NO_BOOTMEM does not seem to get it
> right.
> 
> Greg, could you retry _with_ my bootmem patch applied, but with setting
> CONFIG_NO_BOOTMEM=n up front?
> 
> I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?
...
> 
> 1. It does not seem to handle goal appropriately: bootmem would try
> without the goal if it does not make sense.  And in this case, the
> goal is 4G (above DMA32) and the amount of memory is 256M.
> 
> And if I did not miss something, this is the difference with my patch:
> without it, the default goal is 16M, which is no problem as it is well
> within your available memory.  But the change of the default goal moved
> it outside it which the bootmem replacement can not handle.
> 
> 2. The early reservation stuff seems to return NULL but callsites assume
> that the bootmem interface never does that.  Okay, the result is the same,
> we crash.  But it still moves error reporting to a possibly much later
> point where somebody actually dereferences the returned pointer.

Under CONFIG_NO_BOOTMEM, alloc_bootmem_node honors the goal: if someone passes
in a big goal, it will not fall back to an address below that goal.

Returning NULL gives the caller more choice and more control.

Anyway, we should honor the goal; otherwise the caller should use the _nopanic variant instead.
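
The two behaviours under discussion can be sketched with a toy single-range
allocator. This is a userspace model, not the kernel implementation:
struct toy_range, toy_alloc_strict() and toy_alloc_fallback() are hypothetical
names, addresses are plain integers, and 0 stands for failure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one free range [start, end) of physical addresses. */
struct toy_range {
        uint64_t start; /* must be non-zero, since 0 means "failed" */
        uint64_t end;
};

/* Lowest address >= goal where size bytes fit, or 0 if there is none. */
static uint64_t toy_alloc_strict(const struct toy_range *r, uint64_t size,
                                 uint64_t goal)
{
        uint64_t base = goal > r->start ? goal : r->start;

        assert(size > 0);
        if (base >= r->end || r->end - base < size)
                return 0; /* goal honoured: fail rather than go lower */
        return base;
}

/* Same, but retry without the goal if the first attempt fails. */
static uint64_t toy_alloc_fallback(const struct toy_range *r, uint64_t size,
                                   uint64_t goal)
{
        uint64_t addr = toy_alloc_strict(r, size, goal);

        if (!addr && goal)
                addr = toy_alloc_strict(r, size, 0);
        return addr;
}
```

For a range like the usable 1M - 0xfff0000 region in Greg's e820 map and a 4G
goal, the strict variant fails and the fallback variant returns the start of
the range.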

For context, see
http://patchwork.kernel.org/patch/73893/

Jiri,
could you please check whether the current Linus tree still has the problem of mem_map using that much low memory?

On my 1024G system, the first node has 128G of RAM and [2G, 4G) is an MMIO range.
with NO_BOOTMEM

[    0.000000]  a - 11
[    0.000000]  19 40 - 80 95
[    0.000000]  702 740 - 1000 1000
[    0.000000]  331f 3340 - 3400 3400
[    0.000000]  35dd - 3600
[    0.000000]  37dd - 3800
[    0.000000]  39dd - 3a00
[    0.000000]  3bdd - 3c00
[    0.000000]  3ddd - 3e00
[    0.000000]  3fdd - 4000
[    0.000000]  41dd - 4200
[    0.000000]  43dd - 4400
[    0.000000]  45dd - 4600
[    0.000000]  47dd - 4800
[    0.000000]  49dd - 4a00
[    0.000000]  4bdd - 4c00
[    0.000000]  4ddd - 4e00
[    0.000000]  4fdd - 5000
[    0.000000]  51dd - 5200
[    0.000000]  93dd 9400 - 7d500 7d53b
[    0.000000]  7f730 - 7f750
[    0.000000]  100012 100040 - 100200 100200
[    0.000000]  170200 170200 - 2080000 2080000
[    0.000000]  2080065 2080080 - 2080200 2080200

So PFNs 0x9400 - 0x7d500 are free.

Without NO_BOOTMEM:
[    0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1
[    0.000000]   free [0x000000000a - 0x0000000095]
[    0.000000]   free [0x0000000702 - 0x0000001000]
[    0.000000]   free [0x00000032c4 - 0x0000003400]
[    0.000000]   free [0x00000035de - 0x0000003600]
[    0.000000]   free [0x00000037dd - 0x0000003800]
[    0.000000]   free [0x00000039dd - 0x0000003a00]
[    0.000000]   free [0x0000003bdd - 0x0000003c00]
[    0.000000]   free [0x0000003ddd - 0x0000003e00]
[    0.000000]   free [0x0000003fdd - 0x0000004000]
[    0.000000]   free [0x00000041dd - 0x0000004200]
[    0.000000]   free [0x00000043dd - 0x0000004400]
[    0.000000]   free [0x00000045dd - 0x0000004600]
[    0.000000]   free [0x00000047dd - 0x0000004800]
[    0.000000]   free [0x00000049dd - 0x0000004a00]
[    0.000000]   free [0x0000004bdd - 0x0000004c00]
[    0.000000]   free [0x0000004ddd - 0x0000004e00]
[    0.000000]   free [0x0000004fdd - 0x0000005000]
[    0.000000]   free [0x00000051dd - 0x0000005200]
[    0.000000]   free [0x00000053dd - 0x000007d53b]
[    0.000000]   free [0x000007f730 - 0x000007f750]
[    0.000000]   free [0x000010041f - 0x0000100a00]
[    0.000000]   free [0x0000170a00 - 0x0000180a00]
[    0.000000]   free [0x0000180a03 - 0x0002080000]
So PFNs 0x53dd - 0x7d53b are free.

It looks like we don't need to change the default goal in alloc_bootmem_node.
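
For a sense of scale, the free PFN ranges above can be converted to sizes with
a few lines of standalone C. This is only an illustrative sketch: PAGE_SHIFT =
12 (4K pages) is an assumption, and pfn_range_mib() is a hypothetical helper,
not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4K pages */

/* Size in MiB (rounded down) of the PFN range [start_pfn, end_pfn). */
static uint64_t pfn_range_mib(uint64_t start_pfn, uint64_t end_pfn)
{
        assert(end_pfn >= start_pfn);
        return ((end_pfn - start_pfn) << PAGE_SHIFT) >> 20;
}
```

pfn_range_mib(0x9400, 0x7d500) gives 1857 and pfn_range_mib(0x53dd, 0x7d53b)
gives 1921, so in both cases well over 1.8G is still free below the 2G mark.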

YH

^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  9:04     ` Yinghai Lu
@ 2010-03-05 10:26       ` Jiri Slaby
  -1 siblings, 0 replies; 48+ messages in thread
From: Jiri Slaby @ 2010-03-05 10:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Johannes Weiner, Greg Thelen, linux-mm, linux-kernel, Andrew Morton

On 03/05/2010 10:04 AM, Yinghai Lu wrote:
> according to context
> http://patchwork.kernel.org/patch/73893/
> 
> Jiri, 
> please check current linus tree still have problem about mem_map is using that much low mem?

Hi!

Sorry, I don't have direct access to the machine. I might try to ask the
owners to do so.

> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.

So where does your mem_map get allocated (I suppose you're running the flat model)?

Note that the failure we were seeing occurred with different amounts of memory
on different machines, obviously because of different e820 reservations
and driver requirements at boot time. So the amount of memory required to
trigger the error oscillated around 128G, sometimes being 130G.

It triggered when mem_map fit exactly into 0-2G (with 2-4G reserved),
leaving no space there. If RAM was more than 130G, mem_map implicitly
ended up above the 4G boundary, so there was enough space in the
first 4G of memory for other users with specific bootmem limitations.
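
A rough back-of-the-envelope check of why the threshold sits around 128G,
written as standalone C. The numbers are assumptions for illustration only:
4K pages and 64 bytes per struct page (the real size depends on the kernel
configuration), and mem_map_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12 /* assumption: 4K pages */
#define STRUCT_PAGE_SIZE 64 /* assumption: bytes per struct page */

/* Approximate size of a flat mem_map covering ram_bytes of memory. */
static uint64_t mem_map_bytes(uint64_t ram_bytes)
{
        /* expect a page-aligned amount of RAM */
        assert((ram_bytes & ((1ULL << PAGE_SHIFT) - 1)) == 0);
        return (ram_bytes >> PAGE_SHIFT) * STRUCT_PAGE_SIZE;
}
```

Under these assumptions 128G of RAM needs exactly 2G of mem_map, which fills
0-2G completely, while 130G needs slightly more than 2G and no longer fits
there.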

> with NO_BOOTMEM
> [    0.000000]  a - 11
> [    0.000000]  19 40 - 80 95
> [    0.000000]  702 740 - 1000 1000
> [    0.000000]  331f 3340 - 3400 3400
> [    0.000000]  35dd - 3600
> [    0.000000]  37dd - 3800
> [    0.000000]  39dd - 3a00
> [    0.000000]  3bdd - 3c00
> [    0.000000]  3ddd - 3e00
> [    0.000000]  3fdd - 4000
> [    0.000000]  41dd - 4200
> [    0.000000]  43dd - 4400
> [    0.000000]  45dd - 4600
> [    0.000000]  47dd - 4800
> [    0.000000]  49dd - 4a00
> [    0.000000]  4bdd - 4c00
> [    0.000000]  4ddd - 4e00
> [    0.000000]  4fdd - 5000
> [    0.000000]  51dd - 5200
> [    0.000000]  93dd 9400 - 7d500 7d53b
> [    0.000000]  7f730 - 7f750
> [    0.000000]  100012 100040 - 100200 100200
> [    0.000000]  170200 170200 - 2080000 2080000
> [    0.000000]  2080065 2080080 - 2080200 2080200
> 
> so PFN: 9400 - 7d500 are free.

Could you explain the dmesg output in more detail?

> without NO_BOOTMEM
> [    0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1
> [    0.000000]   free [0x000000000a - 0x0000000095]
> [    0.000000]   free [0x0000000702 - 0x0000001000]
> [    0.000000]   free [0x00000032c4 - 0x0000003400]
> [    0.000000]   free [0x00000035de - 0x0000003600]
> [    0.000000]   free [0x00000037dd - 0x0000003800]
> [    0.000000]   free [0x00000039dd - 0x0000003a00]
> [    0.000000]   free [0x0000003bdd - 0x0000003c00]
> [    0.000000]   free [0x0000003ddd - 0x0000003e00]
> [    0.000000]   free [0x0000003fdd - 0x0000004000]
> [    0.000000]   free [0x00000041dd - 0x0000004200]
> [    0.000000]   free [0x00000043dd - 0x0000004400]
> [    0.000000]   free [0x00000045dd - 0x0000004600]
> [    0.000000]   free [0x00000047dd - 0x0000004800]
> [    0.000000]   free [0x00000049dd - 0x0000004a00]
> [    0.000000]   free [0x0000004bdd - 0x0000004c00]
> [    0.000000]   free [0x0000004ddd - 0x0000004e00]
> [    0.000000]   free [0x0000004fdd - 0x0000005000]
> [    0.000000]   free [0x00000051dd - 0x0000005200]
> [    0.000000]   free [0x00000053dd - 0x000007d53b]
> [    0.000000]   free [0x000007f730 - 0x000007f750]
> [    0.000000]   free [0x000010041f - 0x0000100a00]
> [    0.000000]   free [0x0000170a00 - 0x0000180a00]
> [    0.000000]   free [0x0000180a03 - 0x0002080000]
> so pfn: 53dd 7d53b are free
> 
> looks like we don't need to change the default goal in alloc_bootmem_node.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
@ 2010-03-05 10:26       ` Jiri Slaby
  0 siblings, 0 replies; 48+ messages in thread
From: Jiri Slaby @ 2010-03-05 10:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Johannes Weiner, Greg Thelen, linux-mm, linux-kernel, Andrew Morton

On 03/05/2010 10:04 AM, Yinghai Lu wrote:
> according to context
> http://patchwork.kernel.org/patch/73893/
> 
> Jiri, 
> please check current linus tree still have problem about mem_map is using that much low mem?

Hi!

Sorry, I don't have direct access to the machine. I might try to ask the
owners to do so.

> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.

So where does your mem_map get allocated (I suppose you're running the flat model)?

Note that the failure we were seeing occurred with different amounts of memory
on different machines, obviously because of different e820 reservations
and driver requirements at boot time. So the amount of memory required to
trigger the error oscillated around 128G, sometimes being 130G.

It triggered when mem_map fit exactly into 0-2G (with 2-4G reserved),
leaving no space there. If RAM was more than 130G, mem_map implicitly
ended up above the 4G boundary, so there was enough space in the
first 4G of memory for other users with specific bootmem limitations.

> with NO_BOOTMEM
> [    0.000000]  a - 11
> [    0.000000]  19 40 - 80 95
> [    0.000000]  702 740 - 1000 1000
> [    0.000000]  331f 3340 - 3400 3400
> [    0.000000]  35dd - 3600
> [    0.000000]  37dd - 3800
> [    0.000000]  39dd - 3a00
> [    0.000000]  3bdd - 3c00
> [    0.000000]  3ddd - 3e00
> [    0.000000]  3fdd - 4000
> [    0.000000]  41dd - 4200
> [    0.000000]  43dd - 4400
> [    0.000000]  45dd - 4600
> [    0.000000]  47dd - 4800
> [    0.000000]  49dd - 4a00
> [    0.000000]  4bdd - 4c00
> [    0.000000]  4ddd - 4e00
> [    0.000000]  4fdd - 5000
> [    0.000000]  51dd - 5200
> [    0.000000]  93dd 9400 - 7d500 7d53b
> [    0.000000]  7f730 - 7f750
> [    0.000000]  100012 100040 - 100200 100200
> [    0.000000]  170200 170200 - 2080000 2080000
> [    0.000000]  2080065 2080080 - 2080200 2080200
> 
> so PFN: 9400 - 7d500 are free.

Could you explain more the dmesg output?

> without NO_BOOTMEM
> [    0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1
> [    0.000000]   free [0x000000000a - 0x0000000095]
> [    0.000000]   free [0x0000000702 - 0x0000001000]
> [    0.000000]   free [0x00000032c4 - 0x0000003400]
> [    0.000000]   free [0x00000035de - 0x0000003600]
> [    0.000000]   free [0x00000037dd - 0x0000003800]
> [    0.000000]   free [0x00000039dd - 0x0000003a00]
> [    0.000000]   free [0x0000003bdd - 0x0000003c00]
> [    0.000000]   free [0x0000003ddd - 0x0000003e00]
> [    0.000000]   free [0x0000003fdd - 0x0000004000]
> [    0.000000]   free [0x00000041dd - 0x0000004200]
> [    0.000000]   free [0x00000043dd - 0x0000004400]
> [    0.000000]   free [0x00000045dd - 0x0000004600]
> [    0.000000]   free [0x00000047dd - 0x0000004800]
> [    0.000000]   free [0x00000049dd - 0x0000004a00]
> [    0.000000]   free [0x0000004bdd - 0x0000004c00]
> [    0.000000]   free [0x0000004ddd - 0x0000004e00]
> [    0.000000]   free [0x0000004fdd - 0x0000005000]
> [    0.000000]   free [0x00000051dd - 0x0000005200]
> [    0.000000]   free [0x00000053dd - 0x000007d53b]
> [    0.000000]   free [0x000007f730 - 0x000007f750]
> [    0.000000]   free [0x000010041f - 0x0000100a00]
> [    0.000000]   free [0x0000170a00 - 0x0000180a00]
> [    0.000000]   free [0x0000180a03 - 0x0002080000]
> so pfn: 53dd 7d53b are free
> 
> looks like we don't need to change the default goal in alloc_bootmem_node.

thanks,
-- 
js

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  5:14     ` Yinghai Lu
@ 2010-03-05 12:51       ` Johannes Weiner
  2010-03-05 16:38         ` Yinghai
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Weiner @ 2010-03-05 12:51 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Greg Thelen, linux-mm

Hi,

On Thu, Mar 04, 2010 at 09:14:15PM -0800, Yinghai Lu wrote:
> On 03/04/2010 09:00 PM, Yinghai Lu wrote:
> > On 03/04/2010 07:21 PM, Johannes Weiner wrote:
> >> Hello Greg,
> >>
> >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
> >>> On several systems I am seeing a boot panic if I use mmotm
> >>> (stamp-2010-03-02-18-38).  If I remove
> >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
> >>> find that:
> >>> * 2.6.33 boots fine.
> >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
> >>> * 2.6.33 + mmotm (including
> >>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> >>> Note: I had to enable earlyprintk to see the panic.  Without
> >>> earlyprintk no console output was seen.  The system appeared to hang
> >>> after the loader.
> >>
> >> Thanks for your report.  A few notes below.
> >>
> >>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm:
> >>>
> >>> Starting up ...
> >>> [    0.000000] Initializing cgroup subsys cpuset
> >>> [    0.000000] Initializing cgroup subsys cpu
> >>> [    0.000000] Linux version 2.6.33-mm1+
> >>> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu
> >>> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
> >>> [    0.000000] Command line:
> >>> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0
> >>> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
> >>> [    0.000000] BIOS-provided physical RAM map:
> >>> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> >>> [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> >>> [    0.000000]  BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
> >>> [    0.000000]  BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
> >>> [    0.000000]  BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
> >>> [    0.000000]  BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
> >>> [    0.000000] bootconsole [earlyser0] enabled
> >>> [    0.000000] NX (Execute Disable) protection: active
> >>> [    0.000000] DMI 2.4 present.
> >>> [    0.000000] No AGP bridge found
> >>> [    0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
> >>> [    0.000000] PAT not supported by CPU.
> >>> [    0.000000] CPU MTRRs all blank - virtualized system.
> >>> [    0.000000] Scanning 1 areas for low memory corruption
> >>> [    0.000000] modified physical RAM map:
> >>> [    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
> >>> [    0.000000]  modified: 0000000000010000 - 000000000009fc00 (usable)
> >>> [    0.000000]  modified: 000000000009fc00 - 00000000000a0000 (reserved)
> >>> [    0.000000]  modified: 00000000000e8000 - 0000000000100000 (reserved)
> >>> [    0.000000]  modified: 0000000000100000 - 000000000fff0000 (usable)
> >>> [    0.000000]  modified: 000000000fff0000 - 0000000010000000 (ACPI data)
> >>> [    0.000000]  modified: 00000000fffbd000 - 0000000100000000 (reserved)
> >>> [    0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
> >>
> >> 256MB of memory, right?
> >>
> >>> [    0.000000] RAMDISK: 0fd9d000 - 0ffdf539
> >>> [    0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU  )
> >>> [    0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU   QEMURSDT
> >>> 00000001 QEMU 00000001)
> >>> [    0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU   QEMUFACP
> >>> 00000001 QEMU 00000001)
> >>> [    0.000000] ACPI: DSDT 000000000fff0100 0089D (v01   BXPC   BXDSDT
> >>> 00000001 INTL 20061109)
> >>> [    0.000000] ACPI: FACS 000000000fff00c0 00040
> >>> [    0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU   QEMUAPIC
> >>> 00000001 QEMU 00000001)
> >>> [    0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU   QEMUSSDT
> >>> 00000001 QEMU 00000001)
> >>> [    0.000000] No NUMA configuration found
> >>> [    0.000000] Faking a node at 0000000000000000-000000000fff0000
> >>> [    0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
> >>> [    0.000000]   NODE_DATA [0000000001c4e040 - 0000000001c5303f]
> >>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
> >>> [    0.000000] IP: [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
> >>> [    0.000000] PGD 0
> >>> [    0.000000] Oops: 0000 [#1] SMP
> >>> [    0.000000] last sysfs file:
> >>> [    0.000000] CPU 0
> >>> [    0.000000] Modules linked in:
> >>> [    0.000000]
> >>> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
> >>> [    0.000000] RIP: 0010:[<ffffffff81b0f5f7>]  [<ffffffff81b0f5f7>]
> >>> memory_present+0x9a/0xbf
> >>> [    0.000000] RSP: 0000:ffffffff81a01e18  EFLAGS: 00010046
> >>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> >>> [    0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
> >>> [    0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
> >>> [    0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
> >>> [    0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> >>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81adf000(0000)
> >>> knlGS:0000000000000000
> >>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [    0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
> >>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000,
> >>> task ffffffff81a10020)
> >>> [    0.000000] Stack:
> >>> [    0.000000]  000000000fff0000 000000000000009f 0000000000000000
> >>> 0000000000000000
> >>> [    0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000
> >>> 0000000000000000
> >>> [    0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88
> >>> 000000000fff0000
> >>> [    0.000000] Call Trace:
> >>> [    0.000000]  [<ffffffff81b0dd0e>]
> >>> sparse_memory_present_with_active_regions+0x31/0x47
> >>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
> >>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
> >>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
> >>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
> >>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
> >>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
> >>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
> >>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
> >>> [    0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16
> >>> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8
> >>> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65
> >>> c8 72
> >>> [    0.000000] RIP  [<ffffffff81b0f5f7>] memory_present+0x9a/0xbf
> >>> [    0.000000]  RSP <ffffffff81a01e18>
> >>> [    0.000000] CR2: 0000000000000000
> >>> [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
> >>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>> [    0.000000] Pid: 0, comm: swapper Tainted: G      D    2.6.33-mm1+ #1
> >>> [    0.000000] Call Trace:
> >>> [    0.000000]  [<ffffffff8103c78c>] panic+0x9e/0x113
> >>> [    0.000000]  [<ffffffff8103d3d6>] ? printk+0x67/0x69
> >>> [    0.000000]  [<ffffffff8105914e>] ? blocking_notifier_call_chain+0xf/0x11
> >>> [    0.000000]  [<ffffffff8103f8b4>] do_exit+0x78/0x70f
> >>> [    0.000000]  [<ffffffff8103ca2f>] ? spin_unlock_irqrestore+0x9/0xb
> >>> [    0.000000]  [<ffffffff8103dcde>] ? kmsg_dump+0x112/0x138
> >>> [    0.000000]  [<ffffffff81530061>] oops_end+0xb2/0xba
> >>> [    0.000000]  [<ffffffff810258d3>] no_context+0x1f5/0x204
> >>> [    0.000000]  [<ffffffff81025b1b>] __bad_area_nosemaphore+0x17f/0x1a2
> >>> [    0.000000]  [<ffffffff81025bb4>] bad_area_nosemaphore+0xe/0x10
> >>> [    0.000000]  [<ffffffff81531e36>] do_page_fault+0x122/0x24c
> >>> [    0.000000]  [<ffffffff8152f59f>] page_fault+0x1f/0x30
> >>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
> >>> [    0.000000]  [<ffffffff81b0f5f7>] ? memory_present+0x9a/0xbf
> >>> [    0.000000]  [<ffffffff81b0dd0e>]
> >>> sparse_memory_present_with_active_regions+0x31/0x47
> >>> [    0.000000]  [<ffffffff81b0688a>] paging_init+0x3f/0x5b
> >>> [    0.000000]  [<ffffffff81af81a7>] setup_arch+0x964/0xa03
> >>> [    0.000000]  [<ffffffff8103014a>] ? need_resched+0x1e/0x28
> >>> [    0.000000]  [<ffffffff8103015d>] ? should_resched+0x9/0x2a
> >>> [    0.000000]  [<ffffffff8152de24>] ? _cond_resched+0x9/0x1d
> >>> [    0.000000]  [<ffffffff81af4a34>] start_kernel+0x9f/0x382
> >>> [    0.000000]  [<ffffffff81af4299>] x86_64_start_reservations+0xa9/0xad
> >>> [    0.000000]  [<ffffffff81af4383>] x86_64_start_kernel+0xe6/0xed
> >>>
> >>> The kernel was built with 'make mrproper && make defconfig && make
> >>> ARCH=x86_64 CONFIG=smp -j 6'.  This panic is seen on every attempt, so
> >>> I can provide more diagnostics.
> >>
> >> Okay, if you did defconfig and just hit enter to all questions, you
> >> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled.  This means that
> >> the 'mem_section' is an array of pointers and the following happens in
> >> memory_present():
> >>
> >> 	for_one_pfn_in_each_section() {
> >> 		sparse_index_init(); /* no return value check */
> >> 		ms = __nr_to_section();
> >> 		if (!ms->section_mem_map) /* bang */
> >> 			...;
> >> 	}
> >>
> >> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> >> the mem_section descriptor with bootmem.  If this would fail, the box
> >> would panic immediately earlier, but NO_BOOTMEM does not seem to get it
> >> right.
> >>
> >> Greg, could you retry _with_ my bootmem patch applied, but with setting
> >> CONFIG_NO_BOOTMEM=n up front?
> >>
> >> I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?
> >>
> >> 1. It does not seem to handle goal appropriately: bootmem would try
> >> without the goal if it does not make sense.  And in this case, the
> >> goal is 4G (above DMA32) and the amount of memory is 256M.
> >>
> >> And if I did not miss something, this is the difference with my patch:
> >> without it, the default goal is 16M, which is no problem as it is well
> >> within your available memory.  But the change of the default goal moved
> >> it outside it which the bootmem replacement can not handle.
> >>
> >> 2. The early reservation stuff seems to return NULL but callsites assume
> >> that the bootmem interface never does that.  Okay, the result is the same,
> >> we crash.  But it still moves error reporting to a possibly much later
> >> point where somebody actually dereferences the returned pointer.
> > 
> > related change could be: __alloc_bootmem_node_high...
> 
> no should be here...
> 
> static struct mem_section noinline __init_refok *sparse_index_alloc(int nid)
> {                                     
>         struct mem_section *section = NULL;
>         unsigned long array_size = SECTIONS_PER_ROOT *
>                                    sizeof(struct mem_section);
> 
>         if (slab_is_available()) {
>                 if (node_state(nid, N_HIGH_MEMORY))
>                         section = kmalloc_node(array_size, GFP_KERNEL, nid);
>                 else
>                         section = kmalloc(array_size, GFP_KERNEL);
>         } else
>                 section = alloc_bootmem_node(NODE_DATA(nid), array_size);
> 
> and
> 
> #define alloc_bootmem_node(pgdat, x) \
>         __alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
> 
> 
> then you change that goal MAX_DMA_ADDRESS to 4g..., but the system only have 256M 

and alloc_bootmem_core() will handle it.  The principle of the default goal is:
if you have memory outside the DMA zone, use that if possible.  If not, just use
what's there.

So increasing the default goal to above the DMA32 zone and falling back if not
possible is a sensible change in itself.

Replacing the bootmem API implementation with something incompatible is NOT a
sensible change, however.  You have to do the fallback or review all callers
and make sure they conform to your new semantics.

My patch just exposes this on common machines, those with <=4G of memory,
but you already broke uncommon machines without my patch: those with
<=16M of memory.
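
A minimal userspace sketch of that principle, not the kernel code: the goal
is treated as a preference, and the search is retried without it when nothing
fits above it.  struct free_range, alloc_at_or_above() and
alloc_with_fallback() are hypothetical names, and free memory is reduced to
a list of [start, end) byte ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one free physical range, [start, end) in bytes. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * First-fit search that only considers addresses at or above goal.
 * Returns the allocated address, or UINT64_MAX if nothing fits.
 */
static uint64_t alloc_at_or_above(const struct free_range *r, size_t n,
				  uint64_t size, uint64_t align, uint64_t goal)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = r[i].start > goal ? r[i].start : goal;
		uint64_t addr = align_up(lo, align);

		if (addr < r[i].end && r[i].end - addr >= size)
			return addr;
	}
	return UINT64_MAX;
}

/* Goal as a preference: try above the goal first, then retry without it. */
static uint64_t alloc_with_fallback(const struct free_range *r, size_t n,
				    uint64_t size, uint64_t align, uint64_t goal)
{
	uint64_t addr = alloc_at_or_above(r, n, size, align, goal);

	if (addr == UINT64_MAX && goal != 0)
		addr = alloc_at_or_above(r, n, size, align, 0);
	return addr;
}
```

Taking the usable range 0x100000 - 0x0fff0000 from the e820 map in the report
as the only free range (a simplification), a 4096-byte request with a 4G goal
finds nothing on the first pass and the retry returns 0x100000; the same
request with a 16M goal is satisfied directly at 16M.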

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  9:04     ` Yinghai Lu
@ 2010-03-05 13:08       ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2010-03-05 13:08 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jiri Slaby, Greg Thelen, linux-mm, linux-kernel, Andrew Morton

On Fri, Mar 05, 2010 at 01:04:33AM -0800, Yinghai Lu wrote:
> On 03/04/2010 07:21 PM, Johannes Weiner wrote:
> > Hello Greg,
> > 
> > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
> >> On several systems I am seeing a boot panic if I use mmotm
> >> (stamp-2010-03-02-18-38).  If I remove
> >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
> >> find that:
> >> * 2.6.33 boots fine.
> >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
> >> * 2.6.33 + mmotm (including
> >> bootmem-avoid-dma32-zone-by-default.patch): panics.
> >> Note: I had to enable earlyprintk to see the panic.  Without
> >> earlyprintk no console output was seen.  The system appeared to hang
> >> after the loader.
> > 
> > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> > the mem_section descriptor with bootmem.  If this would fail, the box
> > would panic immediately earlier, but NO_BOOTMEM does not seem to get it
> > right.
> > 
> > Greg, could you retry _with_ my bootmem patch applied, but with setting
> > CONFIG_NO_BOOTMEM=n up front?
> > 
> > I think NO_BOOTMEM has several problems.  Yinghai, can you verify them?
> ...
> > 
> > 1. It does not seem to handle goal appropriately: bootmem would try
> > without the goal if it does not make sense.  And in this case, the
> > goal is 4G (above DMA32) and the amount of memory is 256M.
> > 
> > And if I did not miss something, this is the difference with my patch:
> > without it, the default goal is 16M, which is no problem as it is well
> > within your available memory.  But the change of the default goal moved
> > it outside it which the bootmem replacement can not handle.
> > 
> > 2. The early reservation stuff seems to return NULL but callsites assume
> > that the bootmem interface never does that.  Okay, the result is the same,
> > we crash.  But it still moves error reporting to a possibly much later
> > point where somebody actually dereferences the returned pointer.
> 
> under CONFIG_NO_BOOTMEM
> for alloc_bootmem_node it will honor goal, if someone input big goal it will not
> fallback to get a small one below that goal.

Yes, that's the problem.

> return NULL, could make caller have more choice and more control.

Most callers do not need it as there is no real way to handle allocation
failures at this point of time in the boot process.

For everything else, there is the _nopanic API.
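
The two calling conventions can be sketched in userspace as follows.  This is
an illustration under stated assumptions, not the bootmem implementation:
pool, early_alloc() and early_alloc_nopanic() are hypothetical names, and
abort() stands in for the kernel's panic().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed-size early pool standing in for boot memory. */
static unsigned char pool[4096];
static size_t pool_used;

/* _nopanic flavour: reports failure by returning NULL; the caller must check. */
static void *early_alloc_nopanic(size_t size)
{
	void *p;

	if (size > sizeof(pool) - pool_used)
		return NULL;
	p = &pool[pool_used];
	pool_used += size;
	return p;
}

/* Default flavour: never returns NULL; stops at the failing call site. */
static void *early_alloc(size_t size)
{
	void *p = early_alloc_nopanic(size);

	if (!p) {
		fprintf(stderr, "Out of memory: %zu bytes requested\n", size);
		abort();	/* stands in for panic() */
	}
	return p;
}
```

A caller of early_alloc() can use the result unchecked, and a failure is
reported where it happens rather than at a later NULL dereference.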

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 12:51       ` Johannes Weiner
@ 2010-03-05 16:38         ` Yinghai
  0 siblings, 0 replies; 48+ messages in thread
From: Yinghai @ 2010-03-05 16:38 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Greg Thelen, linux-mm





On Mar 5, 2010, at 4:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:

>
> My patch just shows that with common machines: those with <=4G of  
> memory
> but you already broke uncommon machines without my patch, those with
> <=16M of memory.

Ok
Will fix it

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05  5:17   ` Greg Thelen
@ 2010-03-05 18:41       ` Yinghai Lu
  2010-03-05 18:41       ` Yinghai Lu
  1 sibling, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05 18:41 UTC (permalink / raw)
  To: Greg Thelen, Andrew Morton, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: Johannes Weiner, linux-mm, linux-kernel

On 03/04/2010 09:17 PM, Greg Thelen wrote:
> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>> On several systems I am seeing a boot panic if I use mmotm
>>> (stamp-2010-03-02-18-38).  If I remove
>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>>> find that:
>>> * 2.6.33 boots fine.
>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>> * 2.6.33 + mmotm (including
>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
...
> 
> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
> re-tested with 'make defconfig' to confirm the panic with this later
> mmotm.

please check

[PATCH] early_res: double check with updated goal in alloc_memory_core_early

Johannes Weiner pointed out that the new early_res replacement for
alloc_bootmem_node changes the behavior of the goal: the original bootmem
allocator keeps trying regardless of the goal.

The new behavior breaks his patch that raises the default goal from MAX_DMA
to MAX_DMA32, and it also broke uncommon machines with <=16M of memory.
(Really? Can our x86 kernel still run on a 16M system?)

So try again with an updated goal.

Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 mm/bootmem.c |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
 }
 
 #ifdef CONFIG_NO_BOOTMEM
+static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
+						 u64 align, u64 goal, u64 limit)
+{
+	void *ptr;
+	unsigned long end_pfn;
+
+	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+					 goal, limit);
+	if (ptr)
+		return ptr;
+
+	/* if the goal lies beyond the end of this node, retry from the node start */
+	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
+		goal = pgdat->node_start_pfn << PAGE_SHIFT;
+		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+						 goal, limit);
+	}
+
+	return ptr;
+}
+
 static void __init __free_pages_memory(unsigned long start, unsigned long end)
 {
 	int i;
@@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return  ___alloc_memory_core_early(pgdat, size, align,
 					 goal, -1ULL);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
@@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
+	ptr =  ___alloc_memory_core_early(pgdat, size, align,
 						 goal, -1ULL);
 #else
 	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
@@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return ___alloc_memory_core_early(pgdat, size, align,
 				goal, ARCH_LOW_ADDRESS_LIMIT);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align,
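
As a rough check of the retry condition in the patch above, the comparison
can be evaluated in userspace with the numbers from Greg's log (node 0 starts
at pfn 0 and spans 0xfff0 pages).  goal_beyond_node() is a hypothetical
helper, the 4096-byte request size is an assumption, and 64-bit arithmetic is
used throughout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as on x86_64 */

/*
 * Mirrors the retry condition in the patch: fall back only when the node
 * ends below goal + size, i.e. the goal cannot be met inside this node.
 */
static bool goal_beyond_node(uint64_t node_start_pfn, uint64_t spanned_pages,
			     uint64_t goal, uint64_t size)
{
	uint64_t end_pfn = node_start_pfn + spanned_pages;

	return (end_pfn << PAGE_SHIFT) < goal + size;
}
```

For that node a 4G goal satisfies the condition, so the allocation is retried
from the node start; a 16M goal does not.  When the goal lies inside the node
the condition is false and no retry happens, even if nothing is free above
the goal.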

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 18:41       ` Yinghai Lu
@ 2010-03-05 19:09         ` Greg Thelen
  -1 siblings, 0 replies; 48+ messages in thread
From: Greg Thelen @ 2010-03-05 19:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel

On Fri, Mar 5, 2010 at 10:41 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On 03/04/2010 09:17 PM, Greg Thelen wrote:
>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>>> On several systems I am seeing a boot panic if I use mmotm
>>>> (stamp-2010-03-02-18-38).  If I remove
>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>>>> find that:
>>>> * 2.6.33 boots fine.
>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>>> * 2.6.33 + mmotm (including
>>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> ...
>>
>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
>> re-tested with 'make defconfig' to confirm the panic with this later
>> mmotm.
>
> please check
>
> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
>
> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node
> change the behavoir about goal.
> original bootmem one will try go further regardless of goal.
>
> and it will break his patch about default goal from MAX_DMA to MAX_DMA32...
> also broke uncommon machines with <=16M of memory.
> (really? our x86 kernel still can run on 16M system?)
>
> so try again with update goal.
>
> Reported-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  mm/bootmem.c |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/mm/bootmem.c
> ===================================================================
> --- linux-2.6.orig/mm/bootmem.c
> +++ linux-2.6/mm/bootmem.c
> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
>  }
>
>  #ifdef CONFIG_NO_BOOTMEM
> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
> +                                                u64 align, u64 goal, u64 limit)
> +{
> +       void *ptr;
> +       unsigned long end_pfn;
> +
> +       ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +                                        goal, limit);
> +       if (ptr)
> +               return ptr;
> +
> +       /* check goal according  */
> +       end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
> +       if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
> +               goal = pgdat->node_start_pfn << PAGE_SHIFT;
> +               ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +                                                goal, limit);
> +       }
> +
> +       return ptr;
> +}
> +
>  static void __init __free_pages_memory(unsigned long start, unsigned long end)
>  {
>        int i;
> @@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da
>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -       return __alloc_memory_core_early(pgdat->node_id, size, align,
> +       return  ___alloc_memory_core_early(pgdat, size, align,
>                                         goal, -1ULL);
>  #else
>        return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
> @@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan
>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -       ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
> +       ptr =  ___alloc_memory_core_early(pgdat, size, align,
>                                                 goal, -1ULL);
>  #else
>        ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
> @@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p
>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -       return __alloc_memory_core_early(pgdat->node_id, size, align,
> +       return ___alloc_memory_core_early(pgdat, size, align,
>                                goal, ARCH_LOW_ADDRESS_LIMIT);
>  #else
>        return ___alloc_bootmem_node(pgdat->bdata, size, align,
>

On my 256MB VM, which detected the problem starting this thread, the
"double check with updated goal in alloc_memory_core_early" patch
(above) boots without panic.

My initial impression is that this fixes the reported problem.  Note:
I have not tested to see if any other issues are introduced.

--
Greg

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 10:26       ` Jiri Slaby
  (?)
@ 2010-03-05 20:27       ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05 20:27 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Johannes Weiner, Greg Thelen, linux-mm, linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]

On 03/05/2010 02:26 AM, Jiri Slaby wrote:
> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>> according to context
>> http://patchwork.kernel.org/patch/73893/
>>
>> Jiri, 
>> please check whether the current Linus tree still has the problem of mem_map using that much low mem?
> 
> Hi!
> 
> Sorry, I don't have direct access to the machine. I might try to ask the
> owners to do so.
> 
>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.
> 
> So where gets your mem_map allocated (I suppose you're running flat model)?
> 
> Note that the failure we were seeing was with different amount of memory
> on different machines. Obviously because of different e820 reservations
> and driver requirements at boot time. So the required memory to trigger
> the error oscillated around 128G, sometimes being 130G.
> 
> It triggered when mem_map fit exactly into 0-2G (and 2-4G was reserved)
> and no more space was there. If RAM was more than 130G, mem_map was
> above 4G boundary implicitly, so that there was enough space in the
> first 4G of memory for others with specific bootmem limitations.
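
A rough worked check of why the threshold sits around 128G, assuming 4 KiB pages and a 64-byte struct page (both are assumptions, not stated in this thread; the helper name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: approximate size of a flat mem_map, i.e. one
 * struct page per page frame of RAM.
 */
static uint64_t toy_mem_map_bytes(uint64_t ram_bytes, uint64_t page_size,
				  uint64_t struct_page_size)
{
	assert(page_size != 0);
	return ram_bytes / page_size * struct_page_size;
}
```

Under those assumptions, 128G of RAM gives 32M page frames and a mem_map of exactly 2G, which would fill 0-2G completely; anything larger no longer fits below the reserved 2-4G range.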
> 
>> with NO_BOOTMEM
>> [    0.000000]  a - 11
>> [    0.000000]  19 40 - 80 95
>> [    0.000000]  702 740 - 1000 1000
>> [    0.000000]  331f 3340 - 3400 3400
>> [    0.000000]  35dd - 3600
>> [    0.000000]  37dd - 3800
>> [    0.000000]  39dd - 3a00
>> [    0.000000]  3bdd - 3c00
>> [    0.000000]  3ddd - 3e00
>> [    0.000000]  3fdd - 4000
>> [    0.000000]  41dd - 4200
>> [    0.000000]  43dd - 4400
>> [    0.000000]  45dd - 4600
>> [    0.000000]  47dd - 4800
>> [    0.000000]  49dd - 4a00
>> [    0.000000]  4bdd - 4c00
>> [    0.000000]  4ddd - 4e00
>> [    0.000000]  4fdd - 5000
>> [    0.000000]  51dd - 5200
>> [    0.000000]  93dd 9400 - 7d500 7d53b
>> [    0.000000]  7f730 - 7f750
>> [    0.000000]  100012 100040 - 100200 100200
>> [    0.000000]  170200 170200 - 2080000 2080000
>> [    0.000000]  2080065 2080080 - 2080200 2080200
>>
>> so PFN: 9400 - 7d500 are free.
> 
> Could you explain more the dmesg output?

It lists the free pfn ranges that will be used for slab...

Attached is a debug patch that prints the same information when CONFIG_NO_BOOTMEM is not set.
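
The idea behind that kind of listing can be sketched in user space as below. This is only an illustration, not the attached patch: the names are hypothetical, it walks the bitmap one bit at a time, and it assumes a set bit means "reserved" and reports half-open [start, end) pfn ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: a half-open range of free page frames. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Coalesce runs of clear bits (free pages) in a reservation bitmap into
 * pfn ranges. Returns the number of ranges written to out.
 */
static size_t toy_free_ranges(const uint8_t *map, unsigned long min_pfn,
			      unsigned long npages, struct pfn_range *out,
			      size_t max_out)
{
	size_t n = 0;
	unsigned long i;
	int in_run = 0;

	assert(map != NULL && out != NULL);

	for (i = 0; i < npages; i++) {
		int is_free = !(map[i / 8] & (1u << (i % 8)));

		if (is_free && !in_run) {
			if (n == max_out)
				break;		/* no room for another range */
			out[n].start = min_pfn + i;
			in_run = 1;
		} else if (!is_free && in_run) {
			out[n++].end = min_pfn + i;	/* gap: close the run */
			in_run = 0;
		}
	}
	if (in_run)
		out[n++].end = min_pfn + npages;	/* last one */

	return n;
}
```

Each range returned corresponds to one line of the listing; a gap in the free bits closes the current range and starts a new one.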

YH

[-- Attachment #2: print_free_bootmem.patch --]
[-- Type: text/x-patch, Size: 3981 bytes --]

Subject: [PATCH -v3] x86: print bootmem free before and free_all_bootmem

so we could double check if we have enough low pages later

-v2: fix errors checkpatch.pl reported
-v3: move after pci_iommu_alloc, so could compare it with NO_BOOTMEM

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_64.c   |    2 +
 include/linux/bootmem.h |    3 +
 mm/bootmem.c            |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -335,6 +335,97 @@ static void __init __free(bootmem_data_t
 			BUG();
 }
 
+static void __init print_all_bootmem_free_core(bootmem_data_t *bdata)
+{
+	int aligned;
+	unsigned long *map;
+	unsigned long start, end, count = 0;
+	unsigned long free_start = -1UL, free_end = 0;
+
+	if (!bdata->node_bootmem_map)
+		return;
+
+	start = bdata->node_min_pfn;
+	end = bdata->node_low_pfn;
+
+	/*
+	 * If the start is aligned to the machines wordsize, we might
+	 * be able to count it in bulks of that order.
+	 */
+	aligned = !(start & (BITS_PER_LONG - 1));
+
+	printk(KERN_DEBUG "nid=%td start=0x%010lx end=0x%010lx aligned=%d\n",
+		bdata - bootmem_node_data, start, end, aligned);
+	map = bdata->node_bootmem_map;
+
+	while (start < end) {
+		unsigned long idx, vec;
+
+		idx = start - bdata->node_min_pfn;
+		vec = ~map[idx / BITS_PER_LONG];
+
+		if (aligned && vec == ~0UL && start + BITS_PER_LONG < end) {
+			if (free_start == -1UL) {
+				free_start = idx;
+				free_end = free_start + BITS_PER_LONG;
+			} else {
+				if (free_end == idx) {
+					free_end += BITS_PER_LONG;
+				} else {
+					/* there is gap, print old */
+					printk(KERN_DEBUG "  free [0x%010lx - 0x%010lx]\n",
+							free_start + bdata->node_min_pfn,
+							free_end + bdata->node_min_pfn);
+					free_start = idx;
+					free_end = idx + BITS_PER_LONG;
+				}
+			}
+			count += BITS_PER_LONG;
+		} else {
+			unsigned long off = 0;
+
+			while (vec && off < BITS_PER_LONG) {
+				if (vec & 1) {
+					if (free_start == -1UL) {
+						free_start = idx + off;
+						free_end = free_start + 1;
+					} else {
+						if (free_end == (idx + off)) {
+							free_end++;
+						} else {
+							/* there is gap, print old */
+							printk(KERN_DEBUG "  free [0x%010lx - 0x%010lx]\n",
+								free_start + bdata->node_min_pfn,
+								free_end + bdata->node_min_pfn);
+							free_start = idx + off;
+							free_end = free_start + 1;
+						}
+					}
+					count++;
+				}
+				vec >>= 1;
+				off++;
+			}
+		}
+		start += BITS_PER_LONG;
+	}
+
+	/* last one */
+	if (free_start != -1UL)
+		printk(KERN_DEBUG "  free [0x%010lx - 0x%010lx]\n",
+			free_start + bdata->node_min_pfn,
+			free_end + bdata->node_min_pfn);
+	printk(KERN_DEBUG "  total free 0x%010lx\n", count);
+}
+
+void __init print_bootmem_free(void)
+{
+	bootmem_data_t *bdata;
+
+	list_for_each_entry(bdata, &bdata_list, list)
+		print_all_bootmem_free_core(bdata);
+}
+
 static int __init __reserve(bootmem_data_t *bdata, unsigned long sidx,
 			unsigned long eidx, int flags)
 {
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -679,6 +679,8 @@ void __init mem_init(void)
 
 	pci_iommu_alloc();
 
+	print_bootmem_free();
+
 	/* clear_bss() already clear the empty_zero_page */
 
 	reservedpages = 0;
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -38,6 +38,9 @@ typedef struct bootmem_data {
 } bootmem_data_t;
 
 extern bootmem_data_t bootmem_node_data[];
+void print_bootmem_free(void);
+#else
+static inline void print_bootmem_free(void) {}
 #endif
 
 extern unsigned long bootmem_bootmap_pages(unsigned long);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] x86/bootmem: introduce bootmem_default_goal
  2010-03-05 18:41       ` Yinghai Lu
@ 2010-03-05 20:38         ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-05 20:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg Thelen, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel

If you don't want to drop
|  bootmem: avoid DMA32 zone by default

today's mainline tree actually does NOT need that patch, according to the printout ...

please apply this one too.

[PATCH] x86/bootmem: introduce bootmem_default_goal

Don't punish 64-bit systems with less than 4G of RAM.
They should use __pa(MAX_DMA_ADDRESS) on the first pass instead of relying on the fallback...
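
A user-space sketch of the selection this patch is after, for illustration only. The TOY_* constants are assumptions standing in for the x86 values (4 KiB pages, 16M for __pa(MAX_DMA_ADDRESS), 4G for the DMA32 boundary), and toy_default_goal() is a made-up name:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12			/* assumed 4 KiB pages */
#define TOY_MAX_DMA_PHYS	(16ULL << 20)		/* assumed __pa(MAX_DMA_ADDRESS) */
#define TOY_MAX_DMA32_PHYS	(4ULL << 30)		/* assumed DMA32 boundary */
#define TOY_MAX_DMA32_PFN	(TOY_MAX_DMA32_PHYS >> TOY_PAGE_SHIFT)

/*
 * Pick the default allocation goal: stay above the DMA32 zone only when
 * the machine actually has memory mapped beyond it.
 */
static uint64_t toy_default_goal(uint64_t max_pfn_mapped)
{
	/* sanity: the low goal must sit below the DMA32 boundary */
	assert(TOY_MAX_DMA_PHYS < TOY_MAX_DMA32_PHYS);

	if (max_pfn_mapped < TOY_MAX_DMA32_PFN)
		return TOY_MAX_DMA_PHYS;
	return TOY_MAX_DMA32_PHYS;
}
```

With the last_pfn = 0xfff0 from the 256MB VM in the original report, this picks the 16M goal right away, so no failed first attempt at 4G is needed.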

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   13 +++++++++++++
 include/linux/bootmem.h |    3 ++-
 mm/bootmem.c            |    4 ++++
 3 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -686,6 +686,18 @@ static void __init trim_bios_range(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
+#ifdef MAX_DMA32_PFN
+static void __init set_bootmem_default_goal(void)
+{
+	if (max_pfn_mapped < MAX_DMA32_PFN)
+		bootmem_default_goal = __pa(MAX_DMA_ADDRESS);
+}
+#else
+static void __init set_bootmem_default_goal(void)
+{
+}
+#endif
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -931,6 +943,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
+	set_bootmem_default_goal();
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -104,7 +104,8 @@ extern void *__alloc_bootmem_low_node(pg
 				      unsigned long goal);
 
 #ifdef MAX_DMA32_PFN
-#define BOOTMEM_DEFAULT_GOAL	(MAX_DMA32_PFN << PAGE_SHIFT)
+extern unsigned long bootmem_default_goal;
+#define BOOTMEM_DEFAULT_GOAL	bootmem_default_goal
 #else
 #define BOOTMEM_DEFAULT_GOAL	__pa(MAX_DMA_ADDRESS)
 #endif
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -25,6 +25,10 @@ unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;
 
+#ifdef MAX_DMA32_PFN
+unsigned long bootmem_default_goal = (MAX_DMA32_PFN << PAGE_SHIFT);
+#endif
+
 #ifdef CONFIG_CRASH_DUMP
 /*
  * If we have booted due to a crash, max_pfn will be a very low value. We need

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 18:41       ` Yinghai Lu
@ 2010-03-05 23:58         ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2010-03-05 23:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Greg Thelen, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-mm, linux-kernel

Hello Yinghai,

On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote:
> On 03/04/2010 09:17 PM, Greg Thelen wrote:
> > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
> >>> On several systems I am seeing a boot panic if I use mmotm
> >>> (stamp-2010-03-02-18-38).  If I remove
> >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
> >>> find that:
> >>> * 2.6.33 boots fine.
> >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
> >>> * 2.6.33 + mmotm (including
> >>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> ...
> > 
> > Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
> > re-tested with 'make defconfig' to confirm the panic with this later
> > mmotm.
> 
> please check
> 
> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
> 
> Johannes Weiner pointed out that the new early_res replacement for
> alloc_bootmem_node changes the behavior regarding goal.
> The original bootmem one will try to go further regardless of goal.
> 
> That breaks his patch that moves the default goal from MAX_DMA to MAX_DMA32...
> It also broke uncommon machines with <=16M of memory.
> (really? our x86 kernel still can run on 16M system?)
> 
> so try again with an updated goal.

Thanks for the patch, it seems to be correct.

However, I have a more generic question about it, regarding the future of the
early_res allocator.

Did you plan on keeping the bootmem API for longer?  Because my impression was,
emulating it is a temporary measure until all users are gone and bootmem can
be finally dropped.

But then this would require some sort of handling of 'user does not need DMA[32]
memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
allocator as well.

I ask this specifically because you move this fix into the bootmem compatibility
code while there is not yet a way to tell early_res the same thing, so switching
a user that _needs_ to specify this requirement from bootmem to early_res is not
yet possible, is it?

> Reported-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  mm/bootmem.c |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/mm/bootmem.c
> ===================================================================
> --- linux-2.6.orig/mm/bootmem.c
> +++ linux-2.6/mm/bootmem.c
> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
>  }
>  
>  #ifdef CONFIG_NO_BOOTMEM
> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
> +						 u64 align, u64 goal, u64 limit)
> +{
> +	void *ptr;
> +	unsigned long end_pfn;
> +
> +	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +					 goal, limit);
> +	if (ptr)
> +		return ptr;
> +
> +	/* check goal according  */
> +	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
> +	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
> +		goal = pgdat->node_start_pfn << PAGE_SHIFT;
> +		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +						 goal, limit);
> +	}
> +
> +	return ptr;

I think it would make sense to move the parameter check before doing the
allocation.  Then you save the second call.
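
Roughly what I mean, as a user-space sketch rather than kernel code. The toy_* names and the single-range node are hypothetical stand-ins; the real allocator walks early_res ranges, which this does not model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a node: one physical range [start, end). */
struct toy_node {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a power-of-two alignment. */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Toy allocator: lowest aligned address at or above goal that fits
 * inside both the node and limit. Fails when goal is out of reach.
 */
static bool toy_alloc_early(const struct toy_node *n, uint64_t size,
			    uint64_t align, uint64_t goal, uint64_t limit,
			    uint64_t *out)
{
	uint64_t top = n->end < limit ? n->end : limit;
	uint64_t base = goal > n->start ? goal : n->start;
	uint64_t addr;

	assert(align && !(align & (align - 1)));
	addr = toy_align_up(base, align);
	if (addr >= top || top - addr < size)
		return false;
	*out = addr;
	return true;
}

/* Wrapper: check goal against the node span first, then allocate once. */
static bool toy_compat_alloc_early(const struct toy_node *n, uint64_t size,
				   uint64_t align, uint64_t goal,
				   uint64_t limit, uint64_t *out)
{
	/* goal + size would run past the node: fall back to node start */
	if (size > n->end || goal > n->end - size)
		goal = n->start;

	return toy_alloc_early(n, size, align, goal, limit, out);
}
```

With a node spanning 1M to 256M and a 4G goal, the plain call fails while the wrapper succeeds at the node start with a single allocation attempt.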

And a second nitpick: naming the inner function __foo and the outer one ___foo seems
confusing to me.  Could you maybe rename the wrapper? bootmem_compat_alloc_early() or
something like that?

Thanks,
	Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 23:58         ` Johannes Weiner
@ 2010-03-06  1:50           ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-06  1:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Greg Thelen, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-mm, linux-kernel

On 03/05/2010 03:58 PM, Johannes Weiner wrote:
> Hello Yinghai,
> 
> On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote:
>> On 03/04/2010 09:17 PM, Greg Thelen wrote:
>>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>>>> On several systems I am seeing a boot panic if I use mmotm
>>>>> (stamp-2010-03-02-18-38).  If I remove
>>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>>>>> find that:
>>>>> * 2.6.33 boots fine.
>>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>>>> * 2.6.33 + mmotm (including
>>>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> ...
>>>
>>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
>>> re-tested with 'make defconfig' to confirm the panic with this later
>>> mmotm.
>>
>> please check
>>
>> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
>>
>> Johannes Weiner pointed out that the new early_res replacement for
>> alloc_bootmem_node changes the behavior regarding goal.
>> The original bootmem one will try to go further regardless of goal.
>>
>> That breaks his patch that moves the default goal from MAX_DMA to MAX_DMA32...
>> It also broke uncommon machines with <=16M of memory.
>> (really? our x86 kernel still can run on 16M system?)
>>
>> so try again with an updated goal.
> 
> Thanks for the patch, it seems to be correct.
> 
> However, I have a more generic question about it, regarding the future of the
> early_res allocator.
> 
> Did you plan on keeping the bootmem API for longer?  Because my impression was,
> emulating it is a temporary measure until all users are gone and bootmem can
> be finally dropped.

That depends on each arch maintainer.

Users can compare them on x86 to check if...

The next step will be to make fw_mem_map generalized and combine them with lmb.

> 
> But then this would require some sort of handling of 'user does not need DMA[32]
> memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
> allocator as well.
> 
> I ask this specifically because you move this fix into the bootmem compatibility
> code while there is not yet a way to tell early_res the same thing, so switching
> a user that _needs_ to specify this requirement from bootmem to early_res is not
> yet possible, is it?

just let caller set the goal.

> 
>> Reported-by: Greg Thelen <gthelen@google.com>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>
>> ---
>>  mm/bootmem.c |   28 +++++++++++++++++++++++++---
>>  1 file changed, 25 insertions(+), 3 deletions(-)
>>
>> Index: linux-2.6/mm/bootmem.c
>> ===================================================================
>> --- linux-2.6.orig/mm/bootmem.c
>> +++ linux-2.6/mm/bootmem.c
>> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
>>  }
>>  
>>  #ifdef CONFIG_NO_BOOTMEM
>> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
>> +						 u64 align, u64 goal, u64 limit)
>> +{
>> +	void *ptr;
>> +	unsigned long end_pfn;
>> +
>> +	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>> +					 goal, limit);
>> +	if (ptr)
>> +		return ptr;
>> +
>> +	/* check goal according  */
>> +	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
>> +	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
>> +		goal = pgdat->node_start_pfn << PAGE_SHIFT;
>> +		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>> +						 goal, limit);
>> +	}
>> +
>> +	return ptr;
> 
> I think it would make sense to move the parameter check before doing the
> allocation.  Then you save the second call.

I am trying to avoid the second call.
Please check the other patch, "introduce bootmem_default_goal : don't punish 64bit system without 4g ram".

> 
> And a second nitpick: naming the inner function __foo and the outer one ___foo seems
> confusing to me.  Could you maybe rename the wrapper? bootmem_compat_alloc_early() or
> something like that?

ok.

Thanks

Yinghai
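
For reference, the two review suggestions above (check the goal before
allocating, and give the wrapper a clearer name) can be sketched as a small
userspace model. This is not kernel code: struct node_span, core_alloc() and
ALLOC_FAILED are hypothetical stand-ins for pg_data_t and
__alloc_memory_core_early(), and addresses are plain byte offsets. Only the
wrapper name bootmem_compat_alloc_early() comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define ALLOC_FAILED UINT64_MAX	/* hypothetical failure marker */

/* Hypothetical stand-in for pg_data_t: the byte span of one node. */
struct node_span {
	uint64_t start;		/* first byte of the node */
	uint64_t end;		/* one past the last byte of the node */
	uint64_t next_free;	/* bump pointer: everything below is taken */
};

/*
 * Hypothetical stand-in for __alloc_memory_core_early(): a bump
 * allocator that never hands out memory below 'goal'.
 */
static uint64_t core_alloc(struct node_span *n, uint64_t size,
			   uint64_t align, uint64_t goal)
{
	uint64_t addr;

	assert(align && (align & (align - 1)) == 0);	/* power of two */

	addr = n->next_free > goal ? n->next_free : goal;
	if (addr < n->start)
		addr = n->start;
	addr = (addr + align - 1) & ~(align - 1);
	if (addr + size > n->end)
		return ALLOC_FAILED;

	n->next_free = addr + size;
	return addr;
}

/*
 * Wrapper with the parameter check moved in front of the allocation:
 * if the goal cannot be met inside this node, lower it to the node
 * start first, so the core allocator is called only once.
 */
static uint64_t bootmem_compat_alloc_early(struct node_span *n, uint64_t size,
					   uint64_t align, uint64_t goal)
{
	if (n->end < goal + size)
		goal = n->start;

	return core_alloc(n, size, align, goal);
}
```

In this model the reordering gives the same result as the posted patch: when
the node ends below goal + size, the first call in the patch can never succeed,
so testing the condition up front only saves that call.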

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-06  1:50           ` Yinghai Lu
@ 2010-03-06  2:24             ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2010-03-06  2:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Greg Thelen, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-mm, linux-kernel

On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote:
> On 03/05/2010 03:58 PM, Johannes Weiner wrote:
> > Hello Yinghai,
> > 
> > On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote:
> >> On 03/04/2010 09:17 PM, Greg Thelen wrote:
> >>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
> >>>>> On several systems I am seeing a boot panic if I use mmotm
> >>>>> (stamp-2010-03-02-18-38).  If I remove
> >>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
> >>>>> find that:
> >>>>> * 2.6.33 boots fine.
> >>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
> >>>>> * 2.6.33 + mmotm (including
> >>>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> >> ...
> >>>
> >>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
> >>> re-tested with 'make defconfig' to confirm the panic with this later
> >>> mmotm.
> >>
> >> please check
> >>
> >> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
> >>
> >> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node
> >> change the behavoir about goal.
> >> original bootmem one will try go further regardless of goal.
> >>
> >> and it will break his patch about default goal from MAX_DMA to MAX_DMA32...
> >> also broke uncommon machines with <=16M of memory.
> >> (really? our x86 kernel still can run on 16M system?)
> >>
> >> so try again with update goal.
> > 
> > Thanks for the patch, it seems to be correct.
> > 
> > However, I have a more generic question about it, regarding the future of the
> > early_res allocator.
> > 
> > Did you plan on keeping the bootmem API for longer?  Because my impression was,
> > emulating it is a temporary measure until all users are gone and bootmem can
> > be finally dropped.
> 
> that depends on every arch maintainer.
> 
> user can compare them on x86 to check if...

Humm, now that is a bit disappointing.  Because it means we will never get rid
of bootmem as long as it works for the other architectures.  And your changeset
just added ~900 lines of code, some of it being a rather ugly compatibility
layer in bootmem that I hoped could go away again sooner than later.

I do not know what the upsides for x86 are from no longer using bootmem but it
would suck from a code maintenance point of view to get stuck half way through
this transition and have now TWO implementations of the bootmem interface we
would like to get rid of.

> The next step will be to generalize fw_mem_map and combine them with lmb.
> 
> > 
> > But then this would require some sort of handling of 'user does not need DMA[32]
> > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
> > allocator as well.
> > 
> > I ask this specifically because you move this fix into the bootmem compatibility
> > code while there is not yet a way to tell early_res the same thing, so switching
> > a user that _needs_ to specify this requirement from bootmem to early_res is not
> > yet possible, is it?
> 
> just let caller set the goal.

That means that every caller must be aware of where the DMA zone ends and if
it is non-empty and open-code the fallback to the DMA zone if the non-DMA zone
is exhausted?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-06  2:24             ` Johannes Weiner
@ 2010-03-06  2:31               ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-06  2:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Greg Thelen, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-mm, linux-kernel

On 03/05/2010 06:24 PM, Johannes Weiner wrote:
> On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote:
>> On 03/05/2010 03:58 PM, Johannes Weiner wrote:
>>> Hello Yinghai,
>>>
>>> On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote:
>>>> On 03/04/2010 09:17 PM, Greg Thelen wrote:
>>>>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>>>>>> On several systems I am seeing a boot panic if I use mmotm
>>>>>>> (stamp-2010-03-02-18-38).  If I remove
>>>>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I
>>>>>>> find that:
>>>>>>> * 2.6.33 boots fine.
>>>>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>>>>>> * 2.6.33 + mmotm (including
>>>>>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>>>> ...
>>>>>
>>>>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I
>>>>> re-tested with 'make defconfig' to confirm the panic with this later
>>>>> mmotm.
>>>>
>>>> please check
>>>>
>>>> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
>>>>
>>>> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node
>>>> change the behavoir about goal.
>>>> original bootmem one will try go further regardless of goal.
>>>>
>>>> and it will break his patch about default goal from MAX_DMA to MAX_DMA32...
>>>> also broke uncommon machines with <=16M of memory.
>>>> (really? our x86 kernel still can run on 16M system?)
>>>>
>>>> so try again with update goal.
>>>
>>> Thanks for the patch, it seems to be correct.
>>>
>>> However, I have a more generic question about it, regarding the future of the
>>> early_res allocator.
>>>
>>> Did you plan on keeping the bootmem API for longer?  Because my impression was,
>>> emulating it is a temporary measure until all users are gone and bootmem can
>>> be finally dropped.
>>
>> that depends on every arch maintainer.
>>
>> user can compare them on x86 to check if...
> 
> Humm, now that is a bit disappointing.  Because it means we will never get rid
> of bootmem as long as it works for the other architectures.  And your changeset
> just added ~900 lines of code, some of it being a rather ugly compatibility
> layer in bootmem that I hoped could go away again sooner than later.
> 
> I do not know what the upsides for x86 are from no longer using bootmem but it
> would suck from a code maintenance point of view to get stuck half way through
> this transition and have now TWO implementations of the bootmem interface we
> would like to get rid of.

Here is some data; others can compare further on their own x86 systems...

I hadn't planned to post this data until you said ....

For my 1T system:

nobootmem:
    text    data      bss      dec     hex filename
19185736 4148404 12170736 35504876 21dc2ec vmlinux.nobootmem
Memory: 1058662820k/1075838976k available (11388k kernel code, 2106480k absent, 15069676k reserved, 8589k data, 2744k init)
[  220.947157] calling  ip_auto_config+0x0/0x24d @ 1

bootmem:
    text    data      bss      dec     hex filename
19188441 4153956 12170736 35513133 21de32d vmlinux.bootmem
Memory: 1058662796k/1075838976k available (11388k kernel code, 2106480k absent, 15069700k reserved, 8589k data, 2752k init)
[  236.765364] calling  ip_auto_config+0x0/0x24d @ 1

YH
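
To make the comparison above easier to read, the differences between the two
builds can be worked out with a few lines of C. The numbers are copied from the
size and dmesg output in this mail; the struct and function names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One build's figures, taken from the size(1) and dmesg lines above. */
struct boot_sample {
	long text, data, bss, dec;	/* bytes, from size(1) */
	long reserved_kb;		/* "reserved" on the Memory: line */
	double ip_auto_config_s;	/* timestamp of the ip_auto_config call */
};

static const struct boot_sample nobootmem = {
	19185736, 4148404, 12170736, 35504876, 15069676, 220.947157
};
static const struct boot_sample bootmem = {
	19188441, 4153956, 12170736, 35513133, 15069700, 236.765364
};

/* Image size difference (bootmem minus nobootmem), in bytes. */
static long image_delta_bytes(void)
{
	/* Sanity check: the dec column is text + data + bss. */
	assert(bootmem.text + bootmem.data + bootmem.bss == bootmem.dec);
	assert(nobootmem.text + nobootmem.data + nobootmem.bss == nobootmem.dec);

	return bootmem.dec - nobootmem.dec;
}

/* Difference in reserved memory, in kilobytes. */
static long reserved_delta_kb(void)
{
	return bootmem.reserved_kb - nobootmem.reserved_kb;
}

/* How much later the bootmem kernel reaches ip_auto_config, in seconds. */
static double boot_delta_seconds(void)
{
	return bootmem.ip_auto_config_s - nobootmem.ip_auto_config_s;
}

static void print_deltas(void)
{
	printf("image: %ld bytes, reserved: %ldk, time: %.3f s\n",
	       image_delta_bytes(), reserved_delta_kb(), boot_delta_seconds());
}
```

With these figures the bootmem image is 8257 bytes larger, reserves 24k more
memory, and reaches ip_auto_config about 15.8 seconds later than the nobootmem
build.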

^ permalink raw reply	[flat|nested] 48+ messages in thread

* please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-05 20:38         ` Yinghai Lu
@ 2010-03-06  5:44           ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-06  5:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg Thelen, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel

On 03/05/2010 12:38 PM, Yinghai Lu wrote:
> if you don't want to drop
> |  bootmem: avoid DMA32 zone by default
> 
> today mainline tree actually DO NOT need that patch according to print out ...
> 
> please apply this one too.
> 
> [PATCH] x86/bootmem: introduce bootmem_default_goal
> 
> don't punish the 64bit systems with less 4G RAM.
> they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback...

Andrew,

Please drop Johannes' patch: bootmem: avoid DMA32 zone by default

Then you don't need to apply the two fix patches from me:
[PATCH] early_res: double check with updated goal in alloc_memory_core_early
[PATCH] x86/bootmem: introduce bootmem_default_goal

Moving all bootmem allocations above 4G makes system performance worse...

Thanks

Yinghai Lu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-06  5:44           ` Yinghai Lu
@ 2010-03-07  0:22             ` Andrew Morton
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2010-03-07  0:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Greg Thelen, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel

On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu <yinghai@kernel.org> wrote:

> On 03/05/2010 12:38 PM, Yinghai Lu wrote:
> > if you don't want to drop
> > |  bootmem: avoid DMA32 zone by default
> > 
> > today mainline tree actually DO NOT need that patch according to print out ...
> > 
> > please apply this one too.
> > 
> > [PATCH] x86/bootmem: introduce bootmem_default_goal
> > 
> > don't punish the 64bit systems with less 4G RAM.
> > they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback...
> 
> andrew,
> 
> please drop Johannes' patch : bootmem: avoid DMA32 zone by default

I'd rather not.  That patch is said to fix a runtime problem which is
present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.

I don't have a clue what your patches do.  Can you tell us?

Earlier, Johannes wrote

: Humm, now that is a bit disappointing.  Because it means we will never
: get rid of bootmem as long as it works for the other architectures. 
: And your changeset just added ~900 lines of code, some of it being a
: rather ugly compatibility layer in bootmem that I hoped could go away
: again sooner than later.
: 
: I do not know what the upsides for x86 are from no longer using bootmem
: but it would suck from a code maintenance point of view to get stuck
: half way through this transition and have now TWO implementations of
: the bootmem interface we would like to get rid of.

Which is a pretty good-sounding argument.  Perhaps we should be
dropping your patches.

What patches _are_ these x86 bootmem changes, anyway?  Please identify
them so people can take a look and see what they do.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-07  0:22             ` Andrew Morton
@ 2010-03-07  0:42               ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-07  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg Thelen, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel, Linus Torvalds

On 03/06/2010 04:22 PM, Andrew Morton wrote:
> On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu <yinghai@kernel.org> wrote:
> 
>> On 03/05/2010 12:38 PM, Yinghai Lu wrote:
>>> if you don't want to drop
>>> |  bootmem: avoid DMA32 zone by default
>>>
>>> today mainline tree actually DO NOT need that patch according to print out ...
>>>
>>> please apply this one too.
>>>
>>> [PATCH] x86/bootmem: introduce bootmem_default_goal
>>>
>>> don't punish the 64bit systems with less 4G RAM.
>>> they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback...
>>
>> andrew,
>>
>> please drop Johannes' patch : bootmem: avoid DMA32 zone by default
> 
> I'd rather not.  That patch is said to fix a runtime problem which is
> present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.

That patch increases my box's boot time from 215s to 265s.

There should be a better way to fix the problem:
just put the mem_map or other big chunks high,
instead of putting everything above 4G.

Something like:

static void * __init_refok __earlyonly_bootmem_alloc(int node,
                                unsigned long size,
                                unsigned long align,
                                unsigned long goal)
{
        return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
                                   unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
        unsigned long end_pfn;

        if (WARN_ON_ONCE(slab_is_available()))
                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

        /* update goal according to MAX_DMA32_PFN */
        end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

        /*
         * Only try above the DMA32 zone if the node extends beyond it and
         * the caller's goal is below it.
         * NOTE: 128 >> (20 - PAGE_SHIFT) is 0 with 4K pages; if a 128MB
         * margin is intended, that would be 128 << (20 - PAGE_SHIFT).
         */
        if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
            (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
                void *ptr;
                unsigned long new_goal;

                new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
                ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
                                                 new_goal, -1ULL);
#else
                ptr = alloc_bootmem_core(pgdat->bdata, size, align,
                                                 new_goal, 0);
#endif
                if (ptr)
                        return ptr;
        }
#endif

        /* fall back to the normal path with the caller's original goal */
        return __alloc_bootmem_node(pgdat, size, align, goal);
}
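
The "try high first, then fall back" idea in the code above can be exercised
outside the kernel with a small model. The sketch below is only an
illustration under several assumptions: memory is a flat byte range, alignment
is ignored, the margin is taken to be 128MB, and struct node_model,
alloc_at_or_above() and alloc_node_high() are hypothetical names, not kernel
interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DMA32_LIMIT (4ULL << 30)		/* end of the DMA32 zone (assumed 4G) */
#define MODEL_MARGIN	  (128ULL << 20)	/* assumed 128MB margin */
#define ALLOC_FAILED	  UINT64_MAX		/* hypothetical failure marker */

/* Hypothetical model of one node with separate low and high cursors. */
struct node_model {
	uint64_t start, end;	/* byte span of the node */
	uint64_t low_next;	/* next free byte below the DMA32 limit */
	uint64_t high_next;	/* next free byte at or above the limit */
};

static struct node_model node_model_init(uint64_t start, uint64_t end)
{
	struct node_model n = { start, end, start,
		start > MODEL_DMA32_LIMIT ? start : MODEL_DMA32_LIMIT };
	return n;
}

/* Bump-allocate 'size' bytes at or above 'goal'; no alignment handling. */
static uint64_t alloc_at_or_above(struct node_model *n, uint64_t size,
				  uint64_t goal)
{
	int high = goal >= MODEL_DMA32_LIMIT;
	uint64_t *cursor = high ? &n->high_next : &n->low_next;
	/* Low allocations stay below the limit so the two areas never overlap. */
	uint64_t bound = (high || n->end < MODEL_DMA32_LIMIT) ?
			 n->end : MODEL_DMA32_LIMIT;
	uint64_t addr = *cursor > goal ? *cursor : goal;

	if (addr + size > bound)
		return ALLOC_FAILED;
	*cursor = addr + size;
	return addr;
}

/* Prefer memory above the DMA32 limit, fall back to the caller's goal. */
static uint64_t alloc_node_high(struct node_model *n, uint64_t size,
				uint64_t goal)
{
	if (n->end > MODEL_DMA32_LIMIT + MODEL_MARGIN &&
	    goal < MODEL_DMA32_LIMIT) {
		uint64_t addr = alloc_at_or_above(n, size, MODEL_DMA32_LIMIT);

		if (addr != ALLOC_FAILED)
			return addr;
	}
	return alloc_at_or_above(n, size, goal);
}
```

In this model a node spanning 0 to 8G serves a request with a 16MB goal from
the 4G mark, while a 256MB node serves the same request at 16MB. Only callers
that opt in to the high variant are moved above 4G; everything else keeps its
original goal.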


> 
> I don't have a clue what your patches do.  Can you tell us?

They stop using bootmem and use early_res instead.

You are on the To list...

Please check...
http://lkml.org/lkml/2010/2/10/39
> 
> Earlier, Johannes wrote
> 
> : Humm, now that is a bit disappointing.  Because it means we will never
> : get rid of bootmem as long as it works for the other architectures. 
> : And your changeset just added ~900 lines of code, some of it being a
> : rather ugly compatibility layer in bootmem that I hoped could go away
> : again sooner than later.
> : 
> : I do not know what the upsides for x86 are from no longer using bootmem
> : but it would suck from a code maintainance point of view to get stuck
> : half way through this transition and have now TWO implementations of
> : the bootmem interface we would like to get rid of.
> 
> Which is a pretty good-sounding argument.  Perhaps we should be
> dropping your patches.
> 
> What patches _are_ these x86 bootmem changes, anyway?  Please identify
> them so people can take a look and see what they do.

http://lkml.org/lkml/2010/2/10/39

and you, Linus, Ingo, hpa and tglx are on the To list.

Yinghai

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-07  0:42               ` Yinghai Lu
@ 2010-03-07  0:53                 ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-07  0:53 UTC (permalink / raw)
  To: Andrew Morton, Jiri Slaby
  Cc: Greg Thelen, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Johannes Weiner, linux-mm, linux-kernel, Linus Torvalds

On 03/06/2010 04:42 PM, Yinghai Lu wrote:
> On 03/06/2010 04:22 PM, Andrew Morton wrote:
>> On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu <yinghai@kernel.org> wrote:
>>
>>> On 03/05/2010 12:38 PM, Yinghai Lu wrote:
>>>> if you don't want to drop
>>>> |  bootmem: avoid DMA32 zone by default
>>>>
>>>> today mainline tree actually DO NOT need that patch according to print out ...
>>>>
>>>> please apply this one too.
>>>>
>>>> [PATCH] x86/bootmem: introduce bootmem_default_goal
>>>>
>>>> don't punish the 64bit systems with less 4G RAM.
>>>> they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback...
>>>
>>> andrew,
>>>
>>> please drop Johannes' patch : bootmem: avoid DMA32 zone by default
>>
>> I'd rather not.  That patch is said to fix a runtime problem which is
>> present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.
> 
> that patch make my box booting time from 215s to 265s.
> 
> should have better way to fix the problem:
> just put the mem_map or the big chunk on high.
> instead of putting everything above 4g.
> 
> something like:
> static void * __init_refok __earlyonly_bootmem_alloc(int node,
>                                 unsigned long size,
>                                 unsigned long align,
>                                 unsigned long goal)
> {
>         return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
> }
> 
> void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
>                                    unsigned long align, unsigned long goal)
> {
> #ifdef MAX_DMA32_PFN
>         unsigned long end_pfn;
> 
>         if (WARN_ON_ONCE(slab_is_available()))
>                 return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
> 
>         /* update goal according ...MAX_DMA32_PFN */
>         end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
> 
>         if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
>             (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
>                 void *ptr;
>                 unsigned long new_goal;
> 
>                 new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
> #ifdef CONFIG_NO_BOOTMEM
>                 ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
>                                                  new_goal, -1ULL);
> #else
>                 ptr = alloc_bootmem_core(pgdat->bdata, size, align,
>                                                  new_goal, 0);
> #endif
>                 if (ptr)
>                         return ptr;
>         }
> #endif
> 
>         return __alloc_bootmem_node(pgdat, size, align, goal);
> 
> }

Jiri, can you send out your bootlog and .config?

Yinghai

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-07  0:22             ` Andrew Morton
@ 2010-03-07  1:03               ` Paul Mackerras
  -1 siblings, 0 replies; 48+ messages in thread
From: Paul Mackerras @ 2010-03-07  1:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yinghai Lu, Greg Thelen, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Johannes Weiner, linux-mm, linux-kernel, linux-arch

On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> Earlier, Johannes wrote
> 
> : Humm, now that is a bit disappointing.  Because it means we will never
> : get rid of bootmem as long as it works for the other architectures. 
> : And your changeset just added ~900 lines of code, some of it being a
> : rather ugly compatibility layer in bootmem that I hoped could go away
> : again sooner than later.

Whoa!  Who's proposing to get rid of bootmem, and why?

Paul.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-05 10:26       ` Jiri Slaby
  (?)
  (?)
@ 2010-03-07  1:17       ` Yinghai Lu
  2010-03-11 10:54         ` Jiri Slaby
  -1 siblings, 1 reply; 48+ messages in thread
From: Yinghai Lu @ 2010-03-07  1:17 UTC (permalink / raw)
  To: Jiri Slaby, Andrew Morton
  Cc: Johannes Weiner, Greg Thelen, linux-kernel, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, cl

On 03/05/2010 02:26 AM, Jiri Slaby wrote:
> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>> according to context
>> http://patchwork.kernel.org/patch/73893/
>>
>> Jiri, 
>> please check current linus tree still have problem about mem_map is using that much low mem?
> 
> Hi!
> 
> Sorry, I don't have direct access to the machine. I might try to ask the
> owners to do so.
> 
>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.
> 
> So where gets your mem_map allocated (I suppose you're running flat model)?

what kernel version? 2.6.27?

x86 64-bit now only supports SPARSEMEM.

Yinghai


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-07  1:03               ` Paul Mackerras
  (?)
@ 2010-03-07  1:48               ` Stephen Rothwell
  -1 siblings, 0 replies; 48+ messages in thread
From: Stephen Rothwell @ 2010-03-07  1:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Yinghai Lu, Greg Thelen, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Johannes Weiner, linux-mm,
	linux-kernel, linux-arch

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

Hi Paul,

On Sun, 7 Mar 2010 12:03:27 +1100 Paul Mackerras <paulus@samba.org> wrote:
>
> On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> > Earlier, Johannes wrote
> > 
> > : Humm, now that is a bit disappointing.  Because it means we will never
> > : get rid of bootmem as long as it works for the other architectures. 
> > : And your changeset just added ~900 lines of code, some of it being a
> > : rather ugly compatibility layer in bootmem that I hoped could go away
> > : again sooner than later.
> 
> Whoa!  Who's proposing to get rid of bootmem, and why?

I assume that is the point of the "early_res" work already in Linus' tree
starting from commit 27811d8cabe56e0c3622251b049086f49face4ff ("x86: Move
range related operation to one file").

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] sparsemem: on no vmemmap path put mem_map on node high too
  2010-03-07  0:53                 ` Yinghai Lu
  (?)
@ 2010-03-07  2:15                 ` Yinghai Lu
  -1 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-07  2:15 UTC (permalink / raw)
  To: Andrew Morton, Jiri Slaby, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Christoph Lameter
  Cc: Greg Thelen, Johannes Weiner, linux-kernel


We need to put mem_map high when virtual memmap is not used.

Before this patch, the free mem pfn ranges on the first node are:
[    0.000000]  19 - 1f
[    0.000000]  28 40 - 80 95
[    0.000000]  702 740 - 1000 1000
[    0.000000]  347c - 347e
[    0.000000]  34e7 3500 - 3b80 3b8b
[    0.000000]  73b8b 73bc0 - 73c00 73c00
[    0.000000]  73ddd - 73e00
[    0.000000]  73fdd - 74000
[    0.000000]  741dd - 74200
[    0.000000]  743dd - 74400
[    0.000000]  745dd - 74600
[    0.000000]  747dd - 74800
[    0.000000]  749dd - 74a00
[    0.000000]  74bdd - 74c00
[    0.000000]  74ddd - 74e00
[    0.000000]  74fdd - 75000
[    0.000000]  751dd - 75200
[    0.000000]  753dd - 75400
[    0.000000]  755dd - 75600
[    0.000000]  757dd - 75800
[    0.000000]  759dd - 75a00
[    0.000000]  79bdd 79c00 - 7d540 7d550
[    0.000000]  7f745 - 7f750
[    0.000000]  10000b 100040 - 2080000 2080000
so only 79c00 - 7d540 is a major free block under 4g...

After this patch, we get:
[    0.000000]  19 - 1f
[    0.000000]  28 40 - 80 95
[    0.000000]  702 740 - 1000 1000
[    0.000000]  347c - 347e
[    0.000000]  34e7 3500 - 3600 3600
[    0.000000]  37dd - 3800
[    0.000000]  39dd - 3a00
[    0.000000]  3bdd - 3c00
[    0.000000]  3ddd - 3e00
[    0.000000]  3fdd - 4000
[    0.000000]  41dd - 4200
[    0.000000]  43dd - 4400
[    0.000000]  45dd - 4600
[    0.000000]  47dd - 4800
[    0.000000]  49dd - 4a00
[    0.000000]  4bdd - 4c00
[    0.000000]  4ddd - 4e00
[    0.000000]  4fdd - 5000
[    0.000000]  51dd - 5200
[    0.000000]  53dd - 5400
[    0.000000]  95dd 9600 - 7d540 7d550
[    0.000000]  7f745 - 7f750
[    0.000000]  17000b 170040 - 2080000 2080000
so we will have 9600 - 7d540 as the major free block...
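
To put numbers on the two listings: assuming 4 KiB pages and reading each listed pair as a half-open pfn range, the major free block under 4g grows from about 57 MiB (79c00 - 7d540) to about 1855 MiB (9600 - 7d540). The helper below is a hypothetical userspace sketch of that arithmetic and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Size in KiB of the half-open pfn range [start_pfn, end_pfn). */
uint64_t pfn_range_kib(uint64_t start_pfn, uint64_t end_pfn)
{
    assert(end_pfn >= start_pfn);
    /* pages -> KiB: one page is 2^(PAGE_SHIFT - 10) KiB */
    return (end_pfn - start_pfn) << (PAGE_SHIFT - 10);
}
```

Under these assumptions, pfn_range_kib(0x79c00, 0x7d540) gives 58624 KiB (57.25 MiB) and pfn_range_kib(0x9600, 0x7d540) gives 1899776 KiB (1855.25 MiB).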

The sparse-vmemmap path already uses __alloc_bootmem_node_high().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 mm/sparse.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -381,13 +381,15 @@ static void __init sparse_early_usemaps_
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
 {
 	struct page *map;
+	unsigned long size;
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)
 		return map;
 
-	map = alloc_bootmem_pages_node(NODE_DATA(nid),
-		       PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
+	size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+	map = __alloc_bootmem_node_high(NODE_DATA(nid), size,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	return map;
 }
 void __init sparse_mem_maps_populate_node(struct page **map_map,
@@ -411,7 +413,8 @@ void __init sparse_mem_maps_populate_nod
 	}
 
 	size = PAGE_ALIGN(size);
-	map = alloc_bootmem_pages_node(NODE_DATA(nodeid), size * map_count);
+	map = __alloc_bootmem_node_high(NODE_DATA(nodeid), size * map_count,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	if (map) {
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: please don't apply : bootmem: avoid DMA32 zone by default
  2010-03-07  1:03               ` Paul Mackerras
@ 2010-03-07  9:16                 ` Russell King
  -1 siblings, 0 replies; 48+ messages in thread
From: Russell King @ 2010-03-07  9:16 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Yinghai Lu, Greg Thelen, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Johannes Weiner, linux-mm,
	linux-kernel, linux-arch

On Sun, Mar 07, 2010 at 12:03:27PM +1100, Paul Mackerras wrote:
> On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> > Earlier, Johannes wrote
> > 
> > : Humm, now that is a bit disappointing.  Because it means we will never
> > : get rid of bootmem as long as it works for the other architectures. 
> > : And your changeset just added ~900 lines of code, some of it being a
> > : rather ugly compatibility layer in bootmem that I hoped could go away
> > : again sooner than later.
> 
> Whoa!  Who's proposing to get rid of bootmem, and why?

It would be nice if this stuff was copied to linux-arch since it
impacts all architectures.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-07  1:17       ` Yinghai Lu
@ 2010-03-11 10:54         ` Jiri Slaby
  2010-03-11 20:12           ` Yinghai Lu
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Slaby @ 2010-03-11 10:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Johannes Weiner, Greg Thelen, linux-kernel,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, cl

On 03/07/2010 02:17 AM, Yinghai Lu wrote:
> On 03/05/2010 02:26 AM, Jiri Slaby wrote:
>> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>>> according to context
>>> http://patchwork.kernel.org/patch/73893/
>>>
>>> Jiri,
>>> please check current linus tree still have problem about mem_map is using that much low mem?
>>
>> Hi!
>>
>> Sorry, I don't have direct access to the machine. I might try to ask the
>> owners to do so.
>>
>>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.
>>
>> So where gets your mem_map allocated (I suppose you're running flat model)?
>
> what kernel version? 2.6.27?

Hi, yes, it is 2.6.27.

> x86 64bit now only support SPARSEMEM.


-- 
js

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-11 10:54         ` Jiri Slaby
@ 2010-03-11 20:12           ` Yinghai Lu
  2010-03-11 21:40             ` Jiri Slaby
  0 siblings, 1 reply; 48+ messages in thread
From: Yinghai Lu @ 2010-03-11 20:12 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Andrew Morton, Johannes Weiner, Greg Thelen, linux-kernel,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, cl

On 03/11/2010 02:54 AM, Jiri Slaby wrote:
> On 03/07/2010 02:17 AM, Yinghai Lu wrote:
>> On 03/05/2010 02:26 AM, Jiri Slaby wrote:
>>> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>>>> according to context
>>>> http://patchwork.kernel.org/patch/73893/
>>>>
>>>> Jiri,
>>>> please check current linus tree still have problem about mem_map is
>>>> using that much low mem?
>>>
>>> Hi!
>>>
>>> Sorry, I don't have direct access to the machine. I might try to ask the
>>> owners to do so.
>>>
>>>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.
>>>
>>> So where gets your mem_map allocated (I suppose you're running flat
>>> model)?
>>
>> what kernel version? 2.6.27?
> 
> Hi, yes, it is 2.6.27.

SLES 11?

Yinghai

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-11 20:12           ` Yinghai Lu
@ 2010-03-11 21:40             ` Jiri Slaby
  2010-03-11 21:42               ` Yinghai Lu
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Slaby @ 2010-03-11 21:40 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Johannes Weiner, Greg Thelen, linux-kernel,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, cl

On 03/11/2010 09:12 PM, Yinghai Lu wrote:
> On 03/11/2010 02:54 AM, Jiri Slaby wrote:
>> Hi, yes, it is 2.6.27.
>
> SLES 11?

Sorry I wrote that in haste. It is SLES 10 in the end. That means it is 
2.6.16, not 2.6.27. Hence no sparsemem whatsoever. With SLES11 it should 
be OK, we are using flatmem only for i386.

Whatever, it should be no issue now, as flatmem currently (as of 2.6.25) 
depends on i386.

On the other hand I still considered the patch as applicable to 
contemporary kernels since there might be weird bios e820 maps and huge 
(and sparse) bootmem allocations/reservations (memory cgroups, initrd) 
so that code requiring much memory below 4g (swiotlb) will fail then.

Whatever, in the current kernel, the particular issue I was referring to 
*is not reproducible*.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
  2010-03-11 21:40             ` Jiri Slaby
@ 2010-03-11 21:42               ` Yinghai Lu
  0 siblings, 0 replies; 48+ messages in thread
From: Yinghai Lu @ 2010-03-11 21:42 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Andrew Morton, Johannes Weiner, Greg Thelen, linux-kernel,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, cl

On 03/11/2010 01:40 PM, Jiri Slaby wrote:
> On 03/11/2010 09:12 PM, Yinghai Lu wrote:
>> On 03/11/2010 02:54 AM, Jiri Slaby wrote:
>>> Hi, yes, it is 2.6.27.
>>
>> SLES 11?
> 
> Sorry I wrote that in haste. It is SLES 10 in the end. That means it is
> 2.6.16, not 2.6.27. Hence no sparsemem whatsoever. With SLES11 it should
> be OK, we are using flatmem only for i386.
> 
> Whatever, it should be no issue now, as flatmem currently (as of 2.6.25)
> depends on i386.
> 
> On the other hand I still considered the patch as applicable to
> contemporary kernels since there might be weird bios e820 maps and huge
> (and sparse) bootmem allocations/reservations (memory cgroups, initrd)
> so that code requiring much memory below 4g (swiotlb) will fail then.
> 
> Whatever, in the current kernel, the particular issue I was referring to
> *is not reproducible*.

the point is: we should only put the memmap high. that is the big chunk...
other users should be ok, so leave them alone.

YH

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2010-03-11 21:44 UTC | newest]

Thread overview: 48+ messages
2010-03-04 21:21 mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Greg Thelen
2010-03-05  3:21 ` Johannes Weiner
2010-03-05  5:00   ` Yinghai Lu
2010-03-05  5:14     ` Yinghai Lu
2010-03-05 12:51       ` Johannes Weiner
2010-03-05 16:38         ` Yinghai
2010-03-05  5:17   ` Greg Thelen
2010-03-05  5:34     ` Greg Thelen
2010-03-05 18:41     ` Yinghai Lu
2010-03-05 18:41       ` Yinghai Lu
2010-03-05 19:09       ` Greg Thelen
2010-03-05 19:09         ` Greg Thelen
2010-03-05 20:38       ` [PATCH] x86/bootmem: introduce bootmem_default_goal Yinghai Lu
2010-03-05 20:38         ` Yinghai Lu
2010-03-06  5:44         ` please don't apply : bootmem: avoid DMA32 zone by default Yinghai Lu
2010-03-06  5:44           ` Yinghai Lu
2010-03-07  0:22           ` Andrew Morton
2010-03-07  0:22             ` Andrew Morton
2010-03-07  0:42             ` Yinghai Lu
2010-03-07  0:42               ` Yinghai Lu
2010-03-07  0:53               ` Yinghai Lu
2010-03-07  0:53                 ` Yinghai Lu
2010-03-07  2:15                 ` [PATCH] sparsemem: on no vmemmap path put mem_map on node high too Yinghai Lu
2010-03-07  1:03             ` please don't apply : bootmem: avoid DMA32 zone by default Paul Mackerras
2010-03-07  1:03               ` Paul Mackerras
2010-03-07  1:48               ` Stephen Rothwell
2010-03-07  9:16               ` Russell King
2010-03-07  9:16                 ` Russell King
2010-03-05 23:58       ` mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Johannes Weiner
2010-03-05 23:58         ` Johannes Weiner
2010-03-06  1:50         ` Yinghai Lu
2010-03-06  1:50           ` Yinghai Lu
2010-03-06  2:24           ` Johannes Weiner
2010-03-06  2:24             ` Johannes Weiner
2010-03-06  2:31             ` Yinghai Lu
2010-03-06  2:31               ` Yinghai Lu
2010-03-05  9:04   ` Yinghai Lu
2010-03-05  9:04     ` Yinghai Lu
2010-03-05 10:26     ` Jiri Slaby
2010-03-05 10:26       ` Jiri Slaby
2010-03-05 20:27       ` Yinghai Lu
2010-03-07  1:17       ` Yinghai Lu
2010-03-11 10:54         ` Jiri Slaby
2010-03-11 20:12           ` Yinghai Lu
2010-03-11 21:40             ` Jiri Slaby
2010-03-11 21:42               ` Yinghai Lu
2010-03-05 13:08     ` Johannes Weiner
2010-03-05 13:08       ` Johannes Weiner
