All of lore.kernel.org
 help / color / mirror / Atom feed
* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12  8:32 ` Kukjin Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12  8:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-samsung-soc, 'Russell King - ARM Linux'
  Cc: ben-linux

Russell,

Hi,

Kukjin Kim wrote:
> Russell wrote:
> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
fine.
> > That doesn't mean the section size is 16MB.
> > 
> > As I've already said, the section size has _nothing_ what so ever to do
> > with the size of memory, or the granularity of the size of memory.  By
> > way of illustration, it is perfectly legal to have a section size of
> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> > sections do not have to be completely filled.
> > 
> Actually, as you know, the hole's area of mem_map is freed from bootmem if
a
> section has a hole when initializing sparse memory.
> 
> I identified that a section doesn't need to be a contiguous area of
physical
> memory when reading your comment with the fact that the mem_map of a
section
> can be smaller than the size of a section.
> 
> I found, however, the kernel panics when modifying min_free_kbytes file in
> the proc filesystem if a section has a hole.
> 
> While processing the change of min_free_kbytes in the kernel, page
> descriptors in a hole of an online section is accessed.

As I said, following error happens.
It would be helpful to me if any opinions or comments.

---
When SECTION_SIZE_BITS is 24 (16MiB),

[root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes

[root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2730
[root@Samsung ~]#


When SECTION_SIZE_BITS is 28 (256MiB),

[root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = 80a14000
[00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Not tainted  (2.6.35-rc4-00007-g9a59bf7-dirty #3)
PC is at get_pageblock_flags_group+0x54/0xa8
LR is at setup_per_zone_wmarks+0xfc/0x1d4
pc : [<800686cc>]    lr : [<800691a0>]    psr: 60000093
sp : 80a03ed0  ip : 00000001  fp : 00058000
r10: 004a0000  r9 : 801e5fbc  r8 : 802c0000
r7 : 0001c900  r6 : 00025000  r5 : 801e5fa4  r4 : 0000007e
r3 : 00000018  r2 : 00000002  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 20a14019  DAC: 00000015
Process bash (pid: 888, stack limit = 0x80a022e8)
Stack: (0x80a03ed0 to 0x80a04000)
3ec0:                                     00000000 801e5fa4 00025000
800691a0
3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005
00000001
3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900
800d2734
3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00
2aacb000
3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005
8008ff74
3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005
800900c8
3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8
00000004
3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005
00000000
3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000
00000001
3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000
00000000
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)
Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004)
---[ end trace 8b96cc6afed783d1 ]---
note: bash[888] exited with preempt_count 1
BUG: scheduling while atomic: bash/888/0x40000002
Modules linked in:
[<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>]
(schedule+0x6c/0x2e4)
[<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>]
(__cond_resched+0x24/0x34)
[<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>]
(_cond_resched+0x30/0x40)
[<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>]
(unmap_vmas+0x550/0x604)
[<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>]
(exit_mmap+0xb0/0x1d8)
[<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
[<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
[<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
[<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
[<80022950>] (die+0x190/0x1bc) from [<80025488>]
(__do_kernel_fault+0x64/0x84)
[<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
(do_page_fault+0x1d4/0x1ec)
[<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
(do_DataAbort+0x34/0x94)
[<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
(__dabt_svc+0x4c/0x60)
Exception stack(0x80a03e88 to 0x80a03ed0)
3e80:                   00000000 00000000 00000002 00000018 0000007e
801e5fa4
3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
80a03ed0
3ec0: 800691a0 800686cc 60000093 ffffffff
[<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
(get_pageblock_flags_group+0x54/0xa8)
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)
BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc
[<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>]
(do_raw_spin_lock+0x110/0x14c)
[<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>]
(free_pcppages_bulk+0x24/0x2f4)
[<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>]
(free_hot_cold_page+0x15c/0x180)
[<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>]
(free_pgd_range+0x15c/0x17c)
[<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>]
(free_pgtables+0x90/0x9c)
[<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>]
(exit_mmap+0x10c/0x1d8)
[<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
[<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
[<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
[<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
[<80022950>] (die+0x190/0x1bc) from [<80025488>]
(__do_kernel_fault+0x64/0x84)
[<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
(do_page_fault+0x1d4/0x1ec)
[<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
(do_DataAbort+0x34/0x94)
[<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
(__dabt_svc+0x4c/0x60)
Exception stack(0x80a03e88 to 0x80a03ed0)
3e80:                   00000000 00000000 00000002 00000018 0000007e
801e5fa4
3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
80a03ed0
3ec0: 800691a0 800686cc 60000093 ffffffff
[<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
(get_pageblock_flags_group+0x54/0xa8)
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)


Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12  8:32 ` Kukjin Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12  8:32 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Hi,

Kukjin Kim wrote:
> Russell wrote:
> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
fine.
> > That doesn't mean the section size is 16MB.
> > 
> > As I've already said, the section size has _nothing_ what so ever to do
> > with the size of memory, or the granularity of the size of memory.  By
> > way of illustration, it is perfectly legal to have a section size of
> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> > sections do not have to be completely filled.
> > 
> Actually, as you know, the hole's area of mem_map is freed from bootmem if
a
> section has a hole when initializing sparse memory.
> 
> I identified that a section doesn't need to be a contiguous area of
physical
> memory when reading your comment with the fact that the mem_map of a
section
> can be smaller than the size of a section.
> 
> I found, however, the kernel panics when modifying min_free_kbytes file in
> the proc filesystem if a section has a hole.
> 
> While processing the change of min_free_kbytes in the kernel, page
> descriptors in a hole of an online section is accessed.

As I said, following error happens.
It would be helpful to me if any opinions or comments.

---
When SECTION_SIZE_BITS is 24 (16MiB),

[root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes

[root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2730
[root at Samsung ~]#


When SECTION_SIZE_BITS is 28 (256MiB),

[root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = 80a14000
[00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Not tainted  (2.6.35-rc4-00007-g9a59bf7-dirty #3)
PC is at get_pageblock_flags_group+0x54/0xa8
LR is at setup_per_zone_wmarks+0xfc/0x1d4
pc : [<800686cc>]    lr : [<800691a0>]    psr: 60000093
sp : 80a03ed0  ip : 00000001  fp : 00058000
r10: 004a0000  r9 : 801e5fbc  r8 : 802c0000
r7 : 0001c900  r6 : 00025000  r5 : 801e5fa4  r4 : 0000007e
r3 : 00000018  r2 : 00000002  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 20a14019  DAC: 00000015
Process bash (pid: 888, stack limit = 0x80a022e8)
Stack: (0x80a03ed0 to 0x80a04000)
3ec0:                                     00000000 801e5fa4 00025000
800691a0
3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005
00000001
3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900
800d2734
3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00
2aacb000
3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005
8008ff74
3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005
800900c8
3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8
00000004
3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005
00000000
3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000
00000001
3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000
00000000
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)
Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004)
---[ end trace 8b96cc6afed783d1 ]---
note: bash[888] exited with preempt_count 1
BUG: scheduling while atomic: bash/888/0x40000002
Modules linked in:
[<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>]
(schedule+0x6c/0x2e4)
[<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>]
(__cond_resched+0x24/0x34)
[<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>]
(_cond_resched+0x30/0x40)
[<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>]
(unmap_vmas+0x550/0x604)
[<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>]
(exit_mmap+0xb0/0x1d8)
[<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
[<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
[<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
[<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
[<80022950>] (die+0x190/0x1bc) from [<80025488>]
(__do_kernel_fault+0x64/0x84)
[<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
(do_page_fault+0x1d4/0x1ec)
[<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
(do_DataAbort+0x34/0x94)
[<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
(__dabt_svc+0x4c/0x60)
Exception stack(0x80a03e88 to 0x80a03ed0)
3e80:                   00000000 00000000 00000002 00000018 0000007e
801e5fa4
3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
80a03ed0
3ec0: 800691a0 800686cc 60000093 ffffffff
[<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
(get_pageblock_flags_group+0x54/0xa8)
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)
BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc
[<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>]
(do_raw_spin_lock+0x110/0x14c)
[<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>]
(free_pcppages_bulk+0x24/0x2f4)
[<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>]
(free_hot_cold_page+0x15c/0x180)
[<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>]
(free_pgd_range+0x15c/0x17c)
[<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>]
(free_pgtables+0x90/0x9c)
[<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>]
(exit_mmap+0x10c/0x1d8)
[<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
[<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
[<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
[<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
[<80022950>] (die+0x190/0x1bc) from [<80025488>]
(__do_kernel_fault+0x64/0x84)
[<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
(do_page_fault+0x1d4/0x1ec)
[<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
(do_DataAbort+0x34/0x94)
[<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
(__dabt_svc+0x4c/0x60)
Exception stack(0x80a03e88 to 0x80a03ed0)
3e80:                   00000000 00000000 00000002 00000018 0000007e
801e5fa4
3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
80a03ed0
3ec0: 800691a0 800686cc 60000093 ffffffff
[<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
(get_pageblock_flags_group+0x54/0xa8)
[<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
(setup_per_zone_wmarks+0xfc/0x1d4)
[<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
(min_free_kbytes_sysctl_handler+0x20/0x28)
[<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
(proc_sys_call_handler+0x90/0xac)
[<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
(proc_sys_write+0x10/0x14)
[<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
(vfs_write+0xac/0x154)
[<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
[<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
(ret_fast_syscall+0x0/0x30)


Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  8:32 ` Kukjin Kim
@ 2010-07-12  9:35   ` Kyungmin Park
  -1 siblings, 0 replies; 60+ messages in thread
From: Kyungmin Park @ 2010-07-12  9:35 UTC (permalink / raw)
  To: Kukjin Kim
  Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux

Interesting.

I got tested with
#define MAX_PHYSMEM_BITS        31
#define SECTION_SIZE_BITS       27

# cat /proc/sys/vm/min_free_kbytes
1832
# echo 1828 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1828
# echo 1820 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1820
# echo 1700 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1700

No kernel panic.

Thank you,
Kyungmin Park

On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Russell,
>
> Hi,
>
> Kukjin Kim wrote:
>> Russell wrote:
>> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> fine.
>> > That doesn't mean the section size is 16MB.
>> >
>> > As I've already said, the section size has _nothing_ what so ever to do
>> > with the size of memory, or the granularity of the size of memory.  By
>> > way of illustration, it is perfectly legal to have a section size of
>> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> > sections do not have to be completely filled.
>> >
>> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> a
>> section has a hole when initializing sparse memory.
>>
>> I identified that a section doesn't need to be a contiguous area of
> physical
>> memory when reading your comment with the fact that the mem_map of a
> section
>> can be smaller than the size of a section.
>>
>> I found, however, the kernel panics when modifying min_free_kbytes file in
>> the proc filesystem if a section has a hole.
>>
>> While processing the change of min_free_kbytes in the kernel, page
>> descriptors in a hole of an online section is accessed.
>
> As I said, following error happens.
> It would be helpful to me if any opinions or comments.
>
> ---
> When SECTION_SIZE_BITS is 24 (16MiB),
>
> [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
>
> [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2730
> [root@Samsung ~]#
>
>
> When SECTION_SIZE_BITS is 28 (256MiB),
>
> [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> pgd = 80a14000
> [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000
> Internal error: Oops: 17 [#1] PREEMPT
> last sysfs file:
> Modules linked in:
> CPU: 0    Not tainted  (2.6.35-rc4-00007-g9a59bf7-dirty #3)
> PC is at get_pageblock_flags_group+0x54/0xa8
> LR is at setup_per_zone_wmarks+0xfc/0x1d4
> pc : [<800686cc>]    lr : [<800691a0>]    psr: 60000093
> sp : 80a03ed0  ip : 00000001  fp : 00058000
> r10: 004a0000  r9 : 801e5fbc  r8 : 802c0000
> r7 : 0001c900  r6 : 00025000  r5 : 801e5fa4  r4 : 0000007e
> r3 : 00000018  r2 : 00000002  r1 : 00000000  r0 : 00000000
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c5387d  Table: 20a14019  DAC: 00000015
> Process bash (pid: 888, stack limit = 0x80a022e8)
> Stack: (0x80a03ed0 to 0x80a04000)
> 3ec0:                                     00000000 801e5fa4 00025000
> 800691a0
> 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005
> 00000001
> 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900
> 800d2734
> 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00
> 2aacb000
> 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005
> 8008ff74
> 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005
> 800900c8
> 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8
> 00000004
> 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005
> 00000000
> 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000
> 00000001
> 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000
> 00000000
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
> Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004)
> ---[ end trace 8b96cc6afed783d1 ]---
> note: bash[888] exited with preempt_count 1
> BUG: scheduling while atomic: bash/888/0x40000002
> Modules linked in:
> [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>]
> (schedule+0x6c/0x2e4)
> [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>]
> (__cond_resched+0x24/0x34)
> [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>]
> (_cond_resched+0x30/0x40)
> [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>]
> (unmap_vmas+0x550/0x604)
> [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>]
> (exit_mmap+0xb0/0x1d8)
> [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
> [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
> [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
> [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
> [<80022950>] (die+0x190/0x1bc) from [<80025488>]
> (__do_kernel_fault+0x64/0x84)
> [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
> (do_page_fault+0x1d4/0x1ec)
> [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
> (do_DataAbort+0x34/0x94)
> [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
> (__dabt_svc+0x4c/0x60)
> Exception stack(0x80a03e88 to 0x80a03ed0)
> 3e80:                   00000000 00000000 00000002 00000018 0000007e
> 801e5fa4
> 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
> 80a03ed0
> 3ec0: 800691a0 800686cc 60000093 ffffffff
> [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
> (get_pageblock_flags_group+0x54/0xa8)
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
> BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc
> [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>]
> (do_raw_spin_lock+0x110/0x14c)
> [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>]
> (free_pcppages_bulk+0x24/0x2f4)
> [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>]
> (free_hot_cold_page+0x15c/0x180)
> [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>]
> (free_pgd_range+0x15c/0x17c)
> [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>]
> (free_pgtables+0x90/0x9c)
> [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>]
> (exit_mmap+0x10c/0x1d8)
> [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
> [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
> [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
> [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
> [<80022950>] (die+0x190/0x1bc) from [<80025488>]
> (__do_kernel_fault+0x64/0x84)
> [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
> (do_page_fault+0x1d4/0x1ec)
> [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
> (do_DataAbort+0x34/0x94)
> [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
> (__dabt_svc+0x4c/0x60)
> Exception stack(0x80a03e88 to 0x80a03ed0)
> 3e80:                   00000000 00000000 00000002 00000018 0000007e
> 801e5fa4
> 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
> 80a03ed0
> 3ec0: 800691a0 800686cc 60000093 ffffffff
> [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
> (get_pageblock_flags_group+0x54/0xa8)
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
>
>
> Thanks.
>
> Best regards,
> Kgene.
> --
> Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
> SW Solution Development Team, Samsung Electronics Co., Ltd.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12  9:35   ` Kyungmin Park
  0 siblings, 0 replies; 60+ messages in thread
From: Kyungmin Park @ 2010-07-12  9:35 UTC (permalink / raw)
  To: linux-arm-kernel

Interesting.

I got tested with
#define MAX_PHYSMEM_BITS        31
#define SECTION_SIZE_BITS       27

# cat /proc/sys/vm/min_free_kbytes
1832
# echo 1828 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1828
# echo 1820 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1820
# echo 1700 > /proc/sys/vm/min_free_kbytes
# cat /proc/sys/vm/min_free_kbytes
1700

No kernel panic.

Thank you,
Kyungmin Park

On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Russell,
>
> Hi,
>
> Kukjin Kim wrote:
>> Russell wrote:
>> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> fine.
>> > That doesn't mean the section size is 16MB.
>> >
>> > As I've already said, the section size has _nothing_ what so ever to do
>> > with the size of memory, or the granularity of the size of memory. ?By
>> > way of illustration, it is perfectly legal to have a section size of
>> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
>> > sections do not have to be completely filled.
>> >
>> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> a
>> section has a hole when initializing sparse memory.
>>
>> I identified that a section doesn't need to be a contiguous area of
> physical
>> memory when reading your comment with the fact that the mem_map of a
> section
>> can be smaller than the size of a section.
>>
>> I found, however, the kernel panics when modifying min_free_kbytes file in
>> the proc filesystem if a section has a hole.
>>
>> While processing the change of min_free_kbytes in the kernel, page
>> descriptors in a hole of an online section is accessed.
>
> As I said, following error happens.
> It would be helpful to me if any opinions or comments.
>
> ---
> When SECTION_SIZE_BITS is 24 (16MiB),
>
> [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
>
> [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2730
> [root at Samsung ~]#
>
>
> When SECTION_SIZE_BITS is 28 (256MiB),
>
> [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> pgd = 80a14000
> [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000
> Internal error: Oops: 17 [#1] PREEMPT
> last sysfs file:
> Modules linked in:
> CPU: 0 ? ?Not tainted ?(2.6.35-rc4-00007-g9a59bf7-dirty #3)
> PC is at get_pageblock_flags_group+0x54/0xa8
> LR is at setup_per_zone_wmarks+0xfc/0x1d4
> pc : [<800686cc>] ? ?lr : [<800691a0>] ? ?psr: 60000093
> sp : 80a03ed0 ?ip : 00000001 ?fp : 00058000
> r10: 004a0000 ?r9 : 801e5fbc ?r8 : 802c0000
> r7 : 0001c900 ?r6 : 00025000 ?r5 : 801e5fa4 ?r4 : 0000007e
> r3 : 00000018 ?r2 : 00000002 ?r1 : 00000000 ?r0 : 00000000
> Flags: nZCv ?IRQs off ?FIQs on ?Mode SVC_32 ?ISA ARM ?Segment user
> Control: 10c5387d ?Table: 20a14019 ?DAC: 00000015
> Process bash (pid: 888, stack limit = 0x80a022e8)
> Stack: (0x80a03ed0 to 0x80a04000)
> 3ec0: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 00000000 801e5fa4 00025000
> 800691a0
> 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005
> 00000001
> 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900
> 800d2734
> 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00
> 2aacb000
> 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005
> 8008ff74
> 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005
> 800900c8
> 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8
> 00000004
> 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005
> 00000000
> 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000
> 00000001
> 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000
> 00000000
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
> Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004)
> ---[ end trace 8b96cc6afed783d1 ]---
> note: bash[888] exited with preempt_count 1
> BUG: scheduling while atomic: bash/888/0x40000002
> Modules linked in:
> [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>]
> (schedule+0x6c/0x2e4)
> [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>]
> (__cond_resched+0x24/0x34)
> [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>]
> (_cond_resched+0x30/0x40)
> [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>]
> (unmap_vmas+0x550/0x604)
> [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>]
> (exit_mmap+0xb0/0x1d8)
> [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
> [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
> [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
> [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
> [<80022950>] (die+0x190/0x1bc) from [<80025488>]
> (__do_kernel_fault+0x64/0x84)
> [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
> (do_page_fault+0x1d4/0x1ec)
> [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
> (do_DataAbort+0x34/0x94)
> [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
> (__dabt_svc+0x4c/0x60)
> Exception stack(0x80a03e88 to 0x80a03ed0)
> 3e80: ? ? ? ? ? ? ? ? ? 00000000 00000000 00000002 00000018 0000007e
> 801e5fa4
> 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
> 80a03ed0
> 3ec0: 800691a0 800686cc 60000093 ffffffff
> [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
> (get_pageblock_flags_group+0x54/0xa8)
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
> BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc
> [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>]
> (do_raw_spin_lock+0x110/0x14c)
> [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>]
> (free_pcppages_bulk+0x24/0x2f4)
> [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>]
> (free_hot_cold_page+0x15c/0x180)
> [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>]
> (free_pgd_range+0x15c/0x17c)
> [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>]
> (free_pgtables+0x90/0x9c)
> [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>]
> (exit_mmap+0x10c/0x1d8)
> [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104)
> [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c)
> [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc)
> [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc)
> [<80022950>] (die+0x190/0x1bc) from [<80025488>]
> (__do_kernel_fault+0x64/0x84)
> [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>]
> (do_page_fault+0x1d4/0x1ec)
> [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>]
> (do_DataAbort+0x34/0x94)
> [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>]
> (__dabt_svc+0x4c/0x60)
> Exception stack(0x80a03e88 to 0x80a03ed0)
> 3e80: ? ? ? ? ? ? ? ? ? 00000000 00000000 00000002 00000018 0000007e
> 801e5fa4
> 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001
> 80a03ed0
> 3ec0: 800691a0 800686cc 60000093 ffffffff
> [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>]
> (get_pageblock_flags_group+0x54/0xa8)
> [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>]
> (setup_per_zone_wmarks+0xfc/0x1d4)
> [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>]
> (min_free_kbytes_sysctl_handler+0x20/0x28)
> [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>]
> (proc_sys_call_handler+0x90/0xac)
> [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>]
> (proc_sys_write+0x10/0x14)
> [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>]
> (vfs_write+0xac/0x154)
> [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68)
> [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>]
> (ret_fast_syscall+0x0/0x30)
>
>
> Thanks.
>
> Best regards,
> Kgene.
> --
> Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
> SW Solution Development Team, Samsung Electronics Co., Ltd.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  8:32 ` Kukjin Kim
@ 2010-07-12  9:52   ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12  9:52 UTC (permalink / raw)
  To: Kukjin Kim
  Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux

Hello.

On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Russell,
>
> Hi,
>
> Kukjin Kim wrote:
>> Russell wrote:
>> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> fine.
>> > That doesn't mean the section size is 16MB.
>> >
>> > As I've already said, the section size has _nothing_ what so ever to do
>> > with the size of memory, or the granularity of the size of memory.  By
>> > way of illustration, it is perfectly legal to have a section size of
>> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> > sections do not have to be completely filled.
>> >
>> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> a
>> section has a hole when initializing sparse memory.
>>
>> I identified that a section doesn't need to be a contiguous area of
> physical
>> memory when reading your comment with the fact that the mem_map of a
> section
>> can be smaller than the size of a section.
>>
>> I found, however, the kernel panics when modifying min_free_kbytes file in
>> the proc filesystem if a section has a hole.
>>
>> While processing the change of min_free_kbytes in the kernel, page
>> descriptors in a hole of an online section is accessed.
>
> As I said, following error happens.
> It would be helpful to me if any opinions or comments.
>

Could you test below patch?
Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.

@@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone *zone)
        for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                if (!pfn_valid(pfn))
                        continue;
+
                page = pfn_to_page(pfn);

+                /* Watch for unexpected holes punched in the memmap */
+                if (!memmap_valid_within(pfn, page, zone))
+                        continue;
+
                /* Watch out for overlapping nodes */
                if (page_to_nid(page) != zone_to_nid(zone))
                        continue;



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12  9:52   ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12  9:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hello.

On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Russell,
>
> Hi,
>
> Kukjin Kim wrote:
>> Russell wrote:
>> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> fine.
>> > That doesn't mean the section size is 16MB.
>> >
>> > As I've already said, the section size has _nothing_ what so ever to do
>> > with the size of memory, or the granularity of the size of memory. ?By
>> > way of illustration, it is perfectly legal to have a section size of
>> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
>> > sections do not have to be completely filled.
>> >
>> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> a
>> section has a hole when initializing sparse memory.
>>
>> I identified that a section doesn't need to be a contiguous area of
> physical
>> memory when reading your comment with the fact that the mem_map of a
> section
>> can be smaller than the size of a section.
>>
>> I found, however, the kernel panics when modifying min_free_kbytes file in
>> the proc filesystem if a section has a hole.
>>
>> While processing the change of min_free_kbytes in the kernel, page
>> descriptors in a hole of an online section is accessed.
>
> As I said, following error happens.
> It would be helpful to me if any opinions or comments.
>

Could you test below patch?
Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.

@@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone *zone)
        for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                if (!pfn_valid(pfn))
                        continue;
+
                page = pfn_to_page(pfn);

+                /* Watch for unexpected holes punched in the memmap */
+                if (!memmap_valid_within(pfn, page, zone))
+                        continue;
+
                /* Watch out for overlapping nodes */
                if (page_to_nid(page) != zone_to_nid(zone))
                        continue;



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  9:35   ` Kyungmin Park
@ 2010-07-12  9:58     ` Kukjin Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12  9:58 UTC (permalink / raw)
  To: 'Kyungmin Park'
  Cc: linux-arm-kernel, linux-samsung-soc,
	'Russell King - ARM Linux',
	ben-linux

Kyungmin Park wrote:

> 
> Interesting.
> 
> I got tested with
> #define MAX_PHYSMEM_BITS        31
> #define SECTION_SIZE_BITS       27
> 
> # cat /proc/sys/vm/min_free_kbytes
> 1832
> # echo 1828 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1828
> # echo 1820 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1820
> # echo 1700 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1700
> 
> No kernel panic.
> 

Thanks for your test on the board.

But I need your environment for comparing.

Please let me know.
Following is my board environment.

From boot-loader(u-boot).

SMDKC110 # bdinfo
arch_number = 0x00000891
env_t       = 0x00000000
boot_params = 0x20000100
DRAM bank   = 0x00000000
-> start    = 0x20000000
-> size     = 0x05000000
DRAM bank   = 0x00000001
-> start    = 0x40000000
-> size     = 0x10000000
DRAM bank   = 0x00000002
-> start    = 0x50000000
-> size     = 0x08000000

From kernel boot message.

SMDKC110 # bootm c0008000
Boot with zImage

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
Linux version 2.6.35-rc4-00008-g98c749c (kgene@starstone) (gcc version 4.4.1
(Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010
CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: SMDKC110
...
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 80MB 256MB 128MB = 464MB total
Memory: 459616k/459616k available, 15520k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xff000000 - 0xffe00000   (  14 MB)
    vmalloc : 0xb8800000 - 0xe0000000   ( 632 MB)
    lowmem  : 0x80000000 - 0xb8000000   ( 896 MB)
    modules : 0x7f000000 - 0x80000000   (  16 MB)
      .init : 0x80008000 - 0x8001e000   (  88 kB)
      .text : 0x8001e000 - 0x801be000   (1664 kB)
      .data : 0x801ce000 - 0x801e6600   (  98 kB)
SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Hierarchical RCU implementation.
        RCU-based detection of stalled CPUs is disabled.
        Verbose stalled-CPUs detection is disabled.
...

And SECTION_SIZE_BITS is 28, not 27.

> Thank you,
> Kyungmin Park
> 
> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Russell,
> >
> > Hi,
> >
> > Kukjin Kim wrote:
> >> Russell wrote:
> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> > fine.
> >> > That doesn't mean the section size is 16MB.
> >> >
> >> > As I've already said, the section size has _nothing_ what so ever to
do
> >> > with the size of memory, or the granularity of the size of memory.
 By
> >> > way of illustration, it is perfectly legal to have a section size of
> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> >> > sections do not have to be completely filled.
> >> >
> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
if
> > a
> >> section has a hole when initializing sparse memory.
> >>
> >> I identified that a section doesn't need to be a contiguous area of
> > physical
> >> memory when reading your comment with the fact that the mem_map of a
> > section
> >> can be smaller than the size of a section.
> >>
> >> I found, however, the kernel panics when modifying min_free_kbytes file
in
> >> the proc filesystem if a section has a hole.
> >>
> >> While processing the change of min_free_kbytes in the kernel, page
> >> descriptors in a hole of an online section is accessed.
> >
> > As I said, following error happens.
> > It would be helpful to me if any opinions or comments.
> >
> > ---


(snip)


Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12  9:58     ` Kukjin Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

Kyungmin Park wrote:

> 
> Interesting.
> 
> I got tested with
> #define MAX_PHYSMEM_BITS        31
> #define SECTION_SIZE_BITS       27
> 
> # cat /proc/sys/vm/min_free_kbytes
> 1832
> # echo 1828 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1828
> # echo 1820 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1820
> # echo 1700 > /proc/sys/vm/min_free_kbytes
> # cat /proc/sys/vm/min_free_kbytes
> 1700
> 
> No kernel panic.
> 

Thanks for your test on the board.

But I need your environment for comparing.

Please let me know.
Following is my board environment.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  9:58     ` Kukjin Kim
@ 2010-07-12 10:08       ` Kyungmin Park
  -1 siblings, 0 replies; 60+ messages in thread
From: Kyungmin Park @ 2010-07-12 10:08 UTC (permalink / raw)
  To: Kukjin Kim
  Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux

On Mon, Jul 12, 2010 at 6:58 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Kyungmin Park wrote:
>
>>
>> Interesting.
>>
>> I got tested with
>> #define MAX_PHYSMEM_BITS        31
>> #define SECTION_SIZE_BITS       27
>>
>> # cat /proc/sys/vm/min_free_kbytes
>> 1832
>> # echo 1828 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1828
>> # echo 1820 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1820
>> # echo 1700 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1700
>>
>> No kernel panic.
>>
>
> Thanks for your test on the board.
>
> But I need your environment for comparing.
>
> Please let me know.
> Following is my board environment.
>
> From boot-loader(u-boot).
>
> SMDKC110 # bdinfo
> arch_number = 0x00000891
> env_t       = 0x00000000
> boot_params = 0x20000100
> DRAM bank   = 0x00000000
> -> start    = 0x20000000
> -> size     = 0x05000000
> DRAM bank   = 0x00000001
> -> start    = 0x40000000
> -> size     = 0x10000000
> DRAM bank   = 0x00000002
> -> start    = 0x50000000
> -> size     = 0x08000000
>
> From kernel boot message.
>
> SMDKC110 # bootm c0008000
> Boot with zImage
>
> Starting kernel ...
>
> Uncompressing Linux... done, booting the kernel.
> Linux version 2.6.35-rc4-00008-g98c749c (kgene@starstone) (gcc version 4.4.1
> (Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010
> CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f
> CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> Machine: SMDKC110
> ...
> Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> Memory: 80MB 256MB 128MB = 464MB total
> Memory: 459616k/459616k available, 15520k reserved, 0K highmem
> Virtual kernel memory layout:
>    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
>    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
>    DMA     : 0xff000000 - 0xffe00000   (  14 MB)
>    vmalloc : 0xb8800000 - 0xe0000000   ( 632 MB)
>    lowmem  : 0x80000000 - 0xb8000000   ( 896 MB)
>    modules : 0x7f000000 - 0x80000000   (  16 MB)
>      .init : 0x80008000 - 0x8001e000   (  88 kB)
>      .text : 0x8001e000 - 0x801be000   (1664 kB)
>      .data : 0x801ce000 - 0x801e6600   (  98 kB)
> SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> Hierarchical RCU implementation.
>        RCU-based detection of stalled CPUs is disabled.
>        Verbose stalled-CPUs detection is disabled.
> ...
>
> And SECTION_SIZE_BITS is 28, not 27.

27 for our board. 28 for generic.

Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000
to test HIGHMEM.

Universal # bdinfo
arch_number = 0x00000B2E
env_t       = 0x00000000
boot_params = 0x30000100
DRAM bank   = 0x00000000
-> start    = 0x30000000
-> size     = 0x05000000
DRAM bank   = 0x00000001
-> start    = 0x40000000
-> size     = 0x18000000
baudrate    = 115200 bps
Universal # run bootcmd

OneNAND read: offset 0xc00000, size 0x600000
 6291456 bytes read: OK
## Booting kernel from Legacy Image at 30007fc0 ...
   Image Name:   Linux-2.6.35-rc4+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1548496 Bytes = 1.5 MiB
   Load Address: 30008000
   Entry Point:  30008000
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Linux version 2.6.35-rc4+ (kmpark@july) (gcc version 4.4.1 (GCC)0
[    0.000000] CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f
[    0.000000] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction ce
[    0.000000] Machine: GONI
[    0.000000] Ignoring unrecognised tag 0x54410008
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] CPU S5PV210/S5PC110 (id 0x43110200)
...
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 80MB 128MB 128MB 128MB = 464MB total
[    0.000000] Memory: 468224k/468224k available, 6912k reserved, 262144K highmm
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     DMA     : 0xff000000 - 0xffe00000   (  14 MB)
[    0.000000]     vmalloc : 0xd8800000 - 0xe0000000   ( 120 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd8000000   ( 384 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
[    0.000000]       .init : 0xc0008000 - 0xc001b000   (  76 kB)
[    0.000000]       .text : 0xc001b000 - 0xc0282000   (2460 kB)
[    0.000000]       .data : 0xc0298000 - 0xc02b1160   ( 101 kB)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU-based detection of stalled CPUs is disabled.
[    0.000000]  Verbose stalled-CPUs detection is disabled.
[    0.000000] NR_IRQS:176
[    0.000000] VIC @f4000000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4010000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4020000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4030000: id 0x00041192, vendor 0x41
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [ttySAC2] enabled
[    0.010000] Calibrating delay loop... 797.90 BogoMIPS (lpj=1994752)


>
>> Thank you,
>> Kyungmin Park
>>
>> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Russell,
>> >
>> > Hi,
>> >
>> > Kukjin Kim wrote:
>> >> Russell wrote:
>> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
>> > fine.
>> >> > That doesn't mean the section size is 16MB.
>> >> >
>> >> > As I've already said, the section size has _nothing_ what so ever to
> do
>> >> > with the size of memory, or the granularity of the size of memory.
>  By
>> >> > way of illustration, it is perfectly legal to have a section size of
>> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> >> > sections do not have to be completely filled.
>> >> >
>> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> if
>> > a
>> >> section has a hole when initializing sparse memory.
>> >>
>> >> I identified that a section doesn't need to be a contiguous area of
>> > physical
>> >> memory when reading your comment with the fact that the mem_map of a
>> > section
>> >> can be smaller than the size of a section.
>> >>
>> >> I found, however, the kernel panics when modifying min_free_kbytes file
> in
>> >> the proc filesystem if a section has a hole.
>> >>
>> >> While processing the change of min_free_kbytes in the kernel, page
>> >> descriptors in a hole of an online section is accessed.
>> >
>> > As I said, following error happens.
>> > It would be helpful to me if any opinions or comments.
>> >
>> > ---
>
>
> (snip)
>
>
> Thanks.
>
> Best regards,
> Kgene.
> --
> Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
> SW Solution Development Team, Samsung Electronics Co., Ltd.
>
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 10:08       ` Kyungmin Park
  0 siblings, 0 replies; 60+ messages in thread
From: Kyungmin Park @ 2010-07-12 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 6:58 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Kyungmin Park wrote:
>
>>
>> Interesting.
>>
>> I got tested with
>> #define MAX_PHYSMEM_BITS ? ? ? ?31
>> #define SECTION_SIZE_BITS ? ? ? 27
>>
>> # cat /proc/sys/vm/min_free_kbytes
>> 1832
>> # echo 1828 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1828
>> # echo 1820 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1820
>> # echo 1700 > /proc/sys/vm/min_free_kbytes
>> # cat /proc/sys/vm/min_free_kbytes
>> 1700
>>
>> No kernel panic.
>>
>
> Thanks for your test on the board.
>
> But I need your environment for comparing.
>
> Please let me know.
> Following is my board environment.
>
> From boot-loader(u-boot).
>
> SMDKC110 # bdinfo
> arch_number = 0x00000891
> env_t ? ? ? = 0x00000000
> boot_params = 0x20000100
> DRAM bank ? = 0x00000000
> -> start ? ?= 0x20000000
> -> size ? ? = 0x05000000
> DRAM bank ? = 0x00000001
> -> start ? ?= 0x40000000
> -> size ? ? = 0x10000000
> DRAM bank ? = 0x00000002
> -> start ? ?= 0x50000000
> -> size ? ? = 0x08000000
>
> From kernel boot message.
>
> SMDKC110 # bootm c0008000
> Boot with zImage
>
> Starting kernel ...
>
> Uncompressing Linux... done, booting the kernel.
> Linux version 2.6.35-rc4-00008-g98c749c (kgene at starstone) (gcc version 4.4.1
> (Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010
> CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f
> CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> Machine: SMDKC110
> ...
> Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> Memory: 80MB 256MB 128MB = 464MB total
> Memory: 459616k/459616k available, 15520k reserved, 0K highmem
> Virtual kernel memory layout:
> ? ?vector ?: 0xffff0000 - 0xffff1000 ? ( ? 4 kB)
> ? ?fixmap ?: 0xfff00000 - 0xfffe0000 ? ( 896 kB)
> ? ?DMA ? ? : 0xff000000 - 0xffe00000 ? ( ?14 MB)
> ? ?vmalloc : 0xb8800000 - 0xe0000000 ? ( 632 MB)
> ? ?lowmem ?: 0x80000000 - 0xb8000000 ? ( 896 MB)
> ? ?modules : 0x7f000000 - 0x80000000 ? ( ?16 MB)
> ? ? ?.init : 0x80008000 - 0x8001e000 ? ( ?88 kB)
> ? ? ?.text : 0x8001e000 - 0x801be000 ? (1664 kB)
> ? ? ?.data : 0x801ce000 - 0x801e6600 ? ( ?98 kB)
> SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> Hierarchical RCU implementation.
> ? ? ? ?RCU-based detection of stalled CPUs is disabled.
> ? ? ? ?Verbose stalled-CPUs detection is disabled.
> ...
>
> And SECTION_SIZE_BITS is 28, not 27.

27 for our board. 28 for generic.

Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000
to test HIGHMEM.

Universal # bdinfo
arch_number = 0x00000B2E
env_t       = 0x00000000
boot_params = 0x30000100
DRAM bank   = 0x00000000
-> start    = 0x30000000
-> size     = 0x05000000
DRAM bank   = 0x00000001
-> start    = 0x40000000
-> size     = 0x18000000
baudrate    = 115200 bps
Universal # run bootcmd

OneNAND read: offset 0xc00000, size 0x600000
 6291456 bytes read: OK
## Booting kernel from Legacy Image at 30007fc0 ...
   Image Name:   Linux-2.6.35-rc4+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1548496 Bytes = 1.5 MiB
   Load Address: 30008000
   Entry Point:  30008000
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Linux version 2.6.35-rc4+ (kmpark at july) (gcc version 4.4.1 (GCC)0
[    0.000000] CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f
[    0.000000] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction ce
[    0.000000] Machine: GONI
[    0.000000] Ignoring unrecognised tag 0x54410008
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] CPU S5PV210/S5PC110 (id 0x43110200)
...
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 80MB 128MB 128MB 128MB = 464MB total
[    0.000000] Memory: 468224k/468224k available, 6912k reserved, 262144K highmm
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     DMA     : 0xff000000 - 0xffe00000   (  14 MB)
[    0.000000]     vmalloc : 0xd8800000 - 0xe0000000   ( 120 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd8000000   ( 384 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
[    0.000000]       .init : 0xc0008000 - 0xc001b000   (  76 kB)
[    0.000000]       .text : 0xc001b000 - 0xc0282000   (2460 kB)
[    0.000000]       .data : 0xc0298000 - 0xc02b1160   ( 101 kB)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU-based detection of stalled CPUs is disabled.
[    0.000000]  Verbose stalled-CPUs detection is disabled.
[    0.000000] NR_IRQS:176
[    0.000000] VIC @f4000000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4010000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4020000: id 0x00041192, vendor 0x41
[    0.000000] VIC @f4030000: id 0x00041192, vendor 0x41
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [ttySAC2] enabled
[    0.010000] Calibrating delay loop... 797.90 BogoMIPS (lpj=1994752)


>
>> Thank you,
>> Kyungmin Park
>>
>> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Russell,
>> >
>> > Hi,
>> >
>> > Kukjin Kim wrote:
>> >> Russell wrote:
>> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
>> > fine.
>> >> > That doesn't mean the section size is 16MB.
>> >> >
>> >> > As I've already said, the section size has _nothing_ what so ever to
> do
>> >> > with the size of memory, or the granularity of the size of memory.
> ?By
>> >> > way of illustration, it is perfectly legal to have a section size of
>> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
>> >> > sections do not have to be completely filled.
>> >> >
>> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> if
>> > a
>> >> section has a hole when initializing sparse memory.
>> >>
>> >> I identified that a section doesn't need to be a contiguous area of
>> > physical
>> >> memory when reading your comment with the fact that the mem_map of a
>> > section
>> >> can be smaller than the size of a section.
>> >>
>> >> I found, however, the kernel panics when modifying min_free_kbytes file
> in
>> >> the proc filesystem if a section has a hole.
>> >>
>> >> While processing the change of min_free_kbytes in the kernel, page
>> >> descriptors in a hole of an online section is accessed.
>> >
>> > As I said, following error happens.
>> > It would be helpful to me if any opinions or comments.
>> >
>> > ---
>
>
> (snip)
>
>
> Thanks.
>
> Best regards,
> Kgene.
> --
> Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
> SW Solution Development Team, Samsung Electronics Co., Ltd.
>
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 10:08       ` Kyungmin Park
@ 2010-07-12 10:13         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 10:13 UTC (permalink / raw)
  To: Kyungmin Park; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux

On Mon, Jul 12, 2010 at 07:08:11PM +0900, Kyungmin Park wrote:
> 27 for our board. 28 for generic.
> 
> Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000
> to test HIGHMEM.

If you want to test HIGHMEM, the way to do it is to specify a vmalloc=
parameter to the kernel, not to modify VMALLOC_END.

> Universal # bdinfo
> arch_number = 0x00000B2E
> env_t       = 0x00000000
> boot_params = 0x30000100
> DRAM bank   = 0x00000000
> -> start    = 0x30000000
> -> size     = 0x05000000
> DRAM bank   = 0x00000001
> -> start    = 0x40000000
> -> size     = 0x18000000

28 will work for this too.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 10:13         ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 10:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 07:08:11PM +0900, Kyungmin Park wrote:
> 27 for our board. 28 for generic.
> 
> Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000
> to test HIGHMEM.

If you want to test HIGHMEM, the way to do it is to specify a vmalloc=
parameter to the kernel, not to modify VMALLOC_END.

> Universal # bdinfo
> arch_number = 0x00000B2E
> env_t       = 0x00000000
> boot_params = 0x30000100
> DRAM bank   = 0x00000000
> -> start    = 0x30000000
> -> size     = 0x05000000
> DRAM bank   = 0x00000001
> -> start    = 0x40000000
> -> size     = 0x18000000

28 will work for this too.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  9:52   ` Minchan Kim
@ 2010-07-12 10:13     ` Kukjin Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12 10:13 UTC (permalink / raw)
  To: 'Minchan Kim'
  Cc: linux-samsung-soc, 'Russell King - ARM Linux',
	ben-linux, linux-arm-kernel

Minchan Kim wrote:
> 
> Hello.
> 
Hello :-)

> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Russell,
> >
> > Hi,
> >
> > Kukjin Kim wrote:
> >> Russell wrote:
> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> > fine.
> >> > That doesn't mean the section size is 16MB.
> >> >
> >> > As I've already said, the section size has _nothing_ what so ever to
do
> >> > with the size of memory, or the granularity of the size of memory.
 By
> >> > way of illustration, it is perfectly legal to have a section size of
> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> >> > sections do not have to be completely filled.
> >> >
> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
if
> > a
> >> section has a hole when initializing sparse memory.
> >>
> >> I identified that a section doesn't need to be a contiguous area of
> > physical
> >> memory when reading your comment with the fact that the mem_map of a
> > section
> >> can be smaller than the size of a section.
> >>
> >> I found, however, the kernel panics when modifying min_free_kbytes file
in
> >> the proc filesystem if a section has a hole.
> >>
> >> While processing the change of min_free_kbytes in the kernel, page
> >> descriptors in a hole of an online section is accessed.
> >
> > As I said, following error happens.
> > It would be helpful to me if any opinions or comments.
> >
> 
> Could you test below patch?
> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> 
Yes, I did it, and no kernel panic happens :-)

Same test...
[root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
[root@Samsung ~]#
[root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2730


> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> *zone)
>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>                 if (!pfn_valid(pfn))
>                         continue;
> +
>                 page = pfn_to_page(pfn);
> 
> +                /* Watch for unexpected holes punched in the memmap */
> +                if (!memmap_valid_within(pfn, page, zone))
> +                        continue;
> +
>                 /* Watch out for overlapping nodes */
>                 if (page_to_nid(page) != zone_to_nid(zone))
>                         continue;
> 
> 
> 

...Could you please explain about this issue?

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 10:13     ` Kukjin Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Kukjin Kim @ 2010-07-12 10:13 UTC (permalink / raw)
  To: linux-arm-kernel

Minchan Kim wrote:
> 
> Hello.
> 
Hello :-)

> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Russell,
> >
> > Hi,
> >
> > Kukjin Kim wrote:
> >> Russell wrote:
> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> > fine.
> >> > That doesn't mean the section size is 16MB.
> >> >
> >> > As I've already said, the section size has _nothing_ what so ever to
do
> >> > with the size of memory, or the granularity of the size of memory.
?By
> >> > way of illustration, it is perfectly legal to have a section size of
> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> >> > sections do not have to be completely filled.
> >> >
> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
if
> > a
> >> section has a hole when initializing sparse memory.
> >>
> >> I identified that a section doesn't need to be a contiguous area of
> > physical
> >> memory when reading your comment with the fact that the mem_map of a
> > section
> >> can be smaller than the size of a section.
> >>
> >> I found, however, the kernel panics when modifying min_free_kbytes file
in
> >> the proc filesystem if a section has a hole.
> >>
> >> While processing the change of min_free_kbytes in the kernel, page
> >> descriptors in a hole of an online section is accessed.
> >
> > As I said, following error happens.
> > It would be helpful to me if any opinions or comments.
> >
> 
> Could you test below patch?
> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> 
Yes, I did it, and no kernel panic happens :-)

Same test...
[root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2736
[root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
[root at Samsung ~]#
[root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
2730


> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> *zone)
>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>                 if (!pfn_valid(pfn))
>                         continue;
> +
>                 page = pfn_to_page(pfn);
> 
> +                /* Watch for unexpected holes punched in the memmap */
> +                if (!memmap_valid_within(pfn, page, zone))
> +                        continue;
> +
>                 /* Watch out for overlapping nodes */
>                 if (page_to_nid(page) != zone_to_nid(zone))
>                         continue;
> 
> 
> 

...Could you please explain about this issue?

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 10:13     ` Kukjin Kim
@ 2010-07-12 10:35       ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12 10:35 UTC (permalink / raw)
  To: Kukjin Kim, Mel Gorman, KAMEZAWA Hiroyuki
  Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux

Hi, Mel and Kame.

On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Minchan Kim wrote:
>>
>> Hello.
>>
> Hello :-)
>
>> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Russell,
>> >
>> > Hi,
>> >
>> > Kukjin Kim wrote:
>> >> Russell wrote:
>> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
>> > fine.
>> >> > That doesn't mean the section size is 16MB.
>> >> >
>> >> > As I've already said, the section size has _nothing_ what so ever to
> do
>> >> > with the size of memory, or the granularity of the size of memory.
>  By
>> >> > way of illustration, it is perfectly legal to have a section size of
>> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> >> > sections do not have to be completely filled.
>> >> >
>> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> if
>> > a
>> >> section has a hole when initializing sparse memory.
>> >>
>> >> I identified that a section doesn't need to be a contiguous area of
>> > physical
>> >> memory when reading your comment with the fact that the mem_map of a
>> > section
>> >> can be smaller than the size of a section.
>> >>
>> >> I found, however, the kernel panics when modifying min_free_kbytes file
> in
>> >> the proc filesystem if a section has a hole.
>> >>
>> >> While processing the change of min_free_kbytes in the kernel, page
>> >> descriptors in a hole of an online section is accessed.
>> >
>> > As I said, following error happens.
>> > It would be helpful to me if any opinions or comments.
>> >
>>
>> Could you test below patch?
>> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
>>
> Yes, I did it, and no kernel panic happens :-)
>
> Same test...
> [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> [root@Samsung ~]#
> [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2730
>
>
>> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
>> *zone)
>>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>>                 if (!pfn_valid(pfn))
>>                         continue;
>> +
>>                 page = pfn_to_page(pfn);
>>
>> +                /* Watch for unexpected holes punched in the memmap */
>> +                if (!memmap_valid_within(pfn, page, zone))
>> +                        continue;
>> +
>>                 /* Watch out for overlapping nodes */
>>                 if (page_to_nid(page) != zone_to_nid(zone))
>>                         continue;
>>
>>
>>
>
> ...Could you please explain about this issue?

The setup_zone_migrate_reserve doesn't check memmap hole. I think
compaction would have the  same problem, too. I don't know there is a
problem in elsewhere.
Anyway, I think memmap_valid_within calling whenever walking whole pfn
range isn't a good solution. We already have pfn_valid. Could we check
this in there?
For example, mem_section have a valid pfn range and then valid section
can test it in pfn_valid.

What do you think about it?

P.S)
I know Mel is very busy to test to avoid writeback in direct reclaim.
I can wait on you. :)
Kukjin. Is this a urgent issue? If it isn't, please wait on our mm guys. :)

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 10:35       ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, Mel and Kame.

On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Minchan Kim wrote:
>>
>> Hello.
>>
> Hello :-)
>
>> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Russell,
>> >
>> > Hi,
>> >
>> > Kukjin Kim wrote:
>> >> Russell wrote:
>> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
>> > fine.
>> >> > That doesn't mean the section size is 16MB.
>> >> >
>> >> > As I've already said, the section size has _nothing_ what so ever to
> do
>> >> > with the size of memory, or the granularity of the size of memory.
> ?By
>> >> > way of illustration, it is perfectly legal to have a section size of
>> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
>> >> > sections do not have to be completely filled.
>> >> >
>> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> if
>> > a
>> >> section has a hole when initializing sparse memory.
>> >>
>> >> I identified that a section doesn't need to be a contiguous area of
>> > physical
>> >> memory when reading your comment with the fact that the mem_map of a
>> > section
>> >> can be smaller than the size of a section.
>> >>
>> >> I found, however, the kernel panics when modifying min_free_kbytes file
> in
>> >> the proc filesystem if a section has a hole.
>> >>
>> >> While processing the change of min_free_kbytes in the kernel, page
>> >> descriptors in a hole of an online section is accessed.
>> >
>> > As I said, following error happens.
>> > It would be helpful to me if any opinions or comments.
>> >
>>
>> Could you test below patch?
>> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
>>
> Yes, I did it, and no kernel panic happens :-)
>
> Same test...
> [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2736
> [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> [root at Samsung ~]#
> [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> 2730
>
>
>> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
>> *zone)
>> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn))
>> ? ? ? ? ? ? ? ? ? ? ? ? continue;
>> +
>> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn);
>>
>> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */
>> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone))
>> + ? ? ? ? ? ? ? ? ? ? ? ?continue;
>> +
>> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */
>> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone))
>> ? ? ? ? ? ? ? ? ? ? ? ? continue;
>>
>>
>>
>
> ...Could you please explain about this issue?

The setup_zone_migrate_reserve doesn't check memmap hole. I think
compaction would have the  same problem, too. I don't know there is a
problem in elsewhere.
Anyway, I think memmap_valid_within calling whenever walking whole pfn
range isn't a good solution. We already have pfn_valid. Could we check
this in there?
For example, mem_section have a valid pfn range and then valid section
can test it in pfn_valid.

What do you think about it?

P.S)
I know Mel is very busy to test to avoid writeback in direct reclaim.
I can wait on you. :)
Kukjin. Is this a urgent issue? If it isn't, please wait on our mm guys. :)

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12  9:52   ` Minchan Kim
@ 2010-07-12 10:45     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 10:45 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux

On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote:
> Hello.
> 
> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Russell,
> >
> > Hi,
> >
> > Kukjin Kim wrote:
> >> Russell wrote:
> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> > fine.
> >> > That doesn't mean the section size is 16MB.
> >> >
> >> > As I've already said, the section size has _nothing_ what so ever to do
> >> > with the size of memory, or the granularity of the size of memory.  By
> >> > way of illustration, it is perfectly legal to have a section size of
> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> >> > sections do not have to be completely filled.
> >> >
> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> > a
> >> section has a hole when initializing sparse memory.
> >>
> >> I identified that a section doesn't need to be a contiguous area of
> > physical
> >> memory when reading your comment with the fact that the mem_map of a
> > section
> >> can be smaller than the size of a section.
> >>
> >> I found, however, the kernel panics when modifying min_free_kbytes file in
> >> the proc filesystem if a section has a hole.
> >>
> >> While processing the change of min_free_kbytes in the kernel, page
> >> descriptors in a hole of an online section is accessed.
> >
> > As I said, following error happens.
> > It would be helpful to me if any opinions or comments.
> >
> 
> Could you test below patch?
> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.

But hang on.  Where are the hole(s)?

The DRAM setup is:
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000

which with SECTION_SIZE_BITS set to 28 gives three sections of memory,
and each sparsemem section does not have a hole.

No zone should cross a sparsemem section boundary.

Moreover, our pfn_valid() now returns false for any and all invalid PFNs.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 10:45     ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 10:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote:
> Hello.
> 
> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Russell,
> >
> > Hi,
> >
> > Kukjin Kim wrote:
> >> Russell wrote:
> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> > fine.
> >> > That doesn't mean the section size is 16MB.
> >> >
> >> > As I've already said, the section size has _nothing_ what so ever to do
> >> > with the size of memory, or the granularity of the size of memory. ?By
> >> > way of illustration, it is perfectly legal to have a section size of
> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> >> > sections do not have to be completely filled.
> >> >
> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> > a
> >> section has a hole when initializing sparse memory.
> >>
> >> I identified that a section doesn't need to be a contiguous area of
> > physical
> >> memory when reading your comment with the fact that the mem_map of a
> > section
> >> can be smaller than the size of a section.
> >>
> >> I found, however, the kernel panics when modifying min_free_kbytes file in
> >> the proc filesystem if a section has a hole.
> >>
> >> While processing the change of min_free_kbytes in the kernel, page
> >> descriptors in a hole of an online section is accessed.
> >
> > As I said, following error happens.
> > It would be helpful to me if any opinions or comments.
> >
> 
> Could you test below patch?
> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.

But hang on.  Where are the hole(s)?

The DRAM setup is:
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000

which with SECTION_SIZE_BITS set to 28 gives three sections of memory,
and each sparsemem section does not have a hole.

No zone should cross a sparsemem section boundary.

Moreover, our pfn_valid() now returns false for any and all invalid PFNs.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 10:45     ` Russell King - ARM Linux
@ 2010-07-12 12:28       ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12 12:28 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-samsung-soc, Kukjin Kim, ben-linux, linux-arm-kernel

On Mon, Jul 12, 2010 at 11:45:41AM +0100, Russell King - ARM Linux wrote:
> On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote:
> > Hello.
> > 
> > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > > Russell,
> > >
> > > Hi,
> > >
> > > Kukjin Kim wrote:
> > >> Russell wrote:
> > >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> > > fine.
> > >> > That doesn't mean the section size is 16MB.
> > >> >
> > >> > As I've already said, the section size has _nothing_ what so ever to do
> > >> > with the size of memory, or the granularity of the size of memory.  By
> > >> > way of illustration, it is perfectly legal to have a section size of
> > >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> > >> > sections do not have to be completely filled.
> > >> >
> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> > > a
> > >> section has a hole when initializing sparse memory.
> > >>
> > >> I identified that a section doesn't need to be a contiguous area of
> > > physical
> > >> memory when reading your comment with the fact that the mem_map of a
> > > section
> > >> can be smaller than the size of a section.
> > >>
> > >> I found, however, the kernel panics when modifying min_free_kbytes file in
> > >> the proc filesystem if a section has a hole.
> > >>
> > >> While processing the change of min_free_kbytes in the kernel, page
> > >> descriptors in a hole of an online section is accessed.
> > >
> > > As I said, following error happens.
> > > It would be helpful to me if any opinions or comments.
> > >
> > 
> > Could you test below patch?
> > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> 
> But hang on.  Where are the hole(s)?
> 
> The DRAM setup is:
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> 
> which with SECTION_SIZE_BITS set to 28 gives three sections of memory,
> and each sparsemem section does not have a hole.
> 
> No zone should cross a sparsemem section boundary.
> 
> Moreover, our pfn_valid() now returns false for any and all invalid PFNs.

True if it isn't sparsemem. 
But look at pfn_valid in sparsemem. 

It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP.
The first section in above case has just 80M memory but section has 256M. 
So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000,
let's see pfn_valid. 

1. We pass pfn_to_section_nr check 
2. Both __nr_to_section and valid_section is vaild.

static inline int pfn_valid(unsigned long pfn) 
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}


What prevent above hole's case?
I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM 
in FLATMEM.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 12:28       ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-12 12:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 11:45:41AM +0100, Russell King - ARM Linux wrote:
> On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote:
> > Hello.
> > 
> > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > > Russell,
> > >
> > > Hi,
> > >
> > > Kukjin Kim wrote:
> > >> Russell wrote:
> > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> > > fine.
> > >> > That doesn't mean the section size is 16MB.
> > >> >
> > >> > As I've already said, the section size has _nothing_ what so ever to do
> > >> > with the size of memory, or the granularity of the size of memory. ?By
> > >> > way of illustration, it is perfectly legal to have a section size of
> > >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> > >> > sections do not have to be completely filled.
> > >> >
> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> > > a
> > >> section has a hole when initializing sparse memory.
> > >>
> > >> I identified that a section doesn't need to be a contiguous area of
> > > physical
> > >> memory when reading your comment with the fact that the mem_map of a
> > > section
> > >> can be smaller than the size of a section.
> > >>
> > >> I found, however, the kernel panics when modifying min_free_kbytes file in
> > >> the proc filesystem if a section has a hole.
> > >>
> > >> While processing the change of min_free_kbytes in the kernel, page
> > >> descriptors in a hole of an online section is accessed.
> > >
> > > As I said, following error happens.
> > > It would be helpful to me if any opinions or comments.
> > >
> > 
> > Could you test below patch?
> > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> 
> But hang on.  Where are the hole(s)?
> 
> The DRAM setup is:
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> 
> which with SECTION_SIZE_BITS set to 28 gives three sections of memory,
> and each sparsemem section does not have a hole.
> 
> No zone should cross a sparsemem section boundary.
> 
> Moreover, our pfn_valid() now returns false for any and all invalid PFNs.

True if it isn't sparsemem. 
But look at pfn_valid in sparsemem. 

It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP.
The first section in above case has just 80M memory but section has 256M. 
So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000,
let's see pfn_valid. 

1. We pass pfn_to_section_nr check 
2. Both __nr_to_section and valid_section is vaild.

static inline int pfn_valid(unsigned long pfn) 
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}


What prevent above hole's case?
I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM 
in FLATMEM.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 12:28       ` Minchan Kim
@ 2010-07-12 12:42         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 12:42 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux

On Mon, Jul 12, 2010 at 09:28:16PM +0900, Minchan Kim wrote:
> It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP.
> The first section in above case has just 80M memory but section has 256M. 
> So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000,
> let's see pfn_valid.

That's not a hole as such.  It's an incompletely filled section, which is
precisely what sparsemem is all about.

> 1. We pass pfn_to_section_nr check 
> 2. Both __nr_to_section and valid_section is vaild.
> 
> static inline int pfn_valid(unsigned long pfn) 
> {
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
>         return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> }
> 
> 
> What prevent above hole's case?

That looks too loose to be useful - that means invalid page table entries
can pass the pfn_valid() test when they should actually fail, and I'd
imagine there will be many more such problems like that.

> I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM 
> in FLATMEM.

I agree - pfn_valid() on sparsemem needs to be tightened.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-12 12:42         ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-12 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 09:28:16PM +0900, Minchan Kim wrote:
> It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP.
> The first section in above case has just 80M memory but section has 256M. 
> So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000,
> let's see pfn_valid.

That's not a hole as such.  It's an incompletely filled section, which is
precisely what sparsemem is all about.

> 1. We pass pfn_to_section_nr check 
> 2. Both __nr_to_section and valid_section is vaild.
> 
> static inline int pfn_valid(unsigned long pfn) 
> {
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
>         return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> }
> 
> 
> What prevent above hole's case?

That looks too loose to be useful - that means invalid page table entries
can pass the pfn_valid() test when they should actually fail, and I'd
imagine there will be many more such problems like that.

> I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM 
> in FLATMEM.

I agree - pfn_valid() on sparsemem needs to be tightened.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 10:35       ` Minchan Kim
@ 2010-07-13  0:25         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  0:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc,
	Russell King - ARM Linux, ben-linux

On Mon, 12 Jul 2010 19:35:17 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Hi, Mel and Kame.
> 
> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Minchan Kim wrote:
> >>
> >> Hello.
> >>
> > Hello :-)
> >
> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> >> > Russell,
> >> >
> >> > Hi,
> >> >
> >> > Kukjin Kim wrote:
> >> >> Russell wrote:
> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> >> > fine.
> >> >> > That doesn't mean the section size is 16MB.
> >> >> >
> >> >> > As I've already said, the section size has _nothing_ what so ever to
> > do
> >> >> > with the size of memory, or the granularity of the size of memory.
> >  By
> >> >> > way of illustration, it is perfectly legal to have a section size of
> >> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> >> >> > sections do not have to be completely filled.
> >> >> >
> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> > if
> >> > a
> >> >> section has a hole when initializing sparse memory.
> >> >>
> >> >> I identified that a section doesn't need to be a contiguous area of
> >> > physical
> >> >> memory when reading your comment with the fact that the mem_map of a
> >> > section
> >> >> can be smaller than the size of a section.
> >> >>
> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > in
> >> >> the proc filesystem if a section has a hole.
> >> >>
> >> >> While processing the change of min_free_kbytes in the kernel, page
> >> >> descriptors in a hole of an online section is accessed.
> >> >
> >> > As I said, following error happens.
> >> > It would be helpful to me if any opinions or comments.
> >> >
> >>
> >> Could you test below patch?
> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> >>
> > Yes, I did it, and no kernel panic happens :-)
> >
> > Same test...
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2736
> > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > [root@Samsung ~]#
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2730
> >
> >
> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> >> *zone)
> >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> >>                 if (!pfn_valid(pfn))
> >>                         continue;
> >> +
> >>                 page = pfn_to_page(pfn);
> >>
> >> +                /* Watch for unexpected holes punched in the memmap */
> >> +                if (!memmap_valid_within(pfn, page, zone))
> >> +                        continue;
> >> +
> >>                 /* Watch out for overlapping nodes */
> >>                 if (page_to_nid(page) != zone_to_nid(zone))
> >>                         continue;
> >>
> >>
> >>
> >
> > ...Could you please explain about this issue?
> 
> The setup_zone_migrate_reserve doesn't check memmap hole. I think
> compaction would have the  same problem, too. I don't know there is a
> problem in elsewhere.
> Anyway, I think memmap_valid_within calling whenever walking whole pfn
> range isn't a good solution. We already have pfn_valid. Could we check
> this in there?
> For example, mem_section have a valid pfn range and then valid section
> can test it in pfn_valid.
> 
> What do you think about it?
> 

please map another "fake" page when you free memmap.

For example, prepare a page filled with (1 << PG_reserved).
and replace it with unnecessary memmap rather than freeing a page for memmap.

Then, you can find a PageReserved() page there. However it doesn't have valid
nid,zid,secitonID, I don't think it can cause trouble.
But you may see some BUG_ON() hit. If so, I think ARM guys can add

#define PageHole(page) (PageReserved(page) && PageXXXXX(page))
for avoiding BUG_ON() hit. (and this will help to skip unnecessary
memmap_init_zone())

By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's
sparsemem much easier for maintainance.

section ID is not important as far as you call page_to_pfn().
You may able to add WARNING when page_to_pfn() is called against a HOLE page.

You may feel above as a big work but I don't think it's very difficult at the
first thought.


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  0:25         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  0:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 12 Jul 2010 19:35:17 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Hi, Mel and Kame.
> 
> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > Minchan Kim wrote:
> >>
> >> Hello.
> >>
> > Hello :-)
> >
> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> >> > Russell,
> >> >
> >> > Hi,
> >> >
> >> > Kukjin Kim wrote:
> >> >> Russell wrote:
> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> >> > fine.
> >> >> > That doesn't mean the section size is 16MB.
> >> >> >
> >> >> > As I've already said, the section size has _nothing_ what so ever to
> > do
> >> >> > with the size of memory, or the granularity of the size of memory.
> > ?By
> >> >> > way of illustration, it is perfectly legal to have a section size of
> >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> >> >> > sections do not have to be completely filled.
> >> >> >
> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> > if
> >> > a
> >> >> section has a hole when initializing sparse memory.
> >> >>
> >> >> I identified that a section doesn't need to be a contiguous area of
> >> > physical
> >> >> memory when reading your comment with the fact that the mem_map of a
> >> > section
> >> >> can be smaller than the size of a section.
> >> >>
> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > in
> >> >> the proc filesystem if a section has a hole.
> >> >>
> >> >> While processing the change of min_free_kbytes in the kernel, page
> >> >> descriptors in a hole of an online section is accessed.
> >> >
> >> > As I said, following error happens.
> >> > It would be helpful to me if any opinions or comments.
> >> >
> >>
> >> Could you test below patch?
> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> >>
> > Yes, I did it, and no kernel panic happens :-)
> >
> > Same test...
> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2736
> > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > [root at Samsung ~]#
> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2730
> >
> >
> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> >> *zone)
> >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn))
> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> >> +
> >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn);
> >>
> >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */
> >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone))
> >> + ? ? ? ? ? ? ? ? ? ? ? ?continue;
> >> +
> >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */
> >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone))
> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> >>
> >>
> >>
> >
> > ...Could you please explain about this issue?
> 
> The setup_zone_migrate_reserve doesn't check memmap hole. I think
> compaction would have the  same problem, too. I don't know there is a
> problem in elsewhere.
> Anyway, I think memmap_valid_within calling whenever walking whole pfn
> range isn't a good solution. We already have pfn_valid. Could we check
> this in there?
> For example, mem_section have a valid pfn range and then valid section
> can test it in pfn_valid.
> 
> What do you think about it?
> 

please map another "fake" page when you free memmap.

For example, prepare a page filled with (1 << PG_reserved).
and replace it with unnecessary memmap rather than freeing a page for memmap.

Then, you can find a PageReserved() page there. However it doesn't have valid
nid,zid,secitonID, I don't think it can cause trouble.
But you may see some BUG_ON() hit. If so, I think ARM guys can add

#define PageHole(page) (PageReserved(page) && PageXXXXX(page))
for avoiding BUG_ON() hit. (and this will help to skip unnecessary
memmap_init_zone())

By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's
sparsemem much easier for maintainance.

section ID is not important as far as you call page_to_pfn().
You may able to add WARNING when page_to_pfn() is called against a HOLE page.

You may feel above as a big work but I don't think it's very difficult@the
first thought.


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  0:25         ` KAMEZAWA Hiroyuki
@ 2010-07-13  1:53           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  1:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Minchan Kim, Kukjin Kim, Mel Gorman, linux-arm-kernel,
	linux-samsung-soc, Russell King - ARM Linux, ben-linux

On Tue, 13 Jul 2010 09:25:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 12 Jul 2010 19:35:17 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
> 
> > Hi, Mel and Kame.
> > 
> > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > > Minchan Kim wrote:
> > >>
> > >> Hello.
> > >>
> > > Hello :-)
> > >
> > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > >> > Russell,
> > >> >
> > >> > Hi,
> > >> >
> > >> > Kukjin Kim wrote:
> > >> >> Russell wrote:
> > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
> > >> > fine.
> > >> >> > That doesn't mean the section size is 16MB.
> > >> >> >
> > >> >> > As I've already said, the section size has _nothing_ what so ever to
> > > do
> > >> >> > with the size of memory, or the granularity of the size of memory.
> > >  By
> > >> >> > way of illustration, it is perfectly legal to have a section size of
> > >> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> > >> >> > sections do not have to be completely filled.
> > >> >> >
> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> > > if
> > >> > a
> > >> >> section has a hole when initializing sparse memory.
> > >> >>
> > >> >> I identified that a section doesn't need to be a contiguous area of
> > >> > physical
> > >> >> memory when reading your comment with the fact that the mem_map of a
> > >> > section
> > >> >> can be smaller than the size of a section.
> > >> >>
> > >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > > in
> > >> >> the proc filesystem if a section has a hole.
> > >> >>
> > >> >> While processing the change of min_free_kbytes in the kernel, page
> > >> >> descriptors in a hole of an online section is accessed.
> > >> >
> > >> > As I said, following error happens.
> > >> > It would be helpful to me if any opinions or comments.
> > >> >
> > >>
> > >> Could you test below patch?
> > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> > >>
> > > Yes, I did it, and no kernel panic happens :-)
> > >
> > > Same test...
> > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > > 2736
> > > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > > [root@Samsung ~]#
> > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > > 2730
> > >
> > >
> > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> > >> *zone)
> > >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> > >>                 if (!pfn_valid(pfn))
> > >>                         continue;
> > >> +
> > >>                 page = pfn_to_page(pfn);
> > >>
> > >> +                /* Watch for unexpected holes punched in the memmap */
> > >> +                if (!memmap_valid_within(pfn, page, zone))
> > >> +                        continue;
> > >> +
> > >>                 /* Watch out for overlapping nodes */
> > >>                 if (page_to_nid(page) != zone_to_nid(zone))
> > >>                         continue;
> > >>
> > >>
> > >>
> > >
> > > ...Could you please explain about this issue?
> > 
> > The setup_zone_migrate_reserve doesn't check memmap hole. I think
> > compaction would have the  same problem, too. I don't know there is a
> > problem in elsewhere.
> > Anyway, I think memmap_valid_within calling whenever walking whole pfn
> > range isn't a good solution. We already have pfn_valid. Could we check
> > this in there?
> > For example, mem_section have a valid pfn range and then valid section
> > can test it in pfn_valid.
> > 
> > What do you think about it?
> > 
> 
> please map another "fake" page when you free memmap.
> 
> For example, prepare a page filled with (1 << PG_reserved).
> and replace it with unnecessary memmap rather than freeing a page for memmap.
> 
> Then, you can find a PageReserved() page there. However it doesn't have valid
> nid,zid,secitonID, I don't think it can cause trouble.
> But you may see some BUG_ON() hit. If so, I think ARM guys can add
> 

yet another _very easy_ solution is, define pfn_valid() as following.

==
static inline arm_pfn_valid(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);
	char *end_of_page = (char *)(page + 1) - 1;
	char byte;

	if ((((unsigned long)page) && PAGE_MASK) ==
		(((unsigned long)end_of_page) & PAGE_MASK))
		return (__get_user(byte, page) == 0);
	return (__get_user(byte, page) == 0) && (__get_user(byte, __end_of_page)== 0); 
}
==
And replace this with default pfn_valid() for SPARSEMEM.
(please see ia64_pfn_valid().)

I hope this is enough quick.....no ?
(we use get_user but the "page" will be accessed soon, anyway.)


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  1:53           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  1:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 13 Jul 2010 09:25:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 12 Jul 2010 19:35:17 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
> 
> > Hi, Mel and Kame.
> > 
> > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > > Minchan Kim wrote:
> > >>
> > >> Hello.
> > >>
> > > Hello :-)
> > >
> > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> > >> > Russell,
> > >> >
> > >> > Hi,
> > >> >
> > >> > Kukjin Kim wrote:
> > >> >> Russell wrote:
> > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
> > >> > fine.
> > >> >> > That doesn't mean the section size is 16MB.
> > >> >> >
> > >> >> > As I've already said, the section size has _nothing_ what so ever to
> > > do
> > >> >> > with the size of memory, or the granularity of the size of memory.
> > > ?By
> > >> >> > way of illustration, it is perfectly legal to have a section size of
> > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> > >> >> > sections do not have to be completely filled.
> > >> >> >
> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
> > > if
> > >> > a
> > >> >> section has a hole when initializing sparse memory.
> > >> >>
> > >> >> I identified that a section doesn't need to be a contiguous area of
> > >> > physical
> > >> >> memory when reading your comment with the fact that the mem_map of a
> > >> > section
> > >> >> can be smaller than the size of a section.
> > >> >>
> > >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > > in
> > >> >> the proc filesystem if a section has a hole.
> > >> >>
> > >> >> While processing the change of min_free_kbytes in the kernel, page
> > >> >> descriptors in a hole of an online section is accessed.
> > >> >
> > >> > As I said, following error happens.
> > >> > It would be helpful to me if any opinions or comments.
> > >> >
> > >>
> > >> Could you test below patch?
> > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> > >>
> > > Yes, I did it, and no kernel panic happens :-)
> > >
> > > Same test...
> > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > > 2736
> > > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > > [root at Samsung ~]#
> > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > > 2730
> > >
> > >
> > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> > >> *zone)
> > >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> > >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn))
> > >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> > >> +
> > >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn);
> > >>
> > >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */
> > >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone))
> > >> + ? ? ? ? ? ? ? ? ? ? ? ?continue;
> > >> +
> > >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */
> > >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone))
> > >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> > >>
> > >>
> > >>
> > >
> > > ...Could you please explain about this issue?
> > 
> > The setup_zone_migrate_reserve doesn't check memmap hole. I think
> > compaction would have the  same problem, too. I don't know there is a
> > problem in elsewhere.
> > Anyway, I think memmap_valid_within calling whenever walking whole pfn
> > range isn't a good solution. We already have pfn_valid. Could we check
> > this in there?
> > For example, mem_section have a valid pfn range and then valid section
> > can test it in pfn_valid.
> > 
> > What do you think about it?
> > 
> 
> please map another "fake" page when you free memmap.
> 
> For example, prepare a page filled with (1 << PG_reserved).
> and replace it with unnecessary memmap rather than freeing a page for memmap.
> 
> Then, you can find a PageReserved() page there. However it doesn't have valid
> nid,zid,secitonID, I don't think it can cause trouble.
> But you may see some BUG_ON() hit. If so, I think ARM guys can add
> 

yet another _very easy_ solution is, define pfn_valid() as following.

==
static inline arm_pfn_valid(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);
	char *end_of_page = (char *)(page + 1) - 1;
	char byte;

	if ((((unsigned long)page) && PAGE_MASK) ==
		(((unsigned long)end_of_page) & PAGE_MASK))
		return (__get_user(byte, page) == 0);
	return (__get_user(byte, page) == 0) && (__get_user(byte, __end_of_page)== 0); 
}
==
And replace this with default pfn_valid() for SPARSEMEM.
(please see ia64_pfn_valid().)

I hope this is enough quick.....no ?
(we use get_user but the "page" will be accessed soon, anyway.)


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  0:25         ` KAMEZAWA Hiroyuki
@ 2010-07-13  2:05           ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-13  2:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc,
	Russell King - ARM Linux, ben-linux

On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 12 Jul 2010 19:35:17 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Mel and Kame.
>>
>> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Minchan Kim wrote:
>> >>
>> >> Hello.
>> >>
>> > Hello :-)
>> >
>> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> >> > Russell,
>> >> >
>> >> > Hi,
>> >> >
>> >> > Kukjin Kim wrote:
>> >> >> Russell wrote:
>> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
>> >> > fine.
>> >> >> > That doesn't mean the section size is 16MB.
>> >> >> >
>> >> >> > As I've already said, the section size has _nothing_ what so ever to
>> > do
>> >> >> > with the size of memory, or the granularity of the size of memory.
>> >  By
>> >> >> > way of illustration, it is perfectly legal to have a section size of
>> >> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> >> >> > sections do not have to be completely filled.
>> >> >> >
>> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
>> > if
>> >> > a
>> >> >> section has a hole when initializing sparse memory.
>> >> >>
>> >> >> I identified that a section doesn't need to be a contiguous area of
>> >> > physical
>> >> >> memory when reading your comment with the fact that the mem_map of a
>> >> > section
>> >> >> can be smaller than the size of a section.
>> >> >>
>> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
>> > in
>> >> >> the proc filesystem if a section has a hole.
>> >> >>
>> >> >> While processing the change of min_free_kbytes in the kernel, page
>> >> >> descriptors in a hole of an online section is accessed.
>> >> >
>> >> > As I said, following error happens.
>> >> > It would be helpful to me if any opinions or comments.
>> >> >
>> >>
>> >> Could you test below patch?
>> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
>> >>
>> > Yes, I did it, and no kernel panic happens :-)
>> >
>> > Same test...
>> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2736
>> > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
>> > [root@Samsung ~]#
>> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2730
>> >
>> >
>> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
>> >> *zone)
>> >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>> >>                 if (!pfn_valid(pfn))
>> >>                         continue;
>> >> +
>> >>                 page = pfn_to_page(pfn);
>> >>
>> >> +                /* Watch for unexpected holes punched in the memmap */
>> >> +                if (!memmap_valid_within(pfn, page, zone))
>> >> +                        continue;
>> >> +
>> >>                 /* Watch out for overlapping nodes */
>> >>                 if (page_to_nid(page) != zone_to_nid(zone))
>> >>                         continue;
>> >>
>> >>
>> >>
>> >
>> > ...Could you please explain about this issue?
>>
>> The setup_zone_migrate_reserve doesn't check memmap hole. I think
>> compaction would have the  same problem, too. I don't know there is a
>> problem in elsewhere.
>> Anyway, I think memmap_valid_within calling whenever walking whole pfn
>> range isn't a good solution. We already have pfn_valid. Could we check
>> this in there?
>> For example, mem_section have a valid pfn range and then valid section
>> can test it in pfn_valid.
>>
>> What do you think about it?
>>
>
> please map another "fake" page when you free memmap.
>
> For example, prepare a page filled with (1 << PG_reserved).
> and replace it with unnecessary memmap rather than freeing a page for memmap.

Hmm. I don't got your point.
The problem is that we access struct page by pfn number not address.

You mean let's remain memmap on hole with changing PageReseved instead of free?

>
> Then, you can find a PageReserved() page there. However it doesn't have valid
> nid,zid,secitonID, I don't think it can cause trouble.
> But you may see some BUG_ON() hit. If so, I think ARM guys can add
>
> #define PageHole(page) (PageReserved(page) && PageXXXXX(page))
> for avoiding BUG_ON() hit. (and this will help to skip unnecessary
> memmap_init_zone())

I think it's not a good idea to add new flag. If
Kame. Could you review my RFC patch which makes pfn_valid check more
tightly on sparsemem?

>
> By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's
> sparsemem much easier for maintainance.

I think above problem isn't only ARM one. It can happen on any
sparsemem which do loose pfn_valid check.

>
> section ID is not important as far as you call page_to_pfn().
> You may able to add WARNING when page_to_pfn() is called against a HOLE page.

Yes. It's a good idea. We can add it optionally. But for it, we need
PageHole or mem_section bank range check. I like second like my
previous posting.


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  2:05           ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-13  2:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 12 Jul 2010 19:35:17 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Mel and Kame.
>>
>> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> > Minchan Kim wrote:
>> >>
>> >> Hello.
>> >>
>> > Hello :-)
>> >
>> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
>> >> > Russell,
>> >> >
>> >> > Hi,
>> >> >
>> >> > Kukjin Kim wrote:
>> >> >> Russell wrote:
>> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's
>> >> > fine.
>> >> >> > That doesn't mean the section size is 16MB.
>> >> >> >
>> >> >> > As I've already said, the section size has _nothing_ what so ever to
>> > do
>> >> >> > with the size of memory, or the granularity of the size of memory.
>> > ?By
>> >> >> > way of illustration, it is perfectly legal to have a section size of
>> >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
>> >> >> > sections do not have to be completely filled.
>> >> >> >
>> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
>> > if
>> >> > a
>> >> >> section has a hole when initializing sparse memory.
>> >> >>
>> >> >> I identified that a section doesn't need to be a contiguous area of
>> >> > physical
>> >> >> memory when reading your comment with the fact that the mem_map of a
>> >> > section
>> >> >> can be smaller than the size of a section.
>> >> >>
>> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
>> > in
>> >> >> the proc filesystem if a section has a hole.
>> >> >>
>> >> >> While processing the change of min_free_kbytes in the kernel, page
>> >> >> descriptors in a hole of an online section is accessed.
>> >> >
>> >> > As I said, following error happens.
>> >> > It would be helpful to me if any opinions or comments.
>> >> >
>> >>
>> >> Could you test below patch?
>> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
>> >>
>> > Yes, I did it, and no kernel panic happens :-)
>> >
>> > Same test...
>> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2736
>> > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
>> > [root at Samsung ~]#
>> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2730
>> >
>> >
>> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
>> >> *zone)
>> >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>> >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn))
>> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
>> >> +
>> >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn);
>> >>
>> >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */
>> >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone))
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?continue;
>> >> +
>> >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */
>> >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone))
>> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
>> >>
>> >>
>> >>
>> >
>> > ...Could you please explain about this issue?
>>
>> The setup_zone_migrate_reserve doesn't check memmap hole. I think
>> compaction would have the ?same problem, too. I don't know there is a
>> problem in elsewhere.
>> Anyway, I think memmap_valid_within calling whenever walking whole pfn
>> range isn't a good solution. We already have pfn_valid. Could we check
>> this in there?
>> For example, mem_section have a valid pfn range and then valid section
>> can test it in pfn_valid.
>>
>> What do you think about it?
>>
>
> please map another "fake" page when you free memmap.
>
> For example, prepare a page filled with (1 << PG_reserved).
> and replace it with unnecessary memmap rather than freeing a page for memmap.

Hmm. I don't got your point.
The problem is that we access struct page by pfn number not address.

You mean let's remain memmap on hole with changing PageReseved instead of free?

>
> Then, you can find a PageReserved() page there. However it doesn't have valid
> nid,zid,secitonID, I don't think it can cause trouble.
> But you may see some BUG_ON() hit. If so, I think ARM guys can add
>
> #define PageHole(page) (PageReserved(page) && PageXXXXX(page))
> for avoiding BUG_ON() hit. (and this will help to skip unnecessary
> memmap_init_zone())

I think it's not a good idea to add new flag. If
Kame. Could you review my RFC patch which makes pfn_valid check more
tightly on sparsemem?

>
> By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's
> sparsemem much easier for maintainance.

I think above problem isn't only ARM one. It can happen on any
sparsemem which do loose pfn_valid check.

>
> section ID is not important as far as you call page_to_pfn().
> You may able to add WARNING when page_to_pfn() is called against a HOLE page.

Yes. It's a good idea. We can add it optionally. But for it, we need
PageHole or mem_section bank range check. I like second like my
previous posting.


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  2:05           ` Minchan Kim
@ 2010-07-13  3:03             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  3:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc,
	Russell King - ARM Linux, ben-linux

On Tue, 13 Jul 2010 11:05:26 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> > For example, prepare a page filled with (1 << PG_reserved).
> > and replace it with unnecessary memmap rather than freeing a page for memmap.
> 
> Hmm. I don't got your point.
> The problem is that we access struct page by pfn number not address.
> 
> You mean let's remain memmap on hole with changing PageReseved instead of free?
> 
Like a ZERO_PAGE, preparing RESERVED_PAGE, 
which is filled with (1 << PG_reserved) as

  0x00000400,0x00000400,0x00000400,0x00000400
  .....

And map this pages to every hole. Then, you only waste a page to fill all holes
because "struct page" is aligned to 4bytes.


> I think it's not a good idea to add new flag. If
> Kame. Could you review my RFC patch which makes pfn_valid check more
> tightly on sparsemem?
Sure.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  3:03             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 60+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  3:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 13 Jul 2010 11:05:26 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> > For example, prepare a page filled with (1 << PG_reserved).
> > and replace it with unnecessary memmap rather than freeing a page for memmap.
> 
> Hmm. I don't got your point.
> The problem is that we access struct page by pfn number not address.
> 
> You mean let's remain memmap on hole with changing PageReseved instead of free?
> 
Like a ZERO_PAGE, preparing RESERVED_PAGE, 
which is filled with (1 << PG_reserved) as

  0x00000400,0x00000400,0x00000400,0x00000400
  .....

And map this pages to every hole. Then, you only waste a page to fill all holes
because "struct page" is aligned to 4bytes.


> I think it's not a good idea to add new flag. If
> Kame. Could you review my RFC patch which makes pfn_valid check more
> tightly on sparsemem?
Sure.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-12 10:35       ` Minchan Kim
@ 2010-07-13  9:26         ` Mel Gorman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, Russell King - ARM Linux, ben-linux

On Mon, Jul 12, 2010 at 07:35:17PM +0900, Minchan Kim wrote:
> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> >> > Russell,
> >> >
> >> > Hi,
> >> >
> >> > Kukjin Kim wrote:
> >> >> Russell wrote:
> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's fine.
> >> >> > That doesn't mean the section size is 16MB.
> >> >> >
> >> >> > As I've already said, the section size has _nothing_ what so ever to do
> >> >> > with the size of memory, or the granularity of the size of memory.  By
> >> >> > way of illustration, it is perfectly legal to have a section size of
> >> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
> >> >> > sections do not have to be completely filled.
> >> >> >

This is accurate although there is an expectation that a section is as
larger or larger than MAX_ORDER_NR_PAGES.

> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> >> > a
> >> >> section has a hole when initializing sparse memory.
> >> >>
> >> >> I identified that a section doesn't need to be a contiguous area of physical
> >> >> memory when reading your comment with the fact that the mem_map of a section
> >> >> can be smaller than the size of a section.
> >> >>

This should only happen in one case, on ARM and it breaks assumptions.
It is typically assumed that if a page is valid within a block of
MAX_ORDER_NR_PAGES, then the entire range is active. If
CONFIG_HOLES_IN_ZONE is set, then there may be holes within a
MAX_ORDER_NR_PAGES range and there is a performance hit as a result.

There is also an assumption that a section is fully populated or empty.
Look at the implementation of pfn_valid for sparsemem, it checks if the
section has SECTION_HAS_MEM_MAP set and it's the same check for any page
within that section. If there are holes in the section, the pfn_valid()
check would return true.

Check out the comment for memmap_valid_within() which tries to get
around this problem on ARM which is the only architecture punching holes
in its mem_map. As it's only depended on for the information in one proc
file, the performance hit is not a problem but it should not be
considered a typical thing.

> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > in
> >> >> the proc filesystem if a section has a hole.
> >> >>
> >> >> While processing the change of min_free_kbytes in the kernel, page
> >> >> descriptors in a hole of an online section is accessed.
> >> >
> >> > As I said, following error happens.
> >> > It would be helpful to me if any opinions or comments.
> >> >
> >>
> >> Could you test below patch?
> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> >>
> > Yes, I did it, and no kernel panic happens :-)
> >
> > Same test...
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2736
> > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > [root@Samsung ~]#
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2730
> >
> >
> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> >> *zone)
> >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> >>                 if (!pfn_valid(pfn))
> >>                         continue;
> >> +
> >>                 page = pfn_to_page(pfn);
> >>
> >> +                /* Watch for unexpected holes punched in the memmap */
> >> +                if (!memmap_valid_within(pfn, page, zone))
> >> +                        continue;
> >> +
> >>                 /* Watch out for overlapping nodes */
> >>                 if (page_to_nid(page) != zone_to_nid(zone))
> >>                         continue;
> >>
> >>
> >>
> >
> > ...Could you please explain about this issue?
> 

The issue is that ARM can create holes within a section of memory which
breaks the memory model by allowing pfn_valid() to return true for PFNs
backed by no memmap. This causes awkwardness.

> The setup_zone_migrate_reserve doesn't check memmap hole.

It doesn't. The worst case scenario is where the hole is punched at the
beginning of a section, pfn_valid returns true but the PFN is junk and
crashes shortly afterwards. This would require a zone to start in a hole
which should never happen - it makes no sense. If this is the scenario
being encountered, ensure that zones do not start in holes.

> I think
> compaction would have the  same problem, too.
> I don't know there is a
> problem in elsewhere.
> Anyway, I think memmap_valid_within calling whenever walking whole pfn
> range isn't a good solution.

No, it's not. The rules for pfn_valid and pfn_valid_within are already poorly
understood and we shouldn't add additional rules on memmap_valid_within just
for ARM if possible. If the problems are being encountered on sparsemem on
ARM, I'd prefer to simply see holes not punched in the memmap within a section!

> We already have pfn_valid. Could we check
> this in there?

Ordinarily, yes you would use pfn_valid or pfn_valid_within. It's only on ARM
where assumptions of the memory model are violated that memmap_valid_within
is used. It's unsatisfactory even there but as it was only used for a
proc file, it wasn't important. I'd really hate to see its use increased.

At the time it was discussed, a "proper" fix would have consumed as much
memory as saved by deleting portions of the memmap and was rejected.

> For example, mem_section have a valid pfn range and then valid section
> can test it in pfn_valid.
> 
> What do you think about it?
> 
> P.S)
> I know Mel is very busy to test to avoid writeback in direct reclaim.

I'm also heavily distracted by internal bugs so I'm afraid I didn't read
this thread. Hopefully the above information is useful to you.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  9:26         ` Mel Gorman
  0 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 12, 2010 at 07:35:17PM +0900, Minchan Kim wrote:
> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> >> > Russell,
> >> >
> >> > Hi,
> >> >
> >> > Kukjin Kim wrote:
> >> >> Russell wrote:
> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's fine.
> >> >> > That doesn't mean the section size is 16MB.
> >> >> >
> >> >> > As I've already said, the section size has _nothing_ what so ever to do
> >> >> > with the size of memory, or the granularity of the size of memory. ?By
> >> >> > way of illustration, it is perfectly legal to have a section size of
> >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So
> >> >> > sections do not have to be completely filled.
> >> >> >

This is accurate although there is an expectation that a section is as
larger or larger than MAX_ORDER_NR_PAGES.

> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if
> >> > a
> >> >> section has a hole when initializing sparse memory.
> >> >>
> >> >> I identified that a section doesn't need to be a contiguous area of physical
> >> >> memory when reading your comment with the fact that the mem_map of a section
> >> >> can be smaller than the size of a section.
> >> >>

This should only happen in one case, on ARM and it breaks assumptions.
It is typically assumed that if a page is valid within a block of
MAX_ORDER_NR_PAGES, then the entire range is active. If
CONFIG_HOLES_IN_ZONE is set, then there may be holes within a
MAX_ORDER_NR_PAGES range and there is a performance hit as a result.

There is also an assumption that a section is fully populated or empty.
Look at the implementation of pfn_valid for sparsemem, it checks if the
section has SECTION_HAS_MEM_MAP set and it's the same check for any page
within that section. If there are holes in the section, the pfn_valid()
check would return true.

Check out the comment for memmap_valid_within() which tries to get
around this problem on ARM which is the only architecture punching holes
in its mem_map. As it's only depended on for the information in one proc
file, the performance hit is not a problem but it should not be
considered a typical thing.

> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
> > in
> >> >> the proc filesystem if a section has a hole.
> >> >>
> >> >> While processing the change of min_free_kbytes in the kernel, page
> >> >> descriptors in a hole of an online section is accessed.
> >> >
> >> > As I said, following error happens.
> >> > It would be helpful to me if any opinions or comments.
> >> >
> >>
> >> Could you test below patch?
> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> >>
> > Yes, I did it, and no kernel panic happens :-)
> >
> > Same test...
> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2736
> > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > [root at Samsung ~]#
> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2730
> >
> >
> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
> >> *zone)
> >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn))
> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> >> +
> >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn);
> >>
> >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */
> >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone))
> >> + ? ? ? ? ? ? ? ? ? ? ? ?continue;
> >> +
> >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */
> >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone))
> >> ? ? ? ? ? ? ? ? ? ? ? ? continue;
> >>
> >>
> >>
> >
> > ...Could you please explain about this issue?
> 

The issue is that ARM can create holes within a section of memory which
breaks the memory model by allowing pfn_valid() to return true for PFNs
backed by no memmap. This causes awkwardness.

> The setup_zone_migrate_reserve doesn't check memmap hole.

It doesn't. The worst case scenario is where the hole is punched at the
beginning of a section, pfn_valid returns true but the PFN is junk and
crashes shortly afterwards. This would require a zone to start in a hole
which should never happen - it makes no sense. If this is the scenario
being encountered, ensure that zones do not start in holes.

> I think
> compaction would have the  same problem, too.
> I don't know there is a
> problem in elsewhere.
> Anyway, I think memmap_valid_within calling whenever walking whole pfn
> range isn't a good solution.

No, it's not. The rules for pfn_valid and pfn_valid_within are already poorly
understood and we shouldn't add additional rules on memmap_valid_within just
for ARM if possible. If the problems are being encountered on sparsemem on
ARM, I'd prefer to simply see holes not punched in the memmap within a section!

> We already have pfn_valid. Could we check
> this in there?

Ordinarily, yes you would use pfn_valid or pfn_valid_within. It's only on ARM
where assumptions of the memory model are violated that memmap_valid_within
is used. It's unsatisfactory even there but as it was only used for a
proc file, it wasn't important. I'd really hate to see its use increased.

At the time it was discussed, a "proper" fix would have consumed as much
memory as saved by deleting portions of the memmap and was rejected.

> For example, mem_section have a valid pfn range and then valid section
> can test it in pfn_valid.
> 
> What do you think about it?
> 
> P.S)
> I know Mel is very busy to test to avoid writeback in direct reclaim.

I'm also heavily distracted by internal bugs so I'm afraid I didn't read
this thread. Hopefully the above information is useful to you.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  3:03             ` KAMEZAWA Hiroyuki
@ 2010-07-13  9:28               ` Mel Gorman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Minchan Kim, Kukjin Kim, linux-arm-kernel, linux-samsung-soc,
	Russell King - ARM Linux, ben-linux

On Tue, Jul 13, 2010 at 12:03:15PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 13 Jul 2010 11:05:26 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
> 
> > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
> > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > > For example, prepare a page filled with (1 << PG_reserved).
> > > and replace it with unnecessary memmap rather than freeing a page for memmap.
> > 
> > Hmm. I don't got your point.
> > The problem is that we access struct page by pfn number not address.
> > 
> > You mean let's remain memmap on hole with changing PageReseved instead of free?
> > 
> Like a ZERO_PAGE, preparing RESERVED_PAGE, 
> which is filled with (1 << PG_reserved) as
> 
>   0x00000400,0x00000400,0x00000400,0x00000400
>   .....
> 
> And map this pages to every hole. Then, you only waste a page to fill all holes
> because "struct page" is aligned to 4bytes.
> 

I like this idea. It would allow memmap_valid_within to be thrown away
altogether and it maintains the assumptions of the memory model and
sounds "cheap".

> 
> > I think it's not a good idea to add new flag. If
> > Kame. Could you review my RFC patch which makes pfn_valid check more
> > tightly on sparsemem?
> Sure.
> 
> Thanks,
> -Kame
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  9:28               ` Mel Gorman
  0 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 12:03:15PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 13 Jul 2010 11:05:26 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
> 
> > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
> > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > > For example, prepare a page filled with (1 << PG_reserved).
> > > and replace it with unnecessary memmap rather than freeing a page for memmap.
> > 
> > Hmm. I don't got your point.
> > The problem is that we access struct page by pfn number not address.
> > 
> > You mean let's remain memmap on hole with changing PageReseved instead of free?
> > 
> Like a ZERO_PAGE, preparing RESERVED_PAGE, 
> which is filled with (1 << PG_reserved) as
> 
>   0x00000400,0x00000400,0x00000400,0x00000400
>   .....
> 
> And map this pages to every hole. Then, you only waste a page to fill all holes
> because "struct page" is aligned to 4bytes.
> 

I like this idea. It would allow memmap_valid_within to be thrown away
altogether and it maintains the assumptions of the memory model and
sounds "cheap".

> 
> > I think it's not a good idea to add new flag. If
> > Kame. Could you review my RFC patch which makes pfn_valid check more
> > tightly on sparsemem?
> Sure.
> 
> Thanks,
> -Kame
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  9:26         ` Mel Gorman
@ 2010-07-13  9:38           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  9:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> There is also an assumption that a section is fully populated or empty.

That is absolutely absurd.  So, I have a platform which has 256MB at
64MB intervals in 4 chunks.  I can fit 512kB to any slot.  It starts
at 0x10000000.  Do I really need 1024 sparsemem sections to cater for
this?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  9:38           ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  9:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> There is also an assumption that a section is fully populated or empty.

That is absolutely absurd.  So, I have a platform which has 256MB at
64MB intervals in 4 chunks.  I can fit 512kB to any slot.  It starts
at 0x10000000.  Do I really need 1024 sparsemem sections to cater for
this?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  9:28               ` Mel Gorman
@ 2010-07-13  9:38                 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  9:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, Kukjin Kim, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 10:28:52AM +0100, Mel Gorman wrote:
> I like this idea. It would allow memmap_valid_within to be thrown away
> altogether and it maintains the assumptions of the memory model and
> sounds "cheap".

It can't work.  We map kernel memory with 1MB sections, not pages.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  9:38                 ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  9:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 10:28:52AM +0100, Mel Gorman wrote:
> I like this idea. It would allow memmap_valid_within to be thrown away
> altogether and it maintains the assumptions of the memory model and
> sounds "cheap".

It can't work.  We map kernel memory with 1MB sections, not pages.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  9:38           ` Russell King - ARM Linux
@ 2010-07-13  9:50             ` Mel Gorman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:50 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-samsung-soc, KAMEZAWA Hiroyuki, Kukjin Kim, Minchan Kim,
	ben-linux, linux-arm-kernel

On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > There is also an assumption that a section is fully populated or empty.
> 
> That is absolutely absurd. 

Arguably, violating the memory model by punching unexpected holes in it
is a also absurd.

> So, I have a platform which has 256MB at
> 64MB intervals in 4 chunks. I can fit 512kB to any slot. 

I'm afraid I'm not quite getting your example.

If the granularity of the banks is 64MB and the alignment is 256MB, I
don't see what hole you'd be punching anyway.

> It starts
> at 0x10000000.  Do I really need 1024 sparsemem sections to cater for
> this?
> 

Not necessarily, just don't punch holes within section boundaries when using
sparsemem. If mapping in a PageReserved page is not an option due to complexity
(it's possible by remapping pages in kernel space which x86 used to do for
discontig) then the size of mem_section as suggested by Minchan Kim would be
another option. This would increase the overhead of sparsemem in terms of space
and performance. - at worst by an amount matching the memory freed by memmap.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13  9:50             ` Mel Gorman
  0 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > There is also an assumption that a section is fully populated or empty.
> 
> That is absolutely absurd. 

Arguably, violating the memory model by punching unexpected holes in it
is a also absurd.

> So, I have a platform which has 256MB at
> 64MB intervals in 4 chunks. I can fit 512kB to any slot. 

I'm afraid I'm not quite getting your example.

If the granularity of the banks is 64MB and the alignment is 256MB, I
don't see what hole you'd be punching anyway.

> It starts
> at 0x10000000.  Do I really need 1024 sparsemem sections to cater for
> this?
> 

Not necessarily, just don't punch holes within section boundaries when using
sparsemem. If mapping in a PageReserved page is not an option due to complexity
(it's possible by remapping pages in kernel space which x86 used to do for
discontig) then the size of mem_section as suggested by Minchan Kim would be
another option. This would increase the overhead of sparsemem in terms of space
and performance. - at worst by an amount matching the memory freed by memmap.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  9:50             ` Mel Gorman
@ 2010-07-13 17:37               ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13 17:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > > There is also an assumption that a section is fully populated or empty.
> > 
> > That is absolutely absurd. 
> 
> Arguably, violating the memory model by punching unexpected holes in it
> is a also absurd.

Well, it's something we've done since 1.2.xx, and actually it's a new
requirement that's evolved into the code that these page structs need
to be populated.  So it's arguably a regression in the MM code which
hasn't been detected.

> > So, I have a platform which has 256MB at
> > 64MB intervals in 4 chunks. I can fit 512kB to any slot. 
> 
> I'm afraid I'm not quite getting your example.
> 
> If the granularity of the banks is 64MB and the alignment is 256MB, I
> don't see what hole you'd be punching anyway.

I didn't say the alignment is 256MB.  I was referring to the *size* of
memory.

> Not necessarily, just don't punch holes within section boundaries when using
> sparsemem.

Err, you've just changed your story.

Let's take an example setup which is perfectly valid.  512kB at 0x10000000.
512kB at 0x14000000. 32MB at 0x1c000000.

You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.

So, what do you suggest would be the correct sparsemem settings for this?
MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13 17:37               ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13 17:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > > There is also an assumption that a section is fully populated or empty.
> > 
> > That is absolutely absurd. 
> 
> Arguably, violating the memory model by punching unexpected holes in it
> is a also absurd.

Well, it's something we've done since 1.2.xx, and actually it's a new
requirement that's evolved into the code that these page structs need
to be populated.  So it's arguably a regression in the MM code which
hasn't been detected.

> > So, I have a platform which has 256MB at
> > 64MB intervals in 4 chunks. I can fit 512kB to any slot. 
> 
> I'm afraid I'm not quite getting your example.
> 
> If the granularity of the banks is 64MB and the alignment is 256MB, I
> don't see what hole you'd be punching anyway.

I didn't say the alignment is 256MB.  I was referring to the *size* of
memory.

> Not necessarily, just don't punch holes within section boundaries when using
> sparsemem.

Err, you've just changed your story.

Let's take an example setup which is perfectly valid.  512kB at 0x10000000.
512kB at 0x14000000. 32MB at 0x1c000000.

You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.

So, what do you suggest would be the correct sparsemem settings for this?
MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13  1:53           ` KAMEZAWA Hiroyuki
@ 2010-07-13 18:31             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13 18:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Minchan Kim, Kukjin Kim, Mel Gorman, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 10:53:24AM +0900, KAMEZAWA Hiroyuki wrote:
> yet another _very easy_ solution is, define pfn_valid() as following.

We have a perfectly good and efficient pfn_valid() implementation which'll
work no matter what memory model is chosen - we don't need to invent other
solutions.  But as sparsemem always provides its own pfn_valid() and is
not overridable...

Maybe the answer _is_ to make it overridable.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13 18:31             ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13 18:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 10:53:24AM +0900, KAMEZAWA Hiroyuki wrote:
> yet another _very easy_ solution is, define pfn_valid() as following.

We have a perfectly good and efficient pfn_valid() implementation which'll
work no matter what memory model is chosen - we don't need to invent other
solutions.  But as sparsemem always provides its own pfn_valid() and is
not overridable...

Maybe the answer _is_ to make it overridable.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13 17:37               ` Russell King - ARM Linux
@ 2010-07-13 20:32                 ` Mel Gorman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13 20:32 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > > > There is also an assumption that a section is fully populated or empty.
> > > 
> > > That is absolutely absurd. 
> > 
> > Arguably, violating the memory model by punching unexpected holes in it
> > is a also absurd.
> 
> Well, it's something we've done since 1.2.xx, and actually it's a new
> requirement that's evolved into the code that these page structs need
> to be populated. 

It's not that new. It evolved over a long period of time probably starting
from where the buddy bitmap was moved into the page->flags to save space.
Over time, this assumptions became more strict. Anyway, this is neither here
nor there so lets not fall down that hole.

> So it's arguably a regression in the MM code which
> hasn't been detected.
> 
> > > So, I have a platform which has 256MB at
> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. 
> > 
> > I'm afraid I'm not quite getting your example.
> > 
> > If the granularity of the banks is 64MB and the alignment is 256MB, I
> > don't see what hole you'd be punching anyway.
> 
> I didn't say the alignment is 256MB.  I was referring to the *size* of
> memory.
> 
> > Not necessarily, just don't punch holes within section boundaries when using
> > sparsemem.
> 
> Err, you've just changed your story.
> 

How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem
at all.

> Let's take an example setup which is perfectly valid.  512kB at 0x10000000.
> 512kB at 0x14000000. 32MB at 0x1c000000.
> 

Ok, that is a clear example, thanks.

> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> 

Not wrong. It's fine as long as you're ok with some unnecessary memmap
being allocated. There is wastage of memory but functionally it'll be
fine with existing sparsemem and the assumptions it makes.

Obviously you're not ok with this wastage or this discussion would not even
be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.

> So, what do you suggest would be the correct sparsemem settings for this?
> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?
> 

I haven't worked it out but I would think that wastes more memory than having
the larger section with some memmap wasted so I'm not going to advocate it
even though it'd "work".

I haven't researched this so apologies if it turns out to be stupid but
I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
safe to use. Has the option being considered of using this bit to mean
"section has holes punched in it". If set, the architecture must provide
an additional arch_holey_section_pfn_valid() that does additional checking
based on information sparsemem doesn't have?  This would avoid the worst of
the performance issues of making pfn_valid() slower without increasing the
size of mem_section.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13 20:32                 ` Mel Gorman
  0 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-13 20:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
> > > > There is also an assumption that a section is fully populated or empty.
> > > 
> > > That is absolutely absurd. 
> > 
> > Arguably, violating the memory model by punching unexpected holes in it
> > is a also absurd.
> 
> Well, it's something we've done since 1.2.xx, and actually it's a new
> requirement that's evolved into the code that these page structs need
> to be populated. 

It's not that new. It evolved over a long period of time probably starting
from where the buddy bitmap was moved into the page->flags to save space.
Over time, this assumptions became more strict. Anyway, this is neither here
nor there so lets not fall down that hole.

> So it's arguably a regression in the MM code which
> hasn't been detected.
> 
> > > So, I have a platform which has 256MB at
> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. 
> > 
> > I'm afraid I'm not quite getting your example.
> > 
> > If the granularity of the banks is 64MB and the alignment is 256MB, I
> > don't see what hole you'd be punching anyway.
> 
> I didn't say the alignment is 256MB.  I was referring to the *size* of
> memory.
> 
> > Not necessarily, just don't punch holes within section boundaries when using
> > sparsemem.
> 
> Err, you've just changed your story.
> 

How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem
at all.

> Let's take an example setup which is perfectly valid.  512kB at 0x10000000.
> 512kB at 0x14000000. 32MB at 0x1c000000.
> 

Ok, that is a clear example, thanks.

> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> 

Not wrong. It's fine as long as you're ok with some unnecessary memmap
being allocated. There is wastage of memory but functionally it'll be
fine with existing sparsemem and the assumptions it makes.

Obviously you're not ok with this wastage or this discussion would not even
be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.

> So, what do you suggest would be the correct sparsemem settings for this?
> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?
> 

I haven't worked it out but I would think that wastes more memory than having
the larger section with some memmap wasted so I'm not going to advocate it
even though it'd "work".

I haven't researched this so apologies if it turns out to be stupid but
I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
safe to use. Has the option being considered of using this bit to mean
"section has holes punched in it". If set, the architecture must provide
an additional arch_holey_section_pfn_valid() that does additional checking
based on information sparsemem doesn't have?  This would avoid the worst of
the performance issues of making pfn_valid() slower without increasing the
size of mem_section.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13 20:32                 ` Mel Gorman
@ 2010-07-13 23:59                   ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-13 23:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Russell King - ARM Linux, Kukjin Kim, KAMEZAWA Hiroyuki,
	linux-arm-kernel, linux-samsung-soc, ben-linux

Hi, Mel.

On Wed, Jul 14, 2010 at 5:32 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
>> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
>> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
>> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
>> > > > There is also an assumption that a section is fully populated or empty.
>> > >
>> > > That is absolutely absurd.
>> >
>> > Arguably, violating the memory model by punching unexpected holes in it
>> > is a also absurd.
>>
>> Well, it's something we've done since 1.2.xx, and actually it's a new
>> requirement that's evolved into the code that these page structs need
>> to be populated.
>
> It's not that new. It evolved over a long period of time probably starting
> from where the buddy bitmap was moved into the page->flags to save space.
> Over time, this assumptions became more strict. Anyway, this is neither here
> nor there so lets not fall down that hole.
>
>> So it's arguably a regression in the MM code which
>> hasn't been detected.
>>
>> > > So, I have a platform which has 256MB at
>> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot.
>> >
>> > I'm afraid I'm not quite getting your example.
>> >
>> > If the granularity of the banks is 64MB and the alignment is 256MB, I
>> > don't see what hole you'd be punching anyway.
>>
>> I didn't say the alignment is 256MB.  I was referring to the *size* of
>> memory.
>>
>> > Not necessarily, just don't punch holes within section boundaries when using
>> > sparsemem.
>>
>> Err, you've just changed your story.
>>
>
> How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem
> at all.
>
>> Let's take an example setup which is perfectly valid.  512kB at 0x10000000.
>> 512kB at 0x14000000. 32MB at 0x1c000000.
>>
>
> Ok, that is a clear example, thanks.
>
>> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
>>
>
> Not wrong. It's fine as long as you're ok with some unnecessary memmap
> being allocated. There is wastage of memory but functionally it'll be
> fine with existing sparsemem and the assumptions it makes.
>
> Obviously you're not ok with this wastage or this discussion would not even
> be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.
>
>> So, what do you suggest would be the correct sparsemem settings for this?
>> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?
>>
>
> I haven't worked it out but I would think that wastes more memory than having
> the larger section with some memmap wasted so I'm not going to advocate it
> even though it'd "work".
>
> I haven't researched this so apologies if it turns out to be stupid but
> I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> safe to use. Has the option being considered of using this bit to mean
> "section has holes punched in it". If set, the architecture must provide
> an additional arch_holey_section_pfn_valid() that does additional checking
> based on information sparsemem doesn't have?  This would avoid the worst of
> the performance issues of making pfn_valid() slower without increasing the
> size of mem_section.

I think it's a very good idea.

We can know if any section has the hole with memory_present so can set
the bit in there.
Then, if pfn_valid find the section which has hole, it can call
arch_holey_section_pfn_valid.

But I think Kukjin case's best solution is to make section size 16M, not 256M.
Regardless of this, Your idea is the direction we should proceed, I think.

Thanks, Mel.

>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-13 23:59                   ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-13 23:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, Mel.

On Wed, Jul 14, 2010 at 5:32 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
>> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote:
>> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote:
>> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote:
>> > > > There is also an assumption that a section is fully populated or empty.
>> > >
>> > > That is absolutely absurd.
>> >
>> > Arguably, violating the memory model by punching unexpected holes in it
>> > is a also absurd.
>>
>> Well, it's something we've done since 1.2.xx, and actually it's a new
>> requirement that's evolved into the code that these page structs need
>> to be populated.
>
> It's not that new. It evolved over a long period of time probably starting
> from where the buddy bitmap was moved into the page->flags to save space.
> Over time, this assumptions became more strict. Anyway, this is neither here
> nor there so lets not fall down that hole.
>
>> So it's arguably a regression in the MM code which
>> hasn't been detected.
>>
>> > > So, I have a platform which has 256MB at
>> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot.
>> >
>> > I'm afraid I'm not quite getting your example.
>> >
>> > If the granularity of the banks is 64MB and the alignment is 256MB, I
>> > don't see what hole you'd be punching anyway.
>>
>> I didn't say the alignment is 256MB. ?I was referring to the *size* of
>> memory.
>>
>> > Not necessarily, just don't punch holes within section boundaries when using
>> > sparsemem.
>>
>> Err, you've just changed your story.
>>
>
> How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem
> at all.
>
>> Let's take an example setup which is perfectly valid. ?512kB at 0x10000000.
>> 512kB at 0x14000000. 32MB at 0x1c000000.
>>
>
> Ok, that is a clear example, thanks.
>
>> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
>>
>
> Not wrong. It's fine as long as you're ok with some unnecessary memmap
> being allocated. There is wastage of memory but functionally it'll be
> fine with existing sparsemem and the assumptions it makes.
>
> Obviously you're not ok with this wastage or this discussion would not even
> be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.
>
>> So, what do you suggest would be the correct sparsemem settings for this?
>> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe?
>>
>
> I haven't worked it out but I would think that wastes more memory than having
> the larger section with some memmap wasted so I'm not going to advocate it
> even though it'd "work".
>
> I haven't researched this so apologies if it turns out to be stupid but
> I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> safe to use. Has the option being considered of using this bit to mean
> "section has holes punched in it". If set, the architecture must provide
> an additional arch_holey_section_pfn_valid() that does additional checking
> based on information sparsemem doesn't have? ?This would avoid the worst of
> the performance issues of making pfn_valid() slower without increasing the
> size of mem_section.

I think it's a very good idea.

We can know if any section has the hole with memory_present so can set
the bit in there.
Then, if pfn_valid find the section which has hole, it can call
arch_holey_section_pfn_valid.

But I think Kukjin case's best solution is to make section size 16M, not 256M.
Regardless of this, Your idea is the direction we should proceed, I think.

Thanks, Mel.

>
> --
> Mel Gorman
> Part-time Phd Student ? ? ? ? ? ? ? ? ? ? ? ? ?Linux Technology Center
> University of Limerick ? ? ? ? ? ? ? ? ? ? ? ? IBM Dublin Software Lab
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13 23:59                   ` Minchan Kim
@ 2010-07-14  8:49                     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14  8:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
> But I think Kukjin case's best solution is to make section size 16M, not 256M.
> Regardless of this, Your idea is the direction we should proceed, I think.

So what if someone decides to fit 8MB DRAMs to the board?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-14  8:49                     ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14  8:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
> But I think Kukjin case's best solution is to make section size 16M, not 256M.
> Regardless of this, Your idea is the direction we should proceed, I think.

So what if someone decides to fit 8MB DRAMs to the board?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-13 20:32                 ` Mel Gorman
@ 2010-07-14  8:59                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14  8:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote:
> On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> > 
> 
> Not wrong. It's fine as long as you're ok with some unnecessary memmap
> being allocated. There is wastage of memory but functionally it'll be
> fine with existing sparsemem and the assumptions it makes.
> 
> Obviously you're not ok with this wastage or this discussion would not even
> be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.

Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or
1.2MB - that's 4% of system memory wasted in stuff which isn't going to
be used.

Moreover, the mem_map array for each populated bank is 512K - which is
the size of the RAM in the first two banks.  At that point, there's
absolutely no point in populating them.  The overhead of populating
those banks exactly equals the gain from populating them.

Sparsemem is absolutely absurd in this requirement - it can't handle
sparse memory efficiently without wasting lots of memory in the
process.

> I haven't researched this so apologies if it turns out to be stupid but
> I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> safe to use. Has the option being considered of using this bit to mean
> "section has holes punched in it". If set, the architecture must provide
> an additional arch_holey_section_pfn_valid() that does additional checking
> based on information sparsemem doesn't have?  This would avoid the worst of
> the performance issues of making pfn_valid() slower without increasing the
> size of mem_section.

As I've already said, how about just allowing pfn_valid() to be overridden
by architectures, even for the sparsemem case.  We have a perfectly good
pfn_valid() implementation that'll work across the board, and will fix
this issue.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-14  8:59                   ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote:
> On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> > 
> 
> Not wrong. It's fine as long as you're ok with some unnecessary memmap
> being allocated. There is wastage of memory but functionally it'll be
> fine with existing sparsemem and the assumptions it makes.
> 
> Obviously you're not ok with this wastage or this discussion would not even
> be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.

Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or
1.2MB - that's 4% of system memory wasted in stuff which isn't going to
be used.

Moreover, the mem_map array for each populated bank is 512K - which is
the size of the RAM in the first two banks.  At that point, there's
absolutely no point in populating them.  The overhead of populating
those banks exactly equals the gain from populating them.

Sparsemem is absolutely absurd in this requirement - it can't handle
sparse memory efficiently without wasting lots of memory in the
process.

> I haven't researched this so apologies if it turns out to be stupid but
> I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> safe to use. Has the option being considered of using this bit to mean
> "section has holes punched in it". If set, the architecture must provide
> an additional arch_holey_section_pfn_valid() that does additional checking
> based on information sparsemem doesn't have?  This would avoid the worst of
> the performance issues of making pfn_valid() slower without increasing the
> size of mem_section.

As I've already said, how about just allowing pfn_valid() to be overridden
by architectures, even for the sparsemem case.  We have a perfectly good
pfn_valid() implementation that'll work across the board, and will fix
this issue.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-14  8:49                     ` Russell King - ARM Linux
@ 2010-07-14 11:04                       ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-14 11:04 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
>> But I think Kukjin case's best solution is to make section size 16M, not 256M.
>> Regardless of this, Your idea is the direction we should proceed, I think.
>
> So what if someone decides to fit 8MB DRAMs to the board?
>

Hmm. It might be a problem.
Should we consider all configuration?

Then, in below cases ram size never can change?
How does it acked and merged?
I don't know below board can't configurable ram size. Just out of curiosity.


   #   line  filename / context / line
   1     76  arch/arm/mach-clps711x/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 24
   2     26  arch/arm/mach-lh7a40x/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 24
   3     62  arch/arm/mach-realview/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 28
   4     38  arch/arm/mach-rpc/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 26
   5     21  arch/arm/mach-s5pv210/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 28
   6     46  arch/arm/mach-sa1100/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 27
   7     19  drivers/staging/msm/memory_ll.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 25





-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-14 11:04                       ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-14 11:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
>> But I think Kukjin case's best solution is to make section size 16M, not 256M.
>> Regardless of this, Your idea is the direction we should proceed, I think.
>
> So what if someone decides to fit 8MB DRAMs to the board?
>

Hmm. It might be a problem.
Should we consider all configuration?

Then, in below cases ram size never can change?
How does it acked and merged?
I don't know below board can't configurable ram size. Just out of curiosity.


   #   line  filename / context / line
   1     76  arch/arm/mach-clps711x/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 24
   2     26  arch/arm/mach-lh7a40x/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 24
   3     62  arch/arm/mach-realview/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 28
   4     38  arch/arm/mach-rpc/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 26
   5     21  arch/arm/mach-s5pv210/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 28
   6     46  arch/arm/mach-sa1100/include/mach/memory.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 27
   7     19  drivers/staging/msm/memory_ll.h <<SECTION_SIZE_BITS>>
             #define SECTION_SIZE_BITS 25





-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-14  8:59                   ` Russell King - ARM Linux
@ 2010-07-14 13:14                     ` Mel Gorman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-14 13:14 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Wed, Jul 14, 2010 at 09:59:18AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote:
> > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> > > 
> > 
> > Not wrong. It's fine as long as you're ok with some unnecessary memmap
> > being allocated. There is wastage of memory but functionally it'll be
> > fine with existing sparsemem and the assumptions it makes.
> > 
> > Obviously you're not ok with this wastage or this discussion would not even
> > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.
> 
> Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or
> 1.2MB - that's 4% of system memory wasted in stuff which isn't going to
> be used.
> 
> Moreover, the mem_map array for each populated bank is 512K - which is
> the size of the RAM in the first two banks.  At that point, there's
> absolutely no point in populating them.  The overhead of populating
> those banks exactly equals the gain from populating them.
> 
> Sparsemem is absolutely absurd in this requirement - it can't handle
> sparse memory efficiently without wasting lots of memory in the
> process.
> 

I wasn't around when sparsemem was designed, but I strongly suspect it
wasn't considered that a bank would only be 512K. The requirement of a
section being fully populated to allow pfn_valid to be very cheap still
seems very reasonable to me in the majority of cases. It's just no ok in
ARMs case because of the size of banks.

> > I haven't researched this so apologies if it turns out to be stupid but
> > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> > safe to use. Has the option being considered of using this bit to mean
> > "section has holes punched in it". If set, the architecture must provide
> > an additional arch_holey_section_pfn_valid() that does additional checking
> > based on information sparsemem doesn't have?  This would avoid the worst of
> > the performance issues of making pfn_valid() slower without increasing the
> > size of mem_section.
> 
> As I've already said, how about just allowing pfn_valid() to be overridden
> by architectures, even for the sparsemem case. 

If nothing else pans out, I won't resist that approach. It's not my No.1
perference because it results in SparseMem-on-ARM behaving one way and
SparseMem-on-everything-else behaving another with respect to pfn_valid().
I'd prefer that an architecture-specific pfn_valid() only be called for
the sections that are known to have holes punched in them.

> We have a perfectly good
> pfn_valid() implementation that'll work across the board, and will fix
> this issue.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-14 13:14                     ` Mel Gorman
  0 siblings, 0 replies; 60+ messages in thread
From: Mel Gorman @ 2010-07-14 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 14, 2010 at 09:59:18AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote:
> > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote:
> > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong.
> > > 
> > 
> > Not wrong. It's fine as long as you're ok with some unnecessary memmap
> > being allocated. There is wastage of memory but functionally it'll be
> > fine with existing sparsemem and the assumptions it makes.
> > 
> > Obviously you're not ok with this wastage or this discussion would not even
> > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage.
> 
> Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or
> 1.2MB - that's 4% of system memory wasted in stuff which isn't going to
> be used.
> 
> Moreover, the mem_map array for each populated bank is 512K - which is
> the size of the RAM in the first two banks.  At that point, there's
> absolutely no point in populating them.  The overhead of populating
> those banks exactly equals the gain from populating them.
> 
> Sparsemem is absolutely absurd in this requirement - it can't handle
> sparse memory efficiently without wasting lots of memory in the
> process.
> 

I wasn't around when sparsemem was designed, but I strongly suspect it
wasn't considered that a bank would only be 512K. The requirement of a
section being fully populated to allow pfn_valid to be very cheap still
seems very reasonable to me in the majority of cases. It's just no ok in
ARMs case because of the size of banks.

> > I haven't researched this so apologies if it turns out to be stupid but
> > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be
> > safe to use. Has the option being considered of using this bit to mean
> > "section has holes punched in it". If set, the architecture must provide
> > an additional arch_holey_section_pfn_valid() that does additional checking
> > based on information sparsemem doesn't have?  This would avoid the worst of
> > the performance issues of making pfn_valid() slower without increasing the
> > size of mem_section.
> 
> As I've already said, how about just allowing pfn_valid() to be overridden
> by architectures, even for the sparsemem case. 

If nothing else pans out, I won't resist that approach. It's not my No.1
perference because it results in SparseMem-on-ARM behaving one way and
SparseMem-on-everything-else behaving another with respect to pfn_valid().
I'd prefer that an architecture-specific pfn_valid() only be called for
the sections that are known to have holes punched in them.

> We have a perfectly good
> pfn_valid() implementation that'll work across the board, and will fix
> this issue.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-14 11:04                       ` Minchan Kim
@ 2010-07-14 20:49                         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14 20:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote:
> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
> >> But I think Kukjin case's best solution is to make section size 16M, not 256M.
> >> Regardless of this, Your idea is the direction we should proceed, I think.
> >
> > So what if someone decides to fit 8MB DRAMs to the board?
> >
> 
> Hmm. It might be a problem.
> Should we consider all configuration?
> 
> Then, in below cases ram size never can change?
> How does it acked and merged?
> I don't know below board can't configurable ram size. Just out of curiosity.

All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not
the minimum size of DRAM which can ever be fitted.

Reducing SECTION_SIZE_BITS is really not an option for some of these
platforms - just look at where memory starts in physical space:

arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS  24
arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS   24
arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000)
arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS  28
arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000)
arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000)
arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS       26
arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000)
arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS   28
arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS    27

If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24,
there's 256 sections already.  Reducing SECTION_SIZE_BITS to the multiple
of RAM size will increase the number of sections to many thousands.
Clearly, sparsemem doesn't handle sparse memory very well, and is
defficient.

I don't know what the answer is to this; I'm totally fed up with being
pushed into using things, and then later down the line being told that
its all wrong and shouldn't be used like we are.  Maybe we should just
delete these sub-architectures which use sparsemem and say to people
that the kernel just doesn't support their memory layouts.

<depressed>.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-14 20:49                         ` Russell King - ARM Linux
  0 siblings, 0 replies; 60+ messages in thread
From: Russell King - ARM Linux @ 2010-07-14 20:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote:
> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
> >> But I think Kukjin case's best solution is to make section size 16M, not 256M.
> >> Regardless of this, Your idea is the direction we should proceed, I think.
> >
> > So what if someone decides to fit 8MB DRAMs to the board?
> >
> 
> Hmm. It might be a problem.
> Should we consider all configuration?
> 
> Then, in below cases ram size never can change?
> How does it acked and merged?
> I don't know below board can't configurable ram size. Just out of curiosity.

All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not
the minimum size of DRAM which can ever be fitted.

Reducing SECTION_SIZE_BITS is really not an option for some of these
platforms - just look at where memory starts in physical space:

arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS  24
arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS   24
arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000)
arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS  28
arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000)
arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000)
arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS       26
arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000)
arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS   28
arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS    27

If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24,
there's 256 sections already.  Reducing SECTION_SIZE_BITS to the multiple
of RAM size will increase the number of sections to many thousands.
Clearly, sparsemem doesn't handle sparse memory very well, and is
defficient.

I don't know what the answer is to this; I'm totally fed up with being
pushed into using things, and then later down the line being told that
its all wrong and shouldn't be used like we are.  Maybe we should just
delete these sub-architectures which use sparsemem and say to people
that the kernel just doesn't support their memory layouts.

<depressed>.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: About SECTION_SIZE_BITS for Sparsemem
  2010-07-14 20:49                         ` Russell King - ARM Linux
@ 2010-07-16  0:07                           ` Minchan Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-16  0:07 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel,
	linux-samsung-soc, ben-linux

On Thu, Jul 15, 2010 at 5:49 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote:
>> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
>> >> But I think Kukjin case's best solution is to make section size 16M, not 256M.
>> >> Regardless of this, Your idea is the direction we should proceed, I think.
>> >
>> > So what if someone decides to fit 8MB DRAMs to the board?
>> >
>>
>> Hmm. It might be a problem.
>> Should we consider all configuration?
>>
>> Then, in below cases ram size never can change?
>> How does it acked and merged?
>> I don't know below board can't configurable ram size. Just out of curiosity.
>
> All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not
> the minimum size of DRAM which can ever be fitted.
>
> Reducing SECTION_SIZE_BITS is really not an option for some of these
> platforms - just look at where memory starts in physical space:
>
> arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS  24
> arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS   24
> arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000)
> arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS  28
> arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000)
> arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000)
> arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS       26
> arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000)
> arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS   28
> arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS    27
>
> If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24,
> there's 256 sections already.  Reducing SECTION_SIZE_BITS to the multiple
> of RAM size will increase the number of sections to many thousands.
> Clearly, sparsemem doesn't handle sparse memory very well, and is
> defficient.
>
> I don't know what the answer is to this; I'm totally fed up with being
> pushed into using things, and then later down the line being told that
> its all wrong and shouldn't be used like we are.  Maybe we should just
> delete these sub-architectures which use sparsemem and say to people
> that the kernel just doesn't support their memory layouts.
>
> <depressed>.
>

Sorry for late response.

Okay. I think Kame's recent solution is a good.
http://www.spinics.net/lists/linux-mm/msg06594.html
It doesn't need overriding pfn_valid, doesn't need more space.
And we can optimize it with overriding SECTION_MAP_LAST_BIT with
SECTION_MAP_HAS_HOLE  to avoid non-hole section overhead.
If we all agree, I will make the patch.

All agree?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

* About SECTION_SIZE_BITS for Sparsemem
@ 2010-07-16  0:07                           ` Minchan Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Minchan Kim @ 2010-07-16  0:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 15, 2010 at 5:49 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote:
>> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote:
>> >> But I think Kukjin case's best solution is to make section size 16M, not 256M.
>> >> Regardless of this, Your idea is the direction we should proceed, I think.
>> >
>> > So what if someone decides to fit 8MB DRAMs to the board?
>> >
>>
>> Hmm. It might be a problem.
>> Should we consider all configuration?
>>
>> Then, in below cases ram size never can change?
>> How does it acked and merged?
>> I don't know below board can't configurable ram size. Just out of curiosity.
>
> All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not
> the minimum size of DRAM which can ever be fitted.
>
> Reducing SECTION_SIZE_BITS is really not an option for some of these
> platforms - just look at where memory starts in physical space:
>
> arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS ?24
> arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS ? 24
> arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000)
> arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS ?28
> arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000)
> arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000)
> arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS ? ? ? 26
> arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000)
> arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS ? 28
> arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000)
> arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS ? ?27
>
> If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24,
> there's 256 sections already. ?Reducing SECTION_SIZE_BITS to the multiple
> of RAM size will increase the number of sections to many thousands.
> Clearly, sparsemem doesn't handle sparse memory very well, and is
> defficient.
>
> I don't know what the answer is to this; I'm totally fed up with being
> pushed into using things, and then later down the line being told that
> its all wrong and shouldn't be used like we are. ?Maybe we should just
> delete these sub-architectures which use sparsemem and say to people
> that the kernel just doesn't support their memory layouts.
>
> <depressed>.
>

Sorry for late response.

Okay. I think Kame's recent solution is a good.
http://www.spinics.net/lists/linux-mm/msg06594.html
It doesn't need overriding pfn_valid, doesn't need more space.
And we can optimize it with overriding SECTION_MAP_LAST_BIT with
SECTION_MAP_HAS_HOLE  to avoid non-hole section overhead.
If we all agree, I will make the patch.

All agree?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2010-07-16  0:08 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-12  8:32 About SECTION_SIZE_BITS for Sparsemem Kukjin Kim
2010-07-12  8:32 ` Kukjin Kim
2010-07-12  9:35 ` Kyungmin Park
2010-07-12  9:35   ` Kyungmin Park
2010-07-12  9:58   ` Kukjin Kim
2010-07-12  9:58     ` Kukjin Kim
2010-07-12 10:08     ` Kyungmin Park
2010-07-12 10:08       ` Kyungmin Park
2010-07-12 10:13       ` Russell King - ARM Linux
2010-07-12 10:13         ` Russell King - ARM Linux
2010-07-12  9:52 ` Minchan Kim
2010-07-12  9:52   ` Minchan Kim
2010-07-12 10:13   ` Kukjin Kim
2010-07-12 10:13     ` Kukjin Kim
2010-07-12 10:35     ` Minchan Kim
2010-07-12 10:35       ` Minchan Kim
2010-07-13  0:25       ` KAMEZAWA Hiroyuki
2010-07-13  0:25         ` KAMEZAWA Hiroyuki
2010-07-13  1:53         ` KAMEZAWA Hiroyuki
2010-07-13  1:53           ` KAMEZAWA Hiroyuki
2010-07-13 18:31           ` Russell King - ARM Linux
2010-07-13 18:31             ` Russell King - ARM Linux
2010-07-13  2:05         ` Minchan Kim
2010-07-13  2:05           ` Minchan Kim
2010-07-13  3:03           ` KAMEZAWA Hiroyuki
2010-07-13  3:03             ` KAMEZAWA Hiroyuki
2010-07-13  9:28             ` Mel Gorman
2010-07-13  9:28               ` Mel Gorman
2010-07-13  9:38               ` Russell King - ARM Linux
2010-07-13  9:38                 ` Russell King - ARM Linux
2010-07-13  9:26       ` Mel Gorman
2010-07-13  9:26         ` Mel Gorman
2010-07-13  9:38         ` Russell King - ARM Linux
2010-07-13  9:38           ` Russell King - ARM Linux
2010-07-13  9:50           ` Mel Gorman
2010-07-13  9:50             ` Mel Gorman
2010-07-13 17:37             ` Russell King - ARM Linux
2010-07-13 17:37               ` Russell King - ARM Linux
2010-07-13 20:32               ` Mel Gorman
2010-07-13 20:32                 ` Mel Gorman
2010-07-13 23:59                 ` Minchan Kim
2010-07-13 23:59                   ` Minchan Kim
2010-07-14  8:49                   ` Russell King - ARM Linux
2010-07-14  8:49                     ` Russell King - ARM Linux
2010-07-14 11:04                     ` Minchan Kim
2010-07-14 11:04                       ` Minchan Kim
2010-07-14 20:49                       ` Russell King - ARM Linux
2010-07-14 20:49                         ` Russell King - ARM Linux
2010-07-16  0:07                         ` Minchan Kim
2010-07-16  0:07                           ` Minchan Kim
2010-07-14  8:59                 ` Russell King - ARM Linux
2010-07-14  8:59                   ` Russell King - ARM Linux
2010-07-14 13:14                   ` Mel Gorman
2010-07-14 13:14                     ` Mel Gorman
2010-07-12 10:45   ` Russell King - ARM Linux
2010-07-12 10:45     ` Russell King - ARM Linux
2010-07-12 12:28     ` Minchan Kim
2010-07-12 12:28       ` Minchan Kim
2010-07-12 12:42       ` Russell King - ARM Linux
2010-07-12 12:42         ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.