* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 8:32 ` Kukjin Kim 0 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 8:32 UTC (permalink / raw) To: linux-arm-kernel, linux-samsung-soc, 'Russell King - ARM Linux' Cc: ben-linux Russell, Hi, Kukjin Kim wrote: > Russell wrote: > > So, memory starts at 0x20000000 and finishes at 0x25000000. That's fine. > > That doesn't mean the section size is 16MB. > > > > As I've already said, the section size has _nothing_ what so ever to do > > with the size of memory, or the granularity of the size of memory. By > > way of illustration, it is perfectly legal to have a section size of > > 256MB but only have 1MB in a section and this is perfectly legal. So > > sections do not have to be completely filled. > > > Actually, as you know, the hole's area of mem_map is freed from bootmem if a > section has a hole when initializing sparse memory. > > I identified that a section doesn't need to be a contiguous area of physical > memory when reading your comment with the fact that the mem_map of a section > can be smaller than the size of a section. > > I found, however, the kernel panics when modifying min_free_kbytes file in > the proc filesystem if a section has a hole. > > While processing the change of min_free_kbytes in the kernel, page > descriptors in a hole of an online section is accessed. As I said, following error happens. It would be helpful to me if any opinions or comments. --- When SECTION_SIZE_BITS is 24 (16MiB), [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2730 [root@Samsung ~]# When SECTION_SIZE_BITS is 28 (256MiB), [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes Unable to handle kernel NULL pointer dereference at virtual address 00000004 pgd = 80a14000 [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] PREEMPT last sysfs file: Modules linked in: CPU: 0 Not tainted (2.6.35-rc4-00007-g9a59bf7-dirty #3) PC is at get_pageblock_flags_group+0x54/0xa8 LR is at setup_per_zone_wmarks+0xfc/0x1d4 pc : [<800686cc>] lr : [<800691a0>] psr: 60000093 sp : 80a03ed0 ip : 00000001 fp : 00058000 r10: 004a0000 r9 : 801e5fbc r8 : 802c0000 r7 : 0001c900 r6 : 00025000 r5 : 801e5fa4 r4 : 0000007e r3 : 00000018 r2 : 00000002 r1 : 00000000 r0 : 00000000 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 20a14019 DAC: 00000015 Process bash (pid: 888, stack limit = 0x80a022e8) Stack: (0x80a03ed0 to 0x80a04000) 3ec0: 00000000 801e5fa4 00025000 800691a0 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005 00000001 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900 800d2734 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00 2aacb000 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005 8008ff74 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005 800900c8 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8 00000004 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005 00000000 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000 00000001 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000 00000000 [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004) ---[ end trace 8b96cc6afed783d1 ]--- note: bash[888] exited with preempt_count 1 BUG: scheduling while atomic: bash/888/0x40000002 Modules linked in: [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>] (schedule+0x6c/0x2e4) [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>] (__cond_resched+0x24/0x34) [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>] (_cond_resched+0x30/0x40) [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>] (unmap_vmas+0x550/0x604) [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>] (exit_mmap+0xb0/0x1d8) [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) [<80022950>] (die+0x190/0x1bc) from [<80025488>] (__do_kernel_fault+0x64/0x84) [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] (do_page_fault+0x1d4/0x1ec) [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] (do_DataAbort+0x34/0x94) [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] (__dabt_svc+0x4c/0x60) Exception stack(0x80a03e88 to 0x80a03ed0) 3e80: 00000000 00000000 00000002 00000018 0000007e 801e5fa4 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 80a03ed0 3ec0: 800691a0 800686cc 60000093 ffffffff [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>] (do_raw_spin_lock+0x110/0x14c) [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>] (free_hot_cold_page+0x15c/0x180) [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>] (free_pgd_range+0x15c/0x17c) [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>] (free_pgtables+0x90/0x9c) [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>] (exit_mmap+0x10c/0x1d8) [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) [<80022950>] (die+0x190/0x1bc) from [<80025488>] (__do_kernel_fault+0x64/0x84) [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] (do_page_fault+0x1d4/0x1ec) [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] (do_DataAbort+0x34/0x94) [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] (__dabt_svc+0x4c/0x60) Exception stack(0x80a03e88 to 0x80a03ed0) 3e80: 00000000 00000000 00000002 00000018 0000007e 801e5fa4 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 80a03ed0 3ec0: 800691a0 800686cc 60000093 ffffffff [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) Thanks. Best regards, Kgene. -- Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, SW Solution Development Team, Samsung Electronics Co., Ltd. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 8:32 ` Kukjin Kim 0 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 8:32 UTC (permalink / raw) To: linux-arm-kernel Russell, Hi, Kukjin Kim wrote: > Russell wrote: > > So, memory starts at 0x20000000 and finishes at 0x25000000. That's fine. > > That doesn't mean the section size is 16MB. > > > > As I've already said, the section size has _nothing_ what so ever to do > > with the size of memory, or the granularity of the size of memory. By > > way of illustration, it is perfectly legal to have a section size of > > 256MB but only have 1MB in a section and this is perfectly legal. So > > sections do not have to be completely filled. > > > Actually, as you know, the hole's area of mem_map is freed from bootmem if a > section has a hole when initializing sparse memory. > > I identified that a section doesn't need to be a contiguous area of physical > memory when reading your comment with the fact that the mem_map of a section > can be smaller than the size of a section. > > I found, however, the kernel panics when modifying min_free_kbytes file in > the proc filesystem if a section has a hole. > > While processing the change of min_free_kbytes in the kernel, page > descriptors in a hole of an online section is accessed. As I said, following error happens. It would be helpful to me if any opinions or comments. --- When SECTION_SIZE_BITS is 24 (16MiB), [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2730 [root at Samsung ~]# When SECTION_SIZE_BITS is 28 (256MiB), [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes Unable to handle kernel NULL pointer dereference at virtual address 00000004 pgd = 80a14000 [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] PREEMPT last sysfs file: Modules linked in: CPU: 0 Not tainted (2.6.35-rc4-00007-g9a59bf7-dirty #3) PC is at get_pageblock_flags_group+0x54/0xa8 LR is at setup_per_zone_wmarks+0xfc/0x1d4 pc : [<800686cc>] lr : [<800691a0>] psr: 60000093 sp : 80a03ed0 ip : 00000001 fp : 00058000 r10: 004a0000 r9 : 801e5fbc r8 : 802c0000 r7 : 0001c900 r6 : 00025000 r5 : 801e5fa4 r4 : 0000007e r3 : 00000018 r2 : 00000002 r1 : 00000000 r0 : 00000000 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 20a14019 DAC: 00000015 Process bash (pid: 888, stack limit = 0x80a022e8) Stack: (0x80a03ed0 to 0x80a04000) 3ec0: 00000000 801e5fa4 00025000 800691a0 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005 00000001 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900 800d2734 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00 2aacb000 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005 8008ff74 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005 800900c8 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8 00000004 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005 00000000 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000 00000001 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000 00000000 [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004) ---[ end trace 8b96cc6afed783d1 ]--- note: bash[888] exited with preempt_count 1 BUG: scheduling while atomic: bash/888/0x40000002 Modules linked in: [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>] (schedule+0x6c/0x2e4) [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>] (__cond_resched+0x24/0x34) [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>] (_cond_resched+0x30/0x40) [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>] (unmap_vmas+0x550/0x604) [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>] (exit_mmap+0xb0/0x1d8) [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) [<80022950>] (die+0x190/0x1bc) from [<80025488>] (__do_kernel_fault+0x64/0x84) [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] (do_page_fault+0x1d4/0x1ec) [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] (do_DataAbort+0x34/0x94) [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] (__dabt_svc+0x4c/0x60) Exception stack(0x80a03e88 to 0x80a03ed0) 3e80: 00000000 00000000 00000002 00000018 0000007e 801e5fa4 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 80a03ed0 3ec0: 800691a0 800686cc 60000093 ffffffff [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>] (do_raw_spin_lock+0x110/0x14c) [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>] (free_hot_cold_page+0x15c/0x180) [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>] (free_pgd_range+0x15c/0x17c) [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>] (free_pgtables+0x90/0x9c) [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>] (exit_mmap+0x10c/0x1d8) [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) [<80022950>] (die+0x190/0x1bc) from [<80025488>] (__do_kernel_fault+0x64/0x84) [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] (do_page_fault+0x1d4/0x1ec) [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] (do_DataAbort+0x34/0x94) [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] (__dabt_svc+0x4c/0x60) Exception stack(0x80a03e88 to 0x80a03ed0) 3e80: 00000000 00000000 00000002 00000018 0000007e 801e5fa4 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 80a03ed0 3ec0: 800691a0 800686cc 60000093 ffffffff [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] (proc_sys_call_handler+0x90/0xac) [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] (proc_sys_write+0x10/0x14) [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] (vfs_write+0xac/0x154) [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] (ret_fast_syscall+0x0/0x30) Thanks. Best regards, Kgene. -- Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, SW Solution Development Team, Samsung Electronics Co., Ltd. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 8:32 ` Kukjin Kim @ 2010-07-12 9:35 ` Kyungmin Park -1 siblings, 0 replies; 60+ messages in thread From: Kyungmin Park @ 2010-07-12 9:35 UTC (permalink / raw) To: Kukjin Kim Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux Interesting. I got tested with #define MAX_PHYSMEM_BITS 31 #define SECTION_SIZE_BITS 27 # cat /proc/sys/vm/min_free_kbytes 1832 # echo 1828 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1828 # echo 1820 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1820 # echo 1700 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1700 No kernel panic. Thank you, Kyungmin Park On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Russell, > > Hi, > > Kukjin Kim wrote: >> Russell wrote: >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > fine. >> > That doesn't mean the section size is 16MB. >> > >> > As I've already said, the section size has _nothing_ what so ever to do >> > with the size of memory, or the granularity of the size of memory. By >> > way of illustration, it is perfectly legal to have a section size of >> > 256MB but only have 1MB in a section and this is perfectly legal. So >> > sections do not have to be completely filled. >> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > a >> section has a hole when initializing sparse memory. >> >> I identified that a section doesn't need to be a contiguous area of > physical >> memory when reading your comment with the fact that the mem_map of a > section >> can be smaller than the size of a section. >> >> I found, however, the kernel panics when modifying min_free_kbytes file in >> the proc filesystem if a section has a hole. >> >> While processing the change of min_free_kbytes in the kernel, page >> descriptors in a hole of an online section is accessed. > > As I said, following error happens. > It would be helpful to me if any opinions or comments. > > --- > When SECTION_SIZE_BITS is 24 (16MiB), > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2730 > [root@Samsung ~]# > > > When SECTION_SIZE_BITS is 28 (256MiB), > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > pgd = 80a14000 > [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000 > Internal error: Oops: 17 [#1] PREEMPT > last sysfs file: > Modules linked in: > CPU: 0 Not tainted (2.6.35-rc4-00007-g9a59bf7-dirty #3) > PC is at get_pageblock_flags_group+0x54/0xa8 > LR is at setup_per_zone_wmarks+0xfc/0x1d4 > pc : [<800686cc>] lr : [<800691a0>] psr: 60000093 > sp : 80a03ed0 ip : 00000001 fp : 00058000 > r10: 004a0000 r9 : 801e5fbc r8 : 802c0000 > r7 : 0001c900 r6 : 00025000 r5 : 801e5fa4 r4 : 0000007e > r3 : 00000018 r2 : 00000002 r1 : 00000000 r0 : 00000000 > Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user > Control: 10c5387d Table: 20a14019 DAC: 00000015 > Process bash (pid: 888, stack limit = 0x80a022e8) > Stack: (0x80a03ed0 to 0x80a04000) > 3ec0: 00000000 801e5fa4 00025000 > 800691a0 > 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005 > 00000001 > 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900 > 800d2734 > 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00 > 2aacb000 > 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005 > 8008ff74 > 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005 > 800900c8 > 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8 > 00000004 > 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005 > 00000000 > 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000 > 00000001 > 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000 > 00000000 > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004) > ---[ end trace 8b96cc6afed783d1 ]--- > note: bash[888] exited with preempt_count 1 > BUG: scheduling while atomic: bash/888/0x40000002 > Modules linked in: > [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>] > (schedule+0x6c/0x2e4) > [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>] > (__cond_resched+0x24/0x34) > [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>] > (_cond_resched+0x30/0x40) > [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>] > (unmap_vmas+0x550/0x604) > [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>] > (exit_mmap+0xb0/0x1d8) > [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) > [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) > [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) > [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) > [<80022950>] (die+0x190/0x1bc) from [<80025488>] > (__do_kernel_fault+0x64/0x84) > [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] > (do_page_fault+0x1d4/0x1ec) > [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] > (do_DataAbort+0x34/0x94) > [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] > (__dabt_svc+0x4c/0x60) > Exception stack(0x80a03e88 to 0x80a03ed0) > 3e80: 00000000 00000000 00000002 00000018 0000007e > 801e5fa4 > 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 > 80a03ed0 > 3ec0: 800691a0 800686cc 60000093 ffffffff > [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] > (get_pageblock_flags_group+0x54/0xa8) > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc > [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>] > (do_raw_spin_lock+0x110/0x14c) > [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>] > (free_pcppages_bulk+0x24/0x2f4) > [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>] > (free_hot_cold_page+0x15c/0x180) > [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>] > (free_pgd_range+0x15c/0x17c) > [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>] > (free_pgtables+0x90/0x9c) > [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>] > (exit_mmap+0x10c/0x1d8) > [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) > [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) > [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) > [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) > [<80022950>] (die+0x190/0x1bc) from [<80025488>] > (__do_kernel_fault+0x64/0x84) > [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] > (do_page_fault+0x1d4/0x1ec) > [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] > (do_DataAbort+0x34/0x94) > [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] > (__dabt_svc+0x4c/0x60) > Exception stack(0x80a03e88 to 0x80a03ed0) > 3e80: 00000000 00000000 00000002 00000018 0000007e > 801e5fa4 > 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 > 80a03ed0 > 3ec0: 800691a0 800686cc 60000093 ffffffff > [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] > (get_pageblock_flags_group+0x54/0xa8) > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > > > Thanks. > > Best regards, > Kgene. > -- > Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, > SW Solution Development Team, Samsung Electronics Co., Ltd. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 9:35 ` Kyungmin Park 0 siblings, 0 replies; 60+ messages in thread From: Kyungmin Park @ 2010-07-12 9:35 UTC (permalink / raw) To: linux-arm-kernel Interesting. I got tested with #define MAX_PHYSMEM_BITS 31 #define SECTION_SIZE_BITS 27 # cat /proc/sys/vm/min_free_kbytes 1832 # echo 1828 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1828 # echo 1820 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1820 # echo 1700 > /proc/sys/vm/min_free_kbytes # cat /proc/sys/vm/min_free_kbytes 1700 No kernel panic. Thank you, Kyungmin Park On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Russell, > > Hi, > > Kukjin Kim wrote: >> Russell wrote: >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > fine. >> > That doesn't mean the section size is 16MB. >> > >> > As I've already said, the section size has _nothing_ what so ever to do >> > with the size of memory, or the granularity of the size of memory. ?By >> > way of illustration, it is perfectly legal to have a section size of >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So >> > sections do not have to be completely filled. >> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > a >> section has a hole when initializing sparse memory. >> >> I identified that a section doesn't need to be a contiguous area of > physical >> memory when reading your comment with the fact that the mem_map of a > section >> can be smaller than the size of a section. >> >> I found, however, the kernel panics when modifying min_free_kbytes file in >> the proc filesystem if a section has a hole. >> >> While processing the change of min_free_kbytes in the kernel, page >> descriptors in a hole of an online section is accessed. > > As I said, following error happens. > It would be helpful to me if any opinions or comments. > > --- > When SECTION_SIZE_BITS is 24 (16MiB), > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2730 > [root at Samsung ~]# > > > When SECTION_SIZE_BITS is 28 (256MiB), > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > pgd = 80a14000 > [00000004] *pgd=20a0b031, *pte=00000000, *ppte=00000000 > Internal error: Oops: 17 [#1] PREEMPT > last sysfs file: > Modules linked in: > CPU: 0 ? ?Not tainted ?(2.6.35-rc4-00007-g9a59bf7-dirty #3) > PC is at get_pageblock_flags_group+0x54/0xa8 > LR is at setup_per_zone_wmarks+0xfc/0x1d4 > pc : [<800686cc>] ? ?lr : [<800691a0>] ? ?psr: 60000093 > sp : 80a03ed0 ?ip : 00000001 ?fp : 00058000 > r10: 004a0000 ?r9 : 801e5fbc ?r8 : 802c0000 > r7 : 0001c900 ?r6 : 00025000 ?r5 : 801e5fa4 ?r4 : 0000007e > r3 : 00000018 ?r2 : 00000002 ?r1 : 00000000 ?r0 : 00000000 > Flags: nZCv ?IRQs off ?FIQs on ?Mode SVC_32 ?ISA ARM ?Segment user > Control: 10c5387d ?Table: 20a14019 ?DAC: 00000015 > Process bash (pid: 888, stack limit = 0x80a022e8) > Stack: (0x80a03ed0 to 0x80a04000) > 3ec0: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 00000000 801e5fa4 00025000 > 800691a0 > 3ee0: a0000013 000002aa 00000005 00000001 801d8254 2aacb000 00000005 > 00000001 > 3f00: 80a02000 80a03f80 00000001 80069354 80a03f80 00000001 801d7900 > 800d2734 > 3f20: 80a03f80 00000000 00000000 00000005 80a0903c 00000005 b7c81c00 > 2aacb000 > 3f40: 80a03f80 00000005 00000000 800d2760 00000001 2aacb000 00000005 > 8008ff74 > 3f60: b7c81c00 2aacb000 b7c81c00 2aacb000 00000000 00000000 00000005 > 800900c8 > 3f80: 00000005 00000000 00000005 00000000 00000005 2aacb000 2ac525f8 > 00000004 > 3fa0: 8001f0e8 8001ef40 00000005 2aacb000 00000001 2aacb000 00000005 > 00000000 > 3fc0: 00000005 2aacb000 2ac525f8 00000004 00000005 000babe0 00000000 > 00000001 > 3fe0: 2aacb000 7e88ca58 2ab99124 2abe643c 60000010 00000001 00000000 > 00000000 > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > Code: 11a0cb8c 11a0cbac 1080018c e3a0c001 (e5904004) > ---[ end trace 8b96cc6afed783d1 ]--- > note: bash[888] exited with preempt_count 1 > BUG: scheduling while atomic: bash/888/0x40000002 > Modules linked in: > [<80024558>] (unwind_backtrace+0x0/0xec) from [<8014f110>] > (schedule+0x6c/0x2e4) > [<8014f110>] (schedule+0x6c/0x2e4) from [<80035f74>] > (__cond_resched+0x24/0x34) > [<80035f74>] (__cond_resched+0x24/0x34) from [<8014f484>] > (_cond_resched+0x30/0x40) > [<8014f484>] (_cond_resched+0x30/0x40) from [<8007ac1c>] > (unmap_vmas+0x550/0x604) > [<8007ac1c>] (unmap_vmas+0x550/0x604) from [<8007d6b8>] > (exit_mmap+0xb0/0x1d8) > [<8007d6b8>] (exit_mmap+0xb0/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) > [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) > [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) > [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) > [<80022950>] (die+0x190/0x1bc) from [<80025488>] > (__do_kernel_fault+0x64/0x84) > [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] > (do_page_fault+0x1d4/0x1ec) > [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] > (do_DataAbort+0x34/0x94) > [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] > (__dabt_svc+0x4c/0x60) > Exception stack(0x80a03e88 to 0x80a03ed0) > 3e80: ? ? ? ? ? ? ? ? ? 00000000 00000000 00000002 00000018 0000007e > 801e5fa4 > 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 > 80a03ed0 > 3ec0: 800691a0 800686cc 60000093 ffffffff > [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] > (get_pageblock_flags_group+0x54/0xa8) > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > BUG: spinlock lockup on CPU#0, bash/888, 801e5fbc > [<80024558>] (unwind_backtrace+0x0/0xec) from [<801098a8>] > (do_raw_spin_lock+0x110/0x14c) > [<801098a8>] (do_raw_spin_lock+0x110/0x14c) from [<8006963c>] > (free_pcppages_bulk+0x24/0x2f4) > [<8006963c>] (free_pcppages_bulk+0x24/0x2f4) from [<80069f28>] > (free_hot_cold_page+0x15c/0x180) > [<80069f28>] (free_hot_cold_page+0x15c/0x180) from [<8007b834>] > (free_pgd_range+0x15c/0x17c) > [<8007b834>] (free_pgd_range+0x15c/0x17c) from [<8007b8e4>] > (free_pgtables+0x90/0x9c) > [<8007b8e4>] (free_pgtables+0x90/0x9c) from [<8007d714>] > (exit_mmap+0x10c/0x1d8) > [<8007d714>] (exit_mmap+0x10c/0x1d8) from [<80037de8>] (mmput+0x4c/0x104) > [<80037de8>] (mmput+0x4c/0x104) from [<8003b7f4>] (exit_mm+0x104/0x10c) > [<8003b7f4>] (exit_mm+0x104/0x10c) from [<8003cdc4>] (do_exit+0x194/0x5bc) > [<8003cdc4>] (do_exit+0x194/0x5bc) from [<80022950>] (die+0x190/0x1bc) > [<80022950>] (die+0x190/0x1bc) from [<80025488>] > (__do_kernel_fault+0x64/0x84) > [<80025488>] (__do_kernel_fault+0x64/0x84) from [<8002567c>] > (do_page_fault+0x1d4/0x1ec) > [<8002567c>] (do_page_fault+0x1d4/0x1ec) from [<8001e2d0>] > (do_DataAbort+0x34/0x94) > [<8001e2d0>] (do_DataAbort+0x34/0x94) from [<8001ea2c>] > (__dabt_svc+0x4c/0x60) > Exception stack(0x80a03e88 to 0x80a03ed0) > 3e80: ? ? ? ? ? ? ? ? ? 00000000 00000000 00000002 00000018 0000007e > 801e5fa4 > 3ea0: 00025000 0001c900 802c0000 801e5fbc 004a0000 00058000 00000001 > 80a03ed0 > 3ec0: 800691a0 800686cc 60000093 ffffffff > [<8001ea2c>] (__dabt_svc+0x4c/0x60) from [<800686cc>] > (get_pageblock_flags_group+0x54/0xa8) > [<800686cc>] (get_pageblock_flags_group+0x54/0xa8) from [<800691a0>] > (setup_per_zone_wmarks+0xfc/0x1d4) > [<800691a0>] (setup_per_zone_wmarks+0xfc/0x1d4) from [<80069354>] > (min_free_kbytes_sysctl_handler+0x20/0x28) > [<80069354>] (min_free_kbytes_sysctl_handler+0x20/0x28) from [<800d2734>] > (proc_sys_call_handler+0x90/0xac) > [<800d2734>] (proc_sys_call_handler+0x90/0xac) from [<800d2760>] > (proc_sys_write+0x10/0x14) > [<800d2760>] (proc_sys_write+0x10/0x14) from [<8008ff74>] > (vfs_write+0xac/0x154) > [<8008ff74>] (vfs_write+0xac/0x154) from [<800900c8>] (sys_write+0x3c/0x68) > [<800900c8>] (sys_write+0x3c/0x68) from [<8001ef40>] > (ret_fast_syscall+0x0/0x30) > > > Thanks. > > Best regards, > Kgene. > -- > Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, > SW Solution Development Team, Samsung Electronics Co., Ltd. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at ?http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 9:35 ` Kyungmin Park @ 2010-07-12 9:58 ` Kukjin Kim -1 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 9:58 UTC (permalink / raw) To: 'Kyungmin Park' Cc: linux-arm-kernel, linux-samsung-soc, 'Russell King - ARM Linux', ben-linux Kyungmin Park wrote: > > Interesting. > > I got tested with > #define MAX_PHYSMEM_BITS 31 > #define SECTION_SIZE_BITS 27 > > # cat /proc/sys/vm/min_free_kbytes > 1832 > # echo 1828 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1828 > # echo 1820 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1820 > # echo 1700 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1700 > > No kernel panic. > Thanks for your test on the board. But I need your environment for comparing. Please let me know. Following is my board environment. From boot-loader(u-boot). SMDKC110 # bdinfo arch_number = 0x00000891 env_t = 0x00000000 boot_params = 0x20000100 DRAM bank = 0x00000000 -> start = 0x20000000 -> size = 0x05000000 DRAM bank = 0x00000001 -> start = 0x40000000 -> size = 0x10000000 DRAM bank = 0x00000002 -> start = 0x50000000 -> size = 0x08000000 From kernel boot message. SMDKC110 # bootm c0008000 Boot with zImage Starting kernel ... Uncompressing Linux... done, booting the kernel. Linux version 2.6.35-rc4-00008-g98c749c (kgene@starstone) (gcc version 4.4.1 (Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010 CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache Machine: SMDKC110 ... Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 80MB 256MB 128MB = 464MB total Memory: 459616k/459616k available, 15520k reserved, 0K highmem Virtual kernel memory layout: vector : 0xffff0000 - 0xffff1000 ( 4 kB) fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) DMA : 0xff000000 - 0xffe00000 ( 14 MB) vmalloc : 0xb8800000 - 0xe0000000 ( 632 MB) lowmem : 0x80000000 - 0xb8000000 ( 896 MB) modules : 0x7f000000 - 0x80000000 ( 16 MB) .init : 0x80008000 - 0x8001e000 ( 88 kB) .text : 0x8001e000 - 0x801be000 (1664 kB) .data : 0x801ce000 - 0x801e6600 ( 98 kB) SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 Hierarchical RCU implementation. RCU-based detection of stalled CPUs is disabled. Verbose stalled-CPUs detection is disabled. ... And SECTION_SIZE_BITS is 28, not 27. > Thank you, > Kyungmin Park > > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Russell, > > > > Hi, > > > > Kukjin Kim wrote: > >> Russell wrote: > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > > fine. > >> > That doesn't mean the section size is 16MB. > >> > > >> > As I've already said, the section size has _nothing_ what so ever to do > >> > with the size of memory, or the granularity of the size of memory. By > >> > way of illustration, it is perfectly legal to have a section size of > >> > 256MB but only have 1MB in a section and this is perfectly legal. So > >> > sections do not have to be completely filled. > >> > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > a > >> section has a hole when initializing sparse memory. > >> > >> I identified that a section doesn't need to be a contiguous area of > > physical > >> memory when reading your comment with the fact that the mem_map of a > > section > >> can be smaller than the size of a section. > >> > >> I found, however, the kernel panics when modifying min_free_kbytes file in > >> the proc filesystem if a section has a hole. > >> > >> While processing the change of min_free_kbytes in the kernel, page > >> descriptors in a hole of an online section is accessed. > > > > As I said, following error happens. > > It would be helpful to me if any opinions or comments. > > > > --- (snip) Thanks. Best regards, Kgene. -- Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, SW Solution Development Team, Samsung Electronics Co., Ltd. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 9:58 ` Kukjin Kim 0 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 9:58 UTC (permalink / raw) To: linux-arm-kernel Kyungmin Park wrote: > > Interesting. > > I got tested with > #define MAX_PHYSMEM_BITS 31 > #define SECTION_SIZE_BITS 27 > > # cat /proc/sys/vm/min_free_kbytes > 1832 > # echo 1828 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1828 > # echo 1820 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1820 > # echo 1700 > /proc/sys/vm/min_free_kbytes > # cat /proc/sys/vm/min_free_kbytes > 1700 > > No kernel panic. > Thanks for your test on the board. But I need your environment for comparing. Please let me know. Following is my board environment. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 9:58 ` Kukjin Kim @ 2010-07-12 10:08 ` Kyungmin Park -1 siblings, 0 replies; 60+ messages in thread From: Kyungmin Park @ 2010-07-12 10:08 UTC (permalink / raw) To: Kukjin Kim Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Mon, Jul 12, 2010 at 6:58 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Kyungmin Park wrote: > >> >> Interesting. >> >> I got tested with >> #define MAX_PHYSMEM_BITS 31 >> #define SECTION_SIZE_BITS 27 >> >> # cat /proc/sys/vm/min_free_kbytes >> 1832 >> # echo 1828 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1828 >> # echo 1820 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1820 >> # echo 1700 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1700 >> >> No kernel panic. >> > > Thanks for your test on the board. > > But I need your environment for comparing. > > Please let me know. > Following is my board environment. > > From boot-loader(u-boot). > > SMDKC110 # bdinfo > arch_number = 0x00000891 > env_t = 0x00000000 > boot_params = 0x20000100 > DRAM bank = 0x00000000 > -> start = 0x20000000 > -> size = 0x05000000 > DRAM bank = 0x00000001 > -> start = 0x40000000 > -> size = 0x10000000 > DRAM bank = 0x00000002 > -> start = 0x50000000 > -> size = 0x08000000 > > From kernel boot message. > > SMDKC110 # bootm c0008000 > Boot with zImage > > Starting kernel ... > > Uncompressing Linux... done, booting the kernel. > Linux version 2.6.35-rc4-00008-g98c749c (kgene@starstone) (gcc version 4.4.1 > (Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010 > CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f > CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache > Machine: SMDKC110 > ... > Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) > Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) > Memory: 80MB 256MB 128MB = 464MB total > Memory: 459616k/459616k available, 15520k reserved, 0K highmem > Virtual kernel memory layout: > vector : 0xffff0000 - 0xffff1000 ( 4 kB) > fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) > DMA : 0xff000000 - 0xffe00000 ( 14 MB) > vmalloc : 0xb8800000 - 0xe0000000 ( 632 MB) > lowmem : 0x80000000 - 0xb8000000 ( 896 MB) > modules : 0x7f000000 - 0x80000000 ( 16 MB) > .init : 0x80008000 - 0x8001e000 ( 88 kB) > .text : 0x8001e000 - 0x801be000 (1664 kB) > .data : 0x801ce000 - 0x801e6600 ( 98 kB) > SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 > Hierarchical RCU implementation. > RCU-based detection of stalled CPUs is disabled. > Verbose stalled-CPUs detection is disabled. > ... > > And SECTION_SIZE_BITS is 28, not 27. 27 for our board. 28 for generic. Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000 to test HIGHMEM. Universal # bdinfo arch_number = 0x00000B2E env_t = 0x00000000 boot_params = 0x30000100 DRAM bank = 0x00000000 -> start = 0x30000000 -> size = 0x05000000 DRAM bank = 0x00000001 -> start = 0x40000000 -> size = 0x18000000 baudrate = 115200 bps Universal # run bootcmd OneNAND read: offset 0xc00000, size 0x600000 6291456 bytes read: OK ## Booting kernel from Legacy Image at 30007fc0 ... Image Name: Linux-2.6.35-rc4+ Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 1548496 Bytes = 1.5 MiB Load Address: 30008000 Entry Point: 30008000 Loading Kernel Image ... OK OK Starting kernel ... Uncompressing Linux... done, booting the kernel. [ 0.000000] Linux version 2.6.35-rc4+ (kmpark@july) (gcc version 4.4.1 (GCC)0 [ 0.000000] CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f [ 0.000000] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction ce [ 0.000000] Machine: GONI [ 0.000000] Ignoring unrecognised tag 0x54410008 [ 0.000000] Memory policy: ECC disabled, Data cache writeback [ 0.000000] CPU S5PV210/S5PC110 (id 0x43110200) ... [ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes) [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) [ 0.000000] Memory: 80MB 128MB 128MB 128MB = 464MB total [ 0.000000] Memory: 468224k/468224k available, 6912k reserved, 262144K highmm [ 0.000000] Virtual kernel memory layout: [ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB) [ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) [ 0.000000] DMA : 0xff000000 - 0xffe00000 ( 14 MB) [ 0.000000] vmalloc : 0xd8800000 - 0xe0000000 ( 120 MB) [ 0.000000] lowmem : 0xc0000000 - 0xd8000000 ( 384 MB) [ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB) [ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB) [ 0.000000] .init : 0xc0008000 - 0xc001b000 ( 76 kB) [ 0.000000] .text : 0xc001b000 - 0xc0282000 (2460 kB) [ 0.000000] .data : 0xc0298000 - 0xc02b1160 ( 101 kB) [ 0.000000] Hierarchical RCU implementation. [ 0.000000] RCU-based detection of stalled CPUs is disabled. [ 0.000000] Verbose stalled-CPUs detection is disabled. [ 0.000000] NR_IRQS:176 [ 0.000000] VIC @f4000000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4010000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4020000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4030000: id 0x00041192, vendor 0x41 [ 0.000000] Console: colour dummy device 80x30 [ 0.000000] console [ttySAC2] enabled [ 0.010000] Calibrating delay loop... 797.90 BogoMIPS (lpj=1994752) > >> Thank you, >> Kyungmin Park >> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Russell, >> > >> > Hi, >> > >> > Kukjin Kim wrote: >> >> Russell wrote: >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's >> > fine. >> >> > That doesn't mean the section size is 16MB. >> >> > >> >> > As I've already said, the section size has _nothing_ what so ever to > do >> >> > with the size of memory, or the granularity of the size of memory. > By >> >> > way of illustration, it is perfectly legal to have a section size of >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So >> >> > sections do not have to be completely filled. >> >> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > if >> > a >> >> section has a hole when initializing sparse memory. >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> > physical >> >> memory when reading your comment with the fact that the mem_map of a >> > section >> >> can be smaller than the size of a section. >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file > in >> >> the proc filesystem if a section has a hole. >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> descriptors in a hole of an online section is accessed. >> > >> > As I said, following error happens. >> > It would be helpful to me if any opinions or comments. >> > >> > --- > > > (snip) > > > Thanks. > > Best regards, > Kgene. > -- > Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, > SW Solution Development Team, Samsung Electronics Co., Ltd. > > ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 10:08 ` Kyungmin Park 0 siblings, 0 replies; 60+ messages in thread From: Kyungmin Park @ 2010-07-12 10:08 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 6:58 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Kyungmin Park wrote: > >> >> Interesting. >> >> I got tested with >> #define MAX_PHYSMEM_BITS ? ? ? ?31 >> #define SECTION_SIZE_BITS ? ? ? 27 >> >> # cat /proc/sys/vm/min_free_kbytes >> 1832 >> # echo 1828 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1828 >> # echo 1820 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1820 >> # echo 1700 > /proc/sys/vm/min_free_kbytes >> # cat /proc/sys/vm/min_free_kbytes >> 1700 >> >> No kernel panic. >> > > Thanks for your test on the board. > > But I need your environment for comparing. > > Please let me know. > Following is my board environment. > > From boot-loader(u-boot). > > SMDKC110 # bdinfo > arch_number = 0x00000891 > env_t ? ? ? = 0x00000000 > boot_params = 0x20000100 > DRAM bank ? = 0x00000000 > -> start ? ?= 0x20000000 > -> size ? ? = 0x05000000 > DRAM bank ? = 0x00000001 > -> start ? ?= 0x40000000 > -> size ? ? = 0x10000000 > DRAM bank ? = 0x00000002 > -> start ? ?= 0x50000000 > -> size ? ? = 0x08000000 > > From kernel boot message. > > SMDKC110 # bootm c0008000 > Boot with zImage > > Starting kernel ... > > Uncompressing Linux... done, booting the kernel. > Linux version 2.6.35-rc4-00008-g98c749c (kgene at starstone) (gcc version 4.4.1 > (Sourcery G++ Lite 2009q3-67) ) #1 PREEMPT Mon Jul 12 18:47:04 KST 2010 > CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f > CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache > Machine: SMDKC110 > ... > Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) > Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) > Memory: 80MB 256MB 128MB = 464MB total > Memory: 459616k/459616k available, 15520k reserved, 0K highmem > Virtual kernel memory layout: > ? ?vector ?: 0xffff0000 - 0xffff1000 ? ( ? 4 kB) > ? ?fixmap ?: 0xfff00000 - 0xfffe0000 ? ( 896 kB) > ? ?DMA ? ? : 0xff000000 - 0xffe00000 ? ( ?14 MB) > ? ?vmalloc : 0xb8800000 - 0xe0000000 ? ( 632 MB) > ? ?lowmem ?: 0x80000000 - 0xb8000000 ? ( 896 MB) > ? ?modules : 0x7f000000 - 0x80000000 ? ( ?16 MB) > ? ? ?.init : 0x80008000 - 0x8001e000 ? ( ?88 kB) > ? ? ?.text : 0x8001e000 - 0x801be000 ? (1664 kB) > ? ? ?.data : 0x801ce000 - 0x801e6600 ? ( ?98 kB) > SLUB: Genslabs=9, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 > Hierarchical RCU implementation. > ? ? ? ?RCU-based detection of stalled CPUs is disabled. > ? ? ? ?Verbose stalled-CPUs detection is disabled. > ... > > And SECTION_SIZE_BITS is 28, not 27. 27 for our board. 28 for generic. Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000 to test HIGHMEM. Universal # bdinfo arch_number = 0x00000B2E env_t = 0x00000000 boot_params = 0x30000100 DRAM bank = 0x00000000 -> start = 0x30000000 -> size = 0x05000000 DRAM bank = 0x00000001 -> start = 0x40000000 -> size = 0x18000000 baudrate = 115200 bps Universal # run bootcmd OneNAND read: offset 0xc00000, size 0x600000 6291456 bytes read: OK ## Booting kernel from Legacy Image at 30007fc0 ... Image Name: Linux-2.6.35-rc4+ Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 1548496 Bytes = 1.5 MiB Load Address: 30008000 Entry Point: 30008000 Loading Kernel Image ... OK OK Starting kernel ... Uncompressing Linux... done, booting the kernel. [ 0.000000] Linux version 2.6.35-rc4+ (kmpark at july) (gcc version 4.4.1 (GCC)0 [ 0.000000] CPU: ARMv7 Processor [412fc082] revision 2 (ARMv7), cr=10c53c7f [ 0.000000] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction ce [ 0.000000] Machine: GONI [ 0.000000] Ignoring unrecognised tag 0x54410008 [ 0.000000] Memory policy: ECC disabled, Data cache writeback [ 0.000000] CPU S5PV210/S5PC110 (id 0x43110200) ... [ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes) [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) [ 0.000000] Memory: 80MB 128MB 128MB 128MB = 464MB total [ 0.000000] Memory: 468224k/468224k available, 6912k reserved, 262144K highmm [ 0.000000] Virtual kernel memory layout: [ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB) [ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) [ 0.000000] DMA : 0xff000000 - 0xffe00000 ( 14 MB) [ 0.000000] vmalloc : 0xd8800000 - 0xe0000000 ( 120 MB) [ 0.000000] lowmem : 0xc0000000 - 0xd8000000 ( 384 MB) [ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB) [ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB) [ 0.000000] .init : 0xc0008000 - 0xc001b000 ( 76 kB) [ 0.000000] .text : 0xc001b000 - 0xc0282000 (2460 kB) [ 0.000000] .data : 0xc0298000 - 0xc02b1160 ( 101 kB) [ 0.000000] Hierarchical RCU implementation. [ 0.000000] RCU-based detection of stalled CPUs is disabled. [ 0.000000] Verbose stalled-CPUs detection is disabled. [ 0.000000] NR_IRQS:176 [ 0.000000] VIC @f4000000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4010000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4020000: id 0x00041192, vendor 0x41 [ 0.000000] VIC @f4030000: id 0x00041192, vendor 0x41 [ 0.000000] Console: colour dummy device 80x30 [ 0.000000] console [ttySAC2] enabled [ 0.010000] Calibrating delay loop... 797.90 BogoMIPS (lpj=1994752) > >> Thank you, >> Kyungmin Park >> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Russell, >> > >> > Hi, >> > >> > Kukjin Kim wrote: >> >> Russell wrote: >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's >> > fine. >> >> > That doesn't mean the section size is 16MB. >> >> > >> >> > As I've already said, the section size has _nothing_ what so ever to > do >> >> > with the size of memory, or the granularity of the size of memory. > ?By >> >> > way of illustration, it is perfectly legal to have a section size of >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So >> >> > sections do not have to be completely filled. >> >> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > if >> > a >> >> section has a hole when initializing sparse memory. >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> > physical >> >> memory when reading your comment with the fact that the mem_map of a >> > section >> >> can be smaller than the size of a section. >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file > in >> >> the proc filesystem if a section has a hole. >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> descriptors in a hole of an online section is accessed. >> > >> > As I said, following error happens. >> > It would be helpful to me if any opinions or comments. >> > >> > --- > > > (snip) > > > Thanks. > > Best regards, > Kgene. > -- > Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, > SW Solution Development Team, Samsung Electronics Co., Ltd. > > ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 10:08 ` Kyungmin Park @ 2010-07-12 10:13 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 10:13 UTC (permalink / raw) To: Kyungmin Park; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux On Mon, Jul 12, 2010 at 07:08:11PM +0900, Kyungmin Park wrote: > 27 for our board. 28 for generic. > > Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000 > to test HIGHMEM. If you want to test HIGHMEM, the way to do it is to specify a vmalloc= parameter to the kernel, not to modify VMALLOC_END. > Universal # bdinfo > arch_number = 0x00000B2E > env_t = 0x00000000 > boot_params = 0x30000100 > DRAM bank = 0x00000000 > -> start = 0x30000000 > -> size = 0x05000000 > DRAM bank = 0x00000001 > -> start = 0x40000000 > -> size = 0x18000000 28 will work for this too. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 10:13 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 10:13 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 07:08:11PM +0900, Kyungmin Park wrote: > 27 for our board. 28 for generic. > > Note that I enabled HIGHMEM and modify the VMALLOC_END to 0xd000'0000 > to test HIGHMEM. If you want to test HIGHMEM, the way to do it is to specify a vmalloc= parameter to the kernel, not to modify VMALLOC_END. > Universal # bdinfo > arch_number = 0x00000B2E > env_t = 0x00000000 > boot_params = 0x30000100 > DRAM bank = 0x00000000 > -> start = 0x30000000 > -> size = 0x05000000 > DRAM bank = 0x00000001 > -> start = 0x40000000 > -> size = 0x18000000 28 will work for this too. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 8:32 ` Kukjin Kim @ 2010-07-12 9:52 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 9:52 UTC (permalink / raw) To: Kukjin Kim Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux Hello. On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Russell, > > Hi, > > Kukjin Kim wrote: >> Russell wrote: >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > fine. >> > That doesn't mean the section size is 16MB. >> > >> > As I've already said, the section size has _nothing_ what so ever to do >> > with the size of memory, or the granularity of the size of memory. By >> > way of illustration, it is perfectly legal to have a section size of >> > 256MB but only have 1MB in a section and this is perfectly legal. So >> > sections do not have to be completely filled. >> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > a >> section has a hole when initializing sparse memory. >> >> I identified that a section doesn't need to be a contiguous area of > physical >> memory when reading your comment with the fact that the mem_map of a > section >> can be smaller than the size of a section. >> >> I found, however, the kernel panics when modifying min_free_kbytes file in >> the proc filesystem if a section has a hole. >> >> While processing the change of min_free_kbytes in the kernel, page >> descriptors in a hole of an online section is accessed. > > As I said, following error happens. > It would be helpful to me if any opinions or comments. > Could you test below patch? Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone *zone) for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { if (!pfn_valid(pfn)) continue; + page = pfn_to_page(pfn); + /* Watch for unexpected holes punched in the memmap */ + if (!memmap_valid_within(pfn, page, zone)) + continue; + /* Watch out for overlapping nodes */ if (page_to_nid(page) != zone_to_nid(zone)) continue; -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 9:52 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 9:52 UTC (permalink / raw) To: linux-arm-kernel Hello. On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Russell, > > Hi, > > Kukjin Kim wrote: >> Russell wrote: >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > fine. >> > That doesn't mean the section size is 16MB. >> > >> > As I've already said, the section size has _nothing_ what so ever to do >> > with the size of memory, or the granularity of the size of memory. ?By >> > way of illustration, it is perfectly legal to have a section size of >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So >> > sections do not have to be completely filled. >> > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > a >> section has a hole when initializing sparse memory. >> >> I identified that a section doesn't need to be a contiguous area of > physical >> memory when reading your comment with the fact that the mem_map of a > section >> can be smaller than the size of a section. >> >> I found, however, the kernel panics when modifying min_free_kbytes file in >> the proc filesystem if a section has a hole. >> >> While processing the change of min_free_kbytes in the kernel, page >> descriptors in a hole of an online section is accessed. > > As I said, following error happens. > It would be helpful to me if any opinions or comments. > Could you test below patch? Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone *zone) for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { if (!pfn_valid(pfn)) continue; + page = pfn_to_page(pfn); + /* Watch for unexpected holes punched in the memmap */ + if (!memmap_valid_within(pfn, page, zone)) + continue; + /* Watch out for overlapping nodes */ if (page_to_nid(page) != zone_to_nid(zone)) continue; -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 9:52 ` Minchan Kim @ 2010-07-12 10:13 ` Kukjin Kim -1 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 10:13 UTC (permalink / raw) To: 'Minchan Kim' Cc: linux-samsung-soc, 'Russell King - ARM Linux', ben-linux, linux-arm-kernel Minchan Kim wrote: > > Hello. > Hello :-) > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Russell, > > > > Hi, > > > > Kukjin Kim wrote: > >> Russell wrote: > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > > fine. > >> > That doesn't mean the section size is 16MB. > >> > > >> > As I've already said, the section size has _nothing_ what so ever to do > >> > with the size of memory, or the granularity of the size of memory. By > >> > way of illustration, it is perfectly legal to have a section size of > >> > 256MB but only have 1MB in a section and this is perfectly legal. So > >> > sections do not have to be completely filled. > >> > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > a > >> section has a hole when initializing sparse memory. > >> > >> I identified that a section doesn't need to be a contiguous area of > > physical > >> memory when reading your comment with the fact that the mem_map of a > > section > >> can be smaller than the size of a section. > >> > >> I found, however, the kernel panics when modifying min_free_kbytes file in > >> the proc filesystem if a section has a hole. > >> > >> While processing the change of min_free_kbytes in the kernel, page > >> descriptors in a hole of an online section is accessed. > > > > As I said, following error happens. > > It would be helpful to me if any opinions or comments. > > > > Could you test below patch? > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > Yes, I did it, and no kernel panic happens :-) Same test... [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes [root@Samsung ~]# [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2730 > @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > *zone) > for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > if (!pfn_valid(pfn)) > continue; > + > page = pfn_to_page(pfn); > > + /* Watch for unexpected holes punched in the memmap */ > + if (!memmap_valid_within(pfn, page, zone)) > + continue; > + > /* Watch out for overlapping nodes */ > if (page_to_nid(page) != zone_to_nid(zone)) > continue; > > > ...Could you please explain about this issue? Thanks. Best regards, Kgene. -- Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, SW Solution Development Team, Samsung Electronics Co., Ltd. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 10:13 ` Kukjin Kim 0 siblings, 0 replies; 60+ messages in thread From: Kukjin Kim @ 2010-07-12 10:13 UTC (permalink / raw) To: linux-arm-kernel Minchan Kim wrote: > > Hello. > Hello :-) > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Russell, > > > > Hi, > > > > Kukjin Kim wrote: > >> Russell wrote: > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > > fine. > >> > That doesn't mean the section size is 16MB. > >> > > >> > As I've already said, the section size has _nothing_ what so ever to do > >> > with the size of memory, or the granularity of the size of memory. ?By > >> > way of illustration, it is perfectly legal to have a section size of > >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > >> > sections do not have to be completely filled. > >> > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > a > >> section has a hole when initializing sparse memory. > >> > >> I identified that a section doesn't need to be a contiguous area of > > physical > >> memory when reading your comment with the fact that the mem_map of a > > section > >> can be smaller than the size of a section. > >> > >> I found, however, the kernel panics when modifying min_free_kbytes file in > >> the proc filesystem if a section has a hole. > >> > >> While processing the change of min_free_kbytes in the kernel, page > >> descriptors in a hole of an online section is accessed. > > > > As I said, following error happens. > > It would be helpful to me if any opinions or comments. > > > > Could you test below patch? > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > Yes, I did it, and no kernel panic happens :-) Same test... [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2736 [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes [root at Samsung ~]# [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes 2730 > @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > *zone) > for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > if (!pfn_valid(pfn)) > continue; > + > page = pfn_to_page(pfn); > > + /* Watch for unexpected holes punched in the memmap */ > + if (!memmap_valid_within(pfn, page, zone)) > + continue; > + > /* Watch out for overlapping nodes */ > if (page_to_nid(page) != zone_to_nid(zone)) > continue; > > > ...Could you please explain about this issue? Thanks. Best regards, Kgene. -- Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer, SW Solution Development Team, Samsung Electronics Co., Ltd. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 10:13 ` Kukjin Kim @ 2010-07-12 10:35 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 10:35 UTC (permalink / raw) To: Kukjin Kim, Mel Gorman, KAMEZAWA Hiroyuki Cc: linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux Hi, Mel and Kame. On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Minchan Kim wrote: >> >> Hello. >> > Hello :-) > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Russell, >> > >> > Hi, >> > >> > Kukjin Kim wrote: >> >> Russell wrote: >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's >> > fine. >> >> > That doesn't mean the section size is 16MB. >> >> > >> >> > As I've already said, the section size has _nothing_ what so ever to > do >> >> > with the size of memory, or the granularity of the size of memory. > By >> >> > way of illustration, it is perfectly legal to have a section size of >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So >> >> > sections do not have to be completely filled. >> >> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > if >> > a >> >> section has a hole when initializing sparse memory. >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> > physical >> >> memory when reading your comment with the fact that the mem_map of a >> > section >> >> can be smaller than the size of a section. >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file > in >> >> the proc filesystem if a section has a hole. >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> descriptors in a hole of an online section is accessed. >> > >> > As I said, following error happens. >> > It would be helpful to me if any opinions or comments. >> > >> >> Could you test below patch? >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. >> > Yes, I did it, and no kernel panic happens :-) > > Same test... > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > [root@Samsung ~]# > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2730 > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone >> *zone) >> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { >> if (!pfn_valid(pfn)) >> continue; >> + >> page = pfn_to_page(pfn); >> >> + /* Watch for unexpected holes punched in the memmap */ >> + if (!memmap_valid_within(pfn, page, zone)) >> + continue; >> + >> /* Watch out for overlapping nodes */ >> if (page_to_nid(page) != zone_to_nid(zone)) >> continue; >> >> >> > > ...Could you please explain about this issue? The setup_zone_migrate_reserve doesn't check memmap hole. I think compaction would have the same problem, too. I don't know there is a problem in elsewhere. Anyway, I think memmap_valid_within calling whenever walking whole pfn range isn't a good solution. We already have pfn_valid. Could we check this in there? For example, mem_section have a valid pfn range and then valid section can test it in pfn_valid. What do you think about it? P.S) I know Mel is very busy to test to avoid writeback in direct reclaim. I can wait on you. :) Kukjin. Is this a urgent issue? If it isn't, please wait on our mm guys. :) -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 10:35 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 10:35 UTC (permalink / raw) To: linux-arm-kernel Hi, Mel and Kame. On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > Minchan Kim wrote: >> >> Hello. >> > Hello :-) > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Russell, >> > >> > Hi, >> > >> > Kukjin Kim wrote: >> >> Russell wrote: >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's >> > fine. >> >> > That doesn't mean the section size is 16MB. >> >> > >> >> > As I've already said, the section size has _nothing_ what so ever to > do >> >> > with the size of memory, or the granularity of the size of memory. > ?By >> >> > way of illustration, it is perfectly legal to have a section size of >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So >> >> > sections do not have to be completely filled. >> >> > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > if >> > a >> >> section has a hole when initializing sparse memory. >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> > physical >> >> memory when reading your comment with the fact that the mem_map of a >> > section >> >> can be smaller than the size of a section. >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file > in >> >> the proc filesystem if a section has a hole. >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> descriptors in a hole of an online section is accessed. >> > >> > As I said, following error happens. >> > It would be helpful to me if any opinions or comments. >> > >> >> Could you test below patch? >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. >> > Yes, I did it, and no kernel panic happens :-) > > Same test... > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2736 > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > [root at Samsung ~]# > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > 2730 > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone >> *zone) >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn)) >> ? ? ? ? ? ? ? ? ? ? ? ? continue; >> + >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn); >> >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */ >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone)) >> + ? ? ? ? ? ? ? ? ? ? ? ?continue; >> + >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */ >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone)) >> ? ? ? ? ? ? ? ? ? ? ? ? continue; >> >> >> > > ...Could you please explain about this issue? The setup_zone_migrate_reserve doesn't check memmap hole. I think compaction would have the same problem, too. I don't know there is a problem in elsewhere. Anyway, I think memmap_valid_within calling whenever walking whole pfn range isn't a good solution. We already have pfn_valid. Could we check this in there? For example, mem_section have a valid pfn range and then valid section can test it in pfn_valid. What do you think about it? P.S) I know Mel is very busy to test to avoid writeback in direct reclaim. I can wait on you. :) Kukjin. Is this a urgent issue? If it isn't, please wait on our mm guys. :) -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 10:35 ` Minchan Kim @ 2010-07-13 0:25 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 0:25 UTC (permalink / raw) To: Minchan Kim Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Mon, 12 Jul 2010 19:35:17 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > Hi, Mel and Kame. > > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Minchan Kim wrote: > >> > >> Hello. > >> > > Hello :-) > > > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > >> > Russell, > >> > > >> > Hi, > >> > > >> > Kukjin Kim wrote: > >> >> Russell wrote: > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > >> > fine. > >> >> > That doesn't mean the section size is 16MB. > >> >> > > >> >> > As I've already said, the section size has _nothing_ what so ever to > > do > >> >> > with the size of memory, or the granularity of the size of memory. > > By > >> >> > way of illustration, it is perfectly legal to have a section size of > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So > >> >> > sections do not have to be completely filled. > >> >> > > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > > if > >> > a > >> >> section has a hole when initializing sparse memory. > >> >> > >> >> I identified that a section doesn't need to be a contiguous area of > >> > physical > >> >> memory when reading your comment with the fact that the mem_map of a > >> > section > >> >> can be smaller than the size of a section. > >> >> > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > in > >> >> the proc filesystem if a section has a hole. > >> >> > >> >> While processing the change of min_free_kbytes in the kernel, page > >> >> descriptors in a hole of an online section is accessed. > >> > > >> > As I said, following error happens. > >> > It would be helpful to me if any opinions or comments. > >> > > >> > >> Could you test below patch? > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > >> > > Yes, I did it, and no kernel panic happens :-) > > > > Same test... > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2736 > > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root@Samsung ~]# > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2730 > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > >> *zone) > >> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > >> if (!pfn_valid(pfn)) > >> continue; > >> + > >> page = pfn_to_page(pfn); > >> > >> + /* Watch for unexpected holes punched in the memmap */ > >> + if (!memmap_valid_within(pfn, page, zone)) > >> + continue; > >> + > >> /* Watch out for overlapping nodes */ > >> if (page_to_nid(page) != zone_to_nid(zone)) > >> continue; > >> > >> > >> > > > > ...Could you please explain about this issue? > > The setup_zone_migrate_reserve doesn't check memmap hole. I think > compaction would have the same problem, too. I don't know there is a > problem in elsewhere. > Anyway, I think memmap_valid_within calling whenever walking whole pfn > range isn't a good solution. We already have pfn_valid. Could we check > this in there? > For example, mem_section have a valid pfn range and then valid section > can test it in pfn_valid. > > What do you think about it? > please map another "fake" page when you free memmap. For example, prepare a page filled with (1 << PG_reserved). and replace it with unnecessary memmap rather than freeing a page for memmap. Then, you can find a PageReserved() page there. However it doesn't have valid nid,zid,secitonID, I don't think it can cause trouble. But you may see some BUG_ON() hit. If so, I think ARM guys can add #define PageHole(page) (PageReserved(page) && PageXXXXX(page)) for avoiding BUG_ON() hit. (and this will help to skip unnecessary memmap_init_zone()) By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's sparsemem much easier for maintainance. section ID is not important as far as you call page_to_pfn(). You may able to add WARNING when page_to_pfn() is called against a HOLE page. You may feel above as a big work but I don't think it's very difficult at the first thought. Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 0:25 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 0:25 UTC (permalink / raw) To: linux-arm-kernel On Mon, 12 Jul 2010 19:35:17 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > Hi, Mel and Kame. > > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Minchan Kim wrote: > >> > >> Hello. > >> > > Hello :-) > > > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > >> > Russell, > >> > > >> > Hi, > >> > > >> > Kukjin Kim wrote: > >> >> Russell wrote: > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > >> > fine. > >> >> > That doesn't mean the section size is 16MB. > >> >> > > >> >> > As I've already said, the section size has _nothing_ what so ever to > > do > >> >> > with the size of memory, or the granularity of the size of memory. > > ?By > >> >> > way of illustration, it is perfectly legal to have a section size of > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > >> >> > sections do not have to be completely filled. > >> >> > > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > > if > >> > a > >> >> section has a hole when initializing sparse memory. > >> >> > >> >> I identified that a section doesn't need to be a contiguous area of > >> > physical > >> >> memory when reading your comment with the fact that the mem_map of a > >> > section > >> >> can be smaller than the size of a section. > >> >> > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > in > >> >> the proc filesystem if a section has a hole. > >> >> > >> >> While processing the change of min_free_kbytes in the kernel, page > >> >> descriptors in a hole of an online section is accessed. > >> > > >> > As I said, following error happens. > >> > It would be helpful to me if any opinions or comments. > >> > > >> > >> Could you test below patch? > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > >> > > Yes, I did it, and no kernel panic happens :-) > > > > Same test... > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2736 > > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root at Samsung ~]# > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2730 > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > >> *zone) > >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn)) > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > >> + > >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn); > >> > >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */ > >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone)) > >> + ? ? ? ? ? ? ? ? ? ? ? ?continue; > >> + > >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */ > >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone)) > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > >> > >> > >> > > > > ...Could you please explain about this issue? > > The setup_zone_migrate_reserve doesn't check memmap hole. I think > compaction would have the same problem, too. I don't know there is a > problem in elsewhere. > Anyway, I think memmap_valid_within calling whenever walking whole pfn > range isn't a good solution. We already have pfn_valid. Could we check > this in there? > For example, mem_section have a valid pfn range and then valid section > can test it in pfn_valid. > > What do you think about it? > please map another "fake" page when you free memmap. For example, prepare a page filled with (1 << PG_reserved). and replace it with unnecessary memmap rather than freeing a page for memmap. Then, you can find a PageReserved() page there. However it doesn't have valid nid,zid,secitonID, I don't think it can cause trouble. But you may see some BUG_ON() hit. If so, I think ARM guys can add #define PageHole(page) (PageReserved(page) && PageXXXXX(page)) for avoiding BUG_ON() hit. (and this will help to skip unnecessary memmap_init_zone()) By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's sparsemem much easier for maintainance. section ID is not important as far as you call page_to_pfn(). You may able to add WARNING when page_to_pfn() is called against a HOLE page. You may feel above as a big work but I don't think it's very difficult@the first thought. Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 0:25 ` KAMEZAWA Hiroyuki @ 2010-07-13 1:53 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 1:53 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Tue, 13 Jul 2010 09:25:22 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Mon, 12 Jul 2010 19:35:17 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > > > Hi, Mel and Kame. > > > > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > > Minchan Kim wrote: > > >> > > >> Hello. > > >> > > > Hello :-) > > > > > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > >> > Russell, > > >> > > > >> > Hi, > > >> > > > >> > Kukjin Kim wrote: > > >> >> Russell wrote: > > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > > >> > fine. > > >> >> > That doesn't mean the section size is 16MB. > > >> >> > > > >> >> > As I've already said, the section size has _nothing_ what so ever to > > > do > > >> >> > with the size of memory, or the granularity of the size of memory. > > > By > > >> >> > way of illustration, it is perfectly legal to have a section size of > > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So > > >> >> > sections do not have to be completely filled. > > >> >> > > > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > > > if > > >> > a > > >> >> section has a hole when initializing sparse memory. > > >> >> > > >> >> I identified that a section doesn't need to be a contiguous area of > > >> > physical > > >> >> memory when reading your comment with the fact that the mem_map of a > > >> > section > > >> >> can be smaller than the size of a section. > > >> >> > > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > > in > > >> >> the proc filesystem if a section has a hole. > > >> >> > > >> >> While processing the change of min_free_kbytes in the kernel, page > > >> >> descriptors in a hole of an online section is accessed. > > >> > > > >> > As I said, following error happens. > > >> > It would be helpful to me if any opinions or comments. > > >> > > > >> > > >> Could you test below patch? > > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > > >> > > > Yes, I did it, and no kernel panic happens :-) > > > > > > Same test... > > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > > 2736 > > > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > > [root@Samsung ~]# > > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > > 2730 > > > > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > > >> *zone) > > >> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > > >> if (!pfn_valid(pfn)) > > >> continue; > > >> + > > >> page = pfn_to_page(pfn); > > >> > > >> + /* Watch for unexpected holes punched in the memmap */ > > >> + if (!memmap_valid_within(pfn, page, zone)) > > >> + continue; > > >> + > > >> /* Watch out for overlapping nodes */ > > >> if (page_to_nid(page) != zone_to_nid(zone)) > > >> continue; > > >> > > >> > > >> > > > > > > ...Could you please explain about this issue? > > > > The setup_zone_migrate_reserve doesn't check memmap hole. I think > > compaction would have the same problem, too. I don't know there is a > > problem in elsewhere. > > Anyway, I think memmap_valid_within calling whenever walking whole pfn > > range isn't a good solution. We already have pfn_valid. Could we check > > this in there? > > For example, mem_section have a valid pfn range and then valid section > > can test it in pfn_valid. > > > > What do you think about it? > > > > please map another "fake" page when you free memmap. > > For example, prepare a page filled with (1 << PG_reserved). > and replace it with unnecessary memmap rather than freeing a page for memmap. > > Then, you can find a PageReserved() page there. However it doesn't have valid > nid,zid,secitonID, I don't think it can cause trouble. > But you may see some BUG_ON() hit. If so, I think ARM guys can add > yet another _very easy_ solution is, define pfn_valid() as following. == static inline arm_pfn_valid(unsigned long pfn) { struct page *page = pfn_to_page(pfn); char *end_of_page = (char *)(page + 1) - 1; char byte; if ((((unsigned long)page) && PAGE_MASK) == (((unsigned long)end_of_page) & PAGE_MASK)) return (__get_user(byte, page) == 0); return (__get_user(byte, page) == 0) && (__get_user(byte, __end_of_page)== 0); } == And replace this with default pfn_valid() for SPARSEMEM. (please see ia64_pfn_valid().) I hope this is enough quick.....no ? (we use get_user but the "page" will be accessed soon, anyway.) Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 1:53 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 1:53 UTC (permalink / raw) To: linux-arm-kernel On Tue, 13 Jul 2010 09:25:22 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Mon, 12 Jul 2010 19:35:17 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > > > Hi, Mel and Kame. > > > > On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > > Minchan Kim wrote: > > >> > > >> Hello. > > >> > > > Hello :-) > > > > > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > >> > Russell, > > >> > > > >> > Hi, > > >> > > > >> > Kukjin Kim wrote: > > >> >> Russell wrote: > > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > > >> > fine. > > >> >> > That doesn't mean the section size is 16MB. > > >> >> > > > >> >> > As I've already said, the section size has _nothing_ what so ever to > > > do > > >> >> > with the size of memory, or the granularity of the size of memory. > > > ?By > > >> >> > way of illustration, it is perfectly legal to have a section size of > > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > > >> >> > sections do not have to be completely filled. > > >> >> > > > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem > > > if > > >> > a > > >> >> section has a hole when initializing sparse memory. > > >> >> > > >> >> I identified that a section doesn't need to be a contiguous area of > > >> > physical > > >> >> memory when reading your comment with the fact that the mem_map of a > > >> > section > > >> >> can be smaller than the size of a section. > > >> >> > > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > > in > > >> >> the proc filesystem if a section has a hole. > > >> >> > > >> >> While processing the change of min_free_kbytes in the kernel, page > > >> >> descriptors in a hole of an online section is accessed. > > >> > > > >> > As I said, following error happens. > > >> > It would be helpful to me if any opinions or comments. > > >> > > > >> > > >> Could you test below patch? > > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > > >> > > > Yes, I did it, and no kernel panic happens :-) > > > > > > Same test... > > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > > 2736 > > > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > > [root at Samsung ~]# > > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > > 2730 > > > > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > > >> *zone) > > >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > > >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn)) > > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > > >> + > > >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn); > > >> > > >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */ > > >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone)) > > >> + ? ? ? ? ? ? ? ? ? ? ? ?continue; > > >> + > > >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */ > > >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone)) > > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > > >> > > >> > > >> > > > > > > ...Could you please explain about this issue? > > > > The setup_zone_migrate_reserve doesn't check memmap hole. I think > > compaction would have the same problem, too. I don't know there is a > > problem in elsewhere. > > Anyway, I think memmap_valid_within calling whenever walking whole pfn > > range isn't a good solution. We already have pfn_valid. Could we check > > this in there? > > For example, mem_section have a valid pfn range and then valid section > > can test it in pfn_valid. > > > > What do you think about it? > > > > please map another "fake" page when you free memmap. > > For example, prepare a page filled with (1 << PG_reserved). > and replace it with unnecessary memmap rather than freeing a page for memmap. > > Then, you can find a PageReserved() page there. However it doesn't have valid > nid,zid,secitonID, I don't think it can cause trouble. > But you may see some BUG_ON() hit. If so, I think ARM guys can add > yet another _very easy_ solution is, define pfn_valid() as following. == static inline arm_pfn_valid(unsigned long pfn) { struct page *page = pfn_to_page(pfn); char *end_of_page = (char *)(page + 1) - 1; char byte; if ((((unsigned long)page) && PAGE_MASK) == (((unsigned long)end_of_page) & PAGE_MASK)) return (__get_user(byte, page) == 0); return (__get_user(byte, page) == 0) && (__get_user(byte, __end_of_page)== 0); } == And replace this with default pfn_valid() for SPARSEMEM. (please see ia64_pfn_valid().) I hope this is enough quick.....no ? (we use get_user but the "page" will be accessed soon, anyway.) Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 1:53 ` KAMEZAWA Hiroyuki @ 2010-07-13 18:31 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 18:31 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 10:53:24AM +0900, KAMEZAWA Hiroyuki wrote: > yet another _very easy_ solution is, define pfn_valid() as following. We have a perfectly good and efficient pfn_valid() implementation which'll work no matter what memory model is chosen - we don't need to invent other solutions. But as sparsemem always provides its own pfn_valid() and is not overridable... Maybe the answer _is_ to make it overridable. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 18:31 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 18:31 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 10:53:24AM +0900, KAMEZAWA Hiroyuki wrote: > yet another _very easy_ solution is, define pfn_valid() as following. We have a perfectly good and efficient pfn_valid() implementation which'll work no matter what memory model is chosen - we don't need to invent other solutions. But as sparsemem always provides its own pfn_valid() and is not overridable... Maybe the answer _is_ to make it overridable. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 0:25 ` KAMEZAWA Hiroyuki @ 2010-07-13 2:05 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-13 2:05 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Mon, 12 Jul 2010 19:35:17 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > >> Hi, Mel and Kame. >> >> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Minchan Kim wrote: >> >> >> >> Hello. >> >> >> > Hello :-) >> > >> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> >> > Russell, >> >> > >> >> > Hi, >> >> > >> >> > Kukjin Kim wrote: >> >> >> Russell wrote: >> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's >> >> > fine. >> >> >> > That doesn't mean the section size is 16MB. >> >> >> > >> >> >> > As I've already said, the section size has _nothing_ what so ever to >> > do >> >> >> > with the size of memory, or the granularity of the size of memory. >> > By >> >> >> > way of illustration, it is perfectly legal to have a section size of >> >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So >> >> >> > sections do not have to be completely filled. >> >> >> > >> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem >> > if >> >> > a >> >> >> section has a hole when initializing sparse memory. >> >> >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> >> > physical >> >> >> memory when reading your comment with the fact that the mem_map of a >> >> > section >> >> >> can be smaller than the size of a section. >> >> >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file >> > in >> >> >> the proc filesystem if a section has a hole. >> >> >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> >> descriptors in a hole of an online section is accessed. >> >> > >> >> > As I said, following error happens. >> >> > It would be helpful to me if any opinions or comments. >> >> > >> >> >> >> Could you test below patch? >> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. >> >> >> > Yes, I did it, and no kernel panic happens :-) >> > >> > Same test... >> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes >> > 2736 >> > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes >> > [root@Samsung ~]# >> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes >> > 2730 >> > >> > >> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone >> >> *zone) >> >> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { >> >> if (!pfn_valid(pfn)) >> >> continue; >> >> + >> >> page = pfn_to_page(pfn); >> >> >> >> + /* Watch for unexpected holes punched in the memmap */ >> >> + if (!memmap_valid_within(pfn, page, zone)) >> >> + continue; >> >> + >> >> /* Watch out for overlapping nodes */ >> >> if (page_to_nid(page) != zone_to_nid(zone)) >> >> continue; >> >> >> >> >> >> >> > >> > ...Could you please explain about this issue? >> >> The setup_zone_migrate_reserve doesn't check memmap hole. I think >> compaction would have the same problem, too. I don't know there is a >> problem in elsewhere. >> Anyway, I think memmap_valid_within calling whenever walking whole pfn >> range isn't a good solution. We already have pfn_valid. Could we check >> this in there? >> For example, mem_section have a valid pfn range and then valid section >> can test it in pfn_valid. >> >> What do you think about it? >> > > please map another "fake" page when you free memmap. > > For example, prepare a page filled with (1 << PG_reserved). > and replace it with unnecessary memmap rather than freeing a page for memmap. Hmm. I don't got your point. The problem is that we access struct page by pfn number not address. You mean let's remain memmap on hole with changing PageReseved instead of free? > > Then, you can find a PageReserved() page there. However it doesn't have valid > nid,zid,secitonID, I don't think it can cause trouble. > But you may see some BUG_ON() hit. If so, I think ARM guys can add > > #define PageHole(page) (PageReserved(page) && PageXXXXX(page)) > for avoiding BUG_ON() hit. (and this will help to skip unnecessary > memmap_init_zone()) I think it's not a good idea to add new flag. If Kame. Could you review my RFC patch which makes pfn_valid check more tightly on sparsemem? > > By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's > sparsemem much easier for maintainance. I think above problem isn't only ARM one. It can happen on any sparsemem which do loose pfn_valid check. > > section ID is not important as far as you call page_to_pfn(). > You may able to add WARNING when page_to_pfn() is called against a HOLE page. Yes. It's a good idea. We can add it optionally. But for it, we need PageHole or mem_section bank range check. I like second like my previous posting. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 2:05 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-13 2:05 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Mon, 12 Jul 2010 19:35:17 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > >> Hi, Mel and Kame. >> >> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> > Minchan Kim wrote: >> >> >> >> Hello. >> >> >> > Hello :-) >> > >> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: >> >> > Russell, >> >> > >> >> > Hi, >> >> > >> >> > Kukjin Kim wrote: >> >> >> Russell wrote: >> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's >> >> > fine. >> >> >> > That doesn't mean the section size is 16MB. >> >> >> > >> >> >> > As I've already said, the section size has _nothing_ what so ever to >> > do >> >> >> > with the size of memory, or the granularity of the size of memory. >> > ?By >> >> >> > way of illustration, it is perfectly legal to have a section size of >> >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So >> >> >> > sections do not have to be completely filled. >> >> >> > >> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem >> > if >> >> > a >> >> >> section has a hole when initializing sparse memory. >> >> >> >> >> >> I identified that a section doesn't need to be a contiguous area of >> >> > physical >> >> >> memory when reading your comment with the fact that the mem_map of a >> >> > section >> >> >> can be smaller than the size of a section. >> >> >> >> >> >> I found, however, the kernel panics when modifying min_free_kbytes file >> > in >> >> >> the proc filesystem if a section has a hole. >> >> >> >> >> >> While processing the change of min_free_kbytes in the kernel, page >> >> >> descriptors in a hole of an online section is accessed. >> >> > >> >> > As I said, following error happens. >> >> > It would be helpful to me if any opinions or comments. >> >> > >> >> >> >> Could you test below patch? >> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. >> >> >> > Yes, I did it, and no kernel panic happens :-) >> > >> > Same test... >> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes >> > 2736 >> > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes >> > [root at Samsung ~]# >> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes >> > 2730 >> > >> > >> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone >> >> *zone) >> >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { >> >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn)) >> >> ? ? ? ? ? ? ? ? ? ? ? ? continue; >> >> + >> >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn); >> >> >> >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */ >> >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone)) >> >> + ? ? ? ? ? ? ? ? ? ? ? ?continue; >> >> + >> >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */ >> >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone)) >> >> ? ? ? ? ? ? ? ? ? ? ? ? continue; >> >> >> >> >> >> >> > >> > ...Could you please explain about this issue? >> >> The setup_zone_migrate_reserve doesn't check memmap hole. I think >> compaction would have the ?same problem, too. I don't know there is a >> problem in elsewhere. >> Anyway, I think memmap_valid_within calling whenever walking whole pfn >> range isn't a good solution. We already have pfn_valid. Could we check >> this in there? >> For example, mem_section have a valid pfn range and then valid section >> can test it in pfn_valid. >> >> What do you think about it? >> > > please map another "fake" page when you free memmap. > > For example, prepare a page filled with (1 << PG_reserved). > and replace it with unnecessary memmap rather than freeing a page for memmap. Hmm. I don't got your point. The problem is that we access struct page by pfn number not address. You mean let's remain memmap on hole with changing PageReseved instead of free? > > Then, you can find a PageReserved() page there. However it doesn't have valid > nid,zid,secitonID, I don't think it can cause trouble. > But you may see some BUG_ON() hit. If so, I think ARM guys can add > > #define PageHole(page) (PageReserved(page) && PageXXXXX(page)) > for avoiding BUG_ON() hit. (and this will help to skip unnecessary > memmap_init_zone()) I think it's not a good idea to add new flag. If Kame. Could you review my RFC patch which makes pfn_valid check more tightly on sparsemem? > > By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's > sparsemem much easier for maintainance. I think above problem isn't only ARM one. It can happen on any sparsemem which do loose pfn_valid check. > > section ID is not important as far as you call page_to_pfn(). > You may able to add WARNING when page_to_pfn() is called against a HOLE page. Yes. It's a good idea. We can add it optionally. But for it, we need PageHole or mem_section bank range check. I like second like my previous posting. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 2:05 ` Minchan Kim @ 2010-07-13 3:03 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 3:03 UTC (permalink / raw) To: Minchan Kim Cc: Kukjin Kim, Mel Gorman, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Tue, 13 Jul 2010 11:05:26 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > For example, prepare a page filled with (1 << PG_reserved). > > and replace it with unnecessary memmap rather than freeing a page for memmap. > > Hmm. I don't got your point. > The problem is that we access struct page by pfn number not address. > > You mean let's remain memmap on hole with changing PageReseved instead of free? > Like a ZERO_PAGE, preparing RESERVED_PAGE, which is filled with (1 << PG_reserved) as 0x00000400,0x00000400,0x00000400,0x00000400 ..... And map this pages to every hole. Then, you only waste a page to fill all holes because "struct page" is aligned to 4bytes. > I think it's not a good idea to add new flag. If > Kame. Could you review my RFC patch which makes pfn_valid check more > tightly on sparsemem? Sure. Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 3:03 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 60+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-07-13 3:03 UTC (permalink / raw) To: linux-arm-kernel On Tue, 13 Jul 2010 11:05:26 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > For example, prepare a page filled with (1 << PG_reserved). > > and replace it with unnecessary memmap rather than freeing a page for memmap. > > Hmm. I don't got your point. > The problem is that we access struct page by pfn number not address. > > You mean let's remain memmap on hole with changing PageReseved instead of free? > Like a ZERO_PAGE, preparing RESERVED_PAGE, which is filled with (1 << PG_reserved) as 0x00000400,0x00000400,0x00000400,0x00000400 ..... And map this pages to every hole. Then, you only waste a page to fill all holes because "struct page" is aligned to 4bytes. > I think it's not a good idea to add new flag. If > Kame. Could you review my RFC patch which makes pfn_valid check more > tightly on sparsemem? Sure. Thanks, -Kame ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 3:03 ` KAMEZAWA Hiroyuki @ 2010-07-13 9:28 ` Mel Gorman -1 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:28 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, Kukjin Kim, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Tue, Jul 13, 2010 at 12:03:15PM +0900, KAMEZAWA Hiroyuki wrote: > On Tue, 13 Jul 2010 11:05:26 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > > > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki > > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > > > For example, prepare a page filled with (1 << PG_reserved). > > > and replace it with unnecessary memmap rather than freeing a page for memmap. > > > > Hmm. I don't got your point. > > The problem is that we access struct page by pfn number not address. > > > > You mean let's remain memmap on hole with changing PageReseved instead of free? > > > Like a ZERO_PAGE, preparing RESERVED_PAGE, > which is filled with (1 << PG_reserved) as > > 0x00000400,0x00000400,0x00000400,0x00000400 > ..... > > And map this pages to every hole. Then, you only waste a page to fill all holes > because "struct page" is aligned to 4bytes. > I like this idea. It would allow memmap_valid_within to be thrown away altogether and it maintains the assumptions of the memory model and sounds "cheap". > > > I think it's not a good idea to add new flag. If > > Kame. Could you review my RFC patch which makes pfn_valid check more > > tightly on sparsemem? > Sure. > > Thanks, > -Kame > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 9:28 ` Mel Gorman 0 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:28 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 12:03:15PM +0900, KAMEZAWA Hiroyuki wrote: > On Tue, 13 Jul 2010 11:05:26 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > > > On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki > > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > > > For example, prepare a page filled with (1 << PG_reserved). > > > and replace it with unnecessary memmap rather than freeing a page for memmap. > > > > Hmm. I don't got your point. > > The problem is that we access struct page by pfn number not address. > > > > You mean let's remain memmap on hole with changing PageReseved instead of free? > > > Like a ZERO_PAGE, preparing RESERVED_PAGE, > which is filled with (1 << PG_reserved) as > > 0x00000400,0x00000400,0x00000400,0x00000400 > ..... > > And map this pages to every hole. Then, you only waste a page to fill all holes > because "struct page" is aligned to 4bytes. > I like this idea. It would allow memmap_valid_within to be thrown away altogether and it maintains the assumptions of the memory model and sounds "cheap". > > > I think it's not a good idea to add new flag. If > > Kame. Could you review my RFC patch which makes pfn_valid check more > > tightly on sparsemem? > Sure. > > Thanks, > -Kame > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 9:28 ` Mel Gorman @ 2010-07-13 9:38 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 9:38 UTC (permalink / raw) To: Mel Gorman Cc: KAMEZAWA Hiroyuki, Minchan Kim, Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 10:28:52AM +0100, Mel Gorman wrote: > I like this idea. It would allow memmap_valid_within to be thrown away > altogether and it maintains the assumptions of the memory model and > sounds "cheap". It can't work. We map kernel memory with 1MB sections, not pages. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 9:38 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 9:38 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 10:28:52AM +0100, Mel Gorman wrote: > I like this idea. It would allow memmap_valid_within to be thrown away > altogether and it maintains the assumptions of the memory model and > sounds "cheap". It can't work. We map kernel memory with 1MB sections, not pages. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 10:35 ` Minchan Kim @ 2010-07-13 9:26 ` Mel Gorman -1 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:26 UTC (permalink / raw) To: Minchan Kim Cc: Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, Russell King - ARM Linux, ben-linux On Mon, Jul 12, 2010 at 07:35:17PM +0900, Minchan Kim wrote: > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > >> > Russell, > >> > > >> > Hi, > >> > > >> > Kukjin Kim wrote: > >> >> Russell wrote: > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's fine. > >> >> > That doesn't mean the section size is 16MB. > >> >> > > >> >> > As I've already said, the section size has _nothing_ what so ever to do > >> >> > with the size of memory, or the granularity of the size of memory. By > >> >> > way of illustration, it is perfectly legal to have a section size of > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. So > >> >> > sections do not have to be completely filled. > >> >> > This is accurate although there is an expectation that a section is as larger or larger than MAX_ORDER_NR_PAGES. > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > >> > a > >> >> section has a hole when initializing sparse memory. > >> >> > >> >> I identified that a section doesn't need to be a contiguous area of physical > >> >> memory when reading your comment with the fact that the mem_map of a section > >> >> can be smaller than the size of a section. > >> >> This should only happen in one case, on ARM and it breaks assumptions. It is typically assumed that if a page is valid within a block of MAX_ORDER_NR_PAGES, then the entire range is active. If CONFIG_HOLES_IN_ZONE is set, then there may be holes within a MAX_ORDER_NR_PAGES range and there is a performance hit as a result. There is also an assumption that a section is fully populated or empty. Look at the implementation of pfn_valid for sparsemem, it checks if the section has SECTION_HAS_MEM_MAP set and it's the same check for any page within that section. If there are holes in the section, the pfn_valid() check would return true. Check out the comment for memmap_valid_within() which tries to get around this problem on ARM which is the only architecture punching holes in its mem_map. As it's only depended on for the information in one proc file, the performance hit is not a problem but it should not be considered a typical thing. > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > in > >> >> the proc filesystem if a section has a hole. > >> >> > >> >> While processing the change of min_free_kbytes in the kernel, page > >> >> descriptors in a hole of an online section is accessed. > >> > > >> > As I said, following error happens. > >> > It would be helpful to me if any opinions or comments. > >> > > >> > >> Could you test below patch? > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > >> > > Yes, I did it, and no kernel panic happens :-) > > > > Same test... > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2736 > > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root@Samsung ~]# > > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2730 > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > >> *zone) > >> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > >> if (!pfn_valid(pfn)) > >> continue; > >> + > >> page = pfn_to_page(pfn); > >> > >> + /* Watch for unexpected holes punched in the memmap */ > >> + if (!memmap_valid_within(pfn, page, zone)) > >> + continue; > >> + > >> /* Watch out for overlapping nodes */ > >> if (page_to_nid(page) != zone_to_nid(zone)) > >> continue; > >> > >> > >> > > > > ...Could you please explain about this issue? > The issue is that ARM can create holes within a section of memory which breaks the memory model by allowing pfn_valid() to return true for PFNs backed by no memmap. This causes awkwardness. > The setup_zone_migrate_reserve doesn't check memmap hole. It doesn't. The worst case scenario is where the hole is punched at the beginning of a section, pfn_valid returns true but the PFN is junk and crashes shortly afterwards. This would require a zone to start in a hole which should never happen - it makes no sense. If this is the scenario being encountered, ensure that zones do not start in holes. > I think > compaction would have the same problem, too. > I don't know there is a > problem in elsewhere. > Anyway, I think memmap_valid_within calling whenever walking whole pfn > range isn't a good solution. No, it's not. The rules for pfn_valid and pfn_valid_within are already poorly understood and we shouldn't add additional rules on memmap_valid_within just for ARM if possible. If the problems are being encountered on sparsemem on ARM, I'd prefer to simply see holes not punched in the memmap within a section! > We already have pfn_valid. Could we check > this in there? Ordinarily, yes you would use pfn_valid or pfn_valid_within. It's only on ARM where assumptions of the memory model are violated that memmap_valid_within is used. It's unsatisfactory even there but as it was only used for a proc file, it wasn't important. I'd really hate to see its use increased. At the time it was discussed, a "proper" fix would have consumed as much memory as saved by deleting portions of the memmap and was rejected. > For example, mem_section have a valid pfn range and then valid section > can test it in pfn_valid. > > What do you think about it? > > P.S) > I know Mel is very busy to test to avoid writeback in direct reclaim. I'm also heavily distracted by internal bugs so I'm afraid I didn't read this thread. Hopefully the above information is useful to you. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 9:26 ` Mel Gorman 0 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:26 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 07:35:17PM +0900, Minchan Kim wrote: > >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > >> > Russell, > >> > > >> > Hi, > >> > > >> > Kukjin Kim wrote: > >> >> Russell wrote: > >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's fine. > >> >> > That doesn't mean the section size is 16MB. > >> >> > > >> >> > As I've already said, the section size has _nothing_ what so ever to do > >> >> > with the size of memory, or the granularity of the size of memory. ?By > >> >> > way of illustration, it is perfectly legal to have a section size of > >> >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > >> >> > sections do not have to be completely filled. > >> >> > This is accurate although there is an expectation that a section is as larger or larger than MAX_ORDER_NR_PAGES. > >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > >> > a > >> >> section has a hole when initializing sparse memory. > >> >> > >> >> I identified that a section doesn't need to be a contiguous area of physical > >> >> memory when reading your comment with the fact that the mem_map of a section > >> >> can be smaller than the size of a section. > >> >> This should only happen in one case, on ARM and it breaks assumptions. It is typically assumed that if a page is valid within a block of MAX_ORDER_NR_PAGES, then the entire range is active. If CONFIG_HOLES_IN_ZONE is set, then there may be holes within a MAX_ORDER_NR_PAGES range and there is a performance hit as a result. There is also an assumption that a section is fully populated or empty. Look at the implementation of pfn_valid for sparsemem, it checks if the section has SECTION_HAS_MEM_MAP set and it's the same check for any page within that section. If there are holes in the section, the pfn_valid() check would return true. Check out the comment for memmap_valid_within() which tries to get around this problem on ARM which is the only architecture punching holes in its mem_map. As it's only depended on for the information in one proc file, the performance hit is not a problem but it should not be considered a typical thing. > >> >> I found, however, the kernel panics when modifying min_free_kbytes file > > in > >> >> the proc filesystem if a section has a hole. > >> >> > >> >> While processing the change of min_free_kbytes in the kernel, page > >> >> descriptors in a hole of an online section is accessed. > >> > > >> > As I said, following error happens. > >> > It would be helpful to me if any opinions or comments. > >> > > >> > >> Could you test below patch? > >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > >> > > Yes, I did it, and no kernel panic happens :-) > > > > Same test... > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2736 > > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes > > [root at Samsung ~]# > > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes > > 2730 > > > > > >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone > >> *zone) > >> ? ? ? ? for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { > >> ? ? ? ? ? ? ? ? if (!pfn_valid(pfn)) > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > >> + > >> ? ? ? ? ? ? ? ? page = pfn_to_page(pfn); > >> > >> + ? ? ? ? ? ? ? ?/* Watch for unexpected holes punched in the memmap */ > >> + ? ? ? ? ? ? ? ?if (!memmap_valid_within(pfn, page, zone)) > >> + ? ? ? ? ? ? ? ? ? ? ? ?continue; > >> + > >> ? ? ? ? ? ? ? ? /* Watch out for overlapping nodes */ > >> ? ? ? ? ? ? ? ? if (page_to_nid(page) != zone_to_nid(zone)) > >> ? ? ? ? ? ? ? ? ? ? ? ? continue; > >> > >> > >> > > > > ...Could you please explain about this issue? > The issue is that ARM can create holes within a section of memory which breaks the memory model by allowing pfn_valid() to return true for PFNs backed by no memmap. This causes awkwardness. > The setup_zone_migrate_reserve doesn't check memmap hole. It doesn't. The worst case scenario is where the hole is punched at the beginning of a section, pfn_valid returns true but the PFN is junk and crashes shortly afterwards. This would require a zone to start in a hole which should never happen - it makes no sense. If this is the scenario being encountered, ensure that zones do not start in holes. > I think > compaction would have the same problem, too. > I don't know there is a > problem in elsewhere. > Anyway, I think memmap_valid_within calling whenever walking whole pfn > range isn't a good solution. No, it's not. The rules for pfn_valid and pfn_valid_within are already poorly understood and we shouldn't add additional rules on memmap_valid_within just for ARM if possible. If the problems are being encountered on sparsemem on ARM, I'd prefer to simply see holes not punched in the memmap within a section! > We already have pfn_valid. Could we check > this in there? Ordinarily, yes you would use pfn_valid or pfn_valid_within. It's only on ARM where assumptions of the memory model are violated that memmap_valid_within is used. It's unsatisfactory even there but as it was only used for a proc file, it wasn't important. I'd really hate to see its use increased. At the time it was discussed, a "proper" fix would have consumed as much memory as saved by deleting portions of the memmap and was rejected. > For example, mem_section have a valid pfn range and then valid section > can test it in pfn_valid. > > What do you think about it? > > P.S) > I know Mel is very busy to test to avoid writeback in direct reclaim. I'm also heavily distracted by internal bugs so I'm afraid I didn't read this thread. Hopefully the above information is useful to you. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 9:26 ` Mel Gorman @ 2010-07-13 9:38 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 9:38 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > There is also an assumption that a section is fully populated or empty. That is absolutely absurd. So, I have a platform which has 256MB at 64MB intervals in 4 chunks. I can fit 512kB to any slot. It starts at 0x10000000. Do I really need 1024 sparsemem sections to cater for this? ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 9:38 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 9:38 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > There is also an assumption that a section is fully populated or empty. That is absolutely absurd. So, I have a platform which has 256MB at 64MB intervals in 4 chunks. I can fit 512kB to any slot. It starts at 0x10000000. Do I really need 1024 sparsemem sections to cater for this? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 9:38 ` Russell King - ARM Linux @ 2010-07-13 9:50 ` Mel Gorman -1 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:50 UTC (permalink / raw) To: Russell King - ARM Linux Cc: linux-samsung-soc, KAMEZAWA Hiroyuki, Kukjin Kim, Minchan Kim, ben-linux, linux-arm-kernel On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > There is also an assumption that a section is fully populated or empty. > > That is absolutely absurd. Arguably, violating the memory model by punching unexpected holes in it is a also absurd. > So, I have a platform which has 256MB at > 64MB intervals in 4 chunks. I can fit 512kB to any slot. I'm afraid I'm not quite getting your example. If the granularity of the banks is 64MB and the alignment is 256MB, I don't see what hole you'd be punching anyway. > It starts > at 0x10000000. Do I really need 1024 sparsemem sections to cater for > this? > Not necessarily, just don't punch holes within section boundaries when using sparsemem. If mapping in a PageReserved page is not an option due to complexity (it's possible by remapping pages in kernel space which x86 used to do for discontig) then the size of mem_section as suggested by Minchan Kim would be another option. This would increase the overhead of sparsemem in terms of space and performance. - at worst by an amount matching the memory freed by memmap. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 9:50 ` Mel Gorman 0 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 9:50 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > There is also an assumption that a section is fully populated or empty. > > That is absolutely absurd. Arguably, violating the memory model by punching unexpected holes in it is a also absurd. > So, I have a platform which has 256MB at > 64MB intervals in 4 chunks. I can fit 512kB to any slot. I'm afraid I'm not quite getting your example. If the granularity of the banks is 64MB and the alignment is 256MB, I don't see what hole you'd be punching anyway. > It starts > at 0x10000000. Do I really need 1024 sparsemem sections to cater for > this? > Not necessarily, just don't punch holes within section boundaries when using sparsemem. If mapping in a PageReserved page is not an option due to complexity (it's possible by remapping pages in kernel space which x86 used to do for discontig) then the size of mem_section as suggested by Minchan Kim would be another option. This would increase the overhead of sparsemem in terms of space and performance. - at worst by an amount matching the memory freed by memmap. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 9:50 ` Mel Gorman @ 2010-07-13 17:37 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 17:37 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > > There is also an assumption that a section is fully populated or empty. > > > > That is absolutely absurd. > > Arguably, violating the memory model by punching unexpected holes in it > is a also absurd. Well, it's something we've done since 1.2.xx, and actually it's a new requirement that's evolved into the code that these page structs need to be populated. So it's arguably a regression in the MM code which hasn't been detected. > > So, I have a platform which has 256MB at > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. > > I'm afraid I'm not quite getting your example. > > If the granularity of the banks is 64MB and the alignment is 256MB, I > don't see what hole you'd be punching anyway. I didn't say the alignment is 256MB. I was referring to the *size* of memory. > Not necessarily, just don't punch holes within section boundaries when using > sparsemem. Err, you've just changed your story. Let's take an example setup which is perfectly valid. 512kB at 0x10000000. 512kB at 0x14000000. 32MB at 0x1c000000. You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. So, what do you suggest would be the correct sparsemem settings for this? MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 17:37 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-13 17:37 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > > There is also an assumption that a section is fully populated or empty. > > > > That is absolutely absurd. > > Arguably, violating the memory model by punching unexpected holes in it > is a also absurd. Well, it's something we've done since 1.2.xx, and actually it's a new requirement that's evolved into the code that these page structs need to be populated. So it's arguably a regression in the MM code which hasn't been detected. > > So, I have a platform which has 256MB at > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. > > I'm afraid I'm not quite getting your example. > > If the granularity of the banks is 64MB and the alignment is 256MB, I > don't see what hole you'd be punching anyway. I didn't say the alignment is 256MB. I was referring to the *size* of memory. > Not necessarily, just don't punch holes within section boundaries when using > sparsemem. Err, you've just changed your story. Let's take an example setup which is perfectly valid. 512kB at 0x10000000. 512kB at 0x14000000. 32MB at 0x1c000000. You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. So, what do you suggest would be the correct sparsemem settings for this? MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 17:37 ` Russell King - ARM Linux @ 2010-07-13 20:32 ` Mel Gorman -1 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 20:32 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: > > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > > > There is also an assumption that a section is fully populated or empty. > > > > > > That is absolutely absurd. > > > > Arguably, violating the memory model by punching unexpected holes in it > > is a also absurd. > > Well, it's something we've done since 1.2.xx, and actually it's a new > requirement that's evolved into the code that these page structs need > to be populated. It's not that new. It evolved over a long period of time probably starting from where the buddy bitmap was moved into the page->flags to save space. Over time, this assumptions became more strict. Anyway, this is neither here nor there so lets not fall down that hole. > So it's arguably a regression in the MM code which > hasn't been detected. > > > > So, I have a platform which has 256MB at > > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. > > > > I'm afraid I'm not quite getting your example. > > > > If the granularity of the banks is 64MB and the alignment is 256MB, I > > don't see what hole you'd be punching anyway. > > I didn't say the alignment is 256MB. I was referring to the *size* of > memory. > > > Not necessarily, just don't punch holes within section boundaries when using > > sparsemem. > > Err, you've just changed your story. > How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem at all. > Let's take an example setup which is perfectly valid. 512kB at 0x10000000. > 512kB at 0x14000000. 32MB at 0x1c000000. > Ok, that is a clear example, thanks. > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > Not wrong. It's fine as long as you're ok with some unnecessary memmap being allocated. There is wastage of memory but functionally it'll be fine with existing sparsemem and the assumptions it makes. Obviously you're not ok with this wastage or this discussion would not even be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > So, what do you suggest would be the correct sparsemem settings for this? > MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? > I haven't worked it out but I would think that wastes more memory than having the larger section with some memmap wasted so I'm not going to advocate it even though it'd "work". I haven't researched this so apologies if it turns out to be stupid but I think the bit SECTION_MAP_LAST_BIT is actually unused and should be safe to use. Has the option being considered of using this bit to mean "section has holes punched in it". If set, the architecture must provide an additional arch_holey_section_pfn_valid() that does additional checking based on information sparsemem doesn't have? This would avoid the worst of the performance issues of making pfn_valid() slower without increasing the size of mem_section. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 20:32 ` Mel Gorman 0 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-13 20:32 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: > > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: > > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: > > > > There is also an assumption that a section is fully populated or empty. > > > > > > That is absolutely absurd. > > > > Arguably, violating the memory model by punching unexpected holes in it > > is a also absurd. > > Well, it's something we've done since 1.2.xx, and actually it's a new > requirement that's evolved into the code that these page structs need > to be populated. It's not that new. It evolved over a long period of time probably starting from where the buddy bitmap was moved into the page->flags to save space. Over time, this assumptions became more strict. Anyway, this is neither here nor there so lets not fall down that hole. > So it's arguably a regression in the MM code which > hasn't been detected. > > > > So, I have a platform which has 256MB at > > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. > > > > I'm afraid I'm not quite getting your example. > > > > If the granularity of the banks is 64MB and the alignment is 256MB, I > > don't see what hole you'd be punching anyway. > > I didn't say the alignment is 256MB. I was referring to the *size* of > memory. > > > Not necessarily, just don't punch holes within section boundaries when using > > sparsemem. > > Err, you've just changed your story. > How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem at all. > Let's take an example setup which is perfectly valid. 512kB at 0x10000000. > 512kB at 0x14000000. 32MB at 0x1c000000. > Ok, that is a clear example, thanks. > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > Not wrong. It's fine as long as you're ok with some unnecessary memmap being allocated. There is wastage of memory but functionally it'll be fine with existing sparsemem and the assumptions it makes. Obviously you're not ok with this wastage or this discussion would not even be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > So, what do you suggest would be the correct sparsemem settings for this? > MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? > I haven't worked it out but I would think that wastes more memory than having the larger section with some memmap wasted so I'm not going to advocate it even though it'd "work". I haven't researched this so apologies if it turns out to be stupid but I think the bit SECTION_MAP_LAST_BIT is actually unused and should be safe to use. Has the option being considered of using this bit to mean "section has holes punched in it". If set, the architecture must provide an additional arch_holey_section_pfn_valid() that does additional checking based on information sparsemem doesn't have? This would avoid the worst of the performance issues of making pfn_valid() slower without increasing the size of mem_section. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 20:32 ` Mel Gorman @ 2010-07-13 23:59 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-13 23:59 UTC (permalink / raw) To: Mel Gorman Cc: Russell King - ARM Linux, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux Hi, Mel. On Wed, Jul 14, 2010 at 5:32 AM, Mel Gorman <mel@csn.ul.ie> wrote: > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: >> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: >> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: >> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: >> > > > There is also an assumption that a section is fully populated or empty. >> > > >> > > That is absolutely absurd. >> > >> > Arguably, violating the memory model by punching unexpected holes in it >> > is a also absurd. >> >> Well, it's something we've done since 1.2.xx, and actually it's a new >> requirement that's evolved into the code that these page structs need >> to be populated. > > It's not that new. It evolved over a long period of time probably starting > from where the buddy bitmap was moved into the page->flags to save space. > Over time, this assumptions became more strict. Anyway, this is neither here > nor there so lets not fall down that hole. > >> So it's arguably a regression in the MM code which >> hasn't been detected. >> >> > > So, I have a platform which has 256MB at >> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. >> > >> > I'm afraid I'm not quite getting your example. >> > >> > If the granularity of the banks is 64MB and the alignment is 256MB, I >> > don't see what hole you'd be punching anyway. >> >> I didn't say the alignment is 256MB. I was referring to the *size* of >> memory. >> >> > Not necessarily, just don't punch holes within section boundaries when using >> > sparsemem. >> >> Err, you've just changed your story. >> > > How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem > at all. > >> Let's take an example setup which is perfectly valid. 512kB at 0x10000000. >> 512kB at 0x14000000. 32MB at 0x1c000000. >> > > Ok, that is a clear example, thanks. > >> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. >> > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > being allocated. There is wastage of memory but functionally it'll be > fine with existing sparsemem and the assumptions it makes. > > Obviously you're not ok with this wastage or this discussion would not even > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > >> So, what do you suggest would be the correct sparsemem settings for this? >> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? >> > > I haven't worked it out but I would think that wastes more memory than having > the larger section with some memmap wasted so I'm not going to advocate it > even though it'd "work". > > I haven't researched this so apologies if it turns out to be stupid but > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > safe to use. Has the option being considered of using this bit to mean > "section has holes punched in it". If set, the architecture must provide > an additional arch_holey_section_pfn_valid() that does additional checking > based on information sparsemem doesn't have? This would avoid the worst of > the performance issues of making pfn_valid() slower without increasing the > size of mem_section. I think it's a very good idea. We can know if any section has the hole with memory_present so can set the bit in there. Then, if pfn_valid find the section which has hole, it can call arch_holey_section_pfn_valid. But I think Kukjin case's best solution is to make section size 16M, not 256M. Regardless of this, Your idea is the direction we should proceed, I think. Thanks, Mel. > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-13 23:59 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-13 23:59 UTC (permalink / raw) To: linux-arm-kernel Hi, Mel. On Wed, Jul 14, 2010 at 5:32 AM, Mel Gorman <mel@csn.ul.ie> wrote: > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: >> On Tue, Jul 13, 2010 at 10:50:35AM +0100, Mel Gorman wrote: >> > On Tue, Jul 13, 2010 at 10:38:15AM +0100, Russell King - ARM Linux wrote: >> > > On Tue, Jul 13, 2010 at 10:26:58AM +0100, Mel Gorman wrote: >> > > > There is also an assumption that a section is fully populated or empty. >> > > >> > > That is absolutely absurd. >> > >> > Arguably, violating the memory model by punching unexpected holes in it >> > is a also absurd. >> >> Well, it's something we've done since 1.2.xx, and actually it's a new >> requirement that's evolved into the code that these page structs need >> to be populated. > > It's not that new. It evolved over a long period of time probably starting > from where the buddy bitmap was moved into the page->flags to save space. > Over time, this assumptions became more strict. Anyway, this is neither here > nor there so lets not fall down that hole. > >> So it's arguably a regression in the MM code which >> hasn't been detected. >> >> > > So, I have a platform which has 256MB at >> > > 64MB intervals in 4 chunks. I can fit 512kB to any slot. >> > >> > I'm afraid I'm not quite getting your example. >> > >> > If the granularity of the banks is 64MB and the alignment is 256MB, I >> > don't see what hole you'd be punching anyway. >> >> I didn't say the alignment is 256MB. ?I was referring to the *size* of >> memory. >> >> > Not necessarily, just don't punch holes within section boundaries when using >> > sparsemem. >> >> Err, you've just changed your story. >> > > How so? I'd rather you didn't punch holes in the memmap allocated by sparsemem > at all. > >> Let's take an example setup which is perfectly valid. ?512kB at 0x10000000. >> 512kB at 0x14000000. 32MB at 0x1c000000. >> > > Ok, that is a clear example, thanks. > >> You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. >> > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > being allocated. There is wastage of memory but functionally it'll be > fine with existing sparsemem and the assumptions it makes. > > Obviously you're not ok with this wastage or this discussion would not even > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > >> So, what do you suggest would be the correct sparsemem settings for this? >> MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=19 maybe? >> > > I haven't worked it out but I would think that wastes more memory than having > the larger section with some memmap wasted so I'm not going to advocate it > even though it'd "work". > > I haven't researched this so apologies if it turns out to be stupid but > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > safe to use. Has the option being considered of using this bit to mean > "section has holes punched in it". If set, the architecture must provide > an additional arch_holey_section_pfn_valid() that does additional checking > based on information sparsemem doesn't have? ?This would avoid the worst of > the performance issues of making pfn_valid() slower without increasing the > size of mem_section. I think it's a very good idea. We can know if any section has the hole with memory_present so can set the bit in there. Then, if pfn_valid find the section which has hole, it can call arch_holey_section_pfn_valid. But I think Kukjin case's best solution is to make section size 16M, not 256M. Regardless of this, Your idea is the direction we should proceed, I think. Thanks, Mel. > > -- > Mel Gorman > Part-time Phd Student ? ? ? ? ? ? ? ? ? ? ? ? ?Linux Technology Center > University of Limerick ? ? ? ? ? ? ? ? ? ? ? ? IBM Dublin Software Lab > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 23:59 ` Minchan Kim @ 2010-07-14 8:49 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 8:49 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: > But I think Kukjin case's best solution is to make section size 16M, not 256M. > Regardless of this, Your idea is the direction we should proceed, I think. So what if someone decides to fit 8MB DRAMs to the board? ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-14 8:49 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 8:49 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: > But I think Kukjin case's best solution is to make section size 16M, not 256M. > Regardless of this, Your idea is the direction we should proceed, I think. So what if someone decides to fit 8MB DRAMs to the board? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-14 8:49 ` Russell King - ARM Linux @ 2010-07-14 11:04 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-14 11:04 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: >> But I think Kukjin case's best solution is to make section size 16M, not 256M. >> Regardless of this, Your idea is the direction we should proceed, I think. > > So what if someone decides to fit 8MB DRAMs to the board? > Hmm. It might be a problem. Should we consider all configuration? Then, in below cases ram size never can change? How does it acked and merged? I don't know below board can't configurable ram size. Just out of curiosity. # line filename / context / line 1 76 arch/arm/mach-clps711x/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 24 2 26 arch/arm/mach-lh7a40x/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 24 3 62 arch/arm/mach-realview/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 28 4 38 arch/arm/mach-rpc/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 26 5 21 arch/arm/mach-s5pv210/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 28 6 46 arch/arm/mach-sa1100/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 27 7 19 drivers/staging/msm/memory_ll.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 25 -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-14 11:04 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-14 11:04 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: >> But I think Kukjin case's best solution is to make section size 16M, not 256M. >> Regardless of this, Your idea is the direction we should proceed, I think. > > So what if someone decides to fit 8MB DRAMs to the board? > Hmm. It might be a problem. Should we consider all configuration? Then, in below cases ram size never can change? How does it acked and merged? I don't know below board can't configurable ram size. Just out of curiosity. # line filename / context / line 1 76 arch/arm/mach-clps711x/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 24 2 26 arch/arm/mach-lh7a40x/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 24 3 62 arch/arm/mach-realview/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 28 4 38 arch/arm/mach-rpc/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 26 5 21 arch/arm/mach-s5pv210/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 28 6 46 arch/arm/mach-sa1100/include/mach/memory.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 27 7 19 drivers/staging/msm/memory_ll.h <<SECTION_SIZE_BITS>> #define SECTION_SIZE_BITS 25 -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-14 11:04 ` Minchan Kim @ 2010-07-14 20:49 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 20:49 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote: > On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: > > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: > >> But I think Kukjin case's best solution is to make section size 16M, not 256M. > >> Regardless of this, Your idea is the direction we should proceed, I think. > > > > So what if someone decides to fit 8MB DRAMs to the board? > > > > Hmm. It might be a problem. > Should we consider all configuration? > > Then, in below cases ram size never can change? > How does it acked and merged? > I don't know below board can't configurable ram size. Just out of curiosity. All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not the minimum size of DRAM which can ever be fitted. Reducing SECTION_SIZE_BITS is really not an option for some of these platforms - just look at where memory starts in physical space: arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000) arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS 28 arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000) arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000) arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS 26 arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000) arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS 28 arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS 27 If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24, there's 256 sections already. Reducing SECTION_SIZE_BITS to the multiple of RAM size will increase the number of sections to many thousands. Clearly, sparsemem doesn't handle sparse memory very well, and is defficient. I don't know what the answer is to this; I'm totally fed up with being pushed into using things, and then later down the line being told that its all wrong and shouldn't be used like we are. Maybe we should just delete these sub-architectures which use sparsemem and say to people that the kernel just doesn't support their memory layouts. <depressed>. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-14 20:49 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 20:49 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote: > On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: > > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: > >> But I think Kukjin case's best solution is to make section size 16M, not 256M. > >> Regardless of this, Your idea is the direction we should proceed, I think. > > > > So what if someone decides to fit 8MB DRAMs to the board? > > > > Hmm. It might be a problem. > Should we consider all configuration? > > Then, in below cases ram size never can change? > How does it acked and merged? > I don't know below board can't configurable ram size. Just out of curiosity. All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not the minimum size of DRAM which can ever be fitted. Reducing SECTION_SIZE_BITS is really not an option for some of these platforms - just look at where memory starts in physical space: arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000) arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS 28 arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000) arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000) arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS 26 arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000) arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS 28 arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS 27 If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24, there's 256 sections already. Reducing SECTION_SIZE_BITS to the multiple of RAM size will increase the number of sections to many thousands. Clearly, sparsemem doesn't handle sparse memory very well, and is defficient. I don't know what the answer is to this; I'm totally fed up with being pushed into using things, and then later down the line being told that its all wrong and shouldn't be used like we are. Maybe we should just delete these sub-architectures which use sparsemem and say to people that the kernel just doesn't support their memory layouts. <depressed>. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-14 20:49 ` Russell King - ARM Linux @ 2010-07-16 0:07 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-16 0:07 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Mel Gorman, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Thu, Jul 15, 2010 at 5:49 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote: >> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux >> <linux@arm.linux.org.uk> wrote: >> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: >> >> But I think Kukjin case's best solution is to make section size 16M, not 256M. >> >> Regardless of this, Your idea is the direction we should proceed, I think. >> > >> > So what if someone decides to fit 8MB DRAMs to the board? >> > >> >> Hmm. It might be a problem. >> Should we consider all configuration? >> >> Then, in below cases ram size never can change? >> How does it acked and merged? >> I don't know below board can't configurable ram size. Just out of curiosity. > > All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not > the minimum size of DRAM which can ever be fitted. > > Reducing SECTION_SIZE_BITS is really not an option for some of these > platforms - just look at where memory starts in physical space: > > arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 > arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS 24 > arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000) > arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS 28 > arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000) > arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000) > arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS 26 > arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000) > arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS 28 > arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS 27 > > If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24, > there's 256 sections already. Reducing SECTION_SIZE_BITS to the multiple > of RAM size will increase the number of sections to many thousands. > Clearly, sparsemem doesn't handle sparse memory very well, and is > defficient. > > I don't know what the answer is to this; I'm totally fed up with being > pushed into using things, and then later down the line being told that > its all wrong and shouldn't be used like we are. Maybe we should just > delete these sub-architectures which use sparsemem and say to people > that the kernel just doesn't support their memory layouts. > > <depressed>. > Sorry for late response. Okay. I think Kame's recent solution is a good. http://www.spinics.net/lists/linux-mm/msg06594.html It doesn't need overriding pfn_valid, doesn't need more space. And we can optimize it with overriding SECTION_MAP_LAST_BIT with SECTION_MAP_HAS_HOLE to avoid non-hole section overhead. If we all agree, I will make the patch. All agree? -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-16 0:07 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-16 0:07 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 15, 2010 at 5:49 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Jul 14, 2010 at 08:04:07PM +0900, Minchan Kim wrote: >> On Wed, Jul 14, 2010 at 5:49 PM, Russell King - ARM Linux >> <linux@arm.linux.org.uk> wrote: >> > On Wed, Jul 14, 2010 at 08:59:06AM +0900, Minchan Kim wrote: >> >> But I think Kukjin case's best solution is to make section size 16M, not 256M. >> >> Regardless of this, Your idea is the direction we should proceed, I think. >> > >> > So what if someone decides to fit 8MB DRAMs to the board? >> > >> >> Hmm. It might be a problem. >> Should we consider all configuration? >> >> Then, in below cases ram size never can change? >> How does it acked and merged? >> I don't know below board can't configurable ram size. Just out of curiosity. > > All of these, SECTION_SIZE_BITS refers to the _spacing_ of memory, not > the minimum size of DRAM which can ever be fitted. > > Reducing SECTION_SIZE_BITS is really not an option for some of these > platforms - just look at where memory starts in physical space: > > arch/arm/mach-clps711x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-clps711x/include/mach/memory.h:#define SECTION_SIZE_BITS ?24 > arch/arm/mach-lh7a40x/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-lh7a40x/include/mach/memory.h:#define SECTION_SIZE_BITS ? 24 > arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x70000000) > arch/arm/mach-realview/include/mach/memory.h:#define SECTION_SIZE_BITS ?28 > arch/arm/mach-realview/include/mach/memory.h:#define PHYS_OFFSET UL(0x00000000) > arch/arm/mach-rpc/include/mach/memory.h:#define PHYS_OFFSET UL(0x10000000) > arch/arm/mach-rpc/include/mach/memory.h:#define SECTION_SIZE_BITS ? ? ? 26 > arch/arm/mach-s5pv210/include/mach/memory.h:#define PHYS_OFFSET UL(0x20000000) > arch/arm/mach-s5pv210/include/mach/memory.h:#define SECTION_SIZE_BITS ? 28 > arch/arm/mach-sa1100/include/mach/memory.h:#define PHYS_OFFSET UL(0xc0000000) > arch/arm/mach-sa1100/include/mach/memory.h:#define SECTION_SIZE_BITS ? ?27 > > If physical memory starts at 3GB, with SECTION_SIZE_BITS set at 24, > there's 256 sections already. ?Reducing SECTION_SIZE_BITS to the multiple > of RAM size will increase the number of sections to many thousands. > Clearly, sparsemem doesn't handle sparse memory very well, and is > defficient. > > I don't know what the answer is to this; I'm totally fed up with being > pushed into using things, and then later down the line being told that > its all wrong and shouldn't be used like we are. ?Maybe we should just > delete these sub-architectures which use sparsemem and say to people > that the kernel just doesn't support their memory layouts. > > <depressed>. > Sorry for late response. Okay. I think Kame's recent solution is a good. http://www.spinics.net/lists/linux-mm/msg06594.html It doesn't need overriding pfn_valid, doesn't need more space. And we can optimize it with overriding SECTION_MAP_LAST_BIT with SECTION_MAP_HAS_HOLE to avoid non-hole section overhead. If we all agree, I will make the patch. All agree? -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-13 20:32 ` Mel Gorman @ 2010-07-14 8:59 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 8:59 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote: > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > > > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > being allocated. There is wastage of memory but functionally it'll be > fine with existing sparsemem and the assumptions it makes. > > Obviously you're not ok with this wastage or this discussion would not even > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or 1.2MB - that's 4% of system memory wasted in stuff which isn't going to be used. Moreover, the mem_map array for each populated bank is 512K - which is the size of the RAM in the first two banks. At that point, there's absolutely no point in populating them. The overhead of populating those banks exactly equals the gain from populating them. Sparsemem is absolutely absurd in this requirement - it can't handle sparse memory efficiently without wasting lots of memory in the process. > I haven't researched this so apologies if it turns out to be stupid but > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > safe to use. Has the option being considered of using this bit to mean > "section has holes punched in it". If set, the architecture must provide > an additional arch_holey_section_pfn_valid() that does additional checking > based on information sparsemem doesn't have? This would avoid the worst of > the performance issues of making pfn_valid() slower without increasing the > size of mem_section. As I've already said, how about just allowing pfn_valid() to be overridden by architectures, even for the sparsemem case. We have a perfectly good pfn_valid() implementation that'll work across the board, and will fix this issue. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-14 8:59 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-14 8:59 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote: > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > > > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > being allocated. There is wastage of memory but functionally it'll be > fine with existing sparsemem and the assumptions it makes. > > Obviously you're not ok with this wastage or this discussion would not even > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or 1.2MB - that's 4% of system memory wasted in stuff which isn't going to be used. Moreover, the mem_map array for each populated bank is 512K - which is the size of the RAM in the first two banks. At that point, there's absolutely no point in populating them. The overhead of populating those banks exactly equals the gain from populating them. Sparsemem is absolutely absurd in this requirement - it can't handle sparse memory efficiently without wasting lots of memory in the process. > I haven't researched this so apologies if it turns out to be stupid but > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > safe to use. Has the option being considered of using this bit to mean > "section has holes punched in it". If set, the architecture must provide > an additional arch_holey_section_pfn_valid() that does additional checking > based on information sparsemem doesn't have? This would avoid the worst of > the performance issues of making pfn_valid() slower without increasing the > size of mem_section. As I've already said, how about just allowing pfn_valid() to be overridden by architectures, even for the sparsemem case. We have a perfectly good pfn_valid() implementation that'll work across the board, and will fix this issue. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-14 8:59 ` Russell King - ARM Linux @ 2010-07-14 13:14 ` Mel Gorman -1 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-14 13:14 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Minchan Kim, Kukjin Kim, KAMEZAWA Hiroyuki, linux-arm-kernel, linux-samsung-soc, ben-linux On Wed, Jul 14, 2010 at 09:59:18AM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote: > > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > > > > > > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > > being allocated. There is wastage of memory but functionally it'll be > > fine with existing sparsemem and the assumptions it makes. > > > > Obviously you're not ok with this wastage or this discussion would not even > > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > > Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or > 1.2MB - that's 4% of system memory wasted in stuff which isn't going to > be used. > > Moreover, the mem_map array for each populated bank is 512K - which is > the size of the RAM in the first two banks. At that point, there's > absolutely no point in populating them. The overhead of populating > those banks exactly equals the gain from populating them. > > Sparsemem is absolutely absurd in this requirement - it can't handle > sparse memory efficiently without wasting lots of memory in the > process. > I wasn't around when sparsemem was designed, but I strongly suspect it wasn't considered that a bank would only be 512K. The requirement of a section being fully populated to allow pfn_valid to be very cheap still seems very reasonable to me in the majority of cases. It's just no ok in ARMs case because of the size of banks. > > I haven't researched this so apologies if it turns out to be stupid but > > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > > safe to use. Has the option being considered of using this bit to mean > > "section has holes punched in it". If set, the architecture must provide > > an additional arch_holey_section_pfn_valid() that does additional checking > > based on information sparsemem doesn't have? This would avoid the worst of > > the performance issues of making pfn_valid() slower without increasing the > > size of mem_section. > > As I've already said, how about just allowing pfn_valid() to be overridden > by architectures, even for the sparsemem case. If nothing else pans out, I won't resist that approach. It's not my No.1 perference because it results in SparseMem-on-ARM behaving one way and SparseMem-on-everything-else behaving another with respect to pfn_valid(). I'd prefer that an architecture-specific pfn_valid() only be called for the sections that are known to have holes punched in them. > We have a perfectly good > pfn_valid() implementation that'll work across the board, and will fix > this issue. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-14 13:14 ` Mel Gorman 0 siblings, 0 replies; 60+ messages in thread From: Mel Gorman @ 2010-07-14 13:14 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 14, 2010 at 09:59:18AM +0100, Russell King - ARM Linux wrote: > On Tue, Jul 13, 2010 at 09:32:50PM +0100, Mel Gorman wrote: > > On Tue, Jul 13, 2010 at 06:37:44PM +0100, Russell King - ARM Linux wrote: > > > You're saying that MAX_PHYSMEM_BITS=29 SECTION_SIZE_BITS=26 is wrong. > > > > > > > Not wrong. It's fine as long as you're ok with some unnecessary memmap > > being allocated. There is wastage of memory but functionally it'll be > > fine with existing sparsemem and the assumptions it makes. > > > > Obviously you're not ok with this wastage or this discussion would not even > > be happening and with SECTION_SIZE_BITS=26, there is quite a bit of wastage. > > Absolutely I'm not ok with this wastage - it's 127 + 127 + 64 pages, or > 1.2MB - that's 4% of system memory wasted in stuff which isn't going to > be used. > > Moreover, the mem_map array for each populated bank is 512K - which is > the size of the RAM in the first two banks. At that point, there's > absolutely no point in populating them. The overhead of populating > those banks exactly equals the gain from populating them. > > Sparsemem is absolutely absurd in this requirement - it can't handle > sparse memory efficiently without wasting lots of memory in the > process. > I wasn't around when sparsemem was designed, but I strongly suspect it wasn't considered that a bank would only be 512K. The requirement of a section being fully populated to allow pfn_valid to be very cheap still seems very reasonable to me in the majority of cases. It's just no ok in ARMs case because of the size of banks. > > I haven't researched this so apologies if it turns out to be stupid but > > I think the bit SECTION_MAP_LAST_BIT is actually unused and should be > > safe to use. Has the option being considered of using this bit to mean > > "section has holes punched in it". If set, the architecture must provide > > an additional arch_holey_section_pfn_valid() that does additional checking > > based on information sparsemem doesn't have? This would avoid the worst of > > the performance issues of making pfn_valid() slower without increasing the > > size of mem_section. > > As I've already said, how about just allowing pfn_valid() to be overridden > by architectures, even for the sparsemem case. If nothing else pans out, I won't resist that approach. It's not my No.1 perference because it results in SparseMem-on-ARM behaving one way and SparseMem-on-everything-else behaving another with respect to pfn_valid(). I'd prefer that an architecture-specific pfn_valid() only be called for the sections that are known to have holes punched in them. > We have a perfectly good > pfn_valid() implementation that'll work across the board, and will fix > this issue. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 9:52 ` Minchan Kim @ 2010-07-12 10:45 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 10:45 UTC (permalink / raw) To: Minchan Kim; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote: > Hello. > > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Russell, > > > > Hi, > > > > Kukjin Kim wrote: > >> Russell wrote: > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > > fine. > >> > That doesn't mean the section size is 16MB. > >> > > >> > As I've already said, the section size has _nothing_ what so ever to do > >> > with the size of memory, or the granularity of the size of memory. By > >> > way of illustration, it is perfectly legal to have a section size of > >> > 256MB but only have 1MB in a section and this is perfectly legal. So > >> > sections do not have to be completely filled. > >> > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > a > >> section has a hole when initializing sparse memory. > >> > >> I identified that a section doesn't need to be a contiguous area of > > physical > >> memory when reading your comment with the fact that the mem_map of a > > section > >> can be smaller than the size of a section. > >> > >> I found, however, the kernel panics when modifying min_free_kbytes file in > >> the proc filesystem if a section has a hole. > >> > >> While processing the change of min_free_kbytes in the kernel, page > >> descriptors in a hole of an online section is accessed. > > > > As I said, following error happens. > > It would be helpful to me if any opinions or comments. > > > > Could you test below patch? > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. But hang on. Where are the hole(s)? The DRAM setup is: 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000 which with SECTION_SIZE_BITS set to 28 gives three sections of memory, and each sparsemem section does not have a hole. No zone should cross a sparsemem section boundary. Moreover, our pfn_valid() now returns false for any and all invalid PFNs. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 10:45 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 10:45 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote: > Hello. > > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > Russell, > > > > Hi, > > > > Kukjin Kim wrote: > >> Russell wrote: > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > > fine. > >> > That doesn't mean the section size is 16MB. > >> > > >> > As I've already said, the section size has _nothing_ what so ever to do > >> > with the size of memory, or the granularity of the size of memory. ?By > >> > way of illustration, it is perfectly legal to have a section size of > >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > >> > sections do not have to be completely filled. > >> > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > a > >> section has a hole when initializing sparse memory. > >> > >> I identified that a section doesn't need to be a contiguous area of > > physical > >> memory when reading your comment with the fact that the mem_map of a > > section > >> can be smaller than the size of a section. > >> > >> I found, however, the kernel panics when modifying min_free_kbytes file in > >> the proc filesystem if a section has a hole. > >> > >> While processing the change of min_free_kbytes in the kernel, page > >> descriptors in a hole of an online section is accessed. > > > > As I said, following error happens. > > It would be helpful to me if any opinions or comments. > > > > Could you test below patch? > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. But hang on. Where are the hole(s)? The DRAM setup is: 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000 which with SECTION_SIZE_BITS set to 28 gives three sections of memory, and each sparsemem section does not have a hole. No zone should cross a sparsemem section boundary. Moreover, our pfn_valid() now returns false for any and all invalid PFNs. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 10:45 ` Russell King - ARM Linux @ 2010-07-12 12:28 ` Minchan Kim -1 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 12:28 UTC (permalink / raw) To: Russell King - ARM Linux Cc: linux-samsung-soc, Kukjin Kim, ben-linux, linux-arm-kernel On Mon, Jul 12, 2010 at 11:45:41AM +0100, Russell King - ARM Linux wrote: > On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote: > > Hello. > > > > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > > Russell, > > > > > > Hi, > > > > > > Kukjin Kim wrote: > > >> Russell wrote: > > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. That's > > > fine. > > >> > That doesn't mean the section size is 16MB. > > >> > > > >> > As I've already said, the section size has _nothing_ what so ever to do > > >> > with the size of memory, or the granularity of the size of memory. By > > >> > way of illustration, it is perfectly legal to have a section size of > > >> > 256MB but only have 1MB in a section and this is perfectly legal. So > > >> > sections do not have to be completely filled. > > >> > > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > > a > > >> section has a hole when initializing sparse memory. > > >> > > >> I identified that a section doesn't need to be a contiguous area of > > > physical > > >> memory when reading your comment with the fact that the mem_map of a > > > section > > >> can be smaller than the size of a section. > > >> > > >> I found, however, the kernel panics when modifying min_free_kbytes file in > > >> the proc filesystem if a section has a hole. > > >> > > >> While processing the change of min_free_kbytes in the kernel, page > > >> descriptors in a hole of an online section is accessed. > > > > > > As I said, following error happens. > > > It would be helpful to me if any opinions or comments. > > > > > > > Could you test below patch? > > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > > But hang on. Where are the hole(s)? > > The DRAM setup is: > 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000 > > which with SECTION_SIZE_BITS set to 28 gives three sections of memory, > and each sparsemem section does not have a hole. > > No zone should cross a sparsemem section boundary. > > Moreover, our pfn_valid() now returns false for any and all invalid PFNs. True if it isn't sparsemem. But look at pfn_valid in sparsemem. It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP. The first section in above case has just 80M memory but section has 256M. So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000, let's see pfn_valid. 1. We pass pfn_to_section_nr check 2. Both __nr_to_section and valid_section is vaild. static inline int pfn_valid(unsigned long pfn) { if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); } What prevent above hole's case? I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM in FLATMEM. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 12:28 ` Minchan Kim 0 siblings, 0 replies; 60+ messages in thread From: Minchan Kim @ 2010-07-12 12:28 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 11:45:41AM +0100, Russell King - ARM Linux wrote: > On Mon, Jul 12, 2010 at 06:52:28PM +0900, Minchan Kim wrote: > > Hello. > > > > On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@samsung.com> wrote: > > > Russell, > > > > > > Hi, > > > > > > Kukjin Kim wrote: > > >> Russell wrote: > > >> > So, memory starts at 0x20000000 and finishes at 0x25000000. ?That's > > > fine. > > >> > That doesn't mean the section size is 16MB. > > >> > > > >> > As I've already said, the section size has _nothing_ what so ever to do > > >> > with the size of memory, or the granularity of the size of memory. ?By > > >> > way of illustration, it is perfectly legal to have a section size of > > >> > 256MB but only have 1MB in a section and this is perfectly legal. ?So > > >> > sections do not have to be completely filled. > > >> > > > >> Actually, as you know, the hole's area of mem_map is freed from bootmem if > > > a > > >> section has a hole when initializing sparse memory. > > >> > > >> I identified that a section doesn't need to be a contiguous area of > > > physical > > >> memory when reading your comment with the fact that the mem_map of a > > > section > > >> can be smaller than the size of a section. > > >> > > >> I found, however, the kernel panics when modifying min_free_kbytes file in > > >> the proc filesystem if a section has a hole. > > >> > > >> While processing the change of min_free_kbytes in the kernel, page > > >> descriptors in a hole of an online section is accessed. > > > > > > As I said, following error happens. > > > It would be helpful to me if any opinions or comments. > > > > > > > Could you test below patch? > > Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config. > > But hang on. Where are the hole(s)? > > The DRAM setup is: > 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000 > > which with SECTION_SIZE_BITS set to 28 gives three sections of memory, > and each sparsemem section does not have a hole. > > No zone should cross a sparsemem section boundary. > > Moreover, our pfn_valid() now returns false for any and all invalid PFNs. True if it isn't sparsemem. But look at pfn_valid in sparsemem. It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP. The first section in above case has just 80M memory but section has 256M. So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000, let's see pfn_valid. 1. We pass pfn_to_section_nr check 2. Both __nr_to_section and valid_section is vaild. static inline int pfn_valid(unsigned long pfn) { if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); } What prevent above hole's case? I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM in FLATMEM. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: About SECTION_SIZE_BITS for Sparsemem 2010-07-12 12:28 ` Minchan Kim @ 2010-07-12 12:42 ` Russell King - ARM Linux -1 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 12:42 UTC (permalink / raw) To: Minchan Kim; +Cc: Kukjin Kim, linux-arm-kernel, linux-samsung-soc, ben-linux On Mon, Jul 12, 2010 at 09:28:16PM +0900, Minchan Kim wrote: > It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP. > The first section in above case has just 80M memory but section has 256M. > So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000, > let's see pfn_valid. That's not a hole as such. It's an incompletely filled section, which is precisely what sparsemem is all about. > 1. We pass pfn_to_section_nr check > 2. Both __nr_to_section and valid_section is vaild. > > static inline int pfn_valid(unsigned long pfn) > { > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > return 0; > return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); > } > > > What prevent above hole's case? That looks too loose to be useful - that means invalid page table entries can pass the pfn_valid() test when they should actually fail, and I'd imagine there will be many more such problems like that. > I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM > in FLATMEM. I agree - pfn_valid() on sparsemem needs to be tightened. ^ permalink raw reply [flat|nested] 60+ messages in thread
* About SECTION_SIZE_BITS for Sparsemem @ 2010-07-12 12:42 ` Russell King - ARM Linux 0 siblings, 0 replies; 60+ messages in thread From: Russell King - ARM Linux @ 2010-07-12 12:42 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 12, 2010 at 09:28:16PM +0900, Minchan Kim wrote: > It just checks that there is a section and section_mem_map has SECTION_HAS_MEM_MAP. > The first section in above case has just 80M memory but section has 256M. > So, 0x25000000 - 28000000 is the hole. If you pass pfn whihc is 0x2500000, > let's see pfn_valid. That's not a hole as such. It's an incompletely filled section, which is precisely what sparsemem is all about. > 1. We pass pfn_to_section_nr check > 2. Both __nr_to_section and valid_section is vaild. > > static inline int pfn_valid(unsigned long pfn) > { > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > return 0; > return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); > } > > > What prevent above hole's case? That looks too loose to be useful - that means invalid page table entries can pass the pfn_valid() test when they should actually fail, and I'd imagine there will be many more such problems like that. > I think at least pfn_valid in sparsemem need bank range check like pfn_valid of ARM > in FLATMEM. I agree - pfn_valid() on sparsemem needs to be tightened. ^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2010-07-16 0:08 UTC | newest] Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-07-12 8:32 About SECTION_SIZE_BITS for Sparsemem Kukjin Kim 2010-07-12 8:32 ` Kukjin Kim 2010-07-12 9:35 ` Kyungmin Park 2010-07-12 9:35 ` Kyungmin Park 2010-07-12 9:58 ` Kukjin Kim 2010-07-12 9:58 ` Kukjin Kim 2010-07-12 10:08 ` Kyungmin Park 2010-07-12 10:08 ` Kyungmin Park 2010-07-12 10:13 ` Russell King - ARM Linux 2010-07-12 10:13 ` Russell King - ARM Linux 2010-07-12 9:52 ` Minchan Kim 2010-07-12 9:52 ` Minchan Kim 2010-07-12 10:13 ` Kukjin Kim 2010-07-12 10:13 ` Kukjin Kim 2010-07-12 10:35 ` Minchan Kim 2010-07-12 10:35 ` Minchan Kim 2010-07-13 0:25 ` KAMEZAWA Hiroyuki 2010-07-13 0:25 ` KAMEZAWA Hiroyuki 2010-07-13 1:53 ` KAMEZAWA Hiroyuki 2010-07-13 1:53 ` KAMEZAWA Hiroyuki 2010-07-13 18:31 ` Russell King - ARM Linux 2010-07-13 18:31 ` Russell King - ARM Linux 2010-07-13 2:05 ` Minchan Kim 2010-07-13 2:05 ` Minchan Kim 2010-07-13 3:03 ` KAMEZAWA Hiroyuki 2010-07-13 3:03 ` KAMEZAWA Hiroyuki 2010-07-13 9:28 ` Mel Gorman 2010-07-13 9:28 ` Mel Gorman 2010-07-13 9:38 ` Russell King - ARM Linux 2010-07-13 9:38 ` Russell King - ARM Linux 2010-07-13 9:26 ` Mel Gorman 2010-07-13 9:26 ` Mel Gorman 2010-07-13 9:38 ` Russell King - ARM Linux 2010-07-13 9:38 ` Russell King - ARM Linux 2010-07-13 9:50 ` Mel Gorman 2010-07-13 9:50 ` Mel Gorman 2010-07-13 17:37 ` Russell King - ARM Linux 2010-07-13 17:37 ` Russell King - ARM Linux 2010-07-13 20:32 ` Mel Gorman 2010-07-13 20:32 ` Mel Gorman 2010-07-13 23:59 ` Minchan Kim 2010-07-13 23:59 ` Minchan Kim 2010-07-14 8:49 ` Russell King - ARM Linux 2010-07-14 8:49 ` Russell King - ARM Linux 2010-07-14 11:04 ` Minchan Kim 2010-07-14 11:04 ` Minchan Kim 2010-07-14 20:49 ` Russell King - ARM Linux 2010-07-14 20:49 ` Russell King - ARM Linux 2010-07-16 0:07 ` Minchan Kim 2010-07-16 0:07 ` Minchan Kim 2010-07-14 8:59 ` Russell King - ARM Linux 2010-07-14 8:59 ` Russell King - ARM Linux 2010-07-14 13:14 ` Mel Gorman 2010-07-14 13:14 ` Mel Gorman 2010-07-12 10:45 ` Russell King - ARM Linux 2010-07-12 10:45 ` Russell King - ARM Linux 2010-07-12 12:28 ` Minchan Kim 2010-07-12 12:28 ` Minchan Kim 2010-07-12 12:42 ` Russell King - ARM Linux 2010-07-12 12:42 ` Russell King - ARM Linux
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.