* WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
From: Dave Jones @ 2012-05-30 16:33 UTC
  To: Linux Kernel; +Cc: linux-mm

Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc


------------[ cut here ]------------
WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
Hardware name:         
Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75
Call Trace:
 [<ffffffff8104897f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810489da>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
 [<ffffffff81193322>] migrate_page_copy+0x1e2/0x260
 [<ffffffff811933fb>] migrate_page+0x5b/0x70
 [<ffffffff811934b5>] move_to_new_page+0xa5/0x260
 [<ffffffff81193ca8>] migrate_pages+0x4c8/0x540
 [<ffffffff811610d0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 [<ffffffff81162056>] compact_zone+0x216/0x480
 [<ffffffff81321ad8>] ? debug_check_no_obj_freed+0x88/0x210
 [<ffffffff8116259d>] compact_zone_order+0x8d/0xd0
 [<ffffffff811626a9>] try_to_compact_pages+0xc9/0x140
 [<ffffffff81649f4e>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff8114562b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff81321bbc>] ? debug_check_no_obj_freed+0x16c/0x210
 [<ffffffff81185236>] alloc_pages_vma+0xb6/0x190
 [<ffffffff81195d8d>] khugepaged+0x95d/0x1570
 [<ffffffff81073350>] ? wake_up_bit+0x40/0x40
 [<ffffffff81195430>] ? collect_mm_slot+0xa0/0xa0
 [<ffffffff81072c37>] kthread+0xb7/0xc0
 [<ffffffff8165dc14>] kernel_thread_helper+0x4/0x10
 [<ffffffff8165511d>] ? retint_restore_args+0xe/0xe
 [<ffffffff81072b80>] ? flush_kthread_worker+0x190/0x190
 [<ffffffff8165dc10>] ? gs_change+0xb/0xb
---[ end trace 4324bd0bca27f6f0 ]---



* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
From: Dave Jones @ 2012-05-31  0:57 UTC
  To: Linux Kernel; +Cc: linux-mm, Andrew Morton

On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote:
 > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc
 > 
 > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
 > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75
 > Call Trace:
 >  [<ffffffff8104897f>] warn_slowpath_common+0x7f/0xc0
 >  [<ffffffff810489da>] warn_slowpath_null+0x1a/0x20
 >  [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
 >  [<ffffffff81193322>] migrate_page_copy+0x1e2/0x260
 >  [<ffffffff811933fb>] migrate_page+0x5b/0x70
 >  [<ffffffff811934b5>] move_to_new_page+0xa5/0x260
 >  [<ffffffff81193ca8>] migrate_pages+0x4c8/0x540
 >  [<ffffffff811610d0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 >  [<ffffffff81162056>] compact_zone+0x216/0x480
 >  [<ffffffff81321ad8>] ? debug_check_no_obj_freed+0x88/0x210
 >  [<ffffffff8116259d>] compact_zone_order+0x8d/0xd0
 >  [<ffffffff811626a9>] try_to_compact_pages+0xc9/0x140
 >  [<ffffffff81649f4e>] __alloc_pages_direct_compact+0xaa/0x1d0
 >  [<ffffffff8114562b>] __alloc_pages_nodemask+0x60b/0xab0
 >  [<ffffffff81321bbc>] ? debug_check_no_obj_freed+0x16c/0x210
 >  [<ffffffff81185236>] alloc_pages_vma+0xb6/0x190
 >  [<ffffffff81195d8d>] khugepaged+0x95d/0x1570
 >  [<ffffffff81073350>] ? wake_up_bit+0x40/0x40
 >  [<ffffffff81195430>] ? collect_mm_slot+0xa0/0xa0
 >  [<ffffffff81072c37>] kthread+0xb7/0xc0
 >  [<ffffffff8165dc14>] kernel_thread_helper+0x4/0x10
 >  [<ffffffff8165511d>] ? retint_restore_args+0xe/0xe
 >  [<ffffffff81072b80>] ? flush_kthread_worker+0x190/0x190
 >  [<ffffffff8165dc10>] ? gs_change+0xb/0xb

Seems this can be triggered from mmap, as well as from khugepaged..

WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38
Call Trace:
 [<ffffffff810490ef>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8104914a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8114b4ea>] __set_page_dirty_nobuffers+0x13a/0x170
 [<ffffffff81197db2>] migrate_page_copy+0x1e2/0x260
 [<ffffffff81197e8b>] migrate_page+0x5b/0x70
 [<ffffffff81197f45>] move_to_new_page+0xa5/0x260
 [<ffffffff81198738>] migrate_pages+0x4c8/0x540
 [<ffffffff811659e0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 [<ffffffff81166966>] compact_zone+0x216/0x480
 [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff81166ead>] compact_zone_order+0x8d/0xd0
 [<ffffffff81149525>] ? get_page_from_freelist+0x565/0x970
 [<ffffffff81166fb9>] try_to_compact_pages+0xc9/0x140
 [<ffffffff81652591>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff81149f3b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b4c00>] ? __lock_acquire+0x2b0/0x1aa0
 [<ffffffff81189cc6>] alloc_pages_vma+0xb6/0x190
 [<ffffffff8119cdb3>] do_huge_pmd_anonymous_page+0x133/0x310
 [<ffffffff8116c0c2>] handle_mm_fault+0x242/0x2e0
 [<ffffffff8116c372>] __get_user_pages+0x142/0x560
 [<ffffffff81171a38>] ? mmap_region+0x3f8/0x630
 [<ffffffff8116c842>] get_user_pages+0x52/0x60
 [<ffffffff8116d732>] make_pages_present+0x92/0xc0
 [<ffffffff811719e6>] mmap_region+0x3a6/0x630
 [<ffffffff81050b7c>] ? do_setitimer+0x1cc/0x310
 [<ffffffff81171fcd>] do_mmap_pgoff+0x35d/0x3b0
 [<ffffffff81172086>] ? sys_mmap_pgoff+0x66/0x240
 [<ffffffff811720a4>] sys_mmap_pgoff+0x84/0x240
 [<ffffffff813225be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81006ca2>] sys_mmap+0x22/0x30
 [<ffffffff81664ed2>] system_call_fastpath+0x16/0x1b
---[ end trace 336c91f371296e41 ]---
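
The observation that both the khugepaged and the mmap paths reach the same warning can be checked mechanically. The following is a minimal sketch, not a kernel tool: it parses the two traces above (copied up to the frame where they differ), drops the entries marked `?` (which the unwinder flags as unreliable), and reports how many innermost frames the two paths share. The names `FRAME_RE`, `reliable_frames`, and `shared_prefix` are hypothetical helpers introduced only for this illustration.

```python
import re

# Matches call-trace lines such as
#  [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
# An optional "? " before the symbol marks an unreliable entry.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

KHUGEPAGED_TRACE = """
 [<ffffffff8104897f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810489da>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
 [<ffffffff81193322>] migrate_page_copy+0x1e2/0x260
 [<ffffffff811933fb>] migrate_page+0x5b/0x70
 [<ffffffff811934b5>] move_to_new_page+0xa5/0x260
 [<ffffffff81193ca8>] migrate_pages+0x4c8/0x540
 [<ffffffff811610d0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 [<ffffffff81162056>] compact_zone+0x216/0x480
 [<ffffffff81321ad8>] ? debug_check_no_obj_freed+0x88/0x210
 [<ffffffff8116259d>] compact_zone_order+0x8d/0xd0
 [<ffffffff811626a9>] try_to_compact_pages+0xc9/0x140
 [<ffffffff81649f4e>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff8114562b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff81321bbc>] ? debug_check_no_obj_freed+0x16c/0x210
 [<ffffffff81185236>] alloc_pages_vma+0xb6/0x190
 [<ffffffff81195d8d>] khugepaged+0x95d/0x1570
"""

MMAP_TRACE = """
 [<ffffffff810490ef>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8104914a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8114b4ea>] __set_page_dirty_nobuffers+0x13a/0x170
 [<ffffffff81197db2>] migrate_page_copy+0x1e2/0x260
 [<ffffffff81197e8b>] migrate_page+0x5b/0x70
 [<ffffffff81197f45>] move_to_new_page+0xa5/0x260
 [<ffffffff81198738>] migrate_pages+0x4c8/0x540
 [<ffffffff811659e0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 [<ffffffff81166966>] compact_zone+0x216/0x480
 [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff81166ead>] compact_zone_order+0x8d/0xd0
 [<ffffffff81149525>] ? get_page_from_freelist+0x565/0x970
 [<ffffffff81166fb9>] try_to_compact_pages+0xc9/0x140
 [<ffffffff81652591>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff81149f3b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b4c00>] ? __lock_acquire+0x2b0/0x1aa0
 [<ffffffff81189cc6>] alloc_pages_vma+0xb6/0x190
 [<ffffffff8119cdb3>] do_huge_pmd_anonymous_page+0x133/0x310
"""


def reliable_frames(trace_text):
    """Return symbol names of the reliable frames, innermost call first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append(match.group("symbol"))
    return frames


def shared_prefix(frames_a, frames_b):
    """Return the leading (innermost) frames common to both traces."""
    shared = []
    for a, b in zip(frames_a, frames_b):
        if a != b:
            break
        shared.append(a)
    return shared


khugepaged_path = reliable_frames(KHUGEPAGED_TRACE)
mmap_path = reliable_frames(MMAP_TRACE)
common = shared_prefix(khugepaged_path, mmap_path)
print("shared frames:", len(common))
print("paths diverge at:", khugepaged_path[len(common)],
      "vs", mmap_path[len(common)])
```

On these inputs the two paths share every reliable frame from the warning up through alloc_pages_vma and differ only in the caller (khugepaged versus do_huge_pmd_anonymous_page). That is consistent with the warning being reached through direct compaction in both cases.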



I'd bisect this, but it takes a few hours to trigger, which makes it hard
to distinguish between 'good kernel' and 'hasn't triggered yet'.

	Dave



* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
From: Dave Jones @ 2012-06-01  2:31 UTC
  To: Linux Kernel, linux-mm, Andrew Morton, Linus Torvalds,
	Hugh Dickins, Cong Wang

On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote:
 > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote:
 >  > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc
 >  > 
 >  > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
 >  > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 >  > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75
 >  > Call Trace:
 >  >  [<ffffffff8104897f>] warn_slowpath_common+0x7f/0xc0
 >  >  [<ffffffff810489da>] warn_slowpath_null+0x1a/0x20
 >  >  [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
 >  >  [<ffffffff81193322>] migrate_page_copy+0x1e2/0x260
 >  >  [<ffffffff811933fb>] migrate_page+0x5b/0x70
 >  >  [<ffffffff811934b5>] move_to_new_page+0xa5/0x260
 >  >  [<ffffffff81193ca8>] migrate_pages+0x4c8/0x540
 >  >  [<ffffffff811610d0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 >  >  [<ffffffff81162056>] compact_zone+0x216/0x480
 >  >  [<ffffffff81321ad8>] ? debug_check_no_obj_freed+0x88/0x210
 >  >  [<ffffffff8116259d>] compact_zone_order+0x8d/0xd0
 >  >  [<ffffffff811626a9>] try_to_compact_pages+0xc9/0x140
 >  >  [<ffffffff81649f4e>] __alloc_pages_direct_compact+0xaa/0x1d0
 >  >  [<ffffffff8114562b>] __alloc_pages_nodemask+0x60b/0xab0
 >  >  [<ffffffff81321bbc>] ? debug_check_no_obj_freed+0x16c/0x210
 >  >  [<ffffffff81185236>] alloc_pages_vma+0xb6/0x190
 >  >  [<ffffffff81195d8d>] khugepaged+0x95d/0x1570
 >  >  [<ffffffff81073350>] ? wake_up_bit+0x40/0x40
 >  >  [<ffffffff81195430>] ? collect_mm_slot+0xa0/0xa0
 >  >  [<ffffffff81072c37>] kthread+0xb7/0xc0
 >  >  [<ffffffff8165dc14>] kernel_thread_helper+0x4/0x10
 >  >  [<ffffffff8165511d>] ? retint_restore_args+0xe/0xe
 >  >  [<ffffffff81072b80>] ? flush_kthread_worker+0x190/0x190
 >  >  [<ffffffff8165dc10>] ? gs_change+0xb/0xb
 > 
 > Seems this can be triggered from mmap, as well as from khugepaged..
 > 
 > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
 > Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
 > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38
 > Call Trace:
 >  [<ffffffff810490ef>] warn_slowpath_common+0x7f/0xc0
 >  [<ffffffff8104914a>] warn_slowpath_null+0x1a/0x20
 >  [<ffffffff8114b4ea>] __set_page_dirty_nobuffers+0x13a/0x170
 >  [<ffffffff81197db2>] migrate_page_copy+0x1e2/0x260
 >  [<ffffffff81197e8b>] migrate_page+0x5b/0x70
 >  [<ffffffff81197f45>] move_to_new_page+0xa5/0x260
 >  [<ffffffff81198738>] migrate_pages+0x4c8/0x540
 >  [<ffffffff811659e0>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
 >  [<ffffffff81166966>] compact_zone+0x216/0x480
 >  [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 >  [<ffffffff81166ead>] compact_zone_order+0x8d/0xd0
 >  [<ffffffff81149525>] ? get_page_from_freelist+0x565/0x970
 >  [<ffffffff81166fb9>] try_to_compact_pages+0xc9/0x140
 >  [<ffffffff81652591>] __alloc_pages_direct_compact+0xaa/0x1d0
 >  [<ffffffff81149f3b>] __alloc_pages_nodemask+0x60b/0xab0
 >  [<ffffffff810b1318>] ? trace_hardirqs_off_caller+0x28/0xc0
 >  [<ffffffff810b4c00>] ? __lock_acquire+0x2b0/0x1aa0
 >  [<ffffffff81189cc6>] alloc_pages_vma+0xb6/0x190
 >  [<ffffffff8119cdb3>] do_huge_pmd_anonymous_page+0x133/0x310
 >  [<ffffffff8116c0c2>] handle_mm_fault+0x242/0x2e0
 >  [<ffffffff8116c372>] __get_user_pages+0x142/0x560
 >  [<ffffffff81171a38>] ? mmap_region+0x3f8/0x630
 >  [<ffffffff8116c842>] get_user_pages+0x52/0x60
 >  [<ffffffff8116d732>] make_pages_present+0x92/0xc0
 >  [<ffffffff811719e6>] mmap_region+0x3a6/0x630
 >  [<ffffffff81050b7c>] ? do_setitimer+0x1cc/0x310
 >  [<ffffffff81171fcd>] do_mmap_pgoff+0x35d/0x3b0
 >  [<ffffffff81172086>] ? sys_mmap_pgoff+0x66/0x240
 >  [<ffffffff811720a4>] sys_mmap_pgoff+0x84/0x240
 >  [<ffffffff813225be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 >  [<ffffffff81006ca2>] sys_mmap+0x22/0x30
 >  [<ffffffff81664ed2>] system_call_fastpath+0x16/0x1b
 > ---[ end trace 336c91f371296e41 ]---
 > 
 > 
 > 
 > I'd bisect this, but it takes a few hours to trigger, which makes it hard
 > to distinguish between 'good kernel' and 'hasn't triggered yet'.

So I bisected it anyway, and it led to ...


3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit
commit 3f31d07571eeea18a7d34db9af21d2285b807a17
Author: Hugh Dickins <hughd@google.com>
Date:   Tue May 29 15:06:40 2012 -0700

    mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
    
    Now tmpfs supports hole-punching via fallocate(), switch madvise_remove()
    to use do_fallocate() instead of vmtruncate_range(): which extends
    madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs.
    
    There is one more user of vmtruncate_range() in our tree,
    staging/android's ashmem_shrink(): convert it to use do_fallocate() too
    (but if its unpinned areas are already unmapped - I don't know - then it
    would do better to use shmem_truncate_range() directly).
    
    Based-on-patch-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Colin Cross <ccross@android.com>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Andreas Dilger <adilger@dilger.ca>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Ben Myers <bpm@sgi.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Hugh ?

I'll repeat the bisect tomorrow just to be sure. (It took all day, even though
there were only a half dozen bisect points, as I ran the test for an hour on
each build to see what fell out).

Here's what I found..

git bisect start 'mm/'
# bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
# bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
# bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
# good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
# bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
# good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
# good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe
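
The log above can be summarised with a short script. This is a minimal sketch, assuming the `# good:` / `# bad:` comment lines carry the verdicts exactly as shown. `MARK_RE` and `parse_marks` are hypothetical names introduced for this illustration. When a bisect has run to completion, the last commit marked bad is the one reported as the first bad commit.

```python
import re

# The bisect log from the message above, copied verbatim.
BISECT_LOG = """\
git bisect start 'mm/'
# bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
# bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
# bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
# good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
# bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
# good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
# good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe
"""

# Verdict comment lines look like: "# bad: [<sha>] <subject>"
MARK_RE = re.compile(r"^# (good|bad): \[([0-9a-f]+)\] (.*)$")


def parse_marks(log_text):
    """Return (verdict, sha, subject) tuples in the order they were recorded."""
    marks = []
    for line in log_text.splitlines():
        match = MARK_RE.match(line)
        if match:
            marks.append((match.group(1), match.group(2), match.group(3)))
    return marks


marks = parse_marks(BISECT_LOG)
good = [m for m in marks if m[0] == "good"]
bad = [m for m in marks if m[0] == "bad"]
print(f"{len(good)} marked good, {len(bad)} marked bad")
print("last commit marked bad:", bad[-1][1][:12], bad[-1][2])
```

A saved log in this format can also be fed back to `git bisect replay` to reproduce the same sequence of verdicts in another checkout.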


This has been a challenge to bisect additionally because I'm not sure if the other mm
bug I reported in the last few days (the list_debug/list_add corruption warnings in the
compaction code) is related or not. Sometimes during the bisect these errors happened
in pairs, sometimes only together.  The 'good' builds showed no errors at all.

As a reminder, the list_add corruption looks like this...

WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90()
list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920).
Modules linked in: ipt_ULOG fuse tun nfnetlink binfmt_misc sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw i2c_i801 pcspkr e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42
Call Trace:
 [<ffffffff81048fdf>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810490d6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff810b767d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff813259dc>] __list_add+0x6c/0x90
 [<ffffffff8114591d>] move_freepages_block+0x16d/0x190
 [<ffffffff81165773>] suitable_migration_target.isra.14+0x1b3/0x1d0
 [<ffffffff81165cab>] compaction_alloc+0x1db/0x2f0
 [<ffffffff81198357>] migrate_pages+0xc7/0x540
 [<ffffffff81165ad0>] ? isolate_freepages_block+0x260/0x260
 [<ffffffff81166946>] compact_zone+0x216/0x480
 [<ffffffff81166e8d>] compact_zone_order+0x8d/0xd0
 [<ffffffff81149565>] ? get_page_from_freelist+0x565/0x970
 [<ffffffff81166f99>] try_to_compact_pages+0xc9/0x140
 [<ffffffff8163b7f2>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff81149f7b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff810b12d8>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b4c00>] ? __lock_acquire+0x2f0/0x1aa0
 [<ffffffff81189ce6>] alloc_pages_vma+0xb6/0x190
 [<ffffffff8119cd83>] do_huge_pmd_anonymous_page+0x133/0x310
 [<ffffffff8116c0a2>] handle_mm_fault+0x242/0x2e0
 [<ffffffff8116c352>] __get_user_pages+0x142/0x560
 [<ffffffff81171a18>] ? mmap_region+0x3f8/0x630
 [<ffffffff8116c822>] get_user_pages+0x52/0x60
 [<ffffffff8116d712>] make_pages_present+0x92/0xc0
 [<ffffffff811719c6>] mmap_region+0x3a6/0x630
 [<ffffffff81050a3c>] ? do_setitimer+0x1cc/0x310
 [<ffffffff81171fad>] do_mmap_pgoff+0x35d/0x3b0
 [<ffffffff81172066>] ? sys_mmap_pgoff+0x66/0x240
 [<ffffffff81172084>] sys_mmap_pgoff+0x84/0x240
 [<ffffffff8131f31e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81006ca2>] sys_mmap+0x22/0x30
 [<ffffffff8164e012>] system_call_fastpath+0x16/0x1b
---[ end trace b606ea2a53bf1425 ]---
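
For readers unfamiliar with the list_add message, it reports a broken doubly linked list: before inserting between two neighbours, the debug check verifies that they still point at each other. The sketch below is a simplified Python model of that invariant, not the kernel's code. `Node`, `make_pair`, and `checked_list_add` are hypothetical names, the second check is modeled by symmetry with the first, and refusing the insert on failure is a choice made for this model only. The addresses from the warning above are reused as node names so the output mirrors the logged text.

```python
class Node:
    """A node in a circular doubly linked list, identified by a name."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def make_pair(prev_name, next_name):
    """Build a healthy two-node circular list and return (prev, next)."""
    prev, nxt = Node(prev_name), Node(next_name)
    prev.next = prev.prev = nxt
    nxt.next = nxt.prev = prev
    return prev, nxt


def checked_list_add(new, prev, nxt):
    """Insert new between prev and nxt; return an error string if they
    are not actually neighbours (model choice: do not insert in that case)."""
    if nxt.prev is not prev:
        return (f"list_add corruption. next->prev should be prev "
                f"({prev.name}), but was {nxt.prev.name}. (next={nxt.name}).")
    if prev.next is not nxt:  # symmetric check, modeled by analogy
        return (f"list_add corruption. prev->next should be next "
                f"({nxt.name}), but was {prev.next.name}. (prev={prev.name}).")
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
    return None


# Healthy list: the insert succeeds and the new node sits after prev.
a, b = make_pair("A", "B")
healthy_result = checked_list_add(Node("N"), a, b)

# Corrupted list: something else has overwritten next->prev.
prev, nxt = make_pair("ffff88014e5d9ed8", "ffffea0004b23920")
nxt.prev = Node("ffffea0004f48360")
corrupt_result = checked_list_add(Node("new"), prev, nxt)
print(corrupt_result)
```

In the model, the corrupted case reproduces the wording of the warning: the node that was expected to be the predecessor of next is no longer the one next points back to, which means some other code path modified the list concurrently or after the page had been moved elsewhere.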

On an affected kernel, it'll show up within an hour of fuzzing on a fast machine.

	Dave


Date:   Tue May 29 15:06:40 2012 -0700

    mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
    
    Now tmpfs supports hole-punching via fallocate(), switch madvise_remove()
    to use do_fallocate() instead of vmtruncate_range(): which extends
    madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs.
    
    There is one more user of vmtruncate_range() in our tree,
    staging/android's ashmem_shrink(): convert it to use do_fallocate() too
    (but if its unpinned areas are already unmapped - I don't know - then it
    would do better to use shmem_truncate_range() directly).
    
    Based-on-patch-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Colin Cross <ccross@android.com>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Andreas Dilger <adilger@dilger.ca>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Ben Myers <bpm@sgi.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Hugh ?

I'll repeat the bisect tomorrow just to be sure. (It took all day, even though
there were only a half dozen bisect points, as I ran the test for an hour on
each build to see what fell out).

Here's what I found..

git bisect start 'mm/'
# bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
# bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
# bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
# good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
# bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
# good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
# good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe


This has been a challenge to bisect additionally because I'm not sure if the other mm
bug I reported in the last few days (the list_debug/list_add corruption warnings in the
compaction code) are related or not. Sometimes during the bisect these errors happened
in pairs, sometimes only together.  The 'good' builds showed no errors at all.

As a reminder, the list_add corruption looks like this...

WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90()
list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920).
Modules linked in: ipt_ULOG fuse tun nfnetlink binfmt_misc sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw i2c_i801 pcspkr e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42
Call Trace:
 [<ffffffff81048fdf>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810490d6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff810b767d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff813259dc>] __list_add+0x6c/0x90
 [<ffffffff8114591d>] move_freepages_block+0x16d/0x190
 [<ffffffff81165773>] suitable_migration_target.isra.14+0x1b3/0x1d0
 [<ffffffff81165cab>] compaction_alloc+0x1db/0x2f0
 [<ffffffff81198357>] migrate_pages+0xc7/0x540
 [<ffffffff81165ad0>] ? isolate_freepages_block+0x260/0x260
 [<ffffffff81166946>] compact_zone+0x216/0x480
 [<ffffffff81166e8d>] compact_zone_order+0x8d/0xd0
 [<ffffffff81149565>] ? get_page_from_freelist+0x565/0x970
 [<ffffffff81166f99>] try_to_compact_pages+0xc9/0x140
 [<ffffffff8163b7f2>] __alloc_pages_direct_compact+0xaa/0x1d0
 [<ffffffff81149f7b>] __alloc_pages_nodemask+0x60b/0xab0
 [<ffffffff810b12d8>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b4c00>] ? __lock_acquire+0x2f0/0x1aa0
 [<ffffffff81189ce6>] alloc_pages_vma+0xb6/0x190
 [<ffffffff8119cd83>] do_huge_pmd_anonymous_page+0x133/0x310
 [<ffffffff8116c0a2>] handle_mm_fault+0x242/0x2e0
 [<ffffffff8116c352>] __get_user_pages+0x142/0x560
 [<ffffffff81171a18>] ? mmap_region+0x3f8/0x630
 [<ffffffff8116c822>] get_user_pages+0x52/0x60
 [<ffffffff8116d712>] make_pages_present+0x92/0xc0
 [<ffffffff811719c6>] mmap_region+0x3a6/0x630
 [<ffffffff81050a3c>] ? do_setitimer+0x1cc/0x310
 [<ffffffff81171fad>] do_mmap_pgoff+0x35d/0x3b0
 [<ffffffff81172066>] ? sys_mmap_pgoff+0x66/0x240
 [<ffffffff81172084>] sys_mmap_pgoff+0x84/0x240
 [<ffffffff8131f31e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81006ca2>] sys_mmap+0x22/0x30
 [<ffffffff8164e012>] system_call_fastpath+0x16/0x1b
---[ end trace b606ea2a53bf1425 ]---

On an affected kernel, it'll show up within an hour of fuzzing on a fast machine.

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  2:31     ` Dave Jones
@ 2012-06-01  2:43       ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-01  2:43 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, linux-mm, Andrew Morton,
	Linus Torvalds, Hugh Dickins, Cong Wang

On Thu, May 31, 2012 at 7:31 PM, Dave Jones <davej@redhat.com> wrote:
>
> So I bisected it anyway, and it led to ...

Ok, that doesn't sound entirely unlikely, but considering that you're
nervous about the bisection, please just try to revert it and see if
that fixes your testcase.

You'll obviously need to revert the commit that removes
vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce
the use of it (it's the next one:
17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two
commits revert cleanly and the end result seems to compile ok.

               Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  2:31     ` Dave Jones
@ 2012-06-01  8:44       ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-01  8:44 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Thu, 31 May 2012, Dave Jones wrote:
> On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote:
>  > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote:
>  >  > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc
>  >  > 
>  >  > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()

I did see your reports, and noted to come back to them, but sad to say I
hadn't even made time to check out line 1990 of mm/page-writeback.c: ah,
that WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));

>  >  > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75
>  >  > Call Trace:
>  >  >  [<ffffffff81146bda>] __set_page_dirty_nobuffers+0x13a/0x170
>  >  >  [<ffffffff81193322>] migrate_page_copy+0x1e2/0x260
>  > 
>  > Seems this can be triggered from mmap, as well as from khugepaged..
>  > 
>  > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
>  > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38
>  > Call Trace:
>  >  [<ffffffff8114b4ea>] __set_page_dirty_nobuffers+0x13a/0x170
>  >  [<ffffffff81197db2>] migrate_page_copy+0x1e2/0x260
>  > 
>  > I'd bisect this, but it takes a few hours to trigger, which makes it hard
>  > to distinguish between 'good kernel' and 'hasn't triggered yet'.
> 
> So I bisected it anyway, and it led to ...

Thanks so much for taking the trouble.

> 
> 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit
> commit 3f31d07571eeea18a7d34db9af21d2285b807a17
> Author: Hugh Dickins <hughd@google.com>
> Date:   Tue May 29 15:06:40 2012 -0700
> 
>     mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
>     
>     Now tmpfs supports hole-punching via fallocate(), switch madvise_remove()
>     to use do_fallocate() instead of vmtruncate_range(): which extends
>     madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs.
> 
> Hugh ?

Ow, you've caught me.

> 
> I'll repeat the bisect tomorrow just to be sure. (It took all day, even though
> there were only a half dozen bisect points, as I ran the test for an hour on
> each build to see what fell out).
> 
> Here's what I found..
> 
> git bisect start 'mm/'
> # bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
> git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
> # good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
> git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
> # good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
> git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
> # bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
> git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
> # bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
> git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
> # good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
> git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
> # bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
> git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
> # good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
> git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
> # good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
> git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe

That puzzled me for quite a while: it seemed so much more likely that
your bisection would converge on the commit which comes a few commits
later, 1635f6a74152 "tmpfs: undo fallocation on failure", where indeed I
do start to play around with tmpfs pages unlocked while !PageUptodate.

And yes, they're PageDirty !PagePrivate, so migration could very well
end up trying to migrate one and hitting line 1990.  It's an aberration
of migrate_page_copy(), that it uses __set_page_dirty_nobuffers() on
mappings which would never normally go that way at all (I discovered
this last year, when I experimented with radix_tree tags for swap in
tmpfs, and hit upon this rare case where page migration sets a dirty
tag for a tmpfs page, despite tmpfs never using tags).

One half of the patch at the bottom should fix that: I'm not sure that
it's the fix we actually want (a mapping_cap_account_dirty test might
be more appropriate, but it's easier just to test a page flag here);
but it should be good to shed more light on the problem.
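
To make the two code paths concrete, here is a minimal userspace sketch
of the flag logic described above. It is not kernel code: struct
model_page, model_warnings and both model_* functions are hypothetical
stand-ins invented for illustration, and the sketch assumes the page
still has a mapping, so only the dirty flag, the warning condition and
the dirty tag are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the few page flags discussed in
 * this thread; this is not the kernel's struct page. */
struct model_page {
	bool dirty;        /* PageDirty */
	bool has_private;  /* PagePrivate */
	bool uptodate;     /* PageUptodate */
	bool swapbacked;   /* PageSwapBacked, set on tmpfs/shmem pages */
	bool dirty_tag;    /* stands in for PAGECACHE_TAG_DIRTY */
};

/* Counts how often the modelled line-1990 condition fires. */
int model_warnings;

/* Models only the parts of __set_page_dirty_nobuffers() at issue here:
 * the test-and-set of the dirty flag, the warning condition and the
 * radix-tree dirty tag. Assumes the page has a mapping. */
void model_set_page_dirty_nobuffers(struct model_page *page)
{
	assert(page != NULL);
	if (page->dirty)
		return;                 /* already dirty: nothing to do */
	page->dirty = true;
	if (!page->has_private && !page->uptodate)
		model_warnings++;       /* the WARN_ON condition */
	page->dirty_tag = true;
}

/* Models the dirty hand-over in migrate_page_copy(), with and without
 * the PageSwapBacked test proposed in the patch. */
void model_migrate_dirty(struct model_page *newpage,
			 const struct model_page *page, bool patched)
{
	assert(newpage != NULL && page != NULL);
	newpage->uptodate = page->uptodate;  /* uptodate state carries over */
	if (!page->dirty)
		return;
	if (patched && page->swapbacked)
		newpage->dirty = true;       /* plain SetPageDirty: no tag, no check */
	else
		model_set_page_dirty_nobuffers(newpage);
}
```

In this model, migrating a dirty, swap-backed page that is not uptodate
bumps model_warnings and sets the dirty tag when patched is false; with
patched true the new page is simply marked dirty, and neither the
warning nor the tag is touched.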

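I say that because your bisection converged on a commit a few before the
one where I introduced that bug; and although it was a difficult
bisection, you would be very unlikely to mistake a good kernel for a bad
one (a good kernel never warns, whereas a bad one may simply not have
triggered yet): the danger was the other way around.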

So I'm wondering if your trinity fuzzer happens to succeed a lot more
often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and
the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?),
which began to support MADV_REMOVE with that commit.

So the second half of the patch should show which filesystem's page is
involved when you hit the WARN_ON - unless the first half of the patch
turns out to stop the warnings completely, in which case I need to think
harder about what was going on in tmpfs, and whether it matters.

Or another possibility is that the bad commit doesn't actually touch mm
at all: you were doing a bisection just on mm/ changes, weren't you?

> 
> This has been a challenge to bisect additionally because I'm not sure if the other mm
> bug I reported in the last few days (the list_debug/list_add corruption warnings in the
> compaction code) are related or not.

At present I suspect they're not related; but may change my mind.

> Sometimes during the bisect these errors happened
> in pairs, sometimes only together.

Sometimes in pairs, sometimes together?  I don't understand.

And are "these errors" the list debug warnings,
or list debug warnings and Line 1990 warnings?

> The 'good' builds showed no errors at all.
> 
> As a reminder, the list_add corruption looks like this...
> 
> WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90()
> list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920).
> Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42
> Call Trace:
>  [<ffffffff81048fdf>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff810490d6>] warn_slowpath_fmt+0x46/0x50
>  [<ffffffff810b767d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff813259dc>] __list_add+0x6c/0x90
>  [<ffffffff8114591d>] move_freepages_block+0x16d/0x190
>  [<ffffffff81165773>] suitable_migration_target.isra.14+0x1b3/0x1d0
>  [<ffffffff81165cab>] compaction_alloc+0x1db/0x2f0
>  [<ffffffff81198357>] migrate_pages+0xc7/0x540
>  [<ffffffff81165ad0>] ? isolate_freepages_block+0x260/0x260
>  [<ffffffff81166946>] compact_zone+0x216/0x480
>  [<ffffffff81166e8d>] compact_zone_order+0x8d/0xd0
>  [<ffffffff81149565>] ? get_page_from_freelist+0x565/0x970
>  [<ffffffff81166f99>] try_to_compact_pages+0xc9/0x140
>  [<ffffffff8163b7f2>] __alloc_pages_direct_compact+0xaa/0x1d0
>  [<ffffffff81149f7b>] __alloc_pages_nodemask+0x60b/0xab0
>  [<ffffffff810b12d8>] ? trace_hardirqs_off_caller+0x28/0xc0
>  [<ffffffff810b4c00>] ? __lock_acquire+0x2f0/0x1aa0
>  [<ffffffff81189ce6>] alloc_pages_vma+0xb6/0x190
>  [<ffffffff8119cd83>] do_huge_pmd_anonymous_page+0x133/0x310
>  [<ffffffff8116c0a2>] handle_mm_fault+0x242/0x2e0
>  [<ffffffff8116c352>] __get_user_pages+0x142/0x560
>  [<ffffffff81171a18>] ? mmap_region+0x3f8/0x630
>  [<ffffffff8116c822>] get_user_pages+0x52/0x60
>  [<ffffffff8116d712>] make_pages_present+0x92/0xc0
>  [<ffffffff811719c6>] mmap_region+0x3a6/0x630
>  [<ffffffff81050a3c>] ? do_setitimer+0x1cc/0x310
>  [<ffffffff81171fad>] do_mmap_pgoff+0x35d/0x3b0
>  [<ffffffff81172066>] ? sys_mmap_pgoff+0x66/0x240
>  [<ffffffff81172084>] sys_mmap_pgoff+0x84/0x240
>  [<ffffffff8131f31e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81006ca2>] sys_mmap+0x22/0x30
>  [<ffffffff8164e012>] system_call_fastpath+0x16/0x1b
> ---[ end trace b606ea2a53bf1425 ]---
> 
> On an affected kernel, it'll show up within an hour of fuzzing on a fast machine.

Please give this patch a try (preferably on current git), and let us know.

Thanks,
Hugh

--- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
+++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
@@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 		 * is actually a signal that all of the page has become dirty.
 		 * Whereas only part of our page may be dirty.
 		 */
-		__set_page_dirty_nobuffers(newpage);
+		if (PageSwapBacked(page))
+			SetPageDirty(newpage);
+		else
+			__set_page_dirty_nobuffers(newpage);
  	}
 
 	mlock_migrate_page(newpage, page);
--- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
+++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
@@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
 		mapping2 = page_mapping(page);
 		if (mapping2) { /* Race with truncate? */
 			BUG_ON(mapping2 != mapping);
-			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
+				print_symbol(KERN_WARNING
+				    "mapping->a_ops->writepage: %s\n",
+				    (unsigned long)mapping->a_ops->writepage);
 			account_page_dirtied(page, mapping);
 			radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:44       ` Hugh Dickins
@ 2012-06-01  8:51         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-01  8:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang,
	linux-kernel, linux-mm, kosaki.motohiro

>   	mlock_migrate_page(newpage, page);
> --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
> +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
> @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
>   		mapping2 = page_mapping(page);
>   		if (mapping2) { /* Race with truncate? */
>   			BUG_ON(mapping2 != mapping);
> -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
> +				print_symbol(KERN_WARNING
> +				    "mapping->a_ops->writepage: %s\n",
> +				    (unsigned long)mapping->a_ops->writepage);

type mismatch? I guess you want %pf or %pF.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:51         ` KOSAKI Motohiro
@ 2012-06-01  9:08           ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-01  9:08 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang,
	linux-kernel, linux-mm

On Fri, 1 Jun 2012, KOSAKI Motohiro wrote:
> >   	mlock_migrate_page(newpage, page);
> > --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
> > +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
> > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
> >   		mapping2 = page_mapping(page);
> >   		if (mapping2) { /* Race with truncate? */
> >   			BUG_ON(mapping2 != mapping);
> > -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> > +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
> > +				print_symbol(KERN_WARNING
> > +				    "mapping->a_ops->writepage: %s\n",
> > +				    (unsigned long)mapping->a_ops->writepage);
> 
> type mismatch?

I don't think so: I just copied from print_bad_pte().
Probably you're reading "printk" where it's "print_symbol"?

> I guess you want %pf or %pF.

I expect there is new-fangled %pMagic that can do it too, yes.

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread
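
On the "%s" question above: print_symbol() takes an address, resolves it to a symbol name, and only then passes that name to printk(), so the "%s" in the format string never receives the unsigned long itself. The following is a minimal user-space sketch of that two-step idea, not the kernel implementation. The names lookup_symbol, model_print_symbol, fake_writepage and other_writepage, and the small lookup table, are hypothetical stand-ins; the kernel resolves names through kallsyms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for two filesystems' ->writepage methods. */
static int fake_writepage(void) { return 0; }
static int other_writepage(void) { return 1; }

/* Hypothetical symbol table entry; the kernel uses kallsyms instead. */
struct sym {
	uintptr_t addr;
	const char *name;
};

/* Step 1: turn an address into a printable name. */
static const char *lookup_symbol(uintptr_t addr)
{
	const struct sym table[] = {
		{ (uintptr_t)fake_writepage, "fake_writepage" },
		{ (uintptr_t)other_writepage, "other_writepage" },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].addr == addr)
			return table[i].name;
	return "<unknown>";
}

/*
 * Step 2: hand the resolved name to the "%s" in fmt.
 * The caller passes an address; the format only ever sees a string.
 */
static int model_print_symbol(char *out, size_t len, const char *fmt,
			      uintptr_t addr)
{
	return snprintf(out, len, fmt, lookup_symbol(addr));
}
```

Calling model_print_symbol(buf, sizeof(buf), "mapping->a_ops->writepage: %s\n", (uintptr_t)fake_writepage) fills buf with the function's name, which mirrors the call shape in the patch. A printk() with %pF would take the pointer directly and do the lookup inside the format handling instead.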

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  9:08           ` Hugh Dickins
@ 2012-06-01  9:12             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-01  9:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang,
	linux-kernel, linux-mm

On Fri, Jun 1, 2012 at 5:08 AM, Hugh Dickins <hughd@google.com> wrote:
> On Fri, 1 Jun 2012, KOSAKI Motohiro wrote:
>> >     mlock_migrate_page(newpage, page);
>> > --- 3.4.0+/mm/page-writeback.c      2012-05-29 08:09:58.304806782 -0700
>> > +++ linux/mm/page-writeback.c       2012-06-01 00:23:43.984116973 -0700
>> > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
>> >             mapping2 = page_mapping(page);
>> >             if (mapping2) { /* Race with truncate? */
>> >                     BUG_ON(mapping2 != mapping);
>> > -                   WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
>> > +                   if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
>> > +                           print_symbol(KERN_WARNING
>> > +                               "mapping->a_ops->writepage: %s\n",
>> > +                               (unsigned long)mapping->a_ops->writepage);
>>
>> type mismatch?
>
> I don't think so: I just copied from print_bad_pte().
> Probably you're reading "printk" where it's "print_symbol"?

Oops, yes, sorry for noise.


>> I guess you want %pf or %pF.
>
> I expect there is new-fangled %pMagic that can do it too, yes.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  2:43       ` Linus Torvalds
@ 2012-06-01 13:43         ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-01 13:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel, linux-mm, Andrew Morton, Hugh Dickins, Cong Wang

On Thu, May 31, 2012 at 07:43:25PM -0700, Linus Torvalds wrote:
 > On Thu, May 31, 2012 at 7:31 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > So I bisected it anyway, and it led to ...
 > 
 > Ok, that doesn't sound entirely unlikely, but considering that you're
 > nervous about the bisection, please just try to revert it and see if
 > that fixes your testcase.
 > 
 > You'll obviously need to revert the commit that removes
 > vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce
 > the use of it (it's the next one:
 > 17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two
 > commits revert cleanly and the end result seems to compile ok.

crap, so much for that theory.  I ran latest with those two reverted
overnight, and woke up to a dead box.  Over serial console, I see
a bunch of those same compaction oopses (via sys_mmap_pgoff),
and then kernel BUG at include/linux/mm.h:448! was the last thing
it said before it choked.

I'll redo the bisect. It's possible that one of the 'good' paths just
didn't run for long enough.

	Dave


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:44       ` Hugh Dickins
@ 2012-06-01 14:09         ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-01 14:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote:

 > > 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit
 > > commit 3f31d07571eeea18a7d34db9af21d2285b807a17
 > > Author: Hugh Dickins <hughd@google.com>
 > > Date:   Tue May 29 15:06:40 2012 -0700
 > > 
 > >     mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
 > >     
 > >     Now tmpfs supports hole-punching via fallocate(), switch madvise_remove()
 > >     to use do_fallocate() instead of vmtruncate_range(): which extends
 > >     madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs.
 > > 
 > > Hugh ?
 > 
 > Ow, you've caught me.

As I said in another mail, it looks like the bisect was wrong somewhere,
as with this backed out I still see problems.
 
 > One half of the patch at the bottom should fix that: I'm not sure that
 > it's the fix we actually want (a mapping_cap_account_dirty test might
 > be more appropriate, but it's easier just to test a page flag here);
 > but it should be good to shed more light on the problem.

I'll give the patch a try anyway, as builds are quick on that box.

 > So I'm wondering if your trinity fuzzer happens to succeed a lot more
 > often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and
 > the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?),
 > which began to support MADV_REMOVE with that commit.

ext4 is a possibility.
 
 > So the second half of the patch should show which filesystem's page is
 > involved when you hit the WARN_ON - unless the first half of the patch
 > turns out to stop the warnings completely, in which case I need to think
 > harder about what was going on in tmpfs, and whether it matters.
 > 
 > Or another possibility is that the bad commit doesn't actually touch mm
 > at all: you were doing a bisection just on mm/ changes, weren't you?

oh, good point. It hadn't occurred to me that this could be fs related.
The mm-heavy stack-trace may have misled me.

 > > Sometimes during the bisect these errors happened
 > > in pairs, sometimes only together.
 > 
 > Sometimes in pairs, sometimes together?  I don't understand.

beware late-night emails. I meant sometimes I saw both the list-debug warnings and the WARN,
but other times I saw only one or the other.

 > Please give this patch a try (preferably on current git), and let us know.
 
Will do.

	Dave

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:44       ` Hugh Dickins
@ 2012-06-01 14:14         ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-01 14:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote:
 > So I'm wondering if your trinity fuzzer happens to succeed a lot more
 > often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and
 > the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?),
 > which began to support MADV_REMOVE with that commit.

One more thing: I happened to see this during a kernel build last night
on another machine too, so it's not just fuzzing fallout. I'm surprised more
people aren't seeing it.

	Dave


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:44       ` Hugh Dickins
@ 2012-06-01 16:12         ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-01 16:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote:

 > Please give this patch a try (preferably on current git), and let us know.
 > 
 > Thanks,
 > Hugh
 > 
 > --- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
 > +++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
 > @@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 >  		 * is actually a signal that all of the page has become dirty.
 >  		 * Whereas only part of our page may be dirty.
 >  		 */
 > -		__set_page_dirty_nobuffers(newpage);
 > +		if (PageSwapBacked(page))
 > +			SetPageDirty(newpage);
 > +		else
 > +			__set_page_dirty_nobuffers(newpage);
 >   	}
 >  
 >  	mlock_migrate_page(newpage, page);
 > --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
 > +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
 > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
 >  		mapping2 = page_mapping(page);
 >  		if (mapping2) { /* Race with truncate? */
 >  			BUG_ON(mapping2 != mapping);
 > -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 > +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
 > +				print_symbol(KERN_WARNING
 > +				    "mapping->a_ops->writepage: %s\n",
 > +				    (unsigned long)mapping->a_ops->writepage);
 >  			account_page_dirtied(page, mapping);
 >  			radix_tree_tag_set(&mapping->page_tree,
 >  				page_index(page), PAGECACHE_TAG_DIRTY);

So with this applied, I don't seem to be able to trigger it. It's been running two hours
so far. I'll leave it running, but right now I don't know what to make of this.

	Dave

^ permalink raw reply	[flat|nested] 87+ messages in thread
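
The first half of the patch quoted above changes how the dirty state is handed from the old page to the new one during migration: swap-backed pages get the flag set directly, while all other pages still go through __set_page_dirty_nobuffers(). Below is a minimal user-space sketch of that branch, assuming a heavily simplified model. The flag values, struct model_page, the two counters and every model_* name are hypothetical, and the mapping lookup, locking and radix-tree tagging of the real functions are left out.

```c
#include <assert.h>

/* Hypothetical, simplified page flags (not the kernel's values). */
enum {
	MODEL_PG_DIRTY      = 1u << 0,
	MODEL_PG_UPTODATE   = 1u << 1,
	MODEL_PG_PRIVATE    = 1u << 2,
	MODEL_PG_SWAPBACKED = 1u << 3,
};

struct model_page {
	unsigned int flags;
};

/* Counters standing in for side effects of the real function. */
static int model_warnings;  /* times the WARN_ON condition was true */
static int model_accounted; /* times dirty accounting was charged */

/* Models __set_page_dirty_nobuffers(): set flag, check, account. */
static void model_set_page_dirty_nobuffers(struct model_page *page)
{
	if (page->flags & MODEL_PG_DIRTY)
		return; /* already dirty: nothing to do */
	page->flags |= MODEL_PG_DIRTY;
	if (!(page->flags & MODEL_PG_PRIVATE) &&
	    !(page->flags & MODEL_PG_UPTODATE))
		model_warnings++; /* the condition behind the WARN_ON */
	model_accounted++;
}

/* Models the dirty hand-off in migrate_page_copy() with the patch. */
static void model_transfer_dirty(struct model_page *newpage,
				 struct model_page *page)
{
	if (!(page->flags & MODEL_PG_DIRTY))
		return;
	page->flags &= ~MODEL_PG_DIRTY;
	if (page->flags & MODEL_PG_SWAPBACKED)
		newpage->flags |= MODEL_PG_DIRTY; /* like SetPageDirty() */
	else
		model_set_page_dirty_nobuffers(newpage);
}
```

In this model a dirty swap-backed page never reaches the warning check or the accounting step, which matches the observation that the WARN stopped firing with the patch applied. It says nothing about the list corruption, which the sketch does not model.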

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01  8:44       ` Hugh Dickins
@ 2012-06-01 16:16         ` Markus Trippelsdorf
  -1 siblings, 0 replies; 87+ messages in thread
From: Markus Trippelsdorf @ 2012-06-01 16:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang,
	linux-kernel, linux-mm

On 2012.06.01 at 01:44 -0700, Hugh Dickins wrote:
> On Thu, 31 May 2012, Dave Jones wrote:
> > On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote:
> >  > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote:
> >  >  > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc
> >  >  > 
> >  >  > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()

I've also hit this warning today:

------------[ cut here ]------------
WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0xea/0x120()
Hardware name: System Product Name
Pid: 4385, comm: firefox Not tainted 3.4.0-09547-gfb21aff-dirty #46
Call Trace:
 [<ffffffff8105c6c0>] ? warn_slowpath_common+0x60/0xa0
 [<ffffffff810ba44a>] ? __set_page_dirty_nobuffers+0xea/0x120
 [<ffffffff810e4db0>] ? migrate_page_copy+0x150/0x160
 [<ffffffff810e4e2d>] ? migrate_page+0x4d/0x80
 [<ffffffff810e4edd>] ? move_to_new_page+0x7d/0x220
 [<ffffffff810c9e40>] ? suitable_migration_target.isra.12+0x1a0/0x1a0
 [<ffffffff810e55a8>] ? migrate_pages+0x3c8/0x460
 [<ffffffff810caa44>] ? compact_zone+0x1c4/0x2c0
 [<ffffffff810cad42>] ? compact_zone_order+0x82/0xc0
 [<ffffffff810cae4a>] ? try_to_compact_pages+0xca/0x140
 [<ffffffff81551f11>] ? __alloc_pages_direct_compact+0xa7/0x18f
 [<ffffffff810b8a30>] ? __alloc_pages_nodemask+0x3b0/0x7a0
 [<ffffffff810e884d>] ? do_huge_pmd_anonymous_page+0x10d/0x2a0
 [<ffffffff8105301b>] ? do_page_fault+0xfb/0x400
 [<ffffffff810d46bd>] ? mmap_region+0x1dd/0x540
 [<ffffffff81559f2f>] ? page_fault+0x1f/0x30
---[ end trace 7d7c821044142576 ]---

-- 
Markus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01 16:16         ` Markus Trippelsdorf
@ 2012-06-01 16:28           ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-01 16:28 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Hugh Dickins, Dave Jones, Andrew Morton, Cong Wang, linux-kernel,
	linux-mm

On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> I've also hit this warning today:

Can you try the patch by Hugh Dickins earlier in this thread?

Dave is reporting tentative success with it, even though I don't think
we really understand this thing fully yet. Getting way more testing
would still be good, though.

                  Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01 16:28           ` Linus Torvalds
@ 2012-06-01 16:39             ` Markus Trippelsdorf
  -1 siblings, 0 replies; 87+ messages in thread
From: Markus Trippelsdorf @ 2012-06-01 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Jones, Andrew Morton, Cong Wang, linux-kernel,
	linux-mm

On 2012.06.01 at 09:28 -0700, Linus Torvalds wrote:
> On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> >
> > I've also hit this warning today:
> 
> Can you try the patch by Hugh Dickins earlier in this thread?

I will try. But please note that the warning just happened by
accident. I don't know how to reproduce the issue yet.

> Dave is reporting tentative success with it, even though I don't think
> we really understand this thing fully yet. Getting way more testing
> would still be good, though.

-- 
Markus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01 16:12         ` Dave Jones
@ 2012-06-01 17:16           ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-01 17:16 UTC (permalink / raw)
  To: Hugh Dickins, Linus Torvalds, Andrew Morton, Cong Wang,
	linux-kernel, linux-mm

On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote:

 
 > So with this applied, I don't seem to be able to trigger it. It's been running two hours
 > so far. I'll leave it running, but right now I don't know what to make of this.

I can trigger the list corruption still, but not the WARN.

	Dave

[  551.980716] ------------[ cut here ]------------
[  551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[  551.983461] list_del corruption. prev->next should be ffffea0004b305a0, but was ffffea0004f117e0
[  551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
[  551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49
[  551.989063] Call Trace:
[  551.990012]  [<ffffffff8104912f>] warn_slowpath_common+0x7f/0xc0
[  551.990956]  [<ffffffff81049226>] warn_slowpath_fmt+0x46/0x50
[  551.991902]  [<ffffffff81329171>] __list_del_entry+0xa1/0xd0
[  551.992849]  [<ffffffff81145ad9>] move_freepages_block+0x159/0x190
[  551.993800]  [<ffffffff81165be3>] suitable_migration_target.isra.15+0x1b3/0x1d0
[  551.994761]  [<ffffffff81165e2e>] compaction_alloc+0x22e/0x2f0
[  551.995731]  [<ffffffff81198547>] migrate_pages+0xc7/0x540
[  551.996684]  [<ffffffff81165c00>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
[  551.997638]  [<ffffffff81166b86>] compact_zone+0x216/0x480
[  551.998593]  [<ffffffff810b15f8>] ? trace_hardirqs_off_caller+0x28/0xc0
[  551.999558]  [<ffffffff811670cd>] compact_zone_order+0x8d/0xd0
[  552.000525]  [<ffffffff81149735>] ? get_page_from_freelist+0x565/0x970
[  552.001502]  [<ffffffff811671d9>] try_to_compact_pages+0xc9/0x140
[  552.002548]  [<ffffffff8163f491>] __alloc_pages_direct_compact+0xaa/0x1d0
[  552.003592]  [<ffffffff8114a14b>] __alloc_pages_nodemask+0x60b/0xab0
[  552.004650]  [<ffffffff810b15f8>] ? trace_hardirqs_off_caller+0x28/0xc0
[  552.005708]  [<ffffffff810b4f00>] ? __lock_acquire+0x2d0/0x1aa0
[  552.007332]  [<ffffffff81189ec6>] alloc_pages_vma+0xb6/0x190
[  552.008953]  [<ffffffff8119cfb3>] do_huge_pmd_anonymous_page+0x133/0x310
[  552.010584]  [<ffffffff8116c2e2>] handle_mm_fault+0x242/0x2e0
[  552.012233]  [<ffffffff8116c592>] __get_user_pages+0x142/0x560
[  552.013891]  [<ffffffff81171c58>] ? mmap_region+0x3f8/0x630
[  552.015753]  [<ffffffff8116ca62>] get_user_pages+0x52/0x60
[  552.017348]  [<ffffffff8116d952>] make_pages_present+0x92/0xc0
[  552.018936]  [<ffffffff81171c06>] mmap_region+0x3a6/0x630
[  552.021074]  [<ffffffff81050e2c>] ? do_setitimer+0x1cc/0x310
[  552.022367]  [<ffffffff811721ed>] do_mmap_pgoff+0x35d/0x3b0
[  552.023406]  [<ffffffff811722a6>] ? sys_mmap_pgoff+0x66/0x240
[  552.024429]  [<ffffffff811722c4>] sys_mmap_pgoff+0x84/0x240
[  552.025445]  [<ffffffff81322cbe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  552.026466]  [<ffffffff81006ca2>] sys_mmap+0x22/0x30
[  552.027486]  [<ffffffff81651c92>] system_call_fastpath+0x16/0x1b
[  552.028521] ---[ end trace c092df1e14d11d14 ]---
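
For readers following the trace: the "list_del corruption" line comes from a
consistency check made before an entry is unlinked from a doubly linked list.
Below is a minimal userspace sketch of that kind of check. It is not the code
in lib/list_debug.c; struct list_node, list_node_init(), list_node_add_tail()
and checked_list_del() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled loosely on list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_node_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_node_add_tail(struct list_node *entry, struct list_node *head)
{
	struct list_node *last = head->prev;

	entry->prev = last;
	entry->next = head;
	last->next = entry;
	head->prev = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns false, leaving the list untouched, when they do not:
 * that is the situation reported as "prev->next should be X, but was Y".
 */
static bool checked_list_del(struct list_node *entry)
{
	struct list_node *prev = entry->prev;
	struct list_node *next = entry->next;

	assert(prev != NULL && next != NULL);
	if (prev->next != entry || next->prev != entry)
		return false;

	prev->next = next;
	next->prev = prev;
	return true;
}
```

If another CPU relinks a neighbour while this one is unlinking, for instance
because the two do not hold the same lock, the back-pointer test fails in
exactly this way.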


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01 17:16           ` Dave Jones
@ 2012-06-01 22:17             ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-01 22:17 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang
  Cc: linux-kernel, linux-mm

On Fri, 1 Jun 2012, Dave Jones wrote:
> On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote:
>  
>  > So with this applied, I don't seem to be able to trigger it. It's been running two hours
>  > so far. I'll leave it running, but right now I don't know what to make of this.
> 
> I can trigger the list corruption still, but not the WARN.
> 
> 	Dave
> 
> [  551.980716] ------------[ cut here ]------------
> [  551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> [  551.983461] list_del corruption. prev->next should be ffffea0004b305a0, but was ffffea0004f117e0
> [  551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
> [  551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49
> [  551.989063] Call Trace:
> [  551.990012]  [<ffffffff8104912f>] warn_slowpath_common+0x7f/0xc0
> [  551.990956]  [<ffffffff81049226>] warn_slowpath_fmt+0x46/0x50
> [  551.991902]  [<ffffffff81329171>] __list_del_entry+0xa1/0xd0
> [  551.992849]  [<ffffffff81145ad9>] move_freepages_block+0x159/0x190
> [  551.993800]  [<ffffffff81165be3>] suitable_migration_target.isra.15+0x1b3/0x1d0
> [  551.994761]  [<ffffffff81165e2e>] compaction_alloc+0x22e/0x2f0
> [  551.995731]  [<ffffffff81198547>] migrate_pages+0xc7/0x540
> [  551.996684]  [<ffffffff81165c00>] ? suitable_migration_target.isra.15+0x1d0/0x1d0
> [  551.997638]  [<ffffffff81166b86>] compact_zone+0x216/0x480
> [  551.998593]  [<ffffffff810b15f8>] ? trace_hardirqs_off_caller+0x28/0xc0
> [  551.999558]  [<ffffffff811670cd>] compact_zone_order+0x8d/0xd0
> [  552.000525]  [<ffffffff81149735>] ? get_page_from_freelist+0x565/0x970
> [  552.001502]  [<ffffffff811671d9>] try_to_compact_pages+0xc9/0x140
> [  552.002548]  [<ffffffff8163f491>] __alloc_pages_direct_compact+0xaa/0x1d0
> [  552.003592]  [<ffffffff8114a14b>] __alloc_pages_nodemask+0x60b/0xab0
> [  552.004650]  [<ffffffff810b15f8>] ? trace_hardirqs_off_caller+0x28/0xc0
> [  552.005708]  [<ffffffff810b4f00>] ? __lock_acquire+0x2d0/0x1aa0
> [  552.007332]  [<ffffffff81189ec6>] alloc_pages_vma+0xb6/0x190
> [  552.008953]  [<ffffffff8119cfb3>] do_huge_pmd_anonymous_page+0x133/0x310
> [  552.010584]  [<ffffffff8116c2e2>] handle_mm_fault+0x242/0x2e0
> [  552.012233]  [<ffffffff8116c592>] __get_user_pages+0x142/0x560
> [  552.013891]  [<ffffffff81171c58>] ? mmap_region+0x3f8/0x630
> [  552.015753]  [<ffffffff8116ca62>] get_user_pages+0x52/0x60
> [  552.017348]  [<ffffffff8116d952>] make_pages_present+0x92/0xc0
> [  552.018936]  [<ffffffff81171c06>] mmap_region+0x3a6/0x630
> [  552.021074]  [<ffffffff81050e2c>] ? do_setitimer+0x1cc/0x310
> [  552.022367]  [<ffffffff811721ed>] do_mmap_pgoff+0x35d/0x3b0
> [  552.023406]  [<ffffffff811722a6>] ? sys_mmap_pgoff+0x66/0x240
> [  552.024429]  [<ffffffff811722c4>] sys_mmap_pgoff+0x84/0x240
> [  552.025445]  [<ffffffff81322cbe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  552.026466]  [<ffffffff81006ca2>] sys_mmap+0x22/0x30
> [  552.027486]  [<ffffffff81651c92>] system_call_fastpath+0x16/0x1b
> [  552.028521] ---[ end trace c092df1e14d11d14 ]---

Several distractions today, and I must rush out now for two or three
hours: but please check if this patch below makes sense (I've only
checked that it builds), and if so give it a run to see if it fixes
your list corruptions - thanks.

(Looks like there's an independent off-by-one in page_zone(end_page),
but that shouldn't do any harm.)
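
The off-by-one can be seen with plain pfn arithmetic. The sketch below assumes
that end_pfn is computed as start_pfn + pageblock_nr_pages (that line is not
visible in the hunks of the patch) and that a pageblock holds 512 pages; both
are assumptions, and block_start(), block_end_exclusive() and block_last() are
hypothetical helper names.

```c
#include <assert.h>

/* Assumed pageblock size in pages; it must be a power of two. */
#define PAGEBLOCK_NR_PAGES 512UL

_Static_assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0,
	       "pageblock size must be a power of two");

/* First pfn of the pageblock containing pfn (same mask as in the patch). */
static unsigned long block_start(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* One past the last pfn of the block: the first pfn of the NEXT block. */
static unsigned long block_end_exclusive(unsigned long pfn)
{
	return block_start(pfn) + PAGEBLOCK_NR_PAGES;
}

/* Last pfn that still belongs to the block containing pfn. */
static unsigned long block_last(unsigned long pfn)
{
	return block_end_exclusive(pfn) - 1;
}
```

Under those assumptions, pfn 1000 sits in the block [512, 1024). A zone check
made on pfn 1024 tests the first page of the following block, whereas a check
on pfn 1023 would test the last page of this one.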

Hugh

--- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c	2012-06-01 15:04:18.612051243 -0700
@@ -369,6 +369,9 @@ static bool rescue_unmovable_pageblock(s
 {
 	unsigned long pfn, start_pfn, end_pfn;
 	struct page *start_page, *end_page;
+	struct zone *zone;
+	unsigned long flags;
+	bool rescued = false;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,9 +381,11 @@ static bool rescue_unmovable_pageblock(s
 	end_page = pfn_to_page(end_pfn);
 
 	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
+	zone = page_zone(start_page);
+	if (zone != page_zone(end_page))
 		return false;
 
+	spin_lock_irqsave(&zone->lock, flags);
 	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
 								  page++) {
 		if (!pfn_valid_within(pfn))
@@ -396,12 +401,15 @@ static bool rescue_unmovable_pageblock(s
 		} else if (page_count(page) == 0 || PageLRU(page))
 			continue;
 
-		return false;
+		goto out;
 	}
 
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
-	return true;
+	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	rescued = true;
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+	return rescued;
 }
 
 enum smt_result {

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-01 22:17             ` Hugh Dickins
@ 2012-06-02  1:45               ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-02  1:45 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>
> +       spin_lock_irqsave(&zone->lock, flags);
>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>                                                                  page++) {

So holding the spinlock (and disabling irqs!) over the whole loop
sounds horrible.

At the same time, the iterators don't seem to require the spinlock, so
it should be possible to just move the lock into the loop, no?

                  Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  1:45               ` Linus Torvalds
  (?)
@ 2012-06-02  4:40               ` Hugh Dickins
  2012-06-02  4:58                   ` Linus Torvalds
                                   ` (3 more replies)
  -1 siblings, 4 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-02  4:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Minchan Kim, Rik van Riel, Dave Jones, Andrew Morton,
	Cong Wang, Markus Trippelsdorf, linux-kernel, linux-mm

On Fri, 1 Jun 2012, Linus Torvalds wrote:
> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > +       spin_lock_irqsave(&zone->lock, flags);
> >        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> >                                                                  page++) {
> 
> So holding the spinlock (and disabling irqs!) over the whole loop
> sounds horrible.

There looks to be a pretty similar loop inside move_freepages_block(),
which is the part which I believe really needs the lock - it's moving
free pages from one lru to another.

> 
> At the same time, the iterators don't seem to require the spinlock, so
> it should be possible to just move the lock into the loop, no?

Move the lock after the loop, I think you meant.

I put the lock before the loop because it's deciding whether it can
usefully proceed, and then proceeding: I was thinking that the lock
would stabilize the conditions that it bases that decision on.

But it certainly does not stabilize all of them (most obviously not
PageLRU), so I'm guessing that this is a best-effort decision which
can safely go wrong some of the time.

In which case, yes, much better to follow your suggestion, and hold
the lock (with irqs disabled) for only half the time.
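
The trade-off can be sketched in userspace: scan without the lock, accepting
that the answer may be stale, and take the lock only around the state change.
This is an analogy rather than kernel code. A pthread mutex stands in for
zone->lock, and struct block, its fields and rescue_block() are hypothetical
names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct block {
	pthread_mutex_t lock;	/* stands in for zone->lock */
	int movable;		/* stands in for the pageblock migratetype */
	const int *page_ok;	/* per-page hint, read without the lock */
	size_t nr_pages;
};

/*
 * Best-effort rescue: the scan runs unlocked and may see stale hints,
 * so only the update of shared state is serialized. In a program with
 * concurrent writers the unlocked reads would need atomic accesses.
 */
static bool rescue_block(struct block *b)
{
	size_t i;

	assert(b != NULL);
	for (i = 0; i < b->nr_pages; i++)
		if (!b->page_ok[i])
			return false;	/* bail out; the lock was never taken */

	pthread_mutex_lock(&b->lock);
	b->movable = 1;			/* the part that really needs the lock */
	pthread_mutex_unlock(&b->lock);
	return true;
}
```

Because the early return happens before the lock is taken, no unlock path is
needed for it, which is why the "goto out" of the earlier version goes away.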

Similarly untested patch below.

But I'm entirely unfamiliar with this code: best Cc people more familiar
with it.  Does this addition of locking to rescue_unmovable_pageblock()
look correct to you, and do you think it has a good chance of fixing the
move_freepages_block() list debug warnings which Dave has been reporting
(in this and in another thread)?

(Although there's still something of a mystery in where Dave's bisection
appeared to converge, our best assumption at present is that one of my
tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings,
and I need to send a finalized patch to fix that later.

I'm guessing that the few people who see the warning are those running
new systemd distros, and that systemd is indeed now making use of the
fallocate support we added into tmpfs for it.)

Hugh

--- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c	2012-06-01 20:59:56.840204915 -0700
@@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
 {
 	unsigned long pfn, start_pfn, end_pfn;
 	struct page *start_page, *end_page;
+	struct zone *zone;
+	unsigned long flags;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
 	end_page = pfn_to_page(end_pfn);
 
 	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
+	zone = page_zone(start_page);
+	if (zone != page_zone(end_page))
 		return false;
 
 	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
@@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
 		return false;
 	}
 
+	spin_lock_irqsave(&zone->lock, flags);
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	spin_unlock_irqrestore(&zone->lock, flags);
 	return true;
 }
 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  4:40               ` Hugh Dickins
@ 2012-06-02  4:58                   ` Linus Torvalds
  2012-06-02  7:17                   ` Markus Trippelsdorf
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-02  4:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Minchan Kim, Rik van Riel, Dave Jones, Andrew Morton,
	Cong Wang, Markus Trippelsdorf, linux-kernel, linux-mm

On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins <hughd@google.com> wrote:
>
> Move the lock after the loop, I think you meant.

Well, I wasn't sure if anything inside the loop might need it. I don't
*think* so, but at the same time, what keeps "page_order(page)"
(or, indeed PageBuddy()) stable while that loop content
uses them?

I don't understand that code at all. It does that crazy iteration over
page, and changes "page" in random ways, and then finishes up with a
totally new "page" value that is some random thing that is *after* the
end_page thing. WHAT?

The code makes no sense. It tests all those pages within the
page-block, but then after it has done all those tests, it does the
final

  set_pageblock_migratetype(..)
  move_freepages_block(..)

using a page that is *beyond* the pageblock (and with the whole
page_order() thing, who knows just how far beyond it?)

It looks entirely too much like random-monkey code to me.

            Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  4:40               ` Hugh Dickins
@ 2012-06-02  7:17                   ` Markus Trippelsdorf
  2012-06-02  7:17                   ` Markus Trippelsdorf
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Markus Trippelsdorf @ 2012-06-02  7:17 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote:
> 
> I'm guessing that the few people who see the warning are those running
> new systemd distros, and that systemd is indeed now making use of the
> fallocate support we added into tmpfs for it.)

At least in my case it's nothing that horrible. I'm just setting
browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox
does indeed use fallocate on its "disk cache" items.

-- 
Markus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  4:58                   ` Linus Torvalds
@ 2012-06-02  7:20                     ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-02  7:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Minchan Kim, Rik van Riel, Dave Jones, Andrew Morton,
	Cong Wang, Markus Trippelsdorf, linux-kernel, linux-mm

On Fri, 1 Jun 2012, Linus Torvalds wrote:
> On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > Move the lock after the loop, I think you meant.
> 
> Well, I wasn't sure if anything inside the loop might need it. I don't
> *think* so, but at the same time, what protects "page_order(page)"
> (or, indeed PageBuddy()) from being stable while that loop content
> uses them?

Yes, I believe you're right, page_order(page) could supply nonsense
if it's not stabilized under zone->lock along with PageBuddy(page).

Though if this rescue_unmovable_pageblock() is just best-effort,
with a little more care we can probably avoid the lock in there.

> 
> I don't understand that code at all. It does that crazy iteration over
> page, and changes "page" in random ways,

I don't think they're random ways: when buddy it uses the order to skip
that block, otherwise it goes page by page, considering a free (I guess
on pcp) page or an lru page as good for movable.

> and then finishes up with a
> totally new "page" value that is some random thing that is *after* the
> end_page thing. WHAT?
> 
> The code makes no sense. It tests all those pages within the
> page-block, but then after it has done all those tests, it does the
> final
> 
>   set_pageblock_migratetype(..)
>   move_freepages_block(..)
> 
> using a page that is *beyond* the pageblock (and with the whole
> page_order() thing, who knows just how far beyond it?)

I totally missed that, thank goodness you did not.  Yes, it's rubbish.
It goes to this effort to find a suitable pageblock, then chooses the
next one instead (or possibly another).  Perhaps it would get even
better results using a random number generator in there.

> 
> It looks entirely too much like random-monkey code to me.

Presumably it should be passing start_page instead of page
to set_pageblock_migratetype() and move_freepages_block().

But this does seem to be code of the kind that, the longer you look
at it, the more bugs you find.  And I worry about what trouble it
might then cause, if it actually started to work in the way it was
intending.  I don't think fixing it up is wise for -rc1.

Commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199
("mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks")
appears to revert cleanly, and I'm running with it reverted now.

I'm not saying it shouldn't come back later, but does anyone see
an argument against reverting it now?

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  7:17                   ` Markus Trippelsdorf
@ 2012-06-02  7:22                     ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-02  7:22 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, linux-kernel, linux-mm

On Sat, 2 Jun 2012, Markus Trippelsdorf wrote:
> On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote:
> > 
> > I'm guessing that the few people who see the warning are those running
> > new systemd distros, and that systemd is indeed now making use of the
> > fallocate support we added into tmpfs for it.)
> 
> At least in my case it's nothing that horrible. I'm just setting
> browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox
> does indeed use fallocate on its "disk cache" items.

That fits, and it's very helpful to know - thank you.

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH] mm: fix warning in __set_page_dirty_nobuffers
  2012-06-02  7:22                     ` Hugh Dickins
@ 2012-06-02  7:27                       ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-02  7:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Markus Trippelsdorf, Dave Jones, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Minchan Kim,
	Rik van Riel, Andrew Morton, Cong Wang, linux-kernel, linux-mm

New tmpfs use of !PageUptodate pages for fallocate() is triggering the
WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
is called from migrate_page_copy() for compaction.

It is anomalous that migration should use __set_page_dirty_nobuffers()
on an address_space that does not participate in dirty and writeback
accounting; and this has also been observed to insert surprising dirty
tags into a tmpfs radix_tree, despite tmpfs not using tags at all.

We should probably give migrate_page_copy() a better way to preserve
the tag and migrate accounting info, when mapping_cap_account_dirty().
But that needs some more work: so in the interim, avoid the warning
by using a simple SetPageDirty on PageSwapBacked pages.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/migrate.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
+++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
@@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 		 * is actually a signal that all of the page has become dirty.
 		 * Whereas only part of our page may be dirty.
 		 */
-		__set_page_dirty_nobuffers(newpage);
+		if (PageSwapBacked(page))
+			SetPageDirty(newpage);
+		else
+			__set_page_dirty_nobuffers(newpage);
  	}
 
 	mlock_migrate_page(newpage, page);

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  4:40               ` Hugh Dickins
@ 2012-06-03 18:15                   ` Dave Jones
  2012-06-02  7:17                   ` Markus Trippelsdorf
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-03 18:15 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Fri, Jun 01, 2012 at 09:40:35PM -0700, Hugh Dickins wrote:

 > In which case, yes, much better to follow your suggestion, and hold
 > the lock (with irqs disabled) for only half the time.
 > 
 > Similarly untested patch below.

Things aren't happy with that patch at all.

=============================================
[ INFO: possible recursive locking detected ]
3.5.0-rc1+ #50 Not tainted
---------------------------------------------
trinity-child1/31784 is trying to acquire lock:
 (&(&zone->lock)->rlock){-.-.-.}, at: [<ffffffff81165c5d>] suitable_migration_target.isra.15+0x19d/0x1e0

but task is already holding lock:
 (&(&zone->lock)->rlock){-.-.-.}, at: [<ffffffff811661fb>] compaction_alloc+0x21b/0x2f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&zone->lock)->rlock);
  lock(&(&zone->lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by trinity-child1/31784:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8115fc46>] vm_mmap_pgoff+0x66/0xb0
 #1:  (&(&zone->lock)->rlock){-.-.-.}, at: [<ffffffff811661fb>] compaction_alloc+0x21b/0x2f0

stack backtrace:
Pid: 31784, comm: trinity-child1 Not tainted 3.5.0-rc1+ #50
Call Trace:
 [<ffffffff810b6584>] __lock_acquire+0x1584/0x1aa0
 [<ffffffff810b19c8>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff8108cd47>] ? local_clock+0x47/0x60
 [<ffffffff810b7162>] lock_acquire+0x92/0x1f0
 [<ffffffff81165c5d>] ? suitable_migration_target.isra.15+0x19d/0x1e0
 [<ffffffff8164ce05>] ? _raw_spin_lock_irqsave+0x25/0x90
 [<ffffffff8164ce32>] _raw_spin_lock_irqsave+0x52/0x90
 [<ffffffff81165c5d>] ? suitable_migration_target.isra.15+0x19d/0x1e0
 [<ffffffff81165c5d>] suitable_migration_target.isra.15+0x19d/0x1e0
 [<ffffffff8116620e>] compaction_alloc+0x22e/0x2f0
 [<ffffffff81198547>] migrate_pages+0xc7/0x540
 [<ffffffff81165fe0>] ? isolate_freepages_block+0x260/0x260
 [<ffffffff81166e86>] compact_zone+0x216/0x480
 [<ffffffff810b19c8>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff811673cd>] compact_zone_order+0x8d/0xd0
 [<ffffffff811499e5>] ? get_page_from_freelist+0x565/0x970
 [<ffffffff811674d9>] try_to_compact_pages+0xc9/0x140
 [<ffffffff81642e01>] __alloc_pages_direct_compact+0xaa/0x1d0


Then a bunch of NMI backtraces, and a hard lockup.
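
The report boils down to one task taking a lock of a class it already
holds. A toy userspace model of that check, assuming a per-task list
of held lock classes; struct held_locks and the toy_* helpers are
hypothetical names, and real lockdep tracks far more state than this:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HELD 8

/* Lock classes currently held by one task, in acquisition order. */
struct held_locks {
	const void *class_key[MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition. Returns 0 on success, -EDEADLK if the task
 * already holds a lock of the same class, or -ENOSPC if the table is
 * full.
 */
int toy_lock_acquire(struct held_locks *held, const void *class_key)
{
	assert(held != NULL);

	for (size_t i = 0; i < held->depth; i++) {
		if (held->class_key[i] == class_key)
			return -EDEADLK;
	}
	if (held->depth == MAX_HELD)
		return -ENOSPC;

	held->class_key[held->depth++] = class_key;
	return 0;
}

/* Drop the most recently acquired lock; -EINVAL if none is held. */
int toy_lock_release(struct held_locks *held)
{
	assert(held != NULL);

	if (held->depth == 0)
		return -EINVAL;
	held->depth--;
	return 0;
}
```

Acquiring mmap_sem, then zone->lock, then zone->lock again mirrors the
trace above: the third call returns -EDEADLK, where the real spinlock
just spins.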

	Dave 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 18:15                   ` Dave Jones
@ 2012-06-03 18:23                     ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-03 18:23 UTC (permalink / raw)
  To: Dave Jones, Hugh Dickins, Linus Torvalds,
	Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Minchan Kim, Rik van Riel, Andrew Morton, Cong Wang,
	Markus Trippelsdorf, linux-kernel, linux-mm

On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones <davej@redhat.com> wrote:
>
> Things aren't happy with that patch at all.

Yeah, at this point I think we need to just revert the compaction changes.

Guys, what's the minimal set of commits to revert? That clearly buggy
"rescue_unmovable_pageblock()" function was introduced by commit
5ceb9ce6fe94, but is that actually involved with the particular bug?
That commit seems to revert cleanly still, but is that sufficient or
does it even matter?

                  Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 18:23                     ` Linus Torvalds
@ 2012-06-03 18:31                       ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-03 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote:
 > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > Things aren't happy with that patch at all.
 > 
 > Yeah, at this point I think we need to just revert the compaction changes.
 > 
 > Guys, what's the minimal set of commits to revert? That clearly buggy
 > "rescue_unmovable_pageblock()" function was introduced by commit
 > 5ceb9ce6fe94, but is that actually involved with the particular bug?
 > That commit seems to revert cleanly still, but is that sufficient or
 > does it even matter?

I'll rerun the test with that (and Hugh's last patch) backed out, and see
if that makes any difference.

	Dave


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 18:31                       ` Dave Jones
@ 2012-06-03 20:53                         ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-03 20:53 UTC (permalink / raw)
  To: Linus Torvalds, Hugh Dickins, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Minchan Kim,
	Rik van Riel, Andrew Morton, Cong Wang, Markus Trippelsdorf,
	linux-kernel, linux-mm

On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote:
 > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote:
 >  > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones <davej@redhat.com> wrote:
 >  > >
 >  > > Things aren't happy with that patch at all.
 >  > 
 >  > Yeah, at this point I think we need to just revert the compaction changes.
 >  > 
 >  > Guys, what's the minimal set of commits to revert? That clearly buggy
 >  > "rescue_unmovable_pageblock()" function was introduced by commit
 >  > 5ceb9ce6fe94, but is that actually involved with the particular bug?
 >  > That commit seems to revert cleanly still, but is that sufficient or
 >  > does it even matter?
 > 
 > I'll rerun the test with that (and Hugh's last patch) backed out, and see
 > if that makes any difference.

running just over two hours with that commit reverted with no obvious ill effects so far.

	Dave 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 20:53                         ` Dave Jones
@ 2012-06-03 21:59                           ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-03 21:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Hugh Dickins,
	Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Minchan Kim, Rik van Riel, Andrew Morton, Cong Wang,
	Markus Trippelsdorf, linux-kernel, linux-mm

On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones <davej@redhat.com> wrote:
>
> running just over two hours with that commit reverted with no obvious ill effects so far.

And how quickly have you usually seen the problems? Would you have
considered two hours "good" in your bisection thing?

Also, just to check: Hugh sent out a patch called "mm: fix warning in
__set_page_dirty_nobuffers". Is that applied in your tree too, or did
the __set_page_dirty_nobuffers() warning go away with just the revert?

I'm just trying to figure out exactly what you are testing. When you
said "test with that (and Hugh's last patch) backed out", the "and
Hugh's last patch" part was a bit ambiguous. Do you mean the trial
patch in this thread (backed out) or do you mean "*with* Hugh's patch
for the __set_page_dirty_nobuffers() warning".

              Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 21:59                           ` Linus Torvalds
@ 2012-06-03 22:13                             ` Dave Jones
  -1 siblings, 0 replies; 87+ messages in thread
From: Dave Jones @ 2012-06-03 22:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Sun, Jun 03, 2012 at 02:59:22PM -0700, Linus Torvalds wrote:
 > On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > running just over two hours with that commit reverted with no obvious ill effects so far.
 > 
 > And how quickly have you usually seen the problems? Would you have
 > considered two hours "good" in your bisection thing?

Yeah, usually see something go awry in an hour or less.

 > Also, just to check: Hugh sent out a patch called "mm: fix warning in
 > __set_page_dirty_nobuffers". Is that applied in your tree too, or did
 > the __set_page_dirty_nobuffers() warning go away with just the revert?

That is applied. Otherwise I see the warning he refers to.

 > I'm just trying to figure out exactly what you are testing. When you
 > said "test with that (and Hugh's last patch) backed out", the "and
 > Hugh's last patch" part was a bit ambiguous. Do you mean the trial
 > patch in this thread (backed out) or do you mean "*with* Hugh's patch
 > for the __set_page_dirty_nobuffers() warning".

The former.  (This).

--- 3.4.0+/mm/compaction.c      2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c       2012-06-01 20:59:56.840204915 -0700
@@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
 {
        unsigned long pfn, start_pfn, end_pfn;
        struct page *start_page, *end_page;
+       struct zone *zone;
+       unsigned long flags;

        pfn = page_to_pfn(page);
        start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
        end_page = pfn_to_page(end_pfn);

        /* Do not deal with pageblocks that overlap zones */
-       if (page_zone(start_page) != page_zone(end_page))
+       zone = page_zone(start_page);
+       if (zone != page_zone(end_page))
                return false;

        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
@@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
                return false;
        }

+       spin_lock_irqsave(&zone->lock, flags);
        set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+       move_freepages_block(zone, page, MIGRATE_MOVABLE);
+       spin_unlock_irqrestore(&zone->lock, flags);
        return true;



I do see something else weird going on, but it seems like an unrelated problem.
I have a lot of processes hanging after calling sys_renameat.

I'll dig some more on that, and post a follow-up.

	Dave



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 20:53                         ` Dave Jones
@ 2012-06-03 22:17                           ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-03 22:17 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Minchan Kim,
	Rik van Riel, Andrew Morton, Cong Wang, Markus Trippelsdorf
  Cc: linux-kernel, linux-mm

On Sun, 3 Jun 2012, Dave Jones wrote:
> On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote:
>  > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote:
>  >  > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones <davej@redhat.com> wrote:
>  >  > >
>  >  > > Things aren't happy with that patch at all.
>  >  > 
>  >  > Yeah, at this point I think we need to just revert the compaction changes.
>  >  > 
>  >  > Guys, what's the minimal set of commits to revert? That clearly buggy
>  >  > "rescue_unmovable_pageblock()" function was introduced by commit
>  >  > 5ceb9ce6fe94, but is that actually involved with the particular bug?
>  >  > That commit seems to revert cleanly still, but is that sufficient or
>  >  > does it even matter?
>  > 
>  > I'll rerun the test with that (and Hugh's last patch) backed out, and see
>  > if that makes any difference.
> 
> running just over two hours with that commit reverted with no obvious ill effects so far.

Yes, and I ran happily with precisely that commit reverted on Friday -
though I've never got the list corruption that you saw with it in.  

The locking bug certainly comes in with that commit, it's an isolated
commit that reverts cleanly, and I think you got the list corruption
rather sooner than two hours before (9min, 30min, 41min from the traces
you sent).

Maybe we should let you run a little longer, or wait for others to comment.

But another strike against that commit: I tried fixing it up to use
start_page instead of page at the end, with the worrying but safer
locking I suggested at first, with a count of how many times it went
there, and how many times it succeeded.

While I ran my usual swapping test (perhaps that's a very unfair test
to run on this, I've no idea) for seven hours, it went there 25406
times (once per second, it appears) and it succeeded... 0 times.

Let's hope it failed quickly each time, I wasn't capturing that.
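
For readers following along, the kind of instrumentation described above can be sketched as below. This is only an illustration, not the code that was actually run: the counter names and both helpers are hypothetical. It also redoes the rate arithmetic: 25406 attempts over seven hours (25200 seconds) is just over one attempt per second.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counters; the thread does not show the real ones. */
static atomic_ulong rescue_attempts;
static atomic_ulong rescue_successes;

/* Call once per attempt, passing whether that attempt succeeded. */
static void count_rescue(bool succeeded)
{
	atomic_fetch_add(&rescue_attempts, 1);
	if (succeeded)
		atomic_fetch_add(&rescue_successes, 1);
}

/* Average number of attempts per second over a run of the given length. */
static double attempts_per_second(unsigned long attempts, unsigned long seconds)
{
	assert(seconds > 0);
	return (double)attempts / (double)seconds;
}
```

Calling attempts_per_second(25406, 7 * 3600) gives a value slightly above 1.0, consistent with the "once per second" estimate.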

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 21:59                           ` Linus Torvalds
@ 2012-06-03 22:29                             ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-03 22:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Sun, 3 Jun 2012, Linus Torvalds wrote:
> On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones <davej@redhat.com> wrote:
> >
> > running just over two hours with that commit reverted with no obvious ill effects so far.
> 
> And how quickly have you usually seen the problems? Would you have
> considered two hours "good" in your bisection thing?
> 
> Also, just to check: Hugh sent out a patch called "mm: fix warning in
> __set_page_dirty_nobuffers". Is that applied in your tree too, or did
> the __set_page_dirty_nobuffers() warning go away with just the revert?

That patch is good for fixing the __set_page_dirty_nobuffers() warning,
but it has no relevance to the list corruption Dave was also reporting,
nor vice versa.  The common factor there is just Dave.

And no disaster that the warning fix missed -rc1: it's only a WARN_ON_ONCE,
and nothing was wrong beyond the warning itself, just noise.

It's true that Dave's original bisection raised the doubt whether
that warning is coming from somewhere else too; but best guess at this
point is that something got mixed up, and we should only worry about
that if we see the warning again once the known fix is in.

Hugh

> 
> I'm just trying to figure out exactly what you are testing. When you
> said "test with that (and Hugh's last patch) backed out", the "and
> Hugh's last patch" part was a bit ambiguous. Do you mean the trial
> patch in this thread (backed out) or do you mean "*with* Hugh's patch
> for the __set_page_dirty_nobuffers() warning".

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 22:17                           ` Hugh Dickins
@ 2012-06-03 23:13                             ` Linus Torvalds
  -1 siblings, 0 replies; 87+ messages in thread
From: Linus Torvalds @ 2012-06-03 23:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>
> But another strike against that commit: I tried fixing it up to use
> start_page instead of page at the end, with the worrying but safer
> locking I suggested at first, with a count of how many times it went
> there, and how many times it succeeded.

You can't use start_page anyway, it might not be a valid page. There's
a reason it does that "pfn_valid_within()", methinks.

Anyway, my current plan is to apply your "mm: fix warning in
__set_page_dirty_nobuffers" patch - even if it's just a harmless
WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally hit
his problem well before two hours, and it must be even longer now.

Ack on that plan?

        Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 23:13                             ` Linus Torvalds
@ 2012-06-04  0:46                               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-04  0:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Jones, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Minchan Kim,
	Rik van Riel, Andrew Morton, Cong Wang, Markus Trippelsdorf,
	linux-kernel, linux-mm, kosaki.motohiro

(6/3/12 7:13 PM), Linus Torvalds wrote:
> On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins<hughd@google.com>  wrote:
>>
>> But another strike against that commit: I tried fixing it up to use
>> start_page instead of page at the end, with the worrying but safer
>> locking I suggested at first, with a count of how many times it went
>> there, and how many times it succeeded.
>
> You can't use start_page anyway, it might not be a valid page. There's
> a reason it does that "pfn_valid_within()", methinks.

Right. ia64 has a strange^H^H^H^H optimized pfn_valid and we need to take care of it.
(btw, I don't understand why mips may enable CONFIG_HOLES_IN_ZONE, since mips doesn't
have a custom pfn_valid)


> Anyway, my current plan is to apply your "mm: fix warning in
> __set_page_dirty_nobuffers" patch - even if it's just a harmless
> WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally hit
> his problem well before two hours, and it must be even longer now.
>
> Ack on that plan?

+1.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-02  4:40               ` Hugh Dickins
@ 2012-06-04  1:10                   ` Minchan Kim
  2012-06-02  7:17                   ` Markus Trippelsdorf
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Minchan Kim @ 2012-06-04  1:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On 06/02/2012 01:40 PM, Hugh Dickins wrote:

> On Fri, 1 Jun 2012, Linus Torvalds wrote:
>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>>>
>>> +       spin_lock_irqsave(&zone->lock, flags);
>>>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>>>                                                                  page++) {
>>
>> So holding the spinlock (and disabling irqs!) over the whole loop
>> sounds horrible.
> 
> There looks to be a pretty similar loop inside move_freepages_block(),
> which is the part which I believe really needs the lock - it's moving
> free pages from one lru to another.
> 
>>
>> At the same time, the iterators don't seem to require the spinlock, so
>> it should be possible to just move the lock into the loop, no?
> 
> Move the lock after the loop, I think you meant.
> 
> I put the lock before the loop because it's deciding whether it can
> usefully proceed, and then proceeding: I was thinking that the lock
> would stabilize the conditions that it bases that decision on.


We do it in two phases.
In the first phase we don't need the lock, because the check doesn't need to be exact.
In the second phase, where we really move pages, we need the lock, and the code already holds it.

ret = suitable_migration_target(page, cc);
..
..
spin_lock_irqsave(&zone->lock, flags);
ret = suitable_migration_target(page, cc); 

So you shouldn't put the lock in the loop.
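
The check-then-recheck pattern described here can be sketched in userspace C as below. This is only a rough illustration with a pthread mutex standing in for zone->lock; struct toy_zone, looks_suitable() and take_pages() are hypothetical names, not kernel code. The unlocked test is a cheap hint that may be stale, and the decision that matters is repeated once the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zone: a lock plus the state it protects. */
struct toy_zone {
	pthread_mutex_t lock;
	atomic_int free_pages;	/* written only with the lock held */
};

/* Cheap test, usable with or without the lock. */
static bool looks_suitable(struct toy_zone *z, int need)
{
	return atomic_load(&z->free_pages) >= need;
}

/* Phase 1: unlocked hint. Phase 2: recheck under the lock, then act. */
static bool take_pages(struct toy_zone *z, int need)
{
	bool ok;

	assert(need > 0);
	if (!looks_suitable(z, need))	/* may be stale, allowed to be wrong */
		return false;

	pthread_mutex_lock(&z->lock);
	ok = looks_suitable(z, need);	/* authoritative: state cannot change now */
	if (ok)
		atomic_fetch_sub(&z->free_pages, need);
	pthread_mutex_unlock(&z->lock);
	return ok;
}
```

The lock is held only around the recheck and the update, so the common "not suitable" case never touches it.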

> 
> But it certainly does not stabilize all of them (most obviously not
> PageLRU), so I'm guessing that this is a best-effort decision which
> can safely go wrong some of the time.

Right.

> 
> In which case, yes, much better to follow your suggestion, and hold
> the lock (with irqs disabled) for only half the time.
> 
> Similarly untested patch below.
> 
> But I'm entirely unfamiliar with this code: best Cc people more familiar
> with it.  Does this addition of locking to rescue_unmovable_pageblock()
> look correct to you, and do you think it has a good chance of fixing the


No. I think we need to use start_page instead of page, and
we need the last page of the pageblock to check for a zone crossing, not the first page of the next pageblock.

I should have reviewed more carefully. :(

barrios@bbox:~/linux-2.6$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 4ac338a..b3fcc4b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
 
        pfn = page_to_pfn(page);
        start_pfn = pfn & ~(pageblock_nr_pages - 1);
-       end_pfn = start_pfn + pageblock_nr_pages;
+       end_pfn = start_pfn + pageblock_nr_pages - 1;
 
        start_page = pfn_to_page(start_pfn);
        end_page = pfn_to_page(end_pfn);
@@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
        if (page_zone(start_page) != page_zone(end_page))
                return false;
 
-       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
+       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
                                                                  page++) {
                if (!pfn_valid_within(pfn))
                        continue;
@@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
                return false;
        }
 
-       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
+       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
        return true;
 }
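
The end_pfn change in the diff above is about inclusive versus exclusive block ends. A small userspace sketch of the arithmetic follows; it is only an illustration, with block_pages standing in for pageblock_nr_pages, and struct pfn_range, block_of(), toy_zone_of() and block_in_one_zone() are all hypothetical helpers. With an exclusive end, a block that finishes exactly at a zone boundary would be compared against the first pfn of the next zone and wrongly rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: a pageblock as an inclusive pfn range. */
struct pfn_range {
	unsigned long first;	/* first pfn in the block */
	unsigned long last;	/* last pfn in the block, inclusive */
};

static struct pfn_range block_of(unsigned long pfn, unsigned long block_pages)
{
	struct pfn_range r;

	/* The mask trick only works for power-of-two block sizes. */
	assert(block_pages != 0 && (block_pages & (block_pages - 1)) == 0);
	r.first = pfn & ~(block_pages - 1);
	r.last = r.first + block_pages - 1;
	return r;
}

/* Toy zone lookup: zone 0 below the boundary pfn, zone 1 at or above it. */
static int toy_zone_of(unsigned long pfn, unsigned long boundary)
{
	return pfn >= boundary;
}

/* True if the whole block containing pfn lies inside a single toy zone. */
static bool block_in_one_zone(unsigned long pfn, unsigned long block_pages,
			      unsigned long boundary)
{
	struct pfn_range r = block_of(pfn, block_pages);

	return toy_zone_of(r.first, boundary) == toy_zone_of(r.last, boundary);
}
```

For example, with 512-page blocks, pfn 1500 falls in the block 1024..1535; if the toy zone boundary is at pfn 1536, the inclusive check accepts the block, whereas comparing against pfn 1536 would not.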


Hugh, thanks for looking at this.

> move_freepages_block() list debug warnings which Dave has been reporting
> (in this and in another thread)?
> 
> (Although there's still something of a mystery in where Dave's bisection
> appeared to converge, our best assumption at present is that one of my
> tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings,
> and I need to send a finalized patch to fix that later.
> 
> I'm guessing that the few people who see the warning are those running
> new systemd distros, and that systemd is indeed now making use of the
> fallocate support we added into tmpfs for it.)
> 
> Hugh
> 
> --- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
> +++ linux/mm/compaction.c	2012-06-01 20:59:56.840204915 -0700
> @@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
>  {
>  	unsigned long pfn, start_pfn, end_pfn;
>  	struct page *start_page, *end_page;
> +	struct zone *zone;
> +	unsigned long flags;
>  
>  	pfn = page_to_pfn(page);
>  	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> @@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
>  	end_page = pfn_to_page(end_pfn);
>  
>  	/* Do not deal with pageblocks that overlap zones */
> -	if (page_zone(start_page) != page_zone(end_page))
> +	zone = page_zone(start_page);
> +	if (zone != page_zone(end_page))
>  		return false;
>  
>  	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> @@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
>  		return false;
>  	}
>  
> +	spin_lock_irqsave(&zone->lock, flags);
>  	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> -	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +	move_freepages_block(zone, page, MIGRATE_MOVABLE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>  	return true;
>  }
>  



-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
@ 2012-06-04  1:10                   ` Minchan Kim
  0 siblings, 0 replies; 87+ messages in thread
From: Minchan Kim @ 2012-06-04  1:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On 06/02/2012 01:40 PM, Hugh Dickins wrote:

> On Fri, 1 Jun 2012, Linus Torvalds wrote:
>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>>>
>>> +       spin_lock_irqsave(&zone->lock, flags);
>>>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>>>                                                                  page++) {
>>
>> So holding the spinlock (and disabling irqs!) over the whole loop
>> sounds horrible.
> 
> There looks to be a pretty similar loop inside move_freepages_block(),
> which is the part which I believe really needs the lock - it's moving
> free pages from one lru to another.
> 
>>
>> At the same time, the iterators don't seem to require the spinlock, so
>> it should be possible to just move the lock into the loop, no?
> 
> Move the lock after the loop, I think you meant.
> 
> I put the lock before the loop because it's deciding whether it can
> usefully proceed, and then proceeding: I was thinking that the lock
> would stabilize the conditions that it bases that decision on.


We do it with two phase.
In first phase, we don't need lock because we don't need to be exact.
In second phase where move pages really, we need a lock so we already hold it.

ret = suitable_migration_target(page, cc);
..
..
spin_lock_irqsave(&zone->lock, flags);
ret = suitable_migration_target(page, cc); 

So you shouldn't put the lock in loop.

> 
> But it certainly does not stabilize all of them (most obviously not
> PageLRU), so I'm guesssing that this is a best-effort decision which

> can safely go wrong some of the time.

Right.

> 
> In which case, yes, much better to follow your suggestion, and hold
> the lock (with irqs disabled) for only half the time.
> 
> Similarly untested patch below.
> 
> But I'm entirely unfamiliar with this code: best Cc people more familiar
> with it.  Does this addition of locking to rescue_unmovable_pageblock()
> look correct to you, and do you think it has a good chance of fixing the


No.I think we need to use start_page instead of page and
we need a last page of page block to check cross-over zones, not first page in next page block.

I should have reviewed more carefully. :(

barrios@bbox:~/linux-2.6$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 4ac338a..b3fcc4b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
 
        pfn = page_to_pfn(page);
        start_pfn = pfn & ~(pageblock_nr_pages - 1);
-       end_pfn = start_pfn + pageblock_nr_pages;
+       end_pfn = start_pfn + pageblock_nr_pages - 1;
 
        start_page = pfn_to_page(start_pfn);
        end_page = pfn_to_page(end_pfn);
@@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
        if (page_zone(start_page) != page_zone(end_page))
                return false;
 
-       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
+       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
                                                                  page++) {
                if (!pfn_valid_within(pfn))
                        continue;
@@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
                return false;
        }
 
-       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
+       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
        return true;
 }


Hugh, thanks for looking this.

> move_freepages_block() list debug warnings which Dave has been reporting
> (in this and in another thread)?
> 
> (Although there's still something of a mystery in where Dave's bisection
> appeared to converge, our best assumption at present is that one of my
> tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings,
> and I need to send a finalized patch to fix that later.
> 
> I'm guessing that the few people who see the warning are those running
> new systemd distros, and that systemd is indeed now making use of the
> fallocate support we added into tmpfs for it.)
> 
> Hugh
> 
> --- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
> +++ linux/mm/compaction.c	2012-06-01 20:59:56.840204915 -0700
> @@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
>  {
>  	unsigned long pfn, start_pfn, end_pfn;
>  	struct page *start_page, *end_page;
> +	struct zone *zone;
> +	unsigned long flags;
>  
>  	pfn = page_to_pfn(page);
>  	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> @@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
>  	end_page = pfn_to_page(end_pfn);
>  
>  	/* Do not deal with pageblocks that overlap zones */
> -	if (page_zone(start_page) != page_zone(end_page))
> +	zone = page_zone(start_page);
> +	if (zone != page_zone(end_page))
>  		return false;
>  
>  	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> @@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
>  		return false;
>  	}
>  
> +	spin_lock_irqsave(&zone->lock, flags);
>  	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> -	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +	move_freepages_block(zone, page, MIGRATE_MOVABLE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>  	return true;
>  }
>  



-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 23:13                             ` Linus Torvalds
@ 2012-06-04  1:18                               ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-04  1:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Sun, 3 Jun 2012, Linus Torvalds wrote:
> On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > But another strike against that commit: I tried fixing it up to use
> > start_page instead of page at the end, with the worrying but safer
> > locking I suggested at first, with a count of how many times it went
> > there, and how many times it succeeded.
> 
> You can't use start_page anyway, it might not be a valid page. There's
> a reason it does that "pfn_valid_within()", methinks.

You wouldn't want me to say that I think you're right,
it would impudently suggest that I might conceive of you being wrong.
I sigh for your heavy burden.

> 
> Anyway, my current plan is to apply your "mm: fix warning in
> __set_page_dirty_nobuffers" patch - even if it's just a harmless
> WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally
> hit his problem much before two hours, and it must be even longer now.
> 
> Ack on that plan?

Sure, ack from me on that plan.

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-03 23:13                             ` Linus Torvalds
@ 2012-06-04  1:21                               ` Minchan Kim
  -1 siblings, 0 replies; 87+ messages in thread
From: Minchan Kim @ 2012-06-04  1:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Jones, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On 06/04/2012 08:13 AM, Linus Torvalds wrote:

> On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>>
>> But another strike against that commit: I tried fixing it up to use
>> start_page instead of page at the end, with the worrying but safer
>> locking I suggested at first, with a count of how many times it went
>> there, and how many times it succeeded.
> 
> You can't use start_page anyway, it might not be a valid page. There's
> a reason it does that "pfn_valid_within()", methinks.


Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock.
We make sure it's valid in isolate_freepages. So how about this?

barrios@bbox:~/linux-2.6$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 4ac338a..7459ab5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 static bool rescue_unmovable_pageblock(struct page *page)
 {
        unsigned long pfn, start_pfn, end_pfn;
-       struct page *start_page, *end_page;
+       struct page *start_page, *end_page, *cursor_page;
 
        pfn = page_to_pfn(page);
        start_pfn = pfn & ~(pageblock_nr_pages - 1);
-       end_pfn = start_pfn + pageblock_nr_pages;
+       end_pfn = start_pfn + pageblock_nr_pages - 1;
 
        start_page = pfn_to_page(start_pfn);
        end_page = pfn_to_page(end_pfn);
@@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
        if (page_zone(start_page) != page_zone(end_page))
                return false;
 
-       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
-                                                                 page++) {
+       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
+                                                                 cursor_page++) {
                if (!pfn_valid_within(pfn))
                        continue;
 
-               if (PageBuddy(page)) {
-                       int order = page_order(page);
+               if (PageBuddy(cursor_page)) {
+                       int order = page_order(cursor_page);
 
                        pfn += (1 << order) - 1;
-                       page += (1 << order) - 1;
+                       cursor_page += (1 << order) - 1;
 
                        continue;
-               } else if (page_count(page) == 0 || PageLRU(page))
+               } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page))
                        continue;
 
                return false;


> 
> Anyway, my current plan is to apply your "mm: fix warning in
> __set_page_dirty_nobuffers" patch - even if it's just a harmless
> WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally
> hit his problem much before two hours, and it must be even longer now.
> 
> Ack on that plan?


No objection.
The patch wasn't a bug fix, and even the test workload was very theoretical.

> 
>         Linus
> 



-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  1:21                               ` Minchan Kim
@ 2012-06-04  1:26                                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-04  1:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linus Torvalds, Hugh Dickins, Dave Jones,
	Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Rik van Riel, Andrew Morton, Cong Wang,
	Markus Trippelsdorf, linux-kernel, linux-mm

> Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock.
> We make sure it's valid in isolate_freepages. So how about this?
>
> barrios@bbox:~/linux-2.6$ git diff
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4ac338a..7459ab5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  static bool rescue_unmovable_pageblock(struct page *page)
>  {
>        unsigned long pfn, start_pfn, end_pfn;
> -       struct page *start_page, *end_page;
> +       struct page *start_page, *end_page, *cursor_page;
>
>        pfn = page_to_pfn(page);
>        start_pfn = pfn & ~(pageblock_nr_pages - 1);
> -       end_pfn = start_pfn + pageblock_nr_pages;
> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
>
>        start_page = pfn_to_page(start_pfn);
>        end_page = pfn_to_page(end_pfn);
> @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
>        if (page_zone(start_page) != page_zone(end_page))
>                return false;
>
> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> -                                                                 page++) {
> +       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
> +                                                                 cursor_page++) {
>                if (!pfn_valid_within(pfn))
>                        continue;

I guess page_zone() should be used only after pfn_valid_within(). Why can
we assume that an invalid pfn returns the correct zone?
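
A minimal userspace sketch of the ordering asked for above, not kernel
code. It assumes a toy page descriptor with an explicit validity flag;
struct model_page, its fields, and block_in_single_zone() are
hypothetical names invented for this illustration. The zone is read
only from entries that have already passed the validity check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page descriptor; not the kernel's struct page. */
struct model_page {
	bool valid;	/* models the result of a pfn validity check */
	int zone_id;	/* meaningful only when valid is true */
};

/*
 * Return true when at least one entry is valid and every valid entry
 * belongs to the same zone. zone_id is never read for invalid entries.
 */
static bool block_in_single_zone(const struct model_page *pages, size_t n)
{
	bool seen = false;
	int zone = 0;
	size_t i;

	assert(pages != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!pages[i].valid)
			continue;	/* validity check comes first */
		if (!seen) {
			zone = pages[i].zone_id;
			seen = true;
		} else if (pages[i].zone_id != zone) {
			return false;
		}
	}
	return seen;
}
```

In this model a block whose first or last entry is invalid is still
judged only by its valid entries, so a garbage zone_id in a hole
cannot influence the result.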


> -               if (PageBuddy(page)) {
> -                       int order = page_order(page);
> +               if (PageBuddy(cursor_page)) {
> +                       int order = page_order(cursor_page);
>
>                        pfn += (1 << order) - 1;
> -                       page += (1 << order) - 1;
> +                       cursor_page += (1 << order) - 1;
>
>                        continue;
> -               } else if (page_count(page) == 0 || PageLRU(page))
> +               } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page))
>                        continue;
>
>                return false;

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  1:10                   ` Minchan Kim
@ 2012-06-04  1:41                     ` Hugh Dickins
  -1 siblings, 0 replies; 87+ messages in thread
From: Hugh Dickins @ 2012-06-04  1:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On Mon, 4 Jun 2012, Minchan Kim wrote:
> On 06/02/2012 01:40 PM, Hugh Dickins wrote:
> 
> > On Fri, 1 Jun 2012, Linus Torvalds wrote:
> >> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
> >>>
> >>> +       spin_lock_irqsave(&zone->lock, flags);
> >>>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> >>>                                                                  page++) {
> >>
> >> So holding the spinlock (and disabling irqs!) over the whole loop
> >> sounds horrible.
> > 
> > There looks to be a pretty similar loop inside move_freepages_block(),
> > which is the part which I believe really needs the lock - it's moving
> > free pages from one lru to another.
> > 
> >>
> >> At the same time, the iterators don't seem to require the spinlock, so
> >> it should be possible to just move the lock into the loop, no?
> > 
> > Move the lock after the loop, I think you meant.
> > 
> > I put the lock before the loop because it's deciding whether it can
> > usefully proceed, and then proceeding: I was thinking that the lock
> > would stabilize the conditions that it bases that decision on.
> 
> 
> We do it in two phases.
> In the first phase, we don't need the lock because we don't need to be exact.
> In the second phase, where we really move pages, we need a lock, so we already hold it.

No, see Linus's point elsewhere in this thread.

To spell it out further, page_order(page) uses page_private(page),
and you've no idea what someone might put into page_private(page)
once it's no longer PageBuddy but perhaps allocated to a user.

So the unlocked advancement by page_order(page) may even take you
way out of this or any pageblock.

Linus was suggesting to take and drop the lock around that little
block each time.  Maybe.  I'm wary, I don't pretend to have thought
it through (nor shall further).
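
To make the hazard above concrete, here is a minimal userspace sketch in
C, not kernel code. It models the cursor advance with plain integers.
MODEL_MAX_ORDER and both helper functions are hypothetical names
invented for this illustration, and the value 11 is an assumption. The
clamped variant is one possible defensive approach; it is not what was
proposed in this thread, where the suggestion was to take zone->lock
around the block instead.

```c
#include <assert.h>

/* Assumed bound for illustration: orders 0..10 are treated as plausible. */
#define MODEL_MAX_ORDER 11UL

/* Advance past a free chunk, trusting the order value completely. */
static unsigned long skip_buddy_unchecked(unsigned long pfn,
					  unsigned long order)
{
	return pfn + (1UL << order);
}

/*
 * Advance past a free chunk, but never beyond the pfn that follows
 * last_pfn. An implausible order is treated as a single page.
 */
static unsigned long skip_buddy_clamped(unsigned long pfn,
					unsigned long order,
					unsigned long last_pfn)
{
	unsigned long next;

	assert(pfn <= last_pfn);

	if (order >= MODEL_MAX_ORDER)
		return pfn + 1;	/* stale or garbage value */

	next = pfn + (1UL << order);
	return next > last_pfn + 1 ? last_pfn + 1 : next;
}
```

With a block covering pfns 512..1023, a stale order of 20 sends the
unchecked cursor far past the block, while the clamped version moves on
by one page. Clamping only bounds the walk; it does not make the
decision itself exact.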

> 
> ret = suitable_migration_target(page, cc);
> ..
> ..
> spin_lock_irqsave(&zone->lock, flags);
> ret = suitable_migration_target(page, cc); 
> 
> So you shouldn't put the lock in loop.
> 
> > 
> > But it certainly does not stabilize all of them (most obviously not
> > PageLRU), so I'm guessing that this is a best-effort decision which
> > can safely go wrong some of the time.
> 
> Right.
> 
> > 
> > In which case, yes, much better to follow your suggestion, and hold
> > the lock (with irqs disabled) for only half the time.
> > 
> > Similarly untested patch below.
> > 
> > But I'm entirely unfamiliar with this code: best Cc people more familiar
> > with it.  Does this addition of locking to rescue_unmovable_pageblock()
> > look correct to you, and do you think it has a good chance of fixing the
> 
> 
> No. I think we need to use start_page instead of page and

I thought so, but Linus points out why not (pfn_valid_within).

> we need the last page of the pageblock to check for crossing zones,
> not the first page of the next pageblock.

Yes, that's the off-by-one I was alluding to.
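
The off-by-one can be checked with a small userspace sketch, assuming a
pageblock of 512 pages. PAGEBLOCK_NR_PAGES, struct pfn_range, and the
helpers below are hypothetical names for illustration; the kernel's
pageblock_nr_pages depends on configuration.

```c
#include <assert.h>

/* Assumed pageblock size for illustration; must be a power of two. */
#define PAGEBLOCK_NR_PAGES 512UL

struct pfn_range {
	unsigned long first;	/* first pfn in the block */
	unsigned long last;	/* last pfn in the block, inclusive */
};

/* Return the inclusive pfn range of the pageblock containing pfn. */
static struct pfn_range pageblock_range(unsigned long pfn)
{
	struct pfn_range r;

	assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0);

	r.first = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	r.last = r.first + PAGEBLOCK_NR_PAGES - 1;	/* not first + size */
	return r;
}

/* Count the pfns visited by an inclusive walk over the block. */
static unsigned long pfns_walked(struct pfn_range r)
{
	unsigned long pfn, n = 0;

	for (pfn = r.first; pfn <= r.last; pfn++)
		n++;
	return n;
}
```

For pfn 1000 this gives the range 512..1023. Without the "- 1" the end
would be 1024, which is the first pfn of the next pageblock and may
belong to a different zone.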

> 
> I should have reviewed more carefully. :(
> 
> barrios@bbox:~/linux-2.6$ git diff
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4ac338a..b3fcc4b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>  
>         pfn = page_to_pfn(page);
>         start_pfn = pfn & ~(pageblock_nr_pages - 1);
> -       end_pfn = start_pfn + pageblock_nr_pages;
> +       end_pfn = start_pfn + pageblock_nr_pages - 1;

Yes.

>  
>         start_page = pfn_to_page(start_pfn);
>         end_page = pfn_to_page(end_pfn);
> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>         if (page_zone(start_page) != page_zone(end_page))
>                 return false;
>  
> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> +       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
>                                                                   page++) {

Yes.

>                 if (!pfn_valid_within(pfn))
>                         continue;
> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
>                 return false;
>         }
>  
> -       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> -       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
> +       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);

No.  I guess we can assume the incoming page was valid (fair?),
so should still use that, but something else for the loop iterator.

And you seem to have missed out all the locking needed.

>         return true;
>  }

So Nack to that on several grounds.

And I'd like to hear evidence that this really is useful code,
justifying the locking and interrupt-disabling which would have to
be added.  My 0 out of 25000 was not reassuring.  Nor the original
test results, when it was doing completely the wrong thing unnoticed.

Hugh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  1:41                     ` Hugh Dickins
@ 2012-06-04  1:47                       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-04  1:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Minchan Kim, Linus Torvalds, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf,
	linux-kernel, linux-mm, kosaki.motohiro

>> -       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> -       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
>> +       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
>> +       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
>
> No.  I guess we can assume the incoming page was valid (fair?),
> so should still use that, but something else for the loop iterator.

Fair. The passed page is always valid.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  1:41                     ` Hugh Dickins
@ 2012-06-04  2:28                       ` Minchan Kim
  -1 siblings, 0 replies; 87+ messages in thread
From: Minchan Kim @ 2012-06-04  2:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Bartlomiej Zolnierkiewicz, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel,
	linux-mm

On 06/04/2012 10:41 AM, Hugh Dickins wrote:

> On Mon, 4 Jun 2012, Minchan Kim wrote:
>> On 06/02/2012 01:40 PM, Hugh Dickins wrote:
>>
>>> On Fri, 1 Jun 2012, Linus Torvalds wrote:
>>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
>>>>>
>>>>> +       spin_lock_irqsave(&zone->lock, flags);
>>>>>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>>>>>                                                                  page++) {
>>>>
>>>> So holding the spinlock (and disabling irqs!) over the whole loop
>>>> sounds horrible.
>>>
>>> There looks to be a pretty similar loop inside move_freepages_block(),
>>> which is the part which I believe really needs the lock - it's moving
>>> free pages from one lru to another.
>>>
>>>>
>>>> At the same time, the iterators don't seem to require the spinlock, so
>>>> it should be possible to just move the lock into the loop, no?
>>>
>>> Move the lock after the loop, I think you meant.
>>>
>>> I put the lock before the loop because it's deciding whether it can
>>> usefully proceed, and then proceeding: I was thinking that the lock
>>> would stabilize the conditions that it bases that decision on.
>>
>>
>> We do it in two phases.
>> In the first phase we don't need the lock because we don't need to be exact.
>> In the second phase, where we really move pages, we need the lock, so we already hold it.
> 
> No, see Linus's point elsewhere in this thread.
> 
> To spell it out further, page_order(page) uses page_private(page),
> and you've no idea what someone might put into page_private(page)
> once it's no longer PageBuddy but perhaps allocated to a user.
> 
> So the unlocked advancement by page_order(page) may even take you
> way out of this or any pageblock.
> 
> Linus was suggesting to take and drop the lock around that little
> block each time.  Maybe.  I'm wary, I don't pretend to have thought
> it through (nor shall further).


Right.
I got confused because suitable_migration_target() itself called
rescue_unmovable_pageblock(). I don't want that.
I would like to separate the test, which only checks whether the pageblock is
migratable, from the work, which actually does the rescue.
So I think something like the following would be better:

if (!suitable_migration_target())
	continue;

spin_lock_irqsave(&zone->lock, flags);
ret = suitable_migration_target();	/* recheck under zone->lock */
if (ret) {
	if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK)
		rescue_unmovable_pageblock();
	isolate_freepages_block();
}
spin_unlock_irqrestore(&zone->lock, flags);
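
The check-then-recheck idea above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the kernel code: struct
block, check_target() and try_isolate() are hypothetical names, a C11
atomic_flag stands in for zone->lock, and the "rescue" is reduced to
flipping a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the objects discussed in this thread. */
enum target { NOT_SUITABLE = 0, SUITABLE, CAN_MAKE_MOVABLE };

struct block {
	atomic_flag lock;       /* plays the role of zone->lock */
	atomic_int free_pages;  /* may change while we are not holding the lock */
	atomic_bool movable;
};

static void block_lock(struct block *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;               /* spin until the flag was previously clear */
}

static void block_unlock(struct block *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Pure test: reads state and changes nothing. */
static enum target check_target(struct block *b)
{
	if (atomic_load(&b->free_pages) == 0)
		return NOT_SUITABLE;
	return atomic_load(&b->movable) ? SUITABLE : CAN_MAKE_MOVABLE;
}

/* Returns the number of pages taken from the block (0 if it was skipped). */
static int try_isolate(struct block *b)
{
	enum target ret;
	int isolated = 0;

	assert(b != NULL);

	/* Phase 1: cheap unlocked hint; the answer may already be stale. */
	if (check_target(b) == NOT_SUITABLE)
		return 0;

	/* Phase 2: repeat the test under the lock before acting on it. */
	block_lock(b);
	ret = check_target(b);
	if (ret != NOT_SUITABLE) {
		if (ret == CAN_MAKE_MOVABLE)
			atomic_store(&b->movable, true);   /* the "rescue" step */
		isolated = atomic_exchange(&b->free_pages, 0);
	}
	block_unlock(b);
	return isolated;
}
```

The unlocked call is only allowed to say "skip this block"; every state
change happens after the second call, with the lock held.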

> 
>>
>> ret = suitable_migration_target(page, cc);
>> ..
>> ..
>> spin_lock_irqsave(&zone->lock, flags);
>> ret = suitable_migration_target(page, cc); 
>>
>> So you shouldn't put the lock in loop.
>>
>>>
>>> But it certainly does not stabilize all of them (most obviously not
>>> PageLRU), so I'm guessing that this is a best-effort decision which
>>
>>> can safely go wrong some of the time.
>>
>> Right.
>>
>>>
>>> In which case, yes, much better to follow your suggestion, and hold
>>> the lock (with irqs disabled) for only half the time.
>>>
>>> Similarly untested patch below.
>>>
>>> But I'm entirely unfamiliar with this code: best Cc people more familiar
>>> with it.  Does this addition of locking to rescue_unmovable_pageblock()
>>> look correct to you, and do you think it has a good chance of fixing the
>>
>>
>> No. I think we need to use start_page instead of page and
> 
> I thought so, but Linus points out why not (pfn_valid_within).
> 
>> we need a last page of page block to check cross-over zones,
>> not first page in next page block.
> 
> Yes, that's the off-by-one I was alluding to.
> 
>>
>> I should have reviewed more carefully. :(
>>
>> barrios@bbox:~/linux-2.6$ git diff
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4ac338a..b3fcc4b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>  
>>         pfn = page_to_pfn(page);
>>         start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> -       end_pfn = start_pfn + pageblock_nr_pages;
>> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
> 
> Yes.
> 
>>  
>>         start_page = pfn_to_page(start_pfn);
>>         end_page = pfn_to_page(end_pfn);
>> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>         if (page_zone(start_page) != page_zone(end_page))
>>                 return false;
>>  
>> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>> +       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
>>                                                                   page++) {
> 
> Yes.
> 
>>                 if (!pfn_valid_within(pfn))
>>                         continue;
>> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>                 return false;
>>         }
>>  
>> -       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> -       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
>> +       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
>> +       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
> 
> No.  I guess we can assume the incoming page was valid (fair?),
> so should still use that, but something else for the loop iterator.


That should be fair. I did it in the following mail.

> 
> And you seem to have missed out all the locking needed.
> 
>>         return true;
>>  }
> 
> So Nack to that on several grounds.
> 
> And I'd like to hear evidence that this really is useful code,
> justifying the locking and interrupt-disabling which would have to
> be added.  My 0 out of 25000 was not reassuring.  Nor the original
> test results, when it was doing completely the wrong thing unnoticed.


In the changelog, Bartlomiej said:

    My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
    131072 standard 4KiB pages in 'Normal' zone) is to:
    
    - allocate 120000 pages for kernel's usage
    - free every second page (60000 pages) of memory just allocated
    - allocate and use 60000 pages from user space
    - free remaining 60000 pages of kernel memory
      (now we have fragmented memory occupied mostly by user space pages)
    - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
    
    The results:
    - with compaction disabled I get 11 successful allocations
    - with compaction enabled - 14 successful allocations
    - with this patch I'm able to get all 100 successful allocations
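
The figures in the quoted test case can be checked with a few lines of C.
This is a sketch only: the helper names are hypothetical, and the 4 KiB
page size is the one stated in the changelog.

```c
#include <assert.h>

#define PAGE_SIZE_KIB 4UL   /* "standard 4KiB pages" from the changelog */

/* Number of base pages in a region of the given size in MiB. */
static unsigned long pages_in_mib(unsigned long mib)
{
	return mib * 1024UL / PAGE_SIZE_KIB;
}

/* Size in KiB of one order-N allocation (2^N contiguous base pages). */
static unsigned long order_size_kib(unsigned int order)
{
	assert(order < 32);
	return (1UL << order) * PAGE_SIZE_KIB;
}

/* Base pages needed for `count` allocations of the given order. */
static unsigned long pages_for_allocs(unsigned long count, unsigned int order)
{
	assert(order < 32);
	return count * (1UL << order);
}
```

With these, pages_in_mib(512) gives the 131072 pages of the 'Normal' zone,
order_size_kib(9) gives 2048 KiB, and pages_for_allocs(100, 9) gives 51200
base pages, fewer than the 60000 pages freed in the last step of the test.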

I think the above workload is really artificial and theoretical, so I didn't like
this patch, but Mel seems to like it. :(

Quote from Mel
" Ok, that is indeed an adverse workload that the current system will not
properly deal with. I think you are right to try fixing this but may need
a different approach that takes the cost out of the allocation/free path
and moves it the compaction path."

We can fix this patch so that it works, but we at least need a justification for it.
Do we really need this patch for such an artificial workload?
What do you think?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  1:26                                 ` KOSAKI Motohiro
@ 2012-06-04  2:30                                   ` Minchan Kim
  -1 siblings, 0 replies; 87+ messages in thread
From: Minchan Kim @ 2012-06-04  2:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Linus Torvalds, Hugh Dickins, Dave Jones,
	Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Rik van Riel, Andrew Morton, Cong Wang,
	Markus Trippelsdorf, linux-kernel, linux-mm

On 06/04/2012 10:26 AM, KOSAKI Motohiro wrote:

>> Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock.
>> We make sure it's valid in isolate_freepages. So how about this?
>>
>> barrios@bbox:~/linux-2.6$ git diff
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4ac338a..7459ab5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>  static bool rescue_unmovable_pageblock(struct page *page)
>>  {
>>        unsigned long pfn, start_pfn, end_pfn;
>> -       struct page *start_page, *end_page;
>> +       struct page *start_page, *end_page, *cursor_page;
>>
>>        pfn = page_to_pfn(page);
>>        start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> -       end_pfn = start_pfn + pageblock_nr_pages;
>> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
>>
>>        start_page = pfn_to_page(start_pfn);
>>        end_page = pfn_to_page(end_pfn);
>> @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>        if (page_zone(start_page) != page_zone(end_page))
>>                return false;
>>
>> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>> -                                                                 page++) {
>> +       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
>> +                                                                 cursor_page++) {
>>                if (!pfn_valid_within(pfn))
>>                        continue;
> 
> I guess page_zone() should be used after pfn_valid_within(). Why can we
> assume that an invalid pfn returns the correct zone?


Right you are. We can't be sure of that in the CONFIG_HOLES_IN_ZONE case.
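
The bounds and the iteration discussed above can be sketched in userspace C.
This is a sketch under assumptions, not the kernel code: PAGEBLOCK_NR_PAGES
is fixed at 512 here, and the valid[] array is a hypothetical stand-in for
pfn_valid_within().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGEBLOCK_NR_PAGES 512UL   /* assumed value; must be a power of two */

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn inside the same pageblock (inclusive, hence the "- 1"). */
static unsigned long pageblock_last(unsigned long pfn)
{
	return pageblock_start(pfn) + PAGEBLOCK_NR_PAGES - 1;
}

/*
 * Walk the pageblock that contains pfn with a separate cursor and count the
 * valid pfns. valid[] has n entries indexed by pfn and is a hypothetical
 * replacement for pfn_valid_within().
 */
static unsigned long count_valid_in_block(unsigned long pfn,
					  const bool *valid, size_t n)
{
	unsigned long cursor, count = 0;
	unsigned long last = pageblock_last(pfn);

	assert(valid != NULL);

	for (cursor = pageblock_start(pfn); cursor <= last; cursor++) {
		if (cursor >= n || !valid[cursor])
			continue;   /* hole: skip it before touching the page */
		count++;
	}
	return count;   /* the loop advanced only cursor; pfn is unchanged */
}
```

The validity test comes first in the loop body, so nothing is looked up for
a pfn that falls in a hole, and the inclusive "<=" matches the "- 1" in the
end computation.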


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  2:28                       ` Minchan Kim
@ 2012-06-04  4:21                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 87+ messages in thread
From: KOSAKI Motohiro @ 2012-06-04  4:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Linus Torvalds, Bartlomiej Zolnierkiewicz,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf,
	linux-kernel, linux-mm, kosaki.motohiro

> In the changelog, Bartlomiej said:
>
>      My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
>      131072 standard 4KiB pages in 'Normal' zone) is to:
>
>      - allocate 120000 pages for kernel's usage
>      - free every second page (60000 pages) of memory just allocated
>      - allocate and use 60000 pages from user space
>      - free remaining 60000 pages of kernel memory
>        (now we have fragmented memory occupied mostly by user space pages)
>      - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
>      The results:
>      - with compaction disabled I get 11 successful allocations
>      - with compaction enabled - 14 successful allocations
>      - with this patch I'm able to get all 100 successful allocations
>
> I think the above workload is really artificial and theoretical, so I didn't like
> this patch, but Mel seems to like it. :(
>
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it the compaction path."
>
> We can fix this patch so that it works, but we at least need a justification for it.
> Do we really need this patch for such an artificial workload?
> What do you think?

I'm OK with a resubmission, but please post it in a new thread.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
  2012-06-04  2:28                       ` Minchan Kim
@ 2012-06-04 13:37                         ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 87+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-04 13:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Linus Torvalds, Kyungmin Park, Marek Szyprowski,
	Mel Gorman, Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf, linux-kernel, linux-mm


Hi,

On Monday 04 June 2012 04:28:56 Minchan Kim wrote:
> On 06/04/2012 10:41 AM, Hugh Dickins wrote:
> 
> > On Mon, 4 Jun 2012, Minchan Kim wrote:
> >> On 06/02/2012 01:40 PM, Hugh Dickins wrote:
> >>
> >>> On Fri, 1 Jun 2012, Linus Torvalds wrote:
> >>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins <hughd@google.com> wrote:
> >>>>>
> >>>>> +       spin_lock_irqsave(&zone->lock, flags);
> >>>>>        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> >>>>>                                                                  page++) {
> >>>>
> >>>> So holding the spinlock (and disabling irqs!) over the whole loop
> >>>> sounds horrible.
> >>>
> >>> There looks to be a pretty similar loop inside move_freepages_block(),
> >>> which is the part which I believe really needs the lock - it's moving
> >>> free pages from one lru to another.
> >>>
> >>>>
> >>>> At the same time, the iterators don't seem to require the spinlock, so
> >>>> it should be possible to just move the lock into the loop, no?
> >>>
> >>> Move the lock after the loop, I think you meant.
> >>>
> >>> I put the lock before the loop because it's deciding whether it can
> >>> usefully proceed, and then proceeding: I was thinking that the lock
> >>> would stabilize the conditions that it bases that decision on.
> >>
> >>
> >> We do it in two phases.
> >> In the first phase we don't need the lock because we don't need to be exact.
> >> In the second phase, where we really move pages, we need the lock, so we already hold it.
> > 
> > No, see Linus's point elsewhere in this thread.
> > 
> > To spell it out further, page_order(page) uses page_private(page),
> > and you've no idea what someone might put into page_private(page)
> > once it's no longer PageBuddy but perhaps allocated to a user.
> > 
> > So the unlocked advancement by page_order(page) may even take you
> > way out of this or any pageblock.
> > 
> > Linus was suggesting to take and drop the lock around that little
> > block each time.  Maybe.  I'm wary, I don't pretend to have thought
> > it through (nor shall further).
> 
> 
> Right.
> I got confused because suitable_migration_target() itself called
> rescue_unmovable_pageblock(). I don't want that.
> I would like to separate the test, which only checks whether the pageblock is
> migratable, from the work, which actually does the rescue.
> So I think something like the following would be better:
> 
> if (!suitable_migration_target())
> 	continue;
> 
> spin_lock_irqsave(&zone->lock, flags);
> ret = suitable_migration_target();	/* recheck under zone->lock */
> if (ret) {
> 	if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK)
> 		rescue_unmovable_pageblock();
> 	isolate_freepages_block();
> }
> spin_unlock_irqrestore(&zone->lock, flags);
> 
> > 
> >>
> >> ret = suitable_migration_target(page, cc);
> >> ..
> >> ..
> >> spin_lock_irqsave(&zone->lock, flags);
> >> ret = suitable_migration_target(page, cc); 
> >>
> >> So you shouldn't put the lock in loop.
> >>
> >>>
> >>> But it certainly does not stabilize all of them (most obviously not
> >>> PageLRU), so I'm guessing that this is a best-effort decision which
> >>
> >>> can safely go wrong some of the time.
> >>
> >> Right.
> >>
> >>>
> >>> In which case, yes, much better to follow your suggestion, and hold
> >>> the lock (with irqs disabled) for only half the time.
> >>>
> >>> Similarly untested patch below.
> >>>
> >>> But I'm entirely unfamiliar with this code: best Cc people more familiar
> >>> with it.  Does this addition of locking to rescue_unmovable_pageblock()
> >>> look correct to you, and do you think it has a good chance of fixing the
> >>
> >>
> >> No. I think we need to use start_page instead of page and
> > 
> > I thought so, but Linus points out why not (pfn_valid_within).
> > 
> >> we need a last page of page block to check cross-over zones,
> >> not first page in next page block.
> > 
> > Yes, that's the off-by-one I was alluding to.
> > 
> >>
> >> I should have reviewed more carefully. :(
> >>
> >> barrios@bbox:~/linux-2.6$ git diff
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 4ac338a..b3fcc4b 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>  
> >>         pfn = page_to_pfn(page);
> >>         start_pfn = pfn & ~(pageblock_nr_pages - 1);
> >> -       end_pfn = start_pfn + pageblock_nr_pages;
> >> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
> > 
> > Yes.
> > 
> >>  
> >>         start_page = pfn_to_page(start_pfn);
> >>         end_page = pfn_to_page(end_pfn);
> >> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>         if (page_zone(start_page) != page_zone(end_page))
> >>                 return false;
> >>  
> >> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> >> +       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
> >>                                                                   page++) {
> > 
> > Yes.
> > 
> >>                 if (!pfn_valid_within(pfn))
> >>                         continue;
> >> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>                 return false;
> >>         }
> >>  
> >> -       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> >> -       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> >> +       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
> >> +       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
> > 
> > No.  I guess we can assume the incoming page was valid (fair?),
> > so should still use that, but something else for the loop iterator.
> 
> 
> That should be fair. I did it in the following mail.
> 
> > 
> > And you seem to have missed out all the locking needed.
> > 
> >>         return true;
> >>  }
> > 
> > So Nack to that on several grounds.
> > 
> > And I'd like to hear evidence that this really is useful code,
> > justifying the locking and interrupt-disabling which would have to
> > be added.  My 0 out of 25000 was not reassuring.  Nor the original
> > test results, when it was doing completely the wrong thing unnoticed.
> 
> 
> In the changelog, Bartlomiej said:
> 
>     My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
>     131072 standard 4KiB pages in 'Normal' zone) is to:
>     
>     - allocate 120000 pages for kernel's usage
>     - free every second page (60000 pages) of memory just allocated
>     - allocate and use 60000 pages from user space
>     - free remaining 60000 pages of kernel memory
>       (now we have fragmented memory occupied mostly by user space pages)
>     - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>     
>     The results:
>     - with compaction disabled I get 11 successful allocations
>     - with compaction enabled - 14 successful allocations
>     - with this patch I'm able to get all 100 successful allocations
> 
> I think the above workload is really artificial and theoretical, so I didn't like
> this patch, but Mel seems to like it. :(
> 
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it to the compaction path."

Please note that the current patch is less intrusive than the original
version that Mel was talking about in the above quote (the cost is only in
the compaction path, which is a non-default one, and in an allocation
slow path).

> We can correct this patch to make it work, but we at least need a justification for it.
> Do we really need this patch for such an artificial workload?
> What do you think?

I would still like to get this patch included, since it helps with my
test case and does not add much code or complexity.  So far I have fixed
(all?) outstanding issues in the patch attached below, and I will post
the next combined version (v9) of the patch in a new thread.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center


- use the right page for pageblock conversion in rescue_unmovable_pageblock()
- split rescue_unmovable_pageblock() into can_rescue_unmovable_pageblock()
  and __rescue_unmovable_pageblock()
- add missing locking

---
 mm/compaction.c |   66 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 21 deletions(-)

Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-04 15:19:04.564467996 +0200
+++ b/mm/compaction.c	2012-06-04 15:19:15.700467901 +0200
@@ -362,50 +362,70 @@ isolate_migratepages_range(struct zone *
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 /*
- * Returns true if MIGRATE_UNMOVABLE pageblock was successfully
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
  * converted to MIGRATE_MOVABLE type, false otherwise.
  */
-static bool rescue_unmovable_pageblock(struct page *page)
+static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
 {
 	unsigned long pfn, start_pfn, end_pfn;
-	struct page *start_page, *end_page;
+	struct page *start_page, *end_page, *cursor_page;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
-	end_pfn = start_pfn + pageblock_nr_pages;
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
 
 	start_page = pfn_to_page(start_pfn);
 	end_page = pfn_to_page(end_pfn);
 
-	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
-		return false;
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+		struct zone *zone = page_zone(start_page);
+		unsigned long flags;
 
-	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
-								  page++) {
 		if (!pfn_valid_within(pfn))
 			continue;
 
-		if (PageBuddy(page)) {
-			int order = page_order(page);
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			return false;
+
+		if (!locked)
+			spin_lock_irqsave(&zone->lock, flags);
+
+		if (PageBuddy(cursor_page)) {
+			int order = page_order(cursor_page);
 
 			pfn += (1 << order) - 1;
-			page += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
 
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 			continue;
-		} else if (page_count(page) == 0 || PageLRU(page))
+		} else if (page_count(cursor_page) == 0 ||
+			   PageLRU(cursor_page)) {
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 			continue;
+		}
+
+		if (!locked)
+			spin_unlock_irqrestore(&zone->lock, flags);
 
 		return false;
 	}
 
+	return true;
+}
+
+void __rescue_unmovable_pageblock(struct page *page)
+{
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
-	return true;
 }
 
 enum smt_result {
 	GOOD_AS_MIGRATION_TARGET,
+	GOOD_CAN_RESCUE_UNMOVABLE_TARGET,
 	FAIL_UNMOVABLE_TARGET,
 	FAIL_BAD_TARGET,
 };
@@ -416,7 +436,7 @@ enum smt_result {
  * is within a MIGRATE_UNMOVABLE block, FAIL_BAD_TARGET otherwise.
  */
 static enum smt_result suitable_migration_target(struct page *page,
-				      struct compact_control *cc)
+				      struct compact_control *cc, bool locked)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
@@ -440,8 +460,8 @@ static enum smt_result suitable_migratio
 
 	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
 	    migratetype == MIGRATE_UNMOVABLE &&
-	    rescue_unmovable_pageblock(page))
-		return GOOD_AS_MIGRATION_TARGET;
+	    can_rescue_unmovable_pageblock(page, locked))
+		return GOOD_CAN_RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
 	return FAIL_BAD_TARGET;
@@ -509,8 +529,9 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Check the block is suitable for migration */
-		ret = suitable_migration_target(page, cc);
-		if (ret != GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != GOOD_AS_MIGRATION_TARGET &&
+		    ret != GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
 			if (ret == FAIL_UNMOVABLE_TARGET)
 				cc->nr_pageblocks_skipped++;
 			continue;
@@ -523,8 +544,11 @@ static void isolate_freepages(struct zon
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		ret = suitable_migration_target(page, cc);
-		if (ret == GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == GOOD_AS_MIGRATION_TARGET ||
+		    ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
+			if (ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET)
+				__rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2012-06-04 13:38 UTC | newest]

Thread overview: 87+ messages
2012-05-30 16:33 WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Dave Jones
2012-05-30 16:33 ` Dave Jones
2012-05-31  0:57 ` Dave Jones
2012-05-31  0:57   ` Dave Jones
2012-06-01  2:31   ` Dave Jones
2012-06-01  2:31     ` Dave Jones
2012-06-01  2:43     ` Linus Torvalds
2012-06-01  2:43       ` Linus Torvalds
2012-06-01 13:43       ` Dave Jones
2012-06-01 13:43         ` Dave Jones
2012-06-01  8:44     ` Hugh Dickins
2012-06-01  8:44       ` Hugh Dickins
2012-06-01  8:51       ` KOSAKI Motohiro
2012-06-01  8:51         ` KOSAKI Motohiro
2012-06-01  9:08         ` Hugh Dickins
2012-06-01  9:08           ` Hugh Dickins
2012-06-01  9:12           ` KOSAKI Motohiro
2012-06-01  9:12             ` KOSAKI Motohiro
2012-06-01 14:09       ` Dave Jones
2012-06-01 14:09         ` Dave Jones
2012-06-01 14:14       ` Dave Jones
2012-06-01 14:14         ` Dave Jones
2012-06-01 16:12       ` Dave Jones
2012-06-01 16:12         ` Dave Jones
2012-06-01 17:16         ` Dave Jones
2012-06-01 17:16           ` Dave Jones
2012-06-01 22:17           ` Hugh Dickins
2012-06-01 22:17             ` Hugh Dickins
2012-06-02  1:45             ` Linus Torvalds
2012-06-02  1:45               ` Linus Torvalds
2012-06-02  4:40               ` Hugh Dickins
2012-06-02  4:58                 ` Linus Torvalds
2012-06-02  4:58                   ` Linus Torvalds
2012-06-02  7:20                   ` Hugh Dickins
2012-06-02  7:20                     ` Hugh Dickins
2012-06-02  7:17                 ` Markus Trippelsdorf
2012-06-02  7:17                   ` Markus Trippelsdorf
2012-06-02  7:22                   ` Hugh Dickins
2012-06-02  7:22                     ` Hugh Dickins
2012-06-02  7:27                     ` [PATCH] mm: fix warning in __set_page_dirty_nobuffers Hugh Dickins
2012-06-02  7:27                       ` Hugh Dickins
2012-06-03 18:15                 ` WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Dave Jones
2012-06-03 18:15                   ` Dave Jones
2012-06-03 18:23                   ` Linus Torvalds
2012-06-03 18:23                     ` Linus Torvalds
2012-06-03 18:31                     ` Dave Jones
2012-06-03 18:31                       ` Dave Jones
2012-06-03 20:53                       ` Dave Jones
2012-06-03 20:53                         ` Dave Jones
2012-06-03 21:59                         ` Linus Torvalds
2012-06-03 21:59                           ` Linus Torvalds
2012-06-03 22:13                           ` Dave Jones
2012-06-03 22:13                             ` Dave Jones
2012-06-03 22:29                           ` Hugh Dickins
2012-06-03 22:29                             ` Hugh Dickins
2012-06-03 22:17                         ` Hugh Dickins
2012-06-03 22:17                           ` Hugh Dickins
2012-06-03 23:13                           ` Linus Torvalds
2012-06-03 23:13                             ` Linus Torvalds
2012-06-04  0:46                             ` KOSAKI Motohiro
2012-06-04  0:46                               ` KOSAKI Motohiro
2012-06-04  1:18                             ` Hugh Dickins
2012-06-04  1:18                               ` Hugh Dickins
2012-06-04  1:21                             ` Minchan Kim
2012-06-04  1:21                               ` Minchan Kim
2012-06-04  1:26                               ` KOSAKI Motohiro
2012-06-04  1:26                                 ` KOSAKI Motohiro
2012-06-04  2:30                                 ` Minchan Kim
2012-06-04  2:30                                   ` Minchan Kim
2012-06-04  1:10                 ` Minchan Kim
2012-06-04  1:10                   ` Minchan Kim
2012-06-04  1:41                   ` Hugh Dickins
2012-06-04  1:41                     ` Hugh Dickins
2012-06-04  1:47                     ` KOSAKI Motohiro
2012-06-04  1:47                       ` KOSAKI Motohiro
2012-06-04  2:28                     ` Minchan Kim
2012-06-04  2:28                       ` Minchan Kim
2012-06-04  4:21                       ` KOSAKI Motohiro
2012-06-04  4:21                         ` KOSAKI Motohiro
2012-06-04 13:37                       ` Bartlomiej Zolnierkiewicz
2012-06-04 13:37                         ` Bartlomiej Zolnierkiewicz
2012-06-01 16:16       ` Markus Trippelsdorf
2012-06-01 16:16         ` Markus Trippelsdorf
2012-06-01 16:28         ` Linus Torvalds
2012-06-01 16:28           ` Linus Torvalds
2012-06-01 16:39           ` Markus Trippelsdorf
2012-06-01 16:39             ` Markus Trippelsdorf
