linux-kernel.vger.kernel.org archive mirror
* [next-20160615] kernel BUG at mm/rmap.c:1251!
@ 2016-06-16  8:46 Sergey Senozhatsky
  2016-06-16  8:58 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2016-06-16  8:46 UTC
  To: Andrew Morton, Michal Hocko
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Minchan Kim,
	Stephen Rothwell, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

[..]
[  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
               next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
               prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
               pgoff 7f3576d58 file           (null) private_data           (null)
               flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
[  272.691793] ------------[ cut here ]------------
[  272.692820] kernel BUG at mm/rmap.c:1251!
[  272.693843] invalid opcode: 0000 [#1] PREEMPT SMP
[  272.694858] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic mousedev snd_hda_intel snd_hda_codec snd_hda_core coretemp hwmon snd_pcm r8169 snd_timer crc32c_intel snd mii i2c_i801 soundcore lpc_ich acpi_cpufreq mfd_core processor sch_fq_codel hid_generic usbhid hid sd_mod ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
[  272.697061] CPU: 2 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #493
[  272.699208] task: ffff88013332a980 ti: ffff880133348000 task.ti: ffff880133348000
[  272.700280] RIP: 0010:[<ffffffff810f67ad>]  [<ffffffff810f67ad>] page_add_new_anon_rmap+0x68/0x136
[  272.701359] RSP: 0000:ffff88013334bcd0  EFLAGS: 00010296
[  272.702427] RAX: 0000000000000149 RBX: ffffea0001978000 RCX: 0000000000000002
[  272.703498] RDX: ffff880137d10401 RSI: ffffffff81798adf RDI: 00000000ffffffff
[  272.704574] RBP: ffff88013334bcf0 R08: 0000000000000001 R09: 0000000000000000
[  272.705648] R10: ffff88013334bca0 R11: 00000000fffffffc R12: 0000000000000200
[  272.706714] R13: 00007f3577000000 R14: ffff8800b855a5a0 R15: ffff880000000000
[  272.707782] FS:  0000000000000000(0000) GS:ffff880137d00000(0000) knlGS:0000000000000000
[  272.708852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.709913] CR2: 00007f142dd37000 CR3: 00000000baaf4000 CR4: 00000000000006e0
[  272.710961] Stack:
[  272.711998]  ffffea0001978000 ffff8800badbadc0 ffffea0002e77280 8000000065e000e7
[  272.713036]  ffff88013334be68 ffffffff81114671 ffff88013332a980 ffff88013334c000
[  272.714068]  ffff88013332a980 ffff8800b9dcb000 00007f3577200000 000000000101bda0
[  272.715092] Call Trace:
[  272.716100]  [<ffffffff81114671>] khugepaged+0x2227/0x2751
[  272.717105]  [<ffffffff8106f766>] ? prepare_to_wait_event+0xe4/0xe4
[  272.718094]  [<ffffffff8111244a>] ? hugepage_vma_revalidate+0x6f/0x6f
[  272.719087]  [<ffffffff8111244a>] ? hugepage_vma_revalidate+0x6f/0x6f
[  272.720067]  [<ffffffff81055f22>] kthread+0xf3/0xfb
[  272.721035]  [<ffffffff814ab198>] ? _raw_spin_unlock_irq+0x27/0x45
[  272.721990]  [<ffffffff814abaff>] ret_from_fork+0x1f/0x40
[  272.722932]  [<ffffffff81055e2f>] ? kthread_create_on_node+0x1ca/0x1ca
[  272.723860] Code: 19 e4 41 81 e4 01 fe ff ff 41 81 c4 00 02 00 00 eb 06 41 bc 01 00 00 00 4d 39 2e 77 06 4d 3b 6e 08 72 0a 4c 89 f7 e8 73 11 ff ff <0f> 0b 48 8b 53 20 48 8d 42 ff 80 e2 01 48 0f 44 c3 0f ba 28 12 
[  272.724956] RIP  [<ffffffff810f67ad>] page_add_new_anon_rmap+0x68/0x136
[  272.725918]  RSP <ffff88013334bcd0>
[  272.726890] ---[ end trace eb7290ad13e0e7f0 ]---

[  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
[  272.728798] in_atomic(): 1, irqs_disabled(): 0, pid: 38, name: khugepaged
[  272.729821] INFO: lockdep is turned off.
[  272.730762] Preemption disabled at:[<ffffffff8111464d>] khugepaged+0x2203/0x2751

[  272.732618] CPU: 2 PID: 38 Comm: khugepaged Tainted: G      D         4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #493
[  272.734460]  0000000000000000 ffff88013334b9d0 ffffffff811ec73b 0000000000000000
[  272.735382]  ffff88013332a980 ffff88013334b9f8 ffffffff81059b98 ffffffff8174c31c
[  272.736296]  0000000000000b90 0000000000000000 ffff88013334ba20 ffffffff81059c0f
[  272.737203] Call Trace:
[  272.738085]  [<ffffffff811ec73b>] dump_stack+0x68/0x92
[  272.738961]  [<ffffffff81059b98>] ___might_sleep+0x1fb/0x202
[  272.739831]  [<ffffffff81059c0f>] __might_sleep+0x70/0x77
[  272.740684]  [<ffffffff81048ac7>] exit_signals+0x1e/0x119
[  272.741528]  [<ffffffff8107dd86>] ? kmsg_dump+0x12c/0x154
[  272.742362]  [<ffffffff8103f23a>] do_exit+0x111/0x8f3
[  272.743184]  [<ffffffff8107dda3>] ? kmsg_dump+0x149/0x154
[  272.743996]  [<ffffffff81014b39>] oops_end+0x9d/0xa4
[  272.744801]  [<ffffffff81014c6e>] die+0x55/0x5e
[  272.745602]  [<ffffffff81012450>] do_trap+0x67/0x11d
[  272.746401]  [<ffffffff8101272d>] do_error_trap+0x100/0x10f
[  272.747190]  [<ffffffff810f67ad>] ? page_add_new_anon_rmap+0x68/0x136
[  272.747974]  [<ffffffff8107d3b9>] ? vprintk_emit+0x427/0x449
[  272.748756]  [<ffffffff81001036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
[  272.749537]  [<ffffffff81012889>] do_invalid_op+0x1b/0x1d
[  272.750316]  [<ffffffff814acb65>] invalid_op+0x15/0x20
[  272.751097]  [<ffffffff810f67ad>] ? page_add_new_anon_rmap+0x68/0x136
[  272.751879]  [<ffffffff81114671>] khugepaged+0x2227/0x2751
[  272.752660]  [<ffffffff8106f766>] ? prepare_to_wait_event+0xe4/0xe4
[  272.753442]  [<ffffffff8111244a>] ? hugepage_vma_revalidate+0x6f/0x6f
[  272.754223]  [<ffffffff8111244a>] ? hugepage_vma_revalidate+0x6f/0x6f
[  272.755001]  [<ffffffff81055f22>] kthread+0xf3/0xfb
[  272.755781]  [<ffffffff814ab198>] ? _raw_spin_unlock_irq+0x27/0x45
[  272.756557]  [<ffffffff814abaff>] ret_from_fork+0x1f/0x40
[  272.757335]  [<ffffffff81055e2f>] ? kthread_create_on_node+0x1ca/0x1ca
[  272.758124] note: khugepaged[38] exited with preempt_count 1

	-ss


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16  8:46 [next-20160615] kernel BUG at mm/rmap.c:1251! Sergey Senozhatsky
@ 2016-06-16  8:58 ` Michal Hocko
  2016-06-16  9:23   ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2016-06-16  8:58 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Minchan Kim, Stephen Rothwell, Sergey Senozhatsky

On Thu 16-06-16 17:46:57, Sergey Senozhatsky wrote:
> Hello,
> 
> [..]
> [  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
>                next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
>                prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
>                pgoff 7f3576d58 file           (null) private_data           (null)
>                flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> [  272.691793] ------------[ cut here ]------------
> [  272.692820] kernel BUG at mm/rmap.c:1251!

Is this?
page_add_new_anon_rmap:
	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
[...]
> [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960

If yes then I am not sure we can do much about this part. BUG_ON in
an atomic context is unfortunate but the BUG_ON points out a real bug so
we shouldn't drop it because of the potential atomic context. The above
VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
pointed out some issues with the khugepaged lock inconsistencies which
might lead to issues like this.
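The second splat follows from the first: per the log, preemption had been disabled inside khugepaged (khugepaged+0x2203) when the VM_BUG_ON fired, and the task went through do_exit() with preempt_count still 1, so the might_sleep() check reached via exit_signals() complained. A minimal userspace model of that interaction is sketched below. It is not kernel code: the model_* names and the single global counter are hypothetical simplifications of the per-task preempt_count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the task's preemption counter. */
static int model_preempt_count;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);	/* enable without disable is a bug */
	model_preempt_count--;
}

/* in_atomic() analogue: any nonzero count means atomic context. */
static bool model_in_atomic(void)
{
	return model_preempt_count != 0;
}

/*
 * might_sleep() analogue: prints a warning and returns true when a
 * sleeping call is attempted while the counter is still raised.
 */
static bool model_might_sleep(const char *caller)
{
	if (!model_in_atomic())
		return false;
	printf("BUG: sleeping function called from invalid context (%s), preempt_count %d\n",
	       caller, model_preempt_count);
	return true;
}
```

Calling model_preempt_disable() and then model_might_sleep("exit_signals") reports the splat, mirroring an oops path that exits without the matching enable; after model_preempt_enable() the same call is silent.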
-- 
Michal Hocko
SUSE Labs


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16  8:58 ` Michal Hocko
@ 2016-06-16  9:23   ` Sergey Senozhatsky
  2016-06-16  9:41     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2016-06-16  9:23 UTC
  To: Michal Hocko
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Vlastimil Babka, Minchan Kim, Stephen Rothwell,
	Sergey Senozhatsky

On (06/16/16 10:58), Michal Hocko wrote:
> > [..]
> > [  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
> >                next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
> >                prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
> >                pgoff 7f3576d58 file           (null) private_data           (null)
> >                flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > [  272.691793] ------------[ cut here ]------------
> > [  272.692820] kernel BUG at mm/rmap.c:1251!
> 
> Is this?
> page_add_new_anon_rmap:
> 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
> [...]

I think it is

1248 void page_add_new_anon_rmap(struct page *page,
1249         struct vm_area_struct *vma, unsigned long address, bool compound)
1250 {
1251         int nr = compound ? hpage_nr_pages(page) : 1;
1252
1253         VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
1254         __SetPageSwapBacked(page);
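To make the condition concrete, the sketch below evaluates the same half-open range test in userspace against the vma bounds printed in the oops. It is an illustration only: struct vma_range and addr_in_vma() are hypothetical, unsigned long is assumed to be 64 bits, and taking R13 (00007f3577000000) as the address argument is an assumption read off the register dump, not something established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two vm_area_struct fields the check uses. */
struct vma_range {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* first byte past the mapping */
};

/*
 * Returns true when VM_BUG_ON_VMA(address < vma->vm_start ||
 * address >= vma->vm_end, vma) would NOT fire, i.e. when address
 * lies in the half-open interval [vm_start, vm_end).
 */
static bool addr_in_vma(const struct vma_range *vma, unsigned long address)
{
	assert(vma->vm_start <= vma->vm_end);
	return !(address < vma->vm_start || address >= vma->vm_end);
}

/* Bounds copied from the vma dump in the oops (64-bit unsigned long assumed). */
static const struct vma_range oops_vma = {
	.vm_start = 0x00007f3576d58000UL,
	.vm_end   = 0x00007f3576f66000UL,
};
```

Under that assumption the address lies past vm_end (00007f3576f66000), which is exactly the case the VM_BUG_ON_VMA() rejects; vm_start itself passes and vm_end fails because the interval is half-open.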

> > [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
> 
> If yes then I am not sure we can do much about this part. BUG_ON in
> an atomic context is unfortunate but the BUG_ON points out a real bug so
> we shouldn't drop it because of the potential atomic context. The above
> VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
> pointed out some issues with the khugepaged lock inconsistencies which
> might lead to issues like this.

collapse_huge_page() ->mmap_sem fixup patch (http://marc.info/?l=linux-mm&m=146495692807404&w=2)
is in next-20160615. or do you mean some other patch?
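The point of revalidating after mmap_sem is dropped and retaken is that the vma may have changed in the meantime, so the whole huge-page range, not just one address, has to be re-checked against it. A minimal sketch of such a range check follows, assuming 2 MiB PMD-sized huge pages as on x86-64. HPAGE_SIZE_2M, HPAGE_MASK_2M and hpage_range_fits() are hypothetical names, and this is not the code of the patch linked above.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_SIZE_2M	(1UL << 21)		/* assumed PMD size: 2 MiB */
#define HPAGE_MASK_2M	(~(HPAGE_SIZE_2M - 1))

/*
 * Returns true when the huge page starting at 'address' lies entirely
 * inside [vm_start, vm_end). Only 2 MiB-aligned slots that start at or
 * after vm_start and end at or before vm_end qualify.
 */
static bool hpage_range_fits(unsigned long vm_start, unsigned long vm_end,
			     unsigned long address)
{
	/* First 2 MiB boundary at or above vm_start. */
	unsigned long hstart = (vm_start + HPAGE_SIZE_2M - 1) & HPAGE_MASK_2M;
	/* Last 2 MiB boundary at or below vm_end. */
	unsigned long hend = vm_end & HPAGE_MASK_2M;

	assert(vm_start <= vm_end);
	if (address & ~HPAGE_MASK_2M)
		return false;		/* not huge-page aligned */
	return address >= hstart && address + HPAGE_SIZE_2M <= hend;
}
```

Fed the vma bounds from the oops, both the rounded-up start and the rounded-down end come out as 0x7f3576e00000, so under these assumptions no 2 MiB-aligned huge page fits in that vma at all.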

	-ss


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16  9:23   ` Sergey Senozhatsky
@ 2016-06-16  9:41     ` Michal Hocko
  2016-06-16  9:54       ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2016-06-16  9:41 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Minchan Kim, Stephen Rothwell, Sergey Senozhatsky

On Thu 16-06-16 18:23:45, Sergey Senozhatsky wrote:
> On (06/16/16 10:58), Michal Hocko wrote:
> > > [..]
> > > [  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
> > >                next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
> > >                prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
> > >                pgoff 7f3576d58 file           (null) private_data           (null)
> > >                flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > > [  272.691793] ------------[ cut here ]------------
> > > [  272.692820] kernel BUG at mm/rmap.c:1251!
> > 
> > Is this?
> > page_add_new_anon_rmap:
> > 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
> > [...]
> 
> I think it is
> 
> 1248 void page_add_new_anon_rmap(struct page *page,
> 1249         struct vm_area_struct *vma, unsigned long address, bool compound)
> 1250 {
> 1251         int nr = compound ? hpage_nr_pages(page) : 1;
> 1252
> 1253         VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> 1254         __SetPageSwapBacked(page);
> 
> > > [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
> > 
> > If yes then I am not sure we can do much about this part. BUG_ON in
> > an atomic context is unfortunate but the BUG_ON points out a real bug so
> > we shouldn't drop it because of the potential atomic context. The above
> > VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
> > pointed out some issues with the khugepaged lock inconsistencies which
> > might lead to issues like this.
> 
> collapse_huge_page() ->mmap_sem fixup patch (http://marc.info/?l=linux-mm&m=146495692807404&w=2)
> is in next-20160615. or do you mean some other patch?

Yes that's what I meant, but I haven't reviewed the patch to see whether
it is correct/complete. It would be good to see whether the issue is
related to those changes.
-- 
Michal Hocko
SUSE Labs


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16  9:41     ` Michal Hocko
@ 2016-06-16  9:54       ` Sergey Senozhatsky
  2016-06-16 10:12         ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2016-06-16  9:54 UTC
  To: Michal Hocko
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Vlastimil Babka, Minchan Kim, Stephen Rothwell,
	Sergey Senozhatsky

On (06/16/16 11:41), Michal Hocko wrote:
> On Thu 16-06-16 18:23:45, Sergey Senozhatsky wrote:
> > On (06/16/16 10:58), Michal Hocko wrote:
> > > > [..]
> > > > [  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
> > > >                next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
> > > >                prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
> > > >                pgoff 7f3576d58 file           (null) private_data           (null)
> > > >                flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > > > [  272.691793] ------------[ cut here ]------------
> > > > [  272.692820] kernel BUG at mm/rmap.c:1251!
> > > 
> > > Is this?
> > > page_add_new_anon_rmap:
> > > 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
> > > [...]
> > 
> > I think it is
> > 
> > 1248 void page_add_new_anon_rmap(struct page *page,
> > 1249         struct vm_area_struct *vma, unsigned long address, bool compound)
> > 1250 {
> > 1251         int nr = compound ? hpage_nr_pages(page) : 1;
> > 1252
> > 1253         VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> > 1254         __SetPageSwapBacked(page);
> > 
> > > > [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
> > > 
> > > If yes then I am not sure we can do much about this part. BUG_ON in
> > > an atomic context is unfortunate but the BUG_ON points out a real bug so
> > > we shouldn't drop it because of the potential atomic context. The above
> > > VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
> > > pointed out some issues with the khugepaged lock inconsistencies which
> > > might lead to issues like this.
> > 
> > collapse_huge_page() ->mmap_sem fixup patch (http://marc.info/?l=linux-mm&m=146495692807404&w=2)
> > is in next-20160615. or do you mean some other patch?
> 
> Yes that's what I meant, but I haven't reviewed the patch to see whether
> it is correct/complete. It would be good to see whether the issue is
> related to those changes.

I'll copy-paste one more backtrace I saw today [originally posted to another
mail thread].


kernel: BUG: Bad page state in process khugepaged  pfn:101db8
kernel: page:ffffea0004076e00 count:0 mapcount:-127 mapping:          (null) index:0x1
kernel: flags: 0x8000000000000000()
kernel: page dumped because: nonzero mapcount
kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #491
kernel:  0000000000000000 ffff8801124c73f8 ffffffff814d69b0 ffffea0004076e00
kernel:  ffffffff81e658a0 ffff8801124c7420 ffffffff811e9b63 0000000000000000
kernel:  ffffea0004076e00 ffffffff81e658a0 ffff8801124c7440 ffffffff811e9ca9
kernel: Call Trace:
kernel:  [<ffffffff814d69b0>] dump_stack+0x68/0x92
kernel:  [<ffffffff811e9b63>] bad_page+0x158/0x1a2
kernel:  [<ffffffff811e9ca9>] free_pages_check_bad+0xfc/0x101
kernel:  [<ffffffff811ee516>] free_hot_cold_page+0x135/0x5de
kernel:  [<ffffffff811eea26>] __free_pages+0x67/0x72
kernel:  [<ffffffff81227c63>] release_freepages+0x13a/0x191
kernel:  [<ffffffff8122b3c2>] compact_zone+0x845/0x1155
kernel:  [<ffffffff8122ab7d>] ? compaction_suitable+0x76/0x76
kernel:  [<ffffffff8122bdb2>] compact_zone_order+0xe0/0x167
kernel:  [<ffffffff8122bcd2>] ? compact_zone+0x1155/0x1155
kernel:  [<ffffffff8122ce88>] try_to_compact_pages+0x2f1/0x648
kernel:  [<ffffffff8122ce88>] ? try_to_compact_pages+0x2f1/0x648
kernel:  [<ffffffff8122cb97>] ? compaction_zonelist_suitable+0x3a6/0x3a6
kernel:  [<ffffffff811ef1ea>] ? get_page_from_freelist+0x2c0/0x133c
kernel:  [<ffffffff811f0350>] __alloc_pages_direct_compact+0xea/0x30d
kernel:  [<ffffffff811f0266>] ? get_page_from_freelist+0x133c/0x133c
kernel:  [<ffffffff811ee3b2>] ? drain_all_pages+0x1d6/0x205
kernel:  [<ffffffff811f21a8>] __alloc_pages_nodemask+0x143d/0x16b6
kernel:  [<ffffffff8111f405>] ? debug_show_all_locks+0x226/0x226
kernel:  [<ffffffff811f0d6b>] ? warn_alloc_failed+0x24c/0x24c
kernel:  [<ffffffff81110ffc>] ? finish_wait+0x1a4/0x1b0
kernel:  [<ffffffff81122faf>] ? lock_acquire+0xec/0x147
kernel:  [<ffffffff81d32ed0>] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
kernel:  [<ffffffff81d32edc>] ? _raw_spin_unlock_irqrestore+0x47/0x5c
kernel:  [<ffffffff81110ffc>] ? finish_wait+0x1a4/0x1b0
kernel:  [<ffffffff8128f73a>] khugepaged+0x1d4/0x484f
kernel:  [<ffffffff8128f566>] ? hugepage_vma_revalidate+0xef/0xef
kernel:  [<ffffffff810d5bcc>] ? finish_task_switch+0x3de/0x484
kernel:  [<ffffffff81d32f18>] ? _raw_spin_unlock_irq+0x27/0x45
kernel:  [<ffffffff8111d13f>] ? trace_hardirqs_on_caller+0x3d2/0x492
kernel:  [<ffffffff81111487>] ? prepare_to_wait_event+0x3f7/0x3f7
kernel:  [<ffffffff81d28bf5>] ? __schedule+0xa4d/0xd16
kernel:  [<ffffffff810cd0de>] kthread+0x252/0x261
kernel:  [<ffffffff8128f566>] ? hugepage_vma_revalidate+0xef/0xef
kernel:  [<ffffffff810cce8c>] ? kthread_create_on_node+0x377/0x377
kernel:  [<ffffffff81d3387f>] ret_from_fork+0x1f/0x40
kernel:  [<ffffffff810cce8c>] ? kthread_create_on_node+0x377/0x377
-- Reboot --
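The "mapcount:-127" value may be informative. Assuming the 4.7-era encoding, where the page dump prints page_mapcount() (the raw _mapcount plus one for an ordinary page) and free pages in the buddy allocator are tagged with _mapcount == PAGE_BUDDY_MAPCOUNT_VALUE (-128), the freed page still carried the buddy marker, which would be consistent with a page that was freed while it still looked like a free buddy page. The sketch below models only that arithmetic; the constants are copied by hand and the MODEL_* names are hypothetical, so check them against include/linux/mm.h in the tree being tested.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4.7-era values; verify against include/linux/mm.h. */
#define MODEL_PAGE_BUDDY_MAPCOUNT_VALUE	(-128)
#define MODEL_UNMAPPED_MAPCOUNT		(-1)	/* _mapcount of an unmapped page */

/* Value printed as "mapcount:" for a non-compound, non-slab page. */
static int printed_mapcount(int raw)
{
	return raw + 1;
}

/* Inverse: recover the raw _mapcount from the printed value. */
static int raw_mapcount(int printed)
{
	return printed - 1;
}

/* The free path reports "nonzero mapcount" when _mapcount != -1. */
static bool free_check_fails(int raw)
{
	return raw != MODEL_UNMAPPED_MAPCOUNT;
}

/* True when the printed value corresponds to the buddy marker. */
static bool looks_like_buddy_marker(int printed)
{
	return raw_mapcount(printed) == MODEL_PAGE_BUDDY_MAPCOUNT_VALUE;
}
```

With these assumptions, raw_mapcount(-127) is -128, which both trips the "nonzero mapcount" check and matches the buddy marker.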

	-ss


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16  9:54       ` Sergey Senozhatsky
@ 2016-06-16 10:12         ` Minchan Kim
  2016-06-16 10:18           ` Sergey Senozhatsky
  2016-06-17  8:17           ` Sergey Senozhatsky
  0 siblings, 2 replies; 8+ messages in thread
From: Minchan Kim @ 2016-06-16 10:12 UTC
  To: Sergey Senozhatsky
  Cc: Michal Hocko, Andrew Morton, linux-mm, linux-kernel,
	Vlastimil Babka, Stephen Rothwell, Sergey Senozhatsky

On Thu, Jun 16, 2016 at 06:54:57PM +0900, Sergey Senozhatsky wrote:
> On (06/16/16 11:41), Michal Hocko wrote:
> > On Thu 16-06-16 18:23:45, Sergey Senozhatsky wrote:
> > > On (06/16/16 10:58), Michal Hocko wrote:
> > > > > [..]
> > > > > [  272.687656] vma ffff8800b855a5a0 start 00007f3576d58000 end 00007f3576f66000
> > > > >                next ffff8800b977d2c0 prev ffff8800bdfb1860 mm ffff8801315ff200
> > > > >                prot 8000000000000025 anon_vma ffff8800b7e583b0 vm_ops           (null)
> > > > >                pgoff 7f3576d58 file           (null) private_data           (null)
> > > > >                flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > > > > [  272.691793] ------------[ cut here ]------------
> > > > > [  272.692820] kernel BUG at mm/rmap.c:1251!
> > > > 
> > > > Is this?
> > > > page_add_new_anon_rmap:
> > > > 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
> > > > [...]
> > > 
> > > I think it is
> > > 
> > > 1248 void page_add_new_anon_rmap(struct page *page,
> > > 1249         struct vm_area_struct *vma, unsigned long address, bool compound)
> > > 1250 {
> > > 1251         int nr = compound ? hpage_nr_pages(page) : 1;
> > > 1252
> > > 1253         VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> > > 1254         __SetPageSwapBacked(page);
> > > 
> > > > > [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
> > > > 
> > > > If yes then I am not sure we can do much about this part. BUG_ON in
> > > > an atomic context is unfortunate but the BUG_ON points out a real bug so
> > > > we shouldn't drop it because of the potential atomic context. The above
> > > > VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
> > > > pointed out some issues with the khugepaged lock inconsistencies which
> > > > might lead to issues like this.
> > > 
> > > collapse_huge_page() ->mmap_sem fixup patch (http://marc.info/?l=linux-mm&m=146495692807404&w=2)
> > > is in next-20160615. or do you mean some other patch?
> > 
> > Yes that's what I meant, but I haven't reviewed the patch to see whether
> > it is correct/complete. It would be good to see whether the issue is
> > related to those changes.
> 
> I'll copy-paste one more backtrace I saw today [originally posted to another
> mail thread].

Please, look at http://lkml.kernel.org/r/20160616100932.GS17127@bbox


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16 10:12         ` Minchan Kim
@ 2016-06-16 10:18           ` Sergey Senozhatsky
  2016-06-17  8:17           ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2016-06-16 10:18 UTC
  To: Minchan Kim
  Cc: Joonsoo Kim, Sergey Senozhatsky, Michal Hocko, Andrew Morton,
	linux-mm, linux-kernel, Vlastimil Babka, Stephen Rothwell,
	Sergey Senozhatsky

On (06/16/16 19:12), Minchan Kim wrote:
[..]
> > > > > Is this?
> > > > > page_add_new_anon_rmap:
> > > > > 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma)
> > > > > [...]
> > > > 
> > > > I think it is
> > > > 
> > > > 1248 void page_add_new_anon_rmap(struct page *page,
> > > > 1249         struct vm_area_struct *vma, unsigned long address, bool compound)
> > > > 1250 {
> > > > 1251         int nr = compound ? hpage_nr_pages(page) : 1;
> > > > 1252
> > > > 1253         VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> > > > 1254         __SetPageSwapBacked(page);
> > > > 
> > > > > > [  272.727842] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
> > > > > 
> > > > > If yes then I am not sure we can do much about this part. BUG_ON in
> > > > > an atomic context is unfortunate but the BUG_ON points out a real bug so
> > > > > we shouldn't drop it because of the potential atomic context. The above
> > > > > VM_BUG_ON should definitely be addressed. I thought that Vlastimil has
> > > > > pointed out some issues with the khugepaged lock inconsistencies which
> > > > > might lead to issues like this.
> > > > 
> > > > collapse_huge_page() ->mmap_sem fixup patch (http://marc.info/?l=linux-mm&m=146495692807404&w=2)
> > > > is in next-20160615. or do you mean some other patch?
> > > 
> > > Yes that's what I meant, but I haven't reviewed the patch to see whether
> > > it is correct/complete. It would be good to see whether the issue is
> > > related to those changes.
> > 
> > I'll copy-paste one more backtrace I saw today [originally posted to another
> > mail thread].
> 
> Please, look at http://lkml.kernel.org/r/20160616100932.GS17127@bbox

oh, yes, sorry. sure, scheduled for testing a bit later today.

Cc Joonsoo, so we can keep the discussion in one place.

	-ss


* Re: [next-20160615] kernel BUG at mm/rmap.c:1251!
  2016-06-16 10:12         ` Minchan Kim
  2016-06-16 10:18           ` Sergey Senozhatsky
@ 2016-06-17  8:17           ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2016-06-17  8:17 UTC
  To: Minchan Kim
  Cc: Joonsoo Kim, Sergey Senozhatsky, Michal Hocko, Andrew Morton,
	linux-mm, linux-kernel, Vlastimil Babka, Stephen Rothwell,
	Sergey Senozhatsky

Hello,

On (06/16/16 19:12), Minchan Kim wrote:
[..]
> > I'll copy-paste one more backtrace I saw today [originally posted to another
> > mail thread].
> 
> Please, look at http://lkml.kernel.org/r/20160616100932.GS17127@bbox

I don't have a solid/stable reproducer for this one, but after beating the
system with some mixed workloads (mempressure + zsmalloc + compiler workload)
with b3ceb05f4bae844f67ce reverted, I haven't seen any problems.

So I think you nailed it, Minchan!


reverted the entire patch set (for simplicity):

    Revert "mm/compaction: split freepages without holding the zone lock"
    Revert "mm/page_owner: initialize page owner without holding the zone lock"
    Revert "mm/page_owner: copy last_migrate_reason in copy_page_owner()"
    Revert "mm/page_owner: introduce split_page_owner and replace manual handling"
    Revert "tools/vm/page_owner: increase temporary buffer size"
    Revert "mm/page_owner: use stackdepot to store stacktrace"
    Revert "mm/page_owner: avoid null pointer dereference"
    Revert "mm/page_alloc: introduce post allocation processing on page allocator"

adding "mm/compaction: split freepages without holding the zone lock"
back seems to reintroduce the page->_mapcount bug after some time.

	-ss


Thread overview: 8+ messages
2016-06-16  8:46 [next-20160615] kernel BUG at mm/rmap.c:1251! Sergey Senozhatsky
2016-06-16  8:58 ` Michal Hocko
2016-06-16  9:23   ` Sergey Senozhatsky
2016-06-16  9:41     ` Michal Hocko
2016-06-16  9:54       ` Sergey Senozhatsky
2016-06-16 10:12         ` Minchan Kim
2016-06-16 10:18           ` Sergey Senozhatsky
2016-06-17  8:17           ` Sergey Senozhatsky
