linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [Bug 204407] New: Bad page state in process Xorg
       [not found] <bug-204407-27@https.bugzilla.kernel.org/>
@ 2019-08-02 20:23 ` Andrew Morton
  2019-08-02 20:33   ` Matthew Wilcox
  2019-08-02 21:25   ` Petr Vandrovec
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2019-08-02 20:23 UTC (permalink / raw)
  To: petr
  Cc: bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 01 Aug 2019 22:34:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204407
> 
>             Bug ID: 204407
>            Summary: Bad page state in process Xorg
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.3.0-rc2-00013
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: petr@vandrovec.name
>         Regression: No
> 
> Created attachment 284081
>   --> https://bugzilla.kernel.org/attachment.cgi?id=284081&action=edit
> dmesg
> 
> I've upgraded from 5.3-rc1 to 5.3-rc2, and when I started X server, system
> became unhappy:
> 
> [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> mapping:0000000000000000 index:0x0
> [259701.402832] flags: 0x2000000000000000()
> [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> 0000000000000000
> [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> 0000000000000000
> [259701.424373] page dumped because: nonzero mapcount
> [259701.429847] Modules linked in: af_packet xt_REDIRECT nft_compat x_tables
> nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_tables
> nfnetlink ppdev parport fuse autofs4 binfmt_misc uinput twofish_generic
> twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common
> camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64
> serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64
> blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac
> xcbc rmd160 af_key xfrm_algo rpcsec_gss_krb5 nfsv4 nls_iso8859_2 cifs libarc4
> nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipv6 crc_ccitt
> nf_defrag_ipv6 snd_hda_codec_hdmi pktcdvd coretemp hwmon intel_rapl_common
> iosf_mbi x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_codec_generic
> ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul
> snd_hwdep crc32c_intel snd_hda_core ghash_clmulni_intel snd_pcm_oss uas
> iTCO_wdt aesni_intel e1000e
> [259701.429873]  snd_mixer_oss iTCO_vendor_support aes_x86_64 snd_pcm
> crypto_simd ptp mei_me dcdbas lpc_ich sr_mod cryptd usb_storage snd_timer
> glue_helper mfd_core input_leds pps_core tpm_tis cdrom i2c_i801 snd mei
> tpm_tis_core sg tpm [last unloaded: parport_pc]
> [259701.539387] CPU: 10 PID: 4860 Comm: Xorg Tainted: G                T
> 5.3.0-rc2-64-00013-g03f05a670a3d #69
> [259701.549382] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16
> 02/05/2018
> [259701.549382] Call Trace:
> [259701.549382]  dump_stack+0x46/0x60
> [259701.549382]  bad_page.cold.28+0x81/0xb4
> [259701.549382]  __free_pages_ok+0x236/0x240
> [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> [259701.549382]  ttm_bo_put+0x225/0x280
> [259701.549382]  ttm_bo_vm_close+0x10/0x20
> [259701.549382]  remove_vma+0x20/0x40
> [259701.549382]  __do_munmap+0x2da/0x420
> [259701.549382]  __vm_munmap+0x66/0xc0
> [259701.549382]  __x64_sys_munmap+0x22/0x30
> [259701.549382]  do_syscall_64+0x5e/0x1a0
> [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [259701.549382] RIP: 0033:0x7f504d0ec1d7
> [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7
> d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> 000000000000000b
> [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> 00007f504d0ec1d7
> [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> 00007f5044b65000
> [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> 000000010283e000
> [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> 0000564a5e475b08
> [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> 0000000000000c80
> [259701.707238] Disabling lock debugging due to kernel taint

I assume the above is misbehaviour in the DRM code?

> 
> Also - maybe related, maybe not - I've got three userspace crashes earlier on
> this kernel (but never before):
> 
> [77154.886836] iscons.py[12441]: segfault at 2c ip 00000000080cf0b5 sp
> 00000000f773fb60 error 4 in python[8048000+11a000]
> [77154.898376] Code: 02 0f 84 4a 2e 00 00 8b 4d 08 8b bd 04 ff ff ff 8b 59 38
> 8b 57 20 8b 7b 10 85 ff 0f 84 ee 22 00 00 8b 8d 04 ff ff ff 8b 59 08 <8b> 43 2c
> 85 c0 0f 84 3c e3 ff ff 8b 51 34 8b 71 38 8b 79 3c ff 00
> [119529.983163] in.telnetd[616]: segfault at 0 ip 0000555fdfa09a05 sp
> 00007ffd8fc05380 error 4 in in.telnetd[555fdfa06000+b000]
> [119529.995783] Code: 8d 3d c2 76 00 00 e8 9a 2b 00 00 e9 25 fe ff ff be f2 00
> 00 00 48 8d 3d ac 76 00 00 e8 84 2b 00 00 eb 96 48 8b 05 63 ee 00 00 <0f> b6 00
> e9 fa fe ff ff 44 89 ee 48 8d 3d 8c 76 00 00 e8 64 2b 00
> [120884.183003] iscons.py[10779]: segfault at 2c ip 00000000080d0dc2 sp
> 00000000f7702b60 error 4 in python[8048000+11a000]
> [120884.195182] Code: 4c 24 08 89 44 24 04 89 3c 24 e8 d9 0b 01 00 8b 55 f0 8b
> 7d e8 8b 4d ec 8b 85 04 ff ff ff 89 55 84 89 7d 8c 89 4d 88 8b 50 08 <8b> 7a 2c
> 85 ff 0f 84 3c 10 00 00 8b bd 04 ff ff ff 89 f8 8b 57 34
> 
> I've investigated in.telnetd in detail as I was worried if there is some 0-day
> being used on my system - and as far as I could tell, problem is that part of
> the .bss turned to zeroes after process did fork/exec - there was NULL in the
> variable that cannot have NULL as variable is set to non-NULL value during
> in.telnetd initialization.
> 
> iscons crashes are from NULL pointer dereference too, and iscons does lot of
> fork/exec as well.

hm, that does sound unrelated.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 20:23 ` [Bug 204407] New: Bad page state in process Xorg Andrew Morton
@ 2019-08-02 20:33   ` Matthew Wilcox
  2019-08-02 21:17     ` Qian Cai
  2019-08-02 21:25   ` Petr Vandrovec
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2019-08-02 20:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: petr, bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

On Fri, Aug 02, 2019 at 01:23:06PM -0700, Andrew Morton wrote:
> > [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> > [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> > mapping:0000000000000000 index:0x0

mapcount -128 is PAGE_MAPCOUNT_RESERVE, aka PageBuddy.  I think somebody
called put_page() once more than they should have.  The one before this
caused it to be freed to the page allocator, which set PageBuddy.  Then
this one happened and we got a complaint.

> > [259701.402832] flags: 0x2000000000000000()
> > [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> > 0000000000000000
> > [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> > 0000000000000000
> > [259701.424373] page dumped because: nonzero mapcount

It occurs to me that when a page is freed, we could record some useful bits
of information in the page from the stack trace to help debug double-free 
situations.  Even just stashing __builtin_return_address in page->mapping
would be helpful, I think.

> > [259701.549382] Call Trace:
> > [259701.549382]  dump_stack+0x46/0x60
> > [259701.549382]  bad_page.cold.28+0x81/0xb4
> > [259701.549382]  __free_pages_ok+0x236/0x240
> > [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> > [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> > [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> > [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> > [259701.549382]  ttm_bo_put+0x225/0x280
> > [259701.549382]  ttm_bo_vm_close+0x10/0x20
> > [259701.549382]  remove_vma+0x20/0x40
> > [259701.549382]  __do_munmap+0x2da/0x420
> > [259701.549382]  __vm_munmap+0x66/0xc0
> > [259701.549382]  __x64_sys_munmap+0x22/0x30
> > [259701.549382]  do_syscall_64+0x5e/0x1a0
> > [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> > [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [259701.549382] RIP: 0033:0x7f504d0ec1d7
> > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7
> > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01
> > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> > [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> > 000000000000000b
> > [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> > 00007f504d0ec1d7
> > [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> > 00007f5044b65000
> > [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> > 000000010283e000
> > [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> > 0000564a5e475b08
> > [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> > 0000000000000c80
> > [259701.707238] Disabling lock debugging due to kernel taint
> 
> I assume the above is misbehaviour in the DRM code?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 20:33   ` Matthew Wilcox
@ 2019-08-02 21:17     ` Qian Cai
  2019-08-02 22:59       ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Qian Cai @ 2019-08-02 21:17 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton
  Cc: petr, bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

On Fri, 2019-08-02 at 13:33 -0700, Matthew Wilcox wrote:
> On Fri, Aug 02, 2019 at 01:23:06PM -0700, Andrew Morton wrote:
> > > [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> > > [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> > > mapping:0000000000000000 index:0x0
> 
> mapcount -128 is PAGE_MAPCOUNT_RESERVE, aka PageBuddy.  I think somebody
> called put_page() once more than they should have.  The one before this
> caused it to be freed to the page allocator, which set PageBuddy.  Then
> this one happened and we got a complaint.
> 
> > > [259701.402832] flags: 0x2000000000000000()
> > > [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> > > 0000000000000000
> > > [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> > > 0000000000000000
> > > [259701.424373] page dumped because: nonzero mapcount
> 
> It occurs to me that when a page is freed, we could record some useful bits
> of information in the page from the stack trace to help debug double-free 
> situations.  Even just stashing __builtin_return_address in page->mapping
> would be helpful, I think.

Sounds like need to enable "page_owner", so it will do  __dump_page_owner().

> 
> > > [259701.549382] Call Trace:
> > > [259701.549382]  dump_stack+0x46/0x60
> > > [259701.549382]  bad_page.cold.28+0x81/0xb4
> > > [259701.549382]  __free_pages_ok+0x236/0x240
> > > [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> > > [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> > > [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> > > [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> > > [259701.549382]  ttm_bo_put+0x225/0x280
> > > [259701.549382]  ttm_bo_vm_close+0x10/0x20
> > > [259701.549382]  remove_vma+0x20/0x40
> > > [259701.549382]  __do_munmap+0x2da/0x420
> > > [259701.549382]  __vm_munmap+0x66/0xc0
> > > [259701.549382]  __x64_sys_munmap+0x22/0x30
> > > [259701.549382]  do_syscall_64+0x5e/0x1a0
> > > [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> > > [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [259701.549382] RIP: 0033:0x7f504d0ec1d7
> > > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c
> > > 00 f7
> > > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48>
> > > 3d 01
> > > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> > > [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> > > 000000000000000b
> > > [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> > > 00007f504d0ec1d7
> > > [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> > > 00007f5044b65000
> > > [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> > > 000000010283e000
> > > [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> > > 0000564a5e475b08
> > > [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> > > 0000000000000c80
> > > [259701.707238] Disabling lock debugging due to kernel taint
> > 
> > I assume the above is misbehaviour in the DRM code?
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 20:23 ` [Bug 204407] New: Bad page state in process Xorg Andrew Morton
  2019-08-02 20:33   ` Matthew Wilcox
@ 2019-08-02 21:25   ` Petr Vandrovec
  1 sibling, 0 replies; 12+ messages in thread
From: Petr Vandrovec @ 2019-08-02 21:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

Andrew Morton wrote on 8/2/2019 1:23 PM:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 01 Aug 2019 22:34:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
>> [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7
>> d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01
>> f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
>> [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
>> 000000000000000b
>> [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
>> 00007f504d0ec1d7
>> [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
>> 00007f5044b65000
>> [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
>> 000000010283e000
>> [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
>> 0000564a5e475b08
>> [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
>> 0000000000000c80
>> [259701.707238] Disabling lock debugging due to kernel taint
> 
> I assume the above is misbehaviour in the DRM code?

Most probably.

When I switched back to 5.2, crashes were gone, but log was filled with 
11 millions of messages complaining about alloc_contig_range problems:

[22042.108043] alloc_contig_range: [106f118, 106f119) PFNs busy
[22042.114400] alloc_contig_range: [106f11c, 106f11d) PFNs busy
[22042.120787] alloc_contig_range: [106f11d, 106f11e) PFNs busy
[22047.093057] alloc_contig_range: 47963 callbacks suppressed
[22047.093058] alloc_contig_range: [106f117, 106f118) PFNs busy
[22047.105576] alloc_contig_range: [106f118, 106f119) PFNs busy
[22047.111937] alloc_contig_range: [106f11c, 106f11d) PFNs busy
[22047.118329] alloc_contig_range: [106f11d, 106f11e) PFNs busy

In total there is 9735 messages logged individually, and 11 million 
suppressed:

petr-dev3:~$ dmesg | grep alloc_contig.*PFNs | wc -l
9735
petr-dev3:~$ expr `echo \`dmesg | grep alloc_contig.*callbacks | cut -d' 
' -f3\` | sed -e 's/ / + /'g`
11333722

So it could be my problems are caused by new Xorg driver.

After I disabled CMA system is stable on 5.2.  I did not try 5.3-rc2 yet.

Petr


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 21:17     ` Qian Cai
@ 2019-08-02 22:59       ` Matthew Wilcox
  2019-08-02 23:29         ` Petr Vandrovec
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2019-08-02 22:59 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, petr, bugzilla-daemon, Christian Koenig,
	Huang Rui, David Airlie, Daniel Vetter, dri-devel, linux-mm

On Fri, Aug 02, 2019 at 05:17:30PM -0400, Qian Cai wrote:
> On Fri, 2019-08-02 at 13:33 -0700, Matthew Wilcox wrote:
> > It occurs to me that when a page is freed, we could record some useful bits
> > of information in the page from the stack trace to help debug double-free 
> > situations.  Even just stashing __builtin_return_address in page->mapping
> > would be helpful, I think.
> 
> Sounds like need to enable "page_owner", so it will do  __dump_page_owner().

That doesn't help because we call reset_page_owner() in the free page path.

We could turn on tracing because we call trace_mm_page_free() in this
path.  That requires the reporter to be able to reproduce the problem,
and it's not clear to me whether this is a "happened once" or "every
time I do this, it happens" problem.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 22:59       ` Matthew Wilcox
@ 2019-08-02 23:29         ` Petr Vandrovec
  2019-08-15 14:32           ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Vandrovec @ 2019-08-02 23:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Qian Cai, Andrew Morton, bugzilla-daemon, Christian Koenig,
	Huang Rui, David Airlie, Daniel Vetter, dri-devel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

On Fri, Aug 2, 2019, 3:59 PM Matthew Wilcox <willy@infradead.org> wrote:

> That doesn't help because we call reset_page_owner() in the free page path.
>
> We could turn on tracing because we call trace_mm_page_free() in this
> path.  That requires the reporter to be able to reproduce the problem,
> and it's not clear to me whether this is a "happened once" or "every
> time I do this, it happens" problem.
>

It happened on 3 of the boots with that kernel.  4th time box either
spontaneously rebooted when X started, or watchdog restarted box shortly
after starting X server.

So I believe I should be able to reproduce it with additional patches or
extra flags enabled.

Petr

>

[-- Attachment #2: Type: text/html, Size: 1299 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-02 23:29         ` Petr Vandrovec
@ 2019-08-15 14:32           ` Vlastimil Babka
  2019-08-15 19:13             ` Petr Vandrovec
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Petr Vandrovec, Matthew Wilcox
  Cc: Qian Cai, Andrew Morton, bugzilla-daemon, Christian Koenig,
	Huang Rui, David Airlie, Daniel Vetter, dri-devel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]

On 8/3/19 1:29 AM, Petr Vandrovec wrote:
> On Fri, Aug 2, 2019, 3:59 PM Matthew Wilcox <willy@infradead.org
> <mailto:willy@infradead.org>> wrote:
> 
>     That doesn't help because we call reset_page_owner() in the free
>     page path.
> 
>     We could turn on tracing because we call trace_mm_page_free() in this
>     path.  That requires the reporter to be able to reproduce the problem,
>     and it's not clear to me whether this is a "happened once" or "every
>     time I do this, it happens" problem.
> 
> 
> It happened on 3 of the boots with that kernel.  4th time box either
> spontaneously rebooted when X started, or watchdog restarted box shortly
> after starting X server.
> 
> So I believe I should be able to reproduce it with additional patches or
> extra flags enabled.

Does the issue still happen with rc4? Could you apply the 3 attached
patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
parameters? That should print stacktraces of allocation and first
freeing (assuming this is a double free).

Vlastimil


[-- Attachment #2: 0001-mm-page_owner-record-page-owner-for-each-subpage.patch --]
[-- Type: text/x-patch, Size: 3078 bytes --]

From 5b4c46cb1d7a8bca3e8d98433b19e60b28fb5796 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 15 Aug 2019 14:06:50 +0200
Subject: [PATCH 1/3] mm, page_owner: record page owner for each subpage

Currently, page owner info is only recorded for the first page of a high-order
allocation, and copied to tail pages in the event of a split page. With the
plan to keep previous owner info after freeing the page, it would be benefical
to record page owner for each subpage upon allocation. This increases the
overhead for high orders, but that should be acceptable for a debugging option.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_owner.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index addcbb2ae4e4..695cd5fdf7fd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
-	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
+static inline void __set_page_owner_handle(struct page *page,
+	struct page_ext *page_ext, depot_stack_handle_t handle,
+	unsigned int order, gfp_t gfp_mask)
 {
 	struct page_owner *page_owner;
+	int i;
 
-	page_owner = get_page_owner(page_ext);
-	page_owner->handle = handle;
-	page_owner->order = order;
-	page_owner->gfp_mask = gfp_mask;
-	page_owner->last_migrate_reason = -1;
+	for (i = 0; i < (1 << order); i++) {
+		page_owner = get_page_owner(page_ext);
+		page_owner->handle = handle;
+		page_owner->order = order;
+		page_owner->gfp_mask = gfp_mask;
+		page_owner->last_migrate_reason = -1;
+		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 
-	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		page_ext = lookup_page_ext(page + i);
+	}
 }
 
 noinline void __set_page_owner(struct page *page, unsigned int order,
@@ -178,7 +183,7 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
 		return;
 
 	handle = save_stack(gfp_mask);
-	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
+	__set_page_owner_handle(page, page_ext, handle, order, gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -204,8 +209,11 @@ void __split_page_owner(struct page *page, unsigned int order)
 
 	page_owner = get_page_owner(page_ext);
 	page_owner->order = 0;
-	for (i = 1; i < (1 << order); i++)
-		__copy_page_owner(page, page + i);
+	for (i = 1; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		page_owner = get_page_owner(page_ext);
+		page_owner->order = 0;
+	}
 }
 
 void __copy_page_owner(struct page *oldpage, struct page *newpage)
@@ -562,7 +570,8 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;
 
 			/* Found early allocated page */
-			__set_page_owner_handle(page_ext, early_handle, 0, 0);
+			__set_page_owner_handle(page, page_ext, early_handle,
+						0, 0);
 			count++;
 		}
 		cond_resched();
-- 
2.22.0


[-- Attachment #3: 0002-mm-page_owner-keep-owner-info-when-freeing-the-page.patch --]
[-- Type: text/x-patch, Size: 5531 bytes --]

From 063f233cc154819bb39a005726260496eea12265 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 15 Aug 2019 14:48:54 +0200
Subject: [PATCH 2/3] mm, page_owner: keep owner info when freeing the page

For debugging purposes it might be useful to keep the owner info even after
page has been freed and include it in e.g. dump_page() when detecting a bad
page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner
info has been set at least once" and add new PAGE_EXT_OWNER_ACTIVE for tracking
whether page is supposed to be currently allocated or free. Adjust dump_page()
and page_owner file printing accordingly.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page_ext.h |  1 +
 mm/page_owner.c          | 38 +++++++++++++++++++++++++-------------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 09592951725c..682fd465df06 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -18,6 +18,7 @@ struct page_ext_operations {
 
 enum page_ext_flags {
 	PAGE_EXT_OWNER,
+	PAGE_EXT_OWNER_ACTIVE,
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 695cd5fdf7fd..c4c33d569f4d 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -111,7 +111,7 @@ void __reset_page_owner(struct page *page, unsigned int order)
 		page_ext = lookup_page_ext(page + i);
 		if (unlikely(!page_ext))
 			continue;
-		__clear_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 	}
 }
 
@@ -168,6 +168,7 @@ static inline void __set_page_owner_handle(struct page *page,
 		page_owner->gfp_mask = gfp_mask;
 		page_owner->last_migrate_reason = -1;
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__set_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 
 		page_ext = lookup_page_ext(page + i);
 	}
@@ -243,6 +244,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage)
 	 * the new page, which will be freed.
 	 */
 	__set_bit(PAGE_EXT_OWNER, &new_ext->flags);
+	__set_bit(PAGE_EXT_OWNER_ACTIVE, &new_ext->flags);
 }
 
 void pagetypeinfo_showmixedcount_print(struct seq_file *m,
@@ -302,7 +304,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 			if (unlikely(!page_ext))
 				continue;
 
-			if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
+			if (!test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
 				continue;
 
 			page_owner = get_page_owner(page_ext);
@@ -331,7 +333,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 static ssize_t
 print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		struct page *page, struct page_owner *page_owner,
-		depot_stack_handle_t handle)
+		depot_stack_handle_t handle, bool page_active)
 {
 	int ret, pageblock_mt, page_mt;
 	unsigned long *entries;
@@ -344,7 +346,9 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		return -ENOMEM;
 
 	ret = snprintf(kbuf, count,
-			"Page allocated via order %u, mask %#x(%pGg)\n",
+			"%s via order %u, mask %#x(%pGg)\n",
+			page_active ? "Page allocated" :
+				      "Free page previously allocated",
 			page_owner->order, page_owner->gfp_mask,
 			&page_owner->gfp_mask);
 
@@ -413,21 +417,26 @@ void __dump_page_owner(struct page *page)
 	mt = gfpflags_to_migratetype(gfp_mask);
 
 	if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
-		pr_alert("page_owner info is not active (free page?)\n");
+		pr_alert("page_owner info is not present (never set?)\n");
 		return;
 	}
 
+	if (test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
+		pr_alert("page_owner tracks the page as allocated\n");
+	else
+		pr_alert("page_owner tracks the page as freed\n");
+
+	pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
+		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
+
 	handle = READ_ONCE(page_owner->handle);
 	if (!handle) {
-		pr_alert("page_owner info is not active (free page?)\n");
-		return;
+		pr_alert("page_owner allocation stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		stack_trace_print(entries, nr_entries, 0);
 	}
 
-	nr_entries = stack_depot_fetch(handle, &entries);
-	pr_alert("page allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
-		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
-	stack_trace_print(entries, nr_entries, 0);
-
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -441,6 +450,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
+	bool page_active;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -489,6 +499,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
 			continue;
 
+		page_active = test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
+
 		page_owner = get_page_owner(page_ext);
 
 		/*
@@ -503,7 +515,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		*ppos = (pfn - min_low_pfn) + 1;
 
 		return print_page_owner(buf, count, pfn, page,
-				page_owner, handle);
+				page_owner, handle, page_active);
 	}
 
 	return 0;
-- 
2.22.0


[-- Attachment #4: 0003-mm-page_owner-debug_pagealloc-save-freeing-stack-tra.patch --]
[-- Type: text/x-patch, Size: 2871 bytes --]

From 4df89531903b8f0a056d3ec7f21825d0a22df355 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 15 Aug 2019 16:28:14 +0200
Subject: [PATCH 3/3] mm, page_owner, debug_pagealloc: save freeing stack trace

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_owner.c | 53 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index c4c33d569f4d..f2eccea86210 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -24,6 +24,9 @@ struct page_owner {
 	short last_migrate_reason;
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t free_handle;
+#endif
 };
 
 static bool page_owner_disabled = true;
@@ -102,19 +105,6 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-void __reset_page_owner(struct page *page, unsigned int order)
-{
-	int i;
-	struct page_ext *page_ext;
-
-	for (i = 0; i < (1 << order); i++) {
-		page_ext = lookup_page_ext(page + i);
-		if (unlikely(!page_ext))
-			continue;
-		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
-	}
-}
-
 static inline bool check_recursive_alloc(unsigned long *entries,
 					 unsigned int nr_entries,
 					 unsigned long ip)
@@ -154,6 +144,32 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
+void __reset_page_owner(struct page *page, unsigned int order)
+{
+	int i;
+	struct page_ext *page_ext;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t handle = 0;
+	struct page_owner *page_owner;
+
+	if (debug_pagealloc_enabled())
+		handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+#endif
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		if (unlikely(!page_ext))
+			continue;
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
+#ifdef CONFIG_DEBUG_PAGEALLOC
+		if (debug_pagealloc_enabled()) {
+			page_owner = get_page_owner(page_ext);
+			page_owner->free_handle = handle;
+		}
+#endif
+	}
+}
+
 static inline void __set_page_owner_handle(struct page *page,
 	struct page_ext *page_ext, depot_stack_handle_t handle,
 	unsigned int order, gfp_t gfp_mask)
@@ -437,6 +453,17 @@ void __dump_page_owner(struct page *page)
 		stack_trace_print(entries, nr_entries, 0);
 	}
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	handle = READ_ONCE(page_owner->free_handle);
+	if (!handle) {
+		pr_alert("page_owner free stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		pr_alert("page last free stack trace:\n");
+		stack_trace_print(entries, nr_entries, 0);
+	}
+#endif
+
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-15 14:32           ` Vlastimil Babka
@ 2019-08-15 19:13             ` Petr Vandrovec
  2019-08-16 12:47               ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Vandrovec @ 2019-08-15 19:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Vandrovec, Matthew Wilcox, Qian Cai, Andrew Morton,
	bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

Vlastimil Babka wrote on 8/15/2019 7:32 AM:
>
> Does the issue still happen with rc4? Could you apply the 3 attached
> patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
> CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
> parameters? That should print stacktraces of allocation and first
> freeing (assuming this is a double free).

Unfortunately -rc4 does not find any my SATA disks due to some 
misunderstanding between AHCI driver and HPT642L adapter (there is no 
device 07:00.1, HPT is single-function device at 07:00.0):

[   18.003015] scsi host6: ahci
[   18.006605] DMAR: DRHD: handling fault status reg 2
[   18.006619] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffe0000 [fault reason 02] Present bit in context entry is clear
[   18.076616] DMAR: DRHD: handling fault status reg 102
[   18.085910] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffa0000 [fault reason 02] Present bit in context entry is clear
[   18.100989] DMAR: DRHD: handling fault status reg 202
[   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffe0000 [fault reason 02] Present bit in context entry is clear

With iommu=off disks are visible, but USB keyboard (and other USB 
devices) does not work:

[   18.174802] ehci-pci 0000:00:1a.0: swiotlb buffer is full (sz: 8 
bytes), total 0 (slots), used 0 (slots)
[   18.174804] ehci-pci 0000:00:1a.0: overflow 0x0000000ffdc75ae8+8 of 
DMA mask ffffffff bus mask 0
[   18.174815] WARNING: CPU: 2 PID: 508 at kernel/dma/direct.c:35 
report_addr+0x2e/0x50
[   18.174816] Modules linked in:
[   18.174818] CPU: 2 PID: 508 Comm: kworker/2:1 Tainted: G 
   T 5.3.0-rc4-64-00058-gd717b092e0b2 #77
[   18.174819] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 
02/05/2018
[   18.174822] Workqueue: usb_hub_wq hub_event

I'll try to find -rc4 configuration that has enabled debugging and can boot. 


Petr



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-15 19:13             ` Petr Vandrovec
@ 2019-08-16 12:47               ` Vlastimil Babka
  2019-08-16 12:52                 ` Joerg Roedel
  2019-08-17  0:20                 ` Petr Vandrovec
  0 siblings, 2 replies; 12+ messages in thread
From: Vlastimil Babka @ 2019-08-16 12:47 UTC (permalink / raw)
  To: Petr Vandrovec
  Cc: Matthew Wilcox, Qian Cai, Andrew Morton, bugzilla-daemon,
	Christian Koenig, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-mm, Joerg Roedel

On 8/15/19 9:13 PM, Petr Vandrovec wrote:
> Vlastimil Babka wrote on 8/15/2019 7:32 AM:
>>
>> Does the issue still happen with rc4? Could you apply the 3 attached
>> patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
>> CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
>> parameters? That should print stacktraces of allocation and first
>> freeing (assuming this is a double free).
> 
> Unfortunately -rc4 does not find any my SATA disks due to some 
> misunderstanding between AHCI driver and HPT642L adapter (there is no 
> device 07:00.1, HPT is single-function device at 07:00.0):
> 
> [   18.003015] scsi host6: ahci
> [   18.006605] DMAR: DRHD: handling fault status reg 2
> [   18.006619] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffe0000 [fault reason 02] Present bit in context entry is clear
> [   18.076616] DMAR: DRHD: handling fault status reg 102
> [   18.085910] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffa0000 [fault reason 02] Present bit in context entry is clear
> [   18.100989] DMAR: DRHD: handling fault status reg 202
> [   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffe0000 [fault reason 02] Present bit in context entry is clear

Worth reporting as well, not nice regression.

> With iommu=off disks are visible, but USB keyboard (and other USB 
> devices) does not work:

I've been told iommu=soft might help.

> [   18.174802] ehci-pci 0000:00:1a.0: swiotlb buffer is full (sz: 8 
> bytes), total 0 (slots), used 0 (slots)
> [   18.174804] ehci-pci 0000:00:1a.0: overflow 0x0000000ffdc75ae8+8 of 
> DMA mask ffffffff bus mask 0
> [   18.174815] WARNING: CPU: 2 PID: 508 at kernel/dma/direct.c:35 
> report_addr+0x2e/0x50
> [   18.174816] Modules linked in:
> [   18.174818] CPU: 2 PID: 508 Comm: kworker/2:1 Tainted: G 
>    T 5.3.0-rc4-64-00058-gd717b092e0b2 #77
> [   18.174819] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 
> 02/05/2018
> [   18.174822] Workqueue: usb_hub_wq hub_event
> 
> I'll try to find -rc4 configuration that has enabled debugging and can boot. 
> 
> 
> Petr
> 
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-16 12:47               ` Vlastimil Babka
@ 2019-08-16 12:52                 ` Joerg Roedel
  2019-08-17  0:20                 ` Petr Vandrovec
  1 sibling, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2019-08-16 12:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Vandrovec, Matthew Wilcox, Qian Cai, Andrew Morton,
	bugzilla-daemon, Christian Koenig, Huang Rui, David Airlie,
	Daniel Vetter, dri-devel, linux-mm

On Fri, Aug 16, 2019 at 02:47:53PM +0200, Vlastimil Babka wrote:
> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
> > [   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> > fffe0000 [fault reason 02] Present bit in context entry is clear
> 
> Worth reporting as well, not nice regression.

Is that a regression between 5.3-rc3 and 5.3-rc4 or is it already broken
since -rc1? The 5.3-rc5 kernel will contains some VT-d fixes that are
worth a try here too. If you can test latest linus/master branch that
would be great, otherwise -rc5 is fine too.


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-16 12:47               ` Vlastimil Babka
  2019-08-16 12:52                 ` Joerg Roedel
@ 2019-08-17  0:20                 ` Petr Vandrovec
  2019-08-19 14:44                   ` Vlastimil Babka
  1 sibling, 1 reply; 12+ messages in thread
From: Petr Vandrovec @ 2019-08-17  0:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, Qian Cai, Andrew Morton, bugzilla-daemon,
	Christian Koenig, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-mm, Joerg Roedel

Vlastimil Babka wrote on 8/16/2019 5:47 AM:
> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
>> With iommu=off disks are visible, but USB keyboard (and other USB
>> devices) does not work:
> 
> I've been told iommu=soft might help.

Thanks.  I've rebuilt kernel without IOMMU.

Unfortunately I was not able to reproduce original problem with latest 
kernel - neither with CMA nor without CMA.  System seems stable as a rock.

This is the change on which I've tried to reproduce it with your 
debugging patches:

commit 41de59634046b19cd53a1983594a95135c656997 (HEAD -> master, 
origin/master, origin/HEAD)
Merge: e22a97a2a85d 1ee1119d184b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Aug 14 15:29:53 2019 -0700

     Merge tag 'Wimplicit-fallthrough-5.3-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

						Petr


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 204407] New: Bad page state in process Xorg
  2019-08-17  0:20                 ` Petr Vandrovec
@ 2019-08-19 14:44                   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2019-08-19 14:44 UTC (permalink / raw)
  To: Petr Vandrovec
  Cc: Matthew Wilcox, Qian Cai, Andrew Morton, bugzilla-daemon,
	Christian Koenig, Huang Rui, David Airlie, Daniel Vetter,
	dri-devel, linux-mm, Joerg Roedel

On 8/17/19 2:20 AM, Petr Vandrovec wrote:
> Vlastimil Babka wrote on 8/16/2019 5:47 AM:
>> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
>>> With iommu=off disks are visible, but USB keyboard (and other USB
>>> devices) does not work:
>>
>> I've been told iommu=soft might help.
> 
> Thanks.  I've rebuilt kernel without IOMMU.
> 
> Unfortunately I was not able to reproduce original problem with latest 
> kernel - neither with CMA nor without CMA.  System seems stable as a rock.

That's rather fortunate I would say :) But if you have any unrelated
regressions in 5.3, please report them accordingly.

> This is the change on which I've tried to reproduce it with your 
> debugging patches:
> 
> commit 41de59634046b19cd53a1983594a95135c656997 (HEAD -> master, 
> origin/master, origin/HEAD)
> Merge: e22a97a2a85d 1ee1119d184b
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Wed Aug 14 15:29:53 2019 -0700
> 
>      Merge tag 'Wimplicit-fallthrough-5.3-rc5' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> 
> 						Petr
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-08-19 14:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-204407-27@https.bugzilla.kernel.org/>
2019-08-02 20:23 ` [Bug 204407] New: Bad page state in process Xorg Andrew Morton
2019-08-02 20:33   ` Matthew Wilcox
2019-08-02 21:17     ` Qian Cai
2019-08-02 22:59       ` Matthew Wilcox
2019-08-02 23:29         ` Petr Vandrovec
2019-08-15 14:32           ` Vlastimil Babka
2019-08-15 19:13             ` Petr Vandrovec
2019-08-16 12:47               ` Vlastimil Babka
2019-08-16 12:52                 ` Joerg Roedel
2019-08-17  0:20                 ` Petr Vandrovec
2019-08-19 14:44                   ` Vlastimil Babka
2019-08-02 21:25   ` Petr Vandrovec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).