* advice on bad_page instance
@ 2015-04-14 18:36 Luigi Semenzato
  2015-04-15  7:16 ` Minchan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Luigi Semenzato @ 2015-04-14 18:36 UTC (permalink / raw)
  To: Linux Memory Management List; +Cc: Minchan Kim

We are seeing several instances of these things (often with different
but plausible values in the struct page) in kernel 3.8.11, followed by
a panic() in release_pages a few seconds later.

I realize it's an old kernel and probably of little interest here, but
I would be most grateful for any pointers on how to proceed.  In
particular, I suspect that many such bugs may have been fixed by now,
but I am not sure how to find the right fix (which I would backport).

Also, this happens under heavy swap, and we're using zram.  I wonder
if there may be a race condition related to zram which may have been
fixed since then, and which may result in these symptoms.

Many thanks for any pointer!

Luigi

<1>[ 5392.106074] BUG: Bad page state in process CompositorTileW  pfn:57a7e
<1>[ 5392.106109] page:ffffea00015e9f80 count:0 mapcount:0 mapping:          (null) index:0x2
<1>[ 5392.106122] page flags: 0x4000000000000004(referenced)
<5>[ 5392.106139] Modules linked in: i2c_dev uinput
snd_hda_codec_realtek memconsole snd_hda_codec_hdmi uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_core videodev
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer
zram(C) lzo_compress zsmalloc(C) fuse nf_conntrack_ipv6 nf_defrag_ipv6
ip6table_filter ip6_tables ath9k_btcoex ath9k_common_btcoex
ath9k_hw_btcoex ath mac80211 cfg80211 option usb_wwan cdc_ether usbnet
ath3k btusb bluetooth joydev ppp_async ppp_generic slhc tun
<5>[ 5392.106333] Pid: 27363, comm: CompositorTileW Tainted: G    B    C   3.8.11 #1
<5>[ 5392.106344] Call Trace:
<5>[ 5392.106357]  [<ffffffff978ba5bb>] bad_page+0xcf/0xe3
<5>[ 5392.106370]  [<ffffffff978bb181>] get_page_from_freelist+0x21a/0x46c
<5>[ 5392.106383]  [<ffffffff978beb74>] ? release_pages+0x19b/0x1be
<5>[ 5392.106394]  [<ffffffff978bb5da>] __alloc_pages_nodemask+0x207/0x685
<5>[ 5392.106407]  [<ffffffff97cb8caf>] ? _cond_resched+0xe/0x1e
<5>[ 5392.106421]  [<ffffffff978d215a>] handle_pte_fault+0x305/0x500
<5>[ 5392.106433]  [<ffffffff978d4f5e>] ? __vma_link_file+0x65/0x67
<5>[ 5392.106445]  [<ffffffff978d30d0>] handle_mm_fault+0x97/0xbb
<5>[ 5392.106459]  [<ffffffff97828616>] __do_page_fault+0x1d4/0x38c
<5>[ 5392.106470]  [<ffffffff978d7803>] ? do_mmap_pgoff+0x284/0x2c0
<5>[ 5392.106482]  [<ffffffff978ca82c>] ? vm_mmap_pgoff+0x7d/0x8e
<5>[ 5392.106495]  [<ffffffff97828800>] do_page_fault+0xe/0x10
<5>[ 5392.106506]  [<ffffffff97cb9d32>] page_fault+0x22/0x30
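
When comparing several of these reports, it may help to pull the fields out mechanically and check that the struct page address and the pfn agree. The following is only a sketch: the vmemmap base (0xffffea0000000000) and the 64-byte struct page size are assumptions about a typical x86_64 build and are not stated in this thread, and the helper names `parse_bad_page` and `pfn_from_page_addr` are made up for illustration.

```python
import re

# Assumptions, not stated in the thread: x86_64 with the vmemmap region
# starting at 0xffffea0000000000 and a 64-byte struct page.
VMEMMAP_BASE = 0xFFFFEA0000000000
STRUCT_PAGE_SIZE = 64

BAD_PAGE_RE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")
PAGE_RE = re.compile(r"page:([0-9a-f]+) count:(-?\d+) mapcount:(-?\d+)")
FLAGS_RE = re.compile(r"page flags: (0x[0-9a-f]+)\(([^)]*)\)")

# The first three lines of the report, with the wrapped line rejoined.
SAMPLE = """\
<1>[ 5392.106074] BUG: Bad page state in process CompositorTileW  pfn:57a7e
<1>[ 5392.106109] page:ffffea00015e9f80 count:0 mapcount:0 mapping:          (null) index:0x2
<1>[ 5392.106122] page flags: 0x4000000000000004(referenced)
"""


def parse_bad_page(log_text):
    """Extract the fields of one 'Bad page state' report, or return None."""
    bad = BAD_PAGE_RE.search(log_text)
    page = PAGE_RE.search(log_text)
    flags = FLAGS_RE.search(log_text)
    if not (bad and page and flags):
        return None
    return {
        "process": bad.group(1),
        "pfn": int(bad.group(2), 16),
        "page_addr": int(page.group(1), 16),
        "count": int(page.group(2)),
        "mapcount": int(page.group(3)),
        "flags": int(flags.group(1), 16),
        # Flag names are separated by '|' when more than one is set.
        "flag_names": [n for n in flags.group(2).split("|") if n],
    }


def pfn_from_page_addr(page_addr):
    """Map a struct page address back to a pfn under the assumptions above."""
    return (page_addr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE


report = parse_bad_page(SAMPLE)
print(hex(report["pfn"]), hex(pfn_from_page_addr(report["page_addr"])))
```

Under these assumptions the page address in the report maps back to the same pfn that the first line prints, so the struct page pointer itself looks consistent and the only named flag set is "referenced".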

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: advice on bad_page instance
  2015-04-14 18:36 advice on bad_page instance Luigi Semenzato
@ 2015-04-15  7:16 ` Minchan Kim
  2015-04-15  8:05   ` Sergey Senozhatsky
  2015-04-15  8:22   ` Sergey Senozhatsky
  0 siblings, 2 replies; 5+ messages in thread
From: Minchan Kim @ 2015-04-15  7:16 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Linux Memory Management List

Hello Luigi,

On Tue, Apr 14, 2015 at 11:36:57AM -0700, Luigi Semenzato wrote:
> We are seeing several instances of these things (often with different
> but plausible values in the struct page) in kernel 3.8.11, followed by
> a panic() in release_pages a few seconds later.
> 
> I realize it's an old kernel and probably of little interest here, but
> I would be most grateful for any pointers on how to proceed.  In
> particular, I suspect that many such bugs may have been fixed by now,
> but I am not sure how to find the right fix (which I would backport).
> 
> Also, this happens under heavy swap, and we're using zram.  I wonder
> if there may be a race condition related to zram which may have been
> fixed since then, and which may result in these symptoms.

I haven't seen such a bug until now, sorry. However, I might be missing
something because zram has changed a lot since then.
What I recommend is simply to use a recent zram/zsmalloc.
I think it's not hard to backport them because they are almost isolated
from other parts of the kernel.
If you no longer see the problem with a recent zram, yay, your
system doesn't have any problem. But if you still see the problem,
it means you should suspect other things as culprits as well as
zram.

Thanks.


-- 
Kind regards,
Minchan Kim


* Re: advice on bad_page instance
  2015-04-15  7:16 ` Minchan Kim
@ 2015-04-15  8:05   ` Sergey Senozhatsky
  2015-04-15  8:22   ` Sergey Senozhatsky
  1 sibling, 0 replies; 5+ messages in thread
From: Sergey Senozhatsky @ 2015-04-15  8:05 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Minchan Kim, Linux Memory Management List, sergey.senozhatsky

On (04/15/15 16:16), Minchan Kim wrote:
> On Tue, Apr 14, 2015 at 11:36:57AM -0700, Luigi Semenzato wrote:
> > We are seeing several instances of these things (often with different
> > but plausible values in the struct page) in kernel 3.8.11, followed by
> > a panic() in release_pages a few seconds later.
> > 
> > I realize it's an old kernel and probably of little interest here, but
> > I would be most grateful for any pointers on how to proceed.  In
> > particular, I suspect that many such bugs may have been fixed by now,
> > but I am not sure how to find the right fix (which I would backport).
> > 
> > Also, this happens under heavy swap, and we're using zram.  I wonder
> > if there may be a race condition related to zram which may have been
> > fixed since then, and which may result in these symptoms.
> 
> I haven't seen such a bug until now, sorry. However, I might be missing
> something because zram has changed a lot since then.
> What I recommend is simply to use a recent zram/zsmalloc.
> I think it's not hard to backport them because they are almost isolated
> from other parts of the kernel.

Hello,

Do you see anything suspicious in the logs,
like "Buffer I/O error on device zramX, logical block XXXXX", etc.?

zram has evolved significantly since its staging days.

	-ss
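
A quick way to look for that kind of message in a saved kernel log could be a small script like the one below. It is only a sketch: the message pattern is taken from the example wording above, the helper name `find_zram_io_errors` is hypothetical, and the sample log line is synthetic test input, not output from the affected machine.

```python
import re

# Pattern based on the message wording quoted above; the exact format
# may differ between kernel versions, so treat it as an assumption.
ZRAM_IO_ERROR_RE = re.compile(
    r"Buffer I/O error on device (zram\d+), logical block (\d+)"
)

# Synthetic input for illustration only, not a real log from this system.
SAMPLE_LOG = """\
<3>[ 100.000000] Buffer I/O error on device zram0, logical block 1234
<5>[ 100.000001] some unrelated line
"""


def find_zram_io_errors(log_text):
    """Return (device, logical_block) pairs for every matching log line."""
    return [
        (m.group(1), int(m.group(2)))
        for m in ZRAM_IO_ERROR_RE.finditer(log_text)
    ]


for device, block in find_zram_io_errors(SAMPLE_LOG):
    print(f"{device}: I/O error at logical block {block}")
```

An empty result would only mean that no such lines were present in the text that was scanned; it would not rule out zram by itself.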


* Re: advice on bad_page instance
  2015-04-15  7:16 ` Minchan Kim
  2015-04-15  8:05   ` Sergey Senozhatsky
@ 2015-04-15  8:22   ` Sergey Senozhatsky
  2015-04-15 15:43     ` Luigi Semenzato
  1 sibling, 1 reply; 5+ messages in thread
From: Sergey Senozhatsky @ 2015-04-15  8:22 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Minchan Kim, Linux Memory Management List, sergey.senozhatsky

On (04/15/15 16:16), Minchan Kim wrote:
> On Tue, Apr 14, 2015 at 11:36:57AM -0700, Luigi Semenzato wrote:
> > We are seeing several instances of these things (often with different
> > but plausible values in the struct page) in kernel 3.8.11, followed by
> > a panic() in release_pages a few seconds later.
> > 
> > I realize it's an old kernel and probably of little interest here, but
> > I would be most grateful for any pointers on how to proceed.  In
> > particular, I suspect that many such bugs may have been fixed by now,
> > but I am not sure how to find the right fix (which I would backport).
> > 
> > Also, this happens under heavy swap, and we're using zram.  I wonder
> > if there may be a race condition related to zram which may have been
> > fixed since then, and which may result in these symptoms.
> 
> I haven't seen such a bug until now, sorry. However, I might be missing
> something because zram has changed a lot since then.
> What I recommend is simply to use a recent zram/zsmalloc.
> I think it's not hard to backport them because they are almost isolated
> from other parts of the kernel.
> If you no longer see the problem with a recent zram, yay, your
> system doesn't have any problem. But if you still see the problem,
> it means you should suspect other things as culprits as well as
> zram.
> 
> Thanks.
> 

Assuming that you use zram0, does 'mkswap -c /dev/zram0' show any bad
pages right after swap creation/activation?

	-ss


* Re: advice on bad_page instance
  2015-04-15  8:22   ` Sergey Senozhatsky
@ 2015-04-15 15:43     ` Luigi Semenzato
  0 siblings, 0 replies; 5+ messages in thread
From: Luigi Semenzato @ 2015-04-15 15:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Linux Memory Management List, sergey.senozhatsky

Thank you for your replies!  Actually I don't see anything that makes
me suspect zram as the culprit.  I mentioned it just in case you folks
knew of any bug that would result in that behavior.

Thank you for the suggestion to backport---I'll look into it.  Of
course I would prefer to backport code that I actually know fixes this
mm problem.


