linux-xfs.vger.kernel.org archive mirror
* xfs trace in 4.4.2
@ 2016-02-20  8:02 Stefan Priebe
  2016-02-20 14:45 ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe @ 2016-02-20  8:02 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, xfs-masters

Hi,

got this one today. Not sure if this is a bug.

[67674.907736] ------------[ cut here ]------------
[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232 
xfs_vm_releasepage+0xa9/0xe0()
[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
scsi_transport_sas pps_core
[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
[67675.277120] Hardware name: Supermicro 
X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1 
0000000000000001
[67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587 
ffff88007950fae8
[67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0 
ffffea00208834a0
[67675.506112] Call Trace:
[67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
[67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
[67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
[67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
[67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
[67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
[67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
[67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
[67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
[67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
[67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
[67676.172075]  [<ffffffffa3166160>] ? 
mem_cgroup_shrink_node_zone+0x150/0x150
[67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
[67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
[67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.433499] ---[ end trace cb1827fe308f7f6b ]---

Greets Stefan


* Re: xfs trace in 4.4.2
  2016-02-20  8:02 xfs trace in 4.4.2 Stefan Priebe
@ 2016-02-20 14:45 ` Brian Foster
  2016-02-20 18:02   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-02-20 14:45 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters, xfs

On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> Hi,
> 
> got this one today. Not sure if this is a bug.
> 

That looks like the releasepage() delayed allocation block warning. I'm
not sure we've had any fixes for (or reports of) that issue since the
v4.2 timeframe.

What is the xfs_info of the associated filesystem? Also, do you have any
insight as to the possible reproducer application or workload? Is this
reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
won't fire again regardless until after a reboot.
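
For reference, the check that fires here looks roughly like this in the
v4.4-era fs/xfs/xfs_aops.c (a sketch from memory, so treat the details
as approximate):

STATIC int
xfs_vm_releasepage(
	struct page		*page,
	gfp_t			gfp_mask)
{
	int			delalloc, unwritten;

	trace_xfs_releasepage(page->mapping->host, page, 0, 0);

	/* classify the buffers attached to this page */
	xfs_count_page_state(page, &delalloc, &unwritten);

	/* a page being released should never still hold delalloc blocks */
	if (WARN_ON_ONCE(delalloc))
		return 0;
	if (WARN_ON_ONCE(unwritten))
		return 0;

	return try_to_free_buffers(page);
}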

Brian

> [67674.907736] ------------[ cut here ]------------
> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> xfs_vm_releasepage+0xa9/0xe0()
> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> scsi_transport_sas pps_core
> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> [67675.277120] Hardware name: Supermicro
> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> [67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> 0000000000000001
> [67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
> ffff88007950fae8
> [67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> ffffea00208834a0
> [67675.506112] Call Trace:
> [67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> [67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> [67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> [67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> [67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> [67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> [67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> [67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> [67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> [67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> [67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
> [67676.172075]  [<ffffffffa3166160>] ?
> mem_cgroup_shrink_node_zone+0x150/0x150
> [67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> [67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> [67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> [67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
> 
> Greets Stefan
> 


* Re: xfs trace in 4.4.2
  2016-02-20 14:45 ` Brian Foster
@ 2016-02-20 18:02   ` Stefan Priebe - Profihost AG
  2016-03-04 18:47     ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-02-20 18:02 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> 
>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>> Hi,
>> 
>> got this one today. Not sure if this is a bug.
> 
> That looks like the releasepage() delayed allocation block warning. I'm
> not sure we've had any fixes for (or reports of) that issue since the
> v4.2 timeframe.
> 
> What is the xfs_info of the associated filesystem? Also, do you have any
> insight as to the possible reproducer application or workload? Is this
> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> won't fire again regardless until after a reboot.

Sorry, no reproducer and also no xfs_info, as I didn't know which fs this was.

But the job that was running does:
mount /dev/loop0p3 /mpt
xfs_repair -n /mpt
umount /mpt

Stefan

> 
> Brian
> 
>> [67674.907736] ------------[ cut here ]------------
>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>> xfs_vm_releasepage+0xa9/0xe0()
>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>> scsi_transport_sas pps_core
>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>> [67675.277120] Hardware name: Supermicro
>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>> [67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>> 0000000000000001
>> [67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
>> ffff88007950fae8
>> [67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>> ffffea00208834a0
>> [67675.506112] Call Trace:
>> [67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>> [67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>> [67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>> [67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>> [67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>> [67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>> [67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>> [67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>> [67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>> [67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>> [67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
>> [67676.172075]  [<ffffffffa3166160>] ?
>> mem_cgroup_shrink_node_zone+0x150/0x150
>> [67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>> [67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>> [67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>> [67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>> 
>> Greets Stefan
>> 


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-02-20 18:02   ` Stefan Priebe - Profihost AG
@ 2016-03-04 18:47     ` Stefan Priebe
  2016-03-04 19:13       ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe @ 2016-03-04 18:47 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs

On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>
>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>
>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>> Hi,
>>>
>>> got this one today. Not sure if this is a bug.
>>
>> That looks like the releasepage() delayed allocation block warning. I'm
>> not sure we've had any fixes for (or reports of) that issue since the
>> v4.2 timeframe.
>>
>> What is the xfs_info of the associated filesystem? Also, do you have any
>> insight as to the possible reproducer application or workload? Is this
>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>> won't fire again regardless until after a reboot.

Today I got this one running 4.3.3.

[154152.949610] ------------[ cut here ]------------
[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 
xfs_vm_releasepage+0xc3/0xf0()
[154152.952596] Modules linked in: netconsole mpt3sas raid_class 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp 
ipt_REJECT nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables 
x_tables 8021q garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si 
usbcore edac_core ipmi_msghandler i2c_i801 usb_common button btrfs xor 
raid6_pq sg igb sd_mod i2c_algo_bit isci i2c_core libsas ahci ptp 
libahci scsi_transport_sas megaraid_sas pps_core
[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
[154152.964625] Hardware name: Supermicro 
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a 
03/06/2012
[154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f 
0000000000000000
[154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 
0000000000000000
[154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 
ffffea0001e7bfe0
[154152.972447] Call Trace:
[154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
[154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
[154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
[154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
[154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
[154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
[154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
[154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
[154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
[154152.984380]  [<ffffffffa7166ea0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
[154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
[154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---

This time with an xfs info:
# xfs_info /
meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac 
isize=256    agcount=4, agsize=58224256 blks
          =                       sectsz=512   attr=2, projid32bit=0
          =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=232897024, imaxpct=25
          =                       sunit=64     swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=113728, version=2
          =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

>
>>
>> Brian
>>
>>> [67674.907736] ------------[ cut here ]------------
>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>> xfs_vm_releasepage+0xa9/0xe0()
>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>> scsi_transport_sas pps_core
>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>> [67675.277120] Hardware name: Supermicro
>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>> [67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>> 0000000000000001
>>> [67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
>>> ffff88007950fae8
>>> [67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>> ffffea00208834a0
>>> [67675.506112] Call Trace:
>>> [67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>> [67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>> [67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>> [67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>> [67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>> [67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>> [67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>> [67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>> [67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>> [67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>> [67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>> [67676.172075]  [<ffffffffa3166160>] ?
>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>> [67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>> [67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>> [67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>> [67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>
>>> Greets Stefan
>>>


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-04 18:47     ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
@ 2016-03-04 19:13       ` Brian Foster
  2016-03-04 20:02         ` Stefan Priebe
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-03-04 19:13 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters, xfs

On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> >
> >>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> >>
> >>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>Hi,
> >>>
> >>>got this one today. Not sure if this is a bug.
> >>
> >>That looks like the releasepage() delayed allocation block warning. I'm
> >>not sure we've had any fixes for (or reports of) that issue since the
> >>v4.2 timeframe.
> >>
> >>What is the xfs_info of the associated filesystem? Also, do you have any
> >>insight as to the possible reproducer application or workload? Is this
> >>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> >>won't fire again regardless until after a reboot.
> 
> Today I got this one running 4.3.3.
> 
> [154152.949610] ------------[ cut here ]------------
> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> xfs_vm_releasepage+0xc3/0xf0()
> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> megaraid_sas pps_core
> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> [154152.964625] Hardware name: Supermicro
> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> 03/06/2012
> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> 0000000000000000
> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> 0000000000000000
> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> ffffea0001e7bfe0
> [154152.972447] Call Trace:
> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
> [154152.984380]  [<ffffffffa7166ea0>] ?
> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
> 
> This time with an xfs info:
> # xfs_info /
> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> agcount=4, agsize=58224256 blks
>          =                       sectsz=512   attr=2, projid32bit=0
>          =                       crc=0        finobt=0
> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>          =                       sunit=64     swidth=384 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal               bsize=4096   blocks=113728, version=2
>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 

Can you describe the workload to the filesystem?

Brian

> >
> >>
> >>Brian
> >>
> >>>[67674.907736] ------------[ cut here ]------------
> >>>[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> >>>xfs_vm_releasepage+0xa9/0xe0()
> >>>[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> >>>nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> >>>x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> >>>sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> >>>button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> >>>ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> >>>scsi_transport_sas pps_core
> >>>[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> >>>[67675.277120] Hardware name: Supermicro
> >>>X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> >>>[67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> >>>0000000000000001
> >>>[67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
> >>>ffff88007950fae8
> >>>[67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> >>>ffffea00208834a0
> >>>[67675.506112] Call Trace:
> >>>[67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> >>>[67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> >>>[67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> >>>[67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> >>>[67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> >>>[67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> >>>[67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> >>>[67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> >>>[67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> >>>[67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> >>>[67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
> >>>[67676.172075]  [<ffffffffa3166160>] ?
> >>>mem_cgroup_shrink_node_zone+0x150/0x150
> >>>[67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> >>>[67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>[67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> >>>[67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>[67676.433499] ---[ end trace cb1827fe308f7f6b ]---
> >>>
> >>>Greets Stefan
> >>>
> 


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-04 19:13       ` Brian Foster
@ 2016-03-04 20:02         ` Stefan Priebe
  2016-03-04 21:03           ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe @ 2016-03-04 20:02 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


On 04.03.2016 at 20:13, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>
>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>
>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>> Hi,
>>>>>
>>>>> got this one today. Not sure if this is a bug.
>>>>
>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>> v4.2 timeframe.
>>>>
>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>> insight as to the possible reproducer application or workload? Is this
>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>> won't fire again regardless until after a reboot.
>>
>> Today I got this one running 4.3.3.
>>
>> [154152.949610] ------------[ cut here ]------------
>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>> xfs_vm_releasepage+0xc3/0xf0()
>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>> megaraid_sas pps_core
>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>> [154152.964625] Hardware name: Supermicro
>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>> 03/06/2012
>> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>> 0000000000000000
>> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>> 0000000000000000
>> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>> ffffea0001e7bfe0
>> [154152.972447] Call Trace:
>> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
>> [154152.984380]  [<ffffffffa7166ea0>] ?
>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>
>> This time with an xfs info:
>> # xfs_info /
>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>> agcount=4, agsize=58224256 blks
>>           =                       sectsz=512   attr=2, projid32bit=0
>>           =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>>           =                       sunit=64     swidth=384 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal               bsize=4096   blocks=113728, version=2
>>           =                       sectsz=512   sunit=64 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>
> Can you describe the workload to the filesystem?

At the time of this trace the rsync backup of the fs had just started, so
the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.

Stefan

> Brian
>
>>>
>>>>
>>>> Brian
>>>>
>>>>> [67674.907736] ------------[ cut here ]------------
>>>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xa9/0xe0()
>>>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>>>> scsi_transport_sas pps_core
>>>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>>>> [67675.277120] Hardware name: Supermicro
>>>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>>>> [67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>>>> 0000000000000001
>>>>> [67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
>>>>> ffff88007950fae8
>>>>> [67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>>>> ffffea00208834a0
>>>>> [67675.506112] Call Trace:
>>>>> [67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>>>> [67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>>>> [67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>>>> [67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>>>> [67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>>>> [67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>>>> [67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>>>> [67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>>>> [67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>>>> [67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>>>> [67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>>>> [67676.172075]  [<ffffffffa3166160>] ?
>>>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>>>> [67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>>>> [67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>> [67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>>>> [67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>>>
>>>>> Greets Stefan
>>>>>
>>


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-04 20:02         ` Stefan Priebe
@ 2016-03-04 21:03           ` Brian Foster
  2016-03-04 21:15             ` Stefan Priebe
  2016-03-05 22:48             ` Dave Chinner
  0 siblings, 2 replies; 49+ messages in thread
From: Brian Foster @ 2016-03-04 21:03 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters, xfs

On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> 
> On 04.03.2016 at 20:13, Brian Foster wrote:
> >On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> >>>
> >>>>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> >>>>
> >>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>Hi,
> >>>>>
> >>>>>got this one today. Not sure if this is a bug.
> >>>>
> >>>>That looks like the releasepage() delayed allocation block warning. I'm
> >>>>not sure we've had any fixes for (or reports of) that issue since the
> >>>>v4.2 timeframe.
> >>>>
> >>>>What is the xfs_info of the associated filesystem? Also, do you have any
> >>>>insight as to the possible reproducer application or workload? Is this
> >>>>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> >>>>won't fire again regardless until after a reboot.
> >>
> >>Today I got this one running 4.3.3.
> >>
> >>[154152.949610] ------------[ cut here ]------------
> >>[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> >>xfs_vm_releasepage+0xc3/0xf0()
> >>[154152.952596] Modules linked in: netconsole mpt3sas raid_class
> >>nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> >>nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> >>garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> >>ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> >>i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> >>megaraid_sas pps_core
> >>[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> >>[154152.964625] Hardware name: Supermicro
> >>X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> >>03/06/2012
> >>[154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> >>0000000000000000
> >>[154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> >>0000000000000000
> >>[154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> >>ffffea0001e7bfe0
> >>[154152.972447] Call Trace:
> >>[154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> >>[154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> >>[154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> >>[154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> >>[154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> >>[154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> >>[154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> >>[154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> >>[154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
> >>[154152.984380]  [<ffffffffa7166ea0>] ?
> >>mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> >>[154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> >>[154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> >>[154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> >>[154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> >>[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
> >>
> >>This time with an xfs info:
> >># xfs_info /
> >>meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> >>agcount=4, agsize=58224256 blks
> >>          =                       sectsz=512   attr=2, projid32bit=0
> >>          =                       crc=0        finobt=0
> >>data     =                       bsize=4096   blocks=232897024, imaxpct=25
> >>          =                       sunit=64     swidth=384 blks
> >>naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> >>log      =internal               bsize=4096   blocks=113728, version=2
> >>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> >>realtime =none                   extsz=4096   blocks=0, rtextents=0
> >>
> >
> >Can you describe the workload to the filesystem?
> 
> At the time of this trace the rsync backup of the fs had just started, so
> the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.
> 

Interesting. The warning is associated with releasing a page that has a
delayed allocation when it shouldn't. That means something had written
to a file to cause the delalloc in the first place. Any idea what could
have been writing at the time or shortly before the rsync read workload
had kicked in?
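
(For reference, the page is classified by walking its buffer_heads -
roughly the following, again sketched from the v4.4-era fs/xfs/xfs_aops.c,
so treat it as an approximation:)

void
xfs_count_page_state(
	struct page		*page,
	int			*delalloc,
	int			*unwritten)
{
	struct buffer_head	*bh, *head;

	*delalloc = *unwritten = 0;

	bh = head = page_buffers(page);
	do {
		if (buffer_unwritten(bh))
			(*unwritten) = 1;
		else if (buffer_delay(bh))
			(*delalloc) = 1;	/* this is what trips the warning */
	} while ((bh = bh->b_this_page) != head);
}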

Brian

> Stefan
> 
> >Brian
> >
> >>>
> >>>>
> >>>>Brian
> >>>>
> >>>>>[67674.907736] ------------[ cut here ]------------
> >>>>>[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> >>>>>xfs_vm_releasepage+0xa9/0xe0()
> >>>>>[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> >>>>>nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> >>>>>x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> >>>>>sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> >>>>>button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> >>>>>ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> >>>>>scsi_transport_sas pps_core
> >>>>>[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> >>>>>[67675.277120] Hardware name: Supermicro
> >>>>>X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> >>>>>[67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> >>>>>0000000000000001
> >>>>>[67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
> >>>>>ffff88007950fae8
> >>>>>[67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> >>>>>ffffea00208834a0
> >>>>>[67675.506112] Call Trace:
> >>>>>[67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> >>>>>[67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> >>>>>[67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> >>>>>[67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> >>>>>[67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> >>>>>[67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> >>>>>[67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> >>>>>[67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> >>>>>[67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> >>>>>[67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> >>>>>[67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
> >>>>>[67676.172075]  [<ffffffffa3166160>] ?
> >>>>>mem_cgroup_shrink_node_zone+0x150/0x150
> >>>>>[67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> >>>>>[67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>>>[67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> >>>>>[67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>>>[67676.433499] ---[ end trace cb1827fe308f7f6b ]---
> >>>>>
> >>>>>Greets Stefan
> >>>>>
> >>


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-04 21:03           ` Brian Foster
@ 2016-03-04 21:15             ` Stefan Priebe
  2016-03-05 22:48             ` Dave Chinner
  1 sibling, 0 replies; 49+ messages in thread
From: Stefan Priebe @ 2016-03-04 21:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs

On 04.03.2016 at 22:03, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>
>> On 04.03.2016 at 20:13, Brian Foster wrote:
>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>>>
>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>
>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>> v4.2 timeframe.
>>>>>>
>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>> won't fire again regardless until after a reboot.
>>>>
>>>> Today I got this one running 4.3.3.
>>>>
>>>> [154152.949610] ------------[ cut here ]------------
>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>> megaraid_sas pps_core
>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>> [154152.964625] Hardware name: Supermicro
>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>> 03/06/2012
>>>> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>> 0000000000000000
>>>> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>> 0000000000000000
>>>> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>> ffffea0001e7bfe0
>>>> [154152.972447] Call Trace:
>>>> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>> [154152.984380]  [<ffffffffa7166ea0>] ?
>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>
>>>> This time with an xfs info:
>>>> # xfs_info /
>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>> agcount=4, agsize=58224256 blks
>>>>           =                       sectsz=512   attr=2, projid32bit=0
>>>>           =                       crc=0        finobt=0
>>>> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>>>>           =                       sunit=64     swidth=384 blks
>>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>>> log      =internal               bsize=4096   blocks=113728, version=2
>>>>           =                       sectsz=512   sunit=64 blks, lazy-count=1
>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>
>>>
>>> Can you describe the workload to the filesystem?
>>
>> At the time of this trace the rsync backup of the fs had just started, so
>> the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.
>>
>
> Interesting. The warning is associated with releasing a page that has a
> delayed allocation when it shouldn't. That means something had written
> to a file to cause the delalloc in the first place. Any idea what could
> have been writing at the time or shortly before the rsync read workload
> had kicked in?

The system itself is a LAMP system, so PHP and MySQL are running and may
write data to files, but at the time the trace happened the system was
nearly idle, though not completely. It was 3am.

Stefan

>
> Brian
>
>> Stefan
>>
>>> Brian
>>>
>>>>>
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>>> [67674.907736] ------------[ cut here ]------------
>>>>>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>>>>>> xfs_vm_releasepage+0xa9/0xe0()
>>>>>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>>>>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>>>>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>>>>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>>>>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>>>>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>>>>>> scsi_transport_sas pps_core
>>>>>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>>>>>> [67675.277120] Hardware name: Supermicro
>>>>>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>>>>>> [67675.335176]  ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>>>>>> 0000000000000001
>>>>>>> [67675.392983]  0000000000000000 ffff88007950fad8 ffffffffa3083587
>>>>>>> ffff88007950fae8
>>>>>>> [67675.449743]  0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>>>>>> ffffea00208834a0
>>>>>>> [67675.506112] Call Trace:
>>>>>>> [67675.561285]  [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>>>>>> [67675.619364]  [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>>>>>> [67675.675719]  [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>>>>>> [67675.731113]  [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>>>>>> [67675.786116]  [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>>>>>> [67675.844216]  [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>>>>>> [67675.903862]  [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>>>>>> [67675.957625]  [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>>>>>> [67676.011497]  [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>>>>>> [67676.064980]  [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>>>>>> [67676.118828]  [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>>>>>> [67676.172075]  [<ffffffffa3166160>] ?
>>>>>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>>>>>> [67676.225139]  [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>>>>>> [67676.277539]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>>>> [67676.330124]  [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>>>>>> [67676.381816]  [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>>>>>
>>>>>>> Greets Stefan
>>>>>>>
>>>>


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-04 21:03           ` Brian Foster
  2016-03-04 21:15             ` Stefan Priebe
@ 2016-03-05 22:48             ` Dave Chinner
  2016-03-05 22:58               ` Stefan Priebe
                                 ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Dave Chinner @ 2016-03-05 22:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs, Stefan Priebe

On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> > On 04.03.2016 at 20:13, Brian Foster wrote:
> > >On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> > >>On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> > >>>
> > >>>>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> > >>>>
> > >>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> > >>>>>Hi,
> > >>>>>
> > >>>>>got this one today. Not sure if this is a bug.
> > >>>>
> > >>>>That looks like the releasepage() delayed allocation block warning. I'm
> > >>>>not sure we've had any fixes for (or reports of) that issue since the
> > >>>>v4.2 timeframe.
> > >>>>
> > >>>>What is the xfs_info of the associated filesystem? Also, do you have any
> > >>>>insight as to the possible reproducer application or workload? Is this
> > >>>>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> > >>>>won't fire again regardless until after a reboot.
> > >>
> > >>Today I got this one running 4.3.3.
> > >>
> > >>[154152.949610] ------------[ cut here ]------------
> > >>[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> > >>xfs_vm_releasepage+0xc3/0xf0()
> > >>[154152.952596] Modules linked in: netconsole mpt3sas raid_class
> > >>nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> > >>nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> > >>garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> > >>ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> > >>i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> > >>megaraid_sas pps_core
> > >>[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> > >>[154152.964625] Hardware name: Supermicro
> > >>X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> > >>03/06/2012
> > >>[154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> > >>0000000000000000
> > >>[154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> > >>0000000000000000
> > >>[154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> > >>ffffea0001e7bfe0
> > >>[154152.972447] Call Trace:
> > >>[154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> > >>[154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> > >>[154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> > >>[154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> > >>[154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> > >>[154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> > >>[154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> > >>[154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> > >>[154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
> > >>[154152.984380]  [<ffffffffa7166ea0>] ?
> > >>mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> > >>[154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> > >>[154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> > >>[154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> > >>[154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> > >>[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
> > >>
> > >>This time with an xfs info:
> > >># xfs_info /
> > >>meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> > >>agcount=4, agsize=58224256 blks
> > >>          =                       sectsz=512   attr=2, projid32bit=0
> > >>          =                       crc=0        finobt=0
> > >>data     =                       bsize=4096   blocks=232897024, imaxpct=25
> > >>          =                       sunit=64     swidth=384 blks
> > >>naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> > >>log      =internal               bsize=4096   blocks=113728, version=2
> > >>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> > >>realtime =none                   extsz=4096   blocks=0, rtextents=0
> > >>
> > >
> > >Can you describe the workload to the filesystem?
> > 
> > At the time of this trace the rsync backup of the fs had just started, so
> > the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.
> > 
> 
> Interesting. The warning is associated with releasing a page that has a
> delayed allocation when it shouldn't. That means something had written
> to a file to cause the delalloc in the first place. Any idea what could
> have been writing at the time or shortly before the rsync read workload
> had kicked in?

It's memory reclaim that tripped over it, so the cause is long gone
- it could have been anything in the previous 24 hours. i.e. rsync has
triggered memory reclaim, which triggered the warning, but I don't
think rsync has anything to do with putting the page into the state
that caused the warning.
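
For context, reclaim reaches the filesystem through try_to_release_page(),
which hands the page to the aops releasepage hook - roughly this, from the
v4.4-era mm/filemap.c (a sketch from memory, not the exact source):

int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
	struct address_space * const mapping = page->mapping;

	BUG_ON(!PageLocked(page));
	if (PageWriteback(page))
		return 0;

	/* for XFS this calls xfs_vm_releasepage(), where the WARN fires */
	if (mapping && mapping->a_ops->releasepage)
		return mapping->a_ops->releasepage(page, gfp_mask);

	return try_to_free_buffers(page);
}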

I'd be interested to know if there are any other warnings in the
logs - stuff like IO errors, page discards, ENOSPC issues, etc. that
could trigger less-travelled write error paths...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-05 22:48             ` Dave Chinner
@ 2016-03-05 22:58               ` Stefan Priebe
  2016-03-23 13:26               ` Stefan Priebe - Profihost AG
  2016-03-23 13:28               ` Stefan Priebe - Profihost AG
  2 siblings, 0 replies; 49+ messages in thread
From: Stefan Priebe @ 2016-03-05 22:58 UTC (permalink / raw)
  To: Dave Chinner, Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


On 05.03.2016 at 23:48, Dave Chinner wrote:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> On 04.03.2016 at 20:13, Brian Foster wrote:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380]  [<ffffffffa7166ea0>] ?
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>>           =                       sectsz=512   attr=2, projid32bit=0
>>>>>           =                       crc=0        finobt=0
>>>>> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>>>>>           =                       sunit=64     swidth=384 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>>>> log      =internal               bsize=4096   blocks=113728, version=2
>>>>>           =                       sectsz=512   sunit=64 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs had just started, so
>>> the workload went from nearly idle to 4,000 read IOPS at a 60 MB/s peak.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
>
> It's memory reclaim that tripped over it, so the cause is long gone
> - could have been anything in the previous 24 hours that caused the
> issue. i.e. rsync has triggered memory reclaim which triggered the
> warning, but I don't think rsync has anything to do with causing the
> page to be in a state that caused the warning.
>
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc that
> could trigger less travelled write error paths...

No, dmesg is absolutely clean. This never happened with 4.1.18; it started 
after the upgrade from 4.1 to 4.4.

Stefan

>
> -Dave.
>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-05 22:48             ` Dave Chinner
  2016-03-05 22:58               ` Stefan Priebe
@ 2016-03-23 13:26               ` Stefan Priebe - Profihost AG
  2016-03-23 13:28               ` Stefan Priebe - Profihost AG
  2 siblings, 0 replies; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-23 13:26 UTC (permalink / raw)
  To: Dave Chinner, Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>
>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380]  [<ffffffffa7166ea0>] ?

>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>>          =                       sectsz=512   attr=2, projid32bit=0
>>>>>          =                       crc=0        finobt=0
>>>>> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>>>>>          =                       sunit=64     swidth=384 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>>>> log      =internal               bsize=4096   blocks=113728, version=2
>>>>>          =                       sectsz=512   sunit=64 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs had just started, so
>>> the workload went from nearly idle to 4,000 read IOPS at a 60 MB/s peak.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
> 
> It's memory reclaim that tripped over it, so the cause is long gone
> - could have been anything in the previous 24 hours that caused the
> issue. i.e. rsync has triggered memory reclaim which triggered the
> warning, but I don't think rsync has anything to do with causing the
> page to be in a state that caused the warning.
> 
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc that
> could trigger less travelled write error paths...

This has happened again on 8 different hosts in the last 24 hours
running 4.4.6.

All of those are KVM / Qemu hosts and are doing NO I/O except the normal
OS stuff as the VMs have remote storage. So no database, no rsync on
those hosts - just the OS doing nearly nothing.

All those show:
[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
xfs_vm_releasepage+0xe2/0xf0()

Stefan

> 
> -Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-05 22:48             ` Dave Chinner
  2016-03-05 22:58               ` Stefan Priebe
  2016-03-23 13:26               ` Stefan Priebe - Profihost AG
@ 2016-03-23 13:28               ` Stefan Priebe - Profihost AG
  2016-03-23 14:07                 ` Brian Foster
  2 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-23 13:28 UTC (permalink / raw)
  To: Dave Chinner, Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs

Sorry, new mail - the last one got mangled. Comments inline.

Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>
>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029]  0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836]  ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641]  0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011]  [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167]  [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515]  [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826]  [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137]  [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467]  [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816]  [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068]  [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262]  [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380]  [<ffffffffa7166ea0>] ?
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942]  [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313]  [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527]  [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>>          =                       sectsz=512   attr=2, projid32bit=0
>>>>>          =                       crc=0        finobt=0
>>>>> data     =                       bsize=4096   blocks=232897024, imaxpct=25
>>>>>          =                       sunit=64     swidth=384 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>>>> log      =internal               bsize=4096   blocks=113728, version=2
>>>>>          =                       sectsz=512   sunit=64 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs had just started, so
>>> the workload went from nearly idle to 4,000 read IOPS at a 60 MB/s peak.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
> 
> It's memory reclaim that tripped over it, so the cause is long gone
> - could have been anything in the previous 24 hours that caused the
> issue. i.e. rsync has triggered memory reclaim which triggered the
> warning, but I don't think rsync has anything to do with causing the
> page to be in a state that caused the warning.
> 
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc that
> could trigger less travelled write error paths...

This has happened again on 8 different hosts in the last 24 hours
running 4.4.6.

All of those are KVM / Qemu hosts and are doing NO I/O except the normal
OS stuff as the VMs have remote storage. So no database, no rsync on
those hosts - just the OS doing nearly nothing.

All those show:
[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
xfs_vm_releasepage+0xe2/0xf0()

Stefan

> 
> -Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-23 13:28               ` Stefan Priebe - Profihost AG
@ 2016-03-23 14:07                 ` Brian Foster
  2016-03-24  8:10                   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-03-23 14:07 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-fsdevel, xfs-masters, xfs

On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> Sorry, new mail - the last one got mangled. Comments inline.
> 
> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>
> >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>
> >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
...
> 
> This has happened again on 8 different hosts in the last 24 hours
> running 4.4.6.
> 
> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> OS stuff as the VMs have remote storage. So no database, no rsync on
> those hosts - just the OS doing nearly nothing.
> 
> All those show:
> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> xfs_vm_releasepage+0xe2/0xf0()
> 

Ok, well at this point the warning isn't telling us anything beyond the fact
that you're reproducing the problem. We can't really make progress without
more information. We don't necessarily know what application or
operations caused this by the time it occurs, but perhaps knowing what
file is affected could give us a hint.

We have the xfs_releasepage tracepoint, but that's unconditional and so
might generate a lot of noise by default. Could you enable the
xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
E.g., we could leave a long running 'trace-cmd record -e
"xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
-e "xfs:xfs_releasepage"' and leave something like 'cat
/sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
~/trace.out' running to capture instances.

If we can get a tracepoint hit, it will include the inode number and
something like 'find / -inum <ino>' can point us at the file.
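
To pull those pieces together, the capture setup would look roughly like
this end to end (a sketch: it assumes debugfs is mounted at
/sys/kernel/debug, and ~/trace.out is only an example destination):

# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > ~/trace.out &

Then, once an entry with delalloc != 0 lands in ~/trace.out:

# find / -inum <ino>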

Brian

> Stefan
> 
> > 
> > -Dave.
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-23 14:07                 ` Brian Foster
@ 2016-03-24  8:10                   ` Stefan Priebe - Profihost AG
  2016-03-24  8:15                     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24  8:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


Am 23.03.2016 um 15:07 schrieb Brian Foster:
> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>> Sorry, new mail - the last one got mangled. Comments inline.
>>
>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>
>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>
>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> ...
>>
>> This has happened again on 8 different hosts in the last 24 hours
>> running 4.4.6.
>>
>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>> OS stuff as the VMs have remote storage. So no database, no rsync on
>> those hosts - just the OS doing nearly nothing.
>>
>> All those show:
>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>> xfs_vm_releasepage+0xe2/0xf0()
>>
> 
> Ok, well at this point the warning isn't telling us anything beyond the fact
> that you're reproducing the problem. We can't really make progress without
> more information. We don't necessarily know what application or
> operations caused this by the time it occurs, but perhaps knowing what
> file is affected could give us a hint.
> 
> We have the xfs_releasepage tracepoint, but that's unconditional and so
> might generate a lot of noise by default. Could you enable the
> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> E.g., we could leave a long running 'trace-cmd record -e
> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> -e "xfs:xfs_releasepage"' and leave something like 'cat
> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> ~/trace.out' running to capture instances.
> 
> If we can get a tracepoint hit, it will include the inode number and
> something like 'find / -inum <ino>' can point us at the file.

Thanks - I need to compile trace-cmd first. Do you know if, and how much,
it influences performance?

Stefan

> 
> Brian
> 
>> Stefan
>>
>>>
>>> -Dave.
>>>
>>
>> _______________________________________________
>> xfs mailing list
>> xfs@oss.sgi.com
>> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24  8:10                   ` Stefan Priebe - Profihost AG
@ 2016-03-24  8:15                     ` Stefan Priebe - Profihost AG
  2016-03-24 11:17                       ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24  8:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> 
> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>> Sorry, new mail - the last one got mangled. Comments inline.
>>>
>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>
>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>> ...
>>>
>>> This has happened again on 8 different hosts in the last 24 hours
>>> running 4.4.6.
>>>
>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>> those hosts - just the OS doing nearly nothing.
>>>
>>> All those show:
>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>> xfs_vm_releasepage+0xe2/0xf0()
>>>
>>
>> Ok, well at this point the warning isn't telling us anything beyond the fact
>> that you're reproducing the problem. We can't really make progress without
>> more information. We don't necessarily know what application or
>> operations caused this by the time it occurs, but perhaps knowing what
>> file is affected could give us a hint.
>>
>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>> might generate a lot of noise by default. Could you enable the
>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>> E.g., we could leave a long running 'trace-cmd record -e
>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>> ~/trace.out' running to capture instances.

Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
for it in the trace.out even after the WARN_ONCE has already triggered?

Stefan


> 
> Stefan
> 
>>
>> Brian
>>
>>> Stefan
>>>
>>>>
>>>> -Dave.
>>>>
>>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs@oss.sgi.com
>>> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24  8:15                     ` Stefan Priebe - Profihost AG
@ 2016-03-24 11:17                       ` Brian Foster
  2016-03-24 12:17                         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-03-24 11:17 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-fsdevel, xfs-masters, xfs

On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> > 
> > Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>> Sorry, new mail - the last one got mangled. Comments inline.
> >>>
> >>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>
> >>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>
> >>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >> ...
> >>>
> >>> This has happened again on 8 different hosts in the last 24 hours
> >>> running 4.4.6.
> >>>
> >>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>> those hosts - just the OS doing nearly nothing.
> >>>
> >>> All those show:
> >>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>> xfs_vm_releasepage+0xe2/0xf0()
> >>>
> >>
> >> Ok, well at this point the warning isn't telling us anything beyond the fact
> >> that you're reproducing the problem. We can't really make progress without
> >> more information. We don't necessarily know what application or
> >> operations caused this by the time it occurs, but perhaps knowing what
> >> file is affected could give us a hint.
> >>
> >> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >> might generate a lot of noise by default. Could you enable the
> >> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >> E.g., we could leave a long running 'trace-cmd record -e
> >> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >> ~/trace.out' running to capture instances.
> 
> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
> for it in the trace.out even after the WARN_ONCE has already triggered?
> 

The tracepoint is independent from the warning (see
xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
the function regardless of whether delalloc blocks still exist at that
point. That creates the need to filter the entries.

With regard to performance, I believe the tracepoints are intended to be
pretty lightweight. I don't think it should hurt to try it on a box,
observe for a bit and make sure there isn't a huge impact. Note that the
'trace-cmd record' approach will save everything to file, so that's
something to consider I suppose.
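
(If the disk usage of 'trace-cmd record' is a concern, 'trace-cmd start'
should also let you cap the ring buffer, e.g.:

# trace-cmd start -b 2048 -e "xfs:xfs_releasepage"

for a ~2MB per-cpu buffer - I'm quoting the -b option and its KB units from
memory, so double-check the trace-cmd man page before relying on that.)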

Brian

> Stefan
> 
> 
> > 
> > Stefan
> > 
> >>
> >> Brian
> >>
> >>> Stefan
> >>>
> >>>>
> >>>> -Dave.
> >>>>
> >>>
> >>> _______________________________________________
> >>> xfs mailing list
> >>> xfs@oss.sgi.com
> >>> http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24 11:17                       ` Brian Foster
@ 2016-03-24 12:17                         ` Stefan Priebe - Profihost AG
  2016-03-24 12:24                           ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24 12:17 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


Am 24.03.2016 um 12:17 schrieb Brian Foster:
> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>
>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>> Sorry, new mail - the last one got mangled. Comments inline.
>>>>>
>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>
>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>> ...
>>>>>
>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>> running 4.4.6.
>>>>>
>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>> those hosts - just the OS doing nearly nothing.
>>>>>
>>>>> All those show:
>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>
>>>>
>>>> Ok, well at this point the warning isn't telling us anything beyond the fact
>>>> that you're reproducing the problem. We can't really make progress without
>>>> more information. We don't necessarily know what application or
>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>> file is affected could give us a hint.
>>>>
>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>> might generate a lot of noise by default. Could you enable the
>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>> ~/trace.out' running to capture instances.
>>
>> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
>> for it in the trace.out even after the WARN_ONCE has already triggered?
>>
> 
> The tracepoint is independent from the warning (see
> xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
> the function regardless of whether delalloc blocks still exist at that
> point. That creates the need to filter the entries.
> 
> With regard to performance, I believe the tracepoints are intended to be
> pretty lightweight. I don't think it should hurt to try it on a box,
> observe for a bit and make sure there isn't a huge impact. Note that the
> 'trace-cmd record' approach will save everything to file, so that's
> something to consider I suppose.

Tests / cat is running. Is there any way to verify that it works? Or is it
enough that cat prints entries from time to time, but none of them get past
the grep -v "delalloc 0" filter?

Stefan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24 12:17                         ` Stefan Priebe - Profihost AG
@ 2016-03-24 12:24                           ` Brian Foster
  2016-04-04  6:12                             ` Stefan Priebe - Profihost AG
  2016-05-11 12:26                             ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 49+ messages in thread
From: Brian Foster @ 2016-03-24 12:24 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-fsdevel, xfs-masters, xfs

On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>
> >>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>> Sorry, new mail - the last one got mangled. Comments inline.
> >>>>>
> >>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>
> >>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>> ...
> >>>>>
> >>>>> This has happened again on 8 different hosts in the last 24 hours
> >>>>> running 4.4.6.
> >>>>>
> >>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>> those hosts - just the OS doing nearly nothing.
> >>>>>
> >>>>> All those show:
> >>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>> xfs_vm_releasepage+0xe2/0xf0()
> >>>>>
> >>>>
> >>>> Ok, well at this point the warning isn't telling us anything beyond the fact
> >>>> that you're reproducing the problem. We can't really make progress without
> >>>> more information. We don't necessarily know what application or
> >>>> operations caused this by the time it occurs, but perhaps knowing what
> >>>> file is affected could give us a hint.
> >>>>
> >>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>> might generate a lot of noise by default. Could you enable the
> >>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>> E.g., we could leave a long running 'trace-cmd record -e
> >>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>> ~/trace.out' running to capture instances.
> >>
> >> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
> >> for it in the trace.out even after the WARN_ONCE has already triggered?
> >>
> > 
> > The tracepoint is independent from the warning (see
> > xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
> > the function regardless of whether delalloc blocks still exist at that
> > point. That creates the need to filter the entries.
> > 
> > With regard to performance, I believe the tracepoints are intended to be
> > pretty lightweight. I don't think it should hurt to try it on a box,
> > observe for a bit and make sure there isn't a huge impact. Note that the
> > 'trace-cmd record' approach will save everything to file, so that's
> > something to consider I suppose.
> 
> Tests / cat is running. Is there any way to verify that it works? Or is it
> enough that cat prints entries from time to time, but none of them get past
> the grep -v "delalloc 0" filter?
> 

What is it printing where delalloc != 0? You could always just cat
trace_pipe and make sure the event is firing, it's just that I suspect
most entries will have delalloc == unwritten == 0.

Also, while the tracepoint fires independent of the warning, it might
not be a bad idea to restart a system that has already seen the warning
since boot, just to provide some correlation or additional notification
when the problem occurs.
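
A couple of quick sanity checks that the event is actually armed and firing
(the debugfs paths below are the usual defaults - treat them as an
assumption and adjust for your mount):

# cat /sys/kernel/debug/tracing/events/xfs/xfs_releasepage/enable
# grep -c xfs_releasepage /sys/kernel/debug/tracing/trace

The first should print 1 once 'trace-cmd start' has run; the second counts
entries still sitting in the ring buffer (entries your cat has already
consumed from trace_pipe won't show up there).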

Brian

> Stefan
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24 12:24                           ` Brian Foster
@ 2016-04-04  6:12                             ` Stefan Priebe - Profihost AG
  2016-05-11 12:26                             ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-04-04  6:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs

Am 24.03.2016 um 13:24 schrieb Brian Foster:
> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>
>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>> Sorry, new mail - the last one got mangled. Comments inline.
>>>>>>>
>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>> ...
>>>>>>>
>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>> running 4.4.6.
>>>>>>>
>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>
>>>>>>> All those show:
>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>
>>>>>>
>>>>>> Ok, well at this point the warning isn't telling us anything beyond the fact
>>>>>> that you're reproducing the problem. We can't really make progress without
>>>>>> more information. We don't necessarily know what application or
>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>> file is affected could give us a hint.
>>>>>>
>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>> ~/trace.out' running to capture instances.
>>>>
>>>> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
>>>> for it in the trace.out even after the WARN_ONCE has already triggered?
>>>>
>>>
>>> The tracepoint is independent from the warning (see
>>> xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
>>> the function regardless of whether delalloc blocks still exist at that
>>> point. That creates the need to filter the entries.
>>>
>>> With regard to performance, I believe the tracepoints are intended to be
>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>> 'trace-cmd record' approach will save everything to file, so that's
>>> something to consider I suppose.
>>
>> Tests / cat is running. Is there any way to verify that it works? Or is it
>> enough that cat prints entries from time to time, but none of them get past
>> the grep -v "delalloc 0" filter?
>>
> 
> What is it printing where delalloc != 0? You could always just cat
> trace_pipe and make sure the event is firing, it's just that I suspect
> most entries will have delalloc == unwritten == 0.
> 
> Also, while the tracepoint fires independent of the warning, it might
> not be a bad idea to restart a system that has already seen the warning
> since boot, just to provide some correlation or additional notification
> when the problem occurs.

I still wasn't able to catch one with trace-cmd. But I notice that it
happens mostly in the first 48 hours after a reboot. All systems have now
been running for several days and none has triggered it again. All systems
that had triggered this bug got rebooted.

Stefan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-03-24 12:24                           ` Brian Foster
  2016-04-04  6:12                             ` Stefan Priebe - Profihost AG
@ 2016-05-11 12:26                             ` Stefan Priebe - Profihost AG
  2016-05-11 13:34                               ` Brian Foster
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-11 12:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs

Hi Brian,

I'm still unable to capture anything in the trace file. Is there any way
to check whether it's working at all?

This still happens in the first 48 hours after a fresh reboot.

Stefan

Am 24.03.2016 um 13:24 schrieb Brian Foster:
> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>
>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>> Sorry, new mail - the last one got mangled. Comments inline.
>>>>>>>
>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>> ...
>>>>>>>
>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>> running 4.4.6.
>>>>>>>
>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>
>>>>>>> All those show:
>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>
>>>>>>
>>>>>> Ok, well at this point the warning isn't telling us anything beyond the fact
>>>>>> that you're reproducing the problem. We can't really make progress without
>>>>>> more information. We don't necessarily know what application or
>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>> file is affected could give us a hint.
>>>>>>
>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>> ~/trace.out' running to capture instances.
>>>>
>>>> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
>>>> for it in the trace.out even after the WARN_ONCE has already triggered?
>>>>
>>>
>>> The tracepoint is independent from the warning (see
>>> xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
>>> the function regardless of whether delalloc blocks still exist at that
>>> point. That creates the need to filter the entries.
>>>
>>> With regard to performance, I believe the tracepoints are intended to be
>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>> 'trace-cmd record' approach will save everything to file, so that's
>>> something to consider I suppose.
>>
> >> Tests / cat is running. Is there any way to verify that it works? Or is it
> >> enough that cat prints entries from time to time, but none of them get past
> >> the grep -v "delalloc 0" filter?
>>
> 
> What is it printing where delalloc != 0? You could always just cat
> trace_pipe and make sure the event is firing, it's just that I suspect
> most entries will have delalloc == unwritten == 0.
> 
> Also, while the tracepoint fires independent of the warning, it might
> not be a bad idea to restart a system that has already seen the warning
> since boot, just to provide some correlation or additional notification
> when the problem occurs.
> 
> Brian
> 
>> Stefan
>>
>> _______________________________________________
>> xfs mailing list
>> xfs@oss.sgi.com
>> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-11 12:26                             ` Stefan Priebe - Profihost AG
@ 2016-05-11 13:34                               ` Brian Foster
  2016-05-11 14:03                                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-05-11 13:34 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-fsdevel, xfs-masters, xfs

On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi Brian,
> 
> I'm still unable to capture anything in the trace file. Is there any way
> to check whether it's working at all?
> 

See my previous mail:

http://oss.sgi.com/pipermail/xfs/2016-March/047793.html

E.g., something like this should work after writing to and removing a
new file:

# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe
...
rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0

Once that is working, add the grep command to filter out "delalloc 0"
instances, etc. For example:

	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
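
If you want to force at least one event through the pipeline as a smoke
test, writing and then removing a scratch file on the XFS mount should do
it - for instance (a sketch; /test.file is an arbitrary path on the
filesystem in question, and it assumes xfs_io is installed):

# xfs_io -f -c "pwrite 0 8m" /test.file
# rm /test.file

The rm should emit trace_pipe entries like the one above - normally with
delalloc 0, which is exactly what the grep filters out.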

Brian

> This still happens in the first 48 hours after a fresh reboot.
> 
> Stefan
> 
> Am 24.03.2016 um 13:24 schrieb Brian Foster:
> > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> >>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>
> >>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>>>
> >>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>> Sorry, new mail - the last one got mangled. Comments inline.
> >>>>>>>
> >>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>> ...
> >>>>>>>
> >>>>>>> This has happened again on 8 different hosts in the last 24 hours
> >>>>>>> running 4.4.6.
> >>>>>>>
> >>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>>>> those hosts - just the OS doing nearly nothing.
> >>>>>>>
> >>>>>>> All those show:
> >>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>>>> xfs_vm_releasepage+0xe2/0xf0()
> >>>>>>>
> >>>>>>
> >>>>>> Ok, well at this point the warning isn't telling us anything beyond the fact
> >>>>>> that you're reproducing the problem. We can't really make progress without
> >>>>>> more information. We don't necessarily know what application or
> >>>>>> operations caused this by the time it occurs, but perhaps knowing what
> >>>>>> file is affected could give us a hint.
> >>>>>>
> >>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>>>> might generate a lot of noise by default. Could you enable the
> >>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>>>> E.g., we could leave a long running 'trace-cmd record -e
> >>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>>>> ~/trace.out' running to capture instances.
> >>>>
> >>>> Isn't the trace a WARN_ONCE? So it does not reoccur - or can I still check
> >>>> for it in the trace.out even after the WARN_ONCE has already triggered?
> >>>>
> >>>
> >>> The tracepoint is independent from the warning (see
> >>> xfs_vm_releasepage()), so the tracepoint will fire on every invocation of
> >>> the function regardless of whether delalloc blocks still exist at that
> >>> point. That creates the need to filter the entries.
> >>>
> >>> With regard to performance, I believe the tracepoints are intended to be
> >>> pretty lightweight. I don't think it should hurt to try it on a box,
> >>> observe for a bit and make sure there isn't a huge impact. Note that the
> >>> 'trace-cmd record' approach will save everything to file, so that's
> >>> something to consider I suppose.
> >>
> >> Tests / cat is running. Is there any way to verify that it works? Or is it
> >> enough that cat prints entries from time to time, but none of them get past
> >> the grep -v "delalloc 0" filter?
> >>
> > 
> > What is it printing where delalloc != 0? You could always just cat
> > trace_pipe and make sure the event is firing, it's just that I suspect
> > most entries will have delalloc == unwritten == 0.
> > 
> > Also, while the tracepoint fires independent of the warning, it might
> > not be a bad idea to restart a system that has already seen the warning
> > since boot, just to provide some correlation or additional notification
> > when the problem occurs.
> > 
> > Brian
> > 
> >> Stefan
> >>
> >> _______________________________________________
> >> xfs mailing list
> >> xfs@oss.sgi.com
> >> http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-11 13:34                               ` Brian Foster
@ 2016-05-11 14:03                                 ` Stefan Priebe - Profihost AG
  2016-05-11 15:59                                   ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-11 14:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, xfs-masters, xfs


Am 11.05.2016 um 15:34 schrieb Brian Foster:
> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi Brian,
>>
>> I'm still unable to capture anything in the trace file. Is there any way
>> to check whether it's working at all?
>>
> 
> See my previous mail:
> 
> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
> 
> E.g., something like this should work after writing to and removing a
> new file:
> 
> # trace-cmd start -e "xfs:xfs_releasepage"
> # cat /sys/kernel/debug/tracing/trace_pipe
> ...
> rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0

Argh, sorry - yes, that's working, but delalloc is always 0.

Maybe I have to hook that into my initramfs to be early enough?
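
(Or maybe the kernel can do this by itself: if I read the docs right, the
trace_event= boot parameter arms trace events during early boot, so
appending something like

trace_event=xfs:xfs_releasepage

to the kernel command line might avoid the initramfs hack entirely - not
tested here yet, though.)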

Stefan

> Once that is working, add the grep command to filter out "delalloc 0"
> instances, etc. For example:
> 
> 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
> 
> Brian
> 
>> This still happens in the first 48 hours after a fresh reboot.
>>
>> Stefan
>>
>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>
>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Sorry, new mail - the last one got mangled. Comments inline.
>>>>>>>>>
>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>> running 4.4.6.
>>>>>>>>>
>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>
>>>>>>>>> All those show:
>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond the fact
>>>>>>>> that you're reproducing the problem. We can't really make progress without
>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>> file is affected could give us a hint.
>>>>>>>>
>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>
>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>
>>>>>
>>>>> The tracepoint is independent from the warning (see
>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>> point. That creates the need to filter the entries.
>>>>>
>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>> something to consider I suppose.
>>>>
>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>> enough that cat prints stuff from time to time but does not match -v
>>>> delalloc 0
>>>>
>>>
>>> What is it printing where delalloc != 0? You could always just cat
>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>> most entries will have delalloc == unwritten == 0.
>>>
>>> Also, while the tracepoint fires independent of the warning, it might
>>> not be a bad idea to restart a system that has already seen the warning
>>> since boot, just to provide some correlation or additional notification
>>> when the problem occurs.
>>>
>>> Brian
>>>
>>>> Stefan
>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-11 14:03                                 ` Stefan Priebe - Profihost AG
@ 2016-05-11 15:59                                   ` Brian Foster
  2016-05-11 19:20                                     ` Stefan Priebe
  2016-05-15 11:03                                     ` Stefan Priebe
  0 siblings, 2 replies; 49+ messages in thread
From: Brian Foster @ 2016-05-11 15:59 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: xfs-masters, xfs

Dropped non-XFS cc's, probably no need to spam other lists at this
point...

On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
> 
> Am 11.05.2016 um 15:34 schrieb Brian Foster:
> > On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
> >> Hi Brian,
> >>
> >> i'm still unable to grab anything to the trace file? Is there anything
> >> to check if it's working at all?
> >>
> > 
> > See my previous mail:
> > 
> > http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
> > 
> > E.g., something like this should work after writing to and removing a
> > new file:
> > 
> > # trace-cmd start -e "xfs:xfs_releasepage"
> > # cat /sys/kernel/debug/tracing/trace_pipe
> > ...
> > rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
> 
> arg sorry yes that's working but delalloc is always 0.
> 

Hrm, Ok. That is strange.

> May be i have to hook that into my initramfs to be fast enough?
> 

Not sure that would matter... you said it occurs within 48 hours? I take
that to mean it doesn't occur immediately on boot. You should be able to
tell from the logs or dmesg if it happens before you get a chance to
start the tracing.

Well, the options I can think of are:

- Perhaps I botched matching up the line number to the warning, in which
  case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
  any delalloc or unwritten blocks at releasepage() time (a full capture
  pipeline with this filter is sketched below the list).

- Perhaps there's a race that the tracepoint doesn't catch. The warnings
  are based on local vars, so we could instrument the code to print a
  warning[1] to try and get the inode number.
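
For the first option, a minimal sketch of the full capture pipeline, using
the same tracepoint and trace_pipe paths as in the earlier mails; the only
change is the relaxed grep filter, so pages with unwritten buffers are
caught as well:

# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0 unwritten 0" > ~/trace.out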

Brian

[1] - compile-tested diff:

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 40645a4..94738ea 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
 	gfp_t			gfp_mask)
 {
 	int			delalloc, unwritten;
+	struct xfs_inode	*ip = XFS_I(page->mapping->host);
 
 	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
 	xfs_count_page_state(page, &delalloc, &unwritten);
 
+	if (delalloc || unwritten)
+		xfs_warn(ip->i_mount,
+		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
+			 ip->i_ino, delalloc, unwritten, page_offset(page),
+			 i_size_read(page->mapping->host));
+
 	if (WARN_ON_ONCE(delalloc))
 		return 0;
 	if (WARN_ON_ONCE(unwritten))
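
(Assuming a tree close to mainline 4.4, the diff above should apply from
the kernel source root with 'patch -p1 < some-local-file.diff'; the
filename is just a placeholder.)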

> Stefan
> 
> > Once that is working, add the grep command to filter out "delalloc 0"
> > instances, etc. For example:
> > 
> > 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
> > 
> > Brian
> > 
> >> This still happens in the first 48 hours after a fresh reboot.
> >>
> >> Stefan
> >>
> >> Am 24.03.2016 um 13:24 schrieb Brian Foster:
> >>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>
> >>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> >>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>
> >>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>
> >>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>> sorry new one the last one got mangled. Comments inside.
> >>>>>>>>>
> >>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
> >>>>>>>>> running 4.4.6.
> >>>>>>>>>
> >>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>>>>>> those hosts - just the OS doing nearly nothing.
> >>>>>>>>>
> >>>>>>>>> All those show:
> >>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
> >>>>>>>> you're reproducing the problem. We can't really make progress without
> >>>>>>>> more information. We don't necessarily know what application or
> >>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
> >>>>>>>> file is affected could give us a hint.
> >>>>>>>>
> >>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>>>>>> might generate a lot of noise by default. Could you enable the
> >>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
> >>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>>>>>> ~/trace.out' running to capture instances.
> >>>>>>
> >>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> >>>>>> it in the trace.out even the WARN_ONCE was already triggered?
> >>>>>>
> >>>>>
> >>>>> The tracepoint is independent from the warning (see
> >>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> >>>>> the function regardless of whether delalloc blocks still exist at that
> >>>>> point. That creates the need to filter the entries.
> >>>>>
> >>>>> With regard to performance, I believe the tracepoints are intended to be
> >>>>> pretty lightweight. I don't think it should hurt to try it on a box,
> >>>>> observe for a bit and make sure there isn't a huge impact. Note that the
> >>>>> 'trace-cmd record' approach will save everything to file, so that's
> >>>>> something to consider I suppose.
> >>>>
> >>>> Tests / cat is running. Is there any way to test if it works? Or is it
> >>>> enough that cat prints stuff from time to time but does not match -v
> >>>> delalloc 0
> >>>>
> >>>
> >>> What is it printing where delalloc != 0? You could always just cat
> >>> trace_pipe and make sure the event is firing, it's just that I suspect
> >>> most entries will have delalloc == unwritten == 0.
> >>>
> >>> Also, while the tracepoint fires independent of the warning, it might
> >>> not be a bad idea to restart a system that has already seen the warning
> >>> since boot, just to provide some correlation or additional notification
> >>> when the problem occurs.
> >>>
> >>> Brian
> >>>
> >>>> Stefan
> >>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-11 15:59                                   ` Brian Foster
@ 2016-05-11 19:20                                     ` Stefan Priebe
  2016-05-15 11:03                                     ` Stefan Priebe
  1 sibling, 0 replies; 49+ messages in thread
From: Stefan Priebe @ 2016-05-11 19:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs-masters, xfs


On 11.05.2016 at 17:59, Brian Foster wrote:
> Dropped non-XFS cc's, probably no need to spam other lists at this
> point...
>
> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
>>
>> Am 11.05.2016 um 15:34 schrieb Brian Foster:
>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi Brian,
>>>>
>>>> i'm still unable to grab anything to the trace file? Is there anything
>>>> to check if it's working at all?
>>>>
>>>
>>> See my previous mail:
>>>
>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>>>
>>> E.g., something like this should work after writing to and removing a
>>> new file:
>>>
>>> # trace-cmd start -e "xfs:xfs_releasepage"
>>> # cat /sys/kernel/debug/tracing/trace_pipe
>>> ...
>>> rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
>>
>> arg sorry yes that's working but delalloc is always 0.
>>
>
> Hrm, Ok. That is strange.
>
>> May be i have to hook that into my initramfs to be fast enough?
>>
>
> Not sure that would matter.. you said it occurs within 48 hours? I take
> that to mean it doesn't occur immediately on boot. You should be able to
> tell from the logs or dmesg if it happens before you get a chance to
> start the tracing.
>
> Well, the options I can think of are:
>
> - Perhaps I botched matching up the line number to the warning, in which
>    case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
>    any delalloc or unwritten blocks at releasepage() time.

OK, I changed the grep command.

>
> - Perhaps there's a race that the tracepoint doesn't catch. The warnings
>    are based on local vars, so we could instrument the code to print a
>    warning[1] to try and get the inode number.

Thanks, I also added your patch.

So we need to wait another 48h.

Greets,
Stefan

> Brian
>
> [1] - compile tested diff:
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 40645a4..94738ea 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
>   	gfp_t			gfp_mask)
>   {
>   	int			delalloc, unwritten;
> +	struct xfs_inode	*ip = XFS_I(page->mapping->host);
>
>   	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
>
>   	xfs_count_page_state(page, &delalloc, &unwritten);
>
> +	if (delalloc || unwritten)
> +		xfs_warn(ip->i_mount,
> +		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
> +			 ip->i_ino, delalloc, unwritten, page_offset(page),
> +			 i_size_read(page->mapping->host));
> +
>   	if (WARN_ON_ONCE(delalloc))
>   		return 0;
>   	if (WARN_ON_ONCE(unwritten))
>
>> Stefan
>>
>>> Once that is working, add the grep command to filter out "delalloc 0"
>>> instances, etc. For example:
>>>
>>> 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
>>>
>>> Brian
>>>
>>>> This still happens in the first 48 hours after a fresh reboot.
>>>>
>>>> Stefan
>>>>
>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>
>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>
>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>>>>>
>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>>>> running 4.4.6.
>>>>>>>>>>>
>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>>>
>>>>>>>>>>> All those show:
>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>>>> file is affected could give us a hint.
>>>>>>>>>>
>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>>>
>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>>>
>>>>>>>
>>>>>>> The tracepoint is independent from the warning (see
>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>>>> point. That creates the need to filter the entries.
>>>>>>>
>>>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>>>> something to consider I suppose.
>>>>>>
>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>>>> enough that cat prints stuff from time to time but does not match -v
>>>>>> delalloc 0
>>>>>>
>>>>>
>>>>> What is it printing where delalloc != 0? You could always just cat
>>>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>>>> most entries will have delalloc == unwritten == 0.
>>>>>
>>>>> Also, while the tracepoint fires independent of the warning, it might
>>>>> not be a bad idea to restart a system that has already seen the warning
>>>>> since boot, just to provide some correlation or additional notification
>>>>> when the problem occurs.
>>>>>
>>>>> Brian
>>>>>
>>>>>> Stefan
>>>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-11 15:59                                   ` Brian Foster
  2016-05-11 19:20                                     ` Stefan Priebe
@ 2016-05-15 11:03                                     ` Stefan Priebe
  2016-05-15 11:50                                       ` Brian Foster
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Priebe @ 2016-05-15 11:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs-masters, xfs

Hi Brian,

Here's the new trace:
[310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff 
0x19f000 size 0x1a0000
[310740.407265] ------------[ cut here ]------------
[310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 
xfs_vm_releasepage+0x12e/0x140()
[310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 
xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q 
garp fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan 
ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs 
xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod 
ehci_pci ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit 
i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
[310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O 
4.4.10+25-ph #1
[310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 
1.0b 05/18/2015
[310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f 
0000000000000000
[310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 
ffff880c4da1fae8
[310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10 
ffffea0000e38160
[310740.407295] Call Trace:
[310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
[310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
[310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
[310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
[310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
[310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
[310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
[310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
[310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
[310740.407314]  [<ffffffffa1167fa0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
[310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
[310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407322] ---[ end trace bf76ad5e8a4d863e ]---


Stefan

On 11.05.2016 at 17:59, Brian Foster wrote:
> Dropped non-XFS cc's, probably no need to spam other lists at this
> point...
>
> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
>>
>> Am 11.05.2016 um 15:34 schrieb Brian Foster:
>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi Brian,
>>>>
>>>> i'm still unable to grab anything to the trace file? Is there anything
>>>> to check if it's working at all?
>>>>
>>>
>>> See my previous mail:
>>>
>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>>>
>>> E.g., something like this should work after writing to and removing a
>>> new file:
>>>
>>> # trace-cmd start -e "xfs:xfs_releasepage"
>>> # cat /sys/kernel/debug/tracing/trace_pipe
>>> ...
>>> rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
>>
>> arg sorry yes that's working but delalloc is always 0.
>>
>
> Hrm, Ok. That is strange.
>
>> May be i have to hook that into my initramfs to be fast enough?
>>
>
> Not sure that would matter.. you said it occurs within 48 hours? I take
> that to mean it doesn't occur immediately on boot. You should be able to
> tell from the logs or dmesg if it happens before you get a chance to
> start the tracing.
>
> Well, the options I can think of are:
>
> - Perhaps I botched matching up the line number to the warning, in which
>    case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
>    any delalloc or unwritten blocks at releasepage() time.
>
> - Perhaps there's a race that the tracepoint doesn't catch. The warnings
>    are based on local vars, so we could instrument the code to print a
>    warning[1] to try and get the inode number.
>
> Brian
>
> [1] - compile tested diff:
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 40645a4..94738ea 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
>   	gfp_t			gfp_mask)
>   {
>   	int			delalloc, unwritten;
> +	struct xfs_inode	*ip = XFS_I(page->mapping->host);
>
>   	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
>
>   	xfs_count_page_state(page, &delalloc, &unwritten);
>
> +	if (delalloc || unwritten)
> +		xfs_warn(ip->i_mount,
> +		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
> +			 ip->i_ino, delalloc, unwritten, page_offset(page),
> +			 i_size_read(page->mapping->host));
> +
>   	if (WARN_ON_ONCE(delalloc))
>   		return 0;
>   	if (WARN_ON_ONCE(unwritten))
>
>> Stefan
>>
>>> Once that is working, add the grep command to filter out "delalloc 0"
>>> instances, etc. For example:
>>>
>>> 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
>>>
>>> Brian
>>>
>>>> This still happens in the first 48 hours after a fresh reboot.
>>>>
>>>> Stefan
>>>>
>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>
>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>
>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>>>>>
>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>>>> running 4.4.6.
>>>>>>>>>>>
>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>>>
>>>>>>>>>>> All those show:
>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>>>> file is affected could give us a hint.
>>>>>>>>>>
>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>>>
>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>>>
>>>>>>>
>>>>>>> The tracepoint is independent from the warning (see
>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>>>> point. That creates the need to filter the entries.
>>>>>>>
>>>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>>>> something to consider I suppose.
>>>>>>
>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>>>> enough that cat prints stuff from time to time but does not match -v
>>>>>> delalloc 0
>>>>>>
>>>>>
>>>>> What is it printing where delalloc != 0? You could always just cat
>>>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>>>> most entries will have delalloc == unwritten == 0.
>>>>>
>>>>> Also, while the tracepoint fires independent of the warning, it might
>>>>> not be a bad idea to restart a system that has already seen the warning
>>>>> since boot, just to provide some correlation or additional notification
>>>>> when the problem occurs.
>>>>>
>>>>> Brian
>>>>>
>>>>>> Stefan
>>>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-15 11:03                                     ` Stefan Priebe
@ 2016-05-15 11:50                                       ` Brian Foster
  2016-05-15 12:41                                         ` Stefan Priebe
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-05-15 11:50 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: xfs-masters, xfs

On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote:
> Hi Brian,
> 
> here's the new trace:
> [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff
> 0x19f000 size 0x1a0000

So it is actually an unwritten buffer, on what appears to be the last
page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers
on write failure") that went into 4.6, but that was reproducible on
sub-4k block size filesystems and depends on some kind of write error.
Is either of those applicable here? Are you close to ENOSPC, for
example?
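
A quick way to check both, assuming the filesystem is mounted at <mnt>
(both commands are read-only):

# xfs_info <mnt> | grep bsize     # data section shows the fs block size; sub-4k would be e.g. bsize=1024
# df -h <mnt>                     # how close the fs is to running full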

Otherwise, have you determined what file is associated with that inode
(e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some
insight into what actually preallocates/writes the file and perhaps that
helps us identify something we can trace. Also, if you think the file
has not been modified since the error, an 'xfs_bmap -v <file>' might be
interesting as well...
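
Side note: GNU find expects the inode number in decimal, so the hex value
from the warning needs converting first. Something like this should work:

# find <mnt> -inum $(printf '%d\n' 0x27c69cd) -print
# xfs_bmap -v <path-printed-by-find>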

Brian

> [310740.407265] ------------[ cut here ]------------
> [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241
> xfs_vm_releasepage+0x12e/0x140()
> [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4
> xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp
> fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan
> ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
> raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci
> ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp
> mpt3sas pps_core raid_class scsi_transport_sas
> [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O
> 4.4.10+25-ph #1
> [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b
> 05/18/2015
> [310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f
> 0000000000000000
> [310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7
> ffff880c4da1fae8
> [310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10
> ffffea0000e38160
> [310740.407295] Call Trace:
> [310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
> [310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
> [310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
> [310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
> [310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
> [310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
> [310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
> [310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
> [310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
> [310740.407314]  [<ffffffffa1167fa0>] ?
> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> [310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
> [310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
> [310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
> [310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
> [310740.407322] ---[ end trace bf76ad5e8a4d863e ]---
> 
> 
> Stefan
> 
> Am 11.05.2016 um 17:59 schrieb Brian Foster:
> >Dropped non-XFS cc's, probably no need to spam other lists at this
> >point...
> >
> >On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
> >>
> >>Am 11.05.2016 um 15:34 schrieb Brian Foster:
> >>>On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
> >>>>Hi Brian,
> >>>>
> >>>>i'm still unable to grab anything to the trace file? Is there anything
> >>>>to check if it's working at all?
> >>>>
> >>>
> >>>See my previous mail:
> >>>
> >>>http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
> >>>
> >>>E.g., something like this should work after writing to and removing a
> >>>new file:
> >>>
> >>># trace-cmd start -e "xfs:xfs_releasepage"
> >>># cat /sys/kernel/debug/tracing/trace_pipe
> >>>...
> >>>rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
> >>
> >>arg sorry yes that's working but delalloc is always 0.
> >>
> >
> >Hrm, Ok. That is strange.
> >
> >>May be i have to hook that into my initramfs to be fast enough?
> >>
> >
> >Not sure that would matter.. you said it occurs within 48 hours? I take
> >that to mean it doesn't occur immediately on boot. You should be able to
> >tell from the logs or dmesg if it happens before you get a chance to
> >start the tracing.
> >
> >Well, the options I can think of are:
> >
> >- Perhaps I botched matching up the line number to the warning, in which
> >   case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
> >   any delalloc or unwritten blocks at releasepage() time.
> >
> >- Perhaps there's a race that the tracepoint doesn't catch. The warnings
> >   are based on local vars, so we could instrument the code to print a
> >   warning[1] to try and get the inode number.
> >
> >Brian
> >
> >[1] - compile tested diff:
> >
> >diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> >index 40645a4..94738ea 100644
> >--- a/fs/xfs/xfs_aops.c
> >+++ b/fs/xfs/xfs_aops.c
> >@@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
> >  	gfp_t			gfp_mask)
> >  {
> >  	int			delalloc, unwritten;
> >+	struct xfs_inode	*ip = XFS_I(page->mapping->host);
> >
> >  	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
> >
> >  	xfs_count_page_state(page, &delalloc, &unwritten);
> >
> >+	if (delalloc || unwritten)
> >+		xfs_warn(ip->i_mount,
> >+		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
> >+			 ip->i_ino, delalloc, unwritten, page_offset(page),
> >+			 i_size_read(page->mapping->host));
> >+
> >  	if (WARN_ON_ONCE(delalloc))
> >  		return 0;
> >  	if (WARN_ON_ONCE(unwritten))
> >
> >>Stefan
> >>
> >>>Once that is working, add the grep command to filter out "delalloc 0"
> >>>instances, etc. For example:
> >>>
> >>>	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
> >>>
> >>>Brian
> >>>
> >>>>This still happens in the first 48 hours after a fresh reboot.
> >>>>
> >>>>Stefan
> >>>>
> >>>>Am 24.03.2016 um 13:24 schrieb Brian Foster:
> >>>>>On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>
> >>>>>>Am 24.03.2016 um 12:17 schrieb Brian Foster:
> >>>>>>>On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>
> >>>>>>>>Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>
> >>>>>>>>>Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>>>>>>>>On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>sorry new one the last one got mangled. Comments inside.
> >>>>>>>>>>>
> >>>>>>>>>>>Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>>>>>>>>On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>>>>>>>>On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>>>>Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>>>>>>>>On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>>>>>>Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>>>>>>...
> >>>>>>>>>>>
> >>>>>>>>>>>This has happened again on 8 different hosts in the last 24 hours
> >>>>>>>>>>>running 4.4.6.
> >>>>>>>>>>>
> >>>>>>>>>>>All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>>>>>>>>OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>>>>>>>>those hosts - just the OS doing nearly nothing.
> >>>>>>>>>>>
> >>>>>>>>>>>All those show:
> >>>>>>>>>>>[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>>>>>>>>xfs_vm_releasepage+0xe2/0xf0()
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>Ok, well at this point the warning isn't telling us anything beyond
> >>>>>>>>>>you're reproducing the problem. We can't really make progress without
> >>>>>>>>>>more information. We don't necessarily know what application or
> >>>>>>>>>>operations caused this by the time it occurs, but perhaps knowing what
> >>>>>>>>>>file is affected could give us a hint.
> >>>>>>>>>>
> >>>>>>>>>>We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>>>>>>>>might generate a lot of noise by default. Could you enable the
> >>>>>>>>>>xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>>>>>>>>E.g., we could leave a long running 'trace-cmd record -e
> >>>>>>>>>>"xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>>>>>>>>problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>>>>>>>>-e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>>>>>>>>/sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>>>>>>>>~/trace.out' running to capture instances.
> >>>>>>>>
> >>>>>>>>Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> >>>>>>>>it in the trace.out even the WARN_ONCE was already triggered?
> >>>>>>>>
> >>>>>>>
> >>>>>>>The tracepoint is independent from the warning (see
> >>>>>>>xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> >>>>>>>the function regardless of whether delalloc blocks still exist at that
> >>>>>>>point. That creates the need to filter the entries.
> >>>>>>>
> >>>>>>>With regard to performance, I believe the tracepoints are intended to be
> >>>>>>>pretty lightweight. I don't think it should hurt to try it on a box,
> >>>>>>>observe for a bit and make sure there isn't a huge impact. Note that the
> >>>>>>>'trace-cmd record' approach will save everything to file, so that's
> >>>>>>>something to consider I suppose.
> >>>>>>
> >>>>>>Tests / cat is running. Is there any way to test if it works? Or is it
> >>>>>>enough that cat prints stuff from time to time but does not match -v
> >>>>>>delalloc 0
> >>>>>>
> >>>>>
> >>>>>What is it printing where delalloc != 0? You could always just cat
> >>>>>trace_pipe and make sure the event is firing, it's just that I suspect
> >>>>>most entries will have delalloc == unwritten == 0.
> >>>>>
> >>>>>Also, while the tracepoint fires independent of the warning, it might
> >>>>>not be a bad idea to restart a system that has already seen the warning
> >>>>>since boot, just to provide some correlation or additional notification
> >>>>>when the problem occurs.
> >>>>>
> >>>>>Brian
> >>>>>
> >>>>>>Stefan
> >>>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-15 11:50                                       ` Brian Foster
@ 2016-05-15 12:41                                         ` Stefan Priebe
  2016-05-16  1:06                                           ` Brian Foster
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe @ 2016-05-15 12:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs-masters, xfs

Hi,

find shows a Ceph object file:
/var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3

The file has been modified again since then.


On another system I get different output.
[Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 
unwritten 0 pgoff 0x50000 size 0x13d1c8
[Sun May 15 07:00:44 2016] ------------[ cut here ]------------
[Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at 
fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140()
[Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT 
nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding 
coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801 
i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler 
button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage 
ohci_hcd sg sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci 
i2c_algo_bit libahci i2c_core ptp mpt3sas pps_core raid_class 
scsi_transport_sas
[Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G 
       O    4.4.10+25-ph #1
[Sun May 15 07:00:44 2016] Hardware name: Supermicro Super 
Server/X10SRH-CF, BIOS 1.0b 05/18/2015
[Sun May 15 07:00:44 2016]  0000000000000000 ffff880c4da37a88 
ffffffff9c3c6d0f 0000000000000000
[Sun May 15 07:00:44 2016]  ffffffff9ca51a1c ffff880c4da37ac8 
ffffffff9c0837a7 ffff880c4da37ae8
[Sun May 15 07:00:44 2016]  0000000000000001 ffffea0001053080 
ffff8801429ef490 ffffea00010530a0
[Sun May 15 07:00:44 2016] Call Trace:
[Sun May 15 07:00:44 2016]  [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84
[Sun May 15 07:00:44 2016]  [<ffffffff9c0837a7>] 
warn_slowpath_common+0x97/0xe0
[Sun May 15 07:00:44 2016]  [<ffffffff9c08380a>] 
warn_slowpath_null+0x1a/0x20
[Sun May 15 07:00:44 2016]  [<ffffffff9c326f4f>] 
xfs_vm_releasepage+0x10f/0x140
[Sun May 15 07:00:44 2016]  [<ffffffff9c1520c2>] 
try_to_release_page+0x32/0x50
[Sun May 15 07:00:44 2016]  [<ffffffff9c166a8e>] 
shrink_active_list+0x3ce/0x3e0
[Sun May 15 07:00:44 2016]  [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0
[Sun May 15 07:00:44 2016]  [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0
[Sun May 15 07:00:44 2016]  [<ffffffff9c168499>] kswapd+0x4f9/0x970
[Sun May 15 07:00:44 2016]  [<ffffffff9c167fa0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[Sun May 15 07:00:44 2016]  [<ffffffff9c0a0d99>] kthread+0xc9/0xe0
[Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
[Sun May 15 07:00:44 2016]  [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70
[Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
[Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]---
[295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x51000 size 0x13d1c8
[295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x52000 size 0x13d1c8
[295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x53000 size 0x13d1c8
[295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x54000 size 0x13d1c8
[295086.353480] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x55000 size 0x13d1c8
[295086.353482] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x56000 size 0x13d1c8
[295086.353489] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x57000 size 0x13d1c8
[295086.353491] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x58000 size 0x13d1c8
[295086.353494] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x59000 size 0x13d1c8
[295086.353496] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5a000 size 0x13d1c8
[295086.353498] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5b000 size 0x13d1c8
[295086.353500] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5c000 size 0x13d1c8
[295086.353503] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5d000 size 0x13d1c8
[295086.353505] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5e000 size 0x13d1c8
[295086.353513] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x5f000 size 0x13d1c8
[295086.353515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x60000 size 0x13d1c8
[295086.353517] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x61000 size 0x13d1c8
[295086.353521] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x62000 size 0x13d1c8
[295086.353523] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x63000 size 0x13d1c8
[295086.353525] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x64000 size 0x13d1c8
[295086.353528] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x65000 size 0x13d1c8
[295086.353530] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x66000 size 0x13d1c8
[295086.353536] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x67000 size 0x13d1c8
[295086.353538] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x68000 size 0x13d1c8
[295086.353541] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x69000 size 0x13d1c8
[295086.353543] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6a000 size 0x13d1c8
[295086.353545] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6b000 size 0x13d1c8
[295086.353548] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6c000 size 0x13d1c8
[295086.353550] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6d000 size 0x13d1c8
[295086.353553] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6e000 size 0x13d1c8
[295086.567308] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x6f000 size 0x13d1c8
[295086.567313] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x70000 size 0x13d1c8
[295086.567317] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x71000 size 0x13d1c8
[295086.567319] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x72000 size 0x13d1c8
[295086.567321] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x73000 size 0x13d1c8
[295086.567326] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x74000 size 0x13d1c8
[295086.567328] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x75000 size 0x13d1c8
[295086.567331] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x76000 size 0x13d1c8
[295086.567341] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x77000 size 0x13d1c8
[295086.567343] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x78000 size 0x13d1c8
[295086.567346] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x79000 size 0x13d1c8
[295086.567348] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7a000 size 0x13d1c8
[295086.567350] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7b000 size 0x13d1c8
[295086.567353] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7c000 size 0x13d1c8
[295086.567355] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7d000 size 0x13d1c8
[295086.567357] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7e000 size 0x13d1c8
[295086.567367] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x7f000 size 0x13d1c8
[295086.567369] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x80000 size 0x13d1c8
[295086.567372] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x81000 size 0x13d1c8
[295086.567374] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x82000 size 0x13d1c8
[295086.567376] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x83000 size 0x13d1c8
[295086.567380] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x84000 size 0x13d1c8
[295086.567382] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x85000 size 0x13d1c8
[295086.567385] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x86000 size 0x13d1c8
[295086.567394] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x87000 size 0x13d1c8
[295086.567396] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x88000 size 0x13d1c8
[295086.567399] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x89000 size 0x13d1c8
[295086.567401] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8a000 size 0x13d1c8
[295086.567403] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8b000 size 0x13d1c8
[295086.567405] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8c000 size 0x13d1c8
[295086.567408] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8d000 size 0x13d1c8
[295086.567410] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8e000 size 0x13d1c8
[295086.567416] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x8f000 size 0x13d1c8
[295086.567419] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x90000 size 0x13d1c8
[295086.567421] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x91000 size 0x13d1c8
[295086.567423] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x92000 size 0x13d1c8
[295086.567427] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x93000 size 0x13d1c8
[295086.567429] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x94000 size 0x13d1c8
[295086.567431] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x95000 size 0x13d1c8
[295086.567434] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x96000 size 0x13d1c8
[295086.567447] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x97000 size 0x13d1c8
[295086.567450] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x98000 size 0x13d1c8
[295086.567452] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x99000 size 0x13d1c8
[295086.567454] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9a000 size 0x13d1c8
[295086.567456] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9b000 size 0x13d1c8
[295086.567458] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9c000 size 0x13d1c8
[295086.567461] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9d000 size 0x13d1c8
[295086.567463] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9e000 size 0x13d1c8
[295086.567471] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0x9f000 size 0x13d1c8
[295086.567474] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa0000 size 0x13d1c8
[295086.567476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa1000 size 0x13d1c8
[295086.567479] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa2000 size 0x13d1c8
[295086.567483] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa3000 size 0x13d1c8
[295086.567485] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa4000 size 0x13d1c8
[295086.567488] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa5000 size 0x13d1c8
[295086.567490] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa6000 size 0x13d1c8
[295086.567499] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa7000 size 0x13d1c8
[295086.567501] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa8000 size 0x13d1c8
[295086.567503] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xa9000 size 0x13d1c8
[295086.567505] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xaa000 size 0x13d1c8
[295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xab000 size 0x13d1c8
[295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xac000 size 0x13d1c8
[295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 
pgoff 0xad000 size 0x13d1c8

The file for that inode number is:
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en

The dmesg output / trace was from 7 am today, and the last modification of
the file was yesterday at 11 pm.
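
For reference, the file's mtime can be double-checked with plain GNU stat,
e.g.:

# stat -c '%y %n' <file>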

Stefan

On 15.05.2016 at 13:50, Brian Foster wrote:
> On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote:
>> Hi Brian,
>>
>> here's the new trace:
>> [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff
>> 0x19f000 size 0x1a0000
>
> So it is actually an unwritten buffer, on what appears to be the last
> page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers
> on write failure") that went into 4.6, but that was reproducible on
> sub-4k block size filesystems and depends on some kind of write error.
> Are either of those applicable here? Are you close to ENOSPC, for
> example?
>
> Otherwise, have you determined what file is associated with that inode
> (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some
> insight on what actually preallocates/writes the file and perhaps that
> helps us identify something we can trace. Also, if you think the file
> has not been modified since the error, an 'xfs_bmap -v <file>' might be
> interesting as well...
>
> Brian
>
>> [310740.407265] ------------[ cut here ]------------
>> [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241
>> xfs_vm_releasepage+0x12e/0x140()
>> [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4
>> xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp
>> fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan
>> ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
>> raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci
>> ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp
>> mpt3sas pps_core raid_class scsi_transport_sas
>> [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O
>> 4.4.10+25-ph #1
>> [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b
>> 05/18/2015
>> [310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f
>> 0000000000000000
>> [310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7
>> ffff880c4da1fae8
>> [310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10
>> ffffea0000e38160
>> [310740.407295] Call Trace:
>> [310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
>> [310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
>> [310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
>> [310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
>> [310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
>> [310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
>> [310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
>> [310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
>> [310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
>> [310740.407314]  [<ffffffffa1167fa0>] ?
>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> [310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
>> [310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
>> [310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
>> [310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
>> [310740.407322] ---[ end trace bf76ad5e8a4d863e ]---
>>
>>
>> Stefan
>>
>> Am 11.05.2016 um 17:59 schrieb Brian Foster:
>>> Dropped non-XFS cc's, probably no need to spam other lists at this
>>> point...
>>>
>>> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 11.05.2016 um 15:34 schrieb Brian Foster:
>>>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Brian,
>>>>>>
>>>>>> i'm still unable to grab anything to the trace file? Is there anything
>>>>>> to check if it's working at all?
>>>>>>
>>>>>
>>>>> See my previous mail:
>>>>>
>>>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>>>>>
>>>>> E.g., something like this should work after writing to and removing a
>>>>> new file:
>>>>>
>>>>> # trace-cmd start -e "xfs:xfs_releasepage"
>>>>> # cat /sys/kernel/debug/tracing/trace_pipe
>>>>> ...
>>>>> rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
>>>>
>>>> arg sorry yes that's working but delalloc is always 0.
>>>>
>>>
>>> Hrm, Ok. That is strange.
>>>
>>>> May be i have to hook that into my initramfs to be fast enough?
>>>>
>>>
>>> Not sure that would matter.. you said it occurs within 48 hours? I take
>>> that to mean it doesn't occur immediately on boot. You should be able to
>>> tell from the logs or dmesg if it happens before you get a chance to
>>> start the tracing.
>>>
>>> Well, the options I can think of are:
>>>
>>> - Perhaps I botched matching up the line number to the warning, in which
>>>    case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
>>>    any delalloc or unwritten blocks at releasepage() time.
>>>
>>> - Perhaps there's a race that the tracepoint doesn't catch. The warnings
>>>    are based on local vars, so we could instrument the code to print a
>>>    warning[1] to try and get the inode number.
>>>
>>> Brian
>>>
>>> [1] - compile tested diff:
>>>
>>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>>> index 40645a4..94738ea 100644
>>> --- a/fs/xfs/xfs_aops.c
>>> +++ b/fs/xfs/xfs_aops.c
>>> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
>>>   	gfp_t			gfp_mask)
>>>   {
>>>   	int			delalloc, unwritten;
>>> +	struct xfs_inode	*ip = XFS_I(page->mapping->host);
>>>
>>>   	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
>>>
>>>   	xfs_count_page_state(page, &delalloc, &unwritten);
>>>
>>> +	if (delalloc || unwritten)
>>> +		xfs_warn(ip->i_mount,
>>> +		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
>>> +			 ip->i_ino, delalloc, unwritten, page_offset(page),
>>> +			 i_size_read(page->mapping->host));
>>> +
>>>   	if (WARN_ON_ONCE(delalloc))
>>>   		return 0;
>>>   	if (WARN_ON_ONCE(unwritten))
>>>
>>>> Stefan
>>>>
>>>>> Once that is working, add the grep command to filter out "delalloc 0"
>>>>> instances, etc. For example:
>>>>>
>>>>> 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
>>>>>
>>>>> Brian
>>>>>
>>>>>> This still happens in the first 48 hours after a fresh reboot.
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>
>>>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>
>>>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>
>>>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>>>>>> running 4.4.6.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All those show:
>>>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>>>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>>>>>> file is affected could give us a hint.
>>>>>>>>>>>>
>>>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>>>>>
>>>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The tracepoint is independent from the warning (see
>>>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>>>>>> point. That creates the need to filter the entries.
>>>>>>>>>
>>>>>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>>>>>> something to consider I suppose.
>>>>>>>>
>>>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>>>>>> enough that cat prints stuff from time to time but does not match -v
>>>>>>>> delalloc 0
>>>>>>>>
>>>>>>>
>>>>>>> What is it printing where delalloc != 0? You could always just cat
>>>>>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>>>>>> most entries will have delalloc == unwritten == 0.
>>>>>>>
>>>>>>> Also, while the tracepoint fires independent of the warning, it might
>>>>>>> not be a bad idea to restart a system that has already seen the warning
>>>>>>> since boot, just to provide some correlation or additional notification
>>>>>>> when the problem occurs.
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>>> Stefan
>>>>>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-15 12:41                                         ` Stefan Priebe
@ 2016-05-16  1:06                                           ` Brian Foster
  2016-05-22 19:36                                             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2016-05-16  1:06 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: xfs-masters, xfs

On Sun, May 15, 2016 at 02:41:40PM +0200, Stefan Priebe wrote:
> Hi,
> 
> find shows a ceph object file:
> /var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3
> 

Any idea what this file is? Does it represent user data or Ceph metadata?
How was it created? Can you create others like it (I'm assuming via some
file/block operation through Ceph) and/or reproduce the error?

(Also, this thread is 20+ mails strong at this point, why is this the
first reference to Ceph? :/)

> File was again modified since than.
> 

xfs_bmap -v might still be interesting.

> 
> At another system i've different output.
> [Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 unwritten
> 0 pgoff 0x50000 size 0x13d1c8
> [Sun May 15 07:00:44 2016] ------------[ cut here ]------------
> [Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at
> fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140()

This one is different, being a lingering delalloc block in this case.

> [Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT
> nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding
> coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801
> i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler
> button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg
> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
> 4.4.10+25-ph #1

How close is this to an upstream kernel? Upstream XFS? Have you tried to
reproduce this on an upstream kernel?

> [Sun May 15 07:00:44 2016] Hardware name: Supermicro Super Server/X10SRH-CF,
> BIOS 1.0b 05/18/2015
> [Sun May 15 07:00:44 2016]  0000000000000000 ffff880c4da37a88
> ffffffff9c3c6d0f 0000000000000000
> [Sun May 15 07:00:44 2016]  ffffffff9ca51a1c ffff880c4da37ac8
> ffffffff9c0837a7 ffff880c4da37ae8
> [Sun May 15 07:00:44 2016]  0000000000000001 ffffea0001053080
> ffff8801429ef490 ffffea00010530a0
> [Sun May 15 07:00:44 2016] Call Trace:
> [Sun May 15 07:00:44 2016]  [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84
> [Sun May 15 07:00:44 2016]  [<ffffffff9c0837a7>]
> warn_slowpath_common+0x97/0xe0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c08380a>]
> warn_slowpath_null+0x1a/0x20
> [Sun May 15 07:00:44 2016]  [<ffffffff9c326f4f>]
> xfs_vm_releasepage+0x10f/0x140
> [Sun May 15 07:00:44 2016]  [<ffffffff9c1520c2>]
> try_to_release_page+0x32/0x50
> [Sun May 15 07:00:44 2016]  [<ffffffff9c166a8e>]
> shrink_active_list+0x3ce/0x3e0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c168499>] kswapd+0x4f9/0x970
> [Sun May 15 07:00:44 2016]  [<ffffffff9c167fa0>] ?
> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0d99>] kthread+0xc9/0xe0
> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
> [Sun May 15 07:00:44 2016]  [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70
> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
> [Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]---
> [295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0x51000 size 0x13d1c8

What is md127p3? Is the root fs on some kind of RAID device? Can you
provide xfs_info for this filesystem?
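
A minimal sketch of how to gather that, assuming the filesystem in
question is the root fs mounted at /:

  # cat /proc/mdstat        # shows how the md array is assembled
  # xfs_info /              # reports the XFS geometry of the root fs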

> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0x52000 size 0x13d1c8
> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0x53000 size 0x13d1c8
> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0x54000 size 0x13d1c8
...
> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0xab000 size 0x13d1c8
> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0xac000 size 0x13d1c8
> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> 0xad000 size 0x13d1c8
> 
> The file to the inode number is:
> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
> 

xfs_bmap -v might be interesting here as well.

This certainly seems like it is more repeatable. According to Google,
the content of /var/lib/apt/lists/ can be removed and repopulated safely
with 'apt-get update' (please verify before trying). Does that reproduce
this variant of the problem?

Note that the apt command might not directly cause the error message,
but rather only create the conditions for it to occur sometime later via
memory reclaim. E.g., you might need to run 'sync; echo 3 >
/proc/sys/vm/drop_caches' after, or possibly run a dummy workload of
some kind (e.g., dd if=/dev/zero of=tmpfile bs=1M ...) to cause memory
pressure and reclaim the pagecache of the package list file.
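
To make the above concrete, a minimal sketch of that sequence
(untested; it assumes a Debian-style host where the lists directory
can be safely regenerated, and the dd size would need scaling to the
machine's RAM to actually force reclaim):

  # keep a backup of the package lists, then let apt recreate them
  mv /var/lib/apt/lists /var/lib/apt/lists.bak
  apt-get update
  # push the freshly written list files out of the pagecache
  sync
  echo 3 > /proc/sys/vm/drop_caches
  # optionally, a dummy workload to generate memory pressure
  dd if=/dev/zero of=tmpfile bs=1M count=4096
  rm -f tmpfile
  # then check for the xfs_vm_releasepage warning
  dmesg | tail -n 50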

Brian

> dmesg output / trace was at 7 am today and last modify of the file was
> yesterday 11 pm.
> 
> Stefan
> 
> Am 15.05.2016 um 13:50 schrieb Brian Foster:
> > On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote:
> > > Hi Brian,
> > > 
> > > here's the new trace:
> > > [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff
> > > 0x19f000 size 0x1a0000
> > 
> > So it is actually an unwritten buffer, on what appears to be the last
> > page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers
> > on write failure") that went into 4.6, but that was reproducible on
> > sub-4k block size filesystems and depends on some kind of write error.
> > Are either of those applicable here? Are you close to ENOSPC, for
> > example?
> > 
> > Otherwise, have you determined what file is associated with that inode
> > (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some
> > insight on what actually preallocates/writes the file and perhaps that
> > helps us identify something we can trace. Also, if you think the file
> > has not been modified since the error, an 'xfs_bmap -v <file>' might be
> > interesting as well...
> > 
> > Brian
> > 
> > > [310740.407265] ------------[ cut here ]------------
> > > [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241
> > > xfs_vm_releasepage+0x12e/0x140()
> > > [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4
> > > xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp
> > > fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan
> > > ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
> > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci
> > > ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp
> > > mpt3sas pps_core raid_class scsi_transport_sas
> > > [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O
> > > 4.4.10+25-ph #1
> > > [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b
> > > 05/18/2015
> > > [310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f
> > > 0000000000000000
> > > [310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7
> > > ffff880c4da1fae8
> > > [310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10
> > > ffffea0000e38160
> > > [310740.407295] Call Trace:
> > > [310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
> > > [310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
> > > [310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
> > > [310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
> > > [310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
> > > [310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
> > > [310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
> > > [310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
> > > [310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
> > > [310740.407314]  [<ffffffffa1167fa0>] ?
> > > mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> > > [310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
> > > [310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
> > > [310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
> > > [310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
> > > [310740.407322] ---[ end trace bf76ad5e8a4d863e ]---
> > > 
> > > 
> > > Stefan
> > > 
> > > Am 11.05.2016 um 17:59 schrieb Brian Foster:
> > > > Dropped non-XFS cc's, probably no need to spam other lists at this
> > > > point...
> > > > 
> > > > On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
> > > > > 
> > > > > Am 11.05.2016 um 15:34 schrieb Brian Foster:
> > > > > > On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
> > > > > > > Hi Brian,
> > > > > > > 
> > > > > > > i'm still unable to grab anything to the trace file? Is there anything
> > > > > > > to check if it's working at all?
> > > > > > > 
> > > > > > 
> > > > > > See my previous mail:
> > > > > > 
> > > > > > http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
> > > > > > 
> > > > > > E.g., something like this should work after writing to and removing a
> > > > > > new file:
> > > > > > 
> > > > > > # trace-cmd start -e "xfs:xfs_releasepage"
> > > > > > # cat /sys/kernel/debug/tracing/trace_pipe
> > > > > > ...
> > > > > > rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
> > > > > 
> > > > > arg sorry yes that's working but delalloc is always 0.
> > > > > 
> > > > 
> > > > Hrm, Ok. That is strange.
> > > > 
> > > > > May be i have to hook that into my initramfs to be fast enough?
> > > > > 
> > > > 
> > > > Not sure that would matter.. you said it occurs within 48 hours? I take
> > > > that to mean it doesn't occur immediately on boot. You should be able to
> > > > tell from the logs or dmesg if it happens before you get a chance to
> > > > start the tracing.
> > > > 
> > > > Well, the options I can think of are:
> > > > 
> > > > - Perhaps I botched matching up the line number to the warning, in which
> > > >    case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
> > > >    any delalloc or unwritten blocks at releasepage() time.
> > > > 
> > > > - Perhaps there's a race that the tracepoint doesn't catch. The warnings
> > > >    are based on local vars, so we could instrument the code to print a
> > > >    warning[1] to try and get the inode number.
> > > > 
> > > > Brian
> > > > 
> > > > [1] - compile tested diff:
> > > > 
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index 40645a4..94738ea 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
> > > >   	gfp_t			gfp_mask)
> > > >   {
> > > >   	int			delalloc, unwritten;
> > > > +	struct xfs_inode	*ip = XFS_I(page->mapping->host);
> > > > 
> > > >   	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
> > > > 
> > > >   	xfs_count_page_state(page, &delalloc, &unwritten);
> > > > 
> > > > +	if (delalloc || unwritten)
> > > > +		xfs_warn(ip->i_mount,
> > > > +		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
> > > > +			 ip->i_ino, delalloc, unwritten, page_offset(page),
> > > > +			 i_size_read(page->mapping->host));
> > > > +
> > > >   	if (WARN_ON_ONCE(delalloc))
> > > >   		return 0;
> > > >   	if (WARN_ON_ONCE(unwritten))
> > > > 
> > > > > Stefan
> > > > > 
> > > > > > Once that is working, add the grep command to filter out "delalloc 0"
> > > > > > instances, etc. For example:
> > > > > > 
> > > > > > 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
> > > > > > 
> > > > > > Brian
> > > > > > 
> > > > > > > This still happens in the first 48 hours after a fresh reboot.
> > > > > > > 
> > > > > > > Stefan
> > > > > > > 
> > > > > > > Am 24.03.2016 um 13:24 schrieb Brian Foster:
> > > > > > > > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> > > > > > > > > 
> > > > > > > > > Am 24.03.2016 um 12:17 schrieb Brian Foster:
> > > > > > > > > > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> > > > > > > > > > > 
> > > > > > > > > > > Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> > > > > > > > > > > > 
> > > > > > > > > > > > Am 23.03.2016 um 15:07 schrieb Brian Foster:
> > > > > > > > > > > > > On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> > > > > > > > > > > > > > sorry new one the last one got mangled. Comments inside.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> > > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> > > > > > > > > > > > > > > > > Am 04.03.2016 um 20:13 schrieb Brian Foster:
> > > > > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> > > > > > > > > > > > > > > > > > > Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> > > > > > > > > > > > > ...
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This has happened again on 8 different hosts in the last 24 hours
> > > > > > > > > > > > > > running 4.4.6.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> > > > > > > > > > > > > > OS stuff as the VMs have remote storage. So no database, no rsync on
> > > > > > > > > > > > > > those hosts - just the OS doing nearly nothing.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > All those show:
> > > > > > > > > > > > > > [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> > > > > > > > > > > > > > xfs_vm_releasepage+0xe2/0xf0()
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Ok, well at this point the warning isn't telling us anything beyond
> > > > > > > > > > > > > you're reproducing the problem. We can't really make progress without
> > > > > > > > > > > > > more information. We don't necessarily know what application or
> > > > > > > > > > > > > operations caused this by the time it occurs, but perhaps knowing what
> > > > > > > > > > > > > file is affected could give us a hint.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We have the xfs_releasepage tracepoint, but that's unconditional and so
> > > > > > > > > > > > > might generate a lot of noise by default. Could you enable the
> > > > > > > > > > > > > xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> > > > > > > > > > > > > E.g., we could leave a long running 'trace-cmd record -e
> > > > > > > > > > > > > "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> > > > > > > > > > > > > problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> > > > > > > > > > > > > -e "xfs:xfs_releasepage"' and leave something like 'cat
> > > > > > > > > > > > > /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> > > > > > > > > > > > > ~/trace.out' running to capture instances.
> > > > > > > > > > > 
> > > > > > > > > > > Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> > > > > > > > > > > it in the trace.out even the WARN_ONCE was already triggered?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > The tracepoint is independent from the warning (see
> > > > > > > > > > xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> > > > > > > > > > the function regardless of whether delalloc blocks still exist at that
> > > > > > > > > > point. That creates the need to filter the entries.
> > > > > > > > > > 
> > > > > > > > > > With regard to performance, I believe the tracepoints are intended to be
> > > > > > > > > > pretty lightweight. I don't think it should hurt to try it on a box,
> > > > > > > > > > observe for a bit and make sure there isn't a huge impact. Note that the
> > > > > > > > > > 'trace-cmd record' approach will save everything to file, so that's
> > > > > > > > > > something to consider I suppose.
> > > > > > > > > 
> > > > > > > > > Tests / cat is running. Is there any way to test if it works? Or is it
> > > > > > > > > enough that cat prints stuff from time to time but does not match -v
> > > > > > > > > delalloc 0
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > What is it printing where delalloc != 0? You could always just cat
> > > > > > > > trace_pipe and make sure the event is firing, it's just that I suspect
> > > > > > > > most entries will have delalloc == unwritten == 0.
> > > > > > > > 
> > > > > > > > Also, while the tracepoint fires independent of the warning, it might
> > > > > > > > not be a bad idea to restart a system that has already seen the warning
> > > > > > > > since boot, just to provide some correlation or additional notification
> > > > > > > > when the problem occurs.
> > > > > > > > 
> > > > > > > > Brian
> > > > > > > > 
> > > > > > > > > Stefan
> > > > > > > > > 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-16  1:06                                           ` Brian Foster
@ 2016-05-22 19:36                                             ` Stefan Priebe - Profihost AG
  2016-05-22 21:38                                               ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-22 19:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs-masters, xfs

On 16.05.2016 at 03:06, Brian Foster wrote:
> On Sun, May 15, 2016 at 02:41:40PM +0200, Stefan Priebe wrote:
>> Hi,
>>
>> find shows a ceph object file:
>> /var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3
>>
> 
> Any idea what this file is? Does it represent user data, Ceph metadata?

It's user data.

> How was it created? Can you create others like it (I'm assuming via some
> file/block operation through Ceph) and/or reproduce the error?

It's the Ceph OSD daemon creating those files. It works through normal
file operations. I'm not able to force this to happen on demand.

> (Also, this thread is 20+ mails strong at this point, why is this the
> first reference to Ceph? :/)

Because I still see no reference to Ceph. It also happens on non-Ceph systems.


>> File was again modified since than.
>>
> xfs_bmap -v might still be interesting.

I was on holiday for the last few days and the file has since been
deleted by Ceph. Should I collect everything again from a new trace?

>> At another system i've different output.
>> [Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 unwritten
>> 0 pgoff 0x50000 size 0x13d1c8
>> [Sun May 15 07:00:44 2016] ------------[ cut here ]------------
>> [Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at
>> fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140()
> 
> This one is different, being a lingering delalloc block in this case.
> 
>> [Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT
>> nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding
>> coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801
>> i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler
>> button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg
>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
>> 4.4.10+25-ph #1
> 
> How close is this to an upstream kernel? Upstream XFS? Have you tried to
> reproduce this on an upstream kernel?

It's a vanilla 4.4.10 plus a new Adaptec driver and some sched and wq
patches from 4.5 and 4.6, but I can try replacing the kernel on one
machine with a 100% vanilla one if that helps.

>> [Sun May 15 07:00:44 2016] Hardware name: Supermicro Super Server/X10SRH-CF,
>> BIOS 1.0b 05/18/2015
>> [Sun May 15 07:00:44 2016]  0000000000000000 ffff880c4da37a88
>> ffffffff9c3c6d0f 0000000000000000
>> [Sun May 15 07:00:44 2016]  ffffffff9ca51a1c ffff880c4da37ac8
>> ffffffff9c0837a7 ffff880c4da37ae8
>> [Sun May 15 07:00:44 2016]  0000000000000001 ffffea0001053080
>> ffff8801429ef490 ffffea00010530a0
>> [Sun May 15 07:00:44 2016] Call Trace:
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c0837a7>]
>> warn_slowpath_common+0x97/0xe0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c08380a>]
>> warn_slowpath_null+0x1a/0x20
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c326f4f>]
>> xfs_vm_releasepage+0x10f/0x140
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c1520c2>]
>> try_to_release_page+0x32/0x50
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c166a8e>]
>> shrink_active_list+0x3ce/0x3e0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c168499>] kswapd+0x4f9/0x970
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c167fa0>] ?
>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0d99>] kthread+0xc9/0xe0
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70
>> [Sun May 15 07:00:44 2016]  [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100
>> [Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]---
>> [295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0x51000 size 0x13d1c8
> 
> What is md127p3, is the root fs on some kind of raid device? Can you
> provide xfs_info for this filesystem?

It's an mdadm RAID 1 and the root fs.

# xfs_info /
meta-data=/dev/disk/by-uuid/afffa232-0025-4222-9952-adb31482fe4a
isize=256    agcount=4, agsize=1703936 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=6815744, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=3328, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0x52000 size 0x13d1c8
>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0x53000 size 0x13d1c8
>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0x54000 size 0x13d1c8
> ...
>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0xab000 size 0x13d1c8
>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0xac000 size 0x13d1c8
>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>> 0xad000 size 0x13d1c8
>>
>> The file to the inode number is:
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>
> 
> xfs_bmap -v might be interesting here as well.

# xfs_bmap -v
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..2567]:       41268928..41271495  3 (374464..377031)  2568


> This certainly seems like it is more repeatable. According to google,
> the content of /var/lib/apt/lists/ can be removed and repopulated safely
> with 'apt-get update' (please verify before trying). Does that reproduce
> this variant of the problem?
> Note that the apt command might not directly cause the error message,
> but rather only create the conditions for it to occur sometime later via
> memory reclaim. E.g., you might need to run 'sync; echo 3 >
> /proc/sys/vm/drop_caches' after, or possibly run a dummy workload of
> some kind (e.g., dd if=/dev/zero of=tmpfile bs=1M ...) to cause memory
> pressure and reclaim the pagecache of the package list file.

OK - this is what I did, but no new trace appeared:
  106  22.05.2016 - 21:31:03 reboot
  108  22.05.2016 - 21:33:25 dmesg -c
  109  22.05.2016 - 21:33:51 mv /var/lib/apt/lists /var/lib/apt/lists.backup
  110  22.05.2016 - 21:33:54 apt-get update
  111  22.05.2016 - 21:34:09 ls -la /var/lib/apt/lists
  112  22.05.2016 - 21:34:58 dmesg
  113  22.05.2016 - 21:35:14 sync; echo 3 >/proc/sys/vm/drop_caches
  114  22.05.2016 - 21:35:17 dmesg
  115  22.05.2016 - 21:35:50 dd if=/dev/zero of=tmpfile bs=1M count=4096; rm -v tmpfile
  116  22.05.2016 - 21:35:55 dmesg

Greets,
Stefan

> Brian
> 
>> dmesg output / trace was at 7 am today and last modify of the file was
>> yesterday 11 pm.
>>
>> Stefan
>>
>> Am 15.05.2016 um 13:50 schrieb Brian Foster:
>>> On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote:
>>>> Hi Brian,
>>>>
>>>> here's the new trace:
>>>> [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff
>>>> 0x19f000 size 0x1a0000
>>>
>>> So it is actually an unwritten buffer, on what appears to be the last
>>> page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers
>>> on write failure") that went into 4.6, but that was reproducible on
>>> sub-4k block size filesystems and depends on some kind of write error.
>>> Are either of those applicable here? Are you close to ENOSPC, for
>>> example?
>>>
>>> Otherwise, have you determined what file is associated with that inode
>>> (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some
>>> insight on what actually preallocates/writes the file and perhaps that
>>> helps us identify something we can trace. Also, if you think the file
>>> has not been modified since the error, an 'xfs_bmap -v <file>' might be
>>> interesting as well...
>>>
>>> Brian
>>>
>>>> [310740.407265] ------------[ cut here ]------------
>>>> [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241
>>>> xfs_vm_releasepage+0x12e/0x140()
>>>> [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4
>>>> xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp
>>>> fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan
>>>> ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
>>>> raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci
>>>> ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp
>>>> mpt3sas pps_core raid_class scsi_transport_sas
>>>> [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O
>>>> 4.4.10+25-ph #1
>>>> [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b
>>>> 05/18/2015
>>>> [310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f
>>>> 0000000000000000
>>>> [310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7
>>>> ffff880c4da1fae8
>>>> [310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10
>>>> ffffea0000e38160
>>>> [310740.407295] Call Trace:
>>>> [310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
>>>> [310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
>>>> [310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
>>>> [310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
>>>> [310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
>>>> [310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
>>>> [310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
>>>> [310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
>>>> [310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
>>>> [310740.407314]  [<ffffffffa1167fa0>] ?
>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>> [310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
>>>> [310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
>>>> [310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
>>>> [310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
>>>> [310740.407322] ---[ end trace bf76ad5e8a4d863e ]---
>>>>
>>>>
>>>> Stefan
>>>>
>>>> Am 11.05.2016 um 17:59 schrieb Brian Foster:
>>>>> Dropped non-XFS cc's, probably no need to spam other lists at this
>>>>> point...
>>>>>
>>>>> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 11.05.2016 um 15:34 schrieb Brian Foster:
>>>>>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Brian,
>>>>>>>>
>>>>>>>> i'm still unable to grab anything to the trace file? Is there anything
>>>>>>>> to check if it's working at all?
>>>>>>>>
>>>>>>>
>>>>>>> See my previous mail:
>>>>>>>
>>>>>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>>>>>>>
>>>>>>> E.g., something like this should work after writing to and removing a
>>>>>>> new file:
>>>>>>>
>>>>>>> # trace-cmd start -e "xfs:xfs_releasepage"
>>>>>>> # cat /sys/kernel/debug/tracing/trace_pipe
>>>>>>> ...
>>>>>>> rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
>>>>>>
>>>>>> arg sorry yes that's working but delalloc is always 0.
>>>>>>
>>>>>
>>>>> Hrm, Ok. That is strange.
>>>>>
>>>>>> May be i have to hook that into my initramfs to be fast enough?
>>>>>>
>>>>>
>>>>> Not sure that would matter.. you said it occurs within 48 hours? I take
>>>>> that to mean it doesn't occur immediately on boot. You should be able to
>>>>> tell from the logs or dmesg if it happens before you get a chance to
>>>>> start the tracing.
>>>>>
>>>>> Well, the options I can think of are:
>>>>>
>>>>> - Perhaps I botched matching up the line number to the warning, in which
>>>>>    case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
>>>>>    any delalloc or unwritten blocks at releasepage() time.
>>>>>
>>>>> - Perhaps there's a race that the tracepoint doesn't catch. The warnings
>>>>>    are based on local vars, so we could instrument the code to print a
>>>>>    warning[1] to try and get the inode number.
>>>>>
>>>>> Brian
>>>>>
>>>>> [1] - compile tested diff:
>>>>>
>>>>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>>>>> index 40645a4..94738ea 100644
>>>>> --- a/fs/xfs/xfs_aops.c
>>>>> +++ b/fs/xfs/xfs_aops.c
>>>>> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
>>>>>   	gfp_t			gfp_mask)
>>>>>   {
>>>>>   	int			delalloc, unwritten;
>>>>> +	struct xfs_inode	*ip = XFS_I(page->mapping->host);
>>>>>
>>>>>   	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
>>>>>
>>>>>   	xfs_count_page_state(page, &delalloc, &unwritten);
>>>>>
>>>>> +	if (delalloc || unwritten)
>>>>> +		xfs_warn(ip->i_mount,
>>>>> +		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
>>>>> +			 ip->i_ino, delalloc, unwritten, page_offset(page),
>>>>> +			 i_size_read(page->mapping->host));
>>>>> +
>>>>>   	if (WARN_ON_ONCE(delalloc))
>>>>>   		return 0;
>>>>>   	if (WARN_ON_ONCE(unwritten))
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>> Once that is working, add the grep command to filter out "delalloc 0"
>>>>>>> instances, etc. For example:
>>>>>>>
>>>>>>> 	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>>> This still happens in the first 48 hours after a fresh reboot.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>>>>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>
>>>>>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>>>>>>>> running 4.4.6.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All those show:
>>>>>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>>>>>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>>>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>>>>>>>> file is affected could give us a hint.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The tracepoint is independent from the warning (see
>>>>>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>>>>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>>>>>>>> point. That creates the need to filter the entries.
>>>>>>>>>>>
>>>>>>>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>>>>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>>>>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>>>>>>>> something to consider I suppose.
>>>>>>>>>>
>>>>>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>>>>>>>> enough that cat prints stuff from time to time but does not match -v
>>>>>>>>>> delalloc 0
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What is it printing where delalloc != 0? You could always just cat
>>>>>>>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>>>>>>>> most entries will have delalloc == unwritten == 0.
>>>>>>>>>
>>>>>>>>> Also, while the tracepoint fires independent of the warning, it might
>>>>>>>>> not be a bad idea to restart a system that has already seen the warning
>>>>>>>>> since boot, just to provide some correlation or additional notification
>>>>>>>>> when the problem occurs.
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-22 19:36                                             ` Stefan Priebe - Profihost AG
@ 2016-05-22 21:38                                               ` Dave Chinner
  2016-05-30  7:23                                                 ` Stefan Priebe - Profihost AG
  2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 49+ messages in thread
From: Dave Chinner @ 2016-05-22 21:38 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: xfs-masters, Brian Foster, xfs

On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote:
> Am 16.05.2016 um 03:06 schrieb Brian Foster:
> >> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
> >> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
> >> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
> >> 4.4.10+25-ph #1
> > 
> > How close is this to an upstream kernel? Upstream XFS? Have you tried to
> > reproduce this on an upstream kernel?
> 
> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq
> patches from 4.5 and 4.6 but i can try to replace the kernel on one
> machine with a 100% vanilla one if this helps.

Please do.

> >> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0x52000 size 0x13d1c8
> >> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0x53000 size 0x13d1c8
> >> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0x54000 size 0x13d1c8
> > ...
> >> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0xab000 size 0x13d1c8
> >> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0xac000 size 0x13d1c8
> >> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
> >> 0xad000 size 0x13d1c8
> >>
> >> The file to the inode number is:
> >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
> >>
> > 
> > xfs_bmap -v might be interesting here as well.
> 
> # xfs_bmap -v
> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
>  EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>    0: [0..2567]:       41268928..41271495  3 (374464..377031)  2568

So the last file offset with a block is 0x140e00. This means the
file is fully allocated. However, the pages inside the file range
are still marked delayed allocation. That implies that we've failed
to write the pages over a delayed allocation region after we've
allocated the space.
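
For reference, the arithmetic behind that (xfs_bmap -v reports
512-byte basic blocks, so a quick shell sanity check):

  # printf '0x%x\n' $(( 2568 * 512 ))        # bytes covered by the extent
  0x141000
  # printf '0x%x\n' $(( 2568 * 512 - 512 ))  # offset of the last basic block
  0x140e00

i.e. allocation runs out to 0x141000, past the i_size of 0x13d1c8, so
every page in the file should already be backed by real blocks.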

That, in turn, tends to indicate a problem in page writeback - the
first page to be written has triggered delayed allocation of the
entire range, but then the subsequent pages have not been written
(for some as yet unknown reason). When a page is written, we map it
to the current block via xfs_map_at_offset(), and that clears both
the buffer delay and unwritten flags.

This clearly isn't happening, which means either the VFS doesn't
think the inode is dirty anymore, writeback is never asking for
these pages to be written, or XFS is screwing something up in
->writepage. The XFS writepage code changed significantly in 4.6, so
it might be worth seeing if a 4.6 kernel reproduces this same
problem....
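
One way to narrow that down, sketched here on the assumption that the
xfs_writepage tracepoint (defined alongside xfs_releasepage in these
kernels) is available, using the inode number from the report above as
an example:

  trace-cmd start -e "xfs:xfs_writepage" -e "xfs:xfs_releasepage"
  # capture only events for the suspect inode (0x600204f in the report)
  cat /sys/kernel/debug/tracing/trace_pipe | grep "ino 0x600204f" > ~/trace-ino.out

If xfs_writepage never fires for the delalloc page offsets, writeback
is never asking for those pages; if it fires but the pages stay
delalloc, the problem is on the XFS side of ->writepage.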

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-22 21:38                                               ` Dave Chinner
@ 2016-05-30  7:23                                                 ` Stefan Priebe - Profihost AG
  2016-05-30 22:36                                                   ` shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner
  2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-30  7:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs-masters, Brian Foster, xfs

Hi Dave,
  Hi Brian,

Below are the results with a vanilla 4.4.11 kernel.

On 22.05.2016 at 23:38, Dave Chinner wrote:
> On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote:
>> Am 16.05.2016 um 03:06 schrieb Brian Foster:
>>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
>>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
>>>> 4.4.10+25-ph #1
>>>
>>> How close is this to an upstream kernel? Upstream XFS? Have you tried to
>>> reproduce this on an upstream kernel?
>>
>> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq
>> patches from 4.5 and 4.6 but i can try to replace the kernel on one
>> machine with a 100% vanilla one if this helps.
> 
> Please do.
> 
>>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x52000 size 0x13d1c8
>>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x53000 size 0x13d1c8
>>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x54000 size 0x13d1c8
>>> ...
>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xab000 size 0x13d1c8
>>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xac000 size 0x13d1c8
>>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xad000 size 0x13d1c8
>>>>
>>>> The file to the inode number is:
>>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>>>
>>>
>>> xfs_bmap -v might be interesting here as well.
>>
>> # xfs_bmap -v
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
>>  EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>>    0: [0..2567]:       41268928..41271495  3 (374464..377031)  2568
> 
> So the last file offset with a block is 0x140e00. This means the
> file is fully allocated. However, the pages inside the file range
> are still marked delayed allocation. That implies that we've failed
> to write the pages over a delayed allocation region after we've
> allocated the space.
> 
> That, in turn, tends to indicate a problem in page writeback - the
> first page to be written has triggered delayed allocation of the
> entire range, but then the subsequent pages have not been written
> (for some as yet unknown reason). When a page is written, we map it
> to the current block via xfs_map_at_offset(), and that clears both
> the buffer delay and unwritten flags.
> 
> This clearly isn't happening which means either the VFS doesn't
> think the inode is dirty anymore, writeback is never asking for
> these pages to be written, or XFs is screwing something up in
> ->writepage. The XFS writepage code changed significantly in 4.6, so
> it might be worth seeing if a 4.6 kernel reproduces this same
> problem....

I've now used a vanilla 4.4.11 kernel and the issue remains. After a
fresh reboot it happened again on the root FS for a Debian apt file:

XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990
------------[ cut here ]------------
WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239
xfs_vm_releasepage+0x10f/0x140()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse
sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan
ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod
ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas
CPU: 1 PID: 111 Comm: kswapd0 Tainted: G           O    4.4.11 #1
Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015
 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000
 ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8
 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660
Call Trace:
 [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
 [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
 [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
 [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
 [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
 [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
 [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
 [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
 [<ffffffffa2168539>] kswapd+0x4f9/0x970
 [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
 [<ffffffffa20a0d99>] kthread+0xc9/0xe0
 [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
 [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
 [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
---[ end trace c9d679f8ed4d7610 ]---
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x3000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x4000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x5000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x6000 size
0x12b990
XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x7000 size
0x12b990
XFS (md127p3): ino 0x400de4c delalloc 1 unwritten 0 pgoff 0x12000 size
0x2cc69

# find / -inum $(printf "%d" 0x41221d1) -print
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources

# xfs_bmap -v
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..2399]:       27851552..27853951  2 (588576..590975)  2400

So you mean the next step would be to test 4.6? I hope this is stable
enough for production usage.

Greets,
Stefan


* shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-30  7:23                                                 ` Stefan Priebe - Profihost AG
@ 2016-05-30 22:36                                                   ` Dave Chinner
  2016-05-31  1:07                                                     ` Minchan Kim
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2016-05-30 22:36 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-mm, Brian Foster, linux-kernel, xfs

[adding lkml and linux-mm to the cc list]

On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote:
> Hi Dave,
>   Hi Brian,
> 
> below are the results with a vanilla 4.4.11 kernel.

Thanks for persisting with the testing, Stefan.

....

> I've now used a vanilla 4.4.11 kernel and the issue remains. After a
> fresh reboot it has happened again on the root FS for a Debian apt file:
> 
> .....

Ok, I suspect this may be a VM bug. I've been looking at the 4.6
code (so please try to reproduce on that kernel!) but it looks to me
like the only way we can get from shrink_active_list() direct to
try_to_release_page() is if we are over the maximum bufferhead
threshold (i.e buffer_heads_over_limit = true) and we are trying to
reclaim pages direct from the active list.

Because we are called from kswapd()->balance_pgdat(), we have:

        struct scan_control sc = {
                .gfp_mask = GFP_KERNEL,
                .order = order,
                .priority = DEF_PRIORITY,
                .may_writepage = !laptop_mode,
                .may_unmap = 1,
                .may_swap = 1,
        };

The key point here is that reclaim is being run with .may_writepage =
true for default configuration kernels. When we get to
shrink_active_list():

	if (!sc->may_writepage)
		isolate_mode |= ISOLATE_CLEAN;

But sc->may_writepage = true and this allows isolate_lru_pages() to
isolate dirty pages from the active list. Normally this isn't a
problem, because the isolated active list pages are rotated to the
inactive list, and nothing else happens to them. *Except when
buffer_heads_over_limit = true*. This special condition would
explain why I have never seen apt/dpkg cause this problem on any of
my (many) Debian systems that all use XFS....

In that case, shrink_active_list() runs:

	if (unlikely(buffer_heads_over_limit)) {
		if (page_has_private(page) && trylock_page(page)) {
			if (page_has_private(page))
				try_to_release_page(page, 0);
			unlock_page(page);
		}
	}

i.e. it locks the page, and if it has buffer heads it tries to get
the bufferheads freed from the page.

But this is a dirty page, which means it may have delalloc or
unwritten state on its buffers, both of which indicate that there
is dirty data in the page that hasn't been written. XFS issues a
warning on this because neither shrink_active_list nor
try_to_release_page() checks whether the page is dirty or not.
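
For reference, the check that fires is essentially the following
(paraphrased from 4.4-era fs/xfs/xfs_aops.c - exact line numbers move
around between releases, hence 1232 vs 1239 in the reports):

	STATIC int
	xfs_vm_releasepage(
		struct page		*page,
		gfp_t			gfp_mask)
	{
		int			delalloc, unwritten;

		/* count delalloc/unwritten buffers attached to this page */
		xfs_count_page_state(page, &delalloc, &unwritten);

		/* the warning in the traces above fires here */
		if (WARN_ON_ONCE(delalloc))
			return 0;
		if (WARN_ON_ONCE(unwritten))
			return 0;

		return try_to_free_buffers(page);
	}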

Hence it seems to me that shrink_active_list() is calling
try_to_release_page() inappropriately, and XFS is just the
messenger. If you turn laptop mode on, it is likely the problem will
go away as kswapd will run with .may_writepage = false, but that
will also cause other behavioural changes relating to writeback and
memory reclaim. It might be worth trying as a workaround for now.

MM-folk - is this analysis correct? If so, why is
shrink_active_list() calling try_to_release_page() on dirty pages?
Is this just an oversight or is there some problem that this is
trying to work around? It seems trivial to fix to me (add a
!PageDirty check), but I don't know why the check is there in the
first place...
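
Concretely, the trivial fix I have in mind is something like this
untested sketch against 4.6's shrink_active_list():

	if (unlikely(buffer_heads_over_limit)) {
		/* untested: skip dirty pages so we never strip buffers
		 * that may carry delalloc/unwritten state */
		if (!PageDirty(page) &&
		    page_has_private(page) && trylock_page(page)) {
			if (!PageDirty(page) && page_has_private(page))
				try_to_release_page(page, 0);
			unlock_page(page);
		}
	}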

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-30 22:36                                                   ` shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner
@ 2016-05-31  1:07                                                     ` Minchan Kim
  2016-05-31  2:55                                                       ` Dave Chinner
  2016-05-31  9:50                                                       ` Jan Kara
  0 siblings, 2 replies; 49+ messages in thread
From: Minchan Kim @ 2016-05-31  1:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, Brian Foster, xfs, linux-kernel, Stefan Priebe - Profihost AG

On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> [adding lkml and linux-mm to the cc list]
> 
> On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote:
> > Hi Dave,
> >   Hi Brian,
> > 
> > below are the results with a vanilla 4.4.11 kernel.
> 
> Thanks for persisting with the testing, Stefan.
> 
> ....
> 
> > I've now used a vanilla 4.4.11 kernel and the issue remains. After a
> > fresh reboot it has happened again on the root FS for a Debian apt file:
> > 
> > .....
> 
> MM-folk - is this analysis correct? If so, why is
> shrink_active_list() calling try_to_release_page() on dirty pages?
> Is this just an oversight or is there some problem that this is
> trying to work around? It seems trivial to fix to me (add a
> !PageDirty check), but I don't know why the check is there in the
> first place...

It seems to be the latter.
The commit below seems to be related:
[ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]

At that time, even shrink_page_list worked like this:

shrink_page_list
        while (!list_empty(page_list)) {
                ..
                ..
                if (PageDirty(page)) {
                        ..
                }

                /*
                 * If the page has buffers, try to free the buffer mappings
                 * associated with this page. If we succeed we try to free
                 * the page as well.
                 *
                 * We do this even if the page is PageDirty().
                 * try_to_release_page() does not perform I/O, but it is
                 * possible for a page to have PageDirty set, but it is actually
                 * clean (all its buffers are clean).  This happens if the
                 * buffers were written out directly, with submit_bh(). ext3
                 * will do this, as well as the blockdev mapping. 
                 * try_to_release_page() will discover that cleanness and will
                 * drop the buffers and mark the page clean - it can be freed.
                 * ..
                 */
                if (PagePrivate(page)) {
                        if (!try_to_release_page(page, sc->gfp_mask))
                                goto activate_locked;
                        if (!mapping && page_count(page) == 1)
                                goto free_it;
                }
                ..
        }

I wonder whether it's still valid on ext4.
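
FWIW, even without the page-level check, try_to_free_buffers() still
won't free anything if the individual buffers are dirty or locked -
drop_buffers() bails if any buffer is "busy" (condensed from mainline
fs/buffer.c):

	static inline int buffer_busy(struct buffer_head *bh)
	{
		return atomic_read(&bh->b_count) |
			(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
	}

So only pages whose buffers were all cleaned behind the VM's back (the
ext3 case described in the comment above) can actually be freed there.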


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  1:07                                                     ` Minchan Kim
@ 2016-05-31  2:55                                                       ` Dave Chinner
  2016-05-31  3:59                                                         ` Minchan Kim
  2016-05-31  9:50                                                       ` Jan Kara
  1 sibling, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2016-05-31  2:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, Brian Foster, xfs, linux-kernel, Stefan Priebe - Profihost AG

On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote:
> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > .....
> > 
> > MM-folk - is this analysis correct? If so, why is
> > shrink_active_list() calling try_to_release_page() on dirty pages?
> > Is this just an oversight or is there some problem that this is
> > trying to work around? It seems trivial to fix to me (add a
> > !PageDirty check), but I don't know why the check is there in the
> > first place...
> 
> It seems to be the latter.
> The commit below seems to be related:
> [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]

Okay, that's been there a long, long time (2007), and it covers a
case where the filesystem cleans pages without the VM knowing about
it (i.e. it marks bufferheads clean without clearing the PageDirty
state).

That does not explain the code in shrink_active_list().

> At that time, even shrink_page_list worked like this.

The current code in shrink_page_list still works this way - the
PageDirty code will *jump over the PagePrivate case* if the page is
to remain dirty or pageout() fails to make it clean.  Hence it never
gets to try_to_release_page() on a dirty page.
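
Condensed from 4.6's shrink_page_list() (paraphrased, with most of the
error handling omitted), the flow is roughly:

	if (PageDirty(page)) {
		...
		if (!sc->may_writepage)
			goto keep_locked;

		/* if writeback fails or has to be deferred, we never
		 * fall through to the buffer stripping below */
		switch (pageout(page, mapping, sc)) {
		case PAGE_KEEP:
			goto keep_locked;
		case PAGE_ACTIVATE:
			goto activate_locked;
		...
		}
	}

	if (page_has_private(page)) {
		/* only reached with a clean or just-written page */
		if (!try_to_release_page(page, sc->gfp_mask))
			goto activate_locked;
		...
	}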

Seems like this really needs a dirty check in shrink_active_list()
and to leave the stripping of bufferheads from dirty pages in the
ext3 corner case to shrink_inactive_list() once the dirty pages have
been rotated off the active list...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  2:55                                                       ` Dave Chinner
@ 2016-05-31  3:59                                                         ` Minchan Kim
  2016-05-31  6:07                                                           ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Minchan Kim @ 2016-05-31  3:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, Brian Foster, xfs, linux-kernel, Stefan Priebe - Profihost AG

On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote:
> On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote:
> > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > > .....
> > > 
> > > MM-folk - is this analysis correct? If so, why is
> > > shrink_active_list() calling try_to_release_page() on dirty pages?
> > > Is this just an oversight or is there some problem that this is
> > > trying to work around? It seems trivial to fix to me (add a
> > > !PageDirty check), but I don't know why the check is there in the
> > > first place...
> > 
> > It seems to be the latter.
> > The commit below seems to be related:
> > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]
> 
> Okay, that's been there a long, long time (2007), and it covers a
> case where the filesystem cleans pages without the VM knowing about
> it (i.e. it marks bufferheads clean without clearing the PageDirty
> state).
> 
> That does not explain the code in shrink_active_list().

Yep, my point was that the patch removed the PageDirty check in
try_to_free_buffers.

If I read the description correctly, at that time we wanted to check
PageDirty in try_to_free_buffers but couldn't, because of the ext3
corner case above.

diff --git a/fs/buffer.c b/fs/buffer.c
index 3b116078b4c3..460f1c43238e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
        int ret = 0;
 
        BUG_ON(!PageLocked(page));
-       if (PageDirty(page) || PageWriteback(page))
+       if (PageWriteback(page))
                return 0;

And I found the culprit:
e182d61263b7d5, [PATCH] buffer_head takedown for bighighmem machines

It introduced pagevec_strip, which calls try_to_release_page without
a PageDirty check, in refill_inactive_zone, which is what
shrink_active_list is now.

Quoting from the commit message:
"
    In refill_inactive(): if the number of buffer_heads is excessive then
    strip buffers from pages as they move onto the inactive list.  This
    change is useful for all filesystems.  This approach is good because
    pages which are being repeatedly overwritten will remain on the active
    list and will retain their buffers, whereas pages which are not being
    overwritten will be stripped.
"

> 
> > At that time, even shrink_page_list works like this.
> 
> The current code in shrink_page_list still works this way - the
> PageDirty code will *jump over the PagePrivate case* if the page is
> to remain dirty or pageout() fails to make it clean.  Hence it never
> gets to try_to_release_page() on a dirty page.
> 
> Seems like this really needs a dirty check in shrink_active_list()
> and to leave the stripping of bufferheads from dirty pages in the
> ext3 corner case to shrink_inactive_list() once the dirty pages have
> been rotated off the active list...

Another topic:
I don't know the filesystem side at all, so I might be missing something.

IMHO, if we should prohibit passing dirty pages to fs->releasepage, isn't
it better to move the PageDirty warning check into try_to_release_page
and clean up all the filesystems' releasepage implementations?

diff --git a/mm/filemap.c b/mm/filemap.c
index 00ae878b2a38..7c8b375c3475 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2821,8 +2821,10 @@ int try_to_release_page(struct page *page, gfp_t gfp_mask)
        if (PageWriteback(page))
                return 0;
 
-       if (mapping && mapping->a_ops->releasepage)
+       if (mapping && mapping->a_ops->releasepage) {
+               WARN_ON(PageDirty(page));
                return mapping->a_ops->releasepage(page, gfp_mask);
+       }
        return try_to_free_buffers(page);
 }

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9a8bbc1fb1fa..89b432a90f59 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1795,10 +1795,6 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
 
 int f2fs_release_page(struct page *page, gfp_t wait)
 {
-       /* If this is dirty page, keep PagePrivate */
-       if (PageDirty(page))
-               return 0;
-
        /* This is atomic written page, keep Private */
        if (IS_ATOMIC_WRITTEN_PAGE(page))
                return 0;

Otherwise, we can simply return 0 in try_to_release_page if it finds a dirty page.


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  3:59                                                         ` Minchan Kim
@ 2016-05-31  6:07                                                           ` Dave Chinner
  2016-05-31  6:11                                                             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2016-05-31  6:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, Brian Foster, xfs, linux-kernel, Stefan Priebe - Profihost AG

On Tue, May 31, 2016 at 12:59:04PM +0900, Minchan Kim wrote:
> On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote:
> > On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote:
> > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > > > .....
> > > > 
> > > > MM-folk - is this analysis correct? If so, why is
> > > > shrink_active_list() calling try_to_release_page() on dirty pages?
> > > > Is this just an oversight or is there some problem that this is
> > > > trying to work around? It seems trivial to fix to me (add a
> > > > !PageDirty check), but I don't know why the check is there in the
> > > > first place...
> > > 
> > > It seems to be the latter.
> > > The commit below seems to be related:
> > > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]
> > 
> > Okay, that's been there a long, long time (2007), and it covers a
> > case where the filesystem cleans pages without the VM knowing about
> > it (i.e. it marks bufferheads clean without clearing the PageDirty
> > state).
> > 
> > That does not explain the code in shrink_active_list().
> 
> Yep, my point was that the patch removed the PageDirty check in
> try_to_free_buffers.

*nod*

[...]

> And I found the culprit:
> e182d61263b7d5, [PATCH] buffer_head takedown for bighighmem machines

Heh. You have the combined historic tree sitting around for code
archeology, just like I do :)

> It introduced pagevec_strip, which calls try_to_release_page without
> a PageDirty check, in refill_inactive_zone, which is what
> shrink_active_list is now.

<sigh>

It was merged 2 days before XFS was merged. Merging XFS made the
code Andrew wrote incorrect:

> Quote from
> "
>     In refill_inactive(): if the number of buffer_heads is excessive then
>     strip buffers from pages as they move onto the inactive list.  This
>     change is useful for all filesystems. [....]

Except for those that carry state necessary for writeback to be done
correctly on the dirty page bufferheads.  At the time, nobody doing
work on the mm/writeback code cared about delayed allocation. So we've
carried this behaviour for 14 years without realising that it's
probably the source of all the unexplainable warnings we've got from
XFS over all that time.

I'm half tempted at this point to mostly ignore this mm/ behaviour
because we are moving down the path of removing buffer heads from
XFS. That will require us to do different things in ->releasepage
and so just skipping dirty pages in the XFS code is the best thing
to do....
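
On the XFS side that would be little more than an early bail-out at
the top of ->releasepage, something like this (untested):

	/* mm may hand us a dirty page via the buffer_heads_over_limit
	 * path in shrink_active_list(); just decline to touch it */
	if (PageDirty(page))
		return 0;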

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  6:07                                                           ` Dave Chinner
@ 2016-05-31  6:11                                                             ` Stefan Priebe - Profihost AG
  2016-05-31  7:31                                                               ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-31  6:11 UTC (permalink / raw)
  To: Dave Chinner, Minchan Kim; +Cc: linux-mm, Brian Foster, linux-kernel, xfs

Hi Dave,

On 31.05.2016 at 08:07, Dave Chinner wrote:
> .....
> 
> I'm half tempted at this point to mostly ignore this mm/ behaviour
> because we are moving down the path of removing buffer heads from
> XFS. That will require us to do different things in ->releasepage
> and so just skipping dirty pages in the XFS code is the best thing
> to do....

Does this change anything I should test? Or is 4.6 still the way to go?

Greets,
Stefan


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  6:11                                                             ` Stefan Priebe - Profihost AG
@ 2016-05-31  7:31                                                               ` Dave Chinner
  2016-05-31  8:03                                                                 ` Stefan Priebe - Profihost AG
  2016-06-02 12:13                                                                 ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 49+ messages in thread
From: Dave Chinner @ 2016-05-31  7:31 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs

On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
> > I'm half tempted at this point to mostly ignore this mm/ behaviour
> > because we are moving down the path of removing buffer heads from
> > XFS. That will require us to do different things in ->releasepage
> > and so just skipping dirty pages in the XFS code is the best thing
> > to do....
> 
> Does this change anything I should test? Or is 4.6 still the way to go?

Doesn't matter now - the warning will still be there on 4.6. I think
you can simply ignore it as the XFS code appears to be handling the
dirty page that is being passed to it correctly. We'll work out what
needs to be done to get rid of the warning for this case, whether it
be a mm/ change or an XFS change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  7:31                                                               ` Dave Chinner
@ 2016-05-31  8:03                                                                 ` Stefan Priebe - Profihost AG
  2016-06-02 12:13                                                                 ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-31  8:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs

On 31.05.2016 at 09:31, Dave Chinner wrote:
> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
>>> I'm half tempted at this point to mostly ignore this mm/ behaviour
>>> because we are moving down the path of removing buffer heads from
>>> XFS. That will require us to do different things in ->releasepage
>>> and so just skipping dirty pages in the XFS code is the best thing
>>> to do....
>>
>> Does this change anything I should test? Or is 4.6 still the way to go?
> 
> Doesn't matter now - the warning will still be there on 4.6. I think
> you can simply ignore it as the XFS code appears to be handling the
> dirty page that is being passed to it correctly. We'll work out what
> needs to be done to get rid of the warning for this case, whether it
> be a mm/ change or an XFS change.

So is it OK to remove the WARN_ONCE from the kernel code, so I don't get
alarms from our monitoring systems for the trace?

Stefan

> 
> Cheers,
> 
> Dave.
> 


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  1:07                                                     ` Minchan Kim
  2016-05-31  2:55                                                       ` Dave Chinner
@ 2016-05-31  9:50                                                       ` Jan Kara
  2016-06-01  1:38                                                         ` Minchan Kim
  2016-08-17 15:37                                                         ` Andreas Grünbacher
  1 sibling, 2 replies; 49+ messages in thread
From: Jan Kara @ 2016-05-31  9:50 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Stefan Priebe - Profihost AG, Brian Foster, linux-kernel, xfs, linux-mm

On Tue 31-05-16 10:07:24, Minchan Kim wrote:
> .....
> 
> I wonder whether it's still valid on ext4.

Actually, we already discussed this about a year ago:
http://oss.sgi.com/archives/xfs/2015-06/msg00119.html

And it was the final straw that made me remove ext3 from the tree. ext4 can
also clean dirty buffers while keeping pages dirty, but that is limited
to metadata (and data in data=journal mode), so the scope of the problem is
much smaller. So just avoiding calling ->releasepage for dirty pages may
work fine these days.
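
i.e. something as simple as this in try_to_release_page() (untested,
sketched against current mm/filemap.c):

	int try_to_release_page(struct page *page, gfp_t gfp_mask)
	{
		struct address_space * const mapping = page->mapping;

		BUG_ON(!PageLocked(page));
		if (PageWriteback(page))
			return 0;

		/* untested: never strip buffers from a page the VM
		 * still considers dirty */
		if (PageDirty(page))
			return 0;

		if (mapping && mapping->a_ops->releasepage)
			return mapping->a_ops->releasepage(page, gfp_mask);
		return try_to_free_buffers(page);
	}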

Also, it is possible to change the ext4 checkpointing code to completely
avoid doing this, but I never got around to rewriting that code. Probably
I should give it higher priority on my todo list...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  9:50                                                       ` Jan Kara
@ 2016-06-01  1:38                                                         ` Minchan Kim
  2016-08-17 15:37                                                         ` Andreas Grünbacher
  1 sibling, 0 replies; 49+ messages in thread
From: Minchan Kim @ 2016-06-01  1:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Stefan Priebe - Profihost AG, Brian Foster, linux-kernel, xfs, linux-mm

On Tue, May 31, 2016 at 11:50:31AM +0200, Jan Kara wrote:
> On Tue 31-05-16 10:07:24, Minchan Kim wrote:
> > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > > [adding lkml and linux-mm to the cc list]
> > > 
> > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote:
> > > > Hi Dave,
> > > >   Hi Brian,
> > > > 
> > > > below are the results with a vanilla 4.4.11 kernel.
> > > 
> > > Thanks for persisting with the testing, Stefan.
> > > 
> > > ....
> > > 
> > > > I've now used a vanilla 4.4.11 kernel and the issue remains. After a
> > > > fresh reboot it has happened again on the root FS for a Debian apt file:
> > > > 
> > > .....
> > > 
> > > Ok, I suspect this may be a VM bug. I've been looking at the 4.6
> > > code (so please try to reproduce on that kernel!) but it looks to me
> > > like the only way we can get from shrink_active_list() direct to
> > > try_to_release_page() is if we are over the maximum bufferhead
> > > threshold (i.e buffer_heads_over_limit = true) and we are trying to
> > > reclaim pages direct from the active list.
> > > 
> > > Because we are called from kswapd()->balance_pgdat(), we have:
> > > 
> > >         struct scan_control sc = {
> > >                 .gfp_mask = GFP_KERNEL,
> > >                 .order = order,
> > >                 .priority = DEF_PRIORITY,
> > >                 .may_writepage = !laptop_mode,
> > >                 .may_unmap = 1,
> > >                 .may_swap = 1,
> > >         };
> > > 
> > > The key point here is reclaim is being run with .may_writepage =
> > > true for default configuration kernels. When we get to
> > > shrink_active_list():
> > > 
> > > 	if (!sc->may_writepage)
> > > 		isolate_mode |= ISOLATE_CLEAN;
> > > 
> > > But sc->may_writepage = true and this allows isolate_lru_pages() to
> > > isolate dirty pages from the active list. Normally this isn't a
> > > problem, because the isolated active list pages are rotated to the
> > > inactive list, and nothing else happens to them. *Except when
> > > buffer_heads_over_limit = true*. This special condition would
> > > explain why I have never seen apt/dpkg cause this problem on any of
> > > my (many) Debian systems that all use XFS....
> > > 
> > > In that case, shrink_active_list() runs:
> > > 
> > > 	if (unlikely(buffer_heads_over_limit)) {
> > > 		if (page_has_private(page) && trylock_page(page)) {
> > > 			if (page_has_private(page))
> > > 				try_to_release_page(page, 0);
> > > 			unlock_page(page);
> > > 		}
> > > 	}
> > > 
> > > i.e. it locks the page, and if it has buffer heads it tries to get
> > > the bufferheads freed from the page.
> > > 
> > > But this is a dirty page, which means it may have delalloc or
> > > unwritten state on its buffers, both of which indicate that there
> > > is dirty data in the page that hasn't been written. XFS issues a
> > > warning on this because neither shrink_active_list() nor
> > > try_to_release_page() checks whether the page is dirty or not.
> > > 
> > > Hence it seems to me that shrink_active_list() is calling
> > > try_to_release_page() inappropriately, and XFS is just the
> > > messenger. If you turn laptop mode on, it is likely the problem will
> > > go away as kswapd will run with .may_writepage = false, but that
> > > will also cause other behavioural changes relating to writeback and
> > > memory reclaim. It might be worth trying as a workaround for now.
> > > 
> > > MM-folk - is this analysis correct? If so, why is
> > > shrink_active_list() calling try_to_release_page() on dirty pages?
> > > Is this just an oversight or is there some problem that this is
> > > trying to work around? It seems trivial to fix to me (add a
> > > !PageDirty check), but I don't know why the check is there in the
> > > first place...
> > 
> > It seems to be the latter.
> > The commit below seems to be related:
> > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]
> > 
> > At that time, even shrink_page_list worked like this:
> > 
> > shrink_page_list
> >         while (!list_empty(page_list)) {
> >                 ..
> >                 ..
> >                 if (PageDirty(page)) {
> >                         ..
> >                 }
> > 
> >                 /*
> >                  * If the page has buffers, try to free the buffer mappings
> >                  * associated with this page. If we succeed we try to free
> >                  * the page as well.
> >                  *
> >                  * We do this even if the page is PageDirty().
> >                  * try_to_release_page() does not perform I/O, but it is
> >                  * possible for a page to have PageDirty set, but it is actually
> >                  * clean (all its buffers are clean).  This happens if the
> >                  * buffers were written out directly, with submit_bh(). ext3
> >                  * will do this, as well as the blockdev mapping. 
> >                  * try_to_release_page() will discover that cleanness and will
> >                  * drop the buffers and mark the page clean - it can be freed.
> >                  * ..
> >                  */
> >                 if (PagePrivate(page)) {
> >                         if (!try_to_release_page(page, sc->gfp_mask))
> >                                 goto activate_locked;
> >                         if (!mapping && page_count(page) == 1)
> >                                 goto free_it;
> >                 }
> >                 ..
> >         }
> > 
> > I wonder whether it's valid or not on ext4.
> 
> Actually, we already discussed this about a year ago:
> http://oss.sgi.com/archives/xfs/2015-06/msg00119.html
> 
> And it was the final straw that made me remove ext3 from the tree. ext4 can
> also clean dirty buffers while keeping pages dirty, but that is limited to
> metadata (and to data in data=journal mode), so the scope of the problem is
> much smaller. So just avoiding calling ->releasepage for dirty pages may
> work fine these days.
> 
> Also it is possible to change the ext4 checkpointing code to completely avoid
> doing this, but I never got around to rewriting that code. I should probably
> give it a higher priority on my todo list...

Hah, you already noticed. Thanks for the information.

At first glance it looks fixable in mm/ by checking PageDirty (a sketch
follows below), but that might be risky for out-of-tree filesystems whose
internals we don't fully understand: users of block_invalidatepage can
end up with clean buffers on a page that is still dirty. Although there
is no such user in mainline right now, I will leave the fix to the FS
folks.

Thanks.
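
The mm/ side of that check would be a one-line addition to the
buffer_heads_over_limit path in shrink_active_list() quoted above (a
sketch against the 4.4-era mm/vmscan.c, untested):

	if (unlikely(buffer_heads_over_limit)) {
		if (page_has_private(page) && trylock_page(page)) {
			/* sketch: skip dirty pages, whose buffers may
			 * still carry delalloc/unwritten state */
			if (page_has_private(page) && !PageDirty(page))
				try_to_release_page(page, 0);
			unlock_page(page);
		}
	}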

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  7:31                                                               ` Dave Chinner
  2016-05-31  8:03                                                                 ` Stefan Priebe - Profihost AG
@ 2016-06-02 12:13                                                                 ` Stefan Priebe - Profihost AG
  2016-06-02 12:44                                                                   ` Holger Hoffstätte
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-06-02 12:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs


On 31.05.2016 at 09:31, Dave Chinner wrote:
> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
>>> I'm half tempted at this point to mostly ignore this mm/ behaviour
>>> because we are moving down the path of removing buffer heads from
>>> XFS. That will require us to do different things in ->releasepage
>>> and so just skipping dirty pages in the XFS code is the best thing
>>> to do....
>>
>> does this change anything I should test? Or is 4.6 still the way to go?
> 
> Doesn't matter now - the warning will still be there on 4.6. I think
> you can simply ignore it as the XFS code appears to be handling the
> dirty page that is being passed to it correctly. We'll work out what
>> needs to be done to get rid of the warning for this case, whether it
> be a mm/ change or an XFS change.

Any idea what I could do with 4.4.X? Can I safely remove the WARN_ON_ONCE
statement?

Stefan


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-06-02 12:13                                                                 ` Stefan Priebe - Profihost AG
@ 2016-06-02 12:44                                                                   ` Holger Hoffstätte
  2016-06-02 23:08                                                                     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Holger Hoffstätte @ 2016-06-02 12:44 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Dave Chinner
  Cc: linux-mm, Brian Foster, xfs, linux-kernel, Minchan Kim

On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote:
> 
> On 31.05.2016 at 09:31, Dave Chinner wrote:
>> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
>>>> I'm half tempted at this point to mostly ignore this mm/ behaviour
>>>> because we are moving down the path of removing buffer heads from
>>>> XFS. That will require us to do different things in ->releasepage
>>>> and so just skipping dirty pages in the XFS code is the best thing
>>>> to do....
>>>
>>> does this change anything I should test? Or is 4.6 still the way to go?
>>
>> Doesn't matter now - the warning will still be there on 4.6. I think
>> you can simply ignore it as the XFS code appears to be handling the
>> dirty page that is being passed to it correctly. We'll work out what
>> needs to be done to get rid of the warning for this case, whether it
>> be a mm/ change or an XFS change.
> 
> Any idea what I could do with 4.4.X? Can I safely remove the WARN_ON_ONCE
> statement?

By definition it won't break anything since it's just a heads-up message,
so yes, it should be "safe". However, if my understanding of the situation
is correct, mainline commit f0281a00fe "mm: workingset: only do workingset
activations on reads" (+ friends) in 4.7 should effectively prevent this
from happening. Can someone confirm or deny this?

-h

PS: Stefan: I backported that commit (and friends) to my 4.4.x patch queue,
so if you want to try that for today's 4.4.12 the warning should be gone.
No guarantees though :)


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-06-02 12:44                                                                   ` Holger Hoffstätte
@ 2016-06-02 23:08                                                                     ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2016-06-02 23:08 UTC (permalink / raw)
  To: Holger Hoffstätte
  Cc: Minchan Kim, Brian Foster, Stefan Priebe - Profihost AG,
	linux-kernel, xfs, linux-mm

On Thu, Jun 02, 2016 at 02:44:30PM +0200, Holger Hoffstätte wrote:
> On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote:
> > 
> > On 31.05.2016 at 09:31, Dave Chinner wrote:
> >> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
> >>>> I'm half tempted at this point to mostly ignore this mm/ behaviour
> >>>> because we are moving down the path of removing buffer heads from
> >>>> XFS. That will require us to do different things in ->releasepage
> >>>> and so just skipping dirty pages in the XFS code is the best thing
> >>>> to do....
> >>>
> >>> does this change anything I should test? Or is 4.6 still the way to go?
> >>
> >> Doesn't matter now - the warning will still be there on 4.6. I think
> >> you can simply ignore it as the XFS code appears to be handling the
> >> dirty page that is being passed to it correctly. We'll work out what
> >> needs to be done to get rid of the warning for this case, whether it
> >> be a mm/ change or an XFS change.
> > 
> > Any idea what I could do with 4.4.X? Can I safely remove the WARN_ON_ONCE
> > statement?
> 
> By definition it won't break anything since it's just a heads-up message,
> so yes, it should be "safe". However, if my understanding of the situation
> is correct, mainline commit f0281a00fe "mm: workingset: only do workingset
> activations on reads" (+ friends) in 4.7 should effectively prevent this
> from happening. Can someone confirm or deny this?

I don't think it will.  The above commits will avoid putting
/write-only/ dirty pages on the active list from the write() syscall
vector, but they won't prevent pages that are read first and then
dirtied from ending up on the active list. E.g. an mmap write will
first read the page from disk to populate the page (hence it ends up
on the active list), then the page gets dirtied and ->page_mkwrite is
called to tell the filesystem....
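
To illustrate, the read-then-dirty pattern is as simple as this
(hypothetical userspace sketch with error handling omitted; not taken
from Stefan's actual workload):

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("somefile", O_RDWR);	/* hypothetical path */
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);

		char c = p[0];	/* read fault: page populated and activated */
		p[0] = c + 1;	/* write fault: ->page_mkwrite, page now dirty */

		munmap(p, 4096);
		close(fd);
		return 0;
	}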

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-05-22 21:38                                               ` Dave Chinner
  2016-05-30  7:23                                                 ` Stefan Priebe - Profihost AG
@ 2016-06-03 17:56                                                 ` Stefan Priebe - Profihost AG
  2016-06-03 19:35                                                   ` Holger Hoffstätte
                                                                     ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-06-03 17:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs-masters, Brian Foster, xfs

Hi,

should I remove the complete if conditions, including the return 0, or
should I convert them to plain ifs without the WARN_ON_ONCE? Like below?

        if (WARN_ON_ONCE(delalloc))
                return 0;
        if (WARN_ON_ONCE(unwritten))
                return 0;

=>

  if (delalloc)
    return 0;
  if (unwritten)
    return 0;



On 22.05.2016 at 23:38, Dave Chinner wrote:
> On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote:
>> On 16.05.2016 at 03:06, Brian Foster wrote:
>>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
>>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
>>>> 4.4.10+25-ph #1
>>>
>>> How close is this to an upstream kernel? Upstream XFS? Have you tried to
>>> reproduce this on an upstream kernel?
>>
>> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq
>> patches from 4.5 and 4.6, but I can try to replace the kernel on one
>> machine with a 100% vanilla one if this helps.
> 
> Please do.
> 
>>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x52000 size 0x13d1c8
>>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x53000 size 0x13d1c8
>>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0x54000 size 0x13d1c8
>>> ...
>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xab000 size 0x13d1c8
>>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xac000 size 0x13d1c8
>>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>> 0xad000 size 0x13d1c8
>>>>
>>>> The file to the inode number is:
>>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>>>
>>>
>>> xfs_bmap -v might be interesting here as well.
>>
>> # xfs_bmap -v
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
>>  EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>>    0: [0..2567]:       41268928..41271495  3 (374464..377031)  2568
> 
> So the last file offset with a block is 0x140e00. This means the
> file is fully allocated. However, the pages inside the file range
> are still marked delayed allocation. That implies that we've failed
> to write the pages over a delayed allocation region after we've
> allocated the space.
> 
> That, in turn, tends to indicate a problem in page writeback - the
> first page to be written has triggered delayed allocation of the
> entire range, but then the subsequent pages have not been written
> (for some as yet unknown reason). When a page is written, we map it
> to the current block via xfs_map_at_offset(), and that clears both
> the buffer delay and unwritten flags.
> 
> This clearly isn't happening, which means either the VFS doesn't
> think the inode is dirty anymore, writeback is never asking for
> these pages to be written, or XFS is screwing something up in
> ->writepage. The XFS writepage code changed significantly in 4.6, so
> it might be worth seeing if a 4.6 kernel reproduces this same
> problem....
> 
> Cheers,
> 
> Dave.
> 
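
(For reference: xfs_bmap -v reports 512-byte blocks, so an extent covering
file offsets 0..2567 puts the start of the last mapped block at
2567 * 512 = 0x140e00, past the file size of 0x13d1c8; hence "the file is
fully allocated".)

The xfs_map_at_offset() step Dave refers to is where the delay/unwritten
buffer state would normally be cleared. For reference, it looks roughly
like this in the 4.4-era fs/xfs/xfs_aops.c (reproduced from memory, so
treat it as a sketch):

	STATIC void
	xfs_map_at_offset(
		struct inode		*inode,
		struct buffer_head	*bh,
		struct xfs_bmbt_irec	*imap,
		xfs_off_t		offset)
	{
		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);

		xfs_map_buffer(inode, bh, imap, offset);
		set_buffer_mapped(bh);
		/* these two clears are exactly what never happens
		 * for the stuck pages reported above */
		clear_buffer_delay(bh);
		clear_buffer_unwritten(bh);
	}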


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
@ 2016-06-03 19:35                                                   ` Holger Hoffstätte
  2016-06-04  0:04                                                   ` Dave Chinner
  2016-06-26  5:45                                                   ` Stefan Priebe
  2 siblings, 0 replies; 49+ messages in thread
From: Holger Hoffstätte @ 2016-06-03 19:35 UTC (permalink / raw)
  To: xfs

On 06/03/16 19:56, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> should I remove the complete if conditions, including the return 0, or
> should I convert them to plain ifs without the WARN_ON_ONCE? Like below?
> 
>         if (WARN_ON_ONCE(delalloc))
>                 return 0;
>         if (WARN_ON_ONCE(unwritten))
>                 return 0;
> 
> =>
> 
>   if (delalloc)
>     return 0;
>   if (unwritten)
>     return 0;

Good thing you asked; I forgot about the returns.

Until the bigger picture has been figured out with -mm, I'd probably
keep the returns.

-h


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
  2016-06-03 19:35                                                   ` Holger Hoffstätte
@ 2016-06-04  0:04                                                   ` Dave Chinner
  2016-06-26  5:45                                                   ` Stefan Priebe
  2 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2016-06-04  0:04 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: xfs-masters, Brian Foster, xfs

On Fri, Jun 03, 2016 at 07:56:08PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> should I remove the complete if conditions, including the return 0, or
> should I convert them to plain ifs without the WARN_ON_ONCE? Like below?
> 
>         if (WARN_ON_ONCE(delalloc))
>                 return 0;
>         if (WARN_ON_ONCE(unwritten))
>                 return 0;
> 
> =>
> 
>   if (delalloc)
>     return 0;
>   if (unwritten)
>     return 0;

Yes, you need to keep the checks and returns. That's what I meant
when I said that "XFS handles the dirty page case correctly in this
case". If the page is dirty, we should not be attempting to release
the buffers, and that is what the code does. It's just noisy about
it...
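
In code, keeping the checks while dropping the noise would look something
like this in xfs_vm_releasepage() (a sketch against the 4.4-era
fs/xfs/xfs_aops.c, untested):

	xfs_count_page_state(page, &delalloc, &unwritten);

	/* keep refusing to release the page, just do it silently:
	 * dirty delalloc/unwritten buffers must not be stripped here */
	if (delalloc)
		return 0;
	if (unwritten)
		return 0;

	return try_to_free_buffers(page);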

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
  2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
  2016-06-03 19:35                                                   ` Holger Hoffstätte
  2016-06-04  0:04                                                   ` Dave Chinner
@ 2016-06-26  5:45                                                   ` Stefan Priebe
  2 siblings, 0 replies; 49+ messages in thread
From: Stefan Priebe @ 2016-06-26  5:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs-masters, Brian Foster, xfs

Hi Dave,

Today I got this XFS trace while running 4.4.13. I'm not sure if it is
related.

[282732.262739] ------------[ cut here ]------------
[282732.264093] kernel BUG at fs/xfs/xfs_aops.c:1054!
[282732.265459] invalid opcode: 0000 [#1] SMP
[282732.266753] Modules linked in: netconsole xt_multiport 
iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse 
sb_edac edac_core xhci_pci i40e(O) xhci_hcd i2c_i801 vxlan shpchp 
ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor 
raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod 
ehci_pci ehci_hcd usbcore usb_common ahci libahci igb i2c_algo_bit 
mpt3sas i2c_core raid_class ptp pps_core scsi_transport_sas
[282732.280494] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G           O 
4.4.13+36-ph #1
[282732.282707] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 
2.0 12/17/2015
[282732.284873] task: ffff880c4d9ba500 ti: ffff880c4da28000 task.ti: 
ffff880c4da28000
[282732.287038] RIP: 0010:[<ffffffff943267f1>]  [<ffffffff943267f1>] 
xfs_vm_writepage+0x561/0x5c0
[282732.289554] RSP: 0018:ffff880c4da2b8e8  EFLAGS: 00010246
[282732.291095] RAX: 001fffff80020009 RBX: ffffea000186de80 RCX: 
000000000000000c
[282732.293161] RDX: 0000000000001800 RSI: ffff880c4da2b9b8 RDI: 
ffffea000186de80
[282732.295255] RBP: ffff880c4da2b9a8 R08: 0000000000000003 R09: 
7fffffffffffffff
[282732.297340] R10: ffff880c7ffdc6c0 R11: 0000000000000000 R12: 
ffffea000186de80
[282732.299405] R13: ffff88001ea855d0 R14: ffff880c4da2bad8 R15: 
ffffea000186dea0
[282732.301472] FS:  0000000000000000(0000) GS:ffff880c7fc40000(0000) 
knlGS:0000000000000000
[282732.303811] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[282732.305480] CR2: ffffffffff600400 CR3: 0000000014c0a000 CR4: 
00000000001406e0
[282732.307545] Stack:
[282732.308142]  ffff8806c1e980e0 ffff880c442dc800 ffff880100000001 
0000000100042000
[282732.310482]  ffffea00016db240 ffff880c4da2b968 0000000001800000 
ffff880c4da2b9b8
[282732.312822]  0000000000001000 0000000000297000 0000000000000000 
0000000000000246
[282732.315161] Call Trace:
[282732.315890]  [<ffffffff9415c72e>] ? clear_page_dirty_for_io+0xee/0x1b0
[282732.317782]  [<ffffffff94163974>] pageout.isra.43+0x164/0x280
[282732.319449]  [<ffffffff94165f4a>] shrink_page_list+0x5ba/0x760
[282732.321143]  [<ffffffff941667ce>] shrink_inactive_list+0x1ee/0x500
[282732.322934]  [<ffffffff941674e1>] shrink_lruvec+0x621/0x7d0
[282732.324554]  [<ffffffff9416776c>] shrink_zone+0xdc/0x2c0
[282732.326096]  [<ffffffff941688b9>] kswapd+0x4f9/0x970
[282732.327541]  [<ffffffff941683c0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[282732.329561]  [<ffffffff940a0dc9>] kthread+0xc9/0xe0
[282732.330979]  [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100
[282732.332623]  [<ffffffff946b470f>] ret_from_fork+0x3f/0x70
[282732.334191]  [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100
[282732.404901] Code: f8 f5 74 a3 4c 89 e7 89 85 50 ff ff ff e8 38 e8 ff 
ff f0 41 80 24 24 f7 4c 89 e7 e8 3a bf e2 ff 8b 85 50 ff ff ff e9 39 fd 
ff ff <0f> 0b 80 3d 04 fc 9c 00 00 0f 85 6d ff ff ff be d6 03 00 00 48
[282732.556398] RIP  [<ffffffff943267f1>] xfs_vm_writepage+0x561/0x5c0
[282732.630207]  RSP <ffff880c4da2b8e8>
[282732.703062] ---[ end trace 9ea1afce9e126cdc ]---
[282732.842462] ------------[ cut here ]------------
[282732.914729] WARNING: CPU: 2 PID: 108 at kernel/exit.c:661 
do_exit+0x50/0xab0()
[282732.989039] Modules linked in: netconsole xt_multiport 
iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse 
sb_edac edac_core xhci_pci i40e(O) xhci_hcd i2c_i801 vxlan shpchp 
ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor 
raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod 
ehci_pci ehci_hcd usbcore usb_common ahci libahci igb i2c_algo_bit 
mpt3sas i2c_core raid_class ptp pps_core scsi_transport_sas
[282733.306619] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G      D    O 
4.4.13+36-ph #1
[282733.386805] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 
2.0 12/17/2015
[282733.467585]  0000000000000000 ffff880c4da2b5d8 ffffffff943c60ff 
0000000000000000
[282733.547861]  ffffffff94a330a8 ffff880c4da2b618 ffffffff940837a7 
ffff880c4da2b838
[282733.626144]  000000000000000b ffff880c4da2b838 0000000000000246 
ffff880c4d9ba500
[282733.702723] Call Trace:
[282733.776298]  [<ffffffff943c60ff>] dump_stack+0x63/0x84
[282733.849877]  [<ffffffff940837a7>] warn_slowpath_common+0x97/0xe0
[282733.922917]  [<ffffffff9408380a>] warn_slowpath_null+0x1a/0x20
[282733.994640]  [<ffffffff94085a90>] do_exit+0x50/0xab0
[282734.064858]  [<ffffffff94008a02>] oops_end+0xa2/0xe0
[282734.133974]  [<ffffffff94008b88>] die+0x58/0x80
[282734.202270]  [<ffffffff94005ba9>] do_trap+0x69/0x150
[282734.269913]  [<ffffffff940a1bc2>] ? 
__atomic_notifier_call_chain+0x12/0x20
[282734.337974]  [<ffffffff94005d5d>] do_error_trap+0xcd/0xf0
[282734.406008]  [<ffffffff943267f1>] ? xfs_vm_writepage+0x561/0x5c0
[282734.474472]  [<ffffffff9439c334>] ? generic_make_request+0x104/0x190
[282734.542216]  [<ffffffff94006000>] do_invalid_op+0x20/0x30
[282734.609276]  [<ffffffff946b5e8e>] invalid_op+0x1e/0x30
[282734.675516]  [<ffffffff943267f1>] ? xfs_vm_writepage+0x561/0x5c0
[282734.741307]  [<ffffffff94326528>] ? xfs_vm_writepage+0x298/0x5c0
[282734.805722]  [<ffffffff9415c72e>] ? clear_page_dirty_for_io+0xee/0x1b0
[282734.870589]  [<ffffffff94163974>] pageout.isra.43+0x164/0x280
[282734.934901]  [<ffffffff94165f4a>] shrink_page_list+0x5ba/0x760
[282734.998565]  [<ffffffff941667ce>] shrink_inactive_list+0x1ee/0x500
[282735.061845]  [<ffffffff941674e1>] shrink_lruvec+0x621/0x7d0
[282735.124441]  [<ffffffff9416776c>] shrink_zone+0xdc/0x2c0
[282735.186752]  [<ffffffff941688b9>] kswapd+0x4f9/0x970
[282735.249021]  [<ffffffff941683c0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[282735.312215]  [<ffffffff940a0dc9>] kthread+0xc9/0xe0
[282735.375420]  [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100
[282735.439012]  [<ffffffff946b470f>] ret_from_fork+0x3f/0x70
[282735.502368]  [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100
[282735.565534] ---[ end trace 9ea1afce9e126cdd ]---

Stefan

On 03.06.2016 at 19:56, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> should I remove the complete if conditions, including the return 0, or
> should I convert them to plain ifs without the WARN_ON_ONCE? Like below?
>
>          if (WARN_ON_ONCE(delalloc))
>                  return 0;
>          if (WARN_ON_ONCE(unwritten))
>                  return 0;
>
> =>
>
>    if (delalloc)
>      return 0;
>    if (unwritten)
>      return 0;
>
>
>
> On 22.05.2016 at 23:38, Dave Chinner wrote:
>> On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote:
>>> On 16.05.2016 at 03:06, Brian Foster wrote:
>>>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>>>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
>>>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G       O
>>>>> 4.4.10+25-ph #1
>>>>
>>>> How close is this to an upstream kernel? Upstream XFS? Have you tried to
>>>> reproduce this on an upstream kernel?
>>>
>>> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq
>>> patches from 4.5 and 4.6, but I can try to replace the kernel on one
>>> machine with a 100% vanilla one if this helps.
>>
>> Please do.
>>
>>>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0x52000 size 0x13d1c8
>>>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0x53000 size 0x13d1c8
>>>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0x54000 size 0x13d1c8
>>>> ...
>>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0xab000 size 0x13d1c8
>>>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0xac000 size 0x13d1c8
>>>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff
>>>>> 0xad000 size 0x13d1c8
>>>>>
>>>>> The file to the inode number is:
>>>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>>>>
>>>>
>>>> xfs_bmap -v might be interesting here as well.
>>>
>>> # xfs_bmap -v
>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
>>>   EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>>>     0: [0..2567]:       41268928..41271495  3 (374464..377031)  2568
>>
>> So the last file offset with a block is 0x140e00. This means the
>> file is fully allocated. However, the pages inside the file range
>> are still marked delayed allocation. That implies that we've failed
>> to write the pages over a delayed allocation region after we've
>> allocated the space.
>>
>> That, in turn, tends to indicate a problem in page writeback - the
>> first page to be written has triggered delayed allocation of the
>> entire range, but then the subsequent pages have not been written
>> (for some as yet unknown reason). When a page is written, we map it
>> to the current block via xfs_map_at_offset(), and that clears both
>> the buffer delay and unwritten flags.
>>
>> This clearly isn't happening, which means either the VFS doesn't
>> think the inode is dirty anymore, writeback is never asking for
>> these pages to be written, or XFS is screwing something up in
>> ->writepage. The XFS writepage code changed significantly in 4.6, so
>> it might be worth seeing if a 4.6 kernel reproduces this same
>> problem....
>>
>> Cheers,
>>
>> Dave.
>>


* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)
  2016-05-31  9:50                                                       ` Jan Kara
  2016-06-01  1:38                                                         ` Minchan Kim
@ 2016-08-17 15:37                                                         ` Andreas Grünbacher
  1 sibling, 0 replies; 49+ messages in thread
From: Andreas Grünbacher @ 2016-08-17 15:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Minchan Kim, Brian Foster, Stefan Priebe - Profihost AG,
	Linux Kernel Mailing List, xfs, linux-mm, Lukas Czerner,
	Steven Whitehouse

Hi Jan,

2016-05-31 11:50 GMT+02:00 Jan Kara <jack@suse.cz>:
> On Tue 31-05-16 10:07:24, Minchan Kim wrote:
>> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
>> > [adding lkml and linux-mm to the cc list]
>> >
>> > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote:
>> > > Hi Dave,
>> > >   Hi Brian,
>> > >
>> > > below are the results with a vanilla 4.4.11 kernel.
>> >
>> > Thanks for persisting with the testing, Stefan.
>> >
>> > ....
>> >
>> > > I've now used a vanilla 4.4.11 kernel and the issue remains. After a
>> > > fresh reboot it has happened again on the root FS for a Debian apt file:
>> > >
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990
>> > > ------------[ cut here ]------------
>> > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239
>> > > xfs_vm_releasepage+0x10f/0x140()
>> > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
>> > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse
>> > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan
>> > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
>> > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod
>> > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>> > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas
>> > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G           O    4.4.11 #1
>> > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015
>> > >  0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000
>> > >  ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8
>> > >  0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660
>> > > Call Trace:
>> > >  [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
>> > >  [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
>> > >  [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
>> > >  [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
>> > >  [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
>> > >  [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
>> > >  [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
>> > >  [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
>> > >  [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
>> > >  [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
>> > >  [<ffffffffa2168539>] kswapd+0x4f9/0x970
>> > >  [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> > >  [<ffffffffa20a0d99>] kthread+0xc9/0xe0
>> > >  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
>> > >  [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
>> > >  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
>> > > ---[ end trace c9d679f8ed4d7610 ]---
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size
>> > > 0x12b990
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
>> > .....
>> >
>> > Ok, I suspect this may be a VM bug. I've been looking at the 4.6
>> > code (so please try to reproduce on that kernel!) but it looks to me
>> > like the only way we can get from shrink_active_list() direct to
>> > try_to_release_page() is if we are over the maximum bufferhead
>> > threshold (i.e buffer_heads_over_limit = true) and we are trying to
>> > reclaim pages direct from the active list.
>> >
>> > Because we are called from kswapd()->balance_pgdat(), we have:
>> >
>> >         struct scan_control sc = {
>> >                 .gfp_mask = GFP_KERNEL,
>> >                 .order = order,
>> >                 .priority = DEF_PRIORITY,
>> >                 .may_writepage = !laptop_mode,
>> >                 .may_unmap = 1,
>> >                 .may_swap = 1,
>> >         };
>> >
>> > The key point here is reclaim is being run with .may_writepage =
>> > true for default configuration kernels. When we get to
>> > shrink_active_list():
>> >
>> >     if (!sc->may_writepage)
>> >             isolate_mode |= ISOLATE_CLEAN;
>> >
>> > But sc->may_writepage = true and this allows isolate_lru_pages() to
>> > isolate dirty pages from the active list. Normally this isn't a
>> > problem, because the isolated active list pages are rotated to the
>> > inactive list, and nothing else happens to them. *Except when
>> > buffer_heads_over_limit = true*. This special condition would
>> > explain why I have never seen apt/dpkg cause this problem on any of
>> > my (many) Debian systems that all use XFS....
>> >
>> > In that case, shrink_active_list() runs:
>> >
>> >     if (unlikely(buffer_heads_over_limit)) {
>> >             if (page_has_private(page) && trylock_page(page)) {
>> >                     if (page_has_private(page))
>> >                             try_to_release_page(page, 0);
>> >                     unlock_page(page);
>> >             }
>> >     }
>> >
>> > i.e. it locks the page, and if it has buffer heads it tries to get
>> > the bufferheads freed from the page.
>> >
>> > But this is a dirty page, which means it may have delalloc or
>> > unwritten state on its buffers, both of which indicate that there
>> > is dirty data in the page that hasn't been written. XFS issues a
>> > warning on this because neither shrink_active_list() nor
>> > try_to_release_page() checks whether the page is dirty or not.
>> >
>> > Hence it seems to me that shrink_active_list() is calling
>> > try_to_release_page() inappropriately, and XFS is just the
>> > messenger. If you turn laptop mode on, it is likely the problem will
>> > go away as kswapd will run with .may_writepage = false, but that
>> > will also cause other behavioural changes relating to writeback and
>> > memory reclaim. It might be worth trying as a workaround for now.
>> >
>> > MM-folk - is this analysis correct? If so, why is
>> > shrink_active_list() calling try_to_release_page() on dirty pages?
>> > Is this just an oversight or is there some problem that this is
>> > trying to work around? It seems trivial to fix to me (add a
>> > !PageDirty check), but I don't know why the check is there in the
>> > first place...
>>
>> It seems to be the latter.
>> The commit below seems to be related:
>> [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]
>>
>> At that time, even shrink_page_list worked like this:
>>
>> shrink_page_list
>>         while (!list_empty(page_list)) {
>>                 ..
>>                 ..
>>                 if (PageDirty(page)) {
>>                         ..
>>                 }
>>
>>                 /*
>>                  * If the page has buffers, try to free the buffer mappings
>>                  * associated with this page. If we succeed we try to free
>>                  * the page as well.
>>                  *
>>                  * We do this even if the page is PageDirty().
>>                  * try_to_release_page() does not perform I/O, but it is
>>                  * possible for a page to have PageDirty set, but it is actually
>>                  * clean (all its buffers are clean).  This happens if the
>>                  * buffers were written out directly, with submit_bh(). ext3
>>                  * will do this, as well as the blockdev mapping.
>>                  * try_to_release_page() will discover that cleanness and will
>>                  * drop the buffers and mark the page clean - it can be freed.
>>                  * ..
>>                  */
>>                 if (PagePrivate(page)) {
>>                         if (!try_to_release_page(page, sc->gfp_mask))
>>                                 goto activate_locked;
>>                         if (!mapping && page_count(page) == 1)
>>                                 goto free_it;
>>                 }
>>                 ..
>>         }
>>
>> I wonder whether it's valid or not on ext4.
>
> Actually, we already discussed this about a year ago:
> http://oss.sgi.com/archives/xfs/2015-06/msg00119.html
>
> And it was the final straw that made me remove ext3 from the tree. ext4 can
> also clean dirty buffers while keeping pages dirty, but that is limited to
> metadata (and to data in data=journal mode), so the scope of the problem is
> much smaller. So just avoiding calling ->releasepage for dirty pages may
> work fine these days.
>
> Also it is possible to change the ext4 checkpointing code to completely avoid
> doing this, but I never got around to rewriting that code. I should probably
> give it a higher priority on my todo list...

We're seeing the same (releasepage being called for dirty pages) on
GFS2 as well. Right now, GFS2 warns about this case, but we'll remove
that warning and wait for ext4 and releasepage to be fixed so that we
can re-add the warning. Maybe this will help as an argument for fixing
ext4 soon :)

Thanks,
Andreas


End of thread.

Thread overview: 49+ messages
2016-02-20  8:02 xfs trace in 4.4.2 Stefan Priebe
2016-02-20 14:45 ` Brian Foster
2016-02-20 18:02   ` Stefan Priebe - Profihost AG
2016-03-04 18:47     ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
2016-03-04 19:13       ` Brian Foster
2016-03-04 20:02         ` Stefan Priebe
2016-03-04 21:03           ` Brian Foster
2016-03-04 21:15             ` Stefan Priebe
2016-03-05 22:48             ` Dave Chinner
2016-03-05 22:58               ` Stefan Priebe
2016-03-23 13:26               ` Stefan Priebe - Profihost AG
2016-03-23 13:28               ` Stefan Priebe - Profihost AG
2016-03-23 14:07                 ` Brian Foster
2016-03-24  8:10                   ` Stefan Priebe - Profihost AG
2016-03-24  8:15                     ` Stefan Priebe - Profihost AG
2016-03-24 11:17                       ` Brian Foster
2016-03-24 12:17                         ` Stefan Priebe - Profihost AG
2016-03-24 12:24                           ` Brian Foster
2016-04-04  6:12                             ` Stefan Priebe - Profihost AG
2016-05-11 12:26                             ` Stefan Priebe - Profihost AG
2016-05-11 13:34                               ` Brian Foster
2016-05-11 14:03                                 ` Stefan Priebe - Profihost AG
2016-05-11 15:59                                   ` Brian Foster
2016-05-11 19:20                                     ` Stefan Priebe
2016-05-15 11:03                                     ` Stefan Priebe
2016-05-15 11:50                                       ` Brian Foster
2016-05-15 12:41                                         ` Stefan Priebe
2016-05-16  1:06                                           ` Brian Foster
2016-05-22 19:36                                             ` Stefan Priebe - Profihost AG
2016-05-22 21:38                                               ` Dave Chinner
2016-05-30  7:23                                                 ` Stefan Priebe - Profihost AG
2016-05-30 22:36                                                   ` shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner
2016-05-31  1:07                                                     ` Minchan Kim
2016-05-31  2:55                                                       ` Dave Chinner
2016-05-31  3:59                                                         ` Minchan Kim
2016-05-31  6:07                                                           ` Dave Chinner
2016-05-31  6:11                                                             ` Stefan Priebe - Profihost AG
2016-05-31  7:31                                                               ` Dave Chinner
2016-05-31  8:03                                                                 ` Stefan Priebe - Profihost AG
2016-06-02 12:13                                                                 ` Stefan Priebe - Profihost AG
2016-06-02 12:44                                                                   ` Holger Hoffstätte
2016-06-02 23:08                                                                     ` Dave Chinner
2016-05-31  9:50                                                       ` Jan Kara
2016-06-01  1:38                                                         ` Minchan Kim
2016-08-17 15:37                                                         ` Andreas Grünbacher
2016-06-03 17:56                                                 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG
2016-06-03 19:35                                                   ` Holger Hoffstätte
2016-06-04  0:04                                                   ` Dave Chinner
2016-06-26  5:45                                                   ` Stefan Priebe
