* Re: Ext4 stack trace with savedwrite patches
       [not found] <87innzu233.fsf@skywalker.in.ibm.com>
@ 2017-03-01  9:49 ` Jan Kara
  2017-03-01 10:23   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 2+ messages in thread
From: Jan Kara @ 2017-03-01  9:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Jan Kara, Andrew Morton, mgorman, linux-mm

Hi,

On Fri 24-02-17 19:23:52, Aneesh Kumar K.V wrote:
> I am hitting this while running stress test with the saved write patch
> series. I guess we are missing a set page dirty some where. I will
> continue to debug this, but if you have any suggestion let me know.
<snip>

So this warning can happen when page got dirtied but ->page_mkwrite() was
not called. I don't know details of how autonuma works but a quick look
suggests that autonuma can also do numa hinting faults for file pages.
So the following seems to be possible:

Autonuma decides to check for accesses to a mapped shared file page that is
dirty. pte_present gets cleared, pte_write stays set (due to logic
introduced in commit b191f9b106 "mm: numa: preserve PTE write permissions
across a NUMA hinting fault"). Then page writeback happens, page_mkclean()
is called to write-protect the page. However page_check_address() returns
NULL for the PTE (__page_check_address() returns NULL for !pte_present
PTEs) so we don't clear pte_write bit in page_mkclean_one(). Sometime later
a process looks at the page through mmap, takes NUMA fault and
do_numa_page() reestablishes a writeable mapping of the page although the
filesystem does not expect there to be one and funny things happen
afterwards...

I'll defer to more mm-savvy people to decide how this should be fixed. My
naive understanding is that page_mkclean_one() should clear the pte_write
bit even for pages that are undergoing NUMA probation but I'm not sure
about a preferred way to achieve that...

								Honza

> [ 3177.528954] ------------[ cut here ]------------
> [ 3177.528968] WARNING: CPU: 155 PID: 84480 at fs/ext4/inode.c:3711 ext4_set_page_dirty+0x9c/0xe0
> [ 3177.528969] Modules linked in: powernv_op_panel
> [ 3177.528977] CPU: 155 PID: 84480 Comm: stress-ng-mmap Not tainted 4.10.0-rc8-00038-g528b408-dirty #6
> [ 3177.528979] task: c000000bbbda7d00 task.stack: c000001d777c0000
> [ 3177.528981] NIP: c00000000043322c LR: c00000000027f460 CTR: c000000000433190
> [ 3177.528983] REGS: c000001d777c3850 TRAP: 0700   Not tainted  (4.10.0-rc8-00038-g528b408-dirty)
> [ 3177.528984] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
> [ 3177.528994]   CR: 22082442  XER: 00000000
> [ 3177.528995] CFAR: c0000000004331dc SOFTE: 1 
>                GPR00: c00000000027f460 c000001d777c3ad0 c000000000fb9c00 f0000000063ac880 
>                GPR04: 0000000000000010 00000000c0000018 0000000000000000 f0000000073dbf20 
>                GPR08: c000000000f39c00 0000000000000001 c000000000f39c00 0000000000000018 
>                GPR12: c000000000433190 c00000000fb9c080 c0000018eb220386 0008000000000000 
>                GPR16: f0000000063ac880 c000001e44c76f90 c000001e180c6ca0 00007fffa20f0000 
>                GPR20: c000001bfc672d48 0000000000000000 c000001d777c3b40 00007fffa20e0000 
>                GPR24: 0000000000000000 c1ffffffffffe7ff ffffffffffffffff 860322eb180000c0 
>                GPR28: c000001e180c6900 c000001d777c3cc0 00007fffa20f0000 f0000000063ac880 
> [ 3177.529032] NIP [c00000000043322c] ext4_set_page_dirty+0x9c/0xe0
> [ 3177.529035] LR [c00000000027f460] set_page_dirty+0xb0/0x190
> [ 3177.529036] Call Trace:
> [ 3177.529039] [c000001d777c3ad0] [00007fffa20f0000] 0x7fffa20f0000 (unreliable)
> [ 3177.529043] [c000001d777c3af0] [c00000000027f460] set_page_dirty+0xb0/0x190
> [ 3177.529047] [c000001d777c3b20] [c0000000002c0abc] unmap_page_range+0xf1c/0x1040
> [ 3177.529050] [c000001d777c3c50] [c0000000002c10f4] unmap_vmas+0x84/0x120
> [ 3177.529053] [c000001d777c3ca0] [c0000000002cbe80] unmap_region+0xd0/0x1a0
> [ 3177.529057] [c000001d777c3d80] [c0000000002ce7cc] do_munmap+0x2dc/0x4a0
> [ 3177.529061] [c000001d777c3df0] [c0000000002cea94] SyS_munmap+0x64/0xb0
> [ 3177.529065] [c000001d777c3e30] [c00000000000b96c] system_call+0x38/0xfc
> [ 3177.529066] Instruction dump:
> [ 3177.529068] 7c0803a6 4e800020 60000000 60000000 60420000 3d42fff8 892a6416 2f890000 
> [ 3177.529076] 409effc4 39200001 3d02fff8 99286416 <0fe00000> 4bffffb0 60000000 60000000 
> [ 3177.529083] ---[ end trace 50350faad3b7b385 ]---
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Ext4 stack trace with savedwrite patches
  2017-03-01  9:49 ` Ext4 stack trace with savedwrite patches Jan Kara
@ 2017-03-01 10:23   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 2+ messages in thread
From: Aneesh Kumar K.V @ 2017-03-01 10:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, mgorman, linux-mm



On Wednesday 01 March 2017 03:19 PM, Jan Kara wrote:
> Hi,
>
> On Fri 24-02-17 19:23:52, Aneesh Kumar K.V wrote:
>> I am hitting this while running stress test with the saved write patch
>> series. I guess we are missing a set page dirty some where. I will
>> continue to debug this, but if you have any suggestion let me know.
> <snip>
>
> So this warning can happen when page got dirtied but ->page_mkwrite() was
> not called. I don't know details of how autonuma works but a quick look
> suggests that autonuma can also do numa hinting faults for file pages.
> So the following seems to be possible:
>
> Autonuma decides to check for accesses to a mapped shared file page that is
> dirty. pte_present gets cleared, pte_write stays set (due to logic
> introduced in commit b191f9b106 "mm: numa: preserve PTE write permissions
> across a NUMA hinting fault"). Then page writeback happens, page_mkclean()
> is called to write-protect the page. However page_check_address() returns
> NULL for the PTE (__page_check_address() returns NULL for !pte_present
> PTEs) so we don't clear pte_write bit in page_mkclean_one().


Even though we cleared _PAGE_PRESENT, a pte_present() check returns true
for a numa fault pte. The problem with the savedwrite patch series that I
quoted in the original mail was that pte_write() was checking
_PAGE_WRITE, whereas the numa fault code stashed the write bit as the
savedwrite bit. Hence page_mkclean was skipping those ptes.

> Sometime later
> a process looks at the page through mmap, takes NUMA fault and
> do_numa_page() reestablishes a writeable mapping of the page although the
> filesystem does not expect there to be one and funny things happen
> afterwards...
>
> I'll defer to more mm-savvy people to decide how this should be fixed. My
> naive understanding is that page_mkclean_one() should clear the pte_write
> bit even for pages that are undergoing NUMA probation but I'm not sure
> about a preferred way to achieve that...
>
>


Yes, I found that, and finally decided that instead of fixing all those code
paths, we can update pte_write to handle the autonuma preserved write bit.

https://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
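
To make the difference concrete, here is a toy userspace model of the
sequence discussed above. It is only a sketch: the TOY_* bits and toy_*
helpers are made up for illustration and do not match the real powerpc
PTE encoding or the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define TOY_WRITE      (1u << 0)  /* hardware write permission */
#define TOY_SAVEDWRITE (1u << 1)  /* write bit stashed for a NUMA hinting fault */
#define TOY_NUMA       (1u << 2)  /* PTE is under NUMA probation */

typedef uint32_t toy_pte_t;
typedef bool (*toy_pte_write_fn)(toy_pte_t);

/* Before: only the hardware write bit counts as writable. */
static bool toy_pte_write_old(toy_pte_t pte)
{
	return (pte & TOY_WRITE) != 0;
}

/* After: a stashed write bit also counts as writable. */
static bool toy_pte_write_new(toy_pte_t pte)
{
	return (pte & (TOY_WRITE | TOY_SAVEDWRITE)) != 0;
}

/* NUMA hinting protection: move the write bit into the saved slot. */
static toy_pte_t toy_numa_protect(toy_pte_t pte)
{
	if (pte & TOY_WRITE)
		pte = (pte & ~TOY_WRITE) | TOY_SAVEDWRITE;
	return pte | TOY_NUMA;
}

/* Writeback: write-protect the PTE, but only if it looks writable. */
static toy_pte_t toy_mkclean_one(toy_pte_t pte, toy_pte_write_fn pte_write)
{
	if (!pte_write(pte))
		return pte;	/* skipped: looks read-only already */
	return pte & ~(TOY_WRITE | TOY_SAVEDWRITE);
}

/* NUMA hinting fault: restore the write bit from the saved slot. */
static toy_pte_t toy_numa_fault(toy_pte_t pte)
{
	bool writable = (pte & TOY_SAVEDWRITE) != 0;

	assert(pte & TOY_NUMA);
	pte &= ~(TOY_NUMA | TOY_SAVEDWRITE);
	return writable ? (pte | TOY_WRITE) : pte;
}

/* Returns true if the mapping ends up writable after writeback. */
static bool toy_writable_after_writeback(toy_pte_write_fn pte_write)
{
	toy_pte_t pte = toy_numa_protect(TOY_WRITE);

	pte = toy_mkclean_one(pte, pte_write);
	pte = toy_numa_fault(pte);
	return (pte & TOY_WRITE) != 0;
}
```

With toy_pte_write_old the mkclean step skips the pte and the NUMA fault
brings back a writable mapping behind the filesystem's back. With
toy_pte_write_new the mkclean step clears the stashed bit as well, so the
mapping stays read-only until a real write fault happens.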

-aneesh

