linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mm: WARNING in __delete_from_page_cache
@ 2016-01-24 10:48 Dmitry Vyukov
  2016-01-24 23:04 ` Kirill A. Shutemov
  2016-01-27 18:02 ` Ross Zwisler
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2016-01-24 10:48 UTC (permalink / raw)
  To: Matthew Wilcox, Alexander Viro, linux-fsdevel, LKML,
	Andrew Morton, Michal Hocko, Jan Kara, Vlastimil Babka,
	Kirill A. Shutemov, Matthew Wilcox, Junichi Nomura, Greg Thelen,
	Dave Hansen, linux-mm
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin

Hello,

The following program triggers WARNING in __delete_from_page_cache:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
__delete_from_page_cache+0x9f6/0xb60()
Modules linked in:
CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
 ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
 ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
 [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
 [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
 [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
 [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
 [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
 [<     inline     >] wp_pfn_shared mm/memory.c:2208
 [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
 [<     inline     >] handle_pte_fault mm/memory.c:3323
 [<     inline     >] __handle_mm_fault mm/memory.c:3417
 [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
 [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
 [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
 [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
 [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
 [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace dae21e0f85f1f98c ]---


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
  syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul, -1, 0x0ul);
  int fd = syscall(SYS_open, "/dev/ram1", O_RDWR);
  syscall(SYS_mmap, 0x20a31000ul, 0x3000ul, 0x3ul, 0xb011ul, fd, 0x0ul);
  *(uint64_t*)0x20003000 = 1;
  syscall(SYS_write, fd, 0x20003000ul, 0x78ul, 0, 0, 0);
  syscall(SYS_getresuid, 0x20000688ul, 0x200008f2ul, 0x20a31000ul, 0, 0, 0);
  return 0;
}

On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-24 10:48 mm: WARNING in __delete_from_page_cache Dmitry Vyukov
@ 2016-01-24 23:04 ` Kirill A. Shutemov
  2016-01-25 12:22   ` Jan Kara
  2016-01-27 18:02 ` Ross Zwisler
  1 sibling, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2016-01-24 23:04 UTC (permalink / raw)
  To: Dmitry Vyukov, Matthew Wilcox
  Cc: Alexander Viro, linux-fsdevel, LKML, Andrew Morton, Michal Hocko,
	Jan Kara, Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Junichi Nomura, Greg Thelen, Dave Hansen, linux-mm, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin

On Sun, Jan 24, 2016 at 11:48:21AM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> The following program triggers WARNING in __delete_from_page_cache:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
> __delete_from_page_cache+0x9f6/0xb60()
> Modules linked in:
> CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
>  ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
>  ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>  [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>  [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
>  [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
>  [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
>  [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
>  [<     inline     >] wp_pfn_shared mm/memory.c:2208
>  [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
>  [<     inline     >] handle_pte_fault mm/memory.c:3323
>  [<     inline     >] __handle_mm_fault mm/memory.c:3417
>  [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
>  [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
>  [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
>  [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
>  [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
>  [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace dae21e0f85f1f98c ]---
> 
> 
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <pthread.h>
> #include <stdint.h>
> #include <string.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <fcntl.h>
> 
> int main()
> {
>   syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul, -1, 0x0ul);
>   int fd = syscall(SYS_open, "/dev/ram1", O_RDWR);
>   syscall(SYS_mmap, 0x20a31000ul, 0x3000ul, 0x3ul, 0xb011ul, fd, 0x0ul);
>   *(uint64_t*)0x20003000 = 1;
>   syscall(SYS_write, fd, 0x20003000ul, 0x78ul, 0, 0, 0);
>   syscall(SYS_getresuid, 0x20000688ul, 0x200008f2ul, 0x20a31000ul, 0, 0, 0);
>   return 0;
> }
> 
> On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.

Reduced and human readable test case:

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main()
{
	int fd;
	char *p;

	fd = open("/dev/ram0", O_RDWR);
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	write(fd, "1", 1);
	*p = 1;
	return 0;
}

Looks like DAX doesn't expect to see something except hole-page in the radix
tree. This expectation is [probably] true for files on DAX-enabled
filesystems, but it seems broken for ramdisks.

Matthew?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-24 23:04 ` Kirill A. Shutemov
@ 2016-01-25 12:22   ` Jan Kara
  2016-01-26  3:42     ` Williams, Dan J
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-01-25 12:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, Matthew Wilcox, Alexander Viro, linux-fsdevel,
	LKML, Andrew Morton, Michal Hocko, Jan Kara, Vlastimil Babka,
	Kirill A. Shutemov, Matthew Wilcox, Junichi Nomura, Greg Thelen,
	Dave Hansen, linux-mm, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin, Dan Williams

On Mon 25-01-16 01:04:22, Kirill A. Shutemov wrote:
> On Sun, Jan 24, 2016 at 11:48:21AM +0100, Dmitry Vyukov wrote:
> > Hello,
> > 
> > The following program triggers WARNING in __delete_from_page_cache:
> > 
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
> > __delete_from_page_cache+0x9f6/0xb60()
> > Modules linked in:
> > CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >  00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
> >  ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
> >  ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
> > Call Trace:
> >  [<     inline     >] __dump_stack lib/dump_stack.c:15
> >  [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> >  [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
> >  [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
> >  [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
> >  [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
> >  [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
> >  [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
> >  [<     inline     >] wp_pfn_shared mm/memory.c:2208
> >  [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
> >  [<     inline     >] handle_pte_fault mm/memory.c:3323
> >  [<     inline     >] __handle_mm_fault mm/memory.c:3417
> >  [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
> >  [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
> >  [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
> >  [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
> >  [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
> >  [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> > arch/x86/entry/entry_64.S:185
> > ---[ end trace dae21e0f85f1f98c ]---
> > 
> > 
> > // autogenerated by syzkaller (http://github.com/google/syzkaller)
> > #include <pthread.h>
> > #include <stdint.h>
> > #include <string.h>
> > #include <sys/syscall.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > 
> > int main()
> > {
> >   syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul, -1, 0x0ul);
> >   int fd = syscall(SYS_open, "/dev/ram1", O_RDWR);
> >   syscall(SYS_mmap, 0x20a31000ul, 0x3000ul, 0x3ul, 0xb011ul, fd, 0x0ul);
> >   *(uint64_t*)0x20003000 = 1;
> >   syscall(SYS_write, fd, 0x20003000ul, 0x78ul, 0, 0, 0);
> >   syscall(SYS_getresuid, 0x20000688ul, 0x200008f2ul, 0x20a31000ul, 0, 0, 0);
> >   return 0;
> > }
> > 
> > On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.
> 
> Reduced and human readable test case:
> 
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> int main()
> {
> 	int fd;
> 	char *p;
> 
> 	fd = open("/dev/ram0", O_RDWR);
> 	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 	write(fd, "1", 1);
> 	*p = 1;
> 	return 0;
> }
> 
> Looks like DAX doesn't expect to see something except hole-page in the radix
> tree. This expectation is [probably] true for files on DAX-enabled
> filesystems, but it seems broken for ramdisks.
> 
> Matthew?

Thanks. Despite the huge list of recipients the author of the changes
hasn't been CCed :) I've added Dan to CC since he wrote DAX support for
block devices. It seems somehow the write didn't go through the DAX path
but through the standard page cache write path. Ah, I see, only
file->f_mapping->host has S_DAX set but io_is_direct() which decides
whether DAX or pagecache path should be used for writes uses file->f_inode
which is something different for block devices... 

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-25 12:22   ` Jan Kara
@ 2016-01-26  3:42     ` Williams, Dan J
  2016-01-26  8:29       ` Jan Kara
  2016-01-26 12:54       ` Matthew Wilcox
  0 siblings, 2 replies; 12+ messages in thread
From: Williams, Dan J @ 2016-01-26  3:42 UTC (permalink / raw)
  To: kirill, jack
  Cc: syzkaller, linux-kernel, kirill.shutemov, linux-mm, dave.hansen,
	kcc, vbabka, viro, dvyukov, akpm, gthelen, willy, linux-fsdevel,
	mhocko, jack, glider, sasha.levin, Wilcox, Matthew R, j-nomura

On Mon, 2016-01-25 at 13:22 +0100, Jan Kara wrote:
[..]
> Thanks. Despite the huge list of recipients the author of the changes
> hasn't been CCed :) I've added Dan to CC since he wrote DAX support
> for
> block devices. It seems somehow the write didn't go through the DAX
> path
> but through the standard page cache write path. Ah, I see, only
> file->f_mapping->host has S_DAX set but io_is_direct() which decides
> whether DAX or pagecache path should be used for writes uses file-
> >f_inode
> which is something different for block devices... 

Thanks, yes, the following silences the warning for me:

8<----- (git am --scissors)
Subject: fs, block: force direct-I/O for dax-enabled block devices

From: Dan Williams <dan.j.williams@intel.com>

Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
to a block-device special file.

Otherwise, we confuse the DAX code that does not expect to find live
data in the page cache:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
    __delete_from_page_cache+0x9f6/0xb60()
    Modules linked in:
    CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
     ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
     ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
     [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
     [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
     [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
     [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
     [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
     [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
     [<     inline     >] wp_pfn_shared mm/memory.c:2208
     [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
     [<     inline     >] handle_pte_fault mm/memory.c:3323
     [<     inline     >] __handle_mm_fault mm/memory.c:3417
     [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
     [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
     [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
     [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
     [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
     [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
    arch/x86/entry/entry_64.S:185
    ---[ end trace dae21e0f85f1f98c ]---

Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c     |    5 -----
 include/linux/fs.h |   12 +++++++++++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7b9cd49622b1..277008617b2d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -156,11 +156,6 @@ blkdev_get_block(struct inode *inode, sector_t iblock,
 	return 0;
 }
 
-static struct inode *bdev_file_inode(struct file *file)
-{
-	return file->f_mapping->host;
-}
-
 static ssize_t
 blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a2046275cdf..a4c4314eed48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1237,6 +1237,11 @@ static inline struct inode *file_inode(const struct file *f)
 	return f->f_inode;
 }
 
+static inline struct inode *bdev_file_inode(struct file *file)
+{
+	return file->f_mapping->host;
+}
+
 static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
 {
 	return locks_lock_inode_wait(file_inode(filp), fl);
@@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct super_block *sb, char *options);
 
 static inline bool io_is_direct(struct file *filp)
 {
-	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
+	struct inode *inode = file_inode(filp);
+
+	if (S_ISBLK(inode->i_mode))
+		inode = bdev_file_inode(filp);
+
+	return (filp->f_flags & O_DIRECT) || IS_DAX(inode);
 }
 
 static inline int iocb_flags(struct file *file)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-26  3:42     ` Williams, Dan J
@ 2016-01-26  8:29       ` Jan Kara
  2016-01-26 12:54       ` Matthew Wilcox
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Kara @ 2016-01-26  8:29 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: kirill, jack, syzkaller, linux-kernel, kirill.shutemov, linux-mm,
	dave.hansen, kcc, vbabka, viro, dvyukov, akpm, gthelen, willy,
	linux-fsdevel, mhocko, jack, glider, sasha.levin, Wilcox,
	Matthew R, j-nomura

On Tue 26-01-16 03:42:34, Williams, Dan J wrote:
> On Mon, 2016-01-25 at 13:22 +0100, Jan Kara wrote:
> [..]
> > Thanks. Despite the huge list of recipients the author of the changes
> > hasn't been CCed :) I've added Dan to CC since he wrote DAX support
> > for
> > block devices. It seems somehow the write didn't go through the DAX
> > path
> > but through the standard page cache write path. Ah, I see, only
> > file->f_mapping->host has S_DAX set but io_is_direct() which decides
> > whether DAX or pagecache path should be used for writes uses file-
> > >f_inode
> > which is something different for block devices... 
> 
> Thanks, yes, the following silences the warning for me:
> 
> 8<----- (git am --scissors)
> Subject: fs, block: force direct-I/O for dax-enabled block devices
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
> to a block-device special file.
> 
> Otherwise, we confuse the DAX code that does not expect to find live
> data in the page cache:
> 
>     ------------[ cut here ]------------
>     WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
>     __delete_from_page_cache+0x9f6/0xb60()
>     Modules linked in:
>     CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>      00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
>      ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
>      ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
>     Call Trace:
>      [<     inline     >] __dump_stack lib/dump_stack.c:15
>      [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>      [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>      [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>      [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
>      [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
>      [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
>      [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
>      [<     inline     >] wp_pfn_shared mm/memory.c:2208
>      [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
>      [<     inline     >] handle_pte_fault mm/memory.c:3323
>      [<     inline     >] __handle_mm_fault mm/memory.c:3417
>      [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
>      [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
>      [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
>      [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
>      [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
>      [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
>     arch/x86/entry/entry_64.S:185
>     ---[ end trace dae21e0f85f1f98c ]---
> 
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/block_dev.c     |    5 -----
>  include/linux/fs.h |   12 +++++++++++-
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 7b9cd49622b1..277008617b2d 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -156,11 +156,6 @@ blkdev_get_block(struct inode *inode, sector_t iblock,
>  	return 0;
>  }
>  
> -static struct inode *bdev_file_inode(struct file *file)
> -{
> -	return file->f_mapping->host;
> -}
> -
>  static ssize_t
>  blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
>  {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1a2046275cdf..a4c4314eed48 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1237,6 +1237,11 @@ static inline struct inode *file_inode(const struct file *f)
>  	return f->f_inode;
>  }
>  
> +static inline struct inode *bdev_file_inode(struct file *file)
> +{
> +	return file->f_mapping->host;
> +}
> +
>  static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
>  {
>  	return locks_lock_inode_wait(file_inode(filp), fl);
> @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct super_block *sb, char *options);
>  
>  static inline bool io_is_direct(struct file *filp)
>  {
> -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
> +	struct inode *inode = file_inode(filp);
> +
> +	if (S_ISBLK(inode->i_mode))
> +		inode = bdev_file_inode(filp);
> +
> +	return (filp->f_flags & O_DIRECT) || IS_DAX(inode);
>  }
>  
>  static inline int iocb_flags(struct file *file)
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-26  3:42     ` Williams, Dan J
  2016-01-26  8:29       ` Jan Kara
@ 2016-01-26 12:54       ` Matthew Wilcox
  2016-01-26 13:36         ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2016-01-26 12:54 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: kirill, jack, syzkaller, linux-kernel, kirill.shutemov, linux-mm,
	dave.hansen, kcc, vbabka, viro, dvyukov, akpm, gthelen,
	linux-fsdevel, mhocko, jack, glider, sasha.levin, Wilcox,
	Matthew R, j-nomura

On Tue, Jan 26, 2016 at 03:42:34AM +0000, Williams, Dan J wrote:
> @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct super_block *sb, char *options);
>  
>  static inline bool io_is_direct(struct file *filp)
>  {
> -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));

I think this should just be a one-liner:

-	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
+	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);

This does the right thing for block device inodes and filesystem inodes.
(see the opening stanzas of __dax_fault for an example).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-26 12:54       ` Matthew Wilcox
@ 2016-01-26 13:36         ` Jan Kara
  2016-01-26 16:59           ` Williams, Dan J
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-01-26 13:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Williams, Dan J, kirill, jack, syzkaller, linux-kernel,
	kirill.shutemov, linux-mm, dave.hansen, kcc, vbabka, viro,
	dvyukov, akpm, gthelen, linux-fsdevel, mhocko, jack, glider,
	sasha.levin, Wilcox, Matthew R, j-nomura

On Tue 26-01-16 07:54:56, Matthew Wilcox wrote:
> On Tue, Jan 26, 2016 at 03:42:34AM +0000, Williams, Dan J wrote:
> > @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct super_block *sb, char *options);
> >  
> >  static inline bool io_is_direct(struct file *filp)
> >  {
> > -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
> 
> I think this should just be a one-liner:
> 
> -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
> +	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
> 
> This does the right thing for block device inodes and filesystem inodes.
> (see the opening stanzas of __dax_fault for an example).

Ah, right. This looks indeed better.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-26 13:36         ` Jan Kara
@ 2016-01-26 16:59           ` Williams, Dan J
  2016-01-27 22:04             ` Ross Zwisler
  0 siblings, 1 reply; 12+ messages in thread
From: Williams, Dan J @ 2016-01-26 16:59 UTC (permalink / raw)
  To: jack, willy
  Cc: syzkaller, linux-kernel, kirill.shutemov, linux-mm, kcc,
	dave.hansen, vbabka, viro, dvyukov, akpm, gthelen, kirill,
	linux-fsdevel, mhocko, jack, glider, sasha.levin, Wilcox,
	Matthew R, j-nomura

On Tue, 2016-01-26 at 14:36 +0100, Jan Kara wrote:
> On Tue 26-01-16 07:54:56, Matthew Wilcox wrote:
> > On Tue, Jan 26, 2016 at 03:42:34AM +0000, Williams, Dan J wrote:
> > > @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct
> > > super_block *sb, char *options);
> > >  
> > >  static inline bool io_is_direct(struct file *filp)
> > >  {
> > > -	return (filp->f_flags & O_DIRECT) ||
> > > IS_DAX(file_inode(filp));
> > 
> > I think this should just be a one-liner:
> > 
> > -	return (filp->f_flags & O_DIRECT) ||
> > IS_DAX(file_inode(filp));
> > +	return (filp->f_flags & O_DIRECT) || IS_DAX(filp-
> > >f_mapping->host);
> > 
> > This does the right thing for block device inodes and filesystem
> > inodes.
> > (see the opening stanzas of __dax_fault for an example).
> 
> Ah, right. This looks indeed better.
> 


Oh, yeah, looks good.

8<---- (git am --scissors)
Subject: fs, block: force direct-I/O for dax-enabled block devices

From: Dan Williams <dan.j.williams@intel.com>

Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
to a block-device special file.  Both regular files and device special
files can use the common filp->f_mapping->host lookup to determing is
DAX is enabled.

Otherwise, we confuse the DAX code that does not expect to find live
data in the page cache:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
    __delete_from_page_cache+0x9f6/0xb60()
    Modules linked in:
    CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
     ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
     ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
     [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
     [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
     [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
     [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
     [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
     [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
     [<     inline     >] wp_pfn_shared mm/memory.c:2208
     [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
     [<     inline     >] handle_pte_fault mm/memory.c:3323
     [<     inline     >] __handle_mm_fault mm/memory.c:3417
     [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
     [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
     [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
     [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
     [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
     [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
    arch/x86/entry/entry_64.S:185
    ---[ end trace dae21e0f85f1f98c ]---

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fs.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a2046275cdf..b10002d4a5f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2907,7 +2907,7 @@ extern void replace_mount_options(struct super_block *sb, char *options);
 
 static inline bool io_is_direct(struct file *filp)
 {
-	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
+	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
 }
 
 static inline int iocb_flags(struct file *file)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-24 10:48 mm: WARNING in __delete_from_page_cache Dmitry Vyukov
  2016-01-24 23:04 ` Kirill A. Shutemov
@ 2016-01-27 18:02 ` Ross Zwisler
  2016-01-27 18:07   ` Dmitry Vyukov
  1 sibling, 1 reply; 12+ messages in thread
From: Ross Zwisler @ 2016-01-27 18:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Matthew Wilcox, Alexander Viro, linux-fsdevel, LKML,
	Andrew Morton, Michal Hocko, Jan Kara, Vlastimil Babka,
	Kirill A. Shutemov, Matthew Wilcox, Junichi Nomura, Greg Thelen,
	Dave Hansen, linux-mm, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin

On Sun, Jan 24, 2016 at 3:48 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program triggers WARNING in __delete_from_page_cache:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
> __delete_from_page_cache+0x9f6/0xb60()
> Modules linked in:
> CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
>  ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
>  ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>  [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>  [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
>  [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
>  [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
>  [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
>  [<     inline     >] wp_pfn_shared mm/memory.c:2208
>  [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
>  [<     inline     >] handle_pte_fault mm/memory.c:3323
>  [<     inline     >] __handle_mm_fault mm/memory.c:3417

Having inline functions represented in the stack trace and having file
names with line numbers seems really useful - how did you get this
output?  Is this a feature of some kernel patch applied for syzkaller?

>  [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
>  [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
>  [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
>  [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
>  [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
>  [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace dae21e0f85f1f98c ]---
>
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <pthread.h>
> #include <stdint.h>
> #include <string.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> int main()
> {
>   syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul, -1, 0x0ul);
>   int fd = syscall(SYS_open, "/dev/ram1", O_RDWR);
>   syscall(SYS_mmap, 0x20a31000ul, 0x3000ul, 0x3ul, 0xb011ul, fd, 0x0ul);
>   *(uint64_t*)0x20003000 = 1;
>   syscall(SYS_write, fd, 0x20003000ul, 0x78ul, 0, 0, 0);
>   syscall(SYS_getresuid, 0x20000688ul, 0x200008f2ul, 0x20a31000ul, 0, 0, 0);
>   return 0;
> }
>
> On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-27 18:02 ` Ross Zwisler
@ 2016-01-27 18:07   ` Dmitry Vyukov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2016-01-27 18:07 UTC (permalink / raw)
  To: syzkaller
  Cc: Matthew Wilcox, Alexander Viro, linux-fsdevel, LKML,
	Andrew Morton, Michal Hocko, Jan Kara, Vlastimil Babka,
	Kirill A. Shutemov, Matthew Wilcox, Junichi Nomura, Greg Thelen,
	Dave Hansen, linux-mm, Kostya Serebryany, Alexander Potapenko,
	Sasha Levin

On Wed, Jan 27, 2016 at 7:02 PM, Ross Zwisler <zwisler@gmail.com> wrote:
> On Sun, Jan 24, 2016 at 3:48 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Hello,
>>
>> The following program triggers WARNING in __delete_from_page_cache:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
>> __delete_from_page_cache+0x9f6/0xb60()
>> Modules linked in:
>> CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
>>  ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
>>  ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
>> Call Trace:
>>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>>  [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>>  [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>>  [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>>  [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
>>  [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
>>  [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
>>  [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
>>  [<     inline     >] wp_pfn_shared mm/memory.c:2208
>>  [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
>>  [<     inline     >] handle_pte_fault mm/memory.c:3323
>>  [<     inline     >] __handle_mm_fault mm/memory.c:3417
>
> Having inline functions represented in the stack trace and having file
> names with line numbers seems really useful - how did you get this
> output?  Is this a feature of some kernel patch applied for syzkaller?


I pipe normal kernel output through this script:
https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py

If you are in linux source dir with vmlinux and modules, then you just do:
$ cat crash | kasan_symbolize.py

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-26 16:59           ` Williams, Dan J
@ 2016-01-27 22:04             ` Ross Zwisler
  2016-01-28 19:51               ` Ross Zwisler
  0 siblings, 1 reply; 12+ messages in thread
From: Ross Zwisler @ 2016-01-27 22:04 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: jack, willy, syzkaller, linux-kernel, kirill.shutemov, linux-mm,
	kcc, dave.hansen, vbabka, viro, dvyukov, akpm, gthelen, kirill,
	linux-fsdevel, mhocko, jack, glider, sasha.levin, Wilcox,
	Matthew R, j-nomura

On Tue, Jan 26, 2016 at 04:59:27PM +0000, Williams, Dan J wrote:
> On Tue, 2016-01-26 at 14:36 +0100, Jan Kara wrote:
> > On Tue 26-01-16 07:54:56, Matthew Wilcox wrote:
> > > On Tue, Jan 26, 2016 at 03:42:34AM +0000, Williams, Dan J wrote:
> > > > @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct
> > > > super_block *sb, char *options);
> > > >  
> > > >  static inline bool io_is_direct(struct file *filp)
> > > >  {
> > > > -	return (filp->f_flags & O_DIRECT) ||
> > > > IS_DAX(file_inode(filp));
> > > 
> > > I think this should just be a one-liner:
> > > 
> > > -	return (filp->f_flags & O_DIRECT) ||
> > > IS_DAX(file_inode(filp));
> > > +	return (filp->f_flags & O_DIRECT) || IS_DAX(filp-
> > > >f_mapping->host);
> > > 
> > > This does the right thing for block device inodes and filesystem
> > > inodes.
> > > (see the opening stanzas of __dax_fault for an example).
> > 
> > Ah, right. This looks indeed better.
> > 
> 
> 
> Oh, yeah, looks good.
> 
> 8<---- (git am --scissors)
> Subject: fs, block: force direct-I/O for dax-enabled block devices
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
> to a block-device special file.  Both regular files and device special
> files can use the common filp->f_mapping->host lookup to determing is
> DAX is enabled.
> 
> Otherwise, we confuse the DAX code that does not expect to find live
> data in the page cache:
> 
>     ------------[ cut here ]------------
>     WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
>     __delete_from_page_cache+0x9f6/0xb60()
>     Modules linked in:
>     CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>      00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
>      ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
>      ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
>     Call Trace:
>      [<     inline     >] __dump_stack lib/dump_stack.c:15
>      [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>      [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>      [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>      [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
>      [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
>      [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
>      [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
>      [<     inline     >] wp_pfn_shared mm/memory.c:2208
>      [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
>      [<     inline     >] handle_pte_fault mm/memory.c:3323
>      [<     inline     >] __handle_mm_fault mm/memory.c:3417
>      [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
>      [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
>      [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
>      [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
>      [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
>      [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
>     arch/x86/entry/entry_64.S:185
>     ---[ end trace dae21e0f85f1f98c ]---
> 
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Suggested-by: Matthew Wilcox <willy@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I had a test case where I was hitting a warning while inserting into the page
cache when the inode was supposed to be DAX, and this clears up my issue as
well.

Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>

> ---
>  include/linux/fs.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1a2046275cdf..b10002d4a5f5 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2907,7 +2907,7 @@ extern void replace_mount_options(struct super_block *sb, char *options);
>  
>  static inline bool io_is_direct(struct file *filp)
>  {
> -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
> +	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
>  }
>  
>  static inline int iocb_flags(struct file *file)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mm: WARNING in __delete_from_page_cache
  2016-01-27 22:04             ` Ross Zwisler
@ 2016-01-28 19:51               ` Ross Zwisler
  0 siblings, 0 replies; 12+ messages in thread
From: Ross Zwisler @ 2016-01-28 19:51 UTC (permalink / raw)
  To: Ross Zwisler, Williams, Dan J, jack, willy, syzkaller,
	linux-kernel, kirill.shutemov, linux-mm, kcc, dave.hansen,
	vbabka, viro, dvyukov, akpm, gthelen, kirill, linux-fsdevel,
	mhocko, jack, glider, sasha.levin, Wilcox, Matthew R, j-nomura

On Wed, Jan 27, 2016 at 03:04:21PM -0700, Ross Zwisler wrote:
> On Tue, Jan 26, 2016 at 04:59:27PM +0000, Williams, Dan J wrote:
> > 8<---- (git am --scissors)
> > Subject: fs, block: force direct-I/O for dax-enabled block devices
> > 
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
> > to a block-device special file.  Both regular files and device special
> > files can use the common filp->f_mapping->host lookup to determing is
> > DAX is enabled.
> > 
> > Otherwise, we confuse the DAX code that does not expect to find live
> > data in the page cache:
> > 
> >     ------------[ cut here ]------------
> >     WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
> >     __delete_from_page_cache+0x9f6/0xb60()
> >     Modules linked in:
> >     CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
> >     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >      00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
> >      ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
> >      ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
> >     Call Trace:
> >      [<     inline     >] __dump_stack lib/dump_stack.c:15
> >      [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> >      [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
> >      [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
> >      [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
> >      [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
> >      [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
> >      [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
> >      [<     inline     >] wp_pfn_shared mm/memory.c:2208
> >      [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
> >      [<     inline     >] handle_pte_fault mm/memory.c:3323
> >      [<     inline     >] __handle_mm_fault mm/memory.c:3417
> >      [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
> >      [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
> >      [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
> >      [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
> >      [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
> >      [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> >     arch/x86/entry/entry_64.S:185
> >     ---[ end trace dae21e0f85f1f98c ]---
> > 
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
> > Reported-by: Dmitry Vyukov <dvyukov@google.com>
> > Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Suggested-by: Matthew Wilcox <willy@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> I had a test case where I was hitting a warning while inserting into the page
> cache when the inode was supposed to be DAX, and this clears up my issue as
> well.
> 
> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> > ---
> >  include/linux/fs.h |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 1a2046275cdf..b10002d4a5f5 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2907,7 +2907,7 @@ extern void replace_mount_options(struct super_block *sb, char *options);
> >  
> >  static inline bool io_is_direct(struct file *filp)
> >  {
> > -	return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
> > +	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
> >  }
> >  
> >  static inline int iocb_flags(struct file *file)

My testing has turned up another case where we can end up doing both page
cache I/O and DAX I/O to the same raw block device.  Here is the stack trace
of the error, passed through kasan_symbolize.py:

------------[ cut here ]------------                                            
WARNING: CPU: 1 PID: 598 at mm/filemap.c:590 __add_to_page_cache_locked+0x30d/0x3c0()
Modules linked in: nd_pmem nd_btt nd_e820 libnvdimm                             
CPU: 1 PID: 598 Comm: systemd-udevd Tainted: G        W       4.5.0-rc1+ #1     
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
 0000000000000000 0000000001e02977 ffff88040ff9fa88 ffffffff81576db2            
 0000000000000000 ffff88040ff9fac0 ffffffff810a65a2 ffff8804103d9020            
 0000000000000000 ffff8804103d9008 00000000ffffffea ffffea0002ebf080            
Call Trace:                                                                     
 [<     inline     >] __dump_stack lib/dump_stack.c:15                          
 [<ffffffff81576db2>] dump_stack+0x44/0x62 lib/dump_stack.c:50                  
 [<ffffffff810a65a2>] warn_slowpath_common+0x82/0xc0 kernel/panic.c:482         
 [<ffffffff810a66ea>] warn_slowpath_null+0x1a/0x20 kernel/panic.c:515           
 [<     inline     >] page_cache_tree_insert mm/filemap.c:590                   
 [<ffffffff811ca32d>] __add_to_page_cache_locked+0x30d/0x3c0 mm/filemap.c:649   
 [<ffffffff811ca449>] add_to_page_cache_lru+0x49/0xd0 mm/filemap.c:697          
 [<     inline     >] __read_cache_page mm/filemap.c:2300                       
 [<ffffffff811ca87b>] do_read_cache_page+0x6b/0x300 mm/filemap.c:2330           
 [<ffffffff811cab2c>] read_cache_page+0x1c/0x20 mm/filemap.c:2377               
 [<     inline     >] read_mapping_page include/linux/pagemap.h:391             
 [<ffffffff81558d74>] read_dev_sector+0x34/0xf0 block/partition-generic.c:558   
 [<     inline     >] read_part_sector block/partitions/check.h:37              
 [<ffffffff81560c4e>] read_lba+0x18e/0x290 block/partitions/efi.c:264           
 [<     inline     >] find_valid_gpt block/partitions/efi.c:610                 
 [<ffffffff81561542>] efi_partition+0xf2/0x7d0 block/partitions/efi.c:692       
 [<ffffffff8155b47e>] check_partition+0x13e/0x220 block/partitions/check.c:166  
 [<ffffffff81559670>] rescan_partitions+0xc0/0x2b0 block/partition-generic.c:434
 [<ffffffff81553aeb>] __blkdev_reread_part+0x6b/0xa0 block/ioctl.c:171          
 [<ffffffff81553b45>] blkdev_reread_part+0x25/0x40 block/ioctl.c:191            
 [<ffffffff815547b9>] blkdev_ioctl+0x5a9/0xab0 block/ioctl.c:624                
 [<ffffffff812a2303>] block_ioctl+0x43/0x50 fs/block_dev.c:1624                 
 [<     inline     >] vfs_ioctl fs/ioctl.c:43                                   
 [<ffffffff81274452>] do_vfs_ioctl+0xa2/0x6a0 fs/ioctl.c:674                    
 [<     inline     >] SYSC_ioctl fs/ioctl.c:689                                 
 [<ffffffff81274ac9>] SyS_ioctl+0x79/0x90 fs/ioctl.c:680                        
 [<ffffffff81a6a732>] entry_SYSCALL_64_fastpath+0x12/0x76 arch/x86/entry/entry_64.S:185
---[ end trace 96171b39eee8b31b ]---   

Here is my minimal reproducer:

#define _GNU_SOURCE                                                             
#include <sys/mman.h>                                                           
#include <sys/types.h>                                                          
#include <sys/stat.h>                                                           
#include <fcntl.h>                                                              
#include <linux/falloc.h>                                                       
#include <stdio.h>                                                              
#include <string.h>                                                             
#include <errno.h>                                                              
#include <unistd.h>                                                             
                                                                                
#define PAGE(a) ((a)*0x1000)                                                    
                                                                                
int main(int argc, char *argv[])                                                
{                                                                               
        int i, fd;                                                              
        char *data_array = (char*) 0x10200000; /* request a 2MiB aligned address with mmap() */
        int a;                                                                  
                                                                                
        fd = open("/dev/pmem0", O_RDWR);                                        
        if (fd < 0) {                                                           
                perror("fd");                                                   
                return 1;                                                       
        }                                                                       
                                                                                
        data_array = mmap(data_array, PAGE(0x300), PROT_READ|PROT_WRITE,        
                        MAP_SHARED, fd, PAGE(0));                               
                                                                                
        data_array[PAGE(0x0)] = 1;                                              
        close(fd);                                                              
        return 0;                                                               
} 

I believe what is happening is that I am doing a DAX PMD fault, and that fault
is happening at the same time that an IOCTL is doing a scan for partition
tables.  The above stack trace is for a scan of an efi_partition(), but I get a
similar stack trace going through sgi_partition(), ldm_partition(),
msdos_partition(), etc.  This looping is happening in check_partition(), as it
loops through the various partition types.

I think the issue essentially is that the partition scanning code doesn't have
a direct I/O code path, at least from what I have found.  This means that it's
not calling into DAX, and instead is just using the page cache.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-28 19:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-24 10:48 mm: WARNING in __delete_from_page_cache Dmitry Vyukov
2016-01-24 23:04 ` Kirill A. Shutemov
2016-01-25 12:22   ` Jan Kara
2016-01-26  3:42     ` Williams, Dan J
2016-01-26  8:29       ` Jan Kara
2016-01-26 12:54       ` Matthew Wilcox
2016-01-26 13:36         ` Jan Kara
2016-01-26 16:59           ` Williams, Dan J
2016-01-27 22:04             ` Ross Zwisler
2016-01-28 19:51               ` Ross Zwisler
2016-01-27 18:02 ` Ross Zwisler
2016-01-27 18:07   ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).