All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"dsterba@suse.cz" <dsterba@suse.cz>
Subject: Re: [PATCH 4/7] btrfs: Switch to iomap_dio_rw() for dio
Date: Fri, 5 Jun 2020 16:17:09 +0100	[thread overview]
Message-ID: <CAL3q7H6Hvjzi_bcKZFprApkrnoUDFGHO9wD52xSjS1rZ0_cbVA@mail.gmail.com> (raw)
In-Reply-To: <CAL3q7H4SMOJEkAgQEd+i=yeJP20Mv7AthbxaE2==BVr=SGtyjg@mail.gmail.com>

On Wed, Jun 3, 2020 at 12:35 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Thu, May 28, 2020 at 7:38 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> >
> > On 17:45 28/05, Filipe Manana wrote:
> > > On Thu, May 28, 2020 at 5:34 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> > > > > And who locked the extent range before?
> > > >
> > > > This is usually locked by a previous buffered write or read.
> > >
> > > A previous buffered write/read that has already finished or is still
> > > in progress?
> > >
> > > Because if it has finished we're not supposed to have the file range
> > > locked anymore.
> >
> > In progress. Mixing buffered I/O with direct writes.
> >
> > >
> > > >
> > > > >
> > > > > That seems alarming to me, specially if it's a direct IO write failing
> > > > > to invalidate the page cache, since a subsequent buffered read could
> > > > > get stale data (what's in the page cache), and not what the direct IO
> > > > > write wrote.
> > > > >
> > > > > Can you elaborate more on all those details?
> > > >
> > > > The origin of the message is when iomap_dio_rw() tries to invalidate the
> > > > inode pages, but fails and calls dio_warn_stale_pagecache().
> > > >
> > > > In the vanilla code, generic_file_direct_write() aborts direct writes
> > > > and returns 0 so that it may fallback to buffered I/O. Perhaps this
> > > > should be changed in iomap_dio_rw() as well. I will write a patch to
> > > > accomodate that.
> > >
> > > On vanilla we have no problems of mixing buffered and direct
> > > operations as long as they are done sequentially at least.
> > > And even if done concurrently we take several measures to ensure that
> > > are no surprises (locking ranges, waiting for any ordered extents in
> > > progress, etc).
> >
> > Yes, it is because of the code in generic_file_direct_write(). Anyways,
> > I did some tests with the following patch, and it seems to work. I will
> > send a formal patch to so that it gets incorporated in iomap sequence as
> > well.
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index e4addfc58107..215315be6233 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -483,9 +483,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >          */
> >         ret = invalidate_inode_pages2_range(mapping,
> >                         pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> > -       if (ret)
> > -               dio_warn_stale_pagecache(iocb->ki_filp);
> > -       ret = 0;
> > +       /*
> > +        * If a page can not be invalidated, return 0 to fall back
> > +        * to buffered write.
> > +        */
> > +       if (ret) {
> > +               if (ret == -EBUSY)
> > +                       ret = 0;
> > +               goto out_free_dio;
> > +       }
> >
> >         if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
> >             !inode->i_sb->s_dio_done_wq) {
> >
> >
>
> Thanks. As I just replied on another thread for that patch, we
> actually have a regression.
> There's more than the annoying warning in dmesg, it also sets -EIO on
> the inode's mapping and makes future fsyncs return that error despite
> the fact that no actual errors or corruptions happened:
>
> https://patchwork.kernel.org/patch/11576677/
>

There's also some deadlock/hang, I have triggered it twice today with
generic/113 on two different VMs:

[14621.297370] INFO: task kworker/1:117:15962 blocked for more than 120 seconds.
[14621.298491]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14621.299231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14621.300523] kworker/1:117   D    0 15962      2 0x80004000
[14621.301558] Workqueue: dio/sdb iomap_dio_complete_work
[14621.302389] Call Trace:
[14621.302877]  __schedule+0x384/0xa30
[14621.303555]  schedule+0x33/0xe0
[14621.304167]  rwsem_down_write_slowpath+0x2c2/0x750
[14621.305121]  ? btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14621.306217]  btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14621.307113]  iomap_dio_complete+0x11b/0x260
[14621.307888]  ? aio_fsync_work+0x5b0/0x5b0
[14621.308585]  iomap_dio_complete_work+0x17/0x30
[14621.309476]  process_one_work+0x275/0x6b0
[14621.310275]  worker_thread+0x4f/0x3e0
[14621.310869]  ? process_one_work+0x6b0/0x6b0
[14621.311403]  kthread+0x12a/0x170
[14621.311819]  ? kthread_create_worker_on_cpu+0x70/0x70
[14621.312460]  ret_from_fork+0x3a/0x50
[14621.312983] INFO: task kworker/1:199:16063 blocked for more than 120 seconds.
[14621.313921]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14621.314680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14621.315724] kworker/1:199   D    0 16063      2 0x80004000
[14621.316445] Workqueue: dio/sdb iomap_dio_complete_work
[14621.317101] Call Trace:
[14621.317437]  __schedule+0x384/0xa30
[14621.317928]  schedule+0x33/0xe0
[14621.318339]  rwsem_down_write_slowpath+0x2c2/0x750
[14621.318981]  ? btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14621.319609]  btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14621.320203]  iomap_dio_complete+0x11b/0x260
[14621.320721]  ? aio_fsync_work+0x5b0/0x5b0
[14621.321249]  iomap_dio_complete_work+0x17/0x30
[14621.321844]  process_one_work+0x275/0x6b0
[14621.322376]  worker_thread+0x4f/0x3e0
[14621.322871]  ? process_one_work+0x6b0/0x6b0
[14621.323408]  kthread+0x12a/0x170
[14621.323827]  ? kthread_create_worker_on_cpu+0x70/0x70
[14621.324473]  ret_from_fork+0x3a/0x50
[14621.324983] INFO: task aio-stress:16274 blocked for more than 120 seconds.
[14621.325896]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14621.326579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14621.327580] aio-stress      D    0 16274  14855 0x00004000
[14621.328280] Call Trace:
[14621.328602]  __schedule+0x384/0xa30
[14621.329056]  schedule+0x33/0xe0
[14621.329478]  rwsem_down_write_slowpath+0x2c2/0x750
[14621.330118]  ? btrfs_sync_file+0x219/0x4d0 [btrfs]
[14621.330747]  btrfs_sync_file+0x219/0x4d0 [btrfs]
[14621.331346]  iomap_dio_complete+0x11b/0x260
[14621.331886]  iomap_dio_rw+0x3bc/0x4c0
[14621.332372]  ? btrfs_file_write_iter+0x645/0x870 [btrfs]
[14621.333076]  btrfs_file_write_iter+0x645/0x870 [btrfs]
[14621.333749]  aio_write+0x148/0x1d0
[14621.334196]  ? lock_acquire+0xb1/0x4a0
[14621.334682]  ? __might_fault+0x3e/0x90
[14621.335172]  ? __fget_files+0x132/0x270
[14621.335668]  ? io_submit_one+0x946/0x1630
[14621.336184]  io_submit_one+0x946/0x1630
[14621.336680]  ? lock_acquire+0xb1/0x4a0
[14621.337175]  ? __might_fault+0x3e/0x90
[14621.337707]  ? __x64_sys_io_submit+0x9c/0x330
[14621.338269]  __x64_sys_io_submit+0x9c/0x330
[14621.338812]  ? do_syscall_64+0x5c/0x280
[14621.339303]  do_syscall_64+0x5c/0x280
[14621.339774]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[14621.340416] RIP: 0033:0x7fb6cd395717
[14621.340875] Code: Bad RIP value.
[14621.341304] RSP: 002b:00007fb6bf7e1de8 EFLAGS: 00000202 ORIG_RAX:
00000000000000d1
[14621.342262] RAX: ffffffffffffffda RBX: 0000560d3d92ea60 RCX: 00007fb6cd395717
[14621.343180] RDX: 0000560d3d92ea60 RSI: 0000000000000008 RDI: 00007fb6cdb32000
[14621.344081] RBP: 0000000000000008 R08: 0000150e50ac6651 R09: 00000000003081a8
[14621.344981] R10: 00007fb6bf7e1df0 R11: 0000000000000202 R12: 0000560d3d8fe110
[14621.345897] R13: 00007fb6bf7e1e10 R14: 00007fb6bf7e1e00 R15: 0000560d3d8fe110
[14621.346820] INFO: lockdep is turned off.
[14742.125500] INFO: task kworker/1:117:15962 blocked for more than 241 seconds.
[14742.126456]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14742.127156] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14742.128156] kworker/1:117   D    0 15962      2 0x80004000
[14742.128875] Workqueue: dio/sdb iomap_dio_complete_work
[14742.129633] Call Trace:
[14742.130010]  __schedule+0x384/0xa30
[14742.130494]  schedule+0x33/0xe0
[14742.131068]  rwsem_down_write_slowpath+0x2c2/0x750
[14742.131956]  ? btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14742.132834]  btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14742.133712]  iomap_dio_complete+0x11b/0x260
[14742.134475]  ? aio_fsync_work+0x5b0/0x5b0
[14742.135205]  iomap_dio_complete_work+0x17/0x30
[14742.136018]  process_one_work+0x275/0x6b0
[14742.136677]  worker_thread+0x4f/0x3e0
[14742.137154]  ? process_one_work+0x6b0/0x6b0
[14742.137805]  kthread+0x12a/0x170
[14742.138236]  ? kthread_create_worker_on_cpu+0x70/0x70
[14742.138901]  ret_from_fork+0x3a/0x50
[14742.139389] INFO: task kworker/1:199:16063 blocked for more than 241 seconds.
[14742.140305]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14742.140998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14742.142056] kworker/1:199   D    0 16063      2 0x80004000
[14742.142877] Workqueue: dio/sdb iomap_dio_complete_work
[14742.143397] Call Trace:
[14742.143654]  __schedule+0x384/0xa30
[14742.144017]  schedule+0x33/0xe0
[14742.144352]  rwsem_down_write_slowpath+0x2c2/0x750
[14742.144859]  ? btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14742.145386]  btrfs_sync_file+0x1fe/0x4d0 [btrfs]
[14742.145863]  iomap_dio_complete+0x11b/0x260
[14742.146289]  ? aio_fsync_work+0x5b0/0x5b0
[14742.146701]  iomap_dio_complete_work+0x17/0x30
[14742.147168]  process_one_work+0x275/0x6b0
[14742.147579]  worker_thread+0x4f/0x3e0
[14742.147954]  ? process_one_work+0x6b0/0x6b0
[14742.148377]  kthread+0x12a/0x170
[14742.148722]  ? kthread_create_worker_on_cpu+0x70/0x70
[14742.149257]  ret_from_fork+0x3a/0x50
[14742.149671] INFO: task aio-stress:16274 blocked for more than 241 seconds.
[14742.150376]       Not tainted 5.7.0-rc7-btrfs-next-59 #1
[14742.150948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14742.151962] aio-stress      D    0 16274  14855 0x00004000
(...)

Thanks.

>
>
> >
> > --
> > Goldwyn
>
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2020-06-05 15:17 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-22 12:38 [PATCH 0/7 v8] btrfs direct-io using iomap Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 1/7] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
2020-05-25 12:25   ` David Sterba
2020-05-22 12:38 ` [PATCH 2/7] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 3/7] iomap: Remove lockdep_assert_held() Goldwyn Rodrigues
2020-05-28 15:40   ` Darrick J. Wong
2020-05-28 16:45     ` Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 4/7] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2020-05-26 15:03   ` Johannes Thumshirn
2020-05-26 16:44     ` Goldwyn Rodrigues
2020-05-28 15:13       ` Filipe Manana
2020-05-28 16:34         ` Goldwyn Rodrigues
2020-05-28 16:45           ` Filipe Manana
2020-05-28 18:38             ` Goldwyn Rodrigues
2020-06-03 11:35               ` Filipe Manana
2020-06-05 15:17                 ` Filipe Manana [this message]
2020-06-05 20:43                   ` Goldwyn Rodrigues
2020-06-06  9:57                     ` Filipe Manana
2020-06-08 15:39                       ` Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 5/7] fs: Remove dio_end_io() Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 6/7] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
2020-05-22 12:38 ` [PATCH 7/7] btrfs: btrfs: split btrfs_direct_IO Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H6Hvjzi_bcKZFprApkrnoUDFGHO9wD52xSjS1rZ0_cbVA@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=dsterba@suse.cz \
    --cc=hch@infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.