* Hole punch races in GFS2
@ 2021-04-22 11:26 ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2021-04-22 11:26 UTC (permalink / raw)
  To: Bob Peterson; +Cc: Andreas Gruenbacher, cluster-devel, linux-fsdevel

Hello,

I am looking into how GFS2 protects against races between hole punching and
things like page faults or readahead, and AFAICT it does not. In
particular, is there anything that protects against a race like:

CPU1					CPU2
gfs2_fallocate()
  __gfs2_punch_hole()
    truncate_pagecache_range()
					gfs2_fault()
					  - faults in old data into page
					    cache
    punch_hole()

And now we have stale data in the page cache (data corruption). If
gfs2_page_mkwrite() sneaked in that window as well, we might even be racing
with writeback and possibly corrupting the filesystem on disk. Is there
anything I'm missing?
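
The window can be illustrated with a toy model (illustrative Python, not
kernel code; one dict stands in for the page cache and another for the
disk):

```python
# Toy model of the race window: punching a hole is two steps --
# truncate the page cache, then zero the blocks on disk. A page fault
# between the two steps repopulates the cache from the not-yet-zeroed
# disk blocks, leaving stale data behind.

disk = {0: b"DATA"}   # block 0 still holds pre-punch data
cache = {}            # page cache, initially empty

def truncate_pagecache_range():
    cache.pop(0, None)          # step 1: drop cached pages in the range

def gfs2_fault():
    cache[0] = disk[0]          # CPU2: faults old data back into the cache

def punch_hole():
    disk[0] = b"\x00" * 4       # step 2: zero the blocks on disk

truncate_pagecache_range()
gfs2_fault()                    # sneaks into the window
punch_hole()

assert cache[0] == b"DATA"      # stale data left in the page cache
assert disk[0] == b"\x00" * 4   # while the disk says the range is zeroed
```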

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread


* [Cluster-devel] Hole punch races in GFS2
       [not found] ` <CAHc6FU7kFWtQDAAU16h3xkWkSV2dm9WWb-RbjXBusJ9+118GRw@mail.gmail.com>
@ 2021-04-22 13:14   ` Jan Kara
  2021-04-22 15:57     ` Bob Peterson
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2021-04-22 13:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu 22-04-21 14:51:42, Andreas Gruenbacher wrote:
> Hi Jan,
> 
> On Thu, Apr 22, 2021 at 1:26 PM Jan Kara <jack@suse.cz> wrote:
> > I am looking into how GFS2 protects against races between hole punching and
> > things like page fault or readahead and AFAICT it seems it does not. In
> > particular is there anything that protects against a race like:
> >
> > CPU1                                    CPU2
> > gfs2_fallocate()
> >   __gfs2_punch_hole()
> >     truncate_pagecache_range()
> >                                         gfs2_fault()
> >                                           - faults in old data into page
> >                                             cache
> >     punch_hole()
> >
> > And now we have stale data in the page cache (data corruption). If
> > gfs2_page_mkwrite() sneaked in that window as well, we might be even racing
> > with writeback and are possibly corrupting the filesystem on disk. Is there
> > anything I'm missing?
> 
> This is controlled by the inode glock (ip->i_gl). Readers are holding
> a shared lock while writers are holding an exclusive lock, similar to
> an rwlock. Those locks take effect cluster wide (via DLM locks) and
> down to individual tasks locally. The only exception is resource
> group glocks, which use the LM_FLAG_NODE_SCOPE flag to indicate that
> exclusive locks should be shared locally. In that case, rgd->rd_mutex
> provides the exclusion among local tasks.
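
The described semantics can be modeled roughly as follows (illustrative
Python, not GFS2 code; the names and structure are made up for the
sketch):

```python
# Toy glock: shared ("SH") holders coexist; an exclusive ("EX") holder
# excludes everyone else -- unless the lock is node-scoped, in which
# case EX is shared among tasks on the same node and a separate local
# mutex (cf. rgd->rd_mutex) must serialize those tasks.

class Glock:
    def __init__(self, node_scope=False):
        self.node_scope = node_scope
        self.holders = []               # list of (mode, node) pairs

    def compatible(self, mode, node):
        for held_mode, held_node in self.holders:
            if mode == "SH" and held_mode == "SH":
                continue                # readers share cluster-wide
            if self.node_scope and node == held_node:
                continue                # EX shared among local tasks
            return False
        return True

    def acquire(self, mode, node):
        if not self.compatible(mode, node):
            return False
        self.holders.append((mode, node))
        return True

g = Glock()
assert g.acquire("SH", "node1")
assert g.acquire("SH", "node2")         # shared locks coexist
assert not g.acquire("EX", "node1")     # EX blocked by readers

rg = Glock(node_scope=True)             # e.g. a resource group glock
assert rg.acquire("EX", "node1")
assert rg.acquire("EX", "node1")        # same node: shared locally
assert not rg.acquire("EX", "node2")    # other nodes still excluded
```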

OK, thanks for the explanation! I missed that GFS2 glocks are task local. But
then I have another question. We have the following:

gfs2_file_read_iter()
  grabs inode glock in shared mode
  generic_file_read_iter()
    filemap_read()
      copy_page_to_iter()
        <page fault>
        grabs mmap_sem
          gfs2_fault()
            - tries to grab inode glock again -> possible recursive locking

And even if the locking isn't recursive, you have glock->mmap_sem and
mmap_sem->glock lock orderings so ABBA deadlocks are possible AFAICT.

And there's a similar problem with the write path as well, except that the
lock is grabbed exclusively.
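
The inversion can be checked mechanically, lockdep-style (illustrative
sketch, not real lockdep):

```python
# Record the lock acquisition order of each code path; if both
# (A before B) and (B before A) are ever observed, an ABBA deadlock
# between two tasks is possible.

observed = set()                # set of (first, second) ordering pairs

def record_path(locks):
    for i, a in enumerate(locks):
        for b in locks[i + 1:]:
            observed.add((a, b))

def has_inversion():
    return any((b, a) in observed for (a, b) in observed)

# read path: inode glock taken, then the page fault takes mmap_sem
record_path(["glock", "mmap_sem"])
assert not has_inversion()

# mmap path: mmap_sem taken, then gfs2_fault() takes the glock
record_path(["mmap_sem", "glock"])
assert has_inversion()          # ABBA ordering detected
```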

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR




* [Cluster-devel] Hole punch races in GFS2
  2021-04-22 13:14   ` Jan Kara
@ 2021-04-22 15:57     ` Bob Peterson
  2021-04-22 16:11       ` Jan Kara
  2021-04-29 21:53       ` Andreas Gruenbacher
  0 siblings, 2 replies; 6+ messages in thread
From: Bob Peterson @ 2021-04-22 15:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> OK, thanks for explanation! I missed that GFS2 glocks are task local. But
> then I have another question. We have the following:
> 
> gfs2_file_read_iter()
>   grabs inode glock in shared mode
>   generic_file_read_iter()
>     filemap_read()
>       copy_page_to_iter()
>         <page fault>
>         grabs mmap_sem
>           gfs2_fault()
>             - tries to grab inode glock again -> possible recursive locking
> 
> And even if the locking isn't recursive, you have glock->mmap_sem and
> mmap_sem->glock lock orderings so ABBA deadlocks are possible AFAICT.
> 
> And there's a similar problem with the write path as well, just the lock is
> grabbed exclusively.
> 
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Hi Jan,

IIRC, gfs2 tries to prevent the page fault by having gfs2_file_read_iter
call into generic_file_read_iter twice:

The first call is made without the glock held but with IOCB_NOIO
specified, which brings the pages in and returns -EAGAIN. It then
acquires the glock and calls into generic_file_read_iter a second time.

I'm not sure if that prevents the deadlock or just makes it less likely.
I still need to investigate that.
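
Sketched as illustrative Python (not the actual gfs2 code), the pattern
is roughly:

```python
# First pass: no glock, no-I/O mode -- succeeds only if the pages are
# already in the page cache. On -EAGAIN, take the glock and retry with
# I/O allowed.

def read_iter(pages_cached, noio):
    if noio and not pages_cached:
        return "-EAGAIN"            # IOCB_NOIO: refuse to start I/O
    return "data"

def file_read_iter(pages_cached):
    ret = read_iter(pages_cached, noio=True)    # lockless fast path
    if ret != "-EAGAIN":
        return ret, "no glock taken"
    # slow path: acquire the inode glock, then read with I/O allowed
    return read_iter(pages_cached, noio=False), "glock taken"

assert file_read_iter(True) == ("data", "no glock taken")
assert file_read_iter(False) == ("data", "glock taken")
```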

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] Hole punch races in GFS2
  2021-04-22 15:57     ` Bob Peterson
@ 2021-04-22 16:11       ` Jan Kara
  2021-04-29 21:53       ` Andreas Gruenbacher
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2021-04-22 16:11 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu 22-04-21 11:57:14, Bob Peterson wrote:
> ----- Original Message -----
> > OK, thanks for explanation! I missed that GFS2 glocks are task local. But
> > then I have another question. We have the following:
> > 
> > gfs2_file_read_iter()
> >   grabs inode glock in shared mode
> >   generic_file_read_iter()
> >     filemap_read()
> >       copy_page_to_iter()
> >         <page fault>
> >         grabs mmap_sem
> >           gfs2_fault()
> >             - tries to grab inode glock again -> possible recursive locking
> > 
> > And even if the locking isn't recursive, you have glock->mmap_sem and
> > mmap_sem->glock lock orderings so ABBA deadlocks are possible AFAICT.
> > 
> > And there's a similar problem with the write path as well, just the lock is
> > grabbed exclusively.
> > 
> > 								Honza
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> Hi Jan,
> 
> IIRC, gfs2 tries to prevent the page fault by having gfs2_file_read_iter
> call into generic_file_read_iter twice:
> 
> The first time without the glock held, but specifying IOCB_NOIO, which
> brings the pages in and returns -EAGAIN. Then it acquires the glock,
> then calls into generic_file_read_iter a second time.
> 
> I'm not sure if that prevents the deadlock or just makes it less likely.
> I still need to investigate that.

Aha, I missed that trick. Thanks for pointing that out. But indeed this
just makes the race harder to hit. If the first page is not already in the
page cache and needs to be fetched from disk, nothing is faulted in and
you have to do everything in the call with the glock held. And even if you
prefaulted the whole buffer, there's no guarantee it doesn't get reclaimed
again before you reach copy_page_to_iter().

Also the write path doesn't seem to play any tricks like this and is prone
to the same kind of problem:

iomap_file_buffered_write()
  iomap_apply()
    gfs2_iomap_begin() - takes glock
    iomap_write_actor()
      iov_iter_fault_in_readable()
        <fault>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR




* [Cluster-devel] Hole punch races in GFS2
  2021-04-22 15:57     ` Bob Peterson
  2021-04-22 16:11       ` Jan Kara
@ 2021-04-29 21:53       ` Andreas Gruenbacher
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Gruenbacher @ 2021-04-29 21:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Apr 22, 2021 at 5:57 PM Bob Peterson <rpeterso@redhat.com> wrote:
> ----- Original Message -----
> > OK, thanks for explanation! I missed that GFS2 glocks are task local. But
> > then I have another question. We have the following:
> >
> > gfs2_file_read_iter()
> >   grabs inode glock in shared mode
> >   generic_file_read_iter()
> >     filemap_read()
> >       copy_page_to_iter()
> >         <page fault>
> >         grabs mmap_sem
> >           gfs2_fault()
> >             - tries to grab inode glock again -> possible recursive locking
> >
> > And even if the locking isn't recursive, you have glock->mmap_sem and
> > mmap_sem->glock lock orderings so ABBA deadlocks are possible AFAICT.
> >
> > And there's a similar problem with the write path as well, just the lock is
> > grabbed exclusively.
> >
> >                                                               Honza
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
>
> Hi Jan,
>
> IIRC, gfs2 tries to prevent the page fault by having gfs2_file_read_iter
> call into generic_file_read_iter twice:
>
> The first time without the glock held, but specifying IOCB_NOIO, which
> brings the pages in and returns -EAGAIN. Then it acquires the glock,
> then calls into generic_file_read_iter a second time.

The IOCB_NOIO generic_file_read_iter call is there for lockless reads,
not for faulting pages in at all. See commit 20f829999c38
("gfs2: Rework read and page fault locking").

Andreas




end of thread, other threads:[~2021-04-29 21:53 UTC | newest]

Thread overview: 6+ messages
2021-04-22 11:26 Hole punch races in GFS2 Jan Kara
2021-04-22 11:26 ` [Cluster-devel] " Jan Kara
     [not found] ` <CAHc6FU7kFWtQDAAU16h3xkWkSV2dm9WWb-RbjXBusJ9+118GRw@mail.gmail.com>
2021-04-22 13:14   ` Jan Kara
2021-04-22 15:57     ` Bob Peterson
2021-04-22 16:11       ` Jan Kara
2021-04-29 21:53       ` Andreas Gruenbacher
