All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
@ 2021-04-21 16:29 Jan Kara
  2021-04-22  3:22 ` Joseph Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2021-04-21 16:29 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel

Hello,

I'm unifying protection various filesystems use to protect hole punch
operations from racing with other operations (like readahead, page fault,
writepage etc.). I was looking into OCFS2 and I think it is prone to a
following race which can possibly lead to filesystem corruption. But maybe
I miss something so that's why I'm writing here. The scenario I'm concerned
about is:

CPU1					CPU2
ocfs2_remove_inode_range()		ocfs2_writepage()
  ...					  block_write_full_page()
  ocfs2_remove_btree_range()		    ocfs2_extent_map_get_blocks()

Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
AFAICT and so both these operations can be modifying extent map at the same
time? What am I missing?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-21 16:29 [Ocfs2-devel] Possible fs corruption when hole punch races with other ops Jan Kara
@ 2021-04-22  3:22 ` Joseph Qi
  2021-04-22 10:44   ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2021-04-22  3:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: ocfs2-devel

Hi,

Checked the code flow, it seems the race you worried truly exists.
We have to take ip_alloc_sem before calling into ocfs2_get_block().

Thanks,
Joseph

On 4/22/21 12:29 AM, Jan Kara wrote:
> Hello,
> 
> I'm unifying protection various filesystems use to protect hole punch
> operations from racing with other operations (like readahead, page fault,
> writepage etc.). I was looking into OCFS2 and I think it is prone to a
> following race which can possibly lead to filesystem corruption. But maybe
> I miss something so that's why I'm writing here. The scenario I'm concerned
> about is:
> 
> CPU1					CPU2
> ocfs2_remove_inode_range()		ocfs2_writepage()
>   ...					  block_write_full_page()
>   ocfs2_remove_btree_range()		    ocfs2_extent_map_get_blocks()
> 
> Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
> AFAICT and so both these operations can be modifying extent map at the same
> time? What am I missing?
> 
> 								Honza
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-22  3:22 ` Joseph Qi
@ 2021-04-22 10:44   ` Jan Kara
  2021-04-22 15:56     ` Wengang Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2021-04-22 10:44 UTC (permalink / raw)
  To: Joseph Qi; +Cc: Jan Kara, ocfs2-devel

Hello!

On Thu 22-04-21 11:22:07, Joseph Qi wrote:
> Checked the code flow, it seems the race you worried truly exists.
> We have to take ip_alloc_sem before calling into ocfs2_get_block().

OK, that's what I assumed but this won't be easy. Because ocfs2_writepage()
is called with a page locked but ocfs2_remove_inode_range() gets called
under ip_alloc_sem and locks pages in ocfs2_truncate_cluster_pages() ->
truncate_inode_pages_range(). Which creates ABBA deadlock. So you'll
probably need to come up with a similar dance like in ocfs2_readpage().

But after fixing this, ip_alloc_sem should provide you with enough
protection so that ocfs2 isn't prone to races between hole punch and
readahead / fault my work will be mostly irrelevant for OCFS2.

Thanks for confirmation!

								Honza

> On 4/22/21 12:29 AM, Jan Kara wrote:
> > Hello,
> > 
> > I'm unifying protection various filesystems use to protect hole punch
> > operations from racing with other operations (like readahead, page fault,
> > writepage etc.). I was looking into OCFS2 and I think it is prone to a
> > following race which can possibly lead to filesystem corruption. But maybe
> > I miss something so that's why I'm writing here. The scenario I'm concerned
> > about is:
> > 
> > CPU1					CPU2
> > ocfs2_remove_inode_range()		ocfs2_writepage()
> >   ...					  block_write_full_page()
> >   ocfs2_remove_btree_range()		    ocfs2_extent_map_get_blocks()
> > 
> > Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
> > AFAICT and so both these operations can be modifying extent map at the same
> > time? What am I missing?
> > 
> > 								Honza
> > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-22 10:44   ` Jan Kara
@ 2021-04-22 15:56     ` Wengang Wang
  2021-04-22 16:06       ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Wengang Wang @ 2021-04-22 15:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: ocfs2-devel ML


[-- Attachment #1.1: Type: text/plain, Size: 2290 bytes --]

Hi Honza,

By “hole punching operation”, you meant deallocate some data blocks from a file and keep the file size unchanged, after that operation some “holes" are made in that file, right?
I am just curious that do we really have end user programs that do hole punching on files? Or hole punching is only used for development purpose?

thanks,
wengang

On Apr 22, 2021, at 3:44 AM, Jan Kara <jack@suse.cz<mailto:jack@suse.cz>> wrote:

Hello!

On Thu 22-04-21 11:22:07, Joseph Qi wrote:
Checked the code flow, it seems the race you worried truly exists.
We have to take ip_alloc_sem before calling into ocfs2_get_block().

OK, that's what I assumed but this won't be easy. Because ocfs2_writepage()
is called with a page locked but ocfs2_remove_inode_range() gets called
under ip_alloc_sem and locks pages in ocfs2_truncate_cluster_pages() ->
truncate_inode_pages_range(). Which creates ABBA deadlock. So you'll
probably need to come up with a similar dance like in ocfs2_readpage().

But after fixing this, ip_alloc_sem should provide you with enough
protection so that ocfs2 isn't prone to races between hole punch and
readahead / fault my work will be mostly irrelevant for OCFS2.

Thanks for confirmation!

Honza

On 4/22/21 12:29 AM, Jan Kara wrote:
Hello,

I'm unifying protection various filesystems use to protect hole punch
operations from racing with other operations (like readahead, page fault,
writepage etc.). I was looking into OCFS2 and I think it is prone to a
following race which can possibly lead to filesystem corruption. But maybe
I miss something so that's why I'm writing here. The scenario I'm concerned
about is:

CPU1 CPU2
ocfs2_remove_inode_range() ocfs2_writepage()
 ...   block_write_full_page()
 ocfs2_remove_btree_range()     ocfs2_extent_map_get_blocks()

Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
AFAICT and so both these operations can be modifying extent map at the same
time? What am I missing?

Honza

--
Jan Kara <jack@suse.com<mailto:jack@suse.com>>
SUSE Labs, CR

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com<mailto:Ocfs2-devel@oss.oracle.com>
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[-- Attachment #1.2: Type: text/html, Size: 24879 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-22 15:56     ` Wengang Wang
@ 2021-04-22 16:06       ` Darrick J. Wong
  2021-04-22 16:36         ` Wengang Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2021-04-22 16:06 UTC (permalink / raw)
  To: Wengang Wang; +Cc: Jan Kara, ocfs2-devel ML

On Thu, Apr 22, 2021 at 03:56:15PM +0000, Wengang Wang wrote:
> Hi Honza,
> 
> By “hole punching operation”, you meant deallocate some data blocks
> from a file and keep the file size unchanged, after that operation
> some “holes" are made in that file, right?
> I am just curious that do we really have end user programs that do
> hole punching on files? Or hole punching is only used for development
> purpose?

Yes, we do.  QEMU does it to image files, various $database products,
etc...

--D

> 
> thanks,
> wengang
> 
> On Apr 22, 2021, at 3:44 AM, Jan Kara <jack@suse.cz<mailto:jack@suse.cz>> wrote:
> 
> Hello!
> 
> On Thu 22-04-21 11:22:07, Joseph Qi wrote:
> Checked the code flow, it seems the race you worried truly exists.
> We have to take ip_alloc_sem before calling into ocfs2_get_block().
> 
> OK, that's what I assumed but this won't be easy. Because ocfs2_writepage()
> is called with a page locked but ocfs2_remove_inode_range() gets called
> under ip_alloc_sem and locks pages in ocfs2_truncate_cluster_pages() ->
> truncate_inode_pages_range(). Which creates ABBA deadlock. So you'll
> probably need to come up with a similar dance like in ocfs2_readpage().
> 
> But after fixing this, ip_alloc_sem should provide you with enough
> protection so that ocfs2 isn't prone to races between hole punch and
> readahead / fault my work will be mostly irrelevant for OCFS2.
> 
> Thanks for confirmation!
> 
> Honza
> 
> On 4/22/21 12:29 AM, Jan Kara wrote:
> Hello,
> 
> I'm unifying protection various filesystems use to protect hole punch
> operations from racing with other operations (like readahead, page fault,
> writepage etc.). I was looking into OCFS2 and I think it is prone to a
> following race which can possibly lead to filesystem corruption. But maybe
> I miss something so that's why I'm writing here. The scenario I'm concerned
> about is:
> 
> CPU1 CPU2
> ocfs2_remove_inode_range() ocfs2_writepage()
>  ...   block_write_full_page()
>  ocfs2_remove_btree_range()     ocfs2_extent_map_get_blocks()
> 
> Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
> AFAICT and so both these operations can be modifying extent map at the same
> time? What am I missing?
> 
> Honza
> 
> --
> Jan Kara <jack@suse.com<mailto:jack@suse.com>>
> SUSE Labs, CR
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com<mailto:Ocfs2-devel@oss.oracle.com>
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-22 16:06       ` Darrick J. Wong
@ 2021-04-22 16:36         ` Wengang Wang
  2021-04-22 16:46           ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Wengang Wang @ 2021-04-22 16:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Kara, ocfs2-devel ML



> On Apr 22, 2021, at 9:06 AM, Darrick J. Wong <djwong@kernel.org> wrote:
> 
> On Thu, Apr 22, 2021 at 03:56:15PM +0000, Wengang Wang wrote:
>> Hi Honza,
>> 
>> By “hole punching operation”, you meant deallocate some data blocks
>> from a file and keep the file size unchanged, after that operation
>> some “holes" are made in that file, right?
>> I am just curious that do we really have end user programs that do
>> hole punching on files? Or hole punching is only used for development
>> purpose?
> 
> Yes, we do.  QEMU does it to image files, various $database products,

Hm, to run a hole punching, that means the data blocks to deallocate were useful but now are no longer useful. 
For QEMU, it does hole punching at guest run time for guest FS trims? Don’t know in what cases DBs make use of hole punching.
Well, good to know.

Thanks Darrick.

wengang

> etc...
> 
> --D
> 
>> 
>> thanks,
>> wengang
>> 
>> On Apr 22, 2021, at 3:44 AM, Jan Kara <jack@suse.cz<mailto:jack@suse.cz>> wrote:
>> 
>> Hello!
>> 
>> On Thu 22-04-21 11:22:07, Joseph Qi wrote:
>> Checked the code flow, it seems the race you worried truly exists.
>> We have to take ip_alloc_sem before calling into ocfs2_get_block().
>> 
>> OK, that's what I assumed but this won't be easy. Because ocfs2_writepage()
>> is called with a page locked but ocfs2_remove_inode_range() gets called
>> under ip_alloc_sem and locks pages in ocfs2_truncate_cluster_pages() ->
>> truncate_inode_pages_range(). Which creates ABBA deadlock. So you'll
>> probably need to come up with a similar dance like in ocfs2_readpage().
>> 
>> But after fixing this, ip_alloc_sem should provide you with enough
>> protection so that ocfs2 isn't prone to races between hole punch and
>> readahead / fault my work will be mostly irrelevant for OCFS2.
>> 
>> Thanks for confirmation!
>> 
>> Honza
>> 
>> On 4/22/21 12:29 AM, Jan Kara wrote:
>> Hello,
>> 
>> I'm unifying protection various filesystems use to protect hole punch
>> operations from racing with other operations (like readahead, page fault,
>> writepage etc.). I was looking into OCFS2 and I think it is prone to a
>> following race which can possibly lead to filesystem corruption. But maybe
>> I miss something so that's why I'm writing here. The scenario I'm concerned
>> about is:
>> 
>> CPU1 CPU2
>> ocfs2_remove_inode_range() ocfs2_writepage()
>> ...   block_write_full_page()
>> ocfs2_remove_btree_range()     ocfs2_extent_map_get_blocks()
>> 
>> Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
>> AFAICT and so both these operations can be modifying extent map at the same
>> time? What am I missing?
>> 
>> Honza
>> 
>> --
>> Jan Kara <jack@suse.com<mailto:jack@suse.com>>
>> SUSE Labs, CR
>> 
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com<mailto:Ocfs2-devel@oss.oracle.com>
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>> 
> 
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ocfs2-devel] Possible fs corruption when hole punch races with other ops
  2021-04-22 16:36         ` Wengang Wang
@ 2021-04-22 16:46           ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-04-22 16:46 UTC (permalink / raw)
  To: Wengang Wang; +Cc: Jan Kara, ocfs2-devel ML

On Thu, Apr 22, 2021 at 04:36:23PM +0000, Wengang Wang wrote:
> 
> 
> > On Apr 22, 2021, at 9:06 AM, Darrick J. Wong <djwong@kernel.org> wrote:
> > 
> > On Thu, Apr 22, 2021 at 03:56:15PM +0000, Wengang Wang wrote:
> >> Hi Honza,
> >> 
> >> By “hole punching operation”, you meant deallocate some data blocks
> >> from a file and keep the file size unchanged, after that operation
> >> some “holes" are made in that file, right?
> >> I am just curious that do we really have end user programs that do
> >> hole punching on files? Or hole punching is only used for development
> >> purpose?
> > 
> > Yes, we do.  QEMU does it to image files, various $database products,
> 
> Hm, to run a hole punching, that means the data blocks to deallocate
> were useful but now are no longer useful. 
> For QEMU, it does hole punching at guest run time for guest FS trims?

Yes.  SCSI discard commands are implemented as hole punching if the
backing device is a raw file on a filesystem.

--D

> Don’t know in what cases DBs make use of hole punching.
> Well, good to know.
> 
> Thanks Darrick.
> 
> wengang
> 
> > etc...
> > 
> > --D
> > 
> >> 
> >> thanks,
> >> wengang
> >> 
> >> On Apr 22, 2021, at 3:44 AM, Jan Kara <jack@suse.cz<mailto:jack@suse.cz>> wrote:
> >> 
> >> Hello!
> >> 
> >> On Thu 22-04-21 11:22:07, Joseph Qi wrote:
> >> Checked the code flow, it seems the race you worried truly exists.
> >> We have to take ip_alloc_sem before calling into ocfs2_get_block().
> >> 
> >> OK, that's what I assumed but this won't be easy. Because ocfs2_writepage()
> >> is called with a page locked but ocfs2_remove_inode_range() gets called
> >> under ip_alloc_sem and locks pages in ocfs2_truncate_cluster_pages() ->
> >> truncate_inode_pages_range(). Which creates ABBA deadlock. So you'll
> >> probably need to come up with a similar dance like in ocfs2_readpage().
> >> 
> >> But after fixing this, ip_alloc_sem should provide you with enough
> >> protection so that ocfs2 isn't prone to races between hole punch and
> >> readahead / fault my work will be mostly irrelevant for OCFS2.
> >> 
> >> Thanks for confirmation!
> >> 
> >> Honza
> >> 
> >> On 4/22/21 12:29 AM, Jan Kara wrote:
> >> Hello,
> >> 
> >> I'm unifying protection various filesystems use to protect hole punch
> >> operations from racing with other operations (like readahead, page fault,
> >> writepage etc.). I was looking into OCFS2 and I think it is prone to a
> >> following race which can possibly lead to filesystem corruption. But maybe
> >> I miss something so that's why I'm writing here. The scenario I'm concerned
> >> about is:
> >> 
> >> CPU1 CPU2
> >> ocfs2_remove_inode_range() ocfs2_writepage()
> >> ...   block_write_full_page()
> >> ocfs2_remove_btree_range()     ocfs2_extent_map_get_blocks()
> >> 
> >> Now ocfs2_extent_map_get_blocks() runs without protection of ip_alloc_sem
> >> AFAICT and so both these operations can be modifying extent map at the same
> >> time? What am I missing?
> >> 
> >> Honza
> >> 
> >> --
> >> Jan Kara <jack@suse.com<mailto:jack@suse.com>>
> >> SUSE Labs, CR
> >> 
> >> _______________________________________________
> >> Ocfs2-devel mailing list
> >> Ocfs2-devel@oss.oracle.com<mailto:Ocfs2-devel@oss.oracle.com>
> >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> >> 
> > 
> >> _______________________________________________
> >> Ocfs2-devel mailing list
> >> Ocfs2-devel@oss.oracle.com
> >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> > 
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-22 16:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21 16:29 [Ocfs2-devel] Possible fs corruption when hole punch races with other ops Jan Kara
2021-04-22  3:22 ` Joseph Qi
2021-04-22 10:44   ` Jan Kara
2021-04-22 15:56     ` Wengang Wang
2021-04-22 16:06       ` Darrick J. Wong
2021-04-22 16:36         ` Wengang Wang
2021-04-22 16:46           ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.