All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix bmap-vs-truncate race
@ 2009-03-30 19:20 Mikulas Patocka
  2009-03-31 14:48 ` Aneesh Kumar K.V
  2009-03-31 17:54 ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Mikulas Patocka @ 2009-03-30 19:20 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel

Hi

I'm submitting this patch for 2.6.30 merge window.


FAT filesystem used down_read(&mapping->host->i_alloc_sem) to prevent 
a race between bmap and truncate. However, such race is present in all the
other filesystems --- it is generally assumed that blocks queried with
get_block won't disappear while get_block is in progress.

The race can be only triggered by root, non-privileged users can't use
bmap, so it is not a security issue (unless there is some program run
by root that bmaps users' files).

This patch fixes the race in a generic way, in all the filesystems. If some
filesystem employs its own locking and doesn't want to take i_alloc_sem
(I don't know about any, where taking i_alloc_sem could be problem),
let it use its own function and not generic_block_bmap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 fs/buffer.c    |    8 ++++++++
 fs/fat/inode.c |    2 --
 2 files changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6.29-devel/fs/buffer.c
===================================================================
--- linux-2.6.29-devel.orig/fs/buffer.c	2009-03-29 04:48:38.000000000 +0200
+++ linux-2.6.29-devel/fs/buffer.c	2009-03-29 05:26:53.000000000 +0200
@@ -2963,7 +2963,15 @@ sector_t generic_block_bmap(struct addre
 	tmp.b_state = 0;
 	tmp.b_blocknr = 0;
 	tmp.b_size = 1 << inode->i_blkbits;
+
+	/*
+	 * Protect the inode from being truncated while get_block is
+	 * in progress.
+	 */
+	down_read(&mapping->host->i_alloc_sem);
 	get_block(inode, block, &tmp, 0);
+	up_read(&mapping->host->i_alloc_sem);
+
 	return tmp.b_blocknr;
 }
 
Index: linux-2.6.29-devel/fs/fat/inode.c
===================================================================
--- linux-2.6.29-devel.orig/fs/fat/inode.c	2009-03-29 04:46:35.000000000 +0200
+++ linux-2.6.29-devel/fs/fat/inode.c	2009-03-29 05:26:53.000000000 +0200
@@ -202,9 +202,7 @@ static sector_t _fat_bmap(struct address
 	sector_t blocknr;
 
 	/* fat_get_cluster() assumes the requested blocknr isn't truncated. */
-	down_read(&mapping->host->i_alloc_sem);
 	blocknr = generic_block_bmap(mapping, block, fat_get_block);
-	up_read(&mapping->host->i_alloc_sem);
 
 	return blocknr;
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix bmap-vs-truncate race
  2009-03-30 19:20 [PATCH] fix bmap-vs-truncate race Mikulas Patocka
@ 2009-03-31 14:48 ` Aneesh Kumar K.V
  2009-03-31 17:54 ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2009-03-31 14:48 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: viro, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 03:20:24PM -0400, Mikulas Patocka wrote:
> Hi
> 
> I'm submitting this patch for 2.6.30 merge window.
> 
> 
> FAT filesystem used down_read(&mapping->host->i_alloc_sem) to prevent 
> a race between bmap and truncate. However, such race is present in all the
> other filesystems --- it is generally assumed that blocks queried with
> get_block won't disappear while get_block is in progress.
> 
> The race can be only triggered by root, non-privileged users can't use
> bmap, so it is not a security issue (unless there is some program run
> by root that bmaps users' files).
> 
> This patch fixes the race in a generic way, in all the filesystems. If some
> filesystem employs its own locking and doesn't want to take i_alloc_sem
> (I don't know about any, where taking i_alloc_sem could be problem),
> let it use its own function and not generic_block_bmap.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  fs/buffer.c    |    8 ++++++++
>  fs/fat/inode.c |    2 --
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.29-devel/fs/buffer.c
> ===================================================================
> --- linux-2.6.29-devel.orig/fs/buffer.c	2009-03-29 04:48:38.000000000 +0200
> +++ linux-2.6.29-devel/fs/buffer.c	2009-03-29 05:26:53.000000000 +0200
> @@ -2963,7 +2963,15 @@ sector_t generic_block_bmap(struct addre
>  	tmp.b_state = 0;
>  	tmp.b_blocknr = 0;
>  	tmp.b_size = 1 << inode->i_blkbits;
> +
> +	/*
> +	 * Protect the inode from being truncated while get_block is
> +	 * in progress.
> +	 */
> +	down_read(&mapping->host->i_alloc_sem);
>  	get_block(inode, block, &tmp, 0);
> +	up_read(&mapping->host->i_alloc_sem);
> +
>  	return tmp.b_blocknr;
>  }
> 
> Index: linux-2.6.29-devel/fs/fat/inode.c
> ===================================================================
> --- linux-2.6.29-devel.orig/fs/fat/inode.c	2009-03-29 04:46:35.000000000 +0200
> +++ linux-2.6.29-devel/fs/fat/inode.c	2009-03-29 05:26:53.000000000 +0200
> @@ -202,9 +202,7 @@ static sector_t _fat_bmap(struct address
>  	sector_t blocknr;
> 
>  	/* fat_get_cluster() assumes the requested blocknr isn't truncated. */
> -	down_read(&mapping->host->i_alloc_sem);
>  	blocknr = generic_block_bmap(mapping, block, fat_get_block);
> -	up_read(&mapping->host->i_alloc_sem);
> 
>  	return blocknr;
>  }

ext4_inode_info.i_data_sem should protect against get_block and truncate
race . On ext3 ext3_inode_info.truncate_mutex should do the same. We
already take these locks in get_block

-aneesh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix bmap-vs-truncate race
  2009-03-30 19:20 [PATCH] fix bmap-vs-truncate race Mikulas Patocka
  2009-03-31 14:48 ` Aneesh Kumar K.V
@ 2009-03-31 17:54 ` Christoph Hellwig
  2009-03-31 22:42   ` Mikulas Patocka
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2009-03-31 17:54 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: viro, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 03:20:24PM -0400, Mikulas Patocka wrote:
> Hi
> 
> I'm submitting this patch for 2.6.30 merge window.

Please not.  i_alloc_sem is really a horrible hack needed for a couple
filesystems only and we should not leak it into more generic code but
rather move the few instances into the filesystem.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix bmap-vs-truncate race
  2009-03-31 17:54 ` Christoph Hellwig
@ 2009-03-31 22:42   ` Mikulas Patocka
  2009-04-01 11:36     ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Mikulas Patocka @ 2009-03-31 22:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: viro, Aneesh Kumar K.V, linux-fsdevel, linux-kernel

On Tue, 31 Mar 2009, Christoph Hellwig wrote:

> On Mon, Mar 30, 2009 at 03:20:24PM -0400, Mikulas Patocka wrote:
> > Hi
> > 
> > I'm submitting this patch for 2.6.30 merge window.
> 
> Please not.  i_alloc_sem is really a horrible hack needed for a couple
> filesystems only and we should not leak it into more generic code but
> rather move the few instances into the filesystem.

Could you please document locking rules for get_block(), truncate, bmap & 
direct i/o in Documentation/filesystems/Locking ?

There is a lot of text about directories, but nothing about locking of 
block mappings.

I was living under an impression that get_block() cannot be called on a 
block that is being truncated. That's what read/write/direct-io vs 
truncate seems to guarante --- truncate will first lower i_size 
(preventing any new pages past i_size from being created), then destroy 
any existing pages past i_size (that includes waiting for pagelock until 
all get_blocks on that page end) and finally truncate the metadata on the 
filesystem.

So there should be no situation when you truncate block and call get_block 
on it simultaneously. If get_block can race with truncate, document it.

There are filesystems that don't do any locking on get_block() (for 
example UFS, HPFS; FAT does it only for bmap and doesn't do it for general 
accesses) and other filesystems verify indirect block chains obsessively 
if they were truncated under get_block (why? because of bmap? or some 
other possibility?) --- so the rules should really be documented.

Mikulas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix bmap-vs-truncate race
  2009-03-31 22:42   ` Mikulas Patocka
@ 2009-04-01 11:36     ` Al Viro
  2009-04-02 23:22       ` Mikulas Patocka
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2009-04-01 11:36 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Christoph Hellwig, Aneesh Kumar K.V, linux-fsdevel, linux-kernel

On Tue, Mar 31, 2009 at 06:42:34PM -0400, Mikulas Patocka wrote:
> On Tue, 31 Mar 2009, Christoph Hellwig wrote:
> 
> > On Mon, Mar 30, 2009 at 03:20:24PM -0400, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > I'm submitting this patch for 2.6.30 merge window.
> > 
> > Please not.  i_alloc_sem is really a horrible hack needed for a couple
> > filesystems only and we should not leak it into more generic code but
> > rather move the few instances into the filesystem.
> 
> Could you please document locking rules for get_block(), truncate, bmap & 
> direct i/o in Documentation/filesystems/Locking ?
> 
> There is a lot of text about directories, but nothing about locking of 
> block mappings.
> 
> I was living under an impression that get_block() cannot be called on a 
> block that is being truncated. That's what read/write/direct-io vs 
> truncate seems to guarante --- truncate will first lower i_size 
> (preventing any new pages past i_size from being created), then destroy 
> any existing pages past i_size (that includes waiting for pagelock until 
> all get_blocks on that page end) and finally truncate the metadata on the 
> filesystem.
> 
> So there should be no situation when you truncate block and call get_block 
> on it simultaneously. If get_block can race with truncate, document it.
> 
> There are filesystems that don't do any locking on get_block() (for 
> example UFS, HPFS; FAT does it only for bmap and doesn't do it for general 
> accesses) and other filesystems verify indirect block chains obsessively 
> if they were truncated under get_block (why? because of bmap? or some 
> other possibility?) --- so the rules should really be documented.

Indirect chain stuff used to be [1] about truncate that *wouldn't* affect page
in question.  Look: we have e.g. 4Kb blocks and data at offset 80Kb.  We do
allocation at offset 40Kb *and* truncate to 60Kb at the same time.

Both 40Kb (block 10) and 80Kb (block 20) are covered by the first indirect
block.  It's there, so get_block() reads it and gets ready to allocate
a block and put its number in the very beginning of indirect block.  In
the meanwhile, truncate() sees that the boundary falls within the first
indirect block (at entry 15).  It sees that we have no blocks prior to
that, so the indirect block ought to be freed.

Now ext2_get_block() comes back with allocated data block and has nowhere
to stick it anymore - indirect one just got freed.

_That_ is where verify_chain() came from.  As far as anything outside of
ext2 can know, this truncate() won't come anywhere near the page we are
working with.  And it won't - for data, that is.

Disclaimer: this code has been changed several times since the last time
I worked with it, so this might not match the current situation anymore.

[1] see disclaimer above.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix bmap-vs-truncate race
  2009-04-01 11:36     ` Al Viro
@ 2009-04-02 23:22       ` Mikulas Patocka
  0 siblings, 0 replies; 6+ messages in thread
From: Mikulas Patocka @ 2009-04-02 23:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Aneesh Kumar K.V, linux-fsdevel, linux-kernel



On Wed, 1 Apr 2009, Al Viro wrote:

> On Tue, Mar 31, 2009 at 06:42:34PM -0400, Mikulas Patocka wrote:
> > 
> > There is a lot of text about directories, but nothing about locking of 
> > block mappings.
> > 
> > I was living under an impression that get_block() cannot be called on a 
> > block that is being truncated. That's what read/write/direct-io vs 
> > truncate seems to guarante --- truncate will first lower i_size 
> > (preventing any new pages past i_size from being created), then destroy 
> > any existing pages past i_size (that includes waiting for pagelock until 
> > all get_blocks on that page end) and finally truncate the metadata on the 
> > filesystem.
> > 
> > So there should be no situation when you truncate block and call get_block 
> > on it simultaneously. If get_block can race with truncate, document it.
> > 
> > There are filesystems that don't do any locking on get_block() (for 
> > example UFS, HPFS; FAT does it only for bmap and doesn't do it for general 
> > accesses) and other filesystems verify indirect block chains obsessively 
> > if they were truncated under get_block (why? because of bmap? or some 
> > other possibility?) --- so the rules should really be documented.
> 
> Indirect chain stuff used to be [1] about truncate that *wouldn't* affect page
> in question.  Look: we have e.g. 4Kb blocks and data at offset 80Kb.  We do
> allocation at offset 40Kb *and* truncate to 60Kb at the same time.
> 
> Both 40Kb (block 10) and 80Kb (block 20) are covered by the first indirect
> block.  It's there, so get_block() reads it and gets ready to allocate
> a block and put its number in the very beginning of indirect block.  In
> the meanwhile, truncate() sees that the boundary falls within the first
> indirect block (at entry 15).  It sees that we have no blocks prior to
> that, so the indirect block ought to be freed.
> 
> Now ext2_get_block() comes back with allocated data block and has nowhere
> to stick it anymore - indirect one just got freed.

I see. So if we change ext2_truncate to not delete indirect blocks that 
map only partially truncated space, we could drop that verify_chanin().

Upside: get rid of up to 3 spinlocks & associated cache bounce from every 
get_block call.

Downside: truncate with sparse files would occasionally produce empty 
indirect block. Is it legal to have indirect block full of zero pointers 
on ext2? Or would fsck complain about it?

> _That_ is where verify_chain() came from.  As far as anything outside of
> ext2 can know, this truncate() won't come anywhere near the page we are
> working with.  And it won't - for data, that is.

True. Except that bmap case. Bmap should be either documented or fixed 
with my proposed patch.

> Disclaimer: this code has been changed several times since the last time
> I worked with it, so this might not match the current situation anymore.
> 
> [1] see disclaimer above.

Mikulas

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-04-02 23:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-30 19:20 [PATCH] fix bmap-vs-truncate race Mikulas Patocka
2009-03-31 14:48 ` Aneesh Kumar K.V
2009-03-31 17:54 ` Christoph Hellwig
2009-03-31 22:42   ` Mikulas Patocka
2009-04-01 11:36     ` Al Viro
2009-04-02 23:22       ` Mikulas Patocka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.