linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Filling holes in ext2
@ 2001-08-22 17:21 Adrian Cox
  2001-08-22 18:34 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Adrian Cox @ 2001-08-22 17:21 UTC (permalink / raw)
  To: linux-kernel

I've been looking at generic_file_write() a lot recently, and I'm a 
little bothered by this section, as mangled here by Mozilla:

status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
if (status)
	goto unlock;
status = __copy_from_user(kaddr+offset, buf, bytes);
flush_dcache_page(page);
if (status)
	goto fail_write;
status = mapping->a_ops->commit_write(file, page, offset, offset+bytes);

If the __copy_from_user() does fail when writing to a hole or extending 
a file on ext2, disk blocks get added to the file, but are never 
cleared. The result is that data from a free block appears in the file.

I've not managed to trigger this in a real system, but I have explored 
the failure path by running UML under gdb. I filled the filesystem with 
data as root (yes > /mnt/test), deleted the files, then triggered this 
path on an application running as a normal user. The result was that 
root's old data appeared in the user file.

So:
Can this really happen on the mainstream kernel? (The kernel I tested on 
  was 2.4.7 with the corresponding UML patch.)

Can this actually be exploited?  I assume the test on __copy_from_user() 
is there in case another thread changes memory mappings while 
generic_file_write() is running. My attempts to do this haven't yet 
succeeded.

If this can happen, does it matter? Should ext2 have an abort_write() 
operation like ext3() has?

-- 
Adrian Cox   http://www.humboldt.co.uk/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-22 17:21 Filling holes in ext2 Adrian Cox
@ 2001-08-22 18:34 ` Andrew Morton
  2001-08-22 18:59   ` Adrian Cox
  2001-08-23 17:22   ` Adrian Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2001-08-22 18:34 UTC (permalink / raw)
  To: Adrian Cox; +Cc: linux-kernel

Adrian Cox wrote:
> 
> I've been looking at generic_file_write() a lot recently, and I'm a
> little bothered by this section, as mangled here by Mozilla:
> 
> status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
> if (status)
>         goto unlock;
> status = __copy_from_user(kaddr+offset, buf, bytes);
> flush_dcache_page(page);
> if (status)
>         goto fail_write;
> status = mapping->a_ops->commit_write(file, page, offset, offset+bytes);
> 
> If the __copy_from_user() does fail when writing to a hole or extending
> a file on ext2, disk blocks get added to the file, but are never
> cleared. The result is that data from a free block appears in the file.

Yup.

> I've not managed to trigger this in a real system, but I have explored
> the failure path by running UML under gdb. I filled the filesystem with
> data as root (yes > /mnt/test), deleted the files, then triggered this
> path on an application running as a normal user. The result was that
> root's old data appeared in the user file.
> 
> So:
> Can this really happen on the mainstream kernel? (The kernel I tested on
>   was 2.4.7 with the corresponding UML patch.)

Yes, it can.

> Can this actually be exploited?  I assume the test on __copy_from_user()
> is there in case another thread changes memory mappings while
> generic_file_write() is running. My attempts to do this haven't yet
> succeeded.

I'd expect it to occur if you simply pass an unmapped address
to write()?
 
> If this can happen, does it matter?

It matters.  -ac kernels handle this by clearing out the blocks
on the error path in __block_prepare_write().  If you retest with
-ac kernels, you should just see zeroes.

> Should ext2 have an abort_write() operation like ext3() has?

abort_write() is an expediency - it was (much) easier and safer
to add a new operation rather than running all over the kernel
and redefining the commit_write() API.

ext3 definitely needs to know about prepare/commit imbalance in
lo_send() and generic_file_write(), and in block_symlink() if that
is ever changed to error out after the prepare_write().

But long-term, we need to change the commit_write() definition so
that it is called even in the error case, thus informing the
underlying fs of the partial write.

-

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-22 18:34 ` Andrew Morton
@ 2001-08-22 18:59   ` Adrian Cox
  2001-08-23 17:22   ` Adrian Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Adrian Cox @ 2001-08-22 18:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> Adrian Cox wrote:

>>Can this actually be exploited?  I assume the test on __copy_from_user()
>>is there in case another thread changes memory mappings while
>>generic_file_write() is running. My attempts to do this haven't yet
>>succeeded.
> I'd expect it to occur if you simply pass an unmapped address
> to write()?

No, because the first thing generic_file_write does is an access_ok() 
check. It can only happen if the permissions change during the function. 
That's why it's hard to exploit for real.

-- 
Adrian Cox   http://www.humboldt.co.uk/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-22 18:34 ` Andrew Morton
  2001-08-22 18:59   ` Adrian Cox
@ 2001-08-23 17:22   ` Adrian Cox
  2001-08-23 17:46     ` Andrew Morton
  2001-08-23 17:47     ` Alan Cox
  1 sibling, 2 replies; 10+ messages in thread
From: Adrian Cox @ 2001-08-23 17:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:

> It matters.  -ac kernels handle this by clearing out the blocks
> on the error path in __block_prepare_write().  If you retest with
> -ac kernels, you should just see zeroes.

That doesn't help. The problem occurs just the same on -ac kernels.

In this case __block_prepare_write() is successful. What happens is that 
if __copy_from_user() fails, the block remains mapped but not up to 
date. Thus the next read access to the file fetches the garbage data off 
disk, and presents it to the user.

The only definitive solutions I can see are:
1) implement an abort_write() method for every filesystem that uses 
block_prepare_write()
2) redefine commit_write() to be called on failure as well as success.
3) lock the pages corresponding to the user buffer so that 
__copy_from_user() cannot fail.

I like the latter option, as it removes this abort case for ext3 and 
could drastically simplify GFS.

-- 
Adrian Cox


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 17:22   ` Adrian Cox
@ 2001-08-23 17:46     ` Andrew Morton
  2001-08-23 19:17       ` Adrian Cox
  2001-08-23 17:47     ` Alan Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-08-23 17:46 UTC (permalink / raw)
  To: Adrian Cox; +Cc: linux-kernel

Adrian Cox wrote:
> 
> Andrew Morton wrote:
> 
> > It matters.  -ac kernels handle this by clearing out the blocks
> > on the error path in __block_prepare_write().  If you retest with
> > -ac kernels, you should just see zeroes.
> 
> That doesn't help. The problem occurs just the same on -ac kernels.

OK, that code is for IO errors and disk-full.

> In this case __block_prepare_write() is successful. What happens is that
> if __copy_from_user() fails, the block remains mapped but not up to
> date. Thus the next read access to the file fetches the garbage data off
> disk, and presents it to the user.

generic_file_write() will mark the page not up-to-date in this case.
I wonder what's actually going on?  Perhaps the fact that we've
instantiated a block in ext2 outside i_size?

If you change the error path in -ac's generic_file_write() thusly:

-	goto fail_write;
+	status = -EFAULT;
+	goto sync_failure;

does it fix it?

Can you send the code you're using to demonstrate this?

-

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 17:22   ` Adrian Cox
  2001-08-23 17:46     ` Andrew Morton
@ 2001-08-23 17:47     ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2001-08-23 17:47 UTC (permalink / raw)
  To: Adrian Cox; +Cc: Andrew Morton, linux-kernel

> 3) lock the pages corresponding to the user buffer so that 
> __copy_from_user() cannot fail.
> 
> I like the latter option, as it removes this abort case for ext3 and 
> could drastically simplify GFS.

Neat elegant and fundamentally close to impossible to implement without
deadlocks.

There only seems to be a problem if prepare_write fills in the metadata. In
such cases the fs I think does need to implement abort operations

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 17:46     ` Andrew Morton
@ 2001-08-23 19:17       ` Adrian Cox
  2001-08-23 19:23         ` Alan Cox
  2001-08-23 19:41         ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Adrian Cox @ 2001-08-23 19:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:

> generic_file_write() will mark the page not up-to-date in this case.
> I wonder what's actually going on?  Perhaps the fact that we've
> instantiated a block in ext2 outside i_size?

The problem is that the on-disk metadata now says that that disk block 
is part of the file. So as the page is not up-to-date, the next read 
operation will go to the disk and fetch that block of garbage into the 
page cache.

> If you change the error path in -ac's generic_file_write() thusly:
> 
> -	goto fail_write;
> +	status = -EFAULT;
> +	goto sync_failure;
> 
> does it fix it?

No difference, as the write is to a hole in the middle of the file, and 
doesn't change i_size.

> Can you send the code you're using to demonstrate this?

The heart of it is this:
fd = open(file, O_CREAT | O_TRUNC | O_RDWR, S_IRUSR | S_IWUSR);
lseek(fd, 16385, SEEK_SET);
write(fd, "x", 1);
lseek(fd, 12345, SEEK_SET);
write(fd, "y", 1);
	
During the final write, I intervene with gdb to force the 
__copy_from_user() to fail. It should be possible to do something 
similar with a multi-threaded application where another thread modifies 
the VM, but I haven't actually built a real example.

Then I just take a look at the file with od.

[usermode:~] od -x /mnt/adrian/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0030000 0a79 0a79 0a79 0a79 0a79 0a79 0a79 0a79
*
0040000 7800
0040002


-- 
Adrian Cox   http://www.humboldt.co.uk/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 19:17       ` Adrian Cox
@ 2001-08-23 19:23         ` Alan Cox
  2001-08-23 19:41         ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2001-08-23 19:23 UTC (permalink / raw)
  To: Adrian Cox; +Cc: Andrew Morton, linux-kernel

> The problem is that the on-disk metadata now says that that disk block 
> is part of the file. So as the page is not up-to-date, the next read 
> operation will go to the disk and fetch that block of garbage into the 
> page cache.

So in actual fact the bug is the file system committing metadata too early
from prepare not commit

Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 19:17       ` Adrian Cox
  2001-08-23 19:23         ` Alan Cox
@ 2001-08-23 19:41         ` Andrew Morton
  2001-08-23 21:21           ` Adrian Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-08-23 19:41 UTC (permalink / raw)
  To: Adrian Cox; +Cc: linux-kernel

Adrian Cox wrote:
> 
> Andrew Morton wrote:
> 
> > generic_file_write() will mark the page not up-to-date in this case.
> > I wonder what's actually going on?  Perhaps the fact that we've
> > instantiated a block in ext2 outside i_size?
> 
> The problem is that the on-disk metadata now says that that disk block
> is part of the file. So as the page is not up-to-date, the next read
> operation will go to the disk and fetch that block of garbage into the
> page cache.
> 

Ah.  Now I'm with you.  Yes, we need a better cleanup facility
to handle this.

We can sort-of fudge it with commit_write():

--- linux-2.4.8-ac9/mm/filemap.c	Wed Aug 22 10:57:47 2001
+++ ac/mm/filemap.c	Thu Aug 23 12:33:50 2001
@@ -2674,8 +2674,9 @@ generic_file_write(struct file *file,con
 		status = __copy_from_user(kaddr+offset, buf, bytes);
 		flush_dcache_page(page);
 		if (status) {
-			if (mapping->a_ops->abort_write)
-				mapping->a_ops->abort_write(file, page);
+			/* Zero the disk blocks so we don't expose stale data mid-file */
+			memset(kaddr + offset, 0, bytes);
+			mapping->a_ops->commit_write(file, page, offset, offset+bytes);
 			goto fail_write;
 		}
 		status = mapping->a_ops->commit_write(file, page, offset, offset+bytes);
@@ -2720,7 +2721,6 @@ out:
 fail_write:
 	status = -EFAULT;
 	ClearPageUptodate(page);
-	kunmap(page);
 	goto unlock;
 sync_failure:
 	UnlockPage(page);

Which is OK for mid-file blocks, but will cause i_size to be extended
at eof, which probably isn't too bad.  Needs more thought.

-

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Filling holes in ext2
  2001-08-23 19:41         ` Andrew Morton
@ 2001-08-23 21:21           ` Adrian Cox
  0 siblings, 0 replies; 10+ messages in thread
From: Adrian Cox @ 2001-08-23 21:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> Ah.  Now I'm with you.  Yes, we need a better cleanup facility
> to handle this.
> 
> We can sort-of fudge it with commit_write():
[snip]
> Which is OK for mid-file blocks, but will cause i_size to be extended
> at eof, which probably isn't too bad.  Needs more thought.

That certainly stops it happening. Does anybody think that extending 
i_size in this particular corner case is harmful?

As Alan said, the bug is the file system committing metadata too early, 
and I suspect that ext2 is not the only culprit.

-- 
Adrian Cox   http://www.humboldt.co.uk/


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2001-08-23 21:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-08-22 17:21 Filling holes in ext2 Adrian Cox
2001-08-22 18:34 ` Andrew Morton
2001-08-22 18:59   ` Adrian Cox
2001-08-23 17:22   ` Adrian Cox
2001-08-23 17:46     ` Andrew Morton
2001-08-23 19:17       ` Adrian Cox
2001-08-23 19:23         ` Alan Cox
2001-08-23 19:41         ` Andrew Morton
2001-08-23 21:21           ` Adrian Cox
2001-08-23 17:47     ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).