linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] eCryptfs fixes for 3.2-rc3
@ 2011-11-23 22:25 Tyler Hicks
  2011-11-23 22:56 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Tyler Hicks @ 2011-11-23 22:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, ecryptfs

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

Hi Linus - Here are 3 eCryptfs fixes. The first patch solves a race condition
around the initialization of eCryptfs inode info. The second patch solves a
regression in mmap support. The last patch fixes a kernel memory disclosure
which was fixed using your suggested implementation.

The following changes since commit 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2011-11-20 14:59:33 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs.git for-linus

Tyler Hicks (3):
      eCryptfs: Prevent file create race condition
      eCryptfs: Flush file in vma close
      eCryptfs: Extend array bounds for all filename chars

 fs/ecryptfs/crypto.c          |   26 +++++++++++---------
 fs/ecryptfs/ecryptfs_kernel.h |    5 ++-
 fs/ecryptfs/file.c            |   23 +++++++++++++++++-
 fs/ecryptfs/inode.c           |   52 ++++++++++++++++++++++++----------------
 4 files changed, 70 insertions(+), 36 deletions(-)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] eCryptfs fixes for 3.2-rc3
  2011-11-23 22:25 [GIT PULL] eCryptfs fixes for 3.2-rc3 Tyler Hicks
@ 2011-11-23 22:56 ` Linus Torvalds
  2011-11-24  7:45   ` Tyler Hicks
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2011-11-23 22:56 UTC (permalink / raw)
  To: Tyler Hicks, Al Viro; +Cc: Andrew Morton, linux-kernel, ecryptfs

On Wed, Nov 23, 2011 at 2:25 PM, Tyler Hicks <tyhicks@canonical.com> wrote:
>
> Tyler Hicks (3):
>      eCryptfs: Flush file in vma close

I'm not hugely happy with this one.

The commit message says:

    Dirty pages weren't being written back when an mmap'ed eCryptfs file was
    closed before the mapping was unmapped. Since f_ops->flush() is not
    called by the munmap() path, the lower file was simply being released.
    This patch flushes the eCryptfs file in the vm_ops->close() path.

Fair enough - you've debugged the problem. You're misusing the ->flush
thing which only gets called at close time, rather than flushing
things at ->release time. But why?

"->flush()" is very special, and is literally meant for things that
need to wait at close time. A file descriptor may be flushed many
times for a single open (because it was dup'ed etc), and yes, if it is
closed before mmap, it will be flushed before the mmap is done. The
"flush()" is basically attached to a particular fd - useful mainly for
things like special devices that actually want to delay the close
(serial lines etc).

But the fundamental issue is that I don't think cryptfs should be
using "flush". The file is still *open* when "flush()" is called.
That's the fundamental reason for the bug, I think.

cryptfs should flush the encrypted information at *release* time, not
"flush" time. And that would have avoided the bug with mmap, because
release gets called on the very last internal reference count drop of
the 'struct file' - so it gets called after the last close *and*
munmap.

Is there some reason I am missing that cryptfs has to use flush?

I'm doing the pull, but I really think that this is papering over the
*real* bug, which was the use of 'flush' in the first place.

In general, I'd urge people to *not* use "->flush" at all as a
"correctness issue". It's useful to return EIO to "close()" and to be
*polite* (ie the return value of "flush()" will be returned to user
space at close time), but it really should be seen as a "we try to
flush now to try to give user space nice error reports where
possible", but it's important to understand that it's not the last
close, and if you rely on it for correctness, you're doing something
wrong. It's "release()" that is the "get rid of all your state now",
and is about correctness. "flush" is purely about being polite.

ecryptfs seems to have relied on it for correctness. Not good.

                       Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] eCryptfs fixes for 3.2-rc3
  2011-11-23 22:56 ` Linus Torvalds
@ 2011-11-24  7:45   ` Tyler Hicks
  2011-11-24 18:27     ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Tyler Hicks @ 2011-11-24  7:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Andrew Morton, linux-kernel, ecryptfs

[-- Attachment #1: Type: text/plain, Size: 3397 bytes --]

On 2011-11-23 14:56:23, Linus Torvalds wrote:
> On Wed, Nov 23, 2011 at 2:25 PM, Tyler Hicks <tyhicks@canonical.com> wrote:
> >
> > Tyler Hicks (3):
> >      eCryptfs: Flush file in vma close
> 
> I'm not hugely happy with this one.
> 
> The commit message says:
> 
>     Dirty pages weren't being written back when an mmap'ed eCryptfs file was
>     closed before the mapping was unmapped. Since f_ops->flush() is not
>     called by the munmap() path, the lower file was simply being released.
>     This patch flushes the eCryptfs file in the vm_ops->close() path.
> 
> Fair enough - you've debugged the problem. You're misusing the ->flush
> thing which only gets called at close time, rather than flushing
> things at ->release time. But why?
> 
> "->flush()" is very special, and is literally meant for things that
> need to wait at close time. A file descriptor may be flushed many
> times for a single open (because it was dup'ed etc), and yes, if it is
> closed before mmap, it will be flushed before the mmap is done. The
> "flush()" is basically attached to a particular fd - useful mainly for
> things like special devices that actually want to delay the close
> (serial lines etc).
> 
> But the fundamental issue is that I don't think cryptfs should be
> using "flush". The file is still *open* when "flush()" is called.
> That's the fundamental reason for the bug, I think.
> 
> cryptfs should flush the encrypted information at *release* time, not
> "flush" time. And that would have avoided the bug with mmap, because
> release gets called on the very last internal reference count drop of
> the 'struct file' - so it gets called after the last close *and*
> munmap.
> 
> Is there some reason I am missing that cryptfs has to use flush?

I don't think so. I had actually eyed moving the flush to
ecryptfs_release() but went this route, instead.

> 
> I'm doing the pull, but I really think that this is papering over the
> *real* bug, which was the use of 'flush' in the first place.

Thanks for doing the pull. The patch gets the job done, but I'm now in
agreement that this should be handled in ecryptfs_release().

It will likely be a few days before I can get to it (due to the
holiday), but I'll get a patch ready for review and test.

> 
> In general, I'd urge people to *not* use "->flush" at all as a
> "correctness issue". It's useful to return EIO to "close()" and to be
> *polite* (ie the return value of "flush()" will be returned to user
> space at close time), but it really should be seen as a "we try to
> flush now to try to give user space nice error reports where
> possible", but it's important to understand that it's not the last
> close, and if you rely on it for correctness, you're doing something
> wrong. It's "release()" that is the "get rid of all your state now",
> and is about correctness. "flush" is purely about being polite.

But it *could* be the last close, so it seems that using flush() for
politeness *and* release() for correctness is not an option.
Theoretically, flush() could fail, followed by a successful release(),
resulting in close() returning an error when it shouldn't since the
return value of release() is ignored.

> 
> ecryptfs seems to have relied on it for correctness. Not good.
> 
>                        Linus

I appreciate the review!

Tyler

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] eCryptfs fixes for 3.2-rc3
  2011-11-24  7:45   ` Tyler Hicks
@ 2011-11-24 18:27     ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2011-11-24 18:27 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: Al Viro, Andrew Morton, linux-kernel, ecryptfs

On Wed, Nov 23, 2011 at 11:45 PM, Tyler Hicks <tyhicks@canonical.com> wrote:
>
>> In general, I'd urge people to *not* use "->flush" at all as a
>> "correctness issue". It's useful to return EIO to "close()" and to be
>> *polite* (ie the return value of "flush()" will be returned to user
>> space at close time), but it really should be seen as a "we try to
>> flush now to try to give user space nice error reports where
>> possible", but it's important to understand that it's not the last
>> close, and if you rely on it for correctness, you're doing something
>> wrong. It's "release()" that is the "get rid of all your state now",
>> and is about correctness. "flush" is purely about being polite.
>
> But it *could* be the last close, so it seems that using flush() for
> politeness *and* release() for correctness is not an option.

You can certainly do both, there is nothing wrong with it.

Note that even if "flush()" returns an error, we *will* close the fd.
It is not going to abort the close or anything like that: it's just a
signal to the user that something is wrong.

For example, a filesystem like NFS may do delayed writes, so when you
do a "write()" system call, and the server diskspace is full, you may
not get the ENOSPC at "write()" time. You may get it at a subsequent
write(), or you may get it at close() time - because NFS does try to
write it synchronously at that time.  The user cannot *recover* from
the error (the file is closed and you don't know how much of it made
it), but a careful writer can check the error code of close() and at
least know to alert the user that something went wrong.

So there is nothing *wrong* with using "flush()", and it exists for a
reason: so that careful writers *can* be careful.

But when you do use flush(), you also need to be aware that most
writers aren't careful. Even if they don't use mmap(), they also don't
necessarily care about close(). And there are situations where
"flush()" is used as a "let's try to flush, but we will time it out or
still react to SIGINT, so we're doing a 'best effort' kind of flush,
not any correctness guarantees".

In fact, that "best effort" kind of flush is one of the original
reasons for the callback: the flushing of characters of a serial line.
It's timed out (because the close() does have to finish in a timely
manner even if the other end has stopped receiving and i no longer
asserting DTS), and it's not really even about the error code - it's
literally just about "delay until the pending stuff has actually been
sent".

So having both flush (to do a "best effort" try at waiting for stuff
and maybe returning an error) and a release (to actually finish
everything off and get rid of reference counts etc) is perfectly fine
and normal.

> Theoretically, flush() could fail, followed by a successful release(),
> resulting in close() returning an error when it shouldn't since the
> return value of release() is ignored.

That's not even theoretical, it's quite normal. If flush fails, it
*will* be followed by the release() if it's the last close, and the
release is by definition always successful - the release is just a
"ok, we're done now".

So the case you describe is what flush() is designed for. Something
did a best effort to inform the user that things probably didn't work
out. But the user may well not care. If the user close()'d the file
before the last mmap was done, or if the user simply ignores the
return value of close, the kernel doesn't really care. The kernel
basically says "ok, I can *try* to give you relevant errors, but I'm
not going to force the issue, and I'm not going to care if you don't
care".

                       Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-11-24 18:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-23 22:25 [GIT PULL] eCryptfs fixes for 3.2-rc3 Tyler Hicks
2011-11-23 22:56 ` Linus Torvalds
2011-11-24  7:45   ` Tyler Hicks
2011-11-24 18:27     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).