From: Mimi Zohar <zohar@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Chinner <david@fromorbit.com>,
	LSM List <linux-security-module@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	"Theodore Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-integrity@vger.kernel.org
Subject: Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively
Date: Sun, 01 Oct 2017 19:54:39 -0400
Message-ID: <1506902079.5691.256.camel@linux.vnet.ibm.com>
In-Reply-To: <CA+55aFwD=+5trJbmuc2SL-DYcdmt2p4gq1uPBp8mznj1JYSVTg@mail.gmail.com>

On Sun, 2017-10-01 at 15:20 -0700, Linus Torvalds wrote:
> On Sun, Oct 1, 2017 at 3:06 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Unless I misread something it was being pointed out there are some vfs
> > operations today on which ima writes an ima xattr as a side effect.  And
> > those operations hold the i_sem.  So perhaps I am misunderstanding
> > things or writing the ima xattr needs to happen at some point.  Which
> > implies something like queued work.
> 
> So the issue is indeed the inode semaphore, as it is used by IMA. But
> all these IMA patches to work around the issue are just horribly ugly.
> One adds a VFS-layer filesystem method that most filesystems end up
> not really needing (it's the same as the regular read), and other
> filesystems end up then having hacks with ("oh, I don't need to take
> this lock because it was already taken by the caller").
> 
> The second patch attempt avoided the need for a new filesystem method,
> but added a flag in an annoying place (for the same basic logic). The
> advantage is that now most filesystems don't actually need to care any
> more (and the filesystems that used to care now check that flag).
> 
> There was discussion about moving the flag to a more convenient spot,
> which would have made it a lot less intrusive.
> 
> But the basic issue is that almost always when you see lock
> inversions, the problem can just be fixed by doing the locking
> differently instead.

This is what I've been missing.  Thank you for taking the time to
understand the problem and explain how!

> And that's what I was/am pushing for.

> There really are two totally different issues:
> 
>  - the integrity _measurement_.
> 
>    This one wants to be serialized, so that you don't have multiple
> concurrent measurements, and the serialization fundamentally has to be
> around all the IO, so this lock pretty much has to be outside the
> i_sem.
> 
>  - the integrity invalidation on certain operations.
> 
>    This one fundamentally had to be inside the i_sem, since some of
> the operations that cause this end up already holding the i_sem at a
> VFS layer.
> 
> so you had these two different requirements (inside _and_ outside),
> and the IMA approach was basically to avoid the problem by making
> i_sem *the* lock, and then making the IO routines aware of it already
> being held. That does solve the inside/outside issue.
> 
> But the simpler way to fix it is to simply use two locks that nest
> inside each other, with i_sem nesting in the middle.  That just avoids
> the problem entirely, and doesn't require anybody to ever care about
> i_sem semantic changes, because i_sem semantics simply didn't change
> at all.
> 
> So that's the approach I'm pushing. I admittedly haven't actually
> looked at the IMA details, but from a high-level standpoint you can
> basically describe it (as above) without having to care too much about
> exactly what IMA even wants.
> 
> The two-lock approach does require that the operations that invalidate
> the integrity measurements always only invalidate it, and don't try to
> re-compute it. But I suspect that would be entirely insane anyway
> (imagine a world where "setxattr" would have to read the whole file
> contents in order to revalidate the integrity measurement - even if
> there is nobody who even *cares*).

Right, the setxattr, chmod, and chown syscalls just reset the cached
flags, which indicate whether the file needs to be re-measured,
re-validated, or re-audited.  The file hash is not re-calculated at
this point.  That happens on the next access to a file that is in
policy.

Mimi
