linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
@ 2017-01-08 14:48 Mimi Zohar
  2017-01-08 14:52 ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 14:48 UTC (permalink / raw)
  To: linux-xfs
  Cc: Christoph Hellwig, Dave Chinner, linux-fsdevel, linux-kernel, Al Viro

Validation of a file's signature/hash, calculating a file's hash, or
simply audit logging a file's hash prior to use, with commit
6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode
instead" cause the system to hang.

IMA takes the i_rwsem (fomerly i_mutex) before reading the file to
synchronize calculating the file hash and validating the file's
hash/signature stored as security.ima xattr.  (In fix mode, it writes
the file hash as the security.ima.)  Prior to commit 6552321831dc "xfs:
remove i_iolock and use i_rwsem in the VFS inode instead" used an XFS
specific lock, not i_rwsem.

INFO: task plymouthd:3106 blocked for more than 120 seconds.
      Not tainted 4.10.0-rc2 #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
plymouthd       D11400  3106      1 0x00040080
Call Trace:
[c0000000f11df320] [c0000000ffde2700] 0xc0000000ffde2700 (unreliable)
[c0000000f11df4f0] [c00000000001c2f0] __switch_to+0x2c0/0x450
[c0000000f11df550] [c0000000008a8a24] __schedule+0x304/0x950
[c0000000f11df5f0] [c0000000008a90b8] schedule+0x48/0xc0
[c0000000f11df620] [c0000000008b0e64] rwsem_down_read_failed+0x144/0x1e0
[c0000000f11df6b0] [c0000000008afa78] down_read+0x78/0x80
[c0000000f11df6e0] [c0000000004d0ef8] xfs_ilock+0x198/0x1b0
[c0000000f11df720] [c0000000004be758] xfs_file_buffered_aio_read+0x58/0x140
[c0000000f11df770] [c0000000004be8f8] xfs_file_read_iter+0xb8/0x150
[c0000000f11df7c0] [c000000000306948] __vfs_read+0x138/0x1b0
[c0000000f11df860] [c00000000055e4a0] integrity_kernel_read+0x70/0xb0
[c0000000f11df8b0] [c000000000560fc0] ima_calc_file_hash+0x3a0/0x770
[c0000000f11dfa60] [c000000000561f2c] ima_collect_measurement+0x1dc/0x210
[c0000000f11dfb10] [c00000000055feec] process_measurement.isra.0+0x39c/0x510
[c0000000f11dfb90] [c00000000031f1b4] path_openat+0x764/0x14a0
[c0000000f11dfc90] [c00000000032196c] do_filp_open+0xfc/0x170
[c0000000f11dfdc0] [c000000000305c4c] do_sys_open+0x1ac/0x2d0
[c0000000f11dfe30] [c00000000000b860] system_call+0x38/0xfc

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 14:48 xfs: commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead" change causes hang Mimi Zohar
@ 2017-01-08 14:52 ` Christoph Hellwig
  2017-01-08 15:03   ` Mimi Zohar
  2017-01-08 17:59   ` James Bottomley
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 14:52 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-xfs, Christoph Hellwig, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro

On Sun, Jan 08, 2017 at 09:48:44AM -0500, Mimi Zohar wrote:
> IMA takes the i_rwsem (fomerly i_mutex) before reading the file to
> synchronize calculating the file hash and validating the file's
> hash/signature stored as security.ima xattr

Well, it shouldn't do that.  In the I/O path i_rwsem is up to the
fs to use.  Various other file systems also take it internally for
reads, although mostly only for direct I/O.

So the answer here is that ima needs to stop playing with i_rwsem.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use    i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 14:52 ` Christoph Hellwig
@ 2017-01-08 15:03   ` Mimi Zohar
  2017-01-08 15:14     ` Christoph Hellwig
  2017-01-08 17:59   ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 15:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel, Al Viro

On Sun, 2017-01-08 at 15:52 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 09:48:44AM -0500, Mimi Zohar wrote:
> > IMA takes the i_rwsem (fomerly i_mutex) before reading the file to
> > synchronize calculating the file hash and validating the file's
> > hash/signature stored as security.ima xattr
> 
> Well, it shouldn't do that.  In the I/O path i_rwsem is up to the
> fs to use.  Various other file systems also take it internally for
> reads, although mostly only for direct I/O.

But not normally for a normal file read.  For direct I/O, IMA fails the
file hash/signature verification, measurement, audit unless the IMA
"permit_directio" rule is defined.

> So the answer here is that ima needs to stop playing with i_rwsem.

Unless something has changed recently, to synchronize reading files to
calculate the file hash and writing xattrs it has to take the i_rwsem
prior to reading the file.

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 15:03   ` Mimi Zohar
@ 2017-01-08 15:14     ` Christoph Hellwig
  2017-01-08 15:31       ` Mimi Zohar
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 15:14 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Christoph Hellwig, linux-xfs, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro

On Sun, Jan 08, 2017 at 10:03:09AM -0500, Mimi Zohar wrote:
> But not normally for a normal file read.

Depends on the file system.  In addition to XFS at least the NFS
also uses i_rwsem by default.  Also all file systems supporting
a DAX I/O path.

> Unless something has changed recently, to synchronize reading files to
> calculate the file hash and writing xattrs it has to take the i_rwsem
> prior to reading the file.

No, you must simply not do this at all.  If you take a lock that
belongs to the fs and is not your own over ->read_iter you're toast
as you've seen.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use    i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 15:14     ` Christoph Hellwig
@ 2017-01-08 15:31       ` Mimi Zohar
  2017-01-08 15:37         ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 15:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel, Al Viro

On Sun, 2017-01-08 at 16:14 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 10:03:09AM -0500, Mimi Zohar wrote:
> > But not normally for a normal file read.
> 
> Depends on the file system.  In addition to XFS at least the NFS
> also uses i_rwsem by default.  Also all file systems supporting
> a DAX I/O path.

We're only interested in the integrity of the local file system.

> > Unless something has changed recently, to synchronize reading files to
> > calculate the file hash and writing xattrs it has to take the i_rwsem
> > prior to reading the file.
> 
> No, you must simply not do this at all.  If you take a lock that
> belongs to the fs and is not your own over ->read_iter you're toast
> as you've seen.

Christoph,  this isn't a new story and telling me this isn't very
productive.  Originally there was an IMA specific lock.  The i_mutex was
taken just to access the xattr.   Unforutnately, having two locks caused
a lockdep between the normal read/validate and setxattr.  As a result,
we dropped the IMA specific lock.

IMA needs a mechanism for quickly reading a file to calculate the file
hash and validate (or set) the file signature/hash stored as an xattr,
prior to any other process getting access to the file.

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 15:31       ` Mimi Zohar
@ 2017-01-08 15:37         ` Christoph Hellwig
  2017-01-08 16:38           ` Mimi Zohar
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 15:37 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Christoph Hellwig, linux-xfs, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro

On Sun, Jan 08, 2017 at 10:31:22AM -0500, Mimi Zohar wrote:
> > Depends on the file system.  In addition to XFS at least the NFS
> > also uses i_rwsem by default.  Also all file systems supporting
> > a DAX I/O path.
> 
> We're only interested in the integrity of the local file system.

And I'm only interested in supporting sane use cases that don't include
IMA.

But even if you only care about local file systems IMA is simply
fundamentally broken.  Read/Write zerialization in Linux has always been
in the fs (or generic library called by it) and not by the caller.

> IMA needs a mechanism for quickly reading a file to calculate the file
> hash and validate (or set) the file signature/hash stored as an xattr,
> prior to any other process getting access to the file.

And that's simply not doable in Linux from a layer outside the file system.
And really questionable even from inside the fs.    We should simply drop
IMA from the tree in the current state because it's obviously broken.

If you can convince fs maintainers to do the right calls from the file
systems itself for given file systems (and only support IMA for those)
we can add it back just for those.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use    i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 15:37         ` Christoph Hellwig
@ 2017-01-08 16:38           ` Mimi Zohar
  2017-01-08 16:43             ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 16:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel, Al Viro,
	James Morris, Andrew Morton, David Safford

On Sun, 2017-01-08 at 16:37 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 10:31:22AM -0500, Mimi Zohar wrote:
> > > Depends on the file system.  In addition to XFS at least the NFS
> > > also uses i_rwsem by default.  Also all file systems supporting
> > > a DAX I/O path.
> > 
> > We're only interested in the integrity of the local file system.
> 
> And I'm only interested in supporting sane use cases that don't include
> IMA.

Nothing new, you haven't changed your position all these years.  I'm not
sure whether your problem is with the concepts that the integrity
subsystem provides (eg. extending secure and trusted boot to the OS,
auditing file hashes for use in forensics) or just with the
implementation.

Others are interested in these features and are using the integrity
subsystem (eg. David Safford LSS 2016 "Design and Implementation of a
Security Architecture for Critical Infrastructure"
(https://www.youtube.com/watch?v=LDQxzz42Bs0),  Eric Richter's LSS 2016
talk "(Ab)using Linux as a Trusted Bootloader",  and many, many others).

> But even if you only care about local file systems IMA is simply
> fundamentally broken.  Read/Write zerialization in Linux has always been
> in the fs (or generic library called by it) and not by the caller.
> 
> > IMA needs a mechanism for quickly reading a file to calculate the file
> > hash and validate (or set) the file signature/hash stored as an xattr,
> > prior to any other process getting access to the file.
> 
> And that's simply not doable in Linux from a layer outside the file system.
> And really questionable even from inside the fs.    We should simply drop
> IMA from the tree in the current state because it's obviously broken.

True, it is now broken on XFS, but please don't generalize this XFS
regression to all file systems.

> If you can convince fs maintainers to do the right calls from the file
> systems itself for given file systems (and only support IMA for those)
> we can add it back just for those.

We definitely need VFS help in resolving this regression.  As a last
resort, we could always update the default IMA builtin policy to exclude
XFS.

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 16:38           ` Mimi Zohar
@ 2017-01-08 16:43             ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 16:43 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Christoph Hellwig, linux-xfs, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro, James Morris, Andrew Morton,
	David Safford

On Sun, Jan 08, 2017 at 11:38:26AM -0500, Mimi Zohar wrote:
> We definitely need VFS help in resolving this regression.  As a last
> resort, we could always update the default IMA builtin policy to exclude
> XFS.

No, you need to explicitly white list file system that are supported,
and most importantly actually tested using an automated tesuite.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 14:52 ` Christoph Hellwig
  2017-01-08 15:03   ` Mimi Zohar
@ 2017-01-08 17:59   ` James Bottomley
  2017-01-08 18:18     ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-08 17:59 UTC (permalink / raw)
  To: Christoph Hellwig, Mimi Zohar
  Cc: linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel, Al Viro

On Sun, 2017-01-08 at 15:52 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 09:48:44AM -0500, Mimi Zohar wrote:
> > IMA takes the i_rwsem (fomerly i_mutex) before reading the file to
> > synchronize calculating the file hash and validating the file's
> > hash/signature stored as security.ima xattr
> 
> Well, it shouldn't do that.  In the I/O path i_rwsem is up to the
> fs to use.  Various other file systems also take it internally for
> reads, although mostly only for direct I/O.

Hey, that's not really true: the inode lock (i_rwsem) is used in all
sorts of generic places, including generic_file_write_iter().  That's,
I think, why ima is using it to try to prevent writes while it measures
the file.

> So the answer here is that ima needs to stop playing with i_rwsem.

Isn't there a happy medium? most sensible filesystems will allow shared
reading (unless they want to tank performance) so we can rely on the
fact that even if a fs does use i_rwsem internally on the read path, it
will have to be shared.  So simply replacing the inode_lock() in ima
with inode_lock_shared() should do what ima wants and not interact
badly even if the underlying FS uses i_rwsem.  If there's ever a FS
that takes it exclusively in the read path, ima can simply blacklist
it.

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 17:59   ` James Bottomley
@ 2017-01-08 18:18     ` Christoph Hellwig
  2017-01-08 18:57       ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 18:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Mimi Zohar, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel, Al Viro

On Sun, Jan 08, 2017 at 09:59:25AM -0800, James Bottomley wrote:
> Hey, that's not really true: the inode lock (i_rwsem) is used in all
> sorts of generic places, including generic_file_write_iter().  That's,
> I think, why ima is using it to try to prevent writes while it measures
> the file.

But all these are _below_ file_operations.  The only place where take
them in the VFS is for namespace locking, e.g. before calling into
inode_operations (to generalize a little).

> 
> > So the answer here is that ima needs to stop playing with i_rwsem.
> 
> Isn't there a happy medium? most sensible filesystems will allow shared
> reading (unless they want to tank performance) so we can rely on the
> fact that even if a fs does use i_rwsem internally on the read path, it
> will have to be shared.

At least for direct I/O that doesn't always have to be true.

> So simply replacing the inode_lock() in ima
> with inode_lock_shared() should do what ima wants and not interact
> badly even if the underlying FS uses i_rwsem.  If there's ever a FS
> that takes it exclusively in the read path, ima can simply blacklist
> it.

IFF we actually allow recursive readers for rw_semaphores this would
work around the issue (but I'm not sure about that fact, at least
in the past we didn't).  It won't fix IMA for all the file systems
use other synchronization for reads, e.g. the cluster locks in ocfs2
or gfs2.  It won't fix NFS which will exhibit exacly the same issue
as Mimi reported.

Last but not least it won't solve the problem that IMA has never been
designed and does neither document the requires it has from a file system,
nor is there any systematic testing for it.  It will keep on breaking
because it has all kinds of weird implicit assumptions never written down
or verified, and the test coverage for it is basically non-existing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 18:18     ` Christoph Hellwig
@ 2017-01-08 18:57       ` James Bottomley
  2017-01-08 19:09         ` Christoph Hellwig
  2017-01-08 19:16         ` Mimi Zohar
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2017-01-08 18:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mimi Zohar, linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel,
	Al Viro

On Sun, 2017-01-08 at 19:18 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 09:59:25AM -0800, James Bottomley wrote:
> > Hey, that's not really true: the inode lock (i_rwsem) is used in 
> > all sorts of generic places, including generic_file_write_iter(). 
> >  That's, I think, why ima is using it to try to prevent writes 
> > while it measures the file.
> 
> But all these are _below_ file_operations.  The only place where take
> them in the VFS is for namespace locking, e.g. before calling into
> inode_operations (to generalize a little).

Definitely agree we need an abstraction with defined semantics.

> > > So the answer here is that ima needs to stop playing with
> > > i_rwsem.
> > 
> > Isn't there a happy medium? most sensible filesystems will allow 
> > shared reading (unless they want to tank performance) so we can 
> > rely on the fact that even if a fs does use i_rwsem internally on 
> > the read path, it will have to be shared.
> 
> At least for direct I/O that doesn't always have to be true.

I'm unsure about the DIO case, so lets try defining the semantics and
see if they're implementable for DIO, otherwise simply exclude it.

> > So simply replacing the inode_lock() in ima
> > with inode_lock_shared() should do what ima wants and not interact
> > badly even if the underlying FS uses i_rwsem.  If there's ever a FS
> > that takes it exclusively in the read path, ima can simply
> > blacklist
> > it.
> 
> IFF we actually allow recursive readers for rw_semaphores this would
> work around the issue (but I'm not sure about that fact, at least
> in the past we didn't).  It won't fix IMA for all the file systems
> use other synchronization for reads, e.g. the cluster locks in ocfs2
> or gfs2.  It won't fix NFS which will exhibit exacly the same issue
> as Mimi reported.
> 
> Last but not least it won't solve the problem that IMA has never been
> designed and does neither document the requires it has from a file 
> system, nor is there any systematic testing for it.  It will keep on 
> breaking because it has all kinds of weird implicit assumptions never 
> written down or verified, and the test coverage for it is basically
> non-existing.

OK, so how about we define it.  I think we need two vfs calls:

inode_block_local_writes(inode)
inode_unblock_local_writes(inode)

With semantics that between these two, all write attempts to the file
backed by the inode on this system block but reads of the underlying
file are allowed (I added local so we don't have to implement for
remote filesystems).  inode_block_local_writes() will block until all
local writes to the file have finished, so you're guaranteed the file
only allows reads when it succeeds.

As for implementation in the vfs, I suspect an outstanding write count
in the inode might be the better way?

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 18:57       ` James Bottomley
@ 2017-01-08 19:09         ` Christoph Hellwig
  2017-01-08 19:26           ` Al Viro
                             ` (2 more replies)
  2017-01-08 19:16         ` Mimi Zohar
  1 sibling, 3 replies; 19+ messages in thread
From: Christoph Hellwig @ 2017-01-08 19:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Mimi Zohar, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel, Al Viro

On Sun, Jan 08, 2017 at 10:57:28AM -0800, James Bottomley wrote:
> I'm unsure about the DIO case, so lets try defining the semantics and
> see if they're implementable for DIO, otherwise simply exclude it.

Let's start with the semantics.  First we need to write down what
IMA requires from the FS, and have an interface how the FS can declare
that it supports these features.  As far as I can tell there are not
proper feature checks anywhere right now.  Once we have done that
we can move forward from there.

As you seem to be interested in IMA how about you spearhead documenting
the requirements and adding xfstests support?

> OK, so how about we define it.  I think we need two vfs calls:
> 
> inode_block_local_writes(inode)
> inode_unblock_local_writes(inode)

No.  We need an ->ima_measure file_operation, guts of process_measurement
turned into a library function that the FS can call after taking fs-specific
locks.  And maybe also a small wrapper around it that takes ilock and
can be used directly for file systems not needing special locking.

> With semantics that between these two, all write attempts to the file
> backed by the inode on this system block but reads of the underlying
> file are allowed (I added local so we don't have to implement for
> remote filesystems).

How do you define local?  Are GFS2 and OCFS2 local?  Is XFS with
outstanding pNFS layout local?  Is NFS with the block or SCSI layout
local because it operates on a block device?

The only sane way is to make INA opt-in with a check list of features
that need to be supported, and declared to be supported by the fs,
similar to how we handle NFS exporting.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 18:57       ` James Bottomley
  2017-01-08 19:09         ` Christoph Hellwig
@ 2017-01-08 19:16         ` Mimi Zohar
  1 sibling, 0 replies; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 19:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, linux-xfs, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro

On Sun, 2017-01-08 at 10:57 -0800, James Bottomley wrote:
> On Sun, 2017-01-08 at 19:18 +0100, Christoph Hellwig wrote:
> > On Sun, Jan 08, 2017 at 09:59:25AM -0800, James Bottomley wrote:
> > > Hey, that's not really true: the inode lock (i_rwsem) is used in 
> > > all sorts of generic places, including generic_file_write_iter(). 
> > >  That's, I think, why ima is using it to try to prevent writes 
> > > while it measures the file.
> > 
> > But all these are _below_ file_operations.  The only place where take
> > them in the VFS is for namespace locking, e.g. before calling into
> > inode_operations (to generalize a little).
> 
> Definitely agree we need an abstraction with defined semantics.
> 
> > > > So the answer here is that ima needs to stop playing with
> > > > i_rwsem.
> > > 
> > > Isn't there a happy medium? most sensible filesystems will allow 
> > > shared reading (unless they want to tank performance) so we can 
> > > rely on the fact that even if a fs does use i_rwsem internally on 
> > > the read path, it will have to be shared.
> > 
> > At least for direct I/O that doesn't always have to be true.
> 
> I'm unsure about the DIO case, so lets try defining the semantics and
> see if they're implementable for DIO, otherwise simply exclude it.
> 
> > > So simply replacing the inode_lock() in ima
> > > with inode_lock_shared() should do what ima wants and not interact
> > > badly even if the underlying FS uses i_rwsem.  If there's ever a FS
> > > that takes it exclusively in the read path, ima can simply
> > > blacklist
> > > it.
> > 
> > IFF we actually allow recursive readers for rw_semaphores this would
> > work around the issue (but I'm not sure about that fact, at least
> > in the past we didn't).  It won't fix IMA for all the file systems
> > use other synchronization for reads, e.g. the cluster locks in ocfs2
> > or gfs2.  It won't fix NFS which will exhibit exacly the same issue
> > as Mimi reported.
> > 
> > Last but not least it won't solve the problem that IMA has never been
> > designed and does neither document the requires it has from a file 
> > system, nor is there any systematic testing for it.  It will keep on 
> > breaking because it has all kinds of weird implicit assumptions never 
> > written down or verified, and the test coverage for it is basically
> > non-existing.
> 
> OK, so how about we define it.  I think we need two vfs calls:
> 
> inode_block_local_writes(inode)
> inode_unblock_local_writes(inode)
> 
> With semantics that between these two, all write attempts to the file
> backed by the inode on this system block but reads of the underlying
> file are allowed (I added local so we don't have to implement for
> remote filesystems).  inode_block_local_writes() will block until all
> local writes to the file have finished, so you're guaranteed the file
> only allows reads when it succeeds.
> 
> As for implementation in the vfs, I suspect an outstanding write count
> in the inode might be the better way?

As a reference point, what you're suggesting is similar to the current
locks that prevent writing to an executable, while it is being executed
(eg. bprm).

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 19:09         ` Christoph Hellwig
@ 2017-01-08 19:26           ` Al Viro
  2017-01-08 20:10             ` Mimi Zohar
  2017-01-08 19:39           ` Mimi Zohar
  2017-01-09 19:44           ` Jeff Layton
  2 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-01-08 19:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Mimi Zohar, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel

On Sun, Jan 08, 2017 at 08:09:55PM +0100, Christoph Hellwig wrote:

> No.  We need an ->ima_measure file_operation, guts of process_measurement
> turned into a library function that the FS can call after taking fs-specific
> locks.  And maybe also a small wrapper around it that takes ilock and
> can be used directly for file systems not needing special locking.

Incidentally, it had been literally years since the problems with their
pathname handling had been brought up and we *still* have got no answer.

In the current tree, ima_d_path() is quite capable of returning
path->dentry->d_name.name.  Which gets used by subsequent code,
even though there is no warranty whatsoever that it won't be
pointing to freed memory by the time the caller of ima_d_path()
gets it.

Could IMA folks be bothered to explain how the hell is that supposed to
work?  Note that the race window is *not* narrow - it includes reading the
file contents, for fuck sake!  A plenty of time for the file to be
renamed, and if the name had been long enough to be stored separately,
for the original to be freed/reused/whatnot.

Better yet, in ima_collect_measurement() they have another user of
->d_name.name, with all the same issues.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use    i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 19:09         ` Christoph Hellwig
  2017-01-08 19:26           ` Al Viro
@ 2017-01-08 19:39           ` Mimi Zohar
  2017-01-09 19:44           ` Jeff Layton
  2 siblings, 0 replies; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 19:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, linux-xfs, Dave Chinner, linux-fsdevel,
	linux-kernel, Al Viro

On Sun, 2017-01-08 at 20:09 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 10:57:28AM -0800, James Bottomley wrote:
> > I'm unsure about the DIO case, so lets try defining the semantics and
> > see if they're implementable for DIO, otherwise simply exclude it.
> 
> Let's start with the semantics.  First we need to write down what
> IMA requires from the FS, and have an interface how the FS can declare
> that it supports these features.  As far as I can tell there are not
> proper feature checks anywhere right now.  Once we have done that
> we can move forward from there.
> 
> As you seem to be interested in IMA how about you spearhead documenting
> the requirements and adding xfstests support?

In addition to calculating the file hash on open and validating the
hash/signature stored as an xattr, we need a safe mechanism for writing
the file hash on the last __fput(), which is currently being deferred
due to locking issues.

> > OK, so how about we define it.  I think we need two vfs calls:
> > 
> > inode_block_local_writes(inode)
> > inode_unblock_local_writes(inode)
> 
> No.  We need an ->ima_measure file_operation, guts of process_measurement
> turned into a library function that the FS can call after taking fs-specific
> locks.  And maybe also a small wrapper around it that takes ilock and
> can be used directly for file systems not needing special locking.

So a vfs generic version for those that don't need special locking, with
the option of replacing the vfs generic version with a filesystem
specific one that directly calls the library function.

process_measurements() is a lot more than just calculating the file
hash, adding the hash to the measurement list and extending the TPM.  We
could limit it to just calculating the file hash and name it
ima_collect.  Or, if you prefer all of process_measurements() it could
be named just ima.

Mimi

> > With semantics that between these two, all write attempts to the file
> > backed by the inode on this system block but reads of the underlying
> > file are allowed (I added local so we don't have to implement for
> > remote filesystems).
> 
> How do you define local?  Are GFS2 and OCFS2 local?  Is XFS with
> outstanding pNFS layout local?  Is NFS with the block or SCSI layout
> local because it operates on a block device?
> 
> The only sane way is to make INA opt-in with a check list of features
> that need to be supported, and declared to be supported by the fs,
> similar to how we handle NFS exporting.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 19:26           ` Al Viro
@ 2017-01-08 20:10             ` Mimi Zohar
  0 siblings, 0 replies; 19+ messages in thread
From: Mimi Zohar @ 2017-01-08 20:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, James Bottomley, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel

On Sun, 2017-01-08 at 19:26 +0000, Al Viro wrote:
> On Sun, Jan 08, 2017 at 08:09:55PM +0100, Christoph Hellwig wrote:
> 
> > No.  We need an ->ima_measure file_operation, guts of process_measurement
> > turned into a library function that the FS can call after taking fs-specific
> > locks.  And maybe also a small wrapper around it that takes ilock and
> > can be used directly for file systems not needing special locking.
> 
> Incidentally, it had been literally years since the problems with their
> pathname handling had been brought up and we *still* have got no answer.

> In the current tree, ima_d_path() is quite capable of returning
> path->dentry->d_name.name.  Which gets used by subsequent code,
> even though there is no warranty whatsoever that it won't be
> pointing to freed memory by the time the caller of ima_d_path()
> gets it.

The pathname/filename is being used in the measurement list and for
audit logging.  It's currently using d_absolute_path() to get the full
pathname, but falls back to using the dentry->d_name.name on failure.
Ok, so instead of returning the dentry->d_name.name, we should make a
copy it, assuming the error isn't memory related.

Mimi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-08 19:09         ` Christoph Hellwig
  2017-01-08 19:26           ` Al Viro
  2017-01-08 19:39           ` Mimi Zohar
@ 2017-01-09 19:44           ` Jeff Layton
  2017-01-10  2:54             ` Mimi Zohar
  2 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2017-01-09 19:44 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: Mimi Zohar, linux-xfs, Dave Chinner, linux-fsdevel, linux-kernel,
	Al Viro

On Sun, 2017-01-08 at 20:09 +0100, Christoph Hellwig wrote:
> On Sun, Jan 08, 2017 at 10:57:28AM -0800, James Bottomley wrote:
> > 
> > I'm unsure about the DIO case, so lets try defining the semantics and
> > see if they're implementable for DIO, otherwise simply exclude it.
> 
> Let's start with the semantics.  First we need to write down what
> IMA requires from the FS, and have an interface how the FS can declare
> that it supports these features.  As far as I can tell there are not
> proper feature checks anywhere right now.  Once we have done that
> we can move forward from there.
> 
> As you seem to be interested in IMA how about you spearhead documenting
> the requirements and adding xfstests support?
> 

Another datapoint here:

While doing the i_version rework patches, I noticed that IMA depends
heavily on the filesystem correctly implementing the i_version counter,
but that's only reliable for filesystems that set the MS_I_VERSION flag.

I see nowhere that IMA actually checks that that flag is set, so you can
conceivably turn it on on filesystems that don't implement it correctly
(or just have it turned off like ext4 defaults to) and never notice that
your monitored file has changed.

Documenting the VFS and fs driver requirements for IMA would be a good
way to start fixing some of these problems.

> > 
> > OK, so how about we define it.  I think we need two vfs calls:
> > 
> > inode_block_local_writes(inode)
> > inode_unblock_local_writes(inode)
> 
> No.  We need an ->ima_measure file_operation, guts of process_measurement
> turned into a library function that the FS can call after taking fs-specific
> locks.  And maybe also a small wrapper around it that takes ilock and
> can be used directly for file systems not needing special locking.
> 
> > 
> > With semantics that between these two, all write attempts to the file
> > backed by the inode on this system block but reads of the underlying
> > file are allowed (I added local so we don't have to implement for
> > remote filesystems).
> 
> How do you define local?  Are GFS2 and OCFS2 local?  Is XFS with
> outstanding pNFS layout local?  Is NFS with the block or SCSI layout
> local because it operates on a block device?
> 
> The only sane way is to make INA opt-in with a check list of features
> that need to be supported, and declared to be supported by the fs,
> similar to how we handle NFS exporting.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-09 19:44           ` Jeff Layton
@ 2017-01-10  2:54             ` Mimi Zohar
  2017-01-10 16:22               ` Jeff Layton
  0 siblings, 1 reply; 19+ messages in thread
From: Mimi Zohar @ 2017-01-10  2:54 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, James Bottomley, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel, Al Viro

On Mon, 2017-01-09 at 14:44 -0500, Jeff Layton wrote:
> On Sun, 2017-01-08 at 20:09 +0100, Christoph Hellwig wrote:
> > On Sun, Jan 08, 2017 at 10:57:28AM -0800, James Bottomley wrote:
> > > 
> > > I'm unsure about the DIO case, so lets try defining the semantics and
> > > see if they're implementable for DIO, otherwise simply exclude it.
> > 
> > Let's start with the semantics.  First we need to write down what
> > IMA requires from the FS, and have an interface how the FS can declare
> > that it supports these features.  As far as I can tell there are not
> > proper feature checks anywhere right now.  Once we have done that
> > we can move forward from there.
> > 
> > As you seem to be interested in IMA how about you spearhead documenting
> > the requirements and adding xfstests support?
> > 
> 
> Another datapoint here:
> 
> While doing the i_version rework patches, I noticed that IMA depends
> heavily on the filesystem correctly implementing the i_version counter,
> but that's only reliable for filesystems that set the MS_I_VERSION flag.
> 
> I see nowhere that IMA actually checks that that flag is set, so you can
> conceivably turn it on on filesystems that don't implement it correctly
> (or just have it turned off like ext4 defaults to) and never notice that
> your monitored file has changed.

Yes, the filesystem does need to be mounted with i_version for iMA to
detect file changes.

> Documenting the VFS and fs driver requirements for IMA would be a good
> way to start fixing some of these problems.

Agreed.

Mimi

> > > 
> > > OK, so how about we define it.  I think we need two vfs calls:
> > > 
> > > inode_block_local_writes(inode)
> > > inode_unblock_local_writes(inode)
> > 
> > No.  We need an ->ima_measure file_operation, guts of process_measurement
> > turned into a library function that the FS can call after taking fs-specific
> > locks.  And maybe also a small wrapper around it that takes ilock and
> > can be used directly for file systems not needing special locking.
> > 
> > > 
> > > With semantics that between these two, all write attempts to the file
> > > backed by the inode on this system block but reads of the underlying
> > > file are allowed (I added local so we don't have to implement for
> > > remote filesystems).
> > 
> > How do you define local?  Are GFS2 and OCFS2 local?  Is XFS with
> > outstanding pNFS layout local?  Is NFS with the block or SCSI layout
> > local because it operates on a block device?
> > 
> > The only sane way is to make INA opt-in with a check list of features
> > that need to be supported, and declared to be supported by the fs,
> > similar to how we handle NFS exporting.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs:  commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead"  change causes hang
  2017-01-10  2:54             ` Mimi Zohar
@ 2017-01-10 16:22               ` Jeff Layton
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2017-01-10 16:22 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Christoph Hellwig, James Bottomley, linux-xfs, Dave Chinner,
	linux-fsdevel, linux-kernel, Al Viro

On Mon, 2017-01-09 at 21:54 -0500, Mimi Zohar wrote:
> On Mon, 2017-01-09 at 14:44 -0500, Jeff Layton wrote:
> > 
> > On Sun, 2017-01-08 at 20:09 +0100, Christoph Hellwig wrote:
> > > 
> > > On Sun, Jan 08, 2017 at 10:57:28AM -0800, James Bottomley wrote:
> > > > 
> > > > 
> > > > I'm unsure about the DIO case, so lets try defining the semantics and
> > > > see if they're implementable for DIO, otherwise simply exclude it.
> > > 
> > > Let's start with the semantics.  First we need to write down what
> > > IMA requires from the FS, and have an interface how the FS can declare
> > > that it supports these features.  As far as I can tell there are not
> > > proper feature checks anywhere right now.  Once we have done that
> > > we can move forward from there.
> > > 
> > > As you seem to be interested in IMA how about you spearhead documenting
> > > the requirements and adding xfstests support?
> > > 
> > 
> > Another datapoint here:
> > 
> > While doing the i_version rework patches, I noticed that IMA depends
> > heavily on the filesystem correctly implementing the i_version counter,
> > but that's only reliable for filesystems that set the MS_I_VERSION flag.
> > 
> > I see nowhere that IMA actually checks that that flag is set, so you can
> > conceivably turn it on on filesystems that don't implement it correctly
> > (or just have it turned off like ext4 defaults to) and never notice that
> > your monitored file has changed.
> 
> Yes, the filesystem does need to be mounted with i_version for iMA to
> detect file changes.
> 

Yes, but note that only ext4 recognizes that mount option. XFS and BTRFS
always support the i_version counter. The IMA documentation that I've
seen does not make that clear either, fwiw.

The upshot there is that IMA _really_ ought to check that
IS_I_VERSION(inode) is true. What I'm not sure of is where that check
should be done.


> > 
> > Documenting the VFS and fs driver requirements for IMA would be a good
> > way to start fixing some of these problems.
> 
> Agreed.
> 
> Mimi
> 
> > 
> > > 
> > > > 
> > > > 
> > > > OK, so how about we define it.  I think we need two vfs calls:
> > > > 
> > > > inode_block_local_writes(inode)
> > > > inode_unblock_local_writes(inode)
> > > 
> > > No.  We need an ->ima_measure file_operation, guts of process_measurement
> > > turned into a library function that the FS can call after taking fs-specific
> > > locks.  And maybe also a small wrapper around it that takes ilock and
> > > can be used directly for file systems not needing special locking.
> > > 
> > > > 
> > > > 
> > > > With semantics that between these two, all write attempts to the file
> > > > backed by the inode on this system block but reads of the underlying
> > > > file are allowed (I added local so we don't have to implement for
> > > > remote filesystems).
> > > 
> > > How do you define local?  Are GFS2 and OCFS2 local?  Is XFS with
> > > outstanding pNFS layout local?  Is NFS with the block or SCSI layout
> > > local because it operates on a block device?
> > > 
> > > The only sane way is to make INA opt-in with a check list of features
> > > that need to be supported, and declared to be supported by the fs,
> > > similar to how we handle NFS exporting.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-01-10 16:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-08 14:48 xfs: commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead" change causes hang Mimi Zohar
2017-01-08 14:52 ` Christoph Hellwig
2017-01-08 15:03   ` Mimi Zohar
2017-01-08 15:14     ` Christoph Hellwig
2017-01-08 15:31       ` Mimi Zohar
2017-01-08 15:37         ` Christoph Hellwig
2017-01-08 16:38           ` Mimi Zohar
2017-01-08 16:43             ` Christoph Hellwig
2017-01-08 17:59   ` James Bottomley
2017-01-08 18:18     ` Christoph Hellwig
2017-01-08 18:57       ` James Bottomley
2017-01-08 19:09         ` Christoph Hellwig
2017-01-08 19:26           ` Al Viro
2017-01-08 20:10             ` Mimi Zohar
2017-01-08 19:39           ` Mimi Zohar
2017-01-09 19:44           ` Jeff Layton
2017-01-10  2:54             ` Mimi Zohar
2017-01-10 16:22               ` Jeff Layton
2017-01-08 19:16         ` Mimi Zohar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).