Linux-Integrity Archive on lore.kernel.org
 help / color / Atom feed
From: Chuck Lever <chucklever@gmail.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>,
	James Morris <jmorris@namei.org>,
	Deven Bowers <deven.desai@linux.microsoft.com>,
	Pavel Machek <pavel@ucw.cz>, Sasha Levin <sashal@kernel.org>,
	snitzer@redhat.com, dm-devel@redhat.com,
	tyhicks@linux.microsoft.com, agk@redhat.com,
	Paul Moore <paul@paul-moore.com>,
	Jonathan Corbet <corbet@lwn.net>,
	nramas@linux.microsoft.com, serge@hallyn.com,
	pasha.tatashin@soleen.com, Jann Horn <jannh@google.com>,
	linux-block@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>,
	mdsakib@microsoft.com, open list <linux-kernel@vger.kernel.org>,
	eparis@redhat.com, linux-security-module@vger.kernel.org,
	linux-audit@redhat.com,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-integrity@vger.kernel.org,
	jaskarankhurana@linux.microsoft.com
Subject: Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
Date: Thu, 13 Aug 2020 10:42:57 -0400
Message-ID: <D470BA4B-EF1A-49CA-AFB9-0F7FFC4C6001@gmail.com> (raw)
In-Reply-To: <1597247482.7293.18.camel@HansenPartnership.com>



> On Aug 12, 2020, at 11:51 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Wed, 2020-08-12 at 10:15 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 11:53 AM, James Bottomley
>>> <James.Bottomley@HansenPartnership.com> wrote:
>>> 
>>> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> [...]
>>>>> 
>>>>> and what is nice to have to speed up the verification
>>>>> process.  The choice for the latter is cache or reconstruct
>>>>> depending on the resources available.  If the tree gets cached
>>>>> on the server, that would be a server implementation detail
>>>>> invisible to the client.
>>>> 
>>>> We assume that storage targets (for block or file) are not
>>>> trusted. Therefore storage clients cannot rely on intermediate
>>>> results (eg, middle nodes in a Merkle tree) unless those results
>>>> are generated within the client's trust envelope.
>>> 
>>> Yes, they can ... because supplied nodes can be verified.  That's
>>> the whole point of a merkle tree.  As long as I'm sure of the root
>>> hash I can verify all the rest even if supplied by an untrusted
>>> source.  If you consider a simple merkle tree covering 4 blocks:
>>> 
>>>      R
>>>    /   \
>>> H11     H12
>>> / \     / \
>>> H21 H22 H23 H24
>>>>   |   |   |
>>> 
>>> B1   B2  B3  B4
>>> 
>>> Assume I have the verified root hash R.  If you supply B3 you also
>>> supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
>>> then hash H23 and H24 to produce H12 and if H12 and your supplied
>>> H11 hash to R the tree is correct and the B3 you supplied must
>>> likewise be correct.
>> 
>> I'm not sure what you are proving here. Obviously this has to work
>> in order for a client to reconstruct the file's Merkle tree given
>> only R and the file content.
> 
> You implied the server can't be trusted to generate the merkel tree. 
> I'm showing above it can because of the tree path based verification.

What I was implying is that clients can't trust intermediate Merkle
tree content that is not also signed. So far we are talking about
signing only the tree root.

The storage server can store the tree durably, but if the intermediate
parts of the tree are not signed, the client has to verify them anyway,
and that reduces the value of storing potentially large data structures.


>> It's the construction of the tree and verification of the hashes that
>> are potentially expensive. The point of caching intermediate hashes
>> is so that the client verifies them as few times as possible.  I
>> don't see value in caching those hashes on an untrusted server --
>> the client will have to reverify them anyway, and there will be no
>> savings.
> 
> I'm not making any claim about server caching, I'm just saying the
> client can request pieces of the tree from the server without having to
> reconstruct the whole thing itself because it can verify their
> correctness.

To be clear, my concern is about how much of the tree might be stored
in a Merkle-based metadata format. I just don't see that it has much
value to store more than the signed tree root, because the client will
have to reconstitute or verify some tree contents on most every read.

For sufficiently large files, the tree itself can be larger than what
can be stored in an xattr. This is the same problem that fs-verity
faces. And, as I stated earlier, xattr objects are read in their
entirety, they can't be seeked into or read piecemeal.

What it seemed to me that you were suggesting was an offloaded cache
of the Merkle tree. Either the whole tree is stored on the storage
server, or the storage server provides a service that reconstitutes
that tree on behalf of clients. (Please correct me if I misunderstood).
I just don't think that will be practicable or provide the kind of
benefit you might want.


>> Cache once, as close as you can to where the data will be used.
>> 
>> 
>>>> So: if the storage target is considered inside the client's trust
>>>> envelope, it can cache or store durably any intermediate parts of
>>>> the verification process. If not, the network and file storage is
>>>> considered untrusted, and the client has to rely on nothing but
>>>> the signed digest of the tree root.
>>>> 
>>>> We could build a scheme around, say, fscache, that might save the
>>>> intermediate results durably and locally.
>>> 
>>> I agree we want caching on the client, but we can always page in
>>> from the remote as long as we page enough to verify up to R, so
>>> we're always sure the remote supplied genuine information.
>> 
>> Agreed.
>> 
>> 
>>>>>> For this reason, the idea was to save only the signature of
>>>>>> the tree's root on durable storage. The client would retrieve
>>>>>> that signature possibly at open time, and reconstruct the
>>>>>> tree at that time.
>>>>> 
>>>>> Right that's the integrity data you must have.
>>>>> 
>>>>>> Or the tree could be partially constructed on-demand at the
>>>>>> time each unit is to be checked (say, as part of 2. above).
>>>>> 
>>>>> Whether it's reconstructed or cached can be an implementation
>>>>> detail. You clearly have to reconstruct once, but whether you
>>>>> have to do it again depends on the memory available for caching
>>>>> and all the other resource calls in the system.
>>>>> 
>>>>>> The client would have to reconstruct that tree again if
>>>>>> memory pressure caused some or all of the tree to be evicted,
>>>>>> so perhaps an on-demand mechanism is preferable.
>>>>> 
>>>>> Right, but I think that's implementation detail.  Probably what
>>>>> we need is a way to get the log(N) verification hashes from the
>>>>> server and it's up to the client whether it caches them or not.
>>>> 
>>>> Agreed, these are implementation details. But see above about the
>>>> trustworthiness of the intermediate hashes. If they are conveyed
>>>> on an untrusted network, then they can't be trusted either.
>>> 
>>> Yes, they can, provided enough of them are asked for to verify.  If
>>> you look at the simple example above, suppose I have cached H11 and
>>> H12, but I've lost the entire H2X layer.  I want to verify B3 so I
>>> also ask you for your copy of H24.  Then I generate H23 from B3 and
>>> Hash H23 and H24.  If this doesn't hash to H12 I know either you
>>> supplied me the wrong block or lied about H24.  However, if it all
>>> hashes correctly I know you supplied me with both the correct B3
>>> and the correct H24.
>> 
>> My point is there is a difference between a trusted cache and an
>> untrusted cache. I argue there is not much value in a cache where
>> the hashes have to be verified again.
> 
> And my point isn't about caching, it's about where the tree comes from.
> I claim and you agree the client can get the tree from the server a
> piece at a time (because it can path verify it) and doesn't have to
> generate it itself.

OK, let's focus on where the tree comes from. It is certainly
possible to build protocol to exchange parts of a Merkle tree. The
question is how it might be stored on the server. There are some
underlying assumptions about the metadata storage mechanism that
should be stated up front.

Current forms of IMA metadata are limited in size and stored in a
container that is read and written in a single operation. If we stick
with that container format, I don't see a way to store a Merkle tree
in there for all file sizes.

Thus it seems to me that we cannot begin to consider the tree-on-the-
server model unless there is a proposed storage mechanism for that
whole tree. Otherwise, the client must have the primary role in
unpacking and verifying the tree.

Storing only the tree root in the metadata means the metadata format
is nicely bounded in size.


> How much of the tree the client has to store and
> whether the server caches, reads it in from somewhere or reconstructs
> it is an implementation detail.

Sure.


--
Chuck Lever
chucklever@gmail.com




  reply index

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28 21:36 Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 01/11] scripts: add ipe tooling to generate boot policy Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 02/11] security: add ipe lsm evaluation loop and audit system Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 03/11] security: add ipe lsm policy parser and policy loading Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 04/11] ipe: add property for trust of boot volume Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 05/11] fs: add security blob and hooks for block_device Deven Bowers
2020-07-28 22:22   ` Casey Schaufler
2020-07-28 22:40     ` Al Viro
2020-07-28 23:55       ` Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 06/11] dm-verity: move signature check after tree validation Deven Bowers
2020-07-28 21:50   ` Eric Biggers
2020-07-28 23:55     ` Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 07/11] dm-verity: add bdev_setsecurity hook for dm-verity signature Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 08/11] ipe: add property for signed dmverity volumes Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 09/11] dm-verity: add bdev_setsecurity hook for root-hash Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 10/11] documentation: add ipe documentation Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 10/12] ipe: add property for dmverity roothash Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 11/11] cleanup: uapi/linux/audit.h Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 11/12] documentation: add ipe documentation Deven Bowers
2020-07-28 21:36 ` [RFC PATCH v5 12/12] cleanup: uapi/linux/audit.h Deven Bowers
2020-08-02 11:55 ` [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE) Pavel Machek
2020-08-02 14:03   ` Sasha Levin
2020-08-02 14:31     ` Pavel Machek
2020-08-02 16:43       ` [dm-devel] " James Bottomley
2020-08-04 16:07         ` Deven Bowers
2020-08-05 15:01           ` James Bottomley
2020-08-05 16:59             ` James Morris
2020-08-05 18:15               ` Mimi Zohar
2020-08-05 23:51                 ` James Morris
2020-08-06 14:33                   ` Mimi Zohar
2020-08-07 16:41                     ` James Morris
2020-08-07 17:31                       ` Mimi Zohar
2020-08-07 18:40                         ` Mimi Zohar
2020-08-10 20:29                           ` James Morris
2020-08-08 17:47                 ` Chuck Lever
2020-08-09 17:16                   ` Mimi Zohar
2020-08-10 15:35                     ` James Bottomley
2020-08-10 16:35                       ` Mimi Zohar
2020-08-10 17:13                         ` James Bottomley
2020-08-10 17:57                           ` Mimi Zohar
2020-08-10 23:36                       ` Chuck Lever
2020-08-11  5:43                         ` James Bottomley
2020-08-11 14:48                           ` Chuck Lever
2020-08-11 15:32                             ` James Bottomley
2020-08-11 19:30                               ` Pavel Machek
2020-08-12 14:45                               ` Chuck Lever
2020-08-11 15:53                             ` James Bottomley
2020-08-12 14:15                               ` Chuck Lever
2020-08-12 15:51                                 ` James Bottomley
2020-08-13 14:42                                   ` Chuck Lever [this message]
2020-08-13 15:10                                     ` James Bottomley
2020-08-14 14:21                                       ` Chuck Lever
2020-08-11 18:28                             ` James Bottomley
2020-08-12 13:56                               ` Chuck Lever
2020-08-12 15:42                                 ` James Bottomley
2020-08-13 14:21                                   ` Chuck Lever
2020-08-13 14:42                                     ` James Bottomley
2020-08-13 14:56                                       ` Chuck Lever
2020-08-11 21:03                   ` James Morris
2020-08-12 14:18                     ` Chuck Lever
2020-08-12 17:07                       ` Deven Bowers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D470BA4B-EF1A-49CA-AFB9-0F7FFC4C6001@gmail.com \
    --to=chucklever@gmail.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=corbet@lwn.net \
    --cc=deven.desai@linux.microsoft.com \
    --cc=dm-devel@redhat.com \
    --cc=eparis@redhat.com \
    --cc=jannh@google.com \
    --cc=jaskarankhurana@linux.microsoft.com \
    --cc=jmorris@namei.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mdsakib@microsoft.com \
    --cc=nramas@linux.microsoft.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=paul@paul-moore.com \
    --cc=pavel@ucw.cz \
    --cc=sashal@kernel.org \
    --cc=serge@hallyn.com \
    --cc=snitzer@redhat.com \
    --cc=tyhicks@linux.microsoft.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Integrity Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-integrity/0 linux-integrity/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-integrity linux-integrity/ https://lore.kernel.org/linux-integrity \
		linux-integrity@vger.kernel.org
	public-inbox-index linux-integrity

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-integrity


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git