Re: [RFC] Persist ima logs to disk

From: Mimi Zohar <zohar@linux.ibm.com>
To: Raphael Gianotti <raphgi@linux.microsoft.com>,
	Amir Goldstein <amir73il@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	janne.karhunen@gmail.com,
	linux-integrity <linux-integrity@vger.kernel.org>,
	tusharsu@linux.microsoft.com, tyhicks@linux.microsoft.com,
	nramas@linux.microsoft.com, balajib@linux.microsoft.com
Subject: Re: [RFC] Persist ima logs to disk
Date: Tue, 09 Feb 2021 13:08:06 -0500
Message-ID: <4c210f35a2892d85438c17fb892bd804f488a4e7.camel@linux.ibm.com>
In-Reply-To: <41e4222e-8daa-ff80-df6d-e546772faf59@linux.microsoft.com>

On Tue, 2021-02-09 at 09:20 -0800, Raphael Gianotti wrote:
> On 2/3/2021 10:45 AM, Mimi Zohar wrote:
> > On Wed, 2021-02-03 at 09:24 +0200, Amir Goldstein wrote:
> >> On Wed, Feb 3, 2021 at 3:02 AM Mimi Zohar <zohar@linux.ibm.com> wrote:
> >>> On Tue, 2021-02-02 at 10:14 -0800, Raphael Gianotti wrote:
> >>>> On 2/2/2021 5:07 AM, Mimi Zohar wrote:
> >>>>> On Tue, 2021-02-02 at 07:54 +0200, Amir Goldstein wrote:
> >>>>>> On Tue, Feb 2, 2021 at 12:53 AM Raphael Gianotti
> >>>>>> <raphgi@linux.microsoft.com> wrote:
> >>>>>>> On 1/8/2021 9:58 AM, Raphael Gianotti wrote:
> >>>>>>>> On 1/8/2021 4:38 AM, Mimi Zohar wrote:
> >>>>>>>>> On Thu, 2021-01-07 at 14:57 -0800, Raphael Gianotti wrote:
> >>>>>>>>>>>>>> But this doesn't address where the offloaded measurement list
> >>>>>>>>>>>>>> will be stored, how long the list will be retained, nor who
> >>>>>>>>>>>>>> guarantees the integrity of the offloaded list.  In addition,
> >>>>>>>>>>>>>> different form factors will have different requirements.
> >>>>>>>>>> For how long the list would be retained, or in the case of log
> >>>>>>>>>> segments, it might make sense to have that be an admin decision,
> >>>>>>>>>> something that can be configured to satisfy the needs of a specific
> >>>>>>>>>> system, as mentioned below by James. Does that seem correct?
> >>>>>>>>> For the discussion on exporting and truncating the IMA measurement
> >>>>>>>>> list, refer to:
> >>>>>>>>> https://lore.kernel.org/linux-integrity/1580998432.5585.411.camel@linux.ibm.com/
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Given the possibility of keeping the logs around for an indefinite
> >>>>>>>>>> amount of time, would using an expansion of the method present in
> >>>>>>>>>> this RFC be more appropriate than going down the vfs_tmpfile route?
> >>>>>>>>>> Forgive my lack of expertise on mm, but would the vfs_tmpfile
> >>>>>>>>>> approach work for keeping several log segments across multiple
> >>>>>>>>>> kexecs?
> >>>>>>>>> With the "vfs_tmpfile" mechanism, breaking up and saving the log in
> >>>>>>>>> segments isn't needed.  The existing mechanism for carrying the
> >>>>>>>>> measurement list across kexec would still be used.  Currently, if the
> >>>>>>>>> kernel cannot allocate the memory needed for carrying the measurement
> >>>>>>>>> across kexec, it simply emits an error message, but continues with the
> >>>>>>>>> kexec.
> >>>>>>>> In this change I had introduced "exporting" the log to disk when the size
> >>>>>>>> of the measurement list was too large. Given part of the motivation
> >>>>>>>> behind moving the measurement list is the possibility of it growing too
> >>>>>>>> large and taking up too much of the kernel memory, that case would likely
> >>>>>>>> lead to kexec not being able to carry over the logs. Do you believe it's
> >>>>>>>> better to use the "vfs_tmpfile" mechanism for moving the logs to disk and
> >>>>>>>> worry about potential issues with kexec not being able to carry over the
> >>>>>>>> logs separately, given the "vfs_tmpfile" approach seems to be preferred
> >>>>>>>> and also simplifies worries regarding truncating the logs?
> >>>>>>> After a chat with Mimi I went ahead and did some investigative
> >>>>>>> work in the vfs_tmpfile approach suggested, and I wanted to
> >>>>>>> restart this thread with some thoughts/questions that came up
> >>>>>>> from that.
> >>>>>>> For the work I did I simply created a tmp file during ima's
> >>>>>>> initialization and then tried to use vm_mmap to map it to memory,
> >>>>>>> with the goal of using that memory mapping to generate return
> >>>>>>> pointers to the code that writes the measurement entries to memory.
> >>>>>> I don't understand why you would want to do that. I might have misunderstood
> >>>>>> the requirements, but this was not how I meant for tmpfile to be used.
> >>>>>>
> >>>>>> Mimi explained to me that currently the IMA measurement list is entirely in
> >>>>>> memory and that you are looking for a way to dump it into a file in order to
> >>>>>> free up memory.
> >>>>>>
> >>>>>> What I suggested is this:
> >>>>>>
> >>>>>> - User opens an O_TMPFILE and passes fd to IMA to start export
> >>>>>> - IMA starts writing (exporting) records to that file using *kernel* write API
> >>>>>> - Every record written to the file is removed from the in-memory list
> >>>>>> - While list is being exported, IMA keeps in-memory count of exported entries
> >>>>>> - In ima_measurements_start(), if export file exists, the iterator
> >>>>>>     starts reading records from the file
> >>>>>> - In ima_measurements_next(), when next iterator reaches the export count,
> >>>>>>     it switches over to iterate in-memory list
> >>>>>>
> >>>>>> This process can:
> >>>>>> 1. Continue forever without maintaining any in-memory list
> >>>>>> 2. Work in the background to periodically flush list to file
> >>>>>> 3. Be controlled by explicit user commands
> >>>>>> 4. All of the above
> >>>>>>
> >>>>>> Is that understood? Did I understand the requirements correctly?
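
The export and read-back steps listed above can be modelled in userspace
with a short sketch. This is only an illustration of the control flow, not
kernel code: the class and method names are hypothetical, records are
stored as JSON lines purely for convenience, and an unnamed Python
temporary file stands in for the O_TMPFILE.

```python
import json
import tempfile
from collections import deque


class MeasurementLog:
    """Toy model: exported records live in a temp file, newer ones in memory."""

    def __init__(self):
        # Unnamed temporary file, standing in for the O_TMPFILE in the thread.
        self.export_file = tempfile.TemporaryFile(mode="w+")
        self.exported = 0         # in-memory count of exported entries
        self.in_memory = deque()  # entries not yet exported

    def add(self, record):
        """Append a new measurement record to the in-memory list."""
        self.in_memory.append(record)

    def flush(self):
        """Write each in-memory record to the file, removing it from memory."""
        self.export_file.seek(0, 2)  # position at end of file before appending
        while self.in_memory:
            record = self.in_memory.popleft()
            self.export_file.write(json.dumps(record) + "\n")
            self.exported += 1
        self.export_file.flush()

    def __iter__(self):
        """Yield exported records first, then switch to the in-memory list."""
        self.export_file.seek(0)
        for _ in range(self.exported):
            yield json.loads(self.export_file.readline())
        # Export count reached: continue with whatever is still in memory.
        yield from list(self.in_memory)
```

A reader iterating over a MeasurementLog sees one continuous list, oldest
entry first, regardless of how many entries have already been flushed.
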
> >>>> Thanks for the clarification Amir, I never actually saw your initial mails,
> >>>> I apologize for the confusion, the use of mmap was something the original
> >>>> author of the export ima logs to disk mentioned had been suggested, which
> >>>> is why I went down that route.
> >>>> Given the actual suggestion you originally had given, I believe the coding
> >>>> of it is somewhat similar to the code I sent in the RFC in terms of approach
> >>>> (if we were to have it do periodic flushes, for example), with the addition
> >>>> of reads of the log starting with the file, as the oldest logs will be there.
> >>>> I believe the only difference there is whether the list is kept in a tmp
> >>>> file or not, so with the tmp file approach it would be just to keep the
> >>>> list out of memory (either partially or permanently), where with a permanent
> >>>> file, the list would still be available after a cold boot for instance.
> >>> With Amir's suggestion, userspace still accesses the entire measurement
> >>> list via the existing securityfs interface.  Only the kernel should be
> >>> able to append or access the file.
> >>>
> >> This user API is not an important part of the suggestion:
> >>
> >> - User opens an O_TMPFILE and passes fd to IMA to start export
> >>
> >> It is just how I understood the API should be.
> >> Kernel could open the O_TMPFILE or named file for that matter just as well.
> >> If the kernel opens an O_TMPFILE, userspace has no standard way to access
> >> that file. There are, as always, ways for privileged users to learn about that
> >> tmpfile and open it with open_by_handle_at().
> >>
> >> IMA is an LSM, so the best way to block unauthorized access to that file
> >> would be via LSM hooks. IMA keeps a reference to that file, so it can
> >> identify access to that file from userspace.
> > Having the kernel open an O_TMPFILE and use/define additional LSM hooks,
> > as needed, to limit access to the file sounds good.
> >
> > In terms of the rest of the userspace interface, I would probably
> > define a new IMA securityfs file to control the frequency that the
> > measurements are written to the file (e.g. 0 == never, 1 == enabled
> > with default frequency, anything else == the frequency to use).
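
As a rough illustration of the value encoding described above, a parser for
such a control file might look like the following sketch. The function
name, the default value, and the unit are all assumptions; the thread
specifies none of them.

```python
# Hypothetical default; the thread gives neither a default value nor a unit.
DEFAULT_FREQUENCY = 60


def parse_export_frequency(text):
    """Map a control-file value to a flush frequency, or None if disabled."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("frequency must be non-negative")
    if value == 0:
        return None               # 0 == never write to the file
    if value == 1:
        return DEFAULT_FREQUENCY  # 1 == enabled with default frequency
    return value                  # anything else is taken as the frequency
```
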
> >
> > thanks,
> >
> > Mimi
> 
> Regarding the comparison between the original approach in the RFC,
> using permanent files, and the TMP file approach (with me having the
> correct suggestion in mind now), I believe there are still some
> benefits from having a file be permanent:
> 	- It provides a way to always keep the logs across the kexec
> 	  boundary, regardless of how much they've grown in size,
> 	  as the file will persist through it, whereas with a tmp
> 	  file, the ability to persist the logs through kexec would
> 	  still be subject to the same memory restrictions it currently
> 	  is.
> 	- It provides a way for an attestation service to have a
> 	  history of measurements performed during previous cold boot
> 	  cycles.
> 
> The above, however, does introduce the requirement of persisting
> PCR information that would allow verifying historical logs, and
> that information would have to be verifiable in some way (for instance,
> by having the TPM add a signature alongside the logs).
> 
> I think both solutions have their merits, and the TMP file approach
> seems much simpler overall. What I have in mind is perhaps having this
> be configurable, where a file can be defined to hold the logs, but
> the persisting of logs to disk can still be turned on without that
> file being configured, leading to a TMP file being used. That would
> leave it up to the admin to decide whether a permanent file is needed.
> 
> The above is in addition to any other configurations that may be
> applicable, such as one for the frequency as suggested by Mimi.

With the existing mechanism of exporting the IMA measurement list,
userspace can do whatever it likes with it and save it wherever it
likes (e.g. define a cron job to export the measurement list, the
number of measurements, and the TPM PCRs, even storing them in a
blockchain).  I don't see the benefit in defining a new kernel mechanism
for exporting the IMA measurement list.
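
A minimal sketch of such a userspace export job, suitable for running from
cron, is shown here. It assumes securityfs is mounted at
/sys/kernel/security; the function name and the destination layout are
hypothetical. Saving the TPM PCR values would need a separate TPM tool and
is not shown. The list can grow between the two reads, so the saved count
may be slightly larger than the number of entries in the saved list.

```python
import time
from pathlib import Path

# Usual location of the IMA securityfs files; adjust if mounted elsewhere.
IMA_DIR = Path("/sys/kernel/security/ima")


def export_measurements(dest_dir, ima_dir=IMA_DIR):
    """Save the current measurement list and entry count under dest_dir."""
    ima_dir = Path(ima_dir)
    # One timestamped directory per export run (hypothetical layout).
    dest = Path(dest_dir) / time.strftime("ima-%Y%m%dT%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)

    # Read the whole list in one go; the binary form is the one a verifier
    # would replay against the TPM PCR values.
    data = (ima_dir / "binary_runtime_measurements").read_bytes()
    (dest / "binary_runtime_measurements").write_bytes(data)

    # Record the number of measurements alongside the list.
    count = (ima_dir / "runtime_measurements_count").read_text().strip()
    (dest / "runtime_measurements_count").write_text(count + "\n")
    return dest
```

Reading the real securityfs files normally requires root, so the job would
have to run with sufficient privileges.
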

Mimi