Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3] ext4: wipe filename upon file deletion
@ 2021-04-19 16:21 Leah Rumancik
  2021-04-19 22:53 ` Eric Biggers
  0 siblings, 1 reply; 6+ messages in thread
From: Leah Rumancik @ 2021-04-19 16:21 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, Leah Rumancik

Upon file deletion, zero out all fields in ext4_dir_entry2 besides inode
and rec_len. In case sensitive data is stored in filenames, this ensures
no potentially sensitive data is left in the directory entry upon deletion.
Also, wipe these fields upon moving a directory entry during the conversion
to an htree and when splitting htree nodes.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/ext4/namei.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 883e2a7cd4ab..df7809a4821f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1778,6 +1778,11 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count,
 		((struct ext4_dir_entry_2 *) to)->rec_len =
 				ext4_rec_len_to_disk(rec_len, blocksize);
 		de->inode = 0;
+
+		/* wipe name_len through and name field */
+		memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
+						blocksize) - 6);
+
 		map++;
 		to += rec_len;
 	}
@@ -2102,6 +2107,7 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 	data2 = bh2->b_data;
 
 	memcpy(data2, de, len);
+	memset(de, 0, len); /* wipe old data */
 	de = (struct ext4_dir_entry_2 *) data2;
 	top = data2 + len;
 	while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top)
@@ -2492,6 +2498,11 @@ int ext4_generic_delete_entry(struct inode *dir,
 			else
 				de->inode = 0;
 			inode_inc_iversion(dir);
+
+			/* wipe name_len through name field */
+			memset(&de->name_len, 0,
+				ext4_rec_len_from_disk(de->rec_len, blocksize) - 6);
+
 			return 0;
 		}
 		i += ext4_rec_len_from_disk(de->rec_len, blocksize);
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] ext4: wipe filename upon file deletion
  2021-04-19 16:21 [PATCH v3] ext4: wipe filename upon file deletion Leah Rumancik
@ 2021-04-19 22:53 ` Eric Biggers
  2021-04-20  1:59   ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Biggers @ 2021-04-19 22:53 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: linux-ext4, tytso

On Mon, Apr 19, 2021 at 04:21:00PM +0000, Leah Rumancik wrote:
> Upon file deletion, zero out all fields in ext4_dir_entry2 besides inode
> and rec_len. In case sensitive data is stored in filenames, this ensures
> no potentially sensitive data is left in the directory entry upon deletion.
> Also, wipe these fields upon moving a directory entry during the conversion
> to an htree and when splitting htree nodes.

This should include more explanation about why this is useful, and what its
limitations are (e.g. how do the properties of the storage device affect whether
the filename is *really* deleted)...

> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 883e2a7cd4ab..df7809a4821f 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1778,6 +1778,11 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count,
>  		((struct ext4_dir_entry_2 *) to)->rec_len =
>  				ext4_rec_len_to_disk(rec_len, blocksize);
>  		de->inode = 0;
> +
> +		/* wipe name_len through and name field */
> +		memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
> +						blocksize) - 6);
> +

The comment is confusing.  IMO it would make more sense to mention what is *not*
being zeroed:

	/* wipe the dir_entry excluding the rec_len field */
	de->inode = 0;
	memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
						blocksize) - 6);

> @@ -2492,6 +2498,11 @@ int ext4_generic_delete_entry(struct inode *dir,
>  			else
>  				de->inode = 0;
>  			inode_inc_iversion(dir);
> +
> +			/* wipe name_len through name field */
> +			memset(&de->name_len, 0,
> +				ext4_rec_len_from_disk(de->rec_len, blocksize) - 6);
> +
>  			return 0;

And maybe here too, although here why is the condition for setting the inode to
0 not the same as the condition for zeroing the other fields?

Also, maybe use offsetof(struct ext4_dir_entry_2, name_len) instead of '6'...

- Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] ext4: wipe filename upon file deletion
  2021-04-19 22:53 ` Eric Biggers
@ 2021-04-20  1:59   ` Theodore Ts'o
  2021-04-20 14:55     ` Leah Rumancik
  2021-04-22 17:44     ` Andreas Dilger
  0 siblings, 2 replies; 6+ messages in thread
From: Theodore Ts'o @ 2021-04-20  1:59 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Leah Rumancik, linux-ext4

On Mon, Apr 19, 2021 at 03:53:52PM -0700, Eric Biggers wrote:
> On Mon, Apr 19, 2021 at 04:21:00PM +0000, Leah Rumancik wrote:
> > Upon file deletion, zero out all fields in ext4_dir_entry2 besides inode
> > and rec_len. In case sensitive data is stored in filenames, this ensures
> > no potentially sensitive data is left in the directory entry upon deletion.
> > Also, wipe these fields upon moving a directory entry during the conversion
> > to an htree and when splitting htree nodes.
> 
> This should include more explanation about why this is useful, and what its
> limitations are (e.g. how do the properties of the storage device affect whether
> the filename is *really* deleted)...

Well, it might be useful to talk about how this is not a complete
solution on its own (acknowledge that more changes to make sure
filenames aren't leaked in the journal will be forthcoming).

However, there is a limit to how much we can put in a commit
description, and I'd argue that the people for whom caveats about
flash devices having old copies of directory blocks which could be
extracted by a nation-state intelligence angency, etc., are not likely
going to be the people reading the git commit description.  :-)  That's
the sort of thing that is best placed in a presentation given at a
conference, or in a white paper, or in LWN article.

Commit descriptions are targetted at developers, so a note that "more
commits to follow" would be appropriate.

> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 883e2a7cd4ab..df7809a4821f 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -1778,6 +1778,11 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count,
> >  		((struct ext4_dir_entry_2 *) to)->rec_len =
> >  				ext4_rec_len_to_disk(rec_len, blocksize);
> >  		de->inode = 0;
> > +
> > +		/* wipe name_len through and name field */
> > +		memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
> > +						blocksize) - 6);
> > +

This change in dx_move_dirents() does work, but I wonder if it would
have been better / more efficient to simply zero out the last
directory entry in dx_pack_dirents() after it is done packing the
directory entries in the original directory block?


> The comment is confusing.  IMO it would make more sense to mention what is *not*
> being zeroed:
> 
> 	/* wipe the dir_entry excluding the rec_len field */

Or maybe, "wipe everything in the directory entry after the rec_len
field".

> > @@ -2492,6 +2498,11 @@ int ext4_generic_delete_entry(struct inode *dir,
> >  			else
> >  				de->inode = 0;
> >  			inode_inc_iversion(dir);
> > +
> > +			/* wipe name_len through name field */
> > +			memset(&de->name_len, 0,
> > +				ext4_rec_len_from_disk(de->rec_len, blocksize) - 6);
> > +
> >  			return 0;
> 
> And maybe here too, although here why is the condition for setting the inode to
> 0 not the same as the condition for zeroing the other fields?

I'd actually suggest wiping the directory entry *before* the "if
(pde)" statement, and yeah, it's probably best to zap the de->inode
unconditionally.

What is going on is if there is a previoud directory entry ("if (pde)
...) the the original code wasn't changing the directory entry at all,
including zero'ing the inode field, but instead simply expanding the
previous directory entry's rec_len to include the directory entry
being deleted.  So in the original code, where the goal is to make
life as easy as possible for undelete programs, skipping "de->inode =
0" when it was unnecessary was a good thing.

But given that the new design goal of the code is, "to heck with
undelete programs, we want to shred anything that's no longer needed",
clearing the inode number is fine.

In fact, what we could actually do is in the if (pde) case, we can zap
the entire directory entry, include de->rec_len.  The advantage of
doing that is it becomes a lot easier to verify that the wiping code
is working correctly.  We can simply check to make sure everything in
every directory entry after the end of the name (e.g., everything between
&de->name[de->name_pen) and ((char *) de) + de->rec_len) MUST be zero.

> Also, maybe use offsetof(struct ext4_dir_entry_2, name_len) instead of '6'...

Sure.  Someone will still need to look at the definition of struct
ext4_dir_entry_2 to understand the structure layout, but offsetof(..)
is going to be a bit more understandable than a magic constant of '6'.

   	       	     	  		      - Ted


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] ext4: wipe filename upon file deletion
  2021-04-20  1:59   ` Theodore Ts'o
@ 2021-04-20 14:55     ` Leah Rumancik
  2021-04-22 17:44     ` Andreas Dilger
  1 sibling, 0 replies; 6+ messages in thread
From: Leah Rumancik @ 2021-04-20 14:55 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Eric Biggers, linux-ext4

On Mon, Apr 19, 2021 at 09:59:06PM -0400, Theodore Ts'o wrote:
> On Mon, Apr 19, 2021 at 03:53:52PM -0700, Eric Biggers wrote:
> > On Mon, Apr 19, 2021 at 04:21:00PM +0000, Leah Rumancik wrote:
> > > Upon file deletion, zero out all fields in ext4_dir_entry2 besides inode
> > > and rec_len. In case sensitive data is stored in filenames, this ensures
> > > no potentially sensitive data is left in the directory entry upon deletion.
> > > Also, wipe these fields upon moving a directory entry during the conversion
> > > to an htree and when splitting htree nodes.
> > 
> > This should include more explanation about why this is useful, and what its
> > limitations are (e.g. how do the properties of the storage device affect whether
> > the filename is *really* deleted)...
> 
> Well, it might be useful to talk about how this is not a complete
> solution on its own (acknowledge that more changes to make sure
> filenames aren't leaked in the journal will be forthcoming).
> 
> However, there is a limit to how much we can put in a commit
> description, and I'd argue that the people for whom caveats about
> flash devices having old copies of directory blocks which could be
> extracted by a nation-state intelligence angency, etc., are not likely
> going to be the people reading the git commit description.  :-)  That's
> the sort of thing that is best placed in a presentation given at a
> conference, or in a white paper, or in LWN article.
> 
> Commit descriptions are targetted at developers, so a note that "more
> commits to follow" would be appropriate.

I'll add a bit more to the description to help clarify some more.

> 
> > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > > index 883e2a7cd4ab..df7809a4821f 100644
> > > --- a/fs/ext4/namei.c
> > > +++ b/fs/ext4/namei.c
> > > @@ -1778,6 +1778,11 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count,
> > >  		((struct ext4_dir_entry_2 *) to)->rec_len =
> > >  				ext4_rec_len_to_disk(rec_len, blocksize);
> > >  		de->inode = 0;
> > > +
> > > +		/* wipe name_len through and name field */
> > > +		memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
> > > +						blocksize) - 6);
> > > +
> 
> This change in dx_move_dirents() does work, but I wonder if it would
> have been better / more efficient to simply zero out the last
> directory entry in dx_pack_dirents() after it is done packing the
> directory entries in the original directory block?
> 
> 

Hmm, I was basing it on the fact that it felt more logical to put it with
the code that actually moves the entry in case move_dirents() is ever called
without a packing after. But I see that move_dirents() and pack_dirents()
are each only called once in do_split() so perhaps moving it to
pack_dirents() would be best.

> > The comment is confusing.  IMO it would make more sense to mention what is *not*
> > being zeroed:
> > 
> > 	/* wipe the dir_entry excluding the rec_len field */
> 
> Or maybe, "wipe everything in the directory entry after the rec_len
> field".

Thanks for the suggestion, I was having a hard time coming up with something
understandable.

> 
> > > @@ -2492,6 +2498,11 @@ int ext4_generic_delete_entry(struct inode *dir,
> > >  			else
> > >  				de->inode = 0;
> > >  			inode_inc_iversion(dir);
> > > +
> > > +			/* wipe name_len through name field */
> > > +			memset(&de->name_len, 0,
> > > +				ext4_rec_len_from_disk(de->rec_len, blocksize) - 6);
> > > +
> > >  			return 0;
> > 
> > And maybe here too, although here why is the condition for setting the inode to
> > 0 not the same as the condition for zeroing the other fields?
> 
> I'd actually suggest wiping the directory entry *before* the "if
> (pde)" statement, and yeah, it's probably best to zap the de->inode
> unconditionally.
> 
> What is going on is if there is a previoud directory entry ("if (pde)
> ...) the the original code wasn't changing the directory entry at all,
> including zero'ing the inode field, but instead simply expanding the
> previous directory entry's rec_len to include the directory entry
> being deleted.  So in the original code, where the goal is to make
> life as easy as possible for undelete programs, skipping "de->inode =
> 0" when it was unnecessary was a good thing.
> 
> But given that the new design goal of the code is, "to heck with
> undelete programs, we want to shred anything that's no longer needed",
> clearing the inode number is fine.
> 
> In fact, what we could actually do is in the if (pde) case, we can zap
> the entire directory entry, include de->rec_len.  The advantage of
> doing that is it becomes a lot easier to verify that the wiping code
> is working correctly.  We can simply check to make sure everything in
> every directory entry after the end of the name (e.g., everything between
> &de->name[de->name_pen) and ((char *) de) + de->rec_len) MUST be zero.
> 

Yes this all makes sense, I'll update it. I'll also add a test similar to what
you described in the fstest I'm working on.

> > Also, maybe use offsetof(struct ext4_dir_entry_2, name_len) instead of '6'...
> 
> Sure.  Someone will still need to look at the definition of struct
> ext4_dir_entry_2 to understand the structure layout, but offsetof(..)
> is going to be a bit more understandable than a magic constant of '6'.
> 

Sounds good to me.

>    	       	     	  		      - Ted
> 

Thanks for the comments,
Leah

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] ext4: wipe filename upon file deletion
  2021-04-20  1:59   ` Theodore Ts'o
  2021-04-20 14:55     ` Leah Rumancik
@ 2021-04-22 17:44     ` Andreas Dilger
  2021-04-22 19:58       ` Theodore Ts'o
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2021-04-22 17:44 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Eric Biggers, Leah Rumancik, linux-ext4


[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On Apr 19, 2021, at 7:59 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> On Mon, Apr 19, 2021 at 03:53:52PM -0700, Eric Biggers wrote:
>> On Mon, Apr 19, 2021 at 04:21:00PM +0000, Leah Rumancik wrote:
>>> Upon file deletion, zero out all fields in ext4_dir_entry2 besides inode
>>> and rec_len. In case sensitive data is stored in filenames, this ensures
>>> no potentially sensitive data is left in the directory entry upon deletion.
>>> Also, wipe these fields upon moving a directory entry during the conversion
>>> to an htree and when splitting htree nodes.
>> 
>> This should include more explanation about why this is useful, and what its
>> limitations are (e.g. how do the properties of the storage device affect whether
>> the filename is *really* deleted)...
> 
> Well, it might be useful to talk about how this is not a complete
> solution on its own (acknowledge that more changes to make sure
> filenames aren't leaked in the journal will be forthcoming).
> 
> However, there is a limit to how much we can put in a commit
> description, and I'd argue that the people for whom caveats about
> flash devices having old copies of directory blocks which could be
> extracted by a nation-state intelligence angency, etc., are not likely
> going to be the people reading the git commit description.  :-)  That's
> the sort of thing that is best placed in a presentation given at a
> conference, or in a white paper, or in LWN article.
> 
> Commit descriptions are targetted at developers, so a note that "more
> commits to follow" would be appropriate.

Since the "delete-after-the-fact" method of security is always going
to have holes in terms of recovering data from the journal, from the
flash device, etc. why not use fscrypt for this kind of workload, if
the data actually needs to be secure?

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] ext4: wipe filename upon file deletion
  2021-04-22 17:44     ` Andreas Dilger
@ 2021-04-22 19:58       ` Theodore Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2021-04-22 19:58 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Eric Biggers, Leah Rumancik, linux-ext4

On Thu, Apr 22, 2021 at 11:44:49AM -0600, Andreas Dilger wrote:
> Since the "delete-after-the-fact" method of security is always going
> to have holes in terms of recovering data from the journal, from the
> flash device, etc. why not use fscrypt for this kind of workload, if
> the data actually needs to be secure?

Wiping the journal is something that will be coming soon --- prototype
versions of that patch have been sent out, and the main controversy
has been whether it should be an ext4-specific interface, or whether
it should be done in a file system independent API, and if so, how to
define it.

Whether or not you can recover it from the block device is very block
device specific.  There are certainly situations, such as people
running VM's on AWS, Azure, GCP, etc., where they don't have physical
access to the block device, where making sure it can be wiped so it
can't be accessed via software is quite sufficient.  Even if you have
physical access to block device, recovering overwritten information
from a HDD is *not* trivial.  Not all adversaries have access to
scanning electronic microscopes, and demonstrations that overwritten
disk sectors were done decades ago, when the magnetic domains were far
larger.

Using fscrypt is certainly an option, but using encryption is not free
from a performance standpoint, and you still have to answer the
question of where the encryption keys would be stored.

Cheers,

					- Ted

P.S.  Interesting info from
https://security.stackexchange.com/questions/26132/is-data-remanence-a-myth:

    The best citation I can give is from Overwriting Hard Drive Data: The
    Great Wiping Controversy, which was published as part of the 4th
    International Conference on Information Systems Security, ICISS
    2008. You can view the full text of the paper by viewing the book on
    Google Books, and jumping to page 243.

    The following excerpt is from their conclusion:

        The purpose of this paper was a categorical settlement to the
        controversy surrounding the misconceptions involving the belief that
        data can be recovered following a wipe procedure. This study has
        demonstrated that correctly wiped data cannot reasonably retrieved
        even if it of a small size or found only over small parts of the hard
        drive. Not even with the use of a MFM or other known methods. The
        belief that a tool can be developed to retrieve gigabytes or terabytes
        of data of information from a wiped drive is in error.

        Although there is a good chance of recovery for any individual bit
        from a drive, the chance of recovery of any amount of data from a
        drive using an electron microscope are negligible. Even speculating on
        the possible recovery of an old drive, there is no likelihood that any
        data would be recoverable from the drive. The forensic recovery of
        data using electron microscopy is infeasible. This was true both on
        old drives and has become more difficult over tine. Further, there is
        a need for the data to have been written and then wiped on a raw
        unused drive for there to be any hopy of any level of recovery even at
        the bit level, which does not reflect real situations. It is unlikely
        that a recovered drive will have not been used for a period of time
        and the interaction of defragmentation, file copies and general use
        that overwrites data areas negates any chance of data recovery. The
        fallacy that data can be forensically recovered using an electron
        microscope or related means needs to be put to rest.

    NIST also seem to agree. In NIST SP 800-88, they state the following:

        Studies have shown that most of today’s media can be effectively
        cleared by one overwrite.

        Purging information is a media sanitization process that protects the
        confidentiality of information against a laboratory attack. For some
        media, clearing media would not suffice for purging. However, for ATA
        disk drives manufactured after 2001 (over 15 GB) the terms clearing
        and purging have converged.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19 16:21 [PATCH v3] ext4: wipe filename upon file deletion Leah Rumancik
2021-04-19 22:53 ` Eric Biggers
2021-04-20  1:59   ` Theodore Ts'o
2021-04-20 14:55     ` Leah Rumancik
2021-04-22 17:44     ` Andreas Dilger
2021-04-22 19:58       ` Theodore Ts'o

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git