linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: inline extents
       [not found] ` <20180920014506.GL27618@dastard>
@ 2018-09-20  3:18   ` Chris Murphy
  2018-09-20 19:34     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 2+ messages in thread
From: Chris Murphy @ 2018-09-20  3:18 UTC (permalink / raw)
  To: Dave Chinner, Linux FS Devel; +Cc: xfs list, Btrfs BTRFS, linux-ext4

Adding fsdevel@, linux-ext4, and btrfs@ (which has a separate subject
on this same issue)



On Wed, Sep 19, 2018 at 7:45 PM, Dave Chinner <david@fromorbit.com> wrote:
>On Wed, Sep 19, 2018 at 10:23:38AM -0600, Chris Murphy wrote:
>> Fedora 29 has a new feature to test if boot+startup fails, so the
>> bootloader can do a fallback at next boot, to a previously working
>> entry. Part of this means GRUB (the bootloader code, not the user
>> space code) uses "save_env" to overwrite the 1024 data bytes with
>> updated environment information.
>
> That's just broken. Illegal. Completely unsupportable. Doesn't
> matter what the filesystem is, nobody is allowed to write directly
> to the block device a filesystem owns.

Yeah, the word I'm thinking of is abomination.

However in their defense, grubenv and the 'save_env' command are old features:

line 3638 @node Environment block
http://git.savannah.gnu.org/cgit/grub.git/tree/docs/grub.texi

"For safety reasons, this storage is only available when installed on a plain
disk (no LVM or RAID), using a non-checksumming filesystem (no ZFS), and
using BIOS or EFI functions (no ATA, USB or IEEE1275)."

I haven't checked how it tests for this. But by now, it should list
the supported file systems, rather than what's exempt. That's a
shorter list.


> ext4 has inline data, too, so there's every chance grub will corrupt
> ext4 filesystems with tit's wonderful new feature. I'm not sure if
> the ext4 metadata cksums cover the entire inode and inline data, but
> if they do it's the same problem as btrfs.

I don't see inline used with a default mkfs, but I do see metadata_csum

e2fsprogs-1.44.3-1.fc29.x86_64

Filesystem features: has_journal ext_attr resize_inode dir_index
filetype extent 64bit flex_bg sparse_super large_file huge_file
dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl

>
>> For XFS, I'm not sure how the inline extent is saved, and whether
>> metadata checksumming includes or excludes the inline extent.
>
> When XFS implements this, it will be like btrfs as the data will be
> covered by the metadata CRCs for the inode, and so writing directly
> to it would corrupt the inode and render it unreadable by the
> filesystem.

Good to know.


>
>> I'm also kinda ignoring the reflink ramifications of this behavior,
>> for now. Let's just say even if there's no corruption I'm really
>> suspicious of bootloader code writing anything, even what seems to be
>> a simple overwrite of two sectors.
>
> You're not the only one
>
> Like I said, it doesn't matter what the filesystem is, overwriting
> file data by writing directly to the block device is not
> supportable. It's essentially a filesystem corruption vector, and
> grub needs to have that functionality removed immediately.

I'm in agreement with respect to the more complex file systems. We've
already realized the folly of the bootloader being unable to do
journal replay, ergo it doesn't really for sure have a complete
picture of the file system anyway. That's suboptimal when it results
in boot failure. But if it were going to use stale file system
information, get a wrong idea of the file system, and then use that to
do even 1024 bytes of writes? No, no, and no.

Meanwhile, also in Fedoraland, it's one of the distros where grubenv
and grub.cfg stuff is on the EFI System partition, which is FAT. This
overwrite behavior will work there, but even this case is a kind of
betrayal that the file is being modified, without its metadata being
updated. I think it's an old era hack that by today's standards simply
isn't good enough. I'm a little surprised that all UEFI
implementations permit arbitrary writes from the pre-boot environment
to arbitrary block devices, even with Secure Boot enabled. That seems
specious.

I know some of the file systems have reserve areas for bootloader
stuff. I'm not sure if that's preferred over bootloaders just getting
their own partition and controlling it stem to stern however they
want.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: inline extents
  2018-09-20  3:18   ` inline extents Chris Murphy
@ 2018-09-20 19:34     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 2+ messages in thread
From: Theodore Y. Ts'o @ 2018-09-20 19:34 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Dave Chinner, Linux FS Devel, xfs list, Btrfs BTRFS, linux-ext4

On Wed, Sep 19, 2018 at 09:18:16PM -0600, Chris Murphy wrote:
> > ext4 has inline data, too, so there's every chance grub will corrupt
> > ext4 filesystems with tit's wonderful new feature. I'm not sure if
> > the ext4 metadata cksums cover the entire inode and inline data, but
> > if they do it's the same problem as btrfs.
> 
> I don't see inline used with a default mkfs, but I do see metadata_csum

The grub environment block is 1024 bytes, so unless you are using a 4k
inode size (the default is 256 bytes), the grub environment block
won't be an inline data file.  So in practice, this won't be a
problem.  So in practice this won't be a problem for ext4 today.

There are ways that grub can query the file system (using FIEMAP or
FS_IOC_GETFLAGS) to see whether or not the file is an inline file.
For xfs grub would also need to make sure the file isn't reflinked
file (which can also be queried using FIEMAP).  That's fine for
inline, since a file won't magically change from stored in a block
versus inline, so grub can check this at grub setup time.  But the
problem with reflink is that a file that was originally using a block
exclusively could later have that block be shared by another file, at
which point it's only safe to write the block using copy-on-write.

> I know some of the file systems have reserve areas for bootloader
> stuff. I'm not sure if that's preferred over bootloaders just getting
> their own partition and controlling it stem to stern however they
> want.

Ext4 had reserved the bootloader inode, and we have defined a way for
a bootloader to access it.  The original intent was that not it just
be fore the environment block, but also the grub stage 1.5 or stage 2
loader.  This would allow grub to just use a fixed list of blocks or
extents, and not need to understand the ext4 extents structures.  We
added way this when, because we anticipated this would be easier for
grub than having it learn how to read ext4 extents structures.
Instead, the bootloader would at setup time use the EXT4_IOC_SWAP_BOOT
ioctl and then use a fixed list of blocks or block extents like LILO
used to do.

Of course, grub exceeded expectations and actually learned how to read
ext4 file systems, and probably we (the ext4 developers) should have
reached out to GRUB folks.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-09-21  1:19 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAJCQCtQVzmwey0GcuHmZ8=_XuqmrA24d4DDVVuX9vL3KYStwZg@mail.gmail.com>
     [not found] ` <20180920014506.GL27618@dastard>
2018-09-20  3:18   ` inline extents Chris Murphy
2018-09-20 19:34     ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).