From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.90_1) id 1lOh48-0002Yu-UR for mharc-grub-devel@gnu.org; Tue, 23 Mar 2021 09:27:24 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57876) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lOh47-0002Yg-4g for grub-devel@gnu.org; Tue, 23 Mar 2021 09:27:23 -0400 Received: from b-painless.mh.aa.net.uk ([2001:8b0:0:30::52]:55713) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lOh44-0006cN-KE for grub-devel@gnu.org; Tue, 23 Mar 2021 09:27:22 -0400 Received: from f.b.1.7.2.1.e.f.f.f.a.c.5.0.a.6.4.1.b.e.2.f.f.b.0.b.8.0.1.0.0.2.ip6.arpa ([2001:8b0:bff2:eb14:6a05:caff:fe12:71bf] helo=riva.pelham.vpn.ucam.org) by painless-b.tch.aa.net.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lOh41-00064B-62; Tue, 23 Mar 2021 13:27:17 +0000 Received: from ns1.pelham.vpn.ucam.org ([172.20.153.2] helo=riva.ucam.org) by riva.pelham.vpn.ucam.org with esmtp (Exim 4.92) (envelope-from ) id 1lOh3z-0003vg-Tp; Tue, 23 Mar 2021 13:27:15 +0000 Date: Tue, 23 Mar 2021 13:27:15 +0000 From: Colin Watson To: Javier Martinez Canillas Cc: Michael Chang , The development of GNU GRUB , Marco A Benatto Subject: Re: [PATCH v2] i386-pc: build verifiers API as module Message-ID: <20210323132715.GP26923@riva.ucam.org> References: <20210318113026.24963-1-mchang@suse.com> <20210322152000.ebheegnkkhpqa4d3@tomti.i.net-space.pl> <20210323041621.GA4480@mercury> <000fc9a9-25cd-4d8c-5472-ac16358d888b@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000fc9a9-25cd-4d8c-5472-ac16358d888b@redhat.com> User-Agent: Mutt/1.10.1 (2018-07-13) Received-SPF: none client-ip=2001:8b0:0:30::52; envelope-from=cjwatson@debian.org; helo=b-painless.mh.aa.net.uk X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Mar 2021 13:27:23 -0000 On Tue, Mar 23, 2021 at 12:37:20PM +0100, Javier Martinez Canillas wrote: > On 3/23/21 5:16 AM, Michael Chang wrote > > Afterall, keeping existing running system to survive update (NOT new > > install) is really an important thing as many can't afford that to > > happen. If we can make it any better to reduce the cost please consider > > to do it. It doesn't conflict with the purpose to stop the short mbr gap > > support, given we all know the broken system can be avoided in the first > > place ... > > My take on this is that we are conflating the need for distros to prevent an > update to break their users' existing installs, with a new GRUB release that > ships a set of CVE fixes (among other things). > > I do agree that distros should avoid breaking users' installs, but don't think > that upstream should keep supporting the 63 sectors embedding are size forever > just to facilitate that. Otherwise it is a race to the bottom, since GRUB will > have to pile up workarounds and massage the code just to keep this constraint. We don't need to take this out of proportion, though: the actual workarounds being discussed here are not that complicated, and certainly *far* less complicated than some of the existing workarounds that exist only for i386-pc. > In the past, we posted other patches that we had been carrying for a long time > downstream (i.e: "kern: Make grub_error() more verbose" [0]) and were told it > couldn't be accepted due increasing the core.img size. It's really hard to add > new stuff by keeping this constraint, without having a lot of ifdefery in GRUB. This has been an issue in some form or another for as long as I've been working on GRUB. It's not clear to me why it's particularly urgent to break things now, when patches to make things work again exist and aren't of unreasonable complexity compared to the rest of GRUB. (To be clear, I would be perfectly happy to be told that particular details of some particular patch are problematic; it's the blanket refusal even for simple and innocuous patches that don't add complexity or in some cases might even remove complexity that I find unreasonable.) I agree that the size constraints can be difficult at times, but well, that's boot loaders for you; they're a constrained environment. After all, as I understand it, the entire system of a core GRUB image with loadable modules originally came into existence due to this kind of size constraint. For those pointing to the documentation change as justification of unilaterally ignoring certain constraints, I'd say that the number of users who'll actually have seen that in advance of their systems being broken is likely to be a pretty good approximation to zero. I support the basic idea of the documentation change, but it needs a *much* longer deprecation cycle. Yes, that may be inconvenient for us as GRUB developers. Zooming out a bit, it's in the interest of the whole free software ecosystem for security updates to be reliable and not break existing use cases. I don't think any of us want users to be any more hesitant about installing security updates than they already are: the world is generally better off if we can deploy security updates as quickly as possible. The argument I'm making here is, I think, the same sort of argument by which Torvalds famously has a very low tolerance for userspace regressions in the Linux kernel. The argument there runs something like this: perhaps userspace is being objectively unreasonable, but nevertheless, it exists in the real world and nobody is helped by having to solve n-dimensional simultaneous equations in order to work out whether they can install a given new kernel version. Maintaining an intolerance of regressions reduces the friction for installing updates. > Also, most partitions tools have been aligning to a MiB boundary for quite some > time by now. So I don't believe is the correct trade-off for upstream to support > the 31 KiB embedding gap, in detriment of the 1 MiB case that's much more common. Right, I remember working on changes to improve alignment handling as far back as 2010. That's a good thing, but unfortunately the nature of the beast here is that old mistakes stick around for a long time. > Yes, it's a bummer to have to carry downstream patches (and we have a bunch in > our distro) just to support existing users. But it's up to each distro to decide > what's the proper action to take for their supported OS releases (e.g: rebase to > the latest 2.06 version with some patches, only backport security fixes, etc). > > For this particular case, it might be better for distros to just revert commit > 9e95f45ceee ("verifiers: Move verifiers API to kernel image") instead of making > it conditional for i386-pc, adding complexity to the GRUB upstream code IMO. That would also mean skipping or substantially modifying your lockdown patch that followed it, which requires great care. I did something like this in various forms for our security updates because there wasn't much choice there, but I'm not keen on it as a long-term solution. In the long term, we do seem to want to have the verifiers API in the kernel image at least for EFI platforms, don't we? So reverting that patch entirely seems like a bad move, and Michael's approach seems a reasonable compromise. > As mentioned, not every upstream change may work for all distros. For example, > we had to revert e346414725a ("templates: Disable the os-prober by default") > since otherwise users that don't have anything set in their /etc/default/grub > wouldn't have entries for other OS in their GRUB config file anymore. I'm very substantially less concerned about distro divergence in things like grub-mkconfig. While it can introduce confusion, it's typically much less security-critical. Let me be clear here: I'm no newcomer to the business of maintaining large distro patch stacks for GRUB, far from it. I've been doing it for over a decade and I'm well aware of the trade-offs. Often it's either the right thing to do or at least the path of least resistance. Some of the patches I shepherd make it upstream, some never really do, some need significant reworking; that's life. I'm not expecting Debian to get to zero upstream patches against GRUB before I hit retirement age, and I don't think that's the point here. Rather, I don't believe that the existence of distros as intermediaries means that upstream can simply declare a whole class of regressions to be a distro problem, with a mere handful of months of notice in the form of a documentation change in an unreleased version before (no doubt unintentionally) landing patches that break some of the systems in question, and not expect at least some pushback about that. Finally, I also notice that the person who sent the patch at the head of this thread is also the same person who sent https://lists.gnu.org/archive/html/grub-devel/2020-03/msg00059.html, so it seems quite ironic to reject the patch on that basis ... Thanks, -- Colin Watson (he/him) [cjwatson@debian.org]