From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.71) id 1ZmABx-0005wq-V6 for mharc-grub-devel@gnu.org; Tue, 13 Oct 2015 20:45:17 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33076) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZmAAb-0005iG-14 for grub-devel@gnu.org; Tue, 13 Oct 2015 20:43:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZmAAV-0003PY-Th for grub-devel@gnu.org; Tue, 13 Oct 2015 20:43:52 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:33401) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZmAAV-0003PR-NG for grub-devel@gnu.org; Tue, 13 Oct 2015 20:43:47 -0400 Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t9E0he8L016941 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 14 Oct 2015 00:43:40 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id t9E0hdbu031735 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Wed, 14 Oct 2015 00:43:40 GMT Received: from abhmp0007.oracle.com (abhmp0007.oracle.com [141.146.116.13]) by userv0122.oracle.com (8.13.8/8.13.8) with ESMTP id t9E0hdrh008699; Wed, 14 Oct 2015 00:43:39 GMT Received: from [10.97.124.35] (/71.6.63.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 13 Oct 2015 17:43:39 -0700 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: [grub PATCH] efinet: disable MNP background polling From: Seth Goldberg X-Mailer: iPhone Mail (13A452) In-Reply-To: <561D83E9.6050703@redhat.com> Date: Tue, 13 Oct 2015 17:43:36 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <343ACDF7-1546-497F-90E2-A8BE8841324B@oracle.com> References: <20151001.182655.371384337.d.hatayama@jp.fujitsu.com> <560D1E07.3090902@redhat.com> <20151013214919.GA6140@router-fw-old.local.net-space.pl> <561D83E9.6050703@redhat.com> To: The development of GNU GRUB X-Source-IP: aserv0022.oracle.com [141.146.126.234] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] X-Received-From: 141.146.126.69 Cc: arvidjaar@gmail.com, daniel.kiper@oracle.com, HATAYAMA Daisuke , glin@suse.com, Mark Salter , edk2-devel-01 X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: The development of GNU GRUB List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Oct 2015 00:43:54 -0000 > On Oct 13, 2015, at 3:21 PM, Laszlo Ersek wrote: >=20 >> On 10/13/15 23:49, Daniel Kiper wrote: >> Hi Laszlo, >>=20 >> First of all, thanks a lot for very nice explanation! >>=20 >>> On Thu, Oct 01, 2015 at 01:50:31PM +0200, Laszlo Ersek wrote: >>> CC'ing Mark Salter, and edk2-devel, also updating the subject slightly >>> for better context. >>>=20 >>>> On 10/01/15 11:26, HATAYAMA Daisuke wrote: >>>> Currently, as of the commit f348aee7b33dd85e7da62b497a96a7319a0bf9dd, >>>> SNP is exclusively reopened to avoid slowdown or complete failure to >>>> load files due to races between MNP background polling and grub's >>>> receiving and transmitting packets. >>>>=20 >>>> This exclusive SNP reopen affects some EFI applications/drivers that >>>> use SNP and causes PXE boot on such systems to fail. Hardware filter >>>> issue fixed by the commit 49426e9fd2e562c73a4f1206f32eff9e424a1a73 is >>>> one example. >>>=20 >>> I think the above two commit references are in inverse order. That is, >>> commit 49426e9f is the one responsible for the (needed) exclusive open, >>> and commit f348aee7 fixes up the former by reconfiguring the receive >>> filters. >>>=20 >>> In other words, the first two paragraphs here seem accurate, just please= >>> reorder the commit hashes. >>>=20 >>>> The difficulty here is that effects of the exclusive SNP reopen >>>> differs from system to system depending their UEFI firmware >>>> implementations. One system works well with the commit >>>> f348aee7b33dd85e7da62b497a96a7319a0bf9dd only, another system still >>>> needs the commit 49426e9fd2e562c73a4f1206f32eff9e424a1a73, and there >>>> could be other systems where PXE boot still fails. >>>=20 >>> Here too I think the commit hashes should be switched around. >>>=20 >>>>=20 >>>> Actually, PXE boot fails on Fujitsu PRIMEQUEST 2000 series now. >>>>=20 >>>> Thus, it seems to me that the exclusive SNP reopen makes grub >>>> maintenance difficult by affecting a variety of systems differently. >>>=20 >>> (Admittedly, this is going to be a bit of speculation:) I think the >>> difference in behavior is not due to SNP implementations, but due to >>> differences in higher level protocol implementations -- i.e., the >>> behavior depends on what those drivers do to the underlying SNP that are= >>> *closed*. >>>=20 >>> According to the spec of OpenProtocol(), in case of a successful >>> exclusive open, the other BY_DRIVER reference is kicked off with >>> DisconnectController(), which in turn calls the other driver's >>> EFI_DRIVER_BINDING_PROTOCOL.Stop() function. >>>=20 >>> So, the question is what that *other* (higher level) driver does in its >>> Stop() function, when it unbinds the handle with the underlying SNP >>> interface. Does it mess with SNP's receive filters? And so on. >>>=20 >>>>=20 >>>> Instead, the idea of this patch is to disable MNP background polling >>>> temporarily while grub uses SNP. Then, the race issue must be removed. >>>=20 >>> This is not the right approach in my opinion. >>>=20 >>> The original problem is that GRUB's direct access to SNP races with >>> *several* MNP instances to the same SNP. The MNP instances are correctly= >>> arbitrated between each other (see more on this later), but the SNP >>> accesses from GRUB are not. >>>=20 >>> SNP is meant as an exclusive-access interface to the NIC, so those >>> parallel clients won't work. >>>=20 >>> One way to solve that is to kick out those other SNP accessors, by way >>> of exclusive open. This is correct from a UEFI driver model perspective,= >>> but -- unless GRUB does full reconfiguration on the SNP level -- brittle= >>> in practice, as you've experienced; apparently different implementations= >>> of higher level protocols leave the SNP in different states when they go= >>> away. (It's perfectly possible that some of those driver binding Stop() >>> functions have bugs.) >>>=20 >>> However, the other way (because there is another way, yes) is different >>> from interfering with existing MNP instances (note: plural). MNP (and a >>> bunch of other networking related protocols) don't work like your usual >>> UEFI device drivers; they are child protocols (=3D protocol interfaces o= n >>> child handles) that are created *as needed* with the help of Service >>> Binding protocol instances. >>>=20 >>> Please read the following sections of the UEFI spec (v2.5) carefully: >>>=20 >>> - 2.5.8 EFI Services Binding >>> - 10.6 EFI Service Binding Protocol >>> - 24.1 EFI Managed Network Protocol >>> EFI_MANAGED_NETWORK_SERVICE_BINDING_PROTOCOL >>=20 >> Taking into account above and sentences in UEFI spec (v2.5) like "Once th= e >> remote image is successfully loaded, it may utilize the EFI_PXE_BASE_CODE= _PROTOCOL >> interfaces, or even the EFI_SIMPLE_NETWORK_PROTOCOL interfaces, to contin= ue >> the remote process." I have a feeling that UEFI spec is very imprecise ho= w >> to use SNP. I have not seen any single word saying that there are any con= straints >> on SNP usage (am I missing something?). I heard about that in our interna= l >> discussions but I was not convinced that it is true until I have seen you= r >> email with all details. So, now I think that UEFI spec should have specia= l >> paragraph saying how to play with SNP. What do you think about that? >=20 > I think that a request for clarification should be filed with the USWG, > or at least raised on edk2-devel :) >=20 >> By the way, I saw that other boot loaders like PXELINUX or iPXE use SNP. >> Do you suggest that all of them should be rewritten to avoid issues with >> this protocol? >=20 > I don't know these projects overly well :), but I wouldn't suggest such > an indiscriminate rewrite for each. The main incentive for thinking > about MNP at all is that the exclusive open of SNP, which *should* work, > hangs various proprietary UEFI implementations. >=20 > When Mark Salter pinged me on IRC quite a few months back about the > original grub problem, the first thing that I recommended (although > clearly not immediately -- I had to research the spec) was the exclusive > open. Mark went ahead and implemented that, and grub started working on > the UEFI platform we had. (Then he contributed the patch(es) to upstream.)= >=20 > The exclusive open is a valid (and simple) thing to do, according to the > UEFI spec. So the alternative to the elaborate MNP(SB) patch is to debug > and fix the UEFI implementations on which the current (otherwise > spec-conformant) grub code -- the exclusive open -- exposes a hang / > crash etc. >=20 > The people who are likely most interested in either fixing those > problematic UEFI implementations, *or* in converting the various open > source projects to MNP(SB), should be those that want to run the latter > on the former. :) Hatayama-san is certainly at liberty to solve the > issue either way. The folks I've talked to at Intel say that the bug is that the snp exclusi= ve open should fail if any other consumer has already opened snp, and that i= s the case with existing implementations (that is, mnp has already opened sn= p by the time grub is loaded and running). If that bug were fixed, the grub= snp driver would always fail which would make it pretty obvious that mnp is= the way to go.=20 --S >=20 > Thanks! > Laszlo >=20 >>=20 >> Daniel >=20 >=20 > _______________________________________________ > Grub-devel mailing list > Grub-devel@gnu.org > https://lists.gnu.org/mailman/listinfo/grub-devel