From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 12 Mar 2021 19:18:14 +0100
From: Henning Schild
Subject: Re: [Dovetail] x86 test version available (kernel v5.10)
Message-ID: <20210312191814.329ae63e@md1za8fc.ad001.siemens.net>
In-Reply-To: <4ff6d9aa-ea26-2195-a279-368a1eb7d099@siemens.com>
References: <874kixxf9e.fsf@xenomai.org>
 <20210222163530.443822b3@md1za8fc.ad001.siemens.net>
 <20210311173541.5e1b13b3@md1za8fc.ad001.siemens.net>
 <204ed847-12bf-4b8e-56a3-2e741dc9edf3@siemens.com>
 <4ff6d9aa-ea26-2195-a279-368a1eb7d099@siemens.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-Id: Discussions about the Xenomai project
To: Jan Kiszka
Cc: Henning Schild via Xenomai,
 "Bezdeka, Florian (T RDA IOT SES-DE)"

On Fri, 12 Mar 2021 12:32:54 +0100, Jan Kiszka wrote:

> On 12.03.21 08:22, Jan Kiszka wrote:
> > On 11.03.21 17:35, Henning Schild wrote:
> >> On Mon, 22 Feb 2021 16:35:30 +0100,
> >> Henning Schild via Xenomai wrote:
> >>
> >>> On Sun, 31 Jan 2021 17:06:21 +0100,
> >>> Philippe Gerum via Xenomai wrote:
> >>>
> >>>> The initial port of the Cobalt core to Dovetail/x86 is available
> >>>> from [1]. Ports to Dovetail/ARM and Dovetail/arm64 should follow
> >>>> within a couple of weeks.
> >>>>
> >>>> So far, latency and switchtest run flawlessly. Most of the smokey
> >>>> test suite passes successfully, except the GDB test at the
> >>>> moment.
> >>>>
> >>>> How to test this:
> >>>>
> >>>> - clone Xenomai from [1], switch to branch for-upstream/dovetail
> >>>> - clone the Dovetail tree (v5.10) from [2], switch to branch
> >>>>   dovetail/master
> >>>> - run scripts/prepare-kernel.sh available from [1] into [2]
> >>>>   (usual procedure) for x86, that would be:
> >>>>   .../scripts/prepare-kernel.sh --arch=x86
> >>>> - build your kernel using the sources from [2]
> >>>
> >>> I followed this and got
> >>>
> >>> 0625b829450dab22b2eea860eef4fb94f64af4fd
> >>> 61769e49d82a6775f6eb259bdc16c2f84df79505
> >>>
> >>>
> >>>> There is no user-visible Kconfig change compared to an I-pipe
> >>>> based version.
> >>>
> >>> I took a 4.19 x86_64 savedefconfig and ran it through olddefconfig.
> >>>
> >>>> Alternatively, the Xenomai code base in [1] can also run on top
> >>>> of the I-pipe. prepare-kernel.sh detects which pipeline flavour
> >>>> is there, and prepares the source tree accordingly.
> >>>>
> >>>> This code is being gradually merged into Xenomai's -next branch,
> >>>> and will be at the core of Xenomai 3.2. Testing and feedback
> >>>> appreciated.
> >>>
> >>> Now I get a BUG pretty early on during boot. Since my config still
> >>> seems off I could not yet enter my rootfs, probably some
> >>> drivers/filesystems/raid switches got lost ...
> >>>
> >>> It's a null pointer deref in kmem_cache_alloc, coming somewhere
> >>> from mempool_init_node
> >>>
> >>> First guess was that "numa=off" would hide the problem, seems it
> >>> does. So my next guess is that even a qemu with numa could
> >>> reproduce this ... probably almost on a random config ... as long
> >>> as it has smp+numa
> >>
> >> It is not about numa, I now also see that on qemu. Jan is looking
> >> into it at the moment.
> >>
> >
> > I'm suspecting an upstream issue. The bug is some freelist
> > corruption, possibly only surfaced by dovetail.
> > After digging longer, I enabled CONFIG_SLAB_FREELIST_HARDENED, and
> > that gave
> >
> > [ 0.159611] ACPI Error: Could not remove SCI handler
> > (20210105/evmisc-251)
> > [ 0.160188] ------------[ cut here ]------------
> > [ 0.160566] cache_from_obj: Wrong slab cache. ftrace_event_field
> > but object is from kmalloc-64
> >
> > and more. However: You get those bugs also from running latest
> > 5.10.23 and even Linus' master. Will debug that further and then
> > possibly report upstream.
>
> This is a corner case of upstream (but still a bug there):
>
> Your config lost CONFIG_PCI, likely because it is no longer default y.
> That not only generates an unbootable system, it also sends the ACPI
> subsystem into an error path. There it seems to release objects to the
> wrong caches, leading to corruptions later on. Enabling PCI resolves
> all this.
>
> I'll dump the config to upstream ACPI folks, should be their business.

Thanks Jan!

> Jan
>
> PS: Your config has more issues as it does not even boot in QEMU, even
> with PCI enabled. You should (re-)derive from x86_64-defconfig.

Yes, that was CI progressing with "olddefconfig" from a 4.19
"savedefconfig". No human took a closer look at the result before you.
But hey, that might fix an upstream bug, so it was a good exercise anyhow.

Henning