linux-mips.vger.kernel.org archive mirror
* v5.4-rcX: qemu-system-mips64 userspace segfault
@ 2019-10-24 13:12 Bruce Ashfield
  2019-10-24 13:31 ` Vincenzo Frascino
  0 siblings, 1 reply; 5+ messages in thread
From: Bruce Ashfield @ 2019-10-24 13:12 UTC (permalink / raw)
  To: linux-mips, vincenzo.frascino, paul.burton; +Cc: Richard Purdie

Hi all,

I'm not sure if anyone else is running qemu-system-mips64 regularly,
but for the past 4 (or more) years, it has been the primary way that
we run QA on the mips64 Yocto Project reference kernel(s). I take care
of the kernel for the project, so I always have the fun of running
into issues first :D

That's enough preamble ...

I wanted to see if anyone recognized the issue that I'm seeing when I
bumped the linux-yocto dev kernel to the v5.4-rc series.

The one-line summary is that I'm seeing a segfault as soon as the
kernel hands off to userspace during boot. It doesn't matter if it is
systemd, sysvinit, or init=/bin/sh .. I always get a segfault.

Here's the snippet of the boot (it isn't informative, and doesn't
really tell us anything ..)

[   33.155335] md: Waiting for all devices to be available before autodetect
[   33.246899] md: If you don't use raid, use raid=noautodetect
[   33.352059] md: Autodetecting RAID arrays.
[   33.442893] md: autorun ...
[   33.536877] md: ... autorun DONE.
[   33.745949] EXT4-fs (vda): recovery complete
[   33.876766] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
[   34.077905] VFS: Mounted root (ext4 filesystem) on device 253:0.
[   34.184518] devtmpfs: mounted
[   34.359569] Freeing unused kernel memory: 588K
[   34.476478] This architecture does not have kernel memory protection.
[   34.576358] Run /sbin/init as init process
[   35.011380] do_page_fault(): sending SIGSEGV to init for invalid read access from 0000000000000360
[   35.253603] epc = 0000000000000360 in systemd[aaab121000+12d000]
[   35.368150] ra  = 000000fffdd0c5cc in
[   35.492165] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   35.721361] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

I was able to bisect the kernel and land on a commit that shows when
the problem first popped up:

> git bisect good
24640f233b466051ad3a5d2786d2951e43026c9d is the first bad commit
commit 24640f233b466051ad3a5d2786d2951e43026c9d
Author: Vincenzo Frascino <vincenzo.frascino@arm.com>
Date:   Fri Jun 21 10:52:46 2019 +0100

    mips: Add support for generic vDSO

    The mips vDSO library requires some adaptations to take advantage of the
    newly introduced generic vDSO library.

    Introduce the following changes:
     - Modification of vdso.c to be compliant with the common vdso datapage
     - Use of lib/vdso for gettimeofday

    Cc: Ralf Baechle <ralf@linux-mips.org>
    Cc: Paul Burton <paul.burton@mips.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    [paul.burton@mips.com: Prepend $(src) to config-n32-o32-env.c path.]
    Signed-off-by: Paul Burton <paul.burton@mips.com>

:040000 040000 2781bc95f79d835c754962eec097eaa149a6d29e ad346bd742e3df90997075fbf1abeef586a02da3 M      arch

.. which passes the smell test for something that would be in the
right area for the type of segfault I'm seeing.

It of course wasn't trivial to revert, but with the following stack of
reverts, I'm able to build and boot again:

932bb934ed4d mips: compat: vdso: Use legacy syscalls as fallback
cdab7e2c73d5 mips: vdso: Fix flip/flop vdso building bug
b4c0f7fa5308 mips: vdso: Fix source path
1f66c45db330 mips: Add clock_gettime64 entry point
abed3d826f2f mips: Add clock_getres entry point
6393e6064486 mips: fix vdso32 build, again
24640f233b46 mips: Add support for generic vDSO
8919975b6171 MIPS: VDSO: Fix build for binutils < 2.25
90800281e761 MIPS: VDSO: Remove unused gettimeofday.c

I looked, and can't find any obvious way to fix the issue, or a new
config option that I should be tweaking .. or anything else, outside
of the local revert.

This email is already starting to get long, so I'll cut off the
information dump here. I can provide more details (.config, etc), or
whatever else folks might need to get some better idea.

FWIW: this is how we spawn qemu-system-mips64 for the boot test:

% qemu-system-mips64 \
    -device virtio-net-pci,netdev=net0,mac=52:54:00:12:35:02 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22,hostfwd=tcp::2323-:23,tftp=poky/build/tmp/deploy/images/qemumips64 \
    -drive file=poky/build/tmp/deploy/images/qemumips64/core-image-minimal-qemumips64-20191017144136.rootfs.ext4,if=virtio,format=raw \
    -show-cursor -usb -device usb-tablet \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -nographic -machine malta -cpu MIPS64R2-generic -m 256 \
    -serial mon:stdio -serial null \
    -kernel poky/build/tmp/deploy/images/qemumips64/vmlinux \
    -append "root=/dev/vda rw highres=off console=ttyS0 ip=dhcp console=ttyS0 console=tty"

Cheers,

Bruce

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II


* Re: v5.4-rcX: qemu-system-mips64 userspace segfault
  2019-10-24 13:12 v5.4-rcX: qemu-system-mips64 userspace segfault Bruce Ashfield
@ 2019-10-24 13:31 ` Vincenzo Frascino
       [not found]   ` <CADkTA4N1UzrHRZi4j6MUxxT4yWsv1BSHDb11SaKqtbW_gihZ-g@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Vincenzo Frascino @ 2019-10-24 13:31 UTC (permalink / raw)
  To: Bruce Ashfield, linux-mips, paul.burton; +Cc: Richard Purdie

Hi Bruce,

On 10/24/19 2:12 PM, Bruce Ashfield wrote:
> Hi all,
> 
> I'm not sure if anyone else is running qemu-system-mips64 regularly,
> but for the past 4 (or more) years, it has been the primary way that
> we run QA on the mips64 Yocto Project reference kernel(s). I take care
> of the kernel for the project, so I always have the fun of running
> into issues first :D
> 
> That's enough preamble ...
> 
> I wanted to see if anyone recognized the issue that I'm seeing when I
> bumped the linux-yocto dev kernel to the v5.4-rc series.
> 
> The one line summary is that I'm seeing a segfault as soon as the
> kernel hands off to userspace during boot. It doesn't matter if it is
> systemd, sysvinit, or init=/bin/sh .. I always get a segfault.
[...]

Could you please share the .config you are using?

Do you know by any chance which vdso clock_mode is set in this scenario?

-- 
Regards,
Vincenzo


* Re: v5.4-rcX: qemu-system-mips64 userspace segfault
       [not found]   ` <CADkTA4N1UzrHRZi4j6MUxxT4yWsv1BSHDb11SaKqtbW_gihZ-g@mail.gmail.com>
@ 2019-10-25  9:08     ` Vincenzo Frascino
  2019-10-25 13:04       ` Bruce Ashfield
  0 siblings, 1 reply; 5+ messages in thread
From: Vincenzo Frascino @ 2019-10-25  9:08 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: linux-mips, paul.burton, Richard Purdie

Hi Bruce,

On 10/24/19 5:37 PM, Bruce Ashfield wrote:
> On Thu, Oct 24, 2019 at 9:29 AM Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
>>
>> Hi Bruce,
>>
>> On 10/24/19 2:12 PM, Bruce Ashfield wrote:
>> [...]
>>
>> Could you please share the .config you are using?
> 
> attached (hopefully this won't cause my reply to bounce).
> 

It seems that the .config you shared was generated for a kernel version older
than the one in which we introduced the unified vDSO. Since the options that
correctly enable the generic vDSO library are selected by the architecture, this
results in a misconfiguration of the vDSO library, which leads to the issues you
are seeing.

My advice is to start from a fresh defconfig and then enable the options you
need one by one. I did this with buildroot and it seems to work.

Another thing I noticed, and this seems to be confirmed by the patch series you
had to revert, is that you are missing a fix that I submitted last week:

8a1bef4193e81c8afae4d2f107f1c09c8ce89470
("mips: vdso: Fix __arch_get_hw_counter()")

Could you please apply it before regenerating the .config? It seems that QEMU
falls back to VDSO_CLOCK_NONE, at least in the case I reproduced.
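To make the fallback point concrete, here is a simplified, hypothetical model (not the kernel's lib/vdso code) of how a vDSO time read can hand off to the system call when the clock mode is "none". The names `model_vdso_data`, `MODEL_CLOCK_*`, `model_read_counter()` and `model_syscall_fallback()` are invented for illustration. The mode values only loosely mirror the VDSO_CLOCK_NONE/R4K/GIC constants in the MIPS clocksource.h. Signalling an unusable counter with an all-ones value is an assumption made for this sketch. The real code also retries under a sequence counter, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock modes, loosely modelled on the MIPS vDSO modes. */
enum model_clock_mode { MODEL_CLOCK_NONE, MODEL_CLOCK_R4K, MODEL_CLOCK_GIC };

/* Hypothetical, heavily reduced stand-in for the vDSO data page. */
struct model_vdso_data {
    enum model_clock_mode clock_mode;
    uint64_t cycle_last; /* counter value at the last kernel update */
    uint64_t base_ns;    /* time in ns at the last kernel update */
    uint32_t mult;       /* cycles-to-ns multiplier */
    uint32_t shift;      /* cycles-to-ns shift */
};

/* Counts how often the model took the slow path. */
static int fallback_calls;

/* Stand-in for the system call path; a real fallback would enter the kernel. */
static uint64_t model_syscall_fallback(void)
{
    fallback_calls++;
    return 0;
}

/* Assumption: with no usable counter, the read returns an all-ones value. */
static uint64_t model_read_counter(enum model_clock_mode mode, uint64_t hw_value)
{
    if (mode == MODEL_CLOCK_NONE)
        return UINT64_MAX;
    return hw_value;
}

/* Fast path: convert the counter delta to ns, unless the read was invalid. */
static uint64_t model_clock_gettime_ns(const struct model_vdso_data *vd,
                                       uint64_t hw_value)
{
    uint64_t cycles = model_read_counter(vd->clock_mode, hw_value);

    if (cycles == UINT64_MAX) /* invalid read: never turn it into a timestamp */
        return model_syscall_fallback();
    return vd->base_ns + (((cycles - vd->cycle_last) * vd->mult) >> vd->shift);
}
```

With MODEL_CLOCK_R4K, a counter value of 150 against cycle_last 100, mult 2 and shift 1 yields base_ns + 50. With MODEL_CLOCK_NONE, the same call takes the fallback. If the "none" case were not signalled correctly, the fast path would compute a timestamp from a meaningless counter value.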

> When debugging (and bisecting), as expected, the VDSO configs bounced
> around a bit with the move to generic VDSO, etc.  So there very well
> may be something that with 5.4 I need to enable now and missed in my
> debug.
> 
> I don't have GENERIC_COMPAT_VDSO enabled, but can easily do a boot
> test with it on, similarly with the different vdso boot option. I know
> I had tried a lot of different combos, but would have to redo the
> tests now.
> 

This seems to confirm my suspicion of a wrong .config.

> 
>>
>> Do you know by any chance which vdso clock_mode is set in this scenario?
> 
> Unfortunately not, it isn't something that we've explicitly set in the
> past, so I haven't looked into it. But can do more digging.
> 
> Bruce
> 
>>
>> --
>> Regards,
>> Vincenzo
> 
> 
> 

Please let us know how your investigation proceeds.

-- 
Regards,
Vincenzo


* Re: v5.4-rcX: qemu-system-mips64 userspace segfault
  2019-10-25  9:08     ` Vincenzo Frascino
@ 2019-10-25 13:04       ` Bruce Ashfield
  2019-11-06 17:45         ` Bruce Ashfield
  0 siblings, 1 reply; 5+ messages in thread
From: Bruce Ashfield @ 2019-10-25 13:04 UTC (permalink / raw)
  To: Vincenzo Frascino; +Cc: linux-mips, paul.burton, Richard Purdie

On Fri, Oct 25, 2019 at 5:06 AM Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
>
> Hi Bruce,
>
> On 10/24/19 5:37 PM, Bruce Ashfield wrote:
> > On Thu, Oct 24, 2019 at 9:29 AM Vincenzo Frascino
> > <vincenzo.frascino@arm.com> wrote:
> >>
> >> Hi Bruce,
> >>
> >> On 10/24/19 2:12 PM, Bruce Ashfield wrote:
> >> [...]
> >>
> >> Could you please share the .config you are using?
> >
> > attached (hopefully this won't cause my reply to bounce).
> >
>
> It seems that the .config you shared was generated for a kernel version older
> than the one in which we introduced the unified vDSO. Since the options that
> correctly enable the generic vDSO library are selected by the architecture, this
> results in a misconfiguration of the vDSO library, which leads to the issues you
> are seeing.

Parts of that .config have been around for years, and others would have been
from my v5.3-dev kernel work. So there are most definitely older elements
floating around.

>
> My advice is to start from a fresh defconfig and then enable the options you
> need one by one. I did this with buildroot and it seems to work.

We don't use defconfigs (at least not in a typical config flow), but absolutely,
I can start stepping through the options again. I've been maintaining this
platform and moving through kernel versions for a few years now, so there
could be something funky with the way the option was introduced and how
it interacts with my uprev workflow. I should have gotten a warning about
it in my config sanity step ... but I'll have a closer look at that (obviously
my issue) once I'm up and booting.

It's also possible I grabbed the bad .config from the middle of my bisect,
which as I mentioned was toggling the VDSO options (and having some
build issues) due to changing dependencies. I'll compare a clean .config to
the one I sent and follow up if there's something obvious.
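One way to do the comparison described above is to extract only the vDSO-related symbols from the old .config and from a freshly generated one, then list what differs. The sketch below is a hypothetical helper, not part of any kernel tooling. `collect_opts()`, `lookup()` and `report_diffs()` are invented names. Filtering by the substring "VDSO" is an assumption, so related symbols such as GENERIC_GETTIMEOFDAY would need a second pass with a different filter. For a full comparison, the kernel tree's own scripts/diffconfig is the more complete tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 128
#define VAL_LEN 64

/* One Kconfig symbol as it appears in a .config file. */
struct kopt {
    char name[NAME_LEN]; /* e.g. "CONFIG_GENERIC_COMPAT_VDSO" */
    char value[VAL_LEN]; /* "y", "m", a literal, or "n" for "is not set" */
};

/* Parse "CONFIG_X=val" or "# CONFIG_X is not set". Returns 1 on a match. */
static int parse_config_line(const char *line, struct kopt *out)
{
    const char *eq = strchr(line, '=');

    if (strncmp(line, "CONFIG_", 7) == 0 && eq != NULL) {
        size_t n = (size_t)(eq - line);
        if (n >= NAME_LEN)
            return 0;
        memcpy(out->name, line, n);
        out->name[n] = '\0';
        snprintf(out->value, VAL_LEN, "%s", eq + 1);
        out->value[strcspn(out->value, "\r\n")] = '\0'; /* drop the newline */
        return 1;
    }
    if (strstr(line, " is not set") != NULL &&
        sscanf(line, "# %127s", out->name) == 1 &&
        strncmp(out->name, "CONFIG_", 7) == 0) {
        strcpy(out->value, "n");
        return 1;
    }
    return 0;
}

/* Read a .config stream, keeping symbols whose name contains `needle`. */
static size_t collect_opts(FILE *f, const char *needle, struct kopt *opts, size_t max)
{
    char line[512];
    struct kopt k;
    size_t count = 0;

    while (count < max && fgets(line, sizeof line, f) != NULL)
        if (parse_config_line(line, &k) && strstr(k.name, needle) != NULL)
            opts[count++] = k;
    return count;
}

/* Value of `name` in opts, or NULL if the symbol does not appear at all. */
static const char *lookup(const struct kopt *opts, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            return opts[i].value;
    return NULL;
}

/* Print symbols that differ between a (old) and b (new); return the count. */
static size_t report_diffs(const struct kopt *a, size_t na,
                           const struct kopt *b, size_t nb)
{
    size_t diffs = 0;

    for (size_t i = 0; i < na; i++) {
        const char *vb = lookup(b, nb, a[i].name);
        if (vb == NULL || strcmp(a[i].value, vb) != 0) {
            printf("%s: %s -> %s\n", a[i].name, a[i].value, vb ? vb : "<absent>");
            diffs++;
        }
    }
    for (size_t i = 0; i < nb; i++) {
        if (lookup(a, na, b[i].name) == NULL) {
            printf("%s: <absent> -> %s\n", b[i].name, b[i].value);
            diffs++;
        }
    }
    return diffs;
}
```

To use it, open both files with fopen(), call collect_opts() on each with "VDSO", and pass the results to report_diffs(). A symbol that reads "is not set" in one file and is missing entirely from the other may point to a dependency that changed between kernel versions.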

>
> Another thing I noticed, and this seems to be confirmed by the patch series you
> had to revert, is that you are missing a fix that I submitted last week:
>
> 8a1bef4193e81c8afae4d2f107f1c09c8ce89470
> ("mips: vdso: Fix __arch_get_hw_counter()")

Right, if it isn't already in -rcX, I don't have it yet, since I'm uprev'ing
the -dev kernel and sanity testing the rc releases. Only if I have issues like
this do I start digging around for patches to apply.

I can definitely do that. It seems like gmail only decided to deliver 3 messages
on the 16th of October, so I don't have a copy of that patch locally, but I was
able to find the archive and will track down the patch later today.


>
> Could you please apply it before regenerating the .config? It seems that QEMU
> falls back to VDSO_CLOCK_NONE, at least in the case I reproduced.
>
> > When debugging (and bisecting), as expected, the VDSO configs bounced
> > around a bit with the move to generic VDSO, etc.  So there very well
> > may be something that with 5.4 I need to enable now and missed in my
> > debug.
> >
> > I don't have GENERIC_COMPAT_VDSO enabled, but can easily do a boot
> > test with it on, similarly with the different vdso boot option. I know
> > I had tried a lot of different combos, but would have to redo the
> > tests now.
> >
>
> This seems to confirm my suspicion of a wrong .config.

It was on in some of my testing, it just wasn't on for some of the
bisect runs. I may have grabbed the bad config in my haste. When I
dive back into this, I'll see what I managed to mess up.

Cheers,

Bruce

> [...]
>
> Please let us know how your investigation proceeds.

I definitely will, thanks for the time spent and the confirmation that you
aren't seeing the same thing.

Bruce



* Re: v5.4-rcX: qemu-system-mips64 userspace segfault
  2019-10-25 13:04       ` Bruce Ashfield
@ 2019-11-06 17:45         ` Bruce Ashfield
  0 siblings, 0 replies; 5+ messages in thread
From: Bruce Ashfield @ 2019-11-06 17:45 UTC (permalink / raw)
  To: Vincenzo Frascino; +Cc: linux-mips, paul.burton, Richard Purdie

On Fri, Oct 25, 2019 at 9:04 AM Bruce Ashfield <bruce.ashfield@gmail.com> wrote:
> [...]
>
> It was on in some of my testing, it just wasn't on for some of the
> bisect runs. I may have grabbed the bad config in my haste. When I
> dive back into this, I'll see what I managed to mess up.

Hi again, and sorry for the 13 days between replies!

I was traveling for the past week and a half and didn't get a chance
to do more boot testing.

I haven't updated to the latest v5.4-rc yet (but will later today), but I
did cherry-pick the ("mips: vdso: Fix __arch_get_hw_counter()") fix you
mentioned. I can report that, with it pulled in, my boot still segfaults in
the same place.

I also attempted to manually set GENERIC_COMPAT_VDSO in my .config,
but as we know, without a prompt it has to be selected by another
Kconfig, and in the platform that I'm building it isn't selected. So I
did a quick one-liner to select it, and even with it on, I'm still
seeing the segfault.

I'm trying with a defconfig base at the moment, but I'm running into some
compilation issues. While I sort those out, could I get a copy of a working
.config from you, so I can compare and debug from there? (I'm booting a 64-bit
malta configuration in qemu.)

I'm also looking into the vdso clock_mode you mentioned earlier ... I
see it in clocksource.h, but I've not yet figured out how to influence
what mode my qemu boot is using (some clocksource driver config .. I
do have CLKSRC_MIPS_GIC set in my .config, so that should at least be
present). Booting with vdso=0 didn't change anything either.
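Because vdso=0 made no visible difference, it may help to check from userspace whether a vDSO is mapped at all, and whether the time calls that can go through it complete. The following is a minimal sketch under these assumptions:

* It relies on getauxval(AT_SYSINFO_EHDR), available in glibc and musl, which returns 0 when the kernel did not advertise a vDSO.
* It assumes that libc routes clock_gettime(), clock_getres() and gettimeofday() through the vDSO when one is present.

Whether vdso=0 is honoured on MIPS is not established here. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/auxv.h> /* getauxval, AT_SYSINFO_EHDR */
#include <sys/time.h> /* gettimeofday */
#include <time.h>     /* clock_gettime, clock_getres */

/* Address of the vDSO image advertised in the ELF auxiliary vector,
 * or 0 if no vDSO was mapped into this process. */
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Exercise the time entry points that libc may route through the vDSO.
 * Returns 0 if every call succeeded and CLOCK_MONOTONIC did not go
 * backwards, otherwise -1. */
static int exercise_time_calls(void)
{
    struct timespec a, b, res;
    struct timeval tv;

    if (clock_gettime(CLOCK_MONOTONIC, &a) != 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &b) != 0)
        return -1;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    /* The second monotonic reading must not be earlier than the first. */
    if (b.tv_sec < a.tv_sec || (b.tv_sec == a.tv_sec && b.tv_nsec < a.tv_nsec))
        return -1;
    return 0;
}

/* Print a short summary of both checks. */
static void report(void)
{
    unsigned long base = vdso_base();

    if (base != 0)
        printf("vDSO mapped at 0x%lx\n", base);
    else
        printf("no vDSO mapped (AT_SYSINFO_EHDR is 0)\n");
    printf("time calls: %s\n", exercise_time_calls() == 0 ? "ok" : "FAILED");
}
```

One option, untested in this thread, is to call report() from a small main() and build the result statically so that it does not depend on the dynamic loader. Booting with init= pointing at that binary might then show whether the crash follows the time calls or happens earlier.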

Summary: there's definitely something up with my .config that didn't
like the transition to generic VDSO, and hopefully a working .config
will point me in the right direction and limit my flailing :D

Cheers,

Bruce



end of thread

Thread overview: 5+ messages
2019-10-24 13:12 v5.4-rcX: qemu-system-mips64 userspace segfault Bruce Ashfield
2019-10-24 13:31 ` Vincenzo Frascino
     [not found]   ` <CADkTA4N1UzrHRZi4j6MUxxT4yWsv1BSHDb11SaKqtbW_gihZ-g@mail.gmail.com>
2019-10-25  9:08     ` Vincenzo Frascino
2019-10-25 13:04       ` Bruce Ashfield
2019-11-06 17:45         ` Bruce Ashfield
