From: Victor Kamensky <kamensky@cisco.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "Richard Henderson" <richard.henderson@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	openembedded-core <openembedded-core@lists.openembedded.org>
Subject: Re: Need arm64/qemu help
Date: Sun, 11 Mar 2018 19:25:08 -0700 (PDT)
Message-ID: <alpine.LRH.2.00.1803111848530.64087@sjc-ads-6991.cisco.com>
In-Reply-To: <CAFEAcA-d6-_tVRWKbZpVsQtece4fjuUqF+o+-m-AG4jPUHPY4g@mail.gmail.com>



On Sun, 11 Mar 2018, Peter Maydell wrote:

> On 11 March 2018 at 00:11, Victor Kamensky <kamensky@cisco.com> wrote:
>> Hi Richard, Ian,
>>
>> Any progress on the issue? If not, I am adding a few Linaro guys
>> who work on aarch64 qemu. Maybe they can give some insight.
>
> No immediate answers, but we might be able to have a look
> if you can provide a repro case (image, commandline, etc)
> that doesn't require us to know anything about OE and your
> build/test infra to look at.

Peter, thank you! I appreciate your attention and response to
this. It is a fair ask; I should have tried to narrow the test
case down before punting it to you guys.

> (QEMU's currently just about
> to head into codefreeze for our next release, so I'm a bit
> busy for the next week or so. Alex, do you have time to
> take a look at this?)
>
> Does this repro with the current head-of-git QEMU?

I've tried head-of-git QEMU (Mar 9) on my ubuntu-16.04 with
the same target Image and rootfs, and I could not reproduce
the issue.

I've started to play around more, trying to reduce the test
case. In my setup with OE and qemu 2.11.1, if I just pass
'-serial stdio' or '-nographic' instead of '-serial mon:vc',
with everything else the same, the image boots fine.

So I started to suspect that, even though the problem manifests
itself as a functional failure of qemu, the issue could be some
nasty memory corruption of qemu's operational data.
And since qemu pulls in a bunch of dependent libraries, the
problem might not even be in qemu itself.

I realized that, in order to decouple itself from the
underlying host, OE builds a lot of its own "native"
libraries, and OE qemu uses them. So I've tried to build
head-of-git QEMU against all the native libraries that OE
builds, and that combination hangs in the same way.

I also noticed that OE qemu is built with SDL (v1.2),
and libsdl is the one responsible for '-serial mon:vc'
handling. And I noticed the following statements in the
default OE conf/local.conf:

#
# Qemu configuration
#
# By default qemu will build with a builtin VNC server where graphical output can be
# seen. The two lines below enable the SDL backend too. By default libsdl-native will
# be built, if you want to use your host's libSDL instead of the minimal libsdl built
# by libsdl-native then uncomment the ASSUME_PROVIDED line below.
PACKAGECONFIG_append_pn-qemu-native = " sdl"
PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"
#ASSUME_PROVIDED += "libsdl-native"

I've tried to build against my host's libSDL by uncommenting
the ASSUME_PROVIDED line above. That actually failed to build,
because my host's libSDL was not happy with the native ncurses
libraries, so I ended up adding this as well:

ASSUME_PROVIDED += "ncurses-native"

After that I had to rebuild qemu-native and qemu-helper-native.
With the resulting qemu and the same target files, the image
boots OK.
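
For reference, the relevant part of conf/local.conf in the configuration
that boots would look roughly like the sketch below. It is assembled only
from the default lines and the two ASSUME_PROVIDED lines mentioned in this
message; the comments are explanatory additions, and whether
"ncurses-native" is needed will likely depend on the host.

```
# conf/local.conf (sketch, not a verified minimal configuration)

# Default lines: enable the SDL backend for the native qemu builds.
PACKAGECONFIG_append_pn-qemu-native = " sdl"
PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"

# Uncommented: use the host's libSDL instead of building libsdl-native.
ASSUME_PROVIDED += "libsdl-native"

# Added: on this host the build against the host libSDL failed until
# the native ncurses was also taken from the host.
ASSUME_PROVIDED += "ncurses-native"
```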

With such a nasty corruption problem it is always hard to say
for sure; it may be just timing changes. But for now it seems to
point to some issue in the OE build of libsdl. And
still it is fairly bizarre: the libsdl version
in OE (1.2.15) is the same one I have on my ubuntu
machine, and there are no additional patches for it in OE,
although the configure options might be quite different.

Thanks,
Victor

>> If, for the sake of experiment, I disable the loop that waits for a
>> jiffies transition, i.e. have something like this:
>>
>> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
>> index 4769947..e0199fc 100644
>> --- a/lib/raid6/algos.c
>> +++ b/lib/raid6/algos.c
>> @@ -166,8 +166,12 @@ static inline const struct raid6_calls *raid6_choose_gen(
>>
>>                         preempt_disable();
>>                         j0 = jiffies;
>> +#if 0
>>                         while ((j1 = jiffies) == j0)
>>                                 cpu_relax();
>> +#else
>> +                        j1 = jiffies;
>> +#endif /* 0 */
>>                         while (time_before(jiffies,
>>                                             j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
>>                                 (*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
>> @@ -189,8 +193,12 @@ static inline const struct raid6_calls *raid6_choose_gen(
>>
>>                         preempt_disable();
>>                         j0 = jiffies;
>> +#if 0
>>                         while ((j1 = jiffies) == j0)
>>                                 cpu_relax();
>> +#else
>> +                        j1 = jiffies;
>> +#endif /* 0 */
>>                         while (time_before(jiffies,
>>                                             j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
>>                                 (*algo)->xor_syndrome(disks, start, stop,
>>
>> Image boots fine after that.
>>
>> I.e. it looks like some strange effect in aarch64 qemu: jiffies does not
>> seem to progress, and the code gets stuck.
>
>> Another observation is that if I put a breakpoint, for example,
>> in do_timer, it actually hits the breakpoint, i.e. the timer interrupt
>> happens in this case, and strangely the raid6_choose_gen sequence
>> does progress, i.e. debugger breakpoints make this case unstuck.
>> Actually, pressing Ctrl-C several times to interrupt the target, followed
>> by continue in gdb, eventually lets the code get out of raid6_choose_gen.
>>
>> Also, whenever I press Ctrl-C in gdb to stop the target in the
>> stalled case, it always drops with $pc at the first instruction of el1_irq;
>> I never saw a different $pc when interrupting the hung code. Does it mean qemu
>> hung on the first instruction of the el1_irq handler? Note that once I do
>> stepi after that, it is able to proceed. If I continue stepping,
>> it eventually gets to arch_timer_handler_virt and do_timer.
>
> This is definitely rather weird and suggestive of a QEMU bug...
>
>> For the Linaro qemu aarch64 guys, more details:
>>
>> The situation happens on the latest openembedded-core, for the qemuarm64
>> MACHINE. It does not always happen, i.e. sometimes it works.
>>
>> The qemu version is 2.11.1 and it is invoked like this (through the regular
>> oe runqemu helper utility):
>>
>> /wd6/oe/20180304/systemtap-oe-sysroot/build/tmp-glibc/work/x86_64-linux/qemu-helper-native/1.0-r1/recipe-sysroot-native/usr/bin/qemu-system-aarch64
>> -device virtio-net-device,netdev=net0,mac=52:54:00:12:34:02 -netdev
>> tap,id=net0,ifname=tap0,script=no,downscript=no -drive
>> id=disk0,file=/wd6/oe/20180304/systemtap-oe-sysroot/build/tmp-glibc/deploy/images/qemuarm64/core-image-minimal-qemuarm64-20180305025002.rootfs.ext4,if=none,format=raw
>> -device virtio-blk-device,drive=disk0 -show-cursor -device virtio-rng-pci
>> -monitor null -machine virt -cpu cortex-a57 -m 512 -serial mon:vc -serial
>> null -kernel
>> /wd6/oe/20180304/systemtap-oe-sysroot/build/tmp-glibc/deploy/images/qemuarm64/Image
>> -append root=/dev/vda rw highres=off  mem=512M
>> ip=192.168.7.2::192.168.7.1:255.255.255.0 console=ttyAMA0,38400
>
> Well, you're not running an SMP config, which rules a few
> things out at least.
>
> thanks
> -- PMM
>

