linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Kevin Shanahan <kmshanah@ucwb.org.au>
Cc: Avi Kivity <avi@redhat.com>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
Date: Sat, 21 Mar 2009 15:08:25 +0100	[thread overview]
Message-ID: <20090321140823.GA5956@nowhere> (raw)
In-Reply-To: <1237611639.4933.4.camel@kulgan.wumi.org.au>

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell if the following script work for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> sevarl guests running, so I ended up only having the one guest running
> and thankfully the bug was still reproducable that way.
> 
> Fingers crossed this set of traces is able to tell us something.


Thanks a lot Kevin!

The traces seem indeed much more clearer now.
Looking at the first trace, we begin with qemu which answers to the ping.
By roughly simplying the trace, we have that:


Found buggy latency:  9297.585
Please send the trace you will find on /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
#      TIME       CPU  TASK/PID        DURATION                  FUNCTION CALLS
#       |         |    |    |           |   |                     |   |   |   |

							/* answer the ping (socket write) */
 2668.130735 |   0)  qemu-sy-4048  |               |  sys_writev() {
 2668.130735 |   0)  qemu-sy-4048  |   0.361 us    |    fget_light();
 2668.130744 |   0)  qemu-sy-4048  |               |       netif_rx_ni() {
 2668.130744 |   0)  qemu-sy-4048  |               |         netif_rx() {
 2668.130763 |   0)  qemu-sy-4048  |               |           ipv4_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |               |             nf_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |   0.328 us    |               ipv4_get_l4proto();
 2668.130765 |   0)  qemu-sy-4048  |   0.310 us    |               __nf_ct_l4proto_find();
 2668.130776 |   0)  qemu-sy-4048  |               |                 icmp_packet() {
 2668.130804 |   0)  qemu-sy-4048  |               |                   netif_receive_skb() {
 2668.130804 |   0)  qemu-sy-4048  |               |                     ip_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |               |                       raw_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |   0.307 us    |                         skb_push();
 2668.130825 |   0)  qemu-sy-4048  |               |                           raw_rcv_skb() {
 2668.130832 |   0)  qemu-sy-4048  |               |                             __wake_up_common() {
 2668.130838 |   0)  qemu-sy-4048  |               |                               /* sched_wakeup: task ping:7420 [120] success=1 */
 2668.130839 |   0)  qemu-sy-4048  |   0.312 us    |                           }
                                                                              }
                                                                             }
                                                      [...]

							/* ping was waaiting for this response and is now awaken */
 2668.130876 |   0)  qemu-sy-4048  |               |  schedule() {
 2668.130885 |   0)  qemu-sy-4048  |               |  /* sched_switch: task qemu-system-x86:4048 [120] ==> ping:7420 [120] */
 2668.130885 |   0)  qemu-sy-4048  |               |    runqueue_is_locked() {
 2668.130886 |   0)  qemu-sy-4048  |   0.399 us    |    __phys_addr();
 ------------------------------------------
 0)  qemu-sy-4048  =>   ping-7420   
 ------------------------------------------

 2668.130887 |   0)   ping-7420    |               |                  finish_task_switch() {
 2668.130887 |   0)   ping-7420    |               |                    perf_counter_task_sched_in() {
 2668.130888 |   0)   ping-7420    |   0.319 us    |                      _spin_lock();
 2668.130888 |   0)   ping-7420    |   0.959 us    |                    }
 2668.130889 |   0)   ping-7420    |   1.644 us    |                  }
 2668.130889 |   0)   ping-7420    | ! 298102.3 us |                }
 2668.130890 |   0)   ping-7420    |               |                del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                  try_to_del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                    lock_timer_base() {
 2668.130890 |   0)   ping-7420    |   0.328 us    |                      _spin_lock_irqsave();
 2668.130891 |   0)   ping-7420    |   0.946 us    |                    }
 2668.130891 |   0)   ping-7420    |   0.328 us    |                    _spin_unlock_irqrestore();
 2668.130892 |   0)   ping-7420    |   2.218 us    |                  }
 2668.130892 |   0)   ping-7420    |   2.847 us    |                }
 2668.130893 |   0)   ping-7420    | ! 298108.7 us |              }
 2668.130893 |   0)   ping-7420    |   0.340 us    |              finish_wait();
 2668.130894 |   0)   ping-7420    |   0.328 us    |              _spin_lock_irqsave();
 2668.130894 |   0)   ping-7420    |   0.324 us    |              _spin_unlock_irqrestore();



As you can see we are in the middle of the dialog between ping and the kvm.
It means that we have lost many traces. I thing that the ring buffer does not have
enough space allocated for these 9 seconds of processing.

Just wait a bit while I'm cooking a better script, or at least trying to get a
better number of entries to allocate to the ring buffer and I come back to you.

But anyway, the scheduler switch and wake up events help a lot.

 
> Regards,
> Kevin.
> 
> 


  reply	other threads:[~2009-03-21 14:08 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-14 19:11 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-14 19:12 ` [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12411] 2.6.28: BUG in r8169 Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12209] oldish top core dumps (in its meminfo() function) Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12208] uml is very slow on 2.6.28 host Rafael J. Wysocki
2009-03-21 14:44   ` ptrace performance (was: [Bug #12208] uml is very slow on 2.6.28 host) Michael Riepe
2009-03-21 15:22     ` Ingo Molnar
2009-03-21 17:02       ` ptrace performance Michael Riepe
2009-03-14 19:20 ` [Bug #12337] ~100 extra wakeups reported by powertop Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12404] Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12421] GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12426] TMDC Joystick no longer works in kernel 2.6.28 Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-15  9:03   ` Kevin Shanahan
2009-03-15  9:18     ` Avi Kivity
2009-03-15  9:48       ` Ingo Molnar
2009-03-15  9:56         ` Avi Kivity
2009-03-15 10:03           ` Ingo Molnar
2009-03-15 10:13             ` Avi Kivity
2009-03-16  9:49     ` Avi Kivity
2009-03-16 12:46       ` Kevin Shanahan
2009-03-16 20:07         ` Frederic Weisbecker
2009-03-16 22:55           ` Kevin Shanahan
2009-03-18  0:20             ` Frederic Weisbecker
2009-03-18  1:16               ` Kevin Shanahan
2009-03-18  2:24                 ` Frederic Weisbecker
2009-03-18 21:24                 ` Kevin Shanahan
2009-03-21  5:00                   ` Kevin Shanahan
2009-03-21 14:08                     ` Frederic Weisbecker [this message]
2009-03-24 11:44                     ` Frederic Weisbecker
2009-03-24 11:47                       ` Frederic Weisbecker
2009-03-25 23:40                       ` Kevin Shanahan
2009-03-25 23:48                         ` Frederic Weisbecker
2009-03-26 20:22                       ` Kevin Shanahan
2009-03-14 19:20 ` [Bug #12500] r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12619] Regression 2.6.28 and last - boot failed Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12612] hard lockup when interrupting cdda2wav Rafael J. Wysocki
2009-03-17  0:53   ` FUJITA Tomonori
2009-03-17 14:52     ` James Bottomley
2009-03-14 19:20 ` [Bug #12645] DMI low-memory-protect quirk causes resume hang on Samsung NC10 Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12634] video distortion and lockup with i830 video chip and 2.6.28.3 Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12690] DPMS (LCD powersave, poweroff) don't work Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12798] No wake up after suspend Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12818] iwlagn broken after suspend to RAM (iwlagn: MAC is in deep sleep!) Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12835] Regression in backlight detection Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12868] iproute2 and regressing "ipv6: convert tunnels to net_device_ops" Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2009-03-21 17:01 2.6.29-rc8-git5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-21 17:07 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-21 19:50   ` Ingo Molnar
2009-03-03 19:34 2.6.29-rc6-git7: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-03 19:41 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-04  3:08   ` Kevin Shanahan
2009-03-08 10:04     ` Avi Kivity
2009-02-23 22:00 2.6.29-rc6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-23 22:03 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-24  0:59   ` Kevin Shanahan
2009-02-24  1:37     ` Rafael J. Wysocki
2009-02-24 12:09     ` Avi Kivity
2009-02-24 22:11       ` Kevin Shanahan
2009-02-14 20:48 2.6.29-rc5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-04 10:55 2.6.29-rc3-git6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-04 10:58 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-05 19:35   ` Kevin Shanahan
2009-02-05 22:37     ` Rafael J. Wysocki
2009-01-19 21:41 2.6.29-rc2-git1: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-01-19 21:45 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-01-20  0:12   ` Kevin Shanahan
2009-01-20 11:35     ` Ingo Molnar
2009-01-20 12:37       ` Avi Kivity
2009-01-20 12:42       ` Kevin Shanahan
2009-01-20 12:56         ` Ingo Molnar
2009-01-20 13:07           ` Ingo Molnar
2009-01-20 14:59             ` Steven Rostedt
2009-01-20 15:04               ` Ingo Molnar
2009-01-20 17:53                 ` Steven Rostedt
2009-01-20 18:39                   ` Ingo Molnar
2009-01-20 17:47               ` Avi Kivity
2009-01-21 14:25                 ` Kevin Shanahan
2009-01-21 14:34                   ` Avi Kivity
2009-01-21 14:51                     ` Kevin Shanahan
2009-01-21 14:59                       ` Avi Kivity
2009-01-21 15:13                         ` Steven Rostedt
2009-01-22  1:48                         ` Steven Rostedt
2009-01-21 15:10                     ` Steven Rostedt
2009-01-21 15:18                     ` Ingo Molnar
2009-01-22 19:57                       ` Kevin Shanahan
2009-01-22 20:31                         ` Ingo Molnar
2009-01-26  9:55                       ` Kevin Shanahan
2009-01-26 11:35                         ` Peter Zijlstra
2009-01-26 15:00                           ` Ingo Molnar
2009-01-20 14:23           ` Kevin Shanahan
2009-01-20 14:25             ` Ingo Molnar
2009-01-20 15:51               ` Kevin Shanahan
2009-01-20 16:06                 ` Ingo Molnar
2009-01-20 16:19                   ` Peter Zijlstra
2009-01-20 14:46             ` Frédéric Weisbecker
2009-01-20 13:04         ` Avi Kivity
2009-01-20 17:54           ` Kevin Shanahan
2009-01-20 18:42             ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090321140823.GA5956@nowhere \
    --to=fweisbec@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=avi@redhat.com \
    --cc=efault@gmx.de \
    --cc=kernel-testers@vger.kernel.org \
    --cc=kmshanah@ucwb.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).