* Re: Xen paravirt frontend block hang
[not found] <4772AC8E.7010007@theshore.net>
@ 2008-02-28 20:00 ` Jeremy Fitzhardinge
2008-03-02 0:43 ` Christopher S. Aker
[not found] ` <47758352.5040504@goop.org>
1 sibling, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:00 UTC
To: Christopher S. Aker; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel
[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]
Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
> echo `date`
> tar xf linux-2.6.23.tar
> tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
I've been running this all night without seeing any problem. I'm using
current x86.git#testing with a few local patches, but nothing especially
relevant-looking.
Could you try the attached patch to see if it makes any difference?
J
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
>
> Thanks,
> -Chris
[-- Attachment #2: xen-indirect-iret.patch --]
[-- Type: text/x-patch, Size: 2429 bytes --]
Subject: xen: use iret instruction all the time
Change iret implementation to not be dependent on direct-access vcpu
structure.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
arch/x86/xen/enlighten.c | 3 +--
arch/x86/xen/xen-asm.S | 11 +++--------
arch/x86/xen/xen-ops.h | 2 +-
3 files changed, 5 insertions(+), 11 deletions(-)
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -860,7 +860,6 @@ void __init xen_setup_vcpu_info_placemen
pv_irq_ops.irq_disable = xen_irq_disable_direct;
pv_irq_ops.irq_enable = xen_irq_enable_direct;
pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
- pv_cpu_ops.iret = xen_iret_direct;
}
}
@@ -964,7 +963,7 @@ static const struct pv_cpu_ops xen_cpu_o
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,
- .iret = (void *)&hypercall_page[__HYPERVISOR_iret],
+ .iret = xen_iret,
.irq_enable_syscall_ret = NULL, /* never called */
.load_tr_desc = paravirt_nop,
===================================================================
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,13 +130,8 @@ ENDPATCH(xen_restore_fl_direct)
current stack state in whatever form its in, we keep things
simple by only using a single register which is pushed/popped
on the stack.
-
- Non-direct iret could be done in the same way, but it would
- require an annoying amount of code duplication. We'll assume
- that direct mode will be the common case once the hypervisor
- support becomes commonplace.
*/
-ENTRY(xen_iret_direct)
+ENTRY(xen_iret)
/* test eflags for special cases */
testl $(X86_EFLAGS_VM | XEN_EFLAGS_NMI), 8(%esp)
jnz hyper_iret
@@ -150,9 +145,9 @@ ENTRY(xen_iret_direct)
GET_THREAD_INFO(%eax)
movl TI_cpu(%eax),%eax
movl __per_cpu_offset(,%eax,4),%eax
- lea per_cpu__xen_vcpu_info(%eax),%eax
+ mov per_cpu__xen_vcpu(%eax),%eax
#else
- movl $per_cpu__xen_vcpu_info, %eax
+ movl per_cpu__xen_vcpu, %eax
#endif
/* check IF state we're restoring */
===================================================================
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -63,5 +63,5 @@ DECL_ASM(unsigned long, xen_save_fl_dire
DECL_ASM(unsigned long, xen_save_fl_direct, void);
DECL_ASM(void, xen_restore_fl_direct, unsigned long);
-void xen_iret_direct(void);
+void xen_iret(void);
#endif /* XEN_OPS_H */
* Re: Xen paravirt frontend block hang
[not found] ` <519a8b110802070612j2a1717f3s6aa25eeea8b7d18a@mail.gmail.com>
@ 2008-02-28 20:03 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:03 UTC
To: xming
Cc: Christopher S. Aker, virtualization, Keir Fraser, Xen-devel,
Linux Kernel Mailing List
xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> xming wrote:
>>
>>> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
>>> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
>>> the frequency.
>>>
>>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>>
>>
>
> Yes, Dom0 changing frequency while the domU is doing something will trigger this.
> Using "on demand" triggers this very easily.
>
> This is from xm top when a domU hangs:
>
> test32 ------ 4018 98.8 131072 6.4 131072
> 6.4 1 1 4516 50087 1 0 433908 300403
> 3084907223
>
> So it appears to be running (eating CPU); sometimes the state is "r",
> sometimes "-",
> but both console and network are dead.
>
I haven't tried to repro this yet, but I suspect I won't be able to
because all my test machines have constant_tsc. Does your CPU change its
TSC rate when the processor speed changes?
J
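One way to answer the constant_tsc question on a Linux host is to look for that flag in /proc/cpuinfo. The following is a minimal sketch, assuming an x86 Linux system where the relevant line starts with "flags"; the helper names line_has_flag and cpu_has_flag are hypothetical and not part of any kernel or Xen interface. Whether the flag is visible from inside a Xen guest is not something this thread establishes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if `flag` appears as a whole whitespace-separated word in `line`. */
static bool line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        /* Reject substring hits such as "tsc" inside "constant_tsc". */
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return true;
        p += n;
    }
    return false;
}

/*
 * Scans a cpuinfo-style file for `flag` on any line beginning with "flags".
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened.
 */
static int cpu_has_flag(const char *path, const char *flag)
{
    char buf[8192];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strncmp(buf, "flags", 5) == 0 && line_has_flag(buf, flag)) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Calling cpu_has_flag("/proc/cpuinfo", "constant_tsc") from a small main() would then report whether the flag is present on the machine being tested.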
* Re: Xen paravirt frontend block hang
2008-02-28 20:00 ` Xen paravirt frontend block hang Jeremy Fitzhardinge
@ 2008-03-02 0:43 ` Christopher S. Aker
2008-03-02 15:35 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-02 0:43 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel
Jeremy Fitzhardinge wrote:
> I've been running this all night without seeing any problem. I'm using
> current x86.git#testing with a few local patches, but nothing especially
> relevant-looking.
Meh .. what backend are you using? We're using LVM volumes exported
directly into the domUs like so:
disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]
> Could you try the attached patch to see if it makes any difference?
Unfortunately we're still in the same place... pv_ops kernels are still
hanging after heavy disk IO:
works - 2.6.18.x (from xen-unstable)
hangs - 2.6.25-rc3-git3
hangs - 2.6.25-rc3-git3 + your patch
Any other suggestions or debugging I can provide that would be useful to
squash this?
-Chris
* Re: Xen paravirt frontend block hang
2008-03-02 0:43 ` Christopher S. Aker
@ 2008-03-02 15:35 ` Jeremy Fitzhardinge
2008-03-02 16:03 ` Christopher S. Aker
0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-02 15:35 UTC
To: Christopher S. Aker; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I've been running this all night without seeing any problem. I'm
>> using current x86.git#testing with a few local patches, but nothing
>> especially relevant-looking.
>
> Meh .. what backend are you using? We're using LVM volumes exported
> directly into the domUs like so:
>
> disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]
>
>> Could you try the attached patch to see if it makes any difference?
>
> Unfortunately we're still in the same place... pv_ops kernels are
> still hanging after heavy disk IO:
>
> works - 2.6.18.x (from xen-unstable)
> hangs - 2.6.25-rc3-git3
> hangs - 2.6.25-rc3-git3 + your patch
>
> Any other suggestions or debugging I can provide that would be useful
> to squash this?
Are you running an SMP or UP domain? I found I could get hangs very
easily with UP (but I need to confirm it isn't a result of some other very
experimental patches).
J
* Re: Xen paravirt frontend block hang
2008-03-02 15:35 ` Jeremy Fitzhardinge
@ 2008-03-02 16:03 ` Christopher S. Aker
2008-03-18 16:01 ` [Xen-devel] " Jeremy Fitzhardinge
0 siblings, 1 reply; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-02 16:03 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel
Jeremy Fitzhardinge wrote:
> Are you running an SMP or UP domain? I found I could get hangs very
> easily with UP (but I need to confirm it isn't a result of some other very
> experimental patches).
The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
kernels are still slightly responsive after the hang occurs, which makes
me think only one proc gets stuck at a time, not the entire kernel.
-Chris
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
2008-03-02 16:03 ` Christopher S. Aker
@ 2008-03-18 16:01 ` Jeremy Fitzhardinge
2008-03-25 1:37 ` Christopher S. Aker
0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:01 UTC
To: Christopher S. Aker
Cc: Xen-devel, Linux Kernel Mailing List, virtualization, xming
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> Are you running an SMP or UP domain? I found I could get hangs very
>> easily with UP (but I need to confirm it isn't a result of some other
>> very experimental patches).
>
> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
> kernels are still slightly responsive after the hang occurs, which
> makes me think only one proc gets stuck at a time, not the entire kernel.
The patch I posted yesterday - "xen: fix RMW when unmasking events" -
should definitively fix the hanging-under-load bugs (I hope). The
problem came from returning to userspace with pending events, which
would leave them hanging around on the vcpu unprocessed, and eventually
everything would deadlock. This was caused by using an unlocked
read-modify-write operation on the event pending flag - which can be set
by another (real) cpu - meaning that the pending event wasn't noticed
until too late. It would only be a problem on an SMP host.
The patch should back-apply to 2.6.24.
J
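The lost-update race described in this message can be sketched in plain C. This is only an illustration under stated assumptions: struct vcpu_flags, remote_set_pending, unmask_rmw, and unmask_store are hypothetical names, the layout is not the real Xen vcpu structure, and the second CPU is simulated by a direct call placed inside the race window rather than by real concurrency. It is not the content of the "xen: fix RMW when unmasking events" patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu event flags; not the real Xen layout. */
struct vcpu_flags {
    uint8_t pending; /* set when an event arrives, possibly from another real CPU */
    uint8_t mask;    /* set and cleared by the guest to block or allow delivery */
};

/* Simulates another CPU flagging an event in the middle of the guest's update. */
static void remote_set_pending(struct vcpu_flags *f)
{
    f->pending = 1;
}

/*
 * Racy unmask: read both fields, modify the copy, write both back.
 * Returns the pending flag the guest sees afterwards.
 */
static int unmask_rmw(struct vcpu_flags *f)
{
    struct vcpu_flags snapshot = *f; /* read: pending is still 0 here */

    remote_set_pending(f);           /* the event arrives inside the window */
    snapshot.mask = 0;               /* modify */
    *f = snapshot;                   /* write: stale pending=0 wipes out the event */
    return f->pending;
}

/*
 * One way to avoid the lost update: store to the mask field only,
 * then look at pending. The concurrent update to pending survives.
 */
static int unmask_store(struct vcpu_flags *f)
{
    remote_set_pending(f);           /* same arrival point as above */
    f->mask = 0;                     /* touches only the mask field */
    return f->pending;
}
```

Starting from pending = 0 and mask = 1, unmask_rmw() returns 0, so the event is never noticed, while unmask_store() returns 1 and the event can be processed before returning to userspace.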
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
2008-03-18 16:01 ` [Xen-devel] " Jeremy Fitzhardinge
@ 2008-03-25 1:37 ` Christopher S. Aker
0 siblings, 0 replies; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-25 1:37 UTC
To: Jeremy Fitzhardinge
Cc: Xen-devel, Linux Kernel Mailing List, virtualization, xming
Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote:
>> Jeremy Fitzhardinge wrote:
>>> Are you running an SMP or UP domain? I found I could get hangs very
>>> easily with UP (but I need to confirm it isn't a result of some other
>>> very experimental patches).
>>
>> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
>> kernels are still slightly responsive after the hang occurs, which
>> makes me think only one proc gets stuck at a time, not the entire kernel.
>
> The patch I posted yesterday - "xen: fix RMW when unmasking events" -
> should definitively fix the hanging-under-load bugs (I hope).
Confirmed-by: caker@theshore.net
Nice work!
-Chris
end of thread, other threads:[~2008-03-25 1:37 UTC | newest]
[not found] <4772AC8E.7010007@theshore.net>
2008-02-28 20:00 ` Xen paravirt frontend block hang Jeremy Fitzhardinge
2008-03-02 0:43 ` Christopher S. Aker
2008-03-02 15:35 ` Jeremy Fitzhardinge
2008-03-02 16:03 ` Christopher S. Aker
2008-03-18 16:01 ` [Xen-devel] " Jeremy Fitzhardinge
2008-03-25 1:37 ` Christopher S. Aker
[not found] ` <47758352.5040504@goop.org>
[not found] ` <479E71B7.7060207@theshore.net>
[not found] ` <479E75E3.6030601@goop.org>
[not found] ` <479E7BA4.5050306@theshore.net>
[not found] ` <519a8b110802060437k70c099b7y7faefe63dd82039@mail.gmail.com>
[not found] ` <47AA845E.8020708@goop.org>
[not found] ` <519a8b110802070612j2a1717f3s6aa25eeea8b7d18a@mail.gmail.com>
2008-02-28 20:03 ` Jeremy Fitzhardinge