From mboxrd@z Thu Jan 1 00:00:00 1970 From: Igor Mammedov Subject: Re: x86: kvmclock: abstract save/restore sched_clock_state Date: Fri, 10 Feb 2012 21:58:47 +0100 Message-ID: <4F358507.2030906@redhat.com> References: <20120207210542.GC20618@amt.cnet> <20120209122742.GA25505@amit.redhat.com> <4F33E299.3090500@redhat.com> <20120210100211.GA2971@amit.redhat.com> <20120210123216.GA11137@amt.cnet> <20120210131828.GA31999@amit.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, johnstul@us.ibm.com, riel@redhat.com, avi@redhat.com, Gleb Natapov To: Amit Shah Return-path: Received: from mx1.redhat.com ([209.132.183.28]:9206 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758980Ab2BJU66 (ORCPT ); Fri, 10 Feb 2012 15:58:58 -0500 In-Reply-To: <20120210131828.GA31999@amit.redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: I've to revoke my ack and say NAK to this patch. Patch itself is in right direction but clock restore happens too late. With patch I've used to hunt down overflow for cpu hotplug (maybe it will be better for it to be in kernel, which will help to detect overflow problem without spending a lot of time to debug it?): diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 42eb330..0081e10 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags) static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow) { - u64 delta = native_read_tsc() - shadow->tsc_timestamp; + u64 tsc = native_read_tsc(); + u64 shadow_timestamp = shadow->tsc_timestamp; + u64 delta = tsc - shadow_timestamp; + BUG_ON (tsc < shadow_timestamp); return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); } I've got following back-trace: #29 0xffffffff81004925 in oops_end (flags=70, regs=, signr=11) at arch/x86/kernel/dumpstack.c:246 #30 0xffffffff81004a77 in die (str=0xffffffff8152ca7f "invalid opcode", regs=0xffff8800149859e8, err=0) at arch/x86/kernel/dumpstack.c:305 #31 0xffffffff81001fed in do_trap (trapnr=6, signr=4, str=0xffffffff8152ca7f "invalid opcode", regs=0xffff8800149859e8, error_code=0, info=0xffff880014985948) at arch/x86/kernel/traps.c:168 #32 0xffffffff810021f7 in do_invalid_op (regs=0xffff8800149859e8, error_code=0) at arch/x86/kernel/traps.c:209 #33 #34 pvclock_get_nsec_offset (shadow=0xffff880014985a98) at arch/x86/kernel/pvclock.c:51 #35 pvclock_clocksource_read (src=0xffff88001f1d1ac0) at arch/x86/kernel/pvclock.c:107 #36 0xffffffff8101aa97 in kvm_clock_read () at arch/x86/kernel/kvmclock.c:79 #37 0xffffffff810087cc in paravirt_sched_clock () at /home/imammedov/fc15/linux-2.6/arch/x86/include/asm/paravirt.h:230 #38 sched_clock () at arch/x86/kernel/tsc.c:71 #39 0xffffffff8104f025 in sched_clock_local (scd=0xffff88001f1d2600) at kernel/sched/clock.c:147 #40 0xffffffff8104f1b7 in sched_clock_cpu (cpu=0) at kernel/sched/clock.c:232 #41 0xffffffff8104f1e5 in local_clock () at kernel/sched/clock.c:316 #42 0xffffffff810699d2 in lockstat_clock () at kernel/lockdep.c:158 #43 __lock_acquire (lock=0xffffffff816131b8, subclass=, trylock=0, read=0, check=2, hardirqs_off=, nest_lock=0x0, ip=18446744071578924098, references=0) at kernel/lockdep.c:3098 #44 0xffffffff8106a924 in lock_acquire (lock=0xffffffff816131b8, subclass=0, trylock=0, read=0, check=2, nest_lock=0x0, ip=18446744071578924098) at kernel/lockdep.c:3555 #45 0xffffffff8136b1c3 in __raw_spin_lock (lock=0xffffffff816131a0) at include/linux/spinlock_api_smp.h:143 #46 _raw_spin_lock (lock=0xffffffff816131a0) at kernel/spinlock.c:137 #47 0xffffffff81013442 in prepare_set () at arch/x86/kernel/cpu/mtrr/generic.c:676 #48 0xffffffff8101368c in generic_set_all () at arch/x86/kernel/cpu/mtrr/generic.c:723 #49 0xffffffff8101258b in mtrr_bp_restore () at arch/x86/kernel/cpu/mtrr/main.c:738 #50 0xffffffff812cabfa in __restore_processor_state (ctxt=) at arch/x86/power/cpu.c:227 #51 restore_processor_state () at arch/x86/power/cpu.c:233 ---Type to continue, or q to quit--- #52 0xffffffff8105a1b6 in create_image (platform_mode=0) at kernel/power/hibernate.c:291 #53 hibernation_snapshot (platform_mode=0) at kernel/power/hibernate.c:363 #54 0xffffffff8105a825 in hibernate () at kernel/power/hibernate.c:629 #55 0xffffffff81058868 in state_store (kobj=, attr=, buf=0xffff88001366e000
, n=5) at kernel/power/main.c:284 #56 0xffffffff811e6cc3 in kobj_attr_store (kobj=, attr=, buf=, count=) at lib/kobject.c:699 #57 0xffffffff811372c2 in flush_write_buffer (count=, buffer=, dentry=) at fs/sysfs/file.c:202 #58 sysfs_write_file (file=, buf=0x7f9ce710d000
, count=, ppos=0xffff880014985f58) at fs/sysfs/file.c:236 #59 0xffffffff810ebfbe in vfs_write (file=0xffff88001cb48000, buf=0x7f9ce710d000
, count=, pos=0xffff880014985f58) at fs/read_write.c:435 #60 0xffffffff810ec175 in sys_write (fd=, buf=0x7f9ce710d000
, count=) at fs/read_write.c:487 that shows an access to clock happens right before x86_platform.restore_sched_clock_state is called. Moving x86_platform.restore_sched_clock_state before mtrr_bp_restore solves issue. It isn't bugged on me for 20 save/restore cycles with this change, without this change it bugs on 2nd-3rd cycle. BTW Amit, your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock to be uninitialized too in case of SMP kernel. Commit ca3f10172eea9b95b moved it from kvmclock_init to kvm_guest_init but forgot to change Kconfig too and kvm_guest_init is called only when KVM_GUEST is selected. This commit created following implicit deps if SMP && KVM_CLOCK then select KVM_GUEST But I don't know if it's possible to express it in Kconfig. That rises a question: Do we need KVM_CLOCK option? Maybe better to use KVM_GUEST instead and remove KVM_CLOCK option? Or make KVM_CLOCK depend on KVM_GUEST...