All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
@ 2017-04-12 17:23 Marcelo Tosatti
  2017-04-12 17:40 ` Radim Krčmář
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2017-04-12 17:23 UTC (permalink / raw)
  To: kvm-devel; +Cc: Paolo Bonzini, Radim Krčmář, pagupta


The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK 
attempts to disable interrupts in that section to protect
the values that are calculated in that section from interrupt interference. 

now_ns is calculated inside the irq protected region, 
user_ns.clock is passed from userspace (therefore not susceptible
to interrupt variation).

About the line
                now_ns = __get_kvmclock_ns(kvm);  (1)

Interrupts can happen afterwards local_irq_enable(), 
rendering "now_ns" relative to its execution time PLUS 
interrupt time.

Therefore the local_irq_disable() / local_irq_enable() protection is not
necessary (that is: interrupts triggering after local_irq_enable cause
the same problem that the protection is trying to avoid).

With this reasoning, and the -RT bug that the irq disablement causes
(because spin_lock is now a sleeping lock), remove the IRQ protection as it causes: 

[ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
[ 1064.668110] INFO: lockdep is turned off.
[ 1064.668110] irq event stamp: 0
[ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
[ 1064.668116] hardirqs last disabled at (0): [<ffffffff9308184a>] c0
[ 1064.668118] softirqs last  enabled at (0): [<ffffffff9308184a>] c0
[ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
[ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
[ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
[ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
[ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
[ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
[ 1064.668126] Call Trace:
[ 1064.668132]  [<ffffffff93757413>] dump_stack+0x19/0x1b
[ 1064.668135]  [<ffffffff930ccb3d>] __might_sleep+0x12d/0x1f0
[ 1064.668138]  [<ffffffff9375f694>] rt_spin_lock+0x24/0x60
[ 1064.668155]  [<ffffffffc06ab996>] __get_kvmclock_ns+0x36/0x110 [k]
[ 1064.668159]  [<ffffffff93112993>] ? futex_wait_queue_me+0x103/0x10
[ 1064.668171]  [<ffffffffc06b8782>] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
[ 1064.668173]  [<ffffffff9311333c>] ? futex_wait+0x1ac/0x2a0

On -RT kernels.


Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 
arch/x86/kvm/x86.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c39b8a..935143b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4062,10 +4062,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		local_irq_disable();
 		now_ns = __get_kvmclock_ns(kvm);
 		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
-		local_irq_enable();
 		kvm_gen_update_masterclock(kvm);
 		break;
 	}
@@ -4073,11 +4071,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		struct kvm_clock_data user_ns;
 		u64 now_ns;
 
-		local_irq_disable();
 		now_ns = __get_kvmclock_ns(kvm);
 		user_ns.clock = now_ns;
 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
-		local_irq_enable();
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-12 17:23 [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK Marcelo Tosatti
@ 2017-04-12 17:40 ` Radim Krčmář
  2017-04-13  6:06 ` Pankaj Gupta
  2017-04-13  7:32 ` [PATCH] " Roman Kagan
  2 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2017-04-12 17:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini, pagupta

2017-04-12 14:23-0300, Marcelo Tosatti:
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK 
> attempts to disable interrupts in that section to protect
> the values that are calculated in that section from interrupt interference. 
> 
> now_ns is calculated inside the irq protected region, 
> user_ns.clock is passed from userspace (therefore not susceptible
> to interrupt variation).
> 
> About the line
>                 now_ns = __get_kvmclock_ns(kvm);  (1)
> 
> Interrupts can happen afterwards local_irq_enable(), 
> rendering "now_ns" relative to its execution time PLUS 
> interrupt time.
> 
> Therefore the local_irq_disable() / local_irq_enable() protection is not
> necessary (that is: interrupts triggering after local_irq_enable cause
> the same problem that the protection is trying to avoid).

Makes sense.

get_kvmclock_ns() has the same problem and this patch should cover it as
well.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-12 17:23 [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK Marcelo Tosatti
  2017-04-12 17:40 ` Radim Krčmář
@ 2017-04-13  6:06 ` Pankaj Gupta
  2017-04-13 18:30   ` Marcelo Tosatti
  2017-04-13 18:49   ` [PATCH v2] " Marcelo Tosatti
  2017-04-13  7:32 ` [PATCH] " Roman Kagan
  2 siblings, 2 replies; 11+ messages in thread
From: Pankaj Gupta @ 2017-04-13  6:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář

 
> 
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> attempts to disable interrupts in that section to protect
> the values that are calculated in that section from interrupt interference.
> 
> now_ns is calculated inside the irq protected region,
> user_ns.clock is passed from userspace (therefore not susceptible
> to interrupt variation).
> 
> About the line
>                 now_ns = __get_kvmclock_ns(kvm);  (1)
> 
> Interrupts can happen afterwards local_irq_enable(),
> rendering "now_ns" relative to its execution time PLUS
> interrupt time.

Are you saying interrupt disable code here is to avoid any interrupt
latency before we update time?
> 
> Therefore the local_irq_disable() / local_irq_enable() protection is not
> necessary (that is: interrupts triggering after local_irq_enable cause
> the same problem that the protection is trying to avoid).

I am not sure because this could be different use-case for RT and mainline. 
RT can afford to preempt, but if we do will we get desired results?

Maybe if we want this code path to be "irq safe", can we use 'raw_spin_lock'
in place of 'spin_lock'.

> 
> With this reasoning, and the -RT bug that the irq disablement causes
> (because spin_lock is now a sleeping lock), remove the IRQ protection as it
> causes:
> 
> [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
> [ 1064.668110] INFO: lockdep is turned off.
> [ 1064.668110] irq event stamp: 0
> [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
> [ 1064.668116] hardirqs last disabled at (0): [] c0
> [ 1064.668118] softirqs last  enabled at (0): [] c0
> [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
> [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
> [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
> [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
> [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
> [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
> [ 1064.668126] Call Trace:
> [ 1064.668132]  [] dump_stack+0x19/0x1b
> [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
> [ 1064.668138]  [] rt_spin_lock+0x24/0x60
> [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
> [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
> [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
> [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0

Even with this patch, we might get same error for:

get_kvmclock_ns {

 local_irq_save(flags);
        ns = __get_kvmclock_ns(kvm);
        local_irq_restore(flags);


}
> 
> On -RT kernels.
> 
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
>  
> arch/x86/kvm/x86.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7c39b8a..935143b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4062,10 +4062,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>                          goto out;
>  
>                  r = 0;
> -                local_irq_disable();
>                  now_ns = __get_kvmclock_ns(kvm);
>                  kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> -                local_irq_enable();
>                  kvm_gen_update_masterclock(kvm);
>                  break;
>          }
> @@ -4073,11 +4071,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
>                  struct kvm_clock_data user_ns;
>                  u64 now_ns;
>  
> -                local_irq_disable();
>                  now_ns = __get_kvmclock_ns(kvm);
>                  user_ns.clock = now_ns;
>                  user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> -                local_irq_enable();
>                  memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>  
>                  r = -EFAULT;
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-12 17:23 [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK Marcelo Tosatti
  2017-04-12 17:40 ` Radim Krčmář
  2017-04-13  6:06 ` Pankaj Gupta
@ 2017-04-13  7:32 ` Roman Kagan
  2017-04-13 13:27   ` Radim Krčmář
  2 siblings, 1 reply; 11+ messages in thread
From: Roman Kagan @ 2017-04-13  7:32 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini, Radim Krčmář, pagupta

On Wed, Apr 12, 2017 at 02:23:15PM -0300, Marcelo Tosatti wrote:
> 
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK 
> attempts to disable interrupts in that section to protect
> the values that are calculated in that section from interrupt interference. 
> 
> now_ns is calculated inside the irq protected region, 
> user_ns.clock is passed from userspace (therefore not susceptible
> to interrupt variation).
> 
> About the line
>                 now_ns = __get_kvmclock_ns(kvm);  (1)
> 
> Interrupts can happen afterwards local_irq_enable(), 
> rendering "now_ns" relative to its execution time PLUS 
> interrupt time.
> 
> Therefore the local_irq_disable() / local_irq_enable() protection is not
> necessary (that is: interrupts triggering after local_irq_enable cause
> the same problem that the protection is trying to avoid).
> 
> With this reasoning, and the -RT bug that the irq disablement causes
> (because spin_lock is now a sleeping lock), remove the IRQ protection as it causes: 
> 
> [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
> [ 1064.668110] INFO: lockdep is turned off.
> [ 1064.668110] irq event stamp: 0
> [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
> [ 1064.668116] hardirqs last disabled at (0): [<ffffffff9308184a>] c0
> [ 1064.668118] softirqs last  enabled at (0): [<ffffffff9308184a>] c0
> [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
> [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
> [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
> [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
> [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
> [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
> [ 1064.668126] Call Trace:
> [ 1064.668132]  [<ffffffff93757413>] dump_stack+0x19/0x1b
> [ 1064.668135]  [<ffffffff930ccb3d>] __might_sleep+0x12d/0x1f0
> [ 1064.668138]  [<ffffffff9375f694>] rt_spin_lock+0x24/0x60
> [ 1064.668155]  [<ffffffffc06ab996>] __get_kvmclock_ns+0x36/0x110 [k]
> [ 1064.668159]  [<ffffffff93112993>] ? futex_wait_queue_me+0x103/0x10
> [ 1064.668171]  [<ffffffffc06b8782>] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
> [ 1064.668173]  [<ffffffff9311333c>] ? futex_wait+0x1ac/0x2a0
> 
> On -RT kernels.

Hmm, __get_kvmclock_ns used not to need a spinlock back when it was
added...  Why does it now?

Looking at its current state, I'm not sure I understand what it's
supposed to do: it uses the host tsc rate rather than the guest one,
which seems to just defeat the purpose of originally introducing it: to
have a way to obtain the clock value exactly the same as the guest would
see...

Am I missing anything obvious?

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-13  7:32 ` [PATCH] " Roman Kagan
@ 2017-04-13 13:27   ` Radim Krčmář
  0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2017-04-13 13:27 UTC (permalink / raw)
  To: Roman Kagan, Marcelo Tosatti, kvm-devel, Paolo Bonzini, pagupta

2017-04-13 10:32+0300, Roman Kagan:
> Hmm, __get_kvmclock_ns used not to need a spinlock back when it was
> added...  Why does it now?

It accessed vcpu to get the data it now takes from the master clock.
The lock is there to protect master clock's consistency.

There were problems when accessing VCPUs from VM ioctls -- you first had
to conjure a VCPU out of thin air and then be extremely careful when
accessing it, which we didn't manage to and it resulted in an invalid
VMCS access bug, IIRC.  (I would bet there we other bugs.)

> Looking at its current state, I'm not sure I understand what it's
> supposed to do: it uses the host tsc rate rather than the guest one,
> which seems to just defeat the purpose of originally introducing it: to
> have a way to obtain the clock value exactly the same as the guest would
> see...
> 
> Am I missing anything obvious?

It is a bit obscure ... the current state is simulating what the vcpu
does when setting up its kvmclock, and should therefore yield the same
result as the guest kvmclock would see.

Without master clock, the guest kvmclock follows ktime_get_boot_ns().

With master clock, the guest follows host TSC (different frequency than
ktime_get_boot_ns()).  The guest might have extra scaling and offset on
that TSC, but both are known to KVM when computing the kvmclock
variables to get the 1GHz result.

We could first convert the host TSC to guest TSC and then compute
kvmclock from it, but we should obtain the same result when converting
to 1GHz directly from host TSC (within rouding errors, ofc).

This is possible as guest and host use the same physical TSC.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-13  6:06 ` Pankaj Gupta
@ 2017-04-13 18:30   ` Marcelo Tosatti
  2017-04-13 18:49   ` [PATCH v2] " Marcelo Tosatti
  1 sibling, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2017-04-13 18:30 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář

On Thu, Apr 13, 2017 at 02:06:36AM -0400, Pankaj Gupta wrote:
>  
> > 
> > The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> > attempts to disable interrupts in that section to protect
> > the values that are calculated in that section from interrupt interference.
> > 
> > now_ns is calculated inside the irq protected region,
> > user_ns.clock is passed from userspace (therefore not susceptible
> > to interrupt variation).
> > 
> > About the line
> >                 now_ns = __get_kvmclock_ns(kvm);  (1)
> > 
> > Interrupts can happen afterwards local_irq_enable(),
> > rendering "now_ns" relative to its execution time PLUS
> > interrupt time.
> 
> Are you saying interrupt disable code here is to avoid any interrupt
> latency before we update time?
> > 
> > Therefore the local_irq_disable() / local_irq_enable() protection is not
> > necessary (that is: interrupts triggering after local_irq_enable cause
> > the same problem that the protection is trying to avoid).
> 
> I am not sure because this could be different use-case for RT and mainline. 
> RT can afford to preempt, but if we do will we get desired results?
> 
> Maybe if we want this code path to be "irq safe", can we use 'raw_spin_lock'
> in place of 'spin_lock'.

commit 395c6b0a9d56fe7fdb7aeda12795d0eb02475d24
Author: Avi Kivity <avi@redhat.com>
Date:   Mon Oct 4 12:55:49 2010 +0200

    KVM: Disable interrupts around get_kernel_ns()
    
    get_kernel_ns() wants preemption disabled.  It doesn't make a lot of
sense
    during the get/set ioctls (no way to make them non-racy) but the
callee wants
    it.
    
    Signed-off-by: Avi Kivity <avi@redhat.com>

And

static inline u64 get_kernel_ns(void)
{               
        struct timespec ts;

        WARN_ON(preemptible());
        ktime_get_ts(&ts);
        monotonic_to_bootbased(&ts);
        return timespec_to_ns(&ts);
}

Which was added by:
    Add a helper function to compute the kernel time and convert nanoseconds
    back to CPU specific cycles.  Note that these must not be called in preemptible
    context, as that would mean the kernel could enter software suspend state,
    which would cause non-atomic operation.

However this comment makes no sense: ktime_get_ts(&ts) returns a clock
value relative to either before or after software suspend (either case
is fine).
 
> > With this reasoning, and the -RT bug that the irq disablement causes
> > (because spin_lock is now a sleeping lock), remove the IRQ protection as it
> > causes:
> > 
> > [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
> > [ 1064.668110] INFO: lockdep is turned off.
> > [ 1064.668110] irq event stamp: 0
> > [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
> > [ 1064.668116] hardirqs last disabled at (0): [] c0
> > [ 1064.668118] softirqs last  enabled at (0): [] c0
> > [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
> > [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
> > [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
> > [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
> > [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
> > [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
> > [ 1064.668126] Call Trace:
> > [ 1064.668132]  [] dump_stack+0x19/0x1b
> > [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
> > [ 1064.668138]  [] rt_spin_lock+0x24/0x60
> > [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
> > [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
> > [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
> > [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0
> 
> Even with this patch, we might get same error for:
> 
> get_kvmclock_ns {
> 
>  local_irq_save(flags);
>         ns = __get_kvmclock_ns(kvm);
>         local_irq_restore(flags);
> 
> 
> }

Yes, should fix that as well.

> > 
> > On -RT kernels.
> > 
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > ---
> >  
> > arch/x86/kvm/x86.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7c39b8a..935143b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4062,10 +4062,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >                          goto out;
> >  
> >                  r = 0;
> > -                local_irq_disable();
> >                  now_ns = __get_kvmclock_ns(kvm);
> >                  kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> > -                local_irq_enable();
> >                  kvm_gen_update_masterclock(kvm);
> >                  break;
> >          }
> > @@ -4073,11 +4071,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >                  struct kvm_clock_data user_ns;
> >                  u64 now_ns;
> >  
> > -                local_irq_disable();
> >                  now_ns = __get_kvmclock_ns(kvm);
> >                  user_ns.clock = now_ns;
> >                  user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> > -                local_irq_enable();
> >                  memset(&user_ns.pad, 0, sizeof(user_ns.pad));
> >  
> >                  r = -EFAULT;
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-13  6:06 ` Pankaj Gupta
  2017-04-13 18:30   ` Marcelo Tosatti
@ 2017-04-13 18:49   ` Marcelo Tosatti
  2017-04-17 11:05     ` Pankaj Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2017-04-13 18:49 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář


[PATCH v2] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK

The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
attempts to disable software suspend from causing "non atomic behaviour" of
the operation:

    Add a helper function to compute the kernel time and convert nanoseconds
    back to CPU specific cycles.  Note that these must not be called in preemptible
    context, as that would mean the kernel could enter software suspend state,
    which would cause non-atomic operation.

However, assume the kernel can enter software suspend at the following 2 points:

        ktime_get_ts(&ts);
1.
						hypothetical_ktime_get_ts(&ts)
        monotonic_to_bootbased(&ts);
2.

monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts)
performed after point 1 (that is after resuming from software suspend),
hypothetical_ktime_get_ts()

Therefore it is also correct for the ktime_get_ts(&ts) before point 1, 
which is 

	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) + time-to-execute-suspend-code 

Note CLOCK_MONOTONIC does not count during suspension.

So remove the irq disablement, which causes the following warning on 
-RT kernels: 

 With this reasoning, and the -RT bug that the irq disablement causes
 (because spin_lock is now a sleeping lock), remove the IRQ protection as it
 causes:
  
 [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
 [ 1064.668110] INFO: lockdep is turned off.
 [ 1064.668110] irq event stamp: 0
 [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
 [ 1064.668116] hardirqs last disabled at (0): [] c0
 [ 1064.668118] softirqs last  enabled at (0): [] c0
 [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
 [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
 [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
 [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
 [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
 [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
 [ 1064.668126] Call Trace:
 [ 1064.668132]  [] dump_stack+0x19/0x1b
 [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
 [ 1064.668138]  [] rt_spin_lock+0x24/0x60
 [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
 [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
 [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
 [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0

v2: notice get_kvmclock_ns with the same problem (Pankaj).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..4901178 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1798,12 +1798,9 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
 
 u64 get_kvmclock_ns(struct kvm *kvm)
 {
-	unsigned long flags;
 	s64 ns;
 
-	local_irq_save(flags);
 	ns = __get_kvmclock_ns(kvm);
-	local_irq_restore(flags);
 
 	return ns;
 }
@@ -4196,10 +4193,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		local_irq_disable();
 		now_ns = __get_kvmclock_ns(kvm);
 		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
-		local_irq_enable();
 		kvm_gen_update_masterclock(kvm);
 		break;
 	}
@@ -4207,11 +4202,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		struct kvm_clock_data user_ns;
 		u64 now_ns;
 
-		local_irq_disable();
 		now_ns = __get_kvmclock_ns(kvm);
 		user_ns.clock = now_ns;
 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
-		local_irq_enable();
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-13 18:49   ` [PATCH v2] " Marcelo Tosatti
@ 2017-04-17 11:05     ` Pankaj Gupta
  2017-04-17 15:51       ` [PATCH v3] " Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Pankaj Gupta @ 2017-04-17 11:05 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář

> 
> 
> [PATCH v2] KVM: x86: remove irq disablement around
> KVM_SET_CLOCK/KVM_GET_CLOCK
> 
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> attempts to disable software suspend from causing "non atomic behaviour" of
> the operation:
> 
>     Add a helper function to compute the kernel time and convert nanoseconds
>     back to CPU specific cycles.  Note that these must not be called in
>     preemptible
>     context, as that would mean the kernel could enter software suspend
>     state,
>     which would cause non-atomic operation.
> 
> However, assume the kernel can enter software suspend at the following 2
> points:
> 
>         ktime_get_ts(&ts);
> 1.
> 						hypothetical_ktime_get_ts(&ts)
>         monotonic_to_bootbased(&ts);
> 2.
> 
> monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts)
> performed after point 1 (that is after resuming from software suspend),
> hypothetical_ktime_get_ts()
> 
> Therefore it is also correct for the ktime_get_ts(&ts) before point 1,
> which is
> 
> 	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) +
> 	time-to-execute-suspend-code
> 
> Note CLOCK_MONOTONIC does not count during suspension.
> 
> So remove the irq disablement, which causes the following warning on
> -RT kernels:
> 
>  With this reasoning, and the -RT bug that the irq disablement causes
>  (because spin_lock is now a sleeping lock), remove the IRQ protection as it
>  causes:
>   
>  [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
>  [ 1064.668110] INFO: lockdep is turned off.
>  [ 1064.668110] irq event stamp: 0
>  [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
>  [ 1064.668116] hardirqs last disabled at (0): [] c0
>  [ 1064.668118] softirqs last  enabled at (0): [] c0
>  [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
>  [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
>  [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
>  [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
>  [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
>  [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
>  [ 1064.668126] Call Trace:
>  [ 1064.668132]  [] dump_stack+0x19/0x1b
>  [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
>  [ 1064.668138]  [] rt_spin_lock+0x24/0x60
>  [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
>  [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
>  [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
>  [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0
> 
> v2: notice get_kvmclock_ns with the same problem (Pankaj).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1faf620..4901178 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1798,12 +1798,9 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
>  
>  u64 get_kvmclock_ns(struct kvm *kvm)
>  {
> -	unsigned long flags;
>  	s64 ns;
>  
> -	local_irq_save(flags);
>  	ns = __get_kvmclock_ns(kvm);
> -	local_irq_restore(flags);
>  
>  	return ns;
>  }
> @@ -4196,10 +4193,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			goto out;
>  
>  		r = 0;
> -		local_irq_disable();
>  		now_ns = __get_kvmclock_ns(kvm);

As we have wrapper 'get_kvmclock_ns' above '__get_kvmclock_ns(kvm)'
we can use that directly.

>  		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> -		local_irq_enable();
>  		kvm_gen_update_masterclock(kvm);
>  		break;
>  	}
> @@ -4207,11 +4202,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  		struct kvm_clock_data user_ns;
>  		u64 now_ns;
>  
> -		local_irq_disable();
>  		now_ns = __get_kvmclock_ns(kvm);

wrapper here as well.

>  		user_ns.clock = now_ns;
>  		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> -		local_irq_enable();
>  		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>  
>  		r = -EFAULT;

This patch looks okay to me. Just a small suggestion above.

Thanks,
Pankaj
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-17 11:05     ` Pankaj Gupta
@ 2017-04-17 15:51       ` Marcelo Tosatti
  2017-04-18  5:06         ` Pankaj Gupta
  2017-04-21 10:04         ` Paolo Bonzini
  0 siblings, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2017-04-17 15:51 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář


The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
attempts to disable software suspend from causing "non atomic behaviour" of
the operation:

    Add a helper function to compute the kernel time and convert nanoseconds
    back to CPU specific cycles.  Note that these must not be called in preemptible
    context, as that would mean the kernel could enter software suspend state,
    which would cause non-atomic operation.

However, assume the kernel can enter software suspend at the following 2 points:

        ktime_get_ts(&ts);
1.
						hypothetical_ktime_get_ts(&ts)
        monotonic_to_bootbased(&ts);
2.

monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts)
performed after point 1 (that is after resuming from software suspend),
hypothetical_ktime_get_ts()

Therefore it is also correct for the ktime_get_ts(&ts) before point 1, 
which is 

	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) + time-to-execute-suspend-code 

Note CLOCK_MONOTONIC does not count during suspension.

So remove the irq disablement, which causes the following warning on 
-RT kernels: 

 With this reasoning, and the -RT bug that the irq disablement causes
 (because spin_lock is now a sleeping lock), remove the IRQ protection as it
 causes:
  
 [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
 [ 1064.668110] INFO: lockdep is turned off.
 [ 1064.668110] irq event stamp: 0
 [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
 [ 1064.668116] hardirqs last disabled at (0): [] c0
 [ 1064.668118] softirqs last  enabled at (0): [] c0
 [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
 [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
 [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
 [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
 [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
 [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
 [ 1064.668126] Call Trace:
 [ 1064.668132]  [] dump_stack+0x19/0x1b
 [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
 [ 1064.668138]  [] rt_spin_lock+0x24/0x60
 [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
 [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
 [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
 [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0

v2: notice get_kvmclock_ns with the same problem (Pankaj).
v3: remove useless helper function (Pankaj).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..deacbe1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1775,7 +1775,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 #endif
 }
 
-static u64 __get_kvmclock_ns(struct kvm *kvm)
+u64 get_kvmclock_ns(struct kvm *kvm)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
@@ -1796,18 +1796,6 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
 	return __pvclock_read_cycles(&hv_clock, rdtsc());
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
-{
-	unsigned long flags;
-	s64 ns;
-
-	local_irq_save(flags);
-	ns = __get_kvmclock_ns(kvm);
-	local_irq_restore(flags);
-
-	return ns;
-}
-
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
@@ -4196,10 +4184,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		local_irq_disable();
-		now_ns = __get_kvmclock_ns(kvm);
+		now_ns = get_kvmclock_ns(kvm);
 		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
-		local_irq_enable();
 		kvm_gen_update_masterclock(kvm);
 		break;
 	}
@@ -4207,11 +4193,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		struct kvm_clock_data user_ns;
 		u64 now_ns;
 
-		local_irq_disable();
-		now_ns = __get_kvmclock_ns(kvm);
+		now_ns = get_kvmclock_ns(kvm);
 		user_ns.clock = now_ns;
 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
-		local_irq_enable();
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-17 15:51       ` [PATCH v3] " Marcelo Tosatti
@ 2017-04-18  5:06         ` Pankaj Gupta
  2017-04-21 10:04         ` Paolo Bonzini
  1 sibling, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2017-04-18  5:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini, Radim Krčmář

 
> 
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> attempts to disable software suspend from causing "non atomic behaviour" of
> the operation:
> 
>     Add a helper function to compute the kernel time and convert nanoseconds
>     back to CPU specific cycles.  Note that these must not be called in
>     preemptible
>     context, as that would mean the kernel could enter software suspend
>     state,
>     which would cause non-atomic operation.
> 
> However, assume the kernel can enter software suspend at the following 2
> points:
> 
>         ktime_get_ts(&ts);
> 1.
> 						hypothetical_ktime_get_ts(&ts)
>         monotonic_to_bootbased(&ts);
> 2.
> 
> monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts)
> performed after point 1 (that is after resuming from software suspend),
> hypothetical_ktime_get_ts()
> 
> Therefore it is also correct for the ktime_get_ts(&ts) before point 1,
> which is
> 
> 	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) +
> 	time-to-execute-suspend-code
> 
> Note CLOCK_MONOTONIC does not count during suspension.
> 
> So remove the irq disablement, which causes the following warning on
> -RT kernels:
> 
>  With this reasoning, and the -RT bug that the irq disablement causes
>  (because spin_lock is now a sleeping lock), remove the IRQ protection as it
>  causes:
>   
>  [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
>  [ 1064.668110] INFO: lockdep is turned off.
>  [ 1064.668110] irq event stamp: 0
>  [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
>  [ 1064.668116] hardirqs last disabled at (0): [] c0
>  [ 1064.668118] softirqs last  enabled at (0): [] c0
>  [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
>  [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
>  [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
>  [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
>  [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
>  [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
>  [ 1064.668126] Call Trace:
>  [ 1064.668132]  [] dump_stack+0x19/0x1b
>  [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
>  [ 1064.668138]  [] rt_spin_lock+0x24/0x60
>  [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
>  [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
>  [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
>  [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0
> 
> v2: notice get_kvmclock_ns with the same problem (Pankaj).
> v3: remove useless helper function (Pankaj).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1faf620..deacbe1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1775,7 +1775,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
>  #endif
>  }
>  
> -static u64 __get_kvmclock_ns(struct kvm *kvm)
> +u64 get_kvmclock_ns(struct kvm *kvm)
>  {
>  	struct kvm_arch *ka = &kvm->arch;
>  	struct pvclock_vcpu_time_info hv_clock;
> @@ -1796,18 +1796,6 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
>  	return __pvclock_read_cycles(&hv_clock, rdtsc());
>  }
>  
> -u64 get_kvmclock_ns(struct kvm *kvm)
> -{
> -	unsigned long flags;
> -	s64 ns;
> -
> -	local_irq_save(flags);
> -	ns = __get_kvmclock_ns(kvm);
> -	local_irq_restore(flags);
> -
> -	return ns;
> -}
> -
>  static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
>  {
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
> @@ -4196,10 +4184,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			goto out;
>  
>  		r = 0;
> -		local_irq_disable();
> -		now_ns = __get_kvmclock_ns(kvm);
> +		now_ns = get_kvmclock_ns(kvm);
>  		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> -		local_irq_enable();
>  		kvm_gen_update_masterclock(kvm);
>  		break;
>  	}
> @@ -4207,11 +4193,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  		struct kvm_clock_data user_ns;
>  		u64 now_ns;
>  
> -		local_irq_disable();
> -		now_ns = __get_kvmclock_ns(kvm);
> +		now_ns = get_kvmclock_ns(kvm);
>  		user_ns.clock = now_ns;
>  		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> -		local_irq_enable();
>  		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>  
>  		r = -EFAULT;

Over all patch looks good to me. I think explanation provided by Marcelo
is valid.

Reviewed-by: Pankaj Gupta <pagupta@redhat.com>

> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  2017-04-17 15:51       ` [PATCH v3] " Marcelo Tosatti
  2017-04-18  5:06         ` Pankaj Gupta
@ 2017-04-21 10:04         ` Paolo Bonzini
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2017-04-21 10:04 UTC (permalink / raw)
  To: Marcelo Tosatti, Pankaj Gupta; +Cc: kvm-devel, Radim Krčmář



On 17/04/2017 17:51, Marcelo Tosatti wrote:
> 
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> attempts to disable software suspend from causing "non atomic behaviour" of
> the operation:
> 
>     Add a helper function to compute the kernel time and convert nanoseconds
>     back to CPU specific cycles.  Note that these must not be called in preemptible
>     context, as that would mean the kernel could enter software suspend state,
>     which would cause non-atomic operation.
> 
> However, assume the kernel can enter software suspend at the following 2 points:
> 
>         ktime_get_ts(&ts);
> 1.
> 						hypothetical_ktime_get_ts(&ts)
>         monotonic_to_bootbased(&ts);
> 2.
> 
> monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts)
> performed after point 1 (that is after resuming from software suspend),
> hypothetical_ktime_get_ts()
> 
> Therefore it is also correct for the ktime_get_ts(&ts) before point 1, 
> which is 
> 
> 	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) + time-to-execute-suspend-code 
> 
> Note CLOCK_MONOTONIC does not count during suspension.
> 
> So remove the irq disablement, which causes the following warning on 
> -RT kernels: 
> 
>  With this reasoning, and the -RT bug that the irq disablement causes
>  (because spin_lock is now a sleeping lock), remove the IRQ protection as it
>  causes:
>   
>  [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
>  [ 1064.668110] INFO: lockdep is turned off.
>  [ 1064.668110] irq event stamp: 0
>  [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
>  [ 1064.668116] hardirqs last disabled at (0): [] c0
>  [ 1064.668118] softirqs last  enabled at (0): [] c0
>  [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
>  [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
>  [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
>  [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
>  [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
>  [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
>  [ 1064.668126] Call Trace:
>  [ 1064.668132]  [] dump_stack+0x19/0x1b
>  [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
>  [ 1064.668138]  [] rt_spin_lock+0x24/0x60
>  [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
>  [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
>  [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
>  [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0
> 
> v2: notice get_kvmclock_ns with the same problem (Pankaj).
> v3: remove useless helper function (Pankaj).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1faf620..deacbe1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1775,7 +1775,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
>  #endif
>  }
>  
> -static u64 __get_kvmclock_ns(struct kvm *kvm)
> +u64 get_kvmclock_ns(struct kvm *kvm)
>  {
>  	struct kvm_arch *ka = &kvm->arch;
>  	struct pvclock_vcpu_time_info hv_clock;
> @@ -1796,18 +1796,6 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
>  	return __pvclock_read_cycles(&hv_clock, rdtsc());
>  }
>  
> -u64 get_kvmclock_ns(struct kvm *kvm)
> -{
> -	unsigned long flags;
> -	s64 ns;
> -
> -	local_irq_save(flags);
> -	ns = __get_kvmclock_ns(kvm);
> -	local_irq_restore(flags);
> -
> -	return ns;
> -}
> -
>  static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
>  {
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
> @@ -4196,10 +4184,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			goto out;
>  
>  		r = 0;
> -		local_irq_disable();
> -		now_ns = __get_kvmclock_ns(kvm);
> +		now_ns = get_kvmclock_ns(kvm);
>  		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> -		local_irq_enable();
>  		kvm_gen_update_masterclock(kvm);
>  		break;
>  	}
> @@ -4207,11 +4193,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  		struct kvm_clock_data user_ns;
>  		u64 now_ns;
>  
> -		local_irq_disable();
> -		now_ns = __get_kvmclock_ns(kvm);
> +		now_ns = get_kvmclock_ns(kvm);
>  		user_ns.clock = now_ns;
>  		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> -		local_irq_enable();
>  		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>  
>  		r = -EFAULT;
> 

Applied, thanks.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-04-21 10:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-12 17:23 [PATCH] KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK Marcelo Tosatti
2017-04-12 17:40 ` Radim Krčmář
2017-04-13  6:06 ` Pankaj Gupta
2017-04-13 18:30   ` Marcelo Tosatti
2017-04-13 18:49   ` [PATCH v2] " Marcelo Tosatti
2017-04-17 11:05     ` Pankaj Gupta
2017-04-17 15:51       ` [PATCH v3] " Marcelo Tosatti
2017-04-18  5:06         ` Pankaj Gupta
2017-04-21 10:04         ` Paolo Bonzini
2017-04-13  7:32 ` [PATCH] " Roman Kagan
2017-04-13 13:27   ` Radim Krčmář

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.