* [PATCH 0/3] Improved yield performance
@ 2014-02-26 15:14 Christian Borntraeger
  2014-02-26 15:14 ` [PATCH 1/3] KVM: s390: implementation of kvm_arch_vcpu_runnable() Christian Borntraeger
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Christian Borntraeger @ 2014-02-26 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller,
	Christian Borntraeger

Paolo,

here is the reworked yield heuristics series against kvm/queue with
your suggested changes.

Attached is a minimized testcase that reproduces the performance
win (runtime 0:50 instead of 1:00). The constants and the setup
seem a bit artificial, but they reproduce the problem on my
test system: as soon as there is contention from other guests, the
patch improves the runtime of the testcase below. The original test
had 8 competing guests with 4 cpus; I simplified that a bit to have
one big guest as cpu stealer.

                           host 20 cpus
GUEST1 (20vCPUS)			GUEST2(40vCPUS)
runs the test				all 40 CPUs cpu bound


The testcase stresses the IPC system call, since that creates a lot
of diag44 exits on s390 if contended.

Some other spinlock tests like fio on guest tmpfs seem to be unaffected.

Please consider for kvm/queue. If you have some performance tests,
a regression run on x86 is welcome.

Christian

Michael Mueller (3):
  KVM: s390: implementation of kvm_arch_vcpu_runnable()
  KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop
  KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery

 arch/s390/kvm/interrupt.c | 3 +++
 arch/s390/kvm/kvm-s390.c  | 4 +---
 virt/kvm/kvm_main.c       | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

-------- snip-------------
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM 800
#define ACTIVE 40
#define LOOPS 800

static int mutexsem;

static void busy_work(void)
{
        int i;
        double dummy = 100;

        for (i=0; i<10000; i++) {
                dummy/=1.1;
                asm volatile ("":"+r" (dummy)::"memory");
        }
}

static void *child(void *unused)
{
        int j;
        struct sembuf sops;

        for (j = 0; j < LOOPS; j++) {
                sops.sem_num = 0;
                sops.sem_op = -1;
                sops.sem_flg = 0;
                semop(mutexsem, &sops, 1);

                busy_work();

                sops.sem_num = 0;
                sops.sem_op = 1;
                sops.sem_flg = 0;
                semop(mutexsem, &sops, 1);
        }
        return NULL;
}

static void do_fork(long number)
{
        pid_t pid;

        pid = fork();
        switch (pid) {
        case 0:
                child((void *) number);
                exit(0);
        case -1:
                perror("Couldn't fork child - ");
                exit(1);
        default:
                break;
        }
}

union semun {
        int              val;
        struct semid_ds *buf;
        unsigned short  *array;
        struct seminfo  *__buf;
};

int main(int argc, char **argv)
{
        unsigned i;
        struct sembuf sops;
        union semun value;

        /* IPC_CREAT alone creates the set with mode 0; give the owner access */
        mutexsem = semget(0xdead, 1, IPC_CREAT | 0600);
        value.val = 0;
        semctl(mutexsem, 0, SETVAL, value);
        for (i = 0; i < NUM; i++)
                do_fork(i);

        for (i=0; i<ACTIVE; i++) {
                sops.sem_num = 0;
                sops.sem_op = 1;
                sops.sem_flg = 0;
                semop(mutexsem, &sops, 1);
        }

        /* reap every forked child */
        for (i = 0; i < NUM; i++)
                wait(NULL);
        exit(0);
}

-------- snip-------------

-- 
1.8.4.2


* [PATCH 1/3] KVM: s390: implementation of kvm_arch_vcpu_runnable()
  2014-02-26 15:14 [PATCH 0/3] Improved yield performance Christian Borntraeger
@ 2014-02-26 15:14 ` Christian Borntraeger
  2014-02-26 15:14 ` [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop Christian Borntraeger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Christian Borntraeger @ 2014-02-26 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller,
	Christian Borntraeger

From: Michael Mueller <mimu@linux.vnet.ibm.com>

A vcpu is defined to be runnable if an interrupt is pending.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a5da2cc..18959bb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -483,9 +483,7 @@ out:
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-	/* kvm common code refers to this, but never calls it */
-	BUG();
-	return 0;
+	return kvm_cpu_has_interrupt(vcpu);
 }
 
 void s390_vcpu_block(struct kvm_vcpu *vcpu)
-- 
1.8.4.2


* [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop
  2014-02-26 15:14 [PATCH 0/3] Improved yield performance Christian Borntraeger
  2014-02-26 15:14 ` [PATCH 1/3] KVM: s390: implementation of kvm_arch_vcpu_runnable() Christian Borntraeger
@ 2014-02-26 15:14 ` Christian Borntraeger
  2014-02-27 18:11   ` Raghavendra KT
  2014-02-26 15:14 ` [PATCH 3/3] KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery Christian Borntraeger
  2014-02-26 15:16 ` [PATCH 0/3] Improved yield performance Paolo Bonzini
  3 siblings, 1 reply; 7+ messages in thread
From: Christian Borntraeger @ 2014-02-26 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller,
	Christian Borntraeger

From: Michael Mueller <mimu@linux.vnet.ibm.com>

Use the arch-specific function kvm_arch_vcpu_runnable() to add a further
criterion to identify a suitable vcpu to yield to during undirected yield
processing.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
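
Annotation (not part of the commit): the effect of the changed condition
can be sketched as a small userspace model. This is only an illustration
under stated assumptions: struct spin_candidate and both helper functions
are hypothetical names, and interrupt_pending stands in for what
kvm_arch_vcpu_runnable() reports on s390 after patch 1/3.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two facts the loop inspects per vcpu. */
struct spin_candidate {
	bool on_waitqueue;      /* models waitqueue_active(&vcpu->wq) */
	bool interrupt_pending; /* models kvm_arch_vcpu_runnable(vcpu) */
};

/* Old check: every vcpu sitting on its wait queue is skipped. */
static bool skipped_before(const struct spin_candidate *c)
{
	return c->on_waitqueue;
}

/* New check: a waiting vcpu is skipped only if nothing is pending for it. */
static bool skipped_after(const struct spin_candidate *c)
{
	return c->on_waitqueue && !c->interrupt_pending;
}
```

In this model the only case that changes is a vcpu that is still on its
wait queue but already has an interrupt pending: it was skipped before
and stays in the candidate set now.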

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f5668a4..5fd4cf8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1801,7 +1801,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (waitqueue_active(&vcpu->wq))
+			if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-- 
1.8.4.2


* [PATCH 3/3] KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery
  2014-02-26 15:14 [PATCH 0/3] Improved yield performance Christian Borntraeger
  2014-02-26 15:14 ` [PATCH 1/3] KVM: s390: implementation of kvm_arch_vcpu_runnable() Christian Borntraeger
  2014-02-26 15:14 ` [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop Christian Borntraeger
@ 2014-02-26 15:14 ` Christian Borntraeger
  2014-02-26 15:16 ` [PATCH 0/3] Improved yield performance Paolo Bonzini
  3 siblings, 0 replies; 7+ messages in thread
From: Christian Borntraeger @ 2014-02-26 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller

From: Michael Mueller <mimu@linux.vnet.ibm.com>

Commit "kvm: Record the preemption status of vcpus using preempt notifiers"
caused a performance regression on s390. It turned out that a formerly
sleeping vcpu that has been woken up is not a yield candidate, since it
gave up the cpu voluntarily. To retain this candidate, its preempted
flag is set during wakeup and interrupt delivery.

Significant performance measurement work and code analysis to solve this
issue was provided by Mao Chuan Li and his team in Beijing.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/interrupt.c | 3 +++
 1 file changed, 3 insertions(+)
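
Annotation (not part of the commit): the flag lifecycle described above
can be sketched as a small userspace model. All names here are
hypothetical. The sched-out and sched-in rules are an assumption about
how the referenced preempt-notifier commit behaves (flag set only on
involuntary descheduling, cleared when the vcpu runs again); only the
wakeup rule is taken from this patch.

```c
#include <stdbool.h>

/* Hypothetical model of the per-vcpu preempted flag. */
struct yield_model {
	bool preempted;
};

/* Assumed notifier rule: set the flag only if the task was still runnable. */
static void sched_out_model(struct yield_model *v, bool still_runnable)
{
	if (still_runnable)
		v->preempted = true;
}

/* Assumed notifier rule: clear the flag once the vcpu runs again. */
static void sched_in_model(struct yield_model *v)
{
	v->preempted = false;
}

/* This patch: mark the vcpu as a candidate on wakeup or interrupt delivery. */
static void wakeup_model(struct yield_model *v)
{
	v->preempted = true;
}

/* Assumed yield-loop rule: only vcpus with the flag set are considered. */
static bool is_yield_candidate(const struct yield_model *v)
{
	return v->preempted;
}
```

In this model a vcpu that blocked voluntarily never gets the flag from
sched_out_model(), so without wakeup_model() it would remain invisible
to the yield loop until it had run and been preempted again.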

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 1848080..fff070b 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -505,6 +505,7 @@ enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer)
 	struct kvm_vcpu *vcpu;
 
 	vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer);
+	vcpu->preempted = true;
 	tasklet_schedule(&vcpu->arch.tasklet);
 
 	return HRTIMER_NORESTART;
@@ -732,6 +733,7 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
 	if (waitqueue_active(li->wq))
 		wake_up_interruptible(li->wq);
+	kvm_get_vcpu(kvm, sigcpu)->preempted = true;
 	spin_unlock_bh(&li->lock);
 unlock_fi:
 	spin_unlock(&fi->lock);
@@ -877,6 +879,7 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
+	vcpu->preempted = true;
 	spin_unlock_bh(&li->lock);
 	mutex_unlock(&vcpu->kvm->lock);
 	return 0;
-- 
1.8.4.2


* Re: [PATCH 0/3] Improved yield performance
  2014-02-26 15:14 [PATCH 0/3] Improved yield performance Christian Borntraeger
                   ` (2 preceding siblings ...)
  2014-02-26 15:14 ` [PATCH 3/3] KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery Christian Borntraeger
@ 2014-02-26 15:16 ` Paolo Bonzini
  3 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-02-26 15:16 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller

On 26/02/2014 16:14, Christian Borntraeger wrote:
> Paolo,
>
> here is the reworked yield heuristics series against kvm/queue with
> your suggested changes.
>
> Attached is a minimized testcase that reproduces the performance
> win (runtime 0:50 instead of 1:00). The constants and the setup
> seem a bit artificial, but they reproduce the problem on my
> test system: as soon as there is contention from other guests, the
> patch improves the runtime of the testcase below. The original test
> had 8 competing guests with 4 cpus; I simplified that a bit to have
> one big guest as cpu stealer.
>
>                            host 20 cpus
> GUEST1 (20vCPUS)			GUEST2(40vCPUS)
> runs the test				all 40 CPUs cpu bound
>
>
> The testcase stresses the IPC system call, since that creates a lot
> of diag44 exits on s390 if contended.
>
> Some other spinlock tests like fio on guest tmpfs seem to be unaffected.
>
> Please consider for kvm/queue. If you have some performance tests
> a regression run on x86 is welcome.

Yes, will apply as soon as kvm/queue moves to kvm/next (1 hour or so).

Paolo


* Re: [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop
  2014-02-26 15:14 ` [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop Christian Borntraeger
@ 2014-02-27 18:11   ` Raghavendra KT
  2014-02-27 18:20     ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Raghavendra KT @ 2014-02-27 18:11 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Gleb Natapov, KVM, linux-s390, Cornelia Huck,
	Michael Mueller

On Wed, Feb 26, 2014 at 8:44 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> From: Michael Mueller <mimu@linux.vnet.ibm.com>
>
> Use the arch-specific function kvm_arch_vcpu_runnable() to add a further
> criterion to identify a suitable vcpu to yield to during undirected yield
> processing.
>
> Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f5668a4..5fd4cf8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1801,7 +1801,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>                                 continue;
>                         if (vcpu == me)
>                                 continue;
> -                       if (waitqueue_active(&vcpu->wq))
> +                       if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>                                 continue;
>                         if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>                                 continue;
> --
> 1.8.4.2
>

I ran kernbench/sysbench/ebizzy on an x86 guest to confirm that this did
not have any adverse effect in the PLE handler path.


* Re: [PATCH 2/3] KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop
  2014-02-27 18:11   ` Raghavendra KT
@ 2014-02-27 18:20     ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-02-27 18:20 UTC (permalink / raw)
  To: Raghavendra KT, Christian Borntraeger
  Cc: Gleb Natapov, KVM, linux-s390, Cornelia Huck, Michael Mueller

On 27/02/2014 19:11, Raghavendra KT wrote:
> On Wed, Feb 26, 2014 at 8:44 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> From: Michael Mueller <mimu@linux.vnet.ibm.com>
>>
>> Use the arch-specific function kvm_arch_vcpu_runnable() to add a further
>> criterion to identify a suitable vcpu to yield to during undirected yield
>> processing.
>>
>> Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
>> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  virt/kvm/kvm_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index f5668a4..5fd4cf8 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1801,7 +1801,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>                                 continue;
>>                         if (vcpu == me)
>>                                 continue;
>> -                       if (waitqueue_active(&vcpu->wq))
>> +                       if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>>                                 continue;
>>                         if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>                                 continue;
>> --
>> 1.8.4.2
>>
>
> I ran kernbench/sysbench/ebizzy on an x86 guest to confirm that this did
> not have any adverse effect in the PLE handler path.

Thanks!

Paolo

