* qemu-kvm.git build problem
@ 2009-12-28 18:40 Lucas Meneghel Rodrigues
  2009-12-28 19:55 ` Avi Kivity
  2010-01-09  5:23 ` Lucas Meneghel Rodrigues
  0 siblings, 2 replies; 17+ messages in thread
From: Lucas Meneghel Rodrigues @ 2009-12-28 18:40 UTC (permalink / raw)
  To: KVM mailing list; +Cc: Michael Goldish, Eduardo Habkost, Dor Laor, Avi Kivity

Hi, we've hit a problem in today's git testing when trying to build from
the latest upstream qemu-kvm.git repo:

12/28 13:39:18 DEBUG|     utils:0069| Running 'make -j 2'
12/28 13:39:18 DEBUG|     utils:0101| [stdout] make -C /lib/modules/2.6.31.6-166.fc12.x86_64/build M=`pwd` \
12/28 13:39:18 DEBUG|     utils:0101| [stdout] 		LINUXINCLUDE="-I`pwd`/include -Iinclude \
12/28 13:39:18 DEBUG|     utils:0101| [stdout] 		 -Iarch/x86/include -I`pwd`/include-compat -I`pwd`/x86 \
12/28 13:39:18 DEBUG|     utils:0101| [stdout] 		-include include/linux/autoconf.h \
12/28 13:39:18 DEBUG|     utils:0101| [stdout] 		-include `pwd`/x86/external-module-compat.h" \
12/28 13:39:18 DEBUG|     utils:0101| [stdout] 		"$@"
12/28 13:39:18 DEBUG|     utils:0101| [stdout] make[1]: Entering directory `/usr/src/kernels/2.6.31.6-166.fc12.x86_64'
12/28 13:39:19 DEBUG|     utils:0101| [stdout]   LD      /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/built-in.o
12/28 13:39:19 DEBUG|     utils:0101| [stdout]   CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/svm.o
12/28 13:39:19 DEBUG|     utils:0101| [stdout]   CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.o
12/28 13:39:22 ERROR|     utils:0101| [stderr] /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.c:271: error: ‘MSR_TSC_AUX’ undeclared here (not in a function)
12/28 13:39:22 ERROR|     utils:0101| [stderr] /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.c: In function ‘setup_msrs’:
12/28 13:39:22 ERROR|     utils:0101| [stderr] /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.c:985: warning: passing argument 2 of ‘__find_msr_index’ makes integer from pointer without a cast
12/28 13:39:22 ERROR|     utils:0101| [stderr] /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.c:424: note: expected ‘u32’ but argument is of type ‘const u32 *’
12/28 13:39:22 ERROR|     utils:0101| [stderr] make[3]: *** [/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.o] Error 1
12/28 13:39:22 ERROR|     utils:0101| [stderr] make[3]: *** Waiting for unfinished jobs....
12/28 13:39:23 ERROR|     utils:0101| [stderr] make[2]: *** [/usr/local/autotest/tests/kvm/src/kvm_kmod/x86] Error 2
12/28 13:39:23 DEBUG|     utils:0101| [stdout] make[1]: Leaving directory `/usr/src/kernels/2.6.31.6-166.fc12.x86_64'
12/28 13:39:23 ERROR|     utils:0101| [stderr] make[1]: *** [_module_/usr/local/autotest/tests/kvm/src/kvm_kmod] Error 2
12/28 13:39:23 ERROR|     utils:0101| [stderr] make: *** [all] Error 2

This might be a problem in kvm-kmod; I am just making sure you guys are aware of it.
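
If it is, I guess kvm-kmod's compat headers just need to provide the new
MSR definition on older kernels, roughly like this (an untested sketch,
not necessarily the actual fix; the 2.6.31 headers apparently predate
MSR_TSC_AUX in msr-index.h):

/* e.g. in x86/external-module-compat.h */
#ifndef MSR_TSC_AUX
#define MSR_TSC_AUX 0xc0000103	/* IA32_TSC_AUX, read by rdtscp */
#endif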

Cheers!

Lucas




* Re: qemu-kvm.git build problem
  2009-12-28 18:40 qemu-kvm.git build problem Lucas Meneghel Rodrigues
@ 2009-12-28 19:55 ` Avi Kivity
  2010-01-09  5:23 ` Lucas Meneghel Rodrigues
  1 sibling, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2009-12-28 19:55 UTC (permalink / raw)
  To: Lucas Meneghel Rodrigues
  Cc: KVM mailing list, Michael Goldish, Eduardo Habkost, Dor Laor, Jan Kiszka

On 12/28/2009 08:40 PM, Lucas Meneghel Rodrigues wrote:
> Hi, we've hit a problem in today's git testing when trying to build from
> the latest upstream qemu-kvm.git repo:
>
> [build log snipped]
>
> This might be a problem in kvm-kmod; I am just making sure you guys are aware of it.
>
>    

It is, copying Jan.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: qemu-kvm.git build problem
  2009-12-28 18:40 qemu-kvm.git build problem Lucas Meneghel Rodrigues
  2009-12-28 19:55 ` Avi Kivity
@ 2010-01-09  5:23 ` Lucas Meneghel Rodrigues
  2010-01-11 10:13   ` Jan Kiszka
  1 sibling, 1 reply; 17+ messages in thread
From: Lucas Meneghel Rodrigues @ 2010-01-09  5:23 UTC (permalink / raw)
  To: KVM mailing list
  Cc: Michael Goldish, Eduardo Habkost, Dor Laor, Avi Kivity, jan.kiszka

On Mon, 2009-12-28 at 16:40 -0200, Lucas Meneghel Rodrigues wrote:
> Hi, we've hit a problem in today's git testing when trying to build from
> the latest upstream qemu-kvm.git repo:
> 
> [build log snipped]
> 
> This might be a problem in kvm-kmod; I am just making sure you guys are aware of it.

Hi folks, as of today this build problem hasn't been fixed (just a
friendly reminder).

Lucas





* Re: qemu-kvm.git build problem
  2010-01-09  5:23 ` Lucas Meneghel Rodrigues
@ 2010-01-11 10:13   ` Jan Kiszka
  2010-01-11 10:18     ` Avi Kivity
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2010-01-11 10:13 UTC (permalink / raw)
  To: Lucas Meneghel Rodrigues
  Cc: KVM mailing list, Michael Goldish, Eduardo Habkost, Dor Laor, Avi Kivity

Lucas Meneghel Rodrigues wrote:
> On Mon, 2009-12-28 at 16:40 -0200, Lucas Meneghel Rodrigues wrote:
>> Hi, we've hit a problem in today's git testing when trying to build from
>> the latest upstream qemu-kvm.git repo:
>>
>> [build log snipped]
>>
>> This might be a problem in kvm-kmod; I am just making sure you guys are aware of it.
> 
> Hi folks, as of today this build problem hasn't been fixed (just a
> friendly reminder).
> 

I'm aware of the issues and have already fixed some of them, but I was
unable to test fully during my vacation, and the KVM tree has moved on in
the meantime. I hope to find some time to push a complete solution in the
next few days.

BTW, does anybody know the best way to back-port
synchronize_srcu_expedited? A simple mapping to synchronize_srcu did not
appear sufficient to achieve the same performance as the pre-SRCU locking
(e.g. guest & host stalled during the guest's framebuffer setup).
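
For reference, the simple mapping in question is essentially a one-line
compat define, correct but slow, since plain synchronize_srcu waits for
full grace periods:

#define kvm_synchronize_srcu_expedited(sp) synchronize_srcu(sp)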

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: qemu-kvm.git build problem
  2010-01-11 10:13   ` Jan Kiszka
@ 2010-01-11 10:18     ` Avi Kivity
  2010-01-11 10:21       ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2010-01-11 10:18 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael Goldish,
	Eduardo Habkost, Dor Laor

On 01/11/2010 12:13 PM, Jan Kiszka wrote:
>
> BTW, does anybody know the best way to back-port
> synchronize_srcu_expedited? A simple mapping to synchronize_srcu did not
> appear sufficient to achieve the same performance as the pre-SRCU locking
> (e.g. guest & host stalled during the guest's framebuffer setup).
>    

Isn't it sufficient to backport kernel/srcu.c?  I thought no sched.c 
changes were necessary.

-- 
error compiling committee.c: too many arguments to function



* Re: qemu-kvm.git build problem
  2010-01-11 10:18     ` Avi Kivity
@ 2010-01-11 10:21       ` Jan Kiszka
  2010-01-12  0:23         ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2010-01-11 10:21 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael Goldish,
	Eduardo Habkost, Dor Laor

Avi Kivity wrote:
> On 01/11/2010 12:13 PM, Jan Kiszka wrote:
>> BTW, does anybody know the best way to back-port
>> synchronize_srcu_expedited? A simple mapping to synchronize_srcu did not
>> appear sufficient to achieve the same performance as the pre-SRCU locking
>> (e.g. guest & host stalled during the guest's framebuffer setup).
>>    
> 
> Isn't it sufficient to backport kernel/srcu.c?  I thought no sched.c 
> changes were necessary.

Haven't looked yet, but if that's the case, it would indeed be
straightforward.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: qemu-kvm.git build problem
  2010-01-11 10:21       ` Jan Kiszka
@ 2010-01-12  0:23         ` Jan Kiszka
  2010-01-12  0:51           ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2010-01-12  0:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor


Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 01/11/2010 12:13 PM, Jan Kiszka wrote:
>>> BTW, does anybody know the best way to back-port
>>> synchronize_srcu_expedited? A simple mapping to synchronize_srcu did not
>>> appear sufficient to achieve the same performance as the pre-SRCU locking
>>> (e.g. guest & host stalled during the guest's framebuffer setup).
>>>    
>> Isn't it sufficient to backport kernel/srcu.c?  I thought no sched.c 
>> changes were necessary.
> 
> Haven't looked yet, but if that's the case, it would indeed be
> straightforward.

It's far from straightforward: synchronize_srcu_expedited is based on
synchronize_sched_expedited, which was introduced in 2.6.32. But that
service is hooked deep into the scheduler, fiddling directly with the
runqueues (which are completely private to sched.c). This path looks like
a dead end, especially when the goal is to support ~8 major Linux
releases back.

Paul, we have a problem here on the KVM-for-older-kernels front: we need
synchronize_srcu_expedited for acceptable write-side performance (there
are certain phases with lots of changes, and plain synchronize_srcu just
stalls both guest and host for several seconds). Our target kernels (down
to 2.6.27, unofficially even 2.6.24) do not have the expedited service.
Can you think of a poor man's solution for those kernels?

Unfortunately, I don't think mechanical patching can roll back our SRCU
use to an rw-sem. But I will check this once again tomorrow.

Jan




* Re: qemu-kvm.git build problem
  2010-01-12  0:23         ` Jan Kiszka
@ 2010-01-12  0:51           ` Paul E. McKenney
  2010-01-12  8:28             ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-01-12  0:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

On Tue, Jan 12, 2010 at 01:23:15AM +0100, Jan Kiszka wrote:
> [earlier context snipped]
> 
> It's far from straightforward: synchronize_srcu_expedited is based on
> synchronize_sched_expedited, which was introduced in 2.6.32. But that
> service is hooked deep into the scheduler, fiddling directly with the
> runqueues (which are completely private to sched.c). This path looks like
> a dead end, especially when the goal is to support ~8 major Linux
> releases back.
> 
> Paul, we have a problem here on the KVM-for-older-kernels front: we need
> synchronize_srcu_expedited for acceptable write-side performance (there
> are certain phases with lots of changes, and plain synchronize_srcu just
> stalls both guest and host for several seconds). Our target kernels (down
> to 2.6.27, unofficially even 2.6.24) do not have the expedited service.
> Can you think of a poor man's solution for those kernels?
> 
> Unfortunately, I don't think mechanical patching can roll back our SRCU
> use to an rw-sem. But I will check this once again tomorrow.

Would it help if I backported the RCU expedited stuff?  Yes, it does
involve kernel/sched.c, but it is reasonably hands-off.

If this would help, please let me know to which kernel version you need
the backport.

FYI -- will be AFK Friday and most of the weekend, flying over the
Pacific.

							Thanx, Paul


* Re: qemu-kvm.git build problem
  2010-01-12  0:51           ` Paul E. McKenney
@ 2010-01-12  8:28             ` Jan Kiszka
  2010-01-12 13:50               ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2010-01-12  8:28 UTC (permalink / raw)
  To: paulmck
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor


Paul E. McKenney wrote:
> [earlier context snipped]
> 
> Would it help if I backported the RCU expedited stuff?  Yes, it does
> involve kernel/sched.c, but it is reasonably hands-off.
> 
> If this would help, please let me know to which kernel version you need
> the backport.

Thanks a lot. Mmm, it's probably already sufficient if you tell me
whether I'm on the right track:

The idea behind synchronize_sched_expedited is, instead of waiting for
random task switches on all CPUs, to force them. Upstream simply reuses
the migration threads, locks, and request queues of the runqueues for
this purpose. So we should get the same effect if we create our own
high-prio srcu_expedited kthreads containing the SRCU logic, which just
complete() the requests queued by some compat
kvm_synchronize_sched_expedited(). Did I get it?
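
In pseudo-C, the caller side would have roughly this shape (a sketch
only; names and details are provisional, needs <linux/completion.h>,
<linux/percpu.h>, <linux/kthread.h>):

struct sync_req {
	bool pending;
	struct completion done;
};

static DEFINE_PER_CPU(struct sync_req, sync_req);
static DEFINE_PER_CPU(struct task_struct *, sync_thread);

static void sketch_sync_sched_expedited(void)
{
	struct sync_req *req;
	int cpu;

	for_each_online_cpu(cpu) {
		req = &per_cpu(sync_req, cpu);
		init_completion(&req->done);
		smp_wmb();	/* publish the completion before pending */
		req->pending = true;
		/* waking a SCHED_FIFO thread pinned to 'cpu' preempts
		 * whatever runs there: a forced context switch */
		wake_up_process(per_cpu(sync_thread, cpu));
	}
	for_each_online_cpu(cpu)
		wait_for_completion(&per_cpu(sync_req, cpu).done);
}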

If so, I will try to write something like this in the next few days. I
will surely appreciate your review afterwards!

Thanks,
Jan




* Re: qemu-kvm.git build problem
  2010-01-12  8:28             ` Jan Kiszka
@ 2010-01-12 13:50               ` Paul E. McKenney
  2010-01-14  0:11                 ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-01-12 13:50 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
> [earlier context snipped]
> 
> Thanks a lot. Mmm, it's probably already sufficient if you tell me
> whether I'm on the right track:
> 
> The idea behind synchronize_sched_expedited is, instead of waiting for
> random task switches on all CPUs, to force them. Upstream simply reuses
> the migration threads, locks, and request queues of the runqueues for
> this purpose. So we should get the same effect if we create our own
> high-prio srcu_expedited kthreads containing the SRCU logic, which just
> complete() the requests queued by some compat
> kvm_synchronize_sched_expedited(). Did I get it?

Yep.  There are any number of ways to force context switches on all CPUs,
and pretty much any will work for synchronize_sched_expedited().

Once you have something that does synchronize_sched_expedited(),
synchronize_srcu_expedited() can just invoke it.
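
For example, one classic trick needing no scheduler internals at all is
to migrate the caller across every online CPU; running there forces a
context switch, i.e. a quiescent state, on each of them. A rough sketch
(affinity/cpumask APIs differ across your target kernels, so treat the
details as illustrative only):

/* needs <linux/cpu.h>, <linux/cpumask.h>, <linux/sched.h> */
static void poor_mans_sync_sched_expedited(void)
{
	cpumask_var_t saved;
	int cpu;

	if (!alloc_cpumask_var(&saved, GFP_KERNEL)) {
		synchronize_sched();	/* slow but safe fallback */
		return;
	}
	cpumask_copy(saved, &current->cpus_allowed);

	get_online_cpus();	/* keep the online set stable */
	for_each_online_cpu(cpu)
		set_cpus_allowed_ptr(current, cpumask_of(cpu));
	put_online_cpus();

	set_cpus_allowed_ptr(current, saved);
	free_cpumask_var(saved);
}

The cost is bouncing the caller around the whole machine, but that is
usually acceptable for an expedited path.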

> If so, I will try to write something like this in the next few days. I
> will surely appreciate your review afterwards!

Sounds good!

						Thanx, Paul


* Re: qemu-kvm.git build problem
  2010-01-12 13:50               ` Paul E. McKenney
@ 2010-01-14  0:11                 ` Jan Kiszka
  2010-01-14  2:13                   ` Paul E. McKenney
                                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jan Kiszka @ 2010-01-14  0:11 UTC (permalink / raw)
  To: paulmck
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor


Paul E. McKenney wrote:
> On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
>> If so, I will try to write something like this in the next few days. I
>> will surely appreciate your review afterwards!
> 
> Sounds good!
> 

Here we go; find the commit inlined below. You may be primarily
interested in kvm_synchronize_sched_expedited() and the helper thread
function kvm_rcu_sync_thread(). As I use dedicated per-CPU threads, I
simplified some parts compared to upstream. Light testing indicates
that it works.

The full set of changes (particularly interesting for anyone who wants
to test it) is available on kernel.org. There is now also a commit [1]
included for CPU hotplug support, but it's untested as I don't have the
required hardware at hand.

Thanks in advance,
Jan

[1] http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=commitdiff;h=81a3ddcd8ea343a9570f3164bb31160ff4be0ab7

--------->

From fe13d0785586623176879a87949ad626dace192c Mon Sep 17 00:00:00 2001
From: Jan Kiszka <jan.kiszka@siemens.com>
Date: Wed, 13 Jan 2010 09:32:50 +0100
Subject: [PATCH] Compat support for synchronize_srcu_expedited

This service was first introduced in 2.6.32 and further optimized in
2.6.33. KVM makes use of it to replace the rw-sem based slot lock
with SRCU.

TODO: Add CPU-hotplug support to the sync kthreads.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 external-module-compat-comm.h |   17 ++
 srcu.c                        |  382 ++++++++++++++++++++++++++++++-----------
 sync                          |   27 ++-
 3 files changed, 316 insertions(+), 110 deletions(-)

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index ca57567..eabf8a1 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -1053,3 +1053,20 @@ static inline void kvm_fire_urn(void)
 static inline void kvm_fire_urn(void) {}
 
 #endif /* CONFIG_USER_RETURN_NOTIFIER */
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33)
+
+#ifdef CONFIG_SMP
+void kvm_synchronize_srcu_expedited(struct srcu_struct *sp);
+#else
+static inline void kvm_synchronize_srcu_expedited(struct srcu_struct *sp) { }
+#endif
+
+#else
+
+#define kvm_synchronize_srcu_expedited synchronize_srcu_expedited
+
+#endif
+
+int kvm_init_srcu(void);
+void kvm_exit_srcu(void);
diff --git a/srcu.c b/srcu.c
index e9734bc..985d627 100644
--- a/srcu.c
+++ b/srcu.c
@@ -24,8 +24,6 @@
  *
  */
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
-
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/percpu.h>
@@ -35,6 +33,118 @@
 #include <linux/slab.h>
 #include <linux/smp.h>
 #include <linux/srcu.h>
+#include <linux/kthread.h>
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) || \
+    (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP))
+
+/*
+ * srcu_readers_active_idx -- returns approximate number of readers
+ *	active on the specified rank of per-CPU counters.
+ */
+
+static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
+{
+	int cpu;
+	int sum;
+
+	sum = 0;
+	for_each_possible_cpu(cpu)
+		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+	return sum;
+}
+
+/*
+ * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
+ */
+static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
+{
+	int idx;
+
+	idx = sp->completed;
+	mutex_lock(&sp->mutex);
+
+	/*
+	 * Check to see if someone else did the work for us while we were
+	 * waiting to acquire the lock.  We need -two- advances of
+	 * the counter, not just one.  If there was but one, we might have
+	 * shown up -after- our helper's first synchronize_sched(), thus
+	 * having failed to prevent CPU-reordering races with concurrent
+	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
+	 * either (1) wait for two or (2) supply the second ourselves.
+	 */
+
+	if ((sp->completed - idx) >= 2) {
+		mutex_unlock(&sp->mutex);
+		return;
+	}
+
+	sync_func();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() ensures that any CPU that
+	 * sees the new value of sp->completed will also see any preceding
+	 * changes to data structures made by this CPU.  This prevents
+	 * some other CPU from reordering the accesses in its SRCU
+	 * read-side critical section to precede the corresponding
+	 * srcu_read_lock() -- ensuring that such references will in
+	 * fact be protected.
+	 *
+	 * So it is now safe to do the flip.
+	 */
+
+	idx = sp->completed & 0x1;
+	sp->completed++;
+
+	sync_func();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * At this point, because of the preceding synchronize_sched(),
+	 * all srcu_read_lock() calls using the old counters have completed.
+	 * Their corresponding critical sections might well be still
+	 * executing, but the srcu_read_lock() primitives themselves
+	 * will have finished executing.
+	 */
+
+	while (srcu_readers_active_idx(sp, idx))
+		schedule_timeout_interruptible(1);
+
+	sync_func();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() forces all srcu_read_unlock()
+	 * primitives that were executing concurrently with the preceding
+	 * for_each_possible_cpu() loop to have completed by this point.
+	 * More importantly, it also forces the corresponding SRCU read-side
+	 * critical sections to have also completed, and the corresponding
+	 * references to SRCU-protected data items to be dropped.
+	 *
+	 * Note:
+	 *
+	 *	Despite what you might think at first glance, the
+	 *	preceding synchronize_sched() -must- be within the
+	 *	critical section ended by the following mutex_unlock().
+	 *	Otherwise, a task taking the early exit can race
+	 *	with a srcu_read_unlock(), which might have executed
+	 *	just before the preceding srcu_readers_active() check,
+	 *	and whose CPU might have reordered the srcu_read_unlock()
+	 *	with the preceding critical section.  In this case, there
+	 *	is nothing preventing the synchronize_sched() task that is
+	 *	taking the early exit from freeing a data structure that
+	 *	is still being referenced (out of order) by the task
+	 *	doing the srcu_read_unlock().
+	 *
+	 *	Alternatively, the comparison with "2" on the early exit
+	 *	could be changed to "3", but this increases synchronize_srcu()
+	 *	latency for bulk loads.  So the current code is preferred.
+	 */
+
+	mutex_unlock(&sp->mutex);
+}
+
+#endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
 
 #undef kvm_init_srcu_struct
 #undef kvm_cleanup_srcu_struct
@@ -42,6 +152,11 @@
 #undef kvm_srcu_read_unlock
 #undef kvm_synchronize_srcu
 #undef kvm_srcu_batches_completed
+
+static int srcu_readers_active_idx(struct srcu_struct *sp, int idx);
+static void
+__synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void));
+
 /**
  * init_srcu_struct - initialize a sleep-RCU structure
  * @sp: structure to initialize.
@@ -58,22 +173,6 @@ int kvm_init_srcu_struct(struct srcu_struct *sp)
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
 
-/*
- * srcu_readers_active_idx -- returns approximate number of readers
- *	active on the specified rank of per-CPU counters.
- */
-
-static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
-{
-	int cpu;
-	int sum;
-
-	sum = 0;
-	for_each_possible_cpu(cpu)
-		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
-	return sum;
-}
-
 /**
  * srcu_readers_active - returns approximate number of readers.
  * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
@@ -154,94 +253,14 @@ void kvm_srcu_read_unlock(struct srcu_struct *sp, int idx)
  * synchronizing concurrent updates.  Can block; must be called from
  * process context.
  *
- * Note that it is illegal to call synchornize_srcu() from the corresponding
+ * Note that it is illegal to call synchronize_srcu() from the corresponding
  * SRCU read-side critical section; doing so will result in deadlock.
  * However, it is perfectly legal to call synchronize_srcu() on one
  * srcu_struct from some other srcu_struct's read-side critical section.
  */
 void kvm_synchronize_srcu(struct srcu_struct *sp)
 {
-	int idx;
-
-	idx = sp->completed;
-	mutex_lock(&sp->mutex);
-
-	/*
-	 * Check to see if someone else did the work for us while we were
-	 * waiting to acquire the lock.  We need -two- advances of
-	 * the counter, not just one.  If there was but one, we might have
-	 * shown up -after- our helper's first synchronize_sched(), thus
-	 * having failed to prevent CPU-reordering races with concurrent
-	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
-	 * either (1) wait for two or (2) supply the second ourselves.
-	 */
-
-	if ((sp->completed - idx) >= 2) {
-		mutex_unlock(&sp->mutex);
-		return;
-	}
-
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() ensures that any CPU that
-	 * sees the new value of sp->completed will also see any preceding
-	 * changes to data structures made by this CPU.  This prevents
-	 * some other CPU from reordering the accesses in its SRCU
-	 * read-side critical section to precede the corresponding
-	 * srcu_read_lock() -- ensuring that such references will in
-	 * fact be protected.
-	 *
-	 * So it is now safe to do the flip.
-	 */
-
-	idx = sp->completed & 0x1;
-	sp->completed++;
-
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * At this point, because of the preceding synchronize_sched(),
-	 * all srcu_read_lock() calls using the old counters have completed.
-	 * Their corresponding critical sections might well be still
-	 * executing, but the srcu_read_lock() primitives themselves
-	 * will have finished executing.
-	 */
-
-	while (srcu_readers_active_idx(sp, idx))
-		schedule_timeout_interruptible(1);
-
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() forces all srcu_read_unlock()
-	 * primitives that were executing concurrently with the preceding
-	 * for_each_possible_cpu() loop to have completed by this point.
-	 * More importantly, it also forces the corresponding SRCU read-side
-	 * critical sections to have also completed, and the corresponding
-	 * references to SRCU-protected data items to be dropped.
-	 *
-	 * Note:
-	 *
-	 *	Despite what you might think at first glance, the
-	 *	preceding synchronize_sched() -must- be within the
-	 *	critical section ended by the following mutex_unlock().
-	 *	Otherwise, a task taking the early exit can race
-	 *	with a srcu_read_unlock(), which might have executed
-	 *	just before the preceding srcu_readers_active() check,
-	 *	and whose CPU might have reordered the srcu_read_unlock()
-	 *	with the preceding critical section.  In this case, there
-	 *	is nothing preventing the synchronize_sched() task that is
-	 *	taking the early exit from freeing a data structure that
-	 *	is still being referenced (out of order) by the task
-	 *	doing the srcu_read_unlock().
-	 *
-	 *	Alternatively, the comparison with "2" on the early exit
-	 *	could be changed to "3", but this increases synchronize_srcu()
-	 *	latency for bulk loads.  So the current code is preferred.
-	 */
-
-	mutex_unlock(&sp->mutex);
+	__synchronize_srcu(sp, synchronize_sched);
 }
 
 /**
@@ -265,3 +284,166 @@ EXPORT_SYMBOL_GPL(kvm_synchronize_srcu);
 EXPORT_SYMBOL_GPL(kvm_srcu_batches_completed);
 
 #endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP)
+
+struct sync_req {
+	struct list_head list;
+	bool pending;
+	bool success;
+	struct completion done;
+};
+
+static DEFINE_PER_CPU(struct sync_req, sync_req);
+static DEFINE_PER_CPU(struct task_struct *, sync_thread);
+static DEFINE_MUTEX(rcu_sched_expedited_mutex);
+
+static long synchronize_sched_expedited_count;
+
+static int kvm_rcu_sync_thread(void *data)
+{
+	int badcpu;
+	int cpu = (long)data;
+	struct sync_req *req = &per_cpu(sync_req, cpu);
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		if (!req->pending) {
+			schedule();
+			set_current_state(TASK_INTERRUPTIBLE);
+			continue;
+		}
+		req->pending = false;
+
+		preempt_disable();
+		badcpu = smp_processor_id();
+		if (likely(cpu == badcpu)) {
+			req->success = true;
+		} else {
+			req->success = false;
+			WARN_ONCE(1, "kvm_rcu_sync_thread() on CPU %d, "
+				  "expected %d\n", badcpu, cpu);
+		}
+		preempt_enable();
+
+		complete(&req->done);
+	}
+	__set_current_state(TASK_RUNNING);
+
+	return 0;
+}
+
+static void kvm_synchronize_sched_expedited(void)
+{
+	int cpu;
+	bool need_full_sync = 0;
+	struct sync_req *req;
+	long snap;
+	int trycount = 0;
+
+	smp_mb();  /* ensure prior mod happens before capturing snap. */
+	snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
+	get_online_cpus();
+	while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
+		put_online_cpus();
+		if (trycount++ < 10)
+			udelay(trycount * num_online_cpus());
+		else {
+			synchronize_sched();
+			return;
+		}
+		if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
+			smp_mb(); /* ensure test happens before caller kfree */
+			return;
+		}
+		get_online_cpus();
+	}
+	for_each_online_cpu(cpu) {
+		req = &per_cpu(sync_req, cpu);
+		init_completion(&req->done);
+		smp_wmb();
+		req->pending = true;
+		wake_up_process(per_cpu(sync_thread, cpu));
+	}
+	for_each_online_cpu(cpu) {
+		req = &per_cpu(sync_req, cpu);
+		wait_for_completion(&req->done);
+		if (unlikely(!req->success))
+			need_full_sync = 1;
+	}
+	synchronize_sched_expedited_count++;
+	mutex_unlock(&rcu_sched_expedited_mutex);
+	put_online_cpus();
+	if (need_full_sync)
+		synchronize_sched();
+}
+
+/**
+ * synchronize_srcu_expedited - like synchronize_srcu, but less patient
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu_expedited()
+ * from the corresponding SRCU read-side critical section; doing so
+ * will result in deadlock.  However, it is perfectly legal to call
+ * synchronize_srcu_expedited() on one srcu_struct from some other
+ * srcu_struct's read-side critical section.
+ */
+void kvm_synchronize_srcu_expedited(struct srcu_struct *sp)
+{
+	__synchronize_srcu(sp, kvm_synchronize_sched_expedited);
+}
+EXPORT_SYMBOL_GPL(kvm_synchronize_srcu_expedited);
+
+int kvm_init_srcu(void)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	struct task_struct *p;
+	int cpu;
+	int err;
+
+	for_each_online_cpu(cpu) {
+		p = kthread_create(kvm_rcu_sync_thread, (void *)(long)cpu,
+				   "kvmsrcusync/%d", cpu);
+		if (IS_ERR(p))
+			goto error_out;
+
+		kthread_bind(p, cpu);
+		sched_setscheduler(p, SCHED_FIFO, &param);
+		per_cpu(sync_thread, cpu) = p;
+		wake_up_process(p);
+	}
+	return 0;
+
+error_out:
+	printk(KERN_ERR "kvm: kvmsrcsync for %d failed\n", cpu);
+	err = PTR_ERR(p);
+	kvm_exit_srcu();
+	return err;
+}
+
+void kvm_exit_srcu(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		if (per_cpu(sync_thread, cpu))
+			kthread_stop(per_cpu(sync_thread, cpu));
+}
+
+#else
+
+int kvm_init_srcu(void)
+{
+	return 0;
+}
+
+void kvm_exit_srcu(void)
+{
+}
+
+#endif
diff --git a/sync b/sync
index 0eaff31..d67469e 100755
--- a/sync
+++ b/sync
@@ -40,8 +40,9 @@ def __hack(data):
         'do_machine_check eventfd_signal get_desc_base get_desc_limit '
         'vma_kernel_pagesize native_read_tsc user_return_notifier '
         'user_return_notifier_register user_return_notifier_unregister '
+        'synchronize_srcu_expedited '
         )
-    anon_inodes = anon_inodes_exit = False
+    kvm_init = kvm_exit = False
     mce = False
     result = []
     pr_fmt = ''
@@ -58,24 +59,30 @@ def __hack(data):
             pr_fmt = sub(r'#define pr_fmt\([^)]*\) ("[^"]*").*', r'\1', line) + ' '
             line = ''
         line = sub(r'pr_debug\(([^),]*)', r'pr_debug(' + pr_fmt + r'\1', line)
-        if match(r'^int kvm_init\('): anon_inodes = True
-        if match(r'r = kvm_arch_init\(opaque\);') and anon_inodes:
+        if match(r'^int kvm_init\('): kvm_init = True
+        if match(r'r = kvm_arch_init\(opaque\);') and kvm_init:
             w('\tr = kvm_init_anon_inodes();')
             w('\tif (r)')
             w('\t\treturn r;\n')
+            w('\tr = kvm_init_srcu();')
+            w('\tif (r)')
+            w('\t\tgoto cleanup_anon_inodes;\n')
             w('\tpreempt_notifier_sys_init();')
             w('\thrtimer_kallsyms_resolve();\n')
-        if match(r'return 0;') and anon_inodes:
+        if match(r'return 0;') and kvm_init:
             w('\tprintk("loaded kvm module (%s)\\n");\n' % (version,))
-        if match(r'return r;') and anon_inodes:
-            w('\tkvm_exit_anon_inodes();')
+        if match(r'return r;') and kvm_init:
             w('\tpreempt_notifier_sys_exit();')
-            anon_inodes = False
-        if match(r'^void kvm_exit'): anon_inodes_exit = True
-        if match(r'\}') and anon_inodes_exit:
+            w('\tkvm_exit_srcu();')
+            w('cleanup_anon_inodes:')
             w('\tkvm_exit_anon_inodes();')
+            kvm_init = False
+        if match(r'^void kvm_exit'): kvm_exit = True
+        if match(r'\}') and kvm_exit:
             w('\tpreempt_notifier_sys_exit();')
-            anon_inodes_exit = False
+            w('\tkvm_exit_srcu();')
+            w('\tkvm_exit_anon_inodes();')
+            kvm_exit = False
         if match(r'^int kvm_arch_init'): kvm_arch_init = True
         if match(r'\btsc_khz\b') and kvm_arch_init:
             line = sub(r'\btsc_khz\b', 'kvm_tsc_khz', line)
-- 
1.6.0.2




* Re: qemu-kvm.git build problem
  2010-01-14  0:11                 ` Jan Kiszka
@ 2010-01-14  2:13                   ` Paul E. McKenney
  2010-01-15  8:58                     ` Jan Kiszka
  2010-01-14 20:02                   ` Lucas Meneghel Rodrigues
  2010-01-14 23:48                   ` Paul E. McKenney
  2 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-01-14  2:13 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

On Thu, Jan 14, 2010 at 01:11:33AM +0100, Jan Kiszka wrote:
> [earlier context snipped]
> 
> Here we go; find the commit inlined below. You may be primarily
> interested in kvm_synchronize_sched_expedited() and the helper thread
> function kvm_rcu_sync_thread(). As I use dedicated per-CPU threads, I
> simplified some parts compared to upstream. Light testing indicates
> that it works.

Cool!!!

Have you had a chance to run rcutorture on this implementation?

							Thanx, Paul

> [rest of the quoted message and patch snipped]
> + * synchronize_srcu_expedited() on one srcu_struct from some other
> + * srcu_struct's read-side critical section.
> + */
> +void kvm_synchronize_srcu_expedited(struct srcu_struct *sp)
> +{
> +	__synchronize_srcu(sp, kvm_synchronize_sched_expedited);
> +}
> +EXPORT_SYMBOL_GPL(kvm_synchronize_srcu_expedited);
> +
> +int kvm_init_srcu(void)
> +{
> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +	struct task_struct *p;
> +	int cpu;
> +	int err;
> +
> +	for_each_online_cpu(cpu) {
> +		p = kthread_create(kvm_rcu_sync_thread, (void *)(long)cpu,
> +				   "kvmsrcusync/%d", cpu);
> +		if (IS_ERR(p))
> +			goto error_out;
> +
> +		kthread_bind(p, cpu);
> +		sched_setscheduler(p, SCHED_FIFO, &param);
> +		per_cpu(sync_thread, cpu) = p;
> +		wake_up_process(p);
> +	}
> +	return 0;
> +
> +error_out:
> +	printk(KERN_ERR "kvm: kvmsrcusync for %d failed\n", cpu);
> +	err = PTR_ERR(p);
> +	kvm_exit_srcu();
> +	return err;
> +}
> +
> +void kvm_exit_srcu(void)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		if (per_cpu(sync_thread, cpu))
> +			kthread_stop(per_cpu(sync_thread, cpu));
> +}
> +
> +#else
> +
> +int kvm_init_srcu(void)
> +{
> +	return 0;
> +}
> +
> +void kvm_exit_srcu(void)
> +{
> +}
> +
> +#endif
> diff --git a/sync b/sync
> index 0eaff31..d67469e 100755
> --- a/sync
> +++ b/sync
> @@ -40,8 +40,9 @@ def __hack(data):
>          'do_machine_check eventfd_signal get_desc_base get_desc_limit '
>          'vma_kernel_pagesize native_read_tsc user_return_notifier '
>          'user_return_notifier_register user_return_notifier_unregister '
> +        'synchronize_srcu_expedited '
>          )
> -    anon_inodes = anon_inodes_exit = False
> +    kvm_init = kvm_exit = False
>      mce = False
>      result = []
>      pr_fmt = ''
> @@ -58,24 +59,30 @@ def __hack(data):
>              pr_fmt = sub(r'#define pr_fmt\([^)]*\) ("[^"]*").*', r'\1', line) + ' '
>              line = ''
>          line = sub(r'pr_debug\(([^),]*)', r'pr_debug(' + pr_fmt + r'\1', line)
> -        if match(r'^int kvm_init\('): anon_inodes = True
> -        if match(r'r = kvm_arch_init\(opaque\);') and anon_inodes:
> +        if match(r'^int kvm_init\('): kvm_init = True
> +        if match(r'r = kvm_arch_init\(opaque\);') and kvm_init:
>              w('\tr = kvm_init_anon_inodes();')
>              w('\tif (r)')
>              w('\t\treturn r;\n')
> +            w('\tr = kvm_init_srcu();')
> +            w('\tif (r)')
> +            w('\t\tgoto cleanup_anon_inodes;\n')
>              w('\tpreempt_notifier_sys_init();')
>              w('\thrtimer_kallsyms_resolve();\n')
> -        if match(r'return 0;') and anon_inodes:
> +        if match(r'return 0;') and kvm_init:
>              w('\tprintk("loaded kvm module (%s)\\n");\n' % (version,))
> -        if match(r'return r;') and anon_inodes:
> -            w('\tkvm_exit_anon_inodes();')
> +        if match(r'return r;') and kvm_init:
>              w('\tpreempt_notifier_sys_exit();')
> -            anon_inodes = False
> -        if match(r'^void kvm_exit'): anon_inodes_exit = True
> -        if match(r'\}') and anon_inodes_exit:
> +            w('\tkvm_exit_srcu();')
> +            w('cleanup_anon_inodes:')
>              w('\tkvm_exit_anon_inodes();')
> +            kvm_init = False
> +        if match(r'^void kvm_exit'): kvm_exit = True
> +        if match(r'\}') and kvm_exit:
>              w('\tpreempt_notifier_sys_exit();')
> -            anon_inodes_exit = False
> +            w('\tkvm_exit_srcu();')
> +            w('\tkvm_exit_anon_inodes();')
> +            kvm_exit = False
>          if match(r'^int kvm_arch_init'): kvm_arch_init = True
>          if match(r'\btsc_khz\b') and kvm_arch_init:
>              line = sub(r'\btsc_khz\b', 'kvm_tsc_khz', line)
> -- 
> 1.6.0.2
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: qemu-kvm.git build problem
  2010-01-14  0:11                 ` Jan Kiszka
  2010-01-14  2:13                   ` Paul E. McKenney
@ 2010-01-14 20:02                   ` Lucas Meneghel Rodrigues
  2010-01-14 23:48                   ` Paul E. McKenney
  2 siblings, 0 replies; 17+ messages in thread
From: Lucas Meneghel Rodrigues @ 2010-01-14 20:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: paulmck, Avi Kivity, KVM mailing list, Michael Goldish,
	Eduardo Habkost, Dor Laor

On Thu, 2010-01-14 at 01:11 +0100, Jan Kiszka wrote:
> Paul E. McKenney wrote:
> > On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
> >> If so, I will try to write something like this the next days. Will
> >> surely appreciate your review afterwards!
> > 
> > Sounds good!
> > 
> 
> Here we go, find the commit inlined below. You may be primarily
> interested in kvm_synchronize_sched_expedited() and the helper thread
> function kvm_rcu_sync_thread. As I use a dedicated thread I simplified
> some parts compared to upstream. Light testing indicates that it works.

Indeed, it now builds fine in the KVM upstream tests we run on our
test farm.

Thank you very much Jan!

Lucas


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: qemu-kvm.git build problem
  2010-01-14  0:11                 ` Jan Kiszka
  2010-01-14  2:13                   ` Paul E. McKenney
  2010-01-14 20:02                   ` Lucas Meneghel Rodrigues
@ 2010-01-14 23:48                   ` Paul E. McKenney
  2010-01-15  8:56                     ` Jan Kiszka
  2 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-01-14 23:48 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

On Thu, Jan 14, 2010 at 01:11:33AM +0100, Jan Kiszka wrote:
> Paul E. McKenney wrote:
> > On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
> >> If so, I will try to write something like this the next days. Will
> >> surely appreciate your review afterwards!
> > 
> > Sounds good!
> > 
> 
> Here we go, find the commit inlined below. You may be primarily
> interested in kvm_synchronize_sched_expedited() and the helper thread
> function kvm_rcu_sync_thread. As I use a dedicated thread I simplified
> some parts compared to upstream. Light testing indicates that it works.
> 
> The full set of changes (specifically interesting for anyone who wants
> to test it) is available on kernel.org. There is now also a
> commit [1] included for CPU hotplug support, but it's untested as I don't
> have the required hardware at hand.

This patch is only for backporting, right?  In mainline, you should be
able to simply call synchronize_srcu_expedited().
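
For reference, a minimal sketch of the mainline pattern I mean (all
names below are made up for illustration, nothing is taken from your
patch):

	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/srcu.h>

	struct demo_data {
		int value;
	};

	static struct srcu_struct demo_srcu;	/* init_srcu_struct() at setup */
	static struct demo_data *demo_ptr;

	/* reader side: cheap, and may sleep inside the section */
	static int demo_read(void)
	{
		struct demo_data *p;
		int idx, val;

		idx = srcu_read_lock(&demo_srcu);
		p = rcu_dereference(demo_ptr);
		val = p ? p->value : -1;
		srcu_read_unlock(&demo_srcu, idx);
		return val;
	}

	/* updater side: publish, then wait out all old readers */
	static void demo_update(struct demo_data *new_data)
	{
		struct demo_data *old = demo_ptr;

		rcu_assign_pointer(demo_ptr, new_data);
		synchronize_srcu_expedited(&demo_srcu);
		kfree(old);	/* no reader can still see the old version */
	}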

> Thanks in advance,
> Jan
> 
> [1] http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=commitdiff;h=81a3ddcd8ea343a9570f3164bb31160ff4be0ab7
> 
> --------->
> 
> From fe13d0785586623176879a87949ad626dace192c Mon Sep 17 00:00:00 2001
> From: Jan Kiszka <jan.kiszka@siemens.com>
> Date: Wed, 13 Jan 2010 09:32:50 +0100
> Subject: [PATCH] Compat support for synchronize_srcu_expedited
> 
> This service was first introduced in 2.6.32 and further optimized in
> 2.6.33. KVM makes use of it to replace the rw-sem based slot lock
> with SRCU.
> 
> TODO: Add CPU-hotplug support to the sync kthreads.

Oh.  Never mind my complaint about lack of CPU-hotplug support below.  ;-)

The function migration_call() in kernel/sched.c has the code you would
want to harvest for this.  And re-reading your text above, perhaps you
already have done so.

Anyway, looks good.  There are some questions interspersed below to make
sure I understand the situation.  There is one cut-and-paste spelling
error and another cut-and-paste C-standard violation, the latter being
my fault, and as far as I know, not a problem in practice.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  external-module-compat-comm.h |   17 ++
>  srcu.c                        |  382 ++++++++++++++++++++++++++++++-----------
>  sync                          |   27 ++-
>  3 files changed, 316 insertions(+), 110 deletions(-)
> 
> diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
> index ca57567..eabf8a1 100644
> --- a/external-module-compat-comm.h
> +++ b/external-module-compat-comm.h
> @@ -1053,3 +1053,20 @@ static inline void kvm_fire_urn(void)
>  static inline void kvm_fire_urn(void) {}
>  
>  #endif /* CONFIG_USER_RETURN_NOTIFIER */
> +
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33)
> +
> +#ifdef CONFIG_SMP
> +void kvm_synchronize_srcu_expedited(struct srcu_struct *sp);
> +#else
> +static inline void kvm_synchronize_srcu_expedited(struct srcu_struct *sp) { }
> +#endif
> +
> +#else
> +
> +#define kvm_synchronize_srcu_expedited synchronize_srcu_expedited
> +
> +#endif
> +
> +int kvm_init_srcu(void);
> +void kvm_exit_srcu(void);
> diff --git a/srcu.c b/srcu.c

Are we in kernel/srcu.c or a kvm-specific file?

My guess is that we are in kernel/srcu.c for a backport-only patch, but
have to ask!

> index e9734bc..985d627 100644
> --- a/srcu.c
> +++ b/srcu.c
> @@ -24,8 +24,6 @@
>   *
>   */
>  
> -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
> -
>  #include <linux/module.h>
>  #include <linux/mutex.h>
>  #include <linux/percpu.h>
> @@ -35,6 +33,118 @@
>  #include <linux/slab.h>
>  #include <linux/smp.h>
>  #include <linux/srcu.h>
> +#include <linux/kthread.h>
> +
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) || \
> +    (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP))
> +
> +/*
> + * srcu_readers_active_idx -- returns approximate number of readers
> + *	active on the specified rank of per-CPU counters.
> + */
> +
> +static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
> +{
> +	int cpu;
> +	int sum;
> +
> +	sum = 0;
> +	for_each_possible_cpu(cpu)
> +		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
> +	return sum;
> +}
> +
> +/*
> + * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
> + */
> +static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
> +{
> +	int idx;
> +
> +	idx = sp->completed;
> +	mutex_lock(&sp->mutex);
> +
> +	/*
> +	 * Check to see if someone else did the work for us while we were
> +	 * waiting to acquire the lock.  We need -two- advances of
> +	 * the counter, not just one.  If there was but one, we might have
> +	 * shown up -after- our helper's first synchronize_sched(), thus
> +	 * having failed to prevent CPU-reordering races with concurrent
> +	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
> +	 * either (1) wait for two or (2) supply the second ourselves.
> +	 */
> +
> +	if ((sp->completed - idx) >= 2) {
> +		mutex_unlock(&sp->mutex);
> +		return;
> +	}
> +
> +	sync_func();  /* Force memory barrier on all CPUs. */
> +
> +	/*
> +	 * The preceding synchronize_sched() ensures that any CPU that
> +	 * sees the new value of sp->completed will also see any preceding
> +	 * changes to data structures made by this CPU.  This prevents
> +	 * some other CPU from reordering the accesses in its SRCU
> +	 * read-side critical section to precede the corresponding
> +	 * srcu_read_lock() -- ensuring that such references will in
> +	 * fact be protected.
> +	 *
> +	 * So it is now safe to do the flip.
> +	 */
> +
> +	idx = sp->completed & 0x1;
> +	sp->completed++;
> +
> +	sync_func();  /* Force memory barrier on all CPUs. */
> +
> +	/*
> +	 * At this point, because of the preceding synchronize_sched(),
> +	 * all srcu_read_lock() calls using the old counters have completed.
> +	 * Their corresponding critical sections might well be still
> +	 * executing, but the srcu_read_lock() primitives themselves
> +	 * will have finished executing.
> +	 */
> +
> +	while (srcu_readers_active_idx(sp, idx))
> +		schedule_timeout_interruptible(1);
> +
> +	sync_func();  /* Force memory barrier on all CPUs. */
> +
> +	/*
> +	 * The preceding synchronize_sched() forces all srcu_read_unlock()
> +	 * primitives that were executing concurrently with the preceding
> +	 * for_each_possible_cpu() loop to have completed by this point.
> +	 * More importantly, it also forces the corresponding SRCU read-side
> +	 * critical sections to have also completed, and the corresponding
> +	 * references to SRCU-protected data items to be dropped.
> +	 *
> +	 * Note:
> +	 *
> +	 *	Despite what you might think at first glance, the
> +	 *	preceding synchronize_sched() -must- be within the
> +	 *	critical section ended by the following mutex_unlock().
> +	 *	Otherwise, a task taking the early exit can race
> +	 *	with a srcu_read_unlock(), which might have executed
> +	 *	just before the preceding srcu_readers_active() check,
> +	 *	and whose CPU might have reordered the srcu_read_unlock()
> +	 *	with the preceding critical section.  In this case, there
> +	 *	is nothing preventing the synchronize_sched() task that is
> +	 *	taking the early exit from freeing a data structure that
> +	 *	is still being referenced (out of order) by the task
> +	 *	doing the srcu_read_unlock().
> +	 *
> +	 *	Alternatively, the comparison with "2" on the early exit
> +	 *	could be changed to "3", but this increases synchronize_srcu()
> +	 *	latency for bulk loads.  So the current code is preferred.
> +	 */
> +
> +	mutex_unlock(&sp->mutex);
> +}
> +
> +#endif
> +
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
>  
>  #undef kvm_init_srcu_struct
>  #undef kvm_cleanup_srcu_struct
> @@ -42,6 +152,11 @@
>  #undef kvm_srcu_read_unlock
>  #undef kvm_synchronize_srcu
>  #undef kvm_srcu_batches_completed
> +
> +static int srcu_readers_active_idx(struct srcu_struct *sp, int idx);
> +static void
> +__synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void));
> +
>  /**
>   * init_srcu_struct - initialize a sleep-RCU structure
>   * @sp: structure to initialize.
> @@ -58,22 +173,6 @@ int kvm_init_srcu_struct(struct srcu_struct *sp)
>  	return (sp->per_cpu_ref ? 0 : -ENOMEM);
>  }
>  
> -/*
> - * srcu_readers_active_idx -- returns approximate number of readers
> - *	active on the specified rank of per-CPU counters.
> - */
> -
> -static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
> -{
> -	int cpu;
> -	int sum;
> -
> -	sum = 0;
> -	for_each_possible_cpu(cpu)
> -		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
> -	return sum;
> -}
> -
>  /**
>   * srcu_readers_active - returns approximate number of readers.
>   * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
> @@ -154,94 +253,14 @@ void kvm_srcu_read_unlock(struct srcu_struct *sp, int idx)
>   * synchronizing concurrent updates.  Can block; must be called from
>   * process context.
>   *
> - * Note that it is illegal to call synchornize_srcu() from the corresponding
> + * Note that it is illegal to call synchronize_srcu() from the corresponding
>   * SRCU read-side critical section; doing so will result in deadlock.
>   * However, it is perfectly legal to call synchronize_srcu() on one
>   * srcu_struct from some other srcu_struct's read-side critical section.
>   */
>  void kvm_synchronize_srcu(struct srcu_struct *sp)
>  {
> -	int idx;
> -
> -	idx = sp->completed;
> -	mutex_lock(&sp->mutex);
> -
> -	/*
> -	 * Check to see if someone else did the work for us while we were
> -	 * waiting to acquire the lock.  We need -two- advances of
> -	 * the counter, not just one.  If there was but one, we might have
> -	 * shown up -after- our helper's first synchronize_sched(), thus
> -	 * having failed to prevent CPU-reordering races with concurrent
> -	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
> -	 * either (1) wait for two or (2) supply the second ourselves.
> -	 */
> -
> -	if ((sp->completed - idx) >= 2) {
> -		mutex_unlock(&sp->mutex);
> -		return;
> -	}
> -
> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
> -
> -	/*
> -	 * The preceding synchronize_sched() ensures that any CPU that
> -	 * sees the new value of sp->completed will also see any preceding
> -	 * changes to data structures made by this CPU.  This prevents
> -	 * some other CPU from reordering the accesses in its SRCU
> -	 * read-side critical section to precede the corresponding
> -	 * srcu_read_lock() -- ensuring that such references will in
> -	 * fact be protected.
> -	 *
> -	 * So it is now safe to do the flip.
> -	 */
> -
> -	idx = sp->completed & 0x1;
> -	sp->completed++;
> -
> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
> -
> -	/*
> -	 * At this point, because of the preceding synchronize_sched(),
> -	 * all srcu_read_lock() calls using the old counters have completed.
> -	 * Their corresponding critical sections might well be still
> -	 * executing, but the srcu_read_lock() primitives themselves
> -	 * will have finished executing.
> -	 */
> -
> -	while (srcu_readers_active_idx(sp, idx))
> -		schedule_timeout_interruptible(1);
> -
> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
> -
> -	/*
> -	 * The preceding synchronize_sched() forces all srcu_read_unlock()
> -	 * primitives that were executing concurrently with the preceding
> -	 * for_each_possible_cpu() loop to have completed by this point.
> -	 * More importantly, it also forces the corresponding SRCU read-side
> -	 * critical sections to have also completed, and the corresponding
> -	 * references to SRCU-protected data items to be dropped.
> -	 *
> -	 * Note:
> -	 *
> -	 *	Despite what you might think at first glance, the
> -	 *	preceding synchronize_sched() -must- be within the
> -	 *	critical section ended by the following mutex_unlock().
> -	 *	Otherwise, a task taking the early exit can race
> -	 *	with a srcu_read_unlock(), which might have executed
> -	 *	just before the preceding srcu_readers_active() check,
> -	 *	and whose CPU might have reordered the srcu_read_unlock()
> -	 *	with the preceding critical section.  In this case, there
> -	 *	is nothing preventing the synchronize_sched() task that is
> -	 *	taking the early exit from freeing a data structure that
> -	 *	is still being referenced (out of order) by the task
> -	 *	doing the srcu_read_unlock().
> -	 *
> -	 *	Alternatively, the comparison with "2" on the early exit
> -	 *	could be changed to "3", but this increases synchronize_srcu()
> -	 *	latency for bulk loads.  So the current code is preferred.
> -	 */
> -
> -	mutex_unlock(&sp->mutex);
> +	__synchronize_srcu(sp, synchronize_sched);
>  }

Hmmm...  Why not simply "synchronize_srcu()"?  Is this to allow a smaller
piece of code to cover all the combinations of Linux kernel versions?

>  /**
> @@ -265,3 +284,166 @@ EXPORT_SYMBOL_GPL(kvm_synchronize_srcu);
>  EXPORT_SYMBOL_GPL(kvm_srcu_batches_completed);
>  
>  #endif
> +
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP)
> +
> +struct sync_req {
> +	struct list_head list;
> +	bool pending;
> +	bool success;
> +	struct completion done;
> +};
> +
> +static DEFINE_PER_CPU(struct sync_req, sync_req);
> +static DEFINE_PER_CPU(struct task_struct *, sync_thread);
> +static DEFINE_MUTEX(rcu_sched_expedited_mutex);
> +
> +static long synchronize_sched_expedited_count;

Sigh!!!  You copied one of my theoretical bugs.

According to the C standard, overflowing a signed integer is undefined
(though it has been a -long- time since I have seen hardware that
did anything other than wrap like you would expect).  My todo list
includes converting these to "unsigned long".  I can certainly add
this instance to my list, but I would of course be quite happy if you
made the appropriate adjustments.

(This effort got preempted by applying lockdep to RCU, given that Thomas
Gleixner found some non-theoretical RCU-usage bugs.)
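
In case it helps, the conversion I have in mind would look roughly
like this (a sketch only; it assumes nothing else relies on the
counter being signed, and it reuses the time_after()-style cast for
the comparison):

	static unsigned long synchronize_sched_expedited_count;

	...
		unsigned long snap;

		smp_mb();  /* ensure prior mod happens before capturing snap. */
		snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
	...
		/* Unsigned subtraction wraps with defined behavior; sign-test
		 * the difference the same way time_after() does: */
		if ((long)(ACCESS_ONCE(synchronize_sched_expedited_count) - snap) > 0) {
			smp_mb(); /* ensure test happens before caller kfree */
			return;
		}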

> +static int kvm_rcu_sync_thread(void *data)
> +{
> +	int badcpu;
> +	int cpu = (long)data;
> +	struct sync_req *req = &per_cpu(sync_req, cpu);
> +
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	while (!kthread_should_stop()) {
> +		if (!req->pending) {
> +			schedule();
> +			set_current_state(TASK_INTERRUPTIBLE);
> +			continue;
> +		}
> +		req->pending = false;
> +
> +		preempt_disable();
> +		badcpu = smp_processor_id();
> +		if (likely(cpu == badcpu)) {
> +			req->success = true;
> +		} else {
> +			req->success = false;
> +			WARN_ONCE(1, "kvm_rcu_sync_thread() on CPU %d, "
> +				  "expected %d\n", badcpu, cpu);
> +		}
> +		preempt_enable();
> +
> +		complete(&req->done);
> +	}
> +	__set_current_state(TASK_RUNNING);
> +
> +	return 0;
> +}
> +
> +static void kvm_synchronize_sched_expedited(void)
> +{
> +	int cpu;
> +	bool need_full_sync = 0;
> +	struct sync_req *req;
> +	long snap;
> +	int trycount = 0;
> +
> +	smp_mb();  /* ensure prior mod happens before capturing snap. */
> +	snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
> +	get_online_cpus();
> +	while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
> +		put_online_cpus();
> +		if (trycount++ < 10)
> +			udelay(trycount * num_online_cpus());
> +		else {
> +			synchronize_sched();
> +			return;
> +		}
> +		if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
> +			smp_mb(); /* ensure test happens before caller kfree */
> +			return;
> +		}
> +		get_online_cpus();
> +	}
> +	for_each_online_cpu(cpu) {
> +		req = &per_cpu(sync_req, cpu);
> +		init_completion(&req->done);
> +		smp_wmb();
> +		req->pending = true;
> +		wake_up_process(per_cpu(sync_thread, cpu));
> +	}
> +	for_each_online_cpu(cpu) {
> +		req = &per_cpu(sync_req, cpu);
> +		wait_for_completion(&req->done);
> +		if (unlikely(!req->success))
> +			need_full_sync = 1;
> +	}
> +	synchronize_sched_expedited_count++;
> +	mutex_unlock(&rcu_sched_expedited_mutex);
> +	put_online_cpus();
> +	if (need_full_sync)
> +		synchronize_sched();
> +}
> +
> +/**
> + * synchronize_srcu_expedited - like synchronize_srcu, but less patient

"kvm_synchronize_srcu_expedited", right?  ;-)

> + * @sp: srcu_struct with which to synchronize.
> + *
> + * Flip the completed counter, and wait for the old count to drain to zero.
> + * As with classic RCU, the updater must use some separate means of
> + * synchronizing concurrent updates.  Can block; must be called from
> + * process context.
> + *
> + * Note that it is illegal to call synchronize_srcu_expedited()
> + * from the corresponding SRCU read-side critical section; doing so
> + * will result in deadlock.  However, it is perfectly legal to call
> + * synchronize_srcu_expedited() on one srcu_struct from some other
> + * srcu_struct's read-side critical section.
> + */
> +void kvm_synchronize_srcu_expedited(struct srcu_struct *sp)
> +{
> +	__synchronize_srcu(sp, kvm_synchronize_sched_expedited);
> +}
> +EXPORT_SYMBOL_GPL(kvm_synchronize_srcu_expedited);
> +
> +int kvm_init_srcu(void)
> +{
> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +	struct task_struct *p;
> +	int cpu;
> +	int err;
> +
> +	for_each_online_cpu(cpu) {
> +		p = kthread_create(kvm_rcu_sync_thread, (void *)(long)cpu,
> +				   "kvmsrcusync/%d", cpu);
> +		if (IS_ERR(p))
> +			goto error_out;
> +
> +		kthread_bind(p, cpu);
> +		sched_setscheduler(p, SCHED_FIFO, &param);
> +		per_cpu(sync_thread, cpu) = p;
> +		wake_up_process(p);
> +	}
> +	return 0;
> +
> +error_out:
> +	printk(KERN_ERR "kvm: kvmsrcusync for %d failed\n", cpu);
> +	err = PTR_ERR(p);
> +	kvm_exit_srcu();
> +	return err;
> +}

CPU hotplug?

> +void kvm_exit_srcu(void)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		if (per_cpu(sync_thread, cpu))
> +			kthread_stop(per_cpu(sync_thread, cpu));
> +}
> +
> +#else
> +
> +int kvm_init_srcu(void)
> +{
> +	return 0;
> +}
> +
> +void kvm_exit_srcu(void)
> +{
> +}
> +
> +#endif

I do not understand what follows -- a KVM-specific test suite or some
such?

> diff --git a/sync b/sync
> index 0eaff31..d67469e 100755
> --- a/sync
> +++ b/sync
> @@ -40,8 +40,9 @@ def __hack(data):
>          'do_machine_check eventfd_signal get_desc_base get_desc_limit '
>          'vma_kernel_pagesize native_read_tsc user_return_notifier '
>          'user_return_notifier_register user_return_notifier_unregister '
> +        'synchronize_srcu_expedited '
>          )
> -    anon_inodes = anon_inodes_exit = False
> +    kvm_init = kvm_exit = False
>      mce = False
>      result = []
>      pr_fmt = ''
> @@ -58,24 +59,30 @@ def __hack(data):
>              pr_fmt = sub(r'#define pr_fmt\([^)]*\) ("[^"]*").*', r'\1', line) + ' '
>              line = ''
>          line = sub(r'pr_debug\(([^),]*)', r'pr_debug(' + pr_fmt + r'\1', line)
> -        if match(r'^int kvm_init\('): anon_inodes = True
> -        if match(r'r = kvm_arch_init\(opaque\);') and anon_inodes:
> +        if match(r'^int kvm_init\('): kvm_init = True
> +        if match(r'r = kvm_arch_init\(opaque\);') and kvm_init:
>              w('\tr = kvm_init_anon_inodes();')
>              w('\tif (r)')
>              w('\t\treturn r;\n')
> +            w('\tr = kvm_init_srcu();')
> +            w('\tif (r)')
> +            w('\t\tgoto cleanup_anon_inodes;\n')
>              w('\tpreempt_notifier_sys_init();')
>              w('\thrtimer_kallsyms_resolve();\n')
> -        if match(r'return 0;') and anon_inodes:
> +        if match(r'return 0;') and kvm_init:
>              w('\tprintk("loaded kvm module (%s)\\n");\n' % (version,))
> -        if match(r'return r;') and anon_inodes:
> -            w('\tkvm_exit_anon_inodes();')
> +        if match(r'return r;') and kvm_init:
>              w('\tpreempt_notifier_sys_exit();')
> -            anon_inodes = False
> -        if match(r'^void kvm_exit'): anon_inodes_exit = True
> -        if match(r'\}') and anon_inodes_exit:
> +            w('\tkvm_exit_srcu();')
> +            w('cleanup_anon_inodes:')
>              w('\tkvm_exit_anon_inodes();')
> +            kvm_init = False
> +        if match(r'^void kvm_exit'): kvm_exit = True
> +        if match(r'\}') and kvm_exit:
>              w('\tpreempt_notifier_sys_exit();')
> -            anon_inodes_exit = False
> +            w('\tkvm_exit_srcu();')
> +            w('\tkvm_exit_anon_inodes();')
> +            kvm_exit = False
>          if match(r'^int kvm_arch_init'): kvm_arch_init = True
>          if match(r'\btsc_khz\b') and kvm_arch_init:
>              line = sub(r'\btsc_khz\b', 'kvm_tsc_khz', line)
> -- 
> 1.6.0.2
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: qemu-kvm.git build problem
  2010-01-14 23:48                   ` Paul E. McKenney
@ 2010-01-15  8:56                     ` Jan Kiszka
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kiszka @ 2010-01-15  8:56 UTC (permalink / raw)
  To: paulmck
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

Paul E. McKenney wrote:
> On Thu, Jan 14, 2010 at 01:11:33AM +0100, Jan Kiszka wrote:
>> Paul E. McKenney wrote:
>>> On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
>>>> If so, I will try to write something like this the next days. Will
>>>> surely appreciate your review afterwards!
>>> Sounds good!
>>>
>> Here we go, find the commit inlined below. You may be primarily
>> interested in kvm_synchronize_sched_expedited() and the helper thread
>> function kvm_rcu_sync_thread. As I use a dedicated thread I simplified
>> some parts compared to upstream. Light testing indicates that it works.
>>
>> The full set of changes (specifically interesting for anyone who wants
>> to test it) is available on kernel.org. There is now also a
>> commit [1] included for CPU hotplug support, but it's untested as I don't
>> have the required hardware at hand.
> 
> This patch is only for backporting, right?  In mainline, you should be
> able to simply call synchronize_srcu_expedited().

Right. kvm-kmod [1] is like compat-wireless: take the latest module
sources from the kernel, patch them a bit (what the sync script does),
wrap a bit, and you can build and run the result on older kernels.

So I was forced to provide synchronize_srcu_expedited for <= 2.6.31.
Actually, I wrap 2.6.32 as well to have a version that includes your
latest optimization.

> 
>> Thanks in advance,
>> Jan
>>
>> [1] http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=commitdiff;h=81a3ddcd8ea343a9570f3164bb31160ff4be0ab7
>>
>> --------->
>>
>> From fe13d0785586623176879a87949ad626dace192c Mon Sep 17 00:00:00 2001
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>> Date: Wed, 13 Jan 2010 09:32:50 +0100
>> Subject: [PATCH] Compat support for synchronize_srcu_expedited
>>
>> This service was first introduced in 2.6.32 and further optimized in
>> 2.6.33. KVM makes use of it to replace the rw-sem based slot lock
>> with SRCU.
>>
>> TODO: Add CPU-hotplug support to the sync kthreads.
> 
> Oh.  Never mind my complaint about lack of CPU-hotplug support below.  ;-)
> 
> The function migration_call() in kernel/sched.c has the code you would
> want to harvest for this.  And re-reading your text above, perhaps you
> already have done so.

Yep (stolen from softirq.c):

From 81a3ddcd8ea343a9570f3164bb31160ff4be0ab7 Mon Sep 17 00:00:00 2001
From: Jan Kiszka <jan.kiszka@siemens.com>
Date: Thu, 14 Jan 2010 00:26:07 +0100
Subject: [PATCH] Add CPU hotplug support for kvmsrcusync thread

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 srcu.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/srcu.c b/srcu.c
index 985d627..841394d 100644
--- a/srcu.c
+++ b/srcu.c
@@ -399,13 +399,66 @@ void kvm_synchronize_srcu_expedited(struct srcu_struct *sp)
 }
 EXPORT_SYMBOL_GPL(kvm_synchronize_srcu_expedited);
 
+static struct sched_param sync_thread_param = {
+	.sched_priority = MAX_RT_PRIO-1
+};
+
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/cpumask.h>
+
+static int cpu_callback(struct notifier_block *nfb, unsigned long action,
+			void *hcpu)
+{
+	int hotcpu = (unsigned long)hcpu;
+	struct task_struct *p;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		p = kthread_create(kvm_rcu_sync_thread, hcpu,
+				   "kvmsrcusync/%d", hotcpu);
+		if (IS_ERR(p)) {
+			printk(KERN_ERR "kvm: kvmsrcusync for %d failed\n",
+			       hotcpu);
+			return NOTIFY_BAD;
+		}
+		kthread_bind(p, hotcpu);
+		sched_setscheduler(p, SCHED_FIFO, &sync_thread_param);
+		per_cpu(sync_thread, hotcpu) = p;
+		break;
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		wake_up_process(per_cpu(sync_thread, hotcpu));
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(sync_thread, hotcpu))
+			break;
+		/* Unbind so it can run.  Fall thru. */
+		kthread_bind(per_cpu(sync_thread, hotcpu),
+			     cpumask_any(cpu_online_mask));
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		p = per_cpu(sync_thread, hotcpu);
+		per_cpu(sync_thread, hotcpu) = NULL;
+		kthread_stop(p);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpu_nfb = {
+	.notifier_call = cpu_callback
+};
+#endif /* CONFIG_HOTPLUG_CPU */
+
 int kvm_init_srcu(void)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	struct task_struct *p;
 	int cpu;
 	int err;
 
+	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		p = kthread_create(kvm_rcu_sync_thread, (void *)(long)cpu,
 				   "kvmsrcusync/%d", cpu);
@@ -413,13 +466,19 @@ int kvm_init_srcu(void)
 			goto error_out;
 
 		kthread_bind(p, cpu);
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &sync_thread_param);
 		per_cpu(sync_thread, cpu) = p;
 		wake_up_process(p);
 	}
+#ifdef CONFIG_HOTPLUG_CPU
+	register_cpu_notifier(&cpu_nfb);
+#endif /* CONFIG_HOTPLUG_CPU */
+	put_online_cpus();
+
 	return 0;
 
 error_out:
+	put_online_cpus();
 	printk(KERN_ERR "kvm: kvmsrcusync for %d failed\n", cpu);
 	err = PTR_ERR(p);
 	kvm_exit_srcu();
@@ -430,6 +489,7 @@ void kvm_exit_srcu(void)
 {
 	int cpu;
 
+	unregister_cpu_notifier(&cpu_nfb);
 	for_each_online_cpu(cpu)
 		if (per_cpu(sync_thread, cpu))
 			kthread_stop(per_cpu(sync_thread, cpu));
-- 
1.6.0.2

> 
> Anyway, looks good.  There are some questions interspersed below to make
> sure I understand the situation.  There is one cut-and-paste spelling
> error and another cut-and-paste C-standard violation, the latter being
> my fault, and as far as I know, not a problem in practice.
> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  external-module-compat-comm.h |   17 ++
>>  srcu.c                        |  382 ++++++++++++++++++++++++++++++-----------
>>  sync                          |   27 ++-
>>  3 files changed, 316 insertions(+), 110 deletions(-)
>>
>> diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
>> index ca57567..eabf8a1 100644
>> --- a/external-module-compat-comm.h
>> +++ b/external-module-compat-comm.h
>> @@ -1053,3 +1053,20 @@ static inline void kvm_fire_urn(void)
>>  static inline void kvm_fire_urn(void) {}
>>  
>>  #endif /* CONFIG_USER_RETURN_NOTIFIER */
>> +
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33)
>> +
>> +#ifdef CONFIG_SMP
>> +void kvm_synchronize_srcu_expedited(struct srcu_struct *sp);
>> +#else
>> +static inline void kvm_synchronize_srcu_expedited(struct srcu_struct *sp) { }
>> +#endif
>> +
>> +#else
>> +
>> +#define kvm_synchronize_srcu_expedited synchronize_srcu_expedited
>> +
>> +#endif
>> +
>> +int kvm_init_srcu(void);
>> +void kvm_exit_srcu(void);
>> diff --git a/srcu.c b/srcu.c
> 
> Are we in kernel/srcu.c or a kvm-specific file?

We are in kvm-kmod/srcu.c, the wrapping code for srcu on older kernels
(used to work down to 2.6.16).

> 
> My guess is that we are in kernel/srcu.c for a backport-only patch, but
> have to ask!
> 
>> index e9734bc..985d627 100644
>> --- a/srcu.c
>> +++ b/srcu.c
>> @@ -24,8 +24,6 @@
>>   *
>>   */
>>  
>> -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
>> -
>>  #include <linux/module.h>
>>  #include <linux/mutex.h>
>>  #include <linux/percpu.h>
>> @@ -35,6 +33,118 @@
>>  #include <linux/slab.h>
>>  #include <linux/smp.h>
>>  #include <linux/srcu.h>
>> +#include <linux/kthread.h>
>> +
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) || \
>> +    (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP))
>> +
>> +/*
>> + * srcu_readers_active_idx -- returns approximate number of readers
>> + *	active on the specified rank of per-CPU counters.
>> + */
>> +
>> +static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
>> +{
>> +	int cpu;
>> +	int sum;
>> +
>> +	sum = 0;
>> +	for_each_possible_cpu(cpu)
>> +		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
>> +	return sum;
>> +}
>> +
>> +/*
>> + * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
>> + */
>> +static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
>> +{
>> +	int idx;
>> +
>> +	idx = sp->completed;
>> +	mutex_lock(&sp->mutex);
>> +
>> +	/*
>> +	 * Check to see if someone else did the work for us while we were
>> +	 * waiting to acquire the lock.  We need -two- advances of
>> +	 * the counter, not just one.  If there was but one, we might have
>> +	 * shown up -after- our helper's first synchronize_sched(), thus
>> +	 * having failed to prevent CPU-reordering races with concurrent
>> +	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
>> +	 * either (1) wait for two or (2) supply the second ourselves.
>> +	 */
>> +
>> +	if ((sp->completed - idx) >= 2) {
>> +		mutex_unlock(&sp->mutex);
>> +		return;
>> +	}
>> +
>> +	sync_func();  /* Force memory barrier on all CPUs. */
>> +
>> +	/*
>> +	 * The preceding synchronize_sched() ensures that any CPU that
>> +	 * sees the new value of sp->completed will also see any preceding
>> +	 * changes to data structures made by this CPU.  This prevents
>> +	 * some other CPU from reordering the accesses in its SRCU
>> +	 * read-side critical section to precede the corresponding
>> +	 * srcu_read_lock() -- ensuring that such references will in
>> +	 * fact be protected.
>> +	 *
>> +	 * So it is now safe to do the flip.
>> +	 */
>> +
>> +	idx = sp->completed & 0x1;
>> +	sp->completed++;
>> +
>> +	sync_func();  /* Force memory barrier on all CPUs. */
>> +
>> +	/*
>> +	 * At this point, because of the preceding synchronize_sched(),
>> +	 * all srcu_read_lock() calls using the old counters have completed.
>> +	 * Their corresponding critical sections might well be still
>> +	 * executing, but the srcu_read_lock() primitives themselves
>> +	 * will have finished executing.
>> +	 */
>> +
>> +	while (srcu_readers_active_idx(sp, idx))
>> +		schedule_timeout_interruptible(1);
>> +
>> +	sync_func();  /* Force memory barrier on all CPUs. */
>> +
>> +	/*
>> +	 * The preceding synchronize_sched() forces all srcu_read_unlock()
>> +	 * primitives that were executing concurrently with the preceding
>> +	 * for_each_possible_cpu() loop to have completed by this point.
>> +	 * More importantly, it also forces the corresponding SRCU read-side
>> +	 * critical sections to have also completed, and the corresponding
>> +	 * references to SRCU-protected data items to be dropped.
>> +	 *
>> +	 * Note:
>> +	 *
>> +	 *	Despite what you might think at first glance, the
>> +	 *	preceding synchronize_sched() -must- be within the
>> +	 *	critical section ended by the following mutex_unlock().
>> +	 *	Otherwise, a task taking the early exit can race
>> +	 *	with a srcu_read_unlock(), which might have executed
>> +	 *	just before the preceding srcu_readers_active() check,
>> +	 *	and whose CPU might have reordered the srcu_read_unlock()
>> +	 *	with the preceding critical section.  In this case, there
>> +	 *	is nothing preventing the synchronize_sched() task that is
>> +	 *	taking the early exit from freeing a data structure that
>> +	 *	is still being referenced (out of order) by the task
>> +	 *	doing the srcu_read_unlock().
>> +	 *
>> +	 *	Alternatively, the comparison with "2" on the early exit
>> +	 *	could be changed to "3", but this increases synchronize_srcu()
>> +	 *	latency for bulk loads.  So the current code is preferred.
>> +	 */
>> +
>> +	mutex_unlock(&sp->mutex);
>> +}
>> +
>> +#endif
>> +
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
>>  
>>  #undef kvm_init_srcu_struct
>>  #undef kvm_cleanup_srcu_struct
>> @@ -42,6 +152,11 @@
>>  #undef kvm_srcu_read_unlock
>>  #undef kvm_synchronize_srcu
>>  #undef kvm_srcu_batches_completed
>> +
>> +static int srcu_readers_active_idx(struct srcu_struct *sp, int idx);
>> +static void
>> +__synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void));
>> +
>>  /**
>>   * init_srcu_struct - initialize a sleep-RCU structure
>>   * @sp: structure to initialize.
>> @@ -58,22 +173,6 @@ int kvm_init_srcu_struct(struct srcu_struct *sp)
>>  	return (sp->per_cpu_ref ? 0 : -ENOMEM);
>>  }
>>  
>> -/*
>> - * srcu_readers_active_idx -- returns approximate number of readers
>> - *	active on the specified rank of per-CPU counters.
>> - */
>> -
>> -static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
>> -{
>> -	int cpu;
>> -	int sum;
>> -
>> -	sum = 0;
>> -	for_each_possible_cpu(cpu)
>> -		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
>> -	return sum;
>> -}
>> -
>>  /**
>>   * srcu_readers_active - returns approximate number of readers.
>>   * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
>> @@ -154,94 +253,14 @@ void kvm_srcu_read_unlock(struct srcu_struct *sp, int idx)
>>   * synchronizing concurrent updates.  Can block; must be called from
>>   * process context.
>>   *
>> - * Note that it is illegal to call synchornize_srcu() from the corresponding
>> + * Note that it is illegal to call synchronize_srcu() from the corresponding
>>   * SRCU read-side critical section; doing so will result in deadlock.
>>   * However, it is perfectly legal to call synchronize_srcu() on one
>>   * srcu_struct from some other srcu_struct's read-side critical section.
>>   */
>>  void kvm_synchronize_srcu(struct srcu_struct *sp)
>>  {
>> -	int idx;
>> -
>> -	idx = sp->completed;
>> -	mutex_lock(&sp->mutex);
>> -
>> -	/*
>> -	 * Check to see if someone else did the work for us while we were
>> -	 * waiting to acquire the lock.  We need -two- advances of
>> -	 * the counter, not just one.  If there was but one, we might have
>> -	 * shown up -after- our helper's first synchronize_sched(), thus
>> -	 * having failed to prevent CPU-reordering races with concurrent
>> -	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
>> -	 * either (1) wait for two or (2) supply the second ourselves.
>> -	 */
>> -
>> -	if ((sp->completed - idx) >= 2) {
>> -		mutex_unlock(&sp->mutex);
>> -		return;
>> -	}
>> -
>> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
>> -
>> -	/*
>> -	 * The preceding synchronize_sched() ensures that any CPU that
>> -	 * sees the new value of sp->completed will also see any preceding
>> -	 * changes to data structures made by this CPU.  This prevents
>> -	 * some other CPU from reordering the accesses in its SRCU
>> -	 * read-side critical section to precede the corresponding
>> -	 * srcu_read_lock() -- ensuring that such references will in
>> -	 * fact be protected.
>> -	 *
>> -	 * So it is now safe to do the flip.
>> -	 */
>> -
>> -	idx = sp->completed & 0x1;
>> -	sp->completed++;
>> -
>> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
>> -
>> -	/*
>> -	 * At this point, because of the preceding synchronize_sched(),
>> -	 * all srcu_read_lock() calls using the old counters have completed.
>> -	 * Their corresponding critical sections might well be still
>> -	 * executing, but the srcu_read_lock() primitives themselves
>> -	 * will have finished executing.
>> -	 */
>> -
>> -	while (srcu_readers_active_idx(sp, idx))
>> -		schedule_timeout_interruptible(1);
>> -
>> -	synchronize_sched();  /* Force memory barrier on all CPUs. */
>> -
>> -	/*
>> -	 * The preceding synchronize_sched() forces all srcu_read_unlock()
>> -	 * primitives that were executing concurrently with the preceding
>> -	 * for_each_possible_cpu() loop to have completed by this point.
>> -	 * More importantly, it also forces the corresponding SRCU read-side
>> -	 * critical sections to have also completed, and the corresponding
>> -	 * references to SRCU-protected data items to be dropped.
>> -	 *
>> -	 * Note:
>> -	 *
>> -	 *	Despite what you might think at first glance, the
>> -	 *	preceding synchronize_sched() -must- be within the
>> -	 *	critical section ended by the following mutex_unlock().
>> -	 *	Otherwise, a task taking the early exit can race
>> -	 *	with a srcu_read_unlock(), which might have executed
>> -	 *	just before the preceding srcu_readers_active() check,
>> -	 *	and whose CPU might have reordered the srcu_read_unlock()
>> -	 *	with the preceding critical section.  In this case, there
>> -	 *	is nothing preventing the synchronize_sched() task that is
>> -	 *	taking the early exit from freeing a data structure that
>> -	 *	is still being referenced (out of order) by the task
>> -	 *	doing the srcu_read_unlock().
>> -	 *
>> -	 *	Alternatively, the comparison with "2" on the early exit
>> -	 *	could be changed to "3", but this increases synchronize_srcu()
>> -	 *	latency for bulk loads.  So the current code is preferred.
>> -	 */
>> -
>> -	mutex_unlock(&sp->mutex);
>> +	__synchronize_srcu(sp, synchronize_sched);
>>  }
> 
> Hmmm...  Why not simply "synchronize_srcu()"?  Is this to allow a smaller
> piece of code to cover all the combinations of Linux kernel versions?

This is for kernels that do not have it.

> 
>>  /**
>> @@ -265,3 +284,166 @@ EXPORT_SYMBOL_GPL(kvm_synchronize_srcu);
>>  EXPORT_SYMBOL_GPL(kvm_srcu_batches_completed);
>>  
>>  #endif
>> +
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,33) && defined(CONFIG_SMP)
>> +
>> +struct sync_req {
>> +	struct list_head list;
>> +	bool pending;
>> +	bool success;
>> +	struct completion done;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct sync_req, sync_req);
>> +static DEFINE_PER_CPU(struct task_struct *, sync_thread);
>> +static DEFINE_MUTEX(rcu_sched_expedited_mutex);
>> +
>> +static long synchronize_sched_expedited_count;
> 
> Sigh!!!  You copied one of my theoretical bugs.
> 
> According to the C standard, overflowing a signed integer is undefined
> (though it has been a -long- time since I have seen hardware that
> did anything other than wrap like you would expect).  My todo list
> includes converting these to "unsigned long".  I can certainly add
> this instance to my list, but I would of course be quite happy if you
> made the appropriate adjustments.

Will do! :)

> 
> (This effort got preempted by applying lockdep to RCU, given that Thomas
> Gleixner found some non-theoretical RCU-usage bugs.)
> 
>> +static int kvm_rcu_sync_thread(void *data)
>> +{
>> +	int badcpu;
>> +	int cpu = (long)data;
>> +	struct sync_req *req = &per_cpu(sync_req, cpu);
>> +
>> +	set_current_state(TASK_INTERRUPTIBLE);
>> +	while (!kthread_should_stop()) {
>> +		if (!req->pending) {
>> +			schedule();
>> +			set_current_state(TASK_INTERRUPTIBLE);
>> +			continue;
>> +		}
>> +		req->pending = false;
>> +
>> +		preempt_disable();
>> +		badcpu = smp_processor_id();
>> +		if (likely(cpu == badcpu)) {
>> +			req->success = true;
>> +		} else {
>> +			req->success = false;
>> +			WARN_ONCE(1, "kvm_rcu_sync_thread() on CPU %d, "
>> +				  "expected %d\n", badcpu, cpu);
>> +		}
>> +		preempt_enable();
>> +
>> +		complete(&req->done);
>> +	}
>> +	__set_current_state(TASK_RUNNING);
>> +
>> +	return 0;
>> +}
>> +
>> +static void kvm_synchronize_sched_expedited(void)
>> +{
>> +	int cpu;
>> +	bool need_full_sync = 0;
>> +	struct sync_req *req;
>> +	long snap;
>> +	int trycount = 0;
>> +
>> +	smp_mb();  /* ensure prior mod happens before capturing snap. */
>> +	snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
>> +	get_online_cpus();
>> +	while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
>> +		put_online_cpus();
>> +		if (trycount++ < 10)
>> +			udelay(trycount * num_online_cpus());
>> +		else {
>> +			synchronize_sched();
>> +			return;
>> +		}
>> +		if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
>> +			smp_mb(); /* ensure test happens before caller kfree */
>> +			return;
>> +		}
>> +		get_online_cpus();
>> +	}
>> +	for_each_online_cpu(cpu) {
>> +		req = &per_cpu(sync_req, cpu);
>> +		init_completion(&req->done);
>> +		smp_wmb();
>> +		req->pending = true;
>> +		wake_up_process(per_cpu(sync_thread, cpu));
>> +	}
>> +	for_each_online_cpu(cpu) {
>> +		req = &per_cpu(sync_req, cpu);
>> +		wait_for_completion(&req->done);
>> +		if (unlikely(!req->success))
>> +			need_full_sync = 1;
>> +	}
>> +	synchronize_sched_expedited_count++;
>> +	mutex_unlock(&rcu_sched_expedited_mutex);
>> +	put_online_cpus();
>> +	if (need_full_sync)
>> +		synchronize_sched();
>> +}
>> +
>> +/**
>> + * synchronize_srcu_expedited - like synchronize_srcu, but less patient
> 
> "kvm_synchronize_srcu_expedited", right?  ;-)

Yes. But it's actually synchronize_srcu_expedited, as we patch all
call sites in the kvm module code to avoid namespace conflicts on
2.6.32.
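
To make the mechanics concrete, here is an illustrative (made-up) call
site, before and after the sync script's rename:

	/* as imported from the kernel sources: */
	synchronize_srcu_expedited(&kvm->srcu);

	/* what actually gets compiled in kvm-kmod -- the kvm_ prefix
	 * cannot clash with the symbol that 2.6.32 already exports: */
	kvm_synchronize_srcu_expedited(&kvm->srcu);

On >= 2.6.33 the compat header just #defines the kvm_ name back to the
in-kernel synchronize_srcu_expedited(), so the rename becomes a no-op
there.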

> 
>> + * @sp: srcu_struct with which to synchronize.
>> + *
>> + * Flip the completed counter, and wait for the old count to drain to zero.
>> + * As with classic RCU, the updater must use some separate means of
>> + * synchronizing concurrent updates.  Can block; must be called from
>> + * process context.
>> + *
>> + * Note that it is illegal to call synchronize_srcu_expedited()
>> + * from the corresponding SRCU read-side critical section; doing so
>> + * will result in deadlock.  However, it is perfectly legal to call
>> + * synchronize_srcu_expedited() on one srcu_struct from some other
>> + * srcu_struct's read-side critical section.
>> + */
>> +void kvm_synchronize_srcu_expedited(struct srcu_struct *sp)
>> +{
>> +	__synchronize_srcu(sp, kvm_synchronize_sched_expedited);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_synchronize_srcu_expedited);
>> +
>> +int kvm_init_srcu(void)
>> +{
>> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
>> +	struct task_struct *p;
>> +	int cpu;
>> +	int err;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		p = kthread_create(kvm_rcu_sync_thread, (void *)(long)cpu,
>> +				   "kvmsrcusync/%d", cpu);
>> +		if (IS_ERR(p))
>> +			goto error_out;
>> +
>> +		kthread_bind(p, cpu);
>> +		sched_setscheduler(p, SCHED_FIFO, &param);
>> +		per_cpu(sync_thread, cpu) = p;
>> +		wake_up_process(p);
>> +	}
>> +	return 0;
>> +
>> +error_out:
>> +	printk(KERN_ERR "kvm: kvmsrcusync for %d failed\n", cpu);
>> +	err = PTR_ERR(p);
>> +	kvm_exit_srcu();
>> +	return err;
>> +}
> 
> CPU hotplug?
> 
>> +void kvm_exit_srcu(void)
>> +{
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu)
>> +		if (per_cpu(sync_thread, cpu))
>> +			kthread_stop(per_cpu(sync_thread, cpu));
>> +}
>> +
>> +#else
>> +
>> +int kvm_init_srcu(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +void kvm_exit_srcu(void)
>> +{
>> +}
>> +
>> +#endif
> 
> I do not understand what follows -- a KVM-specific test suite or some
> such?

Patch script that mangles the source when importing it from the kernel
into kvm-kmod. E.g. we patch synchronize_srcu_expedited to redirect it
to our implementation.

> 
>> diff --git a/sync b/sync
>> index 0eaff31..d67469e 100755
>> --- a/sync
>> +++ b/sync
>> @@ -40,8 +40,9 @@ def __hack(data):
>>          'do_machine_check eventfd_signal get_desc_base get_desc_limit '
>>          'vma_kernel_pagesize native_read_tsc user_return_notifier '
>>          'user_return_notifier_register user_return_notifier_unregister '
>> +        'synchronize_srcu_expedited '
>>          )
>> -    anon_inodes = anon_inodes_exit = False
>> +    kvm_init = kvm_exit = False
>>      mce = False
>>      result = []
>>      pr_fmt = ''
>> @@ -58,24 +59,30 @@ def __hack(data):
>>              pr_fmt = sub(r'#define pr_fmt\([^)]*\) ("[^"]*").*', r'\1', line) + ' '
>>              line = ''
>>          line = sub(r'pr_debug\(([^),]*)', r'pr_debug(' + pr_fmt + r'\1', line)
>> -        if match(r'^int kvm_init\('): anon_inodes = True
>> -        if match(r'r = kvm_arch_init\(opaque\);') and anon_inodes:
>> +        if match(r'^int kvm_init\('): kvm_init = True
>> +        if match(r'r = kvm_arch_init\(opaque\);') and kvm_init:
>>              w('\tr = kvm_init_anon_inodes();')
>>              w('\tif (r)')
>>              w('\t\treturn r;\n')
>> +            w('\tr = kvm_init_srcu();')
>> +            w('\tif (r)')
>> +            w('\t\tgoto cleanup_anon_inodes;\n')
>>              w('\tpreempt_notifier_sys_init();')
>>              w('\thrtimer_kallsyms_resolve();\n')
>> -        if match(r'return 0;') and anon_inodes:
>> +        if match(r'return 0;') and kvm_init:
>>              w('\tprintk("loaded kvm module (%s)\\n");\n' % (version,))
>> -        if match(r'return r;') and anon_inodes:
>> -            w('\tkvm_exit_anon_inodes();')
>> +        if match(r'return r;') and kvm_init:
>>              w('\tpreempt_notifier_sys_exit();')
>> -            anon_inodes = False
>> -        if match(r'^void kvm_exit'): anon_inodes_exit = True
>> -        if match(r'\}') and anon_inodes_exit:
>> +            w('\tkvm_exit_srcu();')
>> +            w('cleanup_anon_inodes:')
>>              w('\tkvm_exit_anon_inodes();')
>> +            kvm_init = False
>> +        if match(r'^void kvm_exit'): kvm_exit = True
>> +        if match(r'\}') and kvm_exit:
>>              w('\tpreempt_notifier_sys_exit();')
>> -            anon_inodes_exit = False
>> +            w('\tkvm_exit_srcu();')
>> +            w('\tkvm_exit_anon_inodes();')
>> +            kvm_exit = False
>>          if match(r'^int kvm_arch_init'): kvm_arch_init = True
>>          if match(r'\btsc_khz\b') and kvm_arch_init:
>>              line = sub(r'\btsc_khz\b', 'kvm_tsc_khz', line)
>> -- 
>> 1.6.0.2
>>
> 
> 

Thanks for review!
Jan

[1] http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: qemu-kvm.git build problem
  2010-01-14  2:13                   ` Paul E. McKenney
@ 2010-01-15  8:58                     ` Jan Kiszka
  2010-01-18  2:09                       ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2010-01-15  8:58 UTC (permalink / raw)
  To: paulmck
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

Paul E. McKenney wrote:
> On Thu, Jan 14, 2010 at 01:11:33AM +0100, Jan Kiszka wrote:
>> Paul E. McKenney wrote:
>>> On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
>>>> If so, I will try to write something like this the next days. Will
>>>> surely appreciate your review afterwards!
>>> Sounds good!
>>>
>> Here we go, find the commit inlined below. You may be primarily
>> interested in kvm_synchronize_sched_expedited() and the helper thread
>> function kvm_rcu_sync_thread. As I use a dedicated thread I simplified
>> some parts compared to upstream. Light testing indicates that it works.
> 
> Cool!!!
> 
> Have you had a chance to run rcutorture on this implementation?
> 

No, this would mean backporting rcutorture - is it straightforward
(limited time...)?

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: qemu-kvm.git build problem
  2010-01-15  8:58                     ` Jan Kiszka
@ 2010-01-18  2:09                       ` Paul E. McKenney
  0 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2010-01-18  2:09 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael Goldish, Eduardo Habkost, Dor Laor

On Fri, Jan 15, 2010 at 09:58:40AM +0100, Jan Kiszka wrote:
> Paul E. McKenney wrote:
> > On Thu, Jan 14, 2010 at 01:11:33AM +0100, Jan Kiszka wrote:
> >> Paul E. McKenney wrote:
> >>> On Tue, Jan 12, 2010 at 09:28:15AM +0100, Jan Kiszka wrote:
> >>>> If so, I will try to write something like this the next days. Will
> >>>> surely appreciate your review afterwards!
> >>> Sounds good!
> >>>
> >> Here we go, find the commit inlined below. You may be primarily
> >> interested in kvm_synchronize_sched_expedited() and the helper thread
> >> function kvm_rcu_sync_thread. As I use a dedicated thread I simplified
> >> some parts compared to upstream. Light testing indicates that it works.
> > 
> > Cool!!!
> > 
> > Have you had a chance to run rcutorture on this implementation?
> > 
> 
> No, this would mean backporting rcutorture - is it straightforward
> (limited time...)?

If you aren't backporting too far, the following commit should do it for
you: 804bb8370522a569bd3a732b9de5fbd55e26f155

I do not believe that you should need to run rcutorture on all possible
old versions -- instead, pick some recent ones.  Please let me know if
this does not work out for you.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2009-12-28 18:40 qemu-kvm.git build problem Lucas Meneghel Rodrigues
2009-12-28 19:55 ` Avi Kivity
2010-01-09  5:23 ` Lucas Meneghel Rodrigues
2010-01-11 10:13   ` Jan Kiszka
2010-01-11 10:18     ` Avi Kivity
2010-01-11 10:21       ` Jan Kiszka
2010-01-12  0:23         ` Jan Kiszka
2010-01-12  0:51           ` Paul E. McKenney
2010-01-12  8:28             ` Jan Kiszka
2010-01-12 13:50               ` Paul E. McKenney
2010-01-14  0:11                 ` Jan Kiszka
2010-01-14  2:13                   ` Paul E. McKenney
2010-01-15  8:58                     ` Jan Kiszka
2010-01-18  2:09                       ` Paul E. McKenney
2010-01-14 20:02                   ` Lucas Meneghel Rodrigues
2010-01-14 23:48                   ` Paul E. McKenney
2010-01-15  8:56                     ` Jan Kiszka
