From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Date: Thu, 19 Mar 2015 10:03:57 +0000 Message-ID: <1426759435.2560.104.camel@citrix.com> References: <20150313181109.GA3179@gmail.com> <55032C85.6090805@citrix.com> <550336EE.60909@eu.citrix.com> <5506DF40020000780006A407@mail.emea.novell.com> <5506D1E1.8050404@eu.citrix.com> <5506E11F020000780006A426@mail.emea.novell.com> <1426616308.32500.98.camel@citrix.com> <55093DD1020000780006B1F2@mail.emea.novell.com> <1426668836.2560.13.camel@citrix.com> <55099940.7080007@eu.citrix.com> <1426697366.2560.65.camel@citrix.com> <5509B066.5050302@eu.citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6708370112534314171==" Return-path: In-Reply-To: <5509B066.5050302@eu.citrix.com> Content-Language: en-US List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org Cc: Andrew Cooper , "xen-devel@lists.xen.org" , George Dunlap , "JBeulich@suse.com" , "uma.sharma523@gmail.com" List-Id: xen-devel@lists.xenproject.org --===============6708370112534314171== Content-Language: en-US Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-v8d4BvgqYddX2yA3TfLn" --=-v8d4BvgqYddX2yA3TfLn Content-Type: multipart/mixed; boundary="=-rTgfKNuS8yY/l8rloyNk" --=-rTgfKNuS8yY/l8rloyNk Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2015-03-18 at 17:05 +0000, George Dunlap wrote: > On 03/18/2015 04:49 PM, Dario Faggioli wrote: > >> If this would work, I think it would be a lot better. It would remove > >> the credit2-specific callback, and it would mean we wouldn't need an > >> additional test to filter out > >> > > I see. Ok, I guess I can give this a try. If it does not explode, > > because some weird dependency we can't see right now, we can either try > > harder to track it, or use the other solution outlined above as a > > fallback. >=20 > If you could look into this, that would be awesome. :-) >=20 So, I did look into this. In summary, I think it could work, but some rework is required. Here's my findings in some more details. So, with the quick-&-dirty(TM) patch attached to this email, Credit2 works-a-charm: root@Zhaman:~# xl dmesg |grep runqueue (XEN) Adding cpu 0 to runqueue 0 (XEN) First cpu on runqueue, activating (XEN) Adding cpu 1 to runqueue 1 (XEN) First cpu on runqueue, activating (XEN) Adding cpu 2 to runqueue 1 (XEN) Adding cpu 3 to runqueue 1 (XEN) Adding cpu 4 to runqueue 1 (XEN) Adding cpu 5 to runqueue 1 (XEN) Adding cpu 6 to runqueue 1 (XEN) Adding cpu 7 to runqueue 1 (XEN) Adding cpu 8 to runqueue 0 (XEN) Adding cpu 9 to runqueue 0 (XEN) Adding cpu 10 to runqueue 0 (XEN) Adding cpu 11 to runqueue 0 (XEN) Adding cpu 12 to runqueue 0 (XEN) Adding cpu 13 to runqueue 0 (XEN) Adding cpu 14 to runqueue 0 (XEN) Adding cpu 15 to runqueue 0 (Yes, cpu0 is assigned to the wrong runqueue, but that is because I'm ignoring the system_state <=3D=3D> boot_cpu_data thing for now, so let's also ignore this, it'll be fixed). So, things are working, less complexity (i.e., no ad-hoc callback for credit2), less code (i.e., lots of '-' in the patch), outside is warm and sunny, spring is coming, birds are singing, etc... :-D :-D Unfortunately, the patch also makes Credit1 go splat like this: (XEN) Xen call trace: (XEN) [] check_lock+0x37/0x3b (XEN) [] _spin_lock+0x11/0x54 (XEN) [] xmem_pool_alloc+0x11c/0x46c (XEN) [] _xmalloc+0x106/0x316 (XEN) [] _xzalloc+0x11/0x32 (XEN) [] csched_alloc_pdata+0x47/0x1c6 (XEN) [] cpu_schedule_callback+0x5a/0xcc (XEN) [] notifier_call_chain+0x67/0x87 (XEN) [] notify_cpu_starting+0x1c/0x24 (XEN) [] start_secondary+0x203/0x256 (XEN)=20 (XEN)=20 (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) Xen BUG at spinlock.c:48 (XEN) **************************************** (...and, suddenly, birds shut up :-( ) That is because pool->lock in xmem_pool_alloc() wants to find IRQs enabled when acquired, as it's been acquired with IRQs enabled before, and we don't mix IRQ enabled and disabled spinlocks (that's what the BUG at spinlock:48 is for). The difference between before and after the patch is that, bebore, csched_alloc_pdata() is called during CPU_UP_PREPARE, after, during CPU_STARTING. More specifically, CPU_UP_PREPARE callbacks are invoked as a consequence of cpu_up() being called in a loop in __start_xen(), and it itself calls notifier_call_chain(..CPU_UP_PREPARE..). This means they run: - on the boot cpu; - with interrupt enabled (see how the call to local_irq_enable() in __start_xen() is *before* the for_each_present_cpu() { cpu_up(i); } loop). OTOH, CPU_STARTING callbacks run: - on the cpu being brought up; - with interrupt disabled (see how the call to local_irq_enable(), in start_secondary(), is *after* the invocation of notify_cpu_starting()). Here we are. And the reason why things works ok in Credit2, is that csched2_alloc_pdata() doesn't really allocate anything! In fact, in general, handling alloc_pdata() during CPU_STARTING would mean that we can't allocate any memory which, given the name of the function, would look rather odd. :-) Nevertheless I see the value of doing so, and hence I think what we could do would be to introduce a new hook in the scheduler interface, called .init_pdata or .init_pcpu, and, in sched_*.c, split the allocation and the initialization parts. The former will be handled during CPU_UP_PREPARE, when allocation is possible, the latter during CPU_STARTING, when we have more info available to perform actual initializations. For Credit2, alloc_pdata() wouldn't even need to exist, and all the work will be done in init_pcpu(), so less complexity than now, and no need for the ad-hoc starting callback inside sched_credit2.c. For Credit1, both will be required, but the added complexity in sched_credit.c doesn't look too bad at all to me. Of couse, I still have to code it up, and see whether this plays well with cpupools, cpu on/offlining, etc., but I'd like to hear whether you like the idea, before getting to do that. :-) Let me know what you think. Regards, Dario --- sched_credit2.c | 63 ++-------------------------------------------------= ----- schedule.c | 10 +++++--- 2 files changed, 9 insertions(+), 64 deletions(-) --- diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index ad0a5d4..ac7a7f2 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_privat= e *prv, int rqi) cpumask_clear_cpu(rqi, &prv->active_queues); } =20 -static void init_pcpu(const struct scheduler *ops, int cpu) +static void * +csched2_alloc_pdata(const struct scheduler *ops, int cpu) { int rqi; unsigned long flags; @@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, in= t cpu) { printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu= ); spin_unlock_irqrestore(&prv->lock, flags); - return; + return NULL; } =20 /* Figure out which runqueue to put it in */ @@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, i= nt cpu) =20 spin_unlock_irqrestore(&prv->lock, flags); =20 - return; -} - -static void * -csched2_alloc_pdata(const struct scheduler *ops, int cpu) -{ - /* Check to see if the cpu is online yet */ - /* Note: cpu 0 doesn't get a STARTING callback */ - if ( cpu =3D=3D 0 || cpu_to_socket(cpu) >=3D 0 ) - init_pcpu(ops, cpu); - else - printk("%s: cpu %d not online yet, deferring initializatgion\n", - __func__, cpu); - return (void *)1; } =20 @@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void= *pcpu, int cpu) } =20 static int -csched2_cpu_starting(int cpu) -{ - struct scheduler *ops; - - /* Hope this is safe from cpupools switching things around. :-) */ - ops =3D per_cpu(scheduler, cpu); - - if ( ops->alloc_pdata =3D=3D csched2_alloc_pdata ) - init_pcpu(ops, cpu); - - return NOTIFY_DONE; -} - -static int cpu_credit2_callback( - struct notifier_block *nfb, unsigned long action, void *hcpu) -{ - unsigned int cpu =3D (unsigned long)hcpu; - int rc =3D 0; - - switch ( action ) - { - case CPU_STARTING: - csched2_cpu_starting(cpu); - break; - default: - break; - } - - return !rc ? NOTIFY_DONE : notifier_from_errno(rc); -} - -static struct notifier_block cpu_credit2_nfb =3D { - .notifier_call =3D cpu_credit2_callback -}; - -static int -csched2_global_init(void) -{ - register_cpu_notifier(&cpu_credit2_nfb); - return 0; -} - -static int csched2_init(struct scheduler *ops) { int i; @@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def =3D { =20 .dump_cpu_state =3D csched2_dump_pcpu, .dump_settings =3D csched2_dump, - .global_init =3D csched2_global_init, .init =3D csched2_init, .deinit =3D csched2_deinit, .alloc_vdata =3D csched2_alloc_vdata, diff --git a/xen/common/schedule.c b/xen/common/schedule.c index ef79847..5e7cdc9 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu) if ( idle_vcpu[cpu] =3D=3D NULL ) return -ENOMEM; =20 - if ( (ops.alloc_pdata !=3D NULL) && - ((sd->sched_priv =3D ops.alloc_pdata(&ops, cpu)) =3D=3D NULL) ) - return -ENOMEM; - return 0; } =20 @@ -1348,10 +1344,16 @@ static int cpu_schedule_callback( struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu =3D (unsigned long)hcpu; + struct schedule_data *sd =3D &per_cpu(schedule_data, cpu); int rc =3D 0; =20 switch ( action ) { + case CPU_STARTING: + if ( (ops.alloc_pdata !=3D NULL) && + ((sd->sched_priv =3D ops.alloc_pdata(&ops, cpu)) =3D=3D NULL)= ) + return -ENOMEM; + break; case CPU_UP_PREPARE: rc =3D cpu_schedule_up(cpu); break; --=-rTgfKNuS8yY/l8rloyNk Content-Disposition: attachment; filename="xen-sched-cpustarting.patch" Content-Type: text/x-patch; name="xen-sched-cpustarting.patch"; charset="UTF-8" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3hlbi9jb21tb24vc2NoZWRfY3JlZGl0Mi5jIGIveGVuL2NvbW1vbi9zY2hl ZF9jcmVkaXQyLmMKaW5kZXggYWQwYTVkNC4uYWM3YTdmMiAxMDA2NDQKLS0tIGEveGVuL2NvbW1v bi9zY2hlZF9jcmVkaXQyLmMKKysrIGIveGVuL2NvbW1vbi9zY2hlZF9jcmVkaXQyLmMKQEAgLTE5 MTgsNyArMTkxOCw4IEBAIHN0YXRpYyB2b2lkIGRlYWN0aXZhdGVfcnVucXVldWUoc3RydWN0IGNz Y2hlZDJfcHJpdmF0ZSAqcHJ2LCBpbnQgcnFpKQogICAgIGNwdW1hc2tfY2xlYXJfY3B1KHJxaSwg JnBydi0+YWN0aXZlX3F1ZXVlcyk7CiB9CiAKLXN0YXRpYyB2b2lkIGluaXRfcGNwdShjb25zdCBz dHJ1Y3Qgc2NoZWR1bGVyICpvcHMsIGludCBjcHUpCitzdGF0aWMgdm9pZCAqCitjc2NoZWQyX2Fs bG9jX3BkYXRhKGNvbnN0IHN0cnVjdCBzY2hlZHVsZXIgKm9wcywgaW50IGNwdSkKIHsKICAgICBp bnQgcnFpOwogICAgIHVuc2lnbmVkIGxvbmcgZmxhZ3M7CkBAIC0xOTMyLDcgKzE5MzMsNyBAQCBz dGF0aWMgdm9pZCBpbml0X3BjcHUoY29uc3Qgc3RydWN0IHNjaGVkdWxlciAqb3BzLCBpbnQgY3B1 KQogICAgIHsKICAgICAgICAgcHJpbnRrKCIlczogU3RyYW5nZSwgY3B1ICVkIGFscmVhZHkgaW5p dGlhbGl6ZWQhXG4iLCBfX2Z1bmNfXywgY3B1KTsKICAgICAgICAgc3Bpbl91bmxvY2tfaXJxcmVz dG9yZSgmcHJ2LT5sb2NrLCBmbGFncyk7Ci0gICAgICAgIHJldHVybjsKKyAgICAgICAgcmV0dXJu IE5VTEw7CiAgICAgfQogCiAgICAgLyogRmlndXJlIG91dCB3aGljaCBydW5xdWV1ZSB0byBwdXQg aXQgaW4gKi8KQEAgLTE5ODAsMjAgKzE5ODEsNiBAQCBzdGF0aWMgdm9pZCBpbml0X3BjcHUoY29u c3Qgc3RydWN0IHNjaGVkdWxlciAqb3BzLCBpbnQgY3B1KQogCiAgICAgc3Bpbl91bmxvY2tfaXJx cmVzdG9yZSgmcHJ2LT5sb2NrLCBmbGFncyk7CiAKLSAgICByZXR1cm47Ci19Ci0KLXN0YXRpYyB2 b2lkICoKLWNzY2hlZDJfYWxsb2NfcGRhdGEoY29uc3Qgc3RydWN0IHNjaGVkdWxlciAqb3BzLCBp bnQgY3B1KQotewotICAgIC8qIENoZWNrIHRvIHNlZSBpZiB0aGUgY3B1IGlzIG9ubGluZSB5ZXQg Ki8KLSAgICAvKiBOb3RlOiBjcHUgMCBkb2Vzbid0IGdldCBhIFNUQVJUSU5HIGNhbGxiYWNrICov Ci0gICAgaWYgKCBjcHUgPT0gMCB8fCBjcHVfdG9fc29ja2V0KGNwdSkgPj0gMCApCi0gICAgICAg IGluaXRfcGNwdShvcHMsIGNwdSk7Ci0gICAgZWxzZQotICAgICAgICBwcmludGsoIiVzOiBjcHUg JWQgbm90IG9ubGluZSB5ZXQsIGRlZmVycmluZyBpbml0aWFsaXphdGdpb25cbiIsCi0gICAgICAg ICAgICAgICBfX2Z1bmNfXywgY3B1KTsKLQogICAgIHJldHVybiAodm9pZCAqKTE7CiB9CiAKQEAg LTIwNDYsNDkgKzIwMzMsNiBAQCBjc2NoZWQyX2ZyZWVfcGRhdGEoY29uc3Qgc3RydWN0IHNjaGVk dWxlciAqb3BzLCB2b2lkICpwY3B1LCBpbnQgY3B1KQogfQogCiBzdGF0aWMgaW50Ci1jc2NoZWQy X2NwdV9zdGFydGluZyhpbnQgY3B1KQotewotICAgIHN0cnVjdCBzY2hlZHVsZXIgKm9wczsKLQot ICAgIC8qIEhvcGUgdGhpcyBpcyBzYWZlIGZyb20gY3B1cG9vbHMgc3dpdGNoaW5nIHRoaW5ncyBh cm91bmQuIDotKSAqLwotICAgIG9wcyA9IHBlcl9jcHUoc2NoZWR1bGVyLCBjcHUpOwotCi0gICAg aWYgKCBvcHMtPmFsbG9jX3BkYXRhID09IGNzY2hlZDJfYWxsb2NfcGRhdGEgKQotICAgICAgICBp bml0X3BjcHUob3BzLCBjcHUpOwotCi0gICAgcmV0dXJuIE5PVElGWV9ET05FOwotfQotCi1zdGF0 aWMgaW50IGNwdV9jcmVkaXQyX2NhbGxiYWNrKAotICAgIHN0cnVjdCBub3RpZmllcl9ibG9jayAq bmZiLCB1bnNpZ25lZCBsb25nIGFjdGlvbiwgdm9pZCAqaGNwdSkKLXsKLSAgICB1bnNpZ25lZCBp bnQgY3B1ID0gKHVuc2lnbmVkIGxvbmcpaGNwdTsKLSAgICBpbnQgcmMgPSAwOwotCi0gICAgc3dp dGNoICggYWN0aW9uICkKLSAgICB7Ci0gICAgY2FzZSBDUFVfU1RBUlRJTkc6Ci0gICAgICAgIGNz Y2hlZDJfY3B1X3N0YXJ0aW5nKGNwdSk7Ci0gICAgICAgIGJyZWFrOwotICAgIGRlZmF1bHQ6Ci0g ICAgICAgIGJyZWFrOwotICAgIH0KLQotICAgIHJldHVybiAhcmMgPyBOT1RJRllfRE9ORSA6IG5v dGlmaWVyX2Zyb21fZXJybm8ocmMpOwotfQotCi1zdGF0aWMgc3RydWN0IG5vdGlmaWVyX2Jsb2Nr IGNwdV9jcmVkaXQyX25mYiA9IHsKLSAgICAubm90aWZpZXJfY2FsbCA9IGNwdV9jcmVkaXQyX2Nh bGxiYWNrCi19OwotCi1zdGF0aWMgaW50Ci1jc2NoZWQyX2dsb2JhbF9pbml0KHZvaWQpCi17Ci0g ICAgcmVnaXN0ZXJfY3B1X25vdGlmaWVyKCZjcHVfY3JlZGl0Ml9uZmIpOwotICAgIHJldHVybiAw OwotfQotCi1zdGF0aWMgaW50CiBjc2NoZWQyX2luaXQoc3RydWN0IHNjaGVkdWxlciAqb3BzKQog ewogICAgIGludCBpOwpAQCAtMjE2OCw3ICsyMTEyLDYgQEAgY29uc3Qgc3RydWN0IHNjaGVkdWxl ciBzY2hlZF9jcmVkaXQyX2RlZiA9IHsKIAogICAgIC5kdW1wX2NwdV9zdGF0ZSA9IGNzY2hlZDJf ZHVtcF9wY3B1LAogICAgIC5kdW1wX3NldHRpbmdzICA9IGNzY2hlZDJfZHVtcCwKLSAgICAuZ2xv YmFsX2luaXQgICAgPSBjc2NoZWQyX2dsb2JhbF9pbml0LAogICAgIC5pbml0ICAgICAgICAgICA9 IGNzY2hlZDJfaW5pdCwKICAgICAuZGVpbml0ICAgICAgICAgPSBjc2NoZWQyX2RlaW5pdCwKICAg ICAuYWxsb2NfdmRhdGEgICAgPSBjc2NoZWQyX2FsbG9jX3ZkYXRhLApkaWZmIC0tZ2l0IGEveGVu L2NvbW1vbi9zY2hlZHVsZS5jIGIveGVuL2NvbW1vbi9zY2hlZHVsZS5jCmluZGV4IGVmNzk4NDcu LjVlN2NkYzkgMTAwNjQ0Ci0tLSBhL3hlbi9jb21tb24vc2NoZWR1bGUuYworKysgYi94ZW4vY29t bW9uL3NjaGVkdWxlLmMKQEAgLTEzMjcsMTAgKzEzMjcsNiBAQCBzdGF0aWMgaW50IGNwdV9zY2hl ZHVsZV91cCh1bnNpZ25lZCBpbnQgY3B1KQogICAgIGlmICggaWRsZV92Y3B1W2NwdV0gPT0gTlVM TCApCiAgICAgICAgIHJldHVybiAtRU5PTUVNOwogCi0gICAgaWYgKCAob3BzLmFsbG9jX3BkYXRh ICE9IE5VTEwpICYmCi0gICAgICAgICAoKHNkLT5zY2hlZF9wcml2ID0gb3BzLmFsbG9jX3BkYXRh KCZvcHMsIGNwdSkpID09IE5VTEwpICkKLSAgICAgICAgcmV0dXJuIC1FTk9NRU07Ci0KICAgICBy ZXR1cm4gMDsKIH0KIApAQCAtMTM0OCwxMCArMTM0NCwxNiBAQCBzdGF0aWMgaW50IGNwdV9zY2hl ZHVsZV9jYWxsYmFjaygKICAgICBzdHJ1Y3Qgbm90aWZpZXJfYmxvY2sgKm5mYiwgdW5zaWduZWQg bG9uZyBhY3Rpb24sIHZvaWQgKmhjcHUpCiB7CiAgICAgdW5zaWduZWQgaW50IGNwdSA9ICh1bnNp Z25lZCBsb25nKWhjcHU7CisgICAgc3RydWN0IHNjaGVkdWxlX2RhdGEgKnNkID0gJnBlcl9jcHUo c2NoZWR1bGVfZGF0YSwgY3B1KTsKICAgICBpbnQgcmMgPSAwOwogCiAgICAgc3dpdGNoICggYWN0 aW9uICkKICAgICB7CisgICAgY2FzZSBDUFVfU1RBUlRJTkc6CisgICAgICAgIGlmICggKG9wcy5h bGxvY19wZGF0YSAhPSBOVUxMKSAmJgorICAgICAgICAgICAgICgoc2QtPnNjaGVkX3ByaXYgPSBv cHMuYWxsb2NfcGRhdGEoJm9wcywgY3B1KSkgPT0gTlVMTCkgKQorICAgICAgICAgICAgcmV0dXJu IC1FTk9NRU07CisgICAgICAgIGJyZWFrOwogICAgIGNhc2UgQ1BVX1VQX1BSRVBBUkU6CiAgICAg ICAgIHJjID0gY3B1X3NjaGVkdWxlX3VwKGNwdSk7CiAgICAgICAgIGJyZWFrOwo= --=-rTgfKNuS8yY/l8rloyNk-- --=-v8d4BvgqYddX2yA3TfLn Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlUKnwsACgkQk4XaBE3IOsTG0gCffX/+7q77j94GaEh7OmD5hg45 yBsAniBpI4N4CwO9exXOTMoeX7VybxIM =VdGP -----END PGP SIGNATURE----- --=-v8d4BvgqYddX2yA3TfLn-- --===============6708370112534314171== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============6708370112534314171==--