From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752417AbbCKLDY (ORCPT ); Wed, 11 Mar 2015 07:03:24 -0400 Received: from mailapp01.imgtec.com ([195.59.15.196]:24379 "EHLO imgpgp01.kl.imgtec.org" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751516AbbCKLDV (ORCPT ); Wed, 11 Mar 2015 07:03:21 -0400 X-PGP-Universal: processed; by imgpgp01.kl.imgtec.org on Wed, 11 Mar 2015 11:03:19 +0000 Message-ID: <550020F6.6020105@imgtec.com> Date: Wed, 11 Mar 2015 11:03:18 +0000 From: James Hogan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: CC: , , , , , , , , , , , , , , , , Subject: Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code References: <20150303174144.GA13139@linux.vnet.ibm.com> <1425404595-17816-1-git-send-email-paulmck@linux.vnet.ibm.com> <1425404595-17816-4-git-send-email-paulmck@linux.vnet.ibm.com> <54FF0E22.3010904@imgtec.com> <20150310165935.GR5708@linux.vnet.ibm.com> In-Reply-To: <20150310165935.GR5708@linux.vnet.ibm.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh" X-Originating-IP: [192.168.154.110] X-ESG-ENCRYPT-TAG: da4c5968 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 10/03/15 16:59, Paul E. McKenney wrote: > On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote: >> Hi Paul, >> >> On 03/03/15 17:42, Paul E. McKenney wrote: >>> From: "Paul E. McKenney" >>> >>> This commit removes the open-coded CPU-offline notification with new >>> common code. This change avoids calling scheduler code using RCU fro= m >>> an offline CPU that RCU is ignoring. This commit is compatible with >>> the existing code in not checking for timeout during a prior offline >>> for a given CPU. >>> >>> Signed-off-by: Paul E. McKenney >>> Cc: James Hogan >>> Cc: >> >> I gave this a try via linux-next, but unfortunately it causes the >> following warning every time a CPU goes down: >> META213-Thread0 DSP [LogF] CPU1: unable to kill >=20 > That is certainly not what I had in mind, thank you for finding this! >=20 >> If I add printks, I see that the state on entry to both cpu_wait_death= >> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn= 't >> changed from its initial value. >> >> Should arches other than x86 now be calling cpu_set_state_online()? Th= e >> patchlet below seems to resolve it for Meta (not sure if that is the >> best place in the startup sequence to do it, perhaps it doesn't matter= ). >> >> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c >> index ac3a199e33e7..430e379ec71f 100644 >> --- a/arch/metag/kernel/smp.c >> +++ b/arch/metag/kernel/smp.c >> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void) >> * OK, now it's safe to let the boot CPU continue >> */ >> set_cpu_online(cpu, true); >> + cpu_set_state_online(cpu); >> complete(&cpu_running); >> =20 >> /* >> >> Looking at the comment before cpu_set_state_online: >>> /* >>> * Mark the specified CPU online. >>> * >>> * Note that it is permissible to omit this call entirely, as is >>> * done in architectures that do no CPU-hotplug error checking. >>> */ >> >> Which suggests it wasn't wrong to omit it before your patches came >> along. >=20 > And that suggestion is quite correct. The idea was indeed to accommoda= te > architectures that do not do error checking. >=20 > Does the following patch (on top of current -next) remove the need for > your addition of cpu_set_state_online() above? Don't forget the "oldstate =3D=3D ", otherwise it'll work for the wrong reason :-/ Checking for CPU_POST_DEAD does seem to fix the immediate problem, however this still leaves open the possibility of a single timeout propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've confirmed that by adding a delay loop only on the second cpu_report_death() call, and sure enough the 2nd and further offlines all fail even though the CPU stops immediately after the 2nd one. If this check is primarily so that CPU_DEAD_FROZEN is set if cpu_wait_death timed out, would it be better to instead check explicitly for CPU_BROKEN? diff --git a/kernel/smpboot.c b/kernel/smpboot.c index 18688e0b0422..c697f73d82d6 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -460,7 +460,7 @@ bool cpu_report_death(void) =20 do { oldstate =3D atomic_read(&per_cpu(cpu_hotplug_state, cpu)); - if (oldstate =3D=3D CPU_ONLINE) + if (oldstate !=3D CPU_BROKEN) newstate =3D CPU_DEAD; else newstate =3D CPU_DEAD_FROZEN; Cheers James >=20 > Thanx, Paul >=20 > -----------------------------------------------------------------------= - >=20 > diff --git a/kernel/smpboot.c b/kernel/smpboot.c > index 18688e0b0422..80400e019c86 100644 > --- a/kernel/smpboot.c > +++ b/kernel/smpboot.c > @@ -460,7 +460,7 @@ bool cpu_report_death(void) > =20 > do { > oldstate =3D atomic_read(&per_cpu(cpu_hotplug_state, cpu)); > - if (oldstate =3D=3D CPU_ONLINE) > + if (oldstate =3D=3D CPU_ONLINE || CPU_POST_DEAD) > newstate =3D CPU_DEAD; > else > newstate =3D CPU_DEAD_FROZEN; >=20 --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJVACD2AAoJEGwLaZPeOHZ6xPkQAKPyBnoIVmCAFLBHf8S5G0D7 5vKTR5adnWd54OVvhciv7TGc1gyBjaTg5GuxodeZ3X8FWrKROD2h90uFM3sNDg4n vwqFBItU6fB+216iNCWsYh76cp+g3TURMipDYEYwhITSRrLpRM/SbP3dT03UJokm gflJbwhVoDwuALbk35BWgtYF74/94Ck5jC8BI9HjaPUgl+TSZ8Ovdv5bkmtX3Jy8 llACNJihzJQEKc84sXqIY6kcRcuo8onOPvJ8mYqUFNDDUJqnspd2nfwZACQxaHiH gCgkK9hQdl1BNWW1sVBhbg1naje5pkU1+kM3+v+JyDHEgVrqVLkE8Fi6wh/Ef3Z3 qZ+HMGx3jcMu0uVnPTVsKTXfQFkVZyb6gWp5BgRHFbvUeZZfUvKzAienwC+gEsiq 5FjQAdK+2WPWXs4d1VYxysRn+ekzVBXEBW5R6eXSr8NOoaLpG7OKTsEgcxU+FE1l SUiZBeiiozDJ8wlABV5oOE8I/+U/XdHiZ17sgFB327h5/jHSyC6p0D0kuyGGi/w2 HVpu9G25P0pn20UmgHLdPnUqgQ5MYsut/awK9ZEBFLIgsHl4IRLJQxASXoNCNQ6F ei+Rr+S7TqpJl6eU64CUHesrRQWioP1LVI2yZ3ccZ2TGdb0WWlWlfgWDUaVY8bGs WI2q+Cpqh6MLU7BpuuYB =Np2Q -----END PGP SIGNATURE----- --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: James Hogan Subject: Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code Date: Wed, 11 Mar 2015 11:03:18 +0000 Message-ID: <550020F6.6020105@imgtec.com> References: <20150303174144.GA13139@linux.vnet.ibm.com> <1425404595-17816-1-git-send-email-paulmck@linux.vnet.ibm.com> <1425404595-17816-4-git-send-email-paulmck@linux.vnet.ibm.com> <54FF0E22.3010904@imgtec.com> <20150310165935.GR5708@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh" Return-path: In-Reply-To: <20150310165935.GR5708-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: linux-metag-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: To: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org, dipankar-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org, josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org, tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org, peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org, rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org, dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, edumazet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, dvhart-VuQAYsv1563Yd54FQh9/CA@public.gmane.org, fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, bobby.prani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-metag-u79uwXL29TY76Z2rM5mHXA@public.gmane.org --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 10/03/15 16:59, Paul E. McKenney wrote: > On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote: >> Hi Paul, >> >> On 03/03/15 17:42, Paul E. McKenney wrote: >>> From: "Paul E. McKenney" >>> >>> This commit removes the open-coded CPU-offline notification with new >>> common code. This change avoids calling scheduler code using RCU fro= m >>> an offline CPU that RCU is ignoring. This commit is compatible with >>> the existing code in not checking for timeout during a prior offline >>> for a given CPU. >>> >>> Signed-off-by: Paul E. McKenney >>> Cc: James Hogan >>> Cc: >> >> I gave this a try via linux-next, but unfortunately it causes the >> following warning every time a CPU goes down: >> META213-Thread0 DSP [LogF] CPU1: unable to kill >=20 > That is certainly not what I had in mind, thank you for finding this! >=20 >> If I add printks, I see that the state on entry to both cpu_wait_death= >> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn= 't >> changed from its initial value. >> >> Should arches other than x86 now be calling cpu_set_state_online()? Th= e >> patchlet below seems to resolve it for Meta (not sure if that is the >> best place in the startup sequence to do it, perhaps it doesn't matter= ). >> >> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c >> index ac3a199e33e7..430e379ec71f 100644 >> --- a/arch/metag/kernel/smp.c >> +++ b/arch/metag/kernel/smp.c >> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void) >> * OK, now it's safe to let the boot CPU continue >> */ >> set_cpu_online(cpu, true); >> + cpu_set_state_online(cpu); >> complete(&cpu_running); >> =20 >> /* >> >> Looking at the comment before cpu_set_state_online: >>> /* >>> * Mark the specified CPU online. >>> * >>> * Note that it is permissible to omit this call entirely, as is >>> * done in architectures that do no CPU-hotplug error checking. >>> */ >> >> Which suggests it wasn't wrong to omit it before your patches came >> along. >=20 > And that suggestion is quite correct. The idea was indeed to accommoda= te > architectures that do not do error checking. >=20 > Does the following patch (on top of current -next) remove the need for > your addition of cpu_set_state_online() above? Don't forget the "oldstate =3D=3D ", otherwise it'll work for the wrong reason :-/ Checking for CPU_POST_DEAD does seem to fix the immediate problem, however this still leaves open the possibility of a single timeout propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've confirmed that by adding a delay loop only on the second cpu_report_death() call, and sure enough the 2nd and further offlines all fail even though the CPU stops immediately after the 2nd one. If this check is primarily so that CPU_DEAD_FROZEN is set if cpu_wait_death timed out, would it be better to instead check explicitly for CPU_BROKEN? diff --git a/kernel/smpboot.c b/kernel/smpboot.c index 18688e0b0422..c697f73d82d6 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -460,7 +460,7 @@ bool cpu_report_death(void) =20 do { oldstate =3D atomic_read(&per_cpu(cpu_hotplug_state, cpu)); - if (oldstate =3D=3D CPU_ONLINE) + if (oldstate !=3D CPU_BROKEN) newstate =3D CPU_DEAD; else newstate =3D CPU_DEAD_FROZEN; Cheers James >=20 > Thanx, Paul >=20 > -----------------------------------------------------------------------= - >=20 > diff --git a/kernel/smpboot.c b/kernel/smpboot.c > index 18688e0b0422..80400e019c86 100644 > --- a/kernel/smpboot.c > +++ b/kernel/smpboot.c > @@ -460,7 +460,7 @@ bool cpu_report_death(void) > =20 > do { > oldstate =3D atomic_read(&per_cpu(cpu_hotplug_state, cpu)); > - if (oldstate =3D=3D CPU_ONLINE) > + if (oldstate =3D=3D CPU_ONLINE || CPU_POST_DEAD) > newstate =3D CPU_DEAD; > else > newstate =3D CPU_DEAD_FROZEN; >=20 --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJVACD2AAoJEGwLaZPeOHZ6xPkQAKPyBnoIVmCAFLBHf8S5G0D7 5vKTR5adnWd54OVvhciv7TGc1gyBjaTg5GuxodeZ3X8FWrKROD2h90uFM3sNDg4n vwqFBItU6fB+216iNCWsYh76cp+g3TURMipDYEYwhITSRrLpRM/SbP3dT03UJokm gflJbwhVoDwuALbk35BWgtYF74/94Ck5jC8BI9HjaPUgl+TSZ8Ovdv5bkmtX3Jy8 llACNJihzJQEKc84sXqIY6kcRcuo8onOPvJ8mYqUFNDDUJqnspd2nfwZACQxaHiH gCgkK9hQdl1BNWW1sVBhbg1naje5pkU1+kM3+v+JyDHEgVrqVLkE8Fi6wh/Ef3Z3 qZ+HMGx3jcMu0uVnPTVsKTXfQFkVZyb6gWp5BgRHFbvUeZZfUvKzAienwC+gEsiq 5FjQAdK+2WPWXs4d1VYxysRn+ekzVBXEBW5R6eXSr8NOoaLpG7OKTsEgcxU+FE1l SUiZBeiiozDJ8wlABV5oOE8I/+U/XdHiZ17sgFB327h5/jHSyC6p0D0kuyGGi/w2 HVpu9G25P0pn20UmgHLdPnUqgQ5MYsut/awK9ZEBFLIgsHl4IRLJQxASXoNCNQ6F ei+Rr+S7TqpJl6eU64CUHesrRQWioP1LVI2yZ3ccZ2TGdb0WWlWlfgWDUaVY8bGs WI2q+Cpqh6MLU7BpuuYB =Np2Q -----END PGP SIGNATURE----- --DEqmRfTvCpmbBXsewBbeeuMC9fJ8D8Rjh-- -- To unsubscribe from this list: send the line "unsubscribe linux-metag" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html