From: Dario Faggioli
Subject: Re: [PATCH v4 2/3] libxl: print message how to recover from xl cpupool-cpu-remove errors
Date: Thu, 14 Apr 2016 19:10:36 +0200
Message-ID: <1460653836.13871.176.camel@citrix.com>
References: <1457587634-22819-1-git-send-email-jgross@suse.com> <1457587634-22819-3-git-send-email-jgross@suse.com> <22287.49160.37423.898465@mariner.uk.xensource.com>
In-Reply-To: <22287.49160.37423.898465@mariner.uk.xensource.com>
To: Ian Jackson, Juergen Gross
Cc: Wei Liu, xen-devel@lists.xen.org

On Thu, 2016-04-14 at 17:06 +0100, Ian Jackson wrote:
> Juergen Gross writes ("[PATCH v4 2/3] libxl: print message how to
> recover from xl cpupool-cpu-remove errors"):
> >
> > An error occurring when calling "xl cpupool-cpu-remove" might leave
> > the system in a state where a cpu is neither completely free nor in
> > a cpupool.
> Surely this is a bug.  Can it not be avoided ?
>
Not easily (and in general not with any patch that I'd consider
appropriate for this phase of the release process), as it depends on
transient situations in the hypervisor, such as lock contention on
scheduling data structures.

> > This can easily be repaired by adding the cpu via
> > "xl cpupool-cpu-add" to the cpupool where it was removed from
> > before.
> > Print a message telling this the user in case of an error.
> ...
> >
> > -    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap))
> > -        fprintf(stderr, "some cpus may not have been removed from %s\n", pool);
> > +    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap)) {
> > +        fprintf(stderr, "Some cpus may have not or only partially been removed from '%s'.\n", pool);
> > +        fprintf(stderr, "If a cpu can't be added to another cpupool, add it to '%s' again and retry.\n", pool);
> > +    }
> If it can't be avoided then I guess this will have to do but I remain
> to be convinced.
>
And in fact, it's not something that is introduced by this series,
which, with this patch, is just taking the chance to document things
better (although this series does introduce one more way for the issue
to occur).

Doing some retries at levels lower than this would reduce the chance
of the user actually having to deal with the problem. For example,
what's done in libxc... but as you pointed out, that introduces other
problems, so I'm not sure.
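
To make the "retries at lower levels" idea concrete, here is a minimal
sketch of a bounded retry wrapper. It is not what libxc or libxl
actually do. The names retry_transient, op_fn, fake_op_state and
fake_remove_cpu are all hypothetical, and treating -EBUSY as the
transient error is an assumption made only for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical operation type: returns 0 on success, negative errno on error. */
typedef int (*op_fn)(void *arg);

/*
 * Call op up to max_attempts times, retrying only while it reports
 * -EBUSY (assumed here to be the transient failure). Returns 0 on
 * success, otherwise the last error seen.
 */
static int retry_transient(op_fn op, void *arg, int max_attempts)
{
    int rc = -EINVAL;

    assert(max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        rc = op(arg);
        if (rc != -EBUSY)
            break;              /* success or a non-transient error */
    }
    return rc;
}

/* Stand-in for the real removal call: fails a fixed number of times. */
struct fake_op_state {
    int failures_left;          /* how many more calls return -EBUSY */
    int calls;                  /* total number of calls made */
};

static int fake_remove_cpu(void *arg)
{
    struct fake_op_state *st = arg;

    st->calls++;
    if (st->failures_left > 0) {
        st->failures_left--;
        return -EBUSY;
    }
    return 0;
}
```

With a state that fails twice, retry_transient(fake_remove_cpu, &st, 5)
succeeds on the third call. If the operation keeps failing, the caller
still gets -EBUSY back once the attempts run out, so the recovery
message printed by xl would remain necessary.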

:-/

Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)