From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marek =?utf-8?Q?Marczykowski-G=C3=B3recki?= Subject: Re: [PATCH] libxl: trigger attach events for devices attached before xl devd startup Date: Mon, 11 Jul 2016 11:49:19 +0200 Message-ID: <20160711094919.GQ19103@mail-itl> References: <1468172147-8702-1-git-send-email-marmarek@invisiblethingslab.com> <20160711083117.r675kyztqrz5elob@mac> <20160711085604.GA31462@mail-itl> <20160711094318.4rq7fs4fzqeqn2tn@mac> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8022195340913022176==" Return-path: In-Reply-To: <20160711094318.4rq7fs4fzqeqn2tn@mac> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Roger Pau =?utf-8?B?TW9ubsOp?= Cc: Ian Jackson , Wei Liu , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org --===============8022195340913022176== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="K3Y3NTg/qyuIFs24" Content-Disposition: inline --K3Y3NTg/qyuIFs24 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jul 11, 2016 at 11:43:18AM +0200, Roger Pau Monn=C3=A9 wrote: > On Mon, Jul 11, 2016 at 10:56:04AM +0200, Marek Marczykowski-G=C3=B3recki= wrote: > > On Mon, Jul 11, 2016 at 10:31:17AM +0200, Roger Pau Monn=C3=A9 wrote: > > > On Sun, Jul 10, 2016 at 07:35:47PM +0200, Marek Marczykowski-G=C3=B3r= ecki wrote: > > > > When this daemon is started after creating backend device, that dev= ice > > > > will not be configured. > > > >=20 > > > > Racy situation: > > > > 1. driver domain is started > > > > 2. frontend domain is started (just after kicking driver domain off) > > > > 3. device in frontend domain is connected to the backend (as specif= ied > > > > in frontend domain configuration) > > > > 4. xl devd is started in driver domain > > > >=20 > > > > End result is that backend device in driver domain is not configured > > > > (like network interface is not enabled), so the device doesn't work. > > > >=20 > > > > Fix this by artifically triggering events for devices already prese= nt in > > > > xenstore before xl devd is started. Do this only after xenstore wat= ch is > > > > already registered, and only for devices not already initialized (in > > > > XenbusStateInitWait state). > > >=20 > > > Thanks! > > >=20 > > > > Cc: Ian Jackson > > > > Cc: Wei Liu > > > > Signed-off-by: Marek Marczykowski-G=C3=B3recki > > > > --- > > > > tools/libxl/libxl.c | 40 ++++++++++++++++++++++++++++++++++++++++ > > > > 1 file changed, 40 insertions(+) > > > >=20 > > > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > > > > index 1c81239..99815a7 100644 > > > > --- a/tools/libxl/libxl.c > > > > +++ b/tools/libxl/libxl.c > > > > @@ -4743,8 +4743,16 @@ int libxl_device_events_handler(libxl_ctx *c= tx, > > > > uint32_t domid; > > > > libxl__ddomain ddomain; > > > > char *be_path; > > > > + char **kinds =3D NULL, **domains =3D NULL, **devs =3D NULL; > > > > + const char *sstate; > > > > + char *state_path; > > > > + int state; > > > > + unsigned int nkinds, ndomains, ndevs; > > > > + int i, j, k; > > > > + xs_transaction_t t; > > > > =20 > > > > ddomain.ao =3D ao; > > > > + FILLZERO(ddomain.watch); > > >=20 > > > Is this a different bugfix or stray change? > >=20 > > To cleanly unregister watch and not do nothing if wasn't registered at > > all. If it isn't initialized, libxl__ev_xswatch_deregister call on > > not registered watch isn't harmless. >=20 > Right, I've realized that before your changes the function only registere= d=20 > the watch and exited, this is needed now. > =20 > > > > LIBXL_SLIST_INIT(&ddomain.guests); > > > > =20 > > > > rc =3D libxl__get_domid(gc, &domid); > > > > @@ -4762,9 +4770,41 @@ int libxl_device_events_handler(libxl_ctx *c= tx, > > > > be_path); > > > > if (rc) goto out; > > > > =20 > > > > + rc =3D libxl__xs_transaction_start(gc, &t); > > > > + if (rc) goto out; > > >=20 > > > Why do you need to start a transaction here if you end up aborting it= when=20 > > > finished? > >=20 > > Mostly to ease error checking. Because below code does three level > > listing, I don't want to deal with races where some entry was removed > > between those calls, at least not here. Like this: > >=20 > > xs_directory('backend/vif') -> 3, 4, 5 > > xs_directory('backend/vif/3') -> 0, 1 > > xs_read('backend/vif/3/0/state') -> ... > > xs_read('backend/vif/3/1/state') -> ... > > toolstack removes backend/vif/4 here > > xs_directory('backend/vif/4') -> fail > >=20 > > Of course backend_watch_callback would fail anyway in such a case, which > > is ok. But having snapshot of xenstore during this multi-level listing > > looks like avoiding some corner cases during listing itself. >=20 > AFAICT your code seems to be prepared to deal with entries disappearing i= n=20 > the middle of the tree walk, so I would just remove the transaction. Actually I'm considering changing error handling below to "goto out" instead of "continue", as race condition should be eliminated by the transaction, other errors (permission denied for example) maybe should be considered fatal. So, IMO there two options: - ignore failed reads and remove transaction - error on failed reads and keep transaction Which one would be better? > > > > + kinds =3D libxl__xs_directory(gc, t, be_path, &nkinds); > > > > + if (kinds) { > > > > + for (i =3D 0; i < nkinds; i++) { > > > > + domains =3D libxl__xs_directory(gc, t, > > > > + GCSPRINTF("%s/%s", be_path, kinds[i]), &ndomai= ns); > > > > + if (!domains) > > > > + continue; > > > > + for (j =3D 0; j < ndomains; j++) { > > > > + devs =3D libxl__xs_directory(gc, t, > > > > + GCSPRINTF("%s/%s/%s", be_path, kinds[i], d= omains[j]), &ndevs); > > > > + if (!devs) > > > > + continue; > > > > + for (k =3D 0; k < ndevs; k++) { > > > > + state_path =3D GCSPRINTF("%s/%s/%s/%s/state", > > > > + be_path, kinds[i], domains[j], devs[k]= ); > > > > + rc =3D libxl__xs_read_checked(gc, t, state_pat= h, &sstate); > > > > + if (rc) > > > > + continue; > > > > + state =3D atoi(sstate); > > > > + if (state =3D=3D XenbusStateInitWait) > > > > + backend_watch_callback(egc, &ddomain.watch, > > > > + be_path, state_path); > > > > + } > > > > + } > > > > + } > > > > + } > > > > + > > > > + libxl__xs_transaction_abort(gc, &t); > > > > + > > > > return AO_INPROGRESS; > > > > =20 > > > > out: > > > > + libxl__ev_xswatch_deregister(gc, &ddomain.watch); > > >=20 > > > This seems to be part of a different bugfix also. > >=20 > > No, this code previously wasn't reachable if xswatch was correctly > > registered. >=20 > Right, thanks. >=20 > Roger. --=20 Best Regards, Marek Marczykowski-G=C3=B3recki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? --K3Y3NTg/qyuIFs24 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBCAAGBQJXg2uhAAoJENuP0xzK19csXUEH/3uAd/SZ4wNHxh43sYvx8d5y Hvf31WmFkom9kQszSGYtZ2dKy4xBsAj9+ov4foJOw8JH5d2a2q9EFe+LRgS8CY9Q rWrsHSWgv79VCKY/zb0s1DAp+/US3qsnkBeeTEp6C5USrJWziEQ2TyFJj1cZsadz sgh5eUA+LYQlbDTkrWelBGMyYZi/AuPSUMgjmxPhGITwMdVToT2uz4q2gDQRIyLa Nwpz2Hami6NPNf8SMK4cmjt5YfovgaWvEN2s3VoSn7hnqdPJTNOZVDAvQXrx21iY ulm4Eg2xJoB88Uh/WcgXGEvH98yKQ1R/ZBlFkO17Zi39NUQe75o7c2ZrrnMS3r4= =nWtz -----END PGP SIGNATURE----- --K3Y3NTg/qyuIFs24-- --===============8022195340913022176== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============8022195340913022176==--