All of lore.kernel.org
 help / color / mirror / Atom feed
From: George Dunlap <George.Dunlap@citrix.com>
To: Ian Jackson <Ian.Jackson@citrix.com>
Cc: Anthony Perard <anthony.perard@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Wei Liu <wei.liu2@citrix.com>
Subject: Re: [PATCH 8/9] libxl: Kill QEMU by uid when possible
Date: Wed, 28 Nov 2018 17:55:59 +0000	[thread overview]
Message-ID: <E08C72B6-B4AC-4C1B-B8E6-6D0D8D51E3F6@citrix.com> (raw)
In-Reply-To: <23550.51384.89698.110737@mariner.uk.xensource.com>



> On Nov 28, 2018, at 4:56 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> 
> George Dunlap writes ("[PATCH 8/9] libxl: Kill QEMU by uid when possible"):
>> The privcmd fd that a dm_restrict'ed QEMU has gives it permission to
>> one specific domain ID.  This domain ID will probably eventually be
>> used again.  It is therefore necessary to make absolutely sure that a
>> rogue QEMU process cannot hang around after its domain has exited.
> 
> Thanks.  This looks roughly right but I have some coding style
> quibbles, and I am not convinced the error handling is right.
> 
>> @@ -2382,6 +2389,15 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
>> 
>>     const char *dom_path = libxl__xs_get_dompath(gc, domid);
>> 
>> +    /* 
>> +     * If we're stating the dm with a non-root UID, save the UID so
> 
> Typo for `starting'.
> 
>> +     * that we can reliably kill it and any subprocesses
>> +     */
> ...
>> +static void kill_device_model_uid_cb(libxl__egc *egc,
>> +                                     libxl__ev_child *destroyer,
>> +                                     pid_t pid, int status);
>> +
>> void libxl__destroy_device_model(libxl__egc *egc,
>>                                  libxl__destroy_devicemodel_state *ddms)
>> {
>> @@ -2658,15 +2678,103 @@ void libxl__destroy_device_model(libxl__egc *egc,
>>     int rc;
>>     int domid = ddms->domid;
>>     char *path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, "");
>> +    const char * dm_uid_str;
>> +    uid_t dm_uid;
>> +    int reaper_pid;
>> +    int ret;
>> 
>>     if (!xs_rm(CTX->xsh, XBT_NULL, path))
>>         LOGD(ERROR, domid, "xs_rm failed for %s", path);
>> 
>> -    /* We should try to destroy the device model anyway. */
>> -    rc = kill_device_model(gc,
>> -              GCSPRINTF("/local/domain/%d/image/device-model-pid", domid));
>> +    /* 
>> +     * We should try to destroy the device model anyway.  Check to see
>> +     * if we can kill by UID
>> +     */
>> +    ret = libxl__xs_read_checked(gc, XBT_NULL,
>> +                                GCSPRINTF("/local/domain/%d/image/device-model-uid",
>> +                                           domid),
>> +                                 &dm_uid_str);
> 
> I know this function is bad in its use of `rc' for syscall return but
> please don't make it worse by introducing `ret' for what should be
> `rc'.  Would you mind adding a pre-patch to change `rc' to `r' and
> then you can use `rc' ?
> 
>> +    if (ret || !dm_uid_str) {
> 
> Shouldn't we fail if libxl__xs_read_checked fails ?
> Otherwise we risk leaving fragments of domain behind even if we fail.
> (Possibly we should carry on, but accumulate errors in rc.)

Right, I didn’t notice that read_checked filtered out ENOENT (thus a non-zero value for ret indicates a different error).

Not really sure what the best thing would be to do in that case; maybe returning an error immediately is the better option.

> 
>> +        /* No uid in xenstore; just kill the pid we have */
>> +        LOGD(DEBUG, domid, "Didn't find dm UID; destroying by pid");
>> +        
>> +        rc = kill_device_model(gc,
>> +                               GCSPRINTF("/local/domain/%d/image/device-model-pid", domid));
>> +    
>> +        libxl__qmp_cleanup(gc, domid);
>> +
>> +        ddms->callback(egc, ddms, rc);
>> +        return;
> 
> Can you please break out this little exit code fragment
> 
>  +        libxl__qmp_cleanup(gc, domid);
>  +        ddms->callback(egc, ddms, rc);
> 
> into a sub-function ?  It occurs twice and it is easier to reason
> about things if the operation has a single exit path.

Actually, I think I’m going to call libxl__qmp_cleanup() before doing the kill()[s] instead.

> 
>> +    /* QEMU has its own uid; kill all processes with that UID */
>> +    LOGD(DEBUG, domid, "Found DM uid %s, destroying by uid", dm_uid_str);
>> +    
>> +    dm_uid = atoi(dm_uid_str);
> 
> I am tempted to suggest a more robust use of strtoul here but since
> this came from our own xenstore node, fine.
> 
>> +    reaper_pid = libxl__ev_child_fork(gc, &ddms->destroyer,
>> +                                      kill_device_model_uid_cb);
>> +    if (reaper_pid < 0)
>> +        ddms->callback(egc, ddms, ERROR_FAIL);
>> +    }
>> +
> 
> And, voila!  Didn't you mean to call libxl__qmp_cleanup ?

Not necessarily — if the fork doesn’t work, the qemu process won’t be killed.  This code was written with the assumption that we wanted qemu dead before removing the socket.

But Anthony tells me that’s not the case, so I’m leaning more towards reorganizing this function to look like the following:

1. libxl__qmp_cleanup()
2. kill_by_pid(pid);
3. if (uid) kill_by_uid(uid);

That is, always kill by pid, and kill by uid if it’s available; but remove the qmp socket first.

>  You're
> missing the exit path.  And maybe the call to it should be an `out'
> block in this function.
> 
> Also, amazingly, you don't return or anything here.  This is only not
> a bug because the rest is all the child process.

Not sure the distinction between these two paragraphs.  But I do think putting an `out:` label that also makes an `assert(rc)` would make this more robust.

>> +    if (!reaper_pid) { /* child */
>> +        const char * call;
>> +
>> +        /* 
>> +         * FIXME: the second uid needs to be distinct to avoid being
>> +         * killed by a potential rogue process
>> +         */
> 
> This is a bit of a funny way of introducing this but fine.

I don’t disagree, but I thought it was better to have a warning than not. I’m open to alternate suggestions.

> 
>> +        ret = setresuid(dm_uid, dm_uid, 0);
>> +        if (ret) {
>> +            call = "setresuid";
>> +            goto badchild;
>> +        }
>> +
>> +        /* And kill everyone but me */
>> +        ret = kill(-1, 9);
>> +        if (ret) {
>> +            call = "kill";
>> +            goto badchild;
>> +        }
>> +        _exit(0);
>> +        
>> +    badchild:
>> +        if (errno > 0  && errno < 126) {
>> +            _exit(errno);
>> +        } else {
>> +            LOGED(ERROR, domid,
>> +                  "Call %s failed (with difficult errno value %d)",
>> +                  call, errno);
>> +            _exit(-1);
>> +        }
>> +    }
> 
> I don't much like these gotos either.  Maybe a subfunction is called
> for.
> 
> Furthermore, once again please see CODING_STYLE re conventional
> variable names `r' and `rc'·
> 
>> +out:
>> +    libxl__qmp_cleanup(gc, ddms->domid);
> 
>>     ddms->callback(egc, ddms, rc);
> 
> Here's that finish fragment again.
> 
>> index 899a86e84b..59eac0662a 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -1135,7 +1135,7 @@ typedef struct {
>>     const char * shim_cmdline;
>>     const char * pv_cmdline;
>> 
>> -    char * dm_runas;
>> +    char * dm_runas,  *dm_uid;
> 
> It would be nice to drop the spurious space before dm_runas while you
> are here, and then presumably the double space before *dm_uid would
> not be needed.

Well presumably it would be better to fix it back in patch 2 instead. :-)

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2018-11-28 17:56 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-23 17:14 [PATCH 1/9] libxl: Remove redundant pidpath setting George Dunlap
2018-11-23 17:14 ` [PATCH 2/9] libxl: Move dm user determination logic into a helper function George Dunlap
2018-11-28 15:20   ` Wei Liu
2018-11-28 16:33   ` Ian Jackson
2018-11-30 12:46   ` George Dunlap
2018-11-23 17:14 ` [PATCH 3/9] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN) George Dunlap
2018-11-28 15:32   ` Wei Liu
2018-11-30 12:53     ` George Dunlap
2018-11-30 16:56       ` Wei Liu
2018-11-28 16:34   ` Ian Jackson
2018-11-28 17:02     ` George Dunlap
2018-11-29 12:19       ` Ian Jackson
2018-11-23 17:14 ` [PATCH 4/9] dm_depriv: Describe expected usage of device_model_user parameter George Dunlap
2018-11-28 15:37   ` Wei Liu
2018-11-28 16:36   ` Ian Jackson
2018-11-28 17:27     ` George Dunlap
2018-11-29 12:20       ` Ian Jackson
2018-11-23 17:14 ` [PATCH 5/9] libxl: Do root checks once in libxl__domain_get_device_model_uid George Dunlap
2018-11-28 16:39   ` Ian Jackson
2018-11-28 17:38     ` George Dunlap
2018-11-29 17:09       ` Ian Jackson
2018-11-29 17:43         ` George Dunlap
2018-11-23 17:14 ` [PATCH 6/9] libxl: Move qmp cleanup into devicemodel destroy function George Dunlap
2018-11-28 16:39   ` Ian Jackson
2018-11-23 17:15 ` [PATCH 7/9] libxl: Make killing of device model asynchronous George Dunlap
2018-11-28 16:43   ` Ian Jackson
2018-11-30 16:12     ` George Dunlap
2018-11-30 16:14       ` George Dunlap
2018-11-30 16:22         ` Ian Jackson
2018-11-30 16:22       ` Ian Jackson
2018-11-23 17:15 ` [PATCH 8/9] libxl: Kill QEMU by uid when possible George Dunlap
2018-11-23 17:18   ` George Dunlap
2018-11-28 15:57     ` Anthony PERARD
2018-11-29 11:55       ` Wei Liu
2018-11-29 12:00         ` George Dunlap
2018-11-29 12:26           ` Ian Jackson
2018-11-29 12:38             ` George Dunlap
2018-11-29 17:16               ` Ian Jackson
2018-11-28 16:56   ` Ian Jackson
2018-11-28 17:55     ` George Dunlap [this message]
2018-11-29 17:01       ` Ian Jackson
2018-11-30 16:37     ` George Dunlap
2018-11-23 17:15 ` [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid George Dunlap
2018-11-28 17:02   ` Ian Jackson
2018-11-28 18:33     ` George Dunlap
2018-11-29 11:39       ` George Dunlap
2018-11-29 17:15       ` Ian Jackson
2018-11-28 15:20 ` [PATCH 1/9] libxl: Remove redundant pidpath setting Wei Liu
2018-11-28 16:25 ` Ian Jackson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E08C72B6-B4AC-4C1B-B8E6-6D0D8D51E3F6@citrix.com \
    --to=george.dunlap@citrix.com \
    --cc=Ian.Jackson@citrix.com \
    --cc=anthony.perard@citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.