Re: [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans

From: George Dunlap <george.dunlap@citrix.com>
To: Ian Jackson <ian.jackson@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	Ross Lagerwall <ross.lagerwall@citrix.com>,
	Julien Grall <julien.grall@arm.com>,
	Jan Beulich <jbeulich@suse.com>,
	Anthony Perard <anthony.perard@citrix.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans
Date: Tue, 27 Mar 2018 15:24:33 +0100	[thread overview]
Message-ID: <df2b93b0-eae0-88f0-a13e-90f4b7600271@citrix.com> (raw)
In-Reply-To: <1a845c55-df25-1754-3e14-bdc47e658da8@citrix.com>

On 03/27/2018 03:15 PM, George Dunlap wrote:
> On 03/27/2018 02:33 PM, Ian Jackson wrote:
>>>>> +### Namespaces for unused functionality
>> ...
>>>>> +    unshare(CLONE_NEWNS | CLONE_NEWIPC)
>>>>
>>>> This would mean we would have to pass qemu fds for both the network
>>>> tap devices and any vnc consoles.  That makes life considerably more
>>>> complicated.  I think we should perhaps revisit this upstream.
>>>
>>> You haven't read this very carefully (nor do you seem to have read the
>>> discussion with Ross before responding).  I split the "Namespaces"
>>> restriction Ross suggested internally to us into two parts:
>>>  - Namespace restrictions for unused functionality
>>>  - Network namespacing
>>
>> Oh, sorry.  Yes.
> 
> Actually, sorry for being a bit 'pointy' here; upon reflection it
> occured to me that you probably actually did read the whole thread, but
> when you went back to respond you meant to respond to the other section,
> but stopped scanning when you got a match with "namespace".
> 
>>> I now realize that it might be that the limit will follow the current
>>> uid of the process, in which case "setrlimit -> setuid" might have the
>>> expected behavior.  But a quick Google search shows that the interaction
>>> of RLIMIT_NPROC and setuid() is tricky[1][2], and may vary from
>>> implementation to implementation; relying on the interaction to be
>>> correct (and stay correct) seems somewhat risky (unless POSIX has
>>> explicitly documented what should happen in that case, which again a
>>> quick Google search hasn't turned up).
>>
>> RLIMIT_NPROC is not in SuS:
>>  http://pubs.opengroup.org/onlinepubs/9699919799/functions/setrlimit.html
>>
>> FreeBSD does not have any of this madness:
>>  https://www.unix.com/man-page/freebsd/2/setuid/
>>
>> Anyway IMO we should set RLIMIT_NPROC after fork and before execve.
>> If this causes setuid to fail in qemu, qemu will crash.  But this
>> could only be the case if the new uid already has other processes,
>> which it shouldn't do.  (If it does, the new uid is probably
>> contaminated by a previous domain with the same domid.)
> 
> I was more worried about the limit not having the expected effect after
> the setuid().
> 
>>>>> +### libxl UID cleanup
>>>> ...
>>>>> +kill(-1,sig) sends a signal to "every process to which the calling
>>>>> +process has permission to send a signal".  So in theory:
>>>>> +  setuid(X)
>>>>> +  kill(-1,KILL)
>>>>> +should do the trick.
>>>>
>>>> We need to check whether a malicious qemu process could kill this
>>>> one.
>>>
>>> Hmm, in theory it probably could.
>>
>> I could find nothing in SuS explaining when process A may send signals
>> to process B.  So I resorted to the BSD manpages:
>>
>>   https://www.unix.com/man-page/freebsd/2/kill/
>>
>>   For a process to have permission to send a signal to a process
>>   designated by pid, the user must be the super-user, or the real or
>>   saved user ID of the receiving process must match the real or
>>   effective user ID of the sending process.
> 
> The text of both the FreeBSD and Linux man pages looks to be copied
> verbatim from [1].
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html
> 
>> Also kill(-1,) does not signal the sender.
> 
> Linux is the same.
> 
>> My analysis:
>>
>>                                           euid   ruid   suid
>>
>>   compared for receiving process                 ====   ====
>>   compared for sending process            ====   ====
>>
>>   possible rogue processes                qid    qid    qid
>>
>>   unrelated rootish processes             0      any    any
>>      or                                   any    0      any
>>      or                                   any    any    0
>>
>>   killer (our pet issuing kill)
>>      so rogues can't kill killer                 !=qid  !=qid
>>      so it doesn't kill randomly          !=0    !=0
>>      therefore:                           qid    pet    0
>>
>> We will have to reserve one uid `pet' specially for this nonsense.  We
>> call setresuid(2) to switch uids, and then kill(-1,9) and _exit(0).
>> The kill() call will kill other processes with uid `pet'.  So we take
>> a global lock while we do this.
>>
>> `pet' could be a uid associated with a reserved domid, eg dom0.

The alternate would be to have yet another UID range, to that we could
have a "target ID" (i.e., QEMU) and a "reaper ID" for each domain.  I
think that should mean any races should be benign.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel