On Wed, 2012-02-15 at 22:21 +0000, Arnd Bergmann wrote:
> On Tuesday 07 February 2012, Alexander Graf wrote:
> > On 07.02.2012, at 07:58, Michael Ellerman wrote:
> >
> > > On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote:
> > >> You're exposing a large, complex kernel subsystem that does very
> > >> low-level things with the hardware. It's a potential source of exploits
> > >> (from bugs in KVM or in hardware). I can see people wanting to be
> > >> selective with access because of that.
> > >
> > > Exactly.
> > >
> > > In a perfect world I'd agree with Anthony, but in reality I think
> > > sysadmins are quite happy that they can prevent some users from using
> > > KVM.
> > >
> > > You could presumably achieve something similar with capabilities or
> > > whatever, but a node in /dev is much simpler.
> >
> > Well, you could still keep the /dev/kvm node and then have syscalls
> > operate on the fd.
> >
> > But again, I don't see the problem with the ioctl interface. It's nice,
> > extensible and works great for us.
> >
>
> ioctl is good for hardware devices and stuff that you want to enumerate
> and/or control permissions on. For something like KVM that is really a
> core kernel service, a syscall makes much more sense.

Yeah maybe. That distinction is at least in part just historical.

The first problem I see with using a syscall is that you don't need one
syscall for KVM, you need ~90. OK so you wouldn't do that, you'd use a
multiplexed syscall like epoll_ctl() - or probably several
(vm/vcpu/etc).

Secondly you still need a handle/context for those syscalls, and I think
the most sane thing to use for that is an fd.

At that point you've basically reinvented ioctl :)

I also think it is an advantage that you have a node in /dev for
permissions. I know other "core kernel" interfaces don't use a /dev
node, but arguably that is their loss.

> I would certainly never mix the two concepts: If you use a chardev to get
> a file descriptor, use ioctl to do operations on it, and if you use a
> syscall to get the file descriptor then use other syscalls to do operations
> on it.

Sure, we use a syscall to get the fd (open) and then other syscalls to
do operations on it, ioctl and kvm_vcpu_run. ;)

But seriously, I guess that makes sense. Though it's a bit of a pity
because if you want a syscall for any of it, eg. vcpu_run(), then you
have to basically reinvent ioctl for all the other little operations.

cheers