linux-kernel.vger.kernel.org archive mirror
* Which of the virtualization approaches is more suitable for kernel?
@ 2006-02-20 15:45 Kirill Korotaev
  2006-02-20 16:12 ` Herbert Poetzl
  2006-02-24 21:44 ` Eric W. Biederman
  0 siblings, 2 replies; 27+ messages in thread
From: Kirill Korotaev @ 2006-02-20 15:45 UTC
  To: Linus Torvalds, Rik van Riel, Linux Kernel Mailing List, devel,
	Eric W. Biederman, Andrey Savochkin, Alexey Kuznetsov,
	Stanislav Protassov, serue, frankeh, clg, haveblue, mrmacman_g4,
	alan, Herbert Poetzl, Andrew Morton

Linus, Andrew,

We need your help in deciding which virtualization approach you would
accept into mainline (if any) and which direction we should take.

If we set aside VPID virtualization, which caused many disputes, we
actually have one virtualization solution but two approaches to it. Which
one goes in depends on the goals and, in any case, on your approval.

So what are the approaches?

1. namespaces for all types of resources (Eric W. Biederman)

The proposed solution is similar to fs namespaces. This approach
introduces namespaces for IPC, IPv4 networking, IPv6 networking, pids,
etc. It also adds to task_struct a set of pointers to the namespaces the
task belongs to, e.g. current->ipv4_ns, current->ipc_ns, etc.
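
To make the idea concrete, here is a minimal user-space sketch of the
per-task namespace pointers described above. It is not taken from Eric's
patches: struct task stands in for task_struct, and the namespace struct
contents and helper names (task_inherit, task_unshare_ipc) are assumptions
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-resource namespace objects; the fields are illustrative. */
struct ipc_namespace  { int shm_ids; };
struct ipv4_namespace { int route_count; };

/* Stand-in for task_struct: one pointer per namespace type. */
struct task {
	struct ipc_namespace  *ipc_ns;
	struct ipv4_namespace *ipv4_ns;
};

/* A new task starts out sharing every namespace with its parent. */
static void task_inherit(struct task *child, const struct task *parent)
{
	*child = *parent;
}

/* Fine-grained: only the IPC namespace changes, networking stays shared. */
static void task_unshare_ipc(struct task *t, struct ipc_namespace *fresh)
{
	t->ipc_ns = fresh;
}
```

After task_inherit() followed by task_unshare_ipc(), the child sees its own
IPC objects while still sharing the parent's IPv4 namespace, which is the
fine-grained property listed under the benefits below.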

Benefits:
- fine-grained namespaces
- task_struct points directly to a structure describing the namespace
and its data (not to a container)
- maybe a slightly more logical/clean implementation, since no effective
container is involved, unlike the second approach.

Disadvantages:
- it is only proof-of-concept code right now, nothing more.
- such an approach requires adding an extra argument to many
functions (e.g. Eric's patch for networking is 1.5 times bigger than
OpenVZ's). It also adds code for getting the namespace from objects, e.g.
skb->sk->namespace.
- so it increases stack usage
- it can't compile down efficiently to the same non-virtualized kernel,
which can be undesirable for embedded Linux.
- fine-grained namespaces are actually an obfuscation, since kernel
subsystems are tightly interconnected (e.g. network -> sysctl -> proc,
mqueues -> netlink, ipc -> fs) and most often can be used only as a whole
container.
- it involves a somewhat more complicated container create/enter
procedure, which requires exec or something similar, since there is
no effective container that could simply be switched to.
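
The argument-passing point can be illustrated with a small user-space
sketch. The struct layouts and function names (count_rx, skb_ns,
netif_rx_sketch) are hypothetical and are not taken from either patch set;
the field is called ns here rather than namespace only to keep the example
neutral.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel objects mentioned above. */
struct net_namespace { int rx_packets; };
struct sock          { struct net_namespace *ns; };
struct sk_buff       { struct sock *sk; };

/* The namespace travels as an explicit extra argument. */
static void count_rx(struct net_namespace *ns)
{
	ns->rx_packets++;
}

/* Extra code needed to recover the namespace from an object. */
static struct net_namespace *skb_ns(const struct sk_buff *skb)
{
	return skb->sk->ns;
}

/* Every caller on the path has to look up and forward the namespace. */
static void netif_rx_sketch(struct sk_buff *skb)
{
	count_rx(skb_ns(skb));
}
```

Each function on the receive path either takes the namespace as a
parameter or derives it from the object it already holds, which is where
the larger patch size and the extra stack usage come from.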

2. containers (OpenVZ.org/linux-vserver.org)

The container solution was discussed before. It is actually also a
namespace solution, but with one all-encompassing namespace described by a
single kernel structure.
Every task has two container pointers: container and effective
container. The latter is used to temporarily switch to other contexts,
e.g. when handling IRQs, TCP/IP, etc.
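
The two-pointer scheme can be sketched in user space roughly as follows.
This is not OpenVZ code: struct task stands in for task_struct, and
set_econtainer() and handle_irq_for() are hypothetical names chosen for
the example.

```c
#include <assert.h>
#include <stddef.h>

/* One structure describes the whole container. */
struct container {
	int id;
	int irq_count;
};

/* Stand-in for task_struct with the two pointers described above. */
struct task {
	struct container *container;   /* where the task lives */
	struct container *econtainer;  /* context it currently acts in */
};

/* Switch the effective container and return the previous one. */
static struct container *set_econtainer(struct task *t, struct container *c)
{
	struct container *old = t->econtainer;

	t->econtainer = c;
	return old;
}

/* Work done on behalf of another container, e.g. from an interrupt. */
static void handle_irq_for(struct task *t, struct container *owner)
{
	struct container *saved = set_econtainer(t, owner);

	t->econtainer->irq_count++;   /* accounted to the owner */
	set_econtainer(t, saved);     /* restore the interrupted context */
}
```

The task's own container pointer never changes during the switch; only the
effective one is borrowed and then restored.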

Benefits:
- a clear, logically bounded container; it is clear when a container is
alive and when it is not.
- it doesn't introduce additional args for most functions, so no
additional stack usage.
- it compiles to the good old kernel when virtualization is off, so it
doesn't disturb other configurations.
- Eric brought up an interesting idea about introducing an interface like
DEFINE_CPU_VAR(), which could potentially allow creating virtualized
variables automagically and accessing them via econtainer().
- mature working code exists that has been used in production for years,
so a first working version can be done much quicker
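
As a rough illustration of the "compiles to the old kernel" and
virtualized-variable points, here is a user-space sketch. The macro
ve_var(), the switch CONFIG_CONTAINERS_SKETCH and the function
shm_alloc_id() are all invented for this example; only econtainer() and
shm_ids are names that appear in this message, and their real definitions
may differ.

```c
#include <assert.h>

/* Set to 0 to model a build with virtualization compiled out. */
#ifndef CONFIG_CONTAINERS_SKETCH
#define CONFIG_CONTAINERS_SKETCH 1
#endif

struct container { int shm_ids; };

static struct container host_container;
static struct container *effective = &host_container;

/* Accessor for the current effective container. */
static struct container *econtainer(void)
{
	return effective;
}

#if CONFIG_CONTAINERS_SKETCH
/* Virtualized: the variable lives inside the effective container. */
#define ve_var(name) (econtainer()->name)
#else
/* Not virtualized: the same source line hits a plain global. */
static int shm_ids;
#define ve_var(name) (name)
#endif

/* Code using the variable is identical in both configurations. */
static int shm_alloc_id(void)
{
	return ve_var(shm_ids)++;
}
```

With the switch off, shm_alloc_id() touches an ordinary global and carries
no container overhead; with it on, the cost is the one extra dereference
through econtainer() listed under the disadvantages below.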

Disadvantages:
- one additional pointer dereference when accessing virtualized
resources, e.g. current->econtainer->shm_ids

Kirill



end of thread, other threads:[~2006-03-10 18:59 UTC | newest]

Thread overview: 27+ messages
2006-02-20 15:45 Which of the virtualization approaches is more suitable for kernel? Kirill Korotaev
2006-02-20 16:12 ` Herbert Poetzl
2006-02-21 16:00   ` Kirill Korotaev
2006-02-21 20:33     ` Sam Vilain
2006-02-21 23:50     ` Herbert Poetzl
2006-02-22 10:09       ` [Devel] " Kir Kolyshkin
2006-02-22 15:26         ` Eric W. Biederman
2006-02-23 12:02           ` Kir Kolyshkin
2006-02-23 13:25             ` Eric W. Biederman
2006-02-23 14:00               ` Kir Kolyshkin
2006-02-24 21:44 ` Eric W. Biederman
2006-02-24 23:01   ` Herbert Poetzl
2006-02-27 17:42   ` Dave Hansen
2006-02-27 21:14     ` Eric W. Biederman
2006-02-27 21:35       ` Dave Hansen
2006-02-27 21:56         ` Eric W. Biederman
2006-03-04  3:17       ` sysctls inside containers Dave Hansen
2006-03-04 10:27         ` Eric W. Biederman
2006-03-06 16:27           ` Dave Hansen
2006-03-06 17:08             ` Herbert Poetzl
2006-03-06 17:18               ` Dave Hansen
2006-03-06 18:56             ` Eric W. Biederman
2006-03-10 10:17         ` Kirill Korotaev
2006-03-10 13:22           ` Eric W. Biederman
2006-03-10 10:19         ` Kirill Korotaev
2006-03-10 11:55           ` Eric W. Biederman
2006-03-10 18:58           ` Dave Hansen
