* [XEN] using shmfs for swapspace
@ 2005-01-02 16:26 Luke Kenneth Casson Leighton
  2005-01-03 18:31 ` Joseph Fannin
  0 siblings, 1 reply; 14+ messages in thread
From: Luke Kenneth Casson Leighton @ 2005-01-02 16:26 UTC (permalink / raw)
  To: linux-kernel, xen-devel

hi,

am starting to play with XEN - the virtualisation project
(http://xen.sf.net).

i'll give some background first of all and then the question - at the
bottom - will make sense [when posting to lkml i often get questions
asked that are answered by the background material i also provide...
*sigh*]


each virtual machine requires (typically) its own physical ram (a chunk
of the host's real memory) and some virtual memory - swapspace.  xen
uses 32mb for its shm guest OS inter-communication.

so, in the case i'm setting up, that's 5 virtual machines (only one of
which can get away with having just 32mb of ram; the rest require 64mb),
each needing its own 256mbyte swap file - five of them in all.

the memory usage is the major concern: i only have 256mb of ram, and
you've probably by now added up that the above comes to 320mbytes
(32mb for xen's shm plus 32mb + 4x64mb for the guests).

so i started looking at ways to minimise the memory usage.

first, reducing each machine to only having 32mb of ram, and secondly,
on the host, creating a MASSIVE swap file (1gbyte), making a MASSIVE
shmfs/tmpfs partition (1gbyte) and then creating swap files in the
tmpfs partition!!!

the reasoning behind doing this is quite straightforward: by placing the
swapfiles in a tmpfs, when one of the guest OSes requires some memory,
RAM on the host OS will presumably be used, until the amount of RAM
requested exceeds the host OS's physical memory, at which point it
spills over into the host's swap-space.

this is presumed to be infinitely better than forcing the swapspace to
be always on disk, especially with the guests only being allocated
32mbyte of physical RAM.

here's the problems:

1) tmpfs doesn't support sparse files

2) files created in tmpfs don't support block devices (???)

3) as a workaround i have to create a swap area in a 256mb file
   (dd if=/dev/zero of=/mnt/swapfile bs=1M count=256, then mkswap on it)
   and copy the ENTIRE file into the tmpfs-mounted partition.

   on every boot-up.

   per swapfile needed.
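
   in script form, the whole per-boot dance looks roughly like this
   (paths and sizes purely illustrative, and untested):

      #!/bin/sh
      # one-off, on disk: build the 256mb swap image
      #   dd if=/dev/zero of=/var/lib/xen/swap.img bs=1M count=256
      #   mkswap /var/lib/xen/swap.img

      # on every boot: mount the tmpfs and copy one image in per guest
      mount -t tmpfs -o size=1536m tmpfs /mnt/tmpswap  # room for 5 x 256mb
      for guest in guest1 guest2 guest3 guest4 guest5; do
          # the full 256mb gets copied each time - no sparse files on tmpfs
          cp /var/lib/xen/swap.img /mnt/tmpswap/$guest-swap
      done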

eeeuw, yuk.

so, my question is a strategic one:

	* in what other ways could the same results be achieved?

	in other words, what other ways can i publish block
	devices from the master OS (and they must be block
	devices for XEN guest OSes to be able to see them)
	that can be used as swap space, that will be in RAM if possible,
	bearing in mind that they can be recreated at boot time,
	i.e. they don't need to be persistent.

ta,

l.


-- 
--
http://lkcl.net
--


* Re: [XEN] using shmfs for swapspace
  2005-01-02 16:26 [XEN] using shmfs for swapspace Luke Kenneth Casson Leighton
@ 2005-01-03 18:31 ` Joseph Fannin
  2005-01-03 20:53   ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Fannin @ 2005-01-03 18:31 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: linux-kernel, xen-devel

On Sun, Jan 02, 2005 at 04:26:52PM +0000, Luke Kenneth Casson Leighton wrote:
[...] 
> this is presumed to be infinitely better than forcing the swapspace to
> be always on disk, especially with the guests only being allocated
> 32mbyte of physical RAM.

    I'd be interested in knowing how a tmpfs that's gone far into swap
performs compared to a more normal on-disk fs.  I don't know if anyone
has ever looked into it.  Is it comparable, or is tmpfs's ability to
swap more a last-resort escape hatch?

    This is the part where I would add something valuable to this
conversation, if I were going to do that. (But no.)
-- 
Joseph Fannin
jhf@rivenstone.net  


* Re: [XEN] using shmfs for swapspace
  2005-01-03 18:31 ` Joseph Fannin
@ 2005-01-03 20:53   ` Luke Kenneth Casson Leighton
  2005-01-03 21:06     ` Alan Cox
  2005-01-03 21:07     ` Adam Heath
  0 siblings, 2 replies; 14+ messages in thread
From: Luke Kenneth Casson Leighton @ 2005-01-03 20:53 UTC (permalink / raw)
  To: linux-kernel, xen-devel

On Mon, Jan 03, 2005 at 01:31:34PM -0500, Joseph Fannin wrote:
> On Sun, Jan 02, 2005 at 04:26:52PM +0000, Luke Kenneth Casson Leighton wrote:
> [...] 
> > this is presumed to be infinitely better than forcing the swapspace to
> > be always on disk, especially with the guests only being allocated
> > 32mbyte of physical RAM.
> 
>     I'd be interested in knowing how a tmpfs that's gone far into swap
> performs compared to a more normal on-disk fs.  I don't know if anyone
> has ever looked into it.  Is it comparable, or is tmpfs's ability to
> swap more a last-resort escape hatch?
> 
>     This is the part where I would add something valuable to this
> conversation, if I were going to do that. (But no.)

 :)

 okay.
 
 some kind person from ibm pointed out that if you use a file-backed
 swap device (in xen terminology,
 disk=['file:/xen/guest1-swapfile,/dev/sda2,rw'], which means "publish
 guest1-swapfile on the DOM0 VM as the /dev/sda2 hard drive on the
 guest1 VM"), then you end up going through the linux filesystem cache
 on DOM0, which is of course RAM-based.

 so this tends to suggest a strategy where you allocate as
 much memory as you can afford to the DOM0 VM, and as little
 as you can afford to the guests, and make the guest swap
 files bigger to compensate.
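
 so a guest config ends up looking something like this (the disk= line
 is the one quoted above; the other field names are as per the stock xen
 example config, and the values are illustrative only):

     # small RAM reservation for the guest; its swap lives in a file on
     # dom0, so it gets served out of dom0's (RAM-backed) filesystem cache
     memory = 32
     name   = "guest1"
     disk   = [ 'file:/xen/guest1-root,/dev/sda1,rw',
                'file:/xen/guest1-swapfile,/dev/sda2,rw' ]
     root   = "/dev/sda1 ro"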

 ... and i thought it was going to need some wacky wacko non-sharing
 shared-memory virtual-memory pseudo-tmpfs block-based filesystem
 driver.  dang.

 l.



* Re: [XEN] using shmfs for swapspace
  2005-01-03 20:53   ` Luke Kenneth Casson Leighton
@ 2005-01-03 21:06     ` Alan Cox
  2005-01-04  3:04       ` [Xen-devel] " Mark Williamson
  2005-01-03 21:07     ` Adam Heath
  1 sibling, 1 reply; 14+ messages in thread
From: Alan Cox @ 2005-01-03 21:06 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Linux Kernel Mailing List, xen-devel


>  so this tends to suggest a strategy where you allocate as
>  much memory as you can afford to the DOM0 VM, and as little
>  as you can afford to the guests, and make the guest swap
>  files bigger to compensate.

This is essentially what the mainframe folks are already doing, and have
been doing for some time, because the kernel VM has no external inputs
for saying "you are virtualised so be nice", for doing opportunistic
page recycling ("I don't need this page but when I ask for it back
please tell me if you trashed the content"), and for hinting to the
underlying VM which pages are best blasted out of existence first and
how to communicate so we don't page them back in by scanning them.




* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-03 20:53   ` Luke Kenneth Casson Leighton
  2005-01-03 21:06     ` Alan Cox
@ 2005-01-03 21:07     ` Adam Heath
  2005-01-04  9:30       ` Luke Kenneth Casson Leighton
  1 sibling, 1 reply; 14+ messages in thread
From: Adam Heath @ 2005-01-03 21:07 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: linux-kernel, xen-devel

On Mon, 3 Jan 2005, Luke Kenneth Casson Leighton wrote:

> On Mon, Jan 03, 2005 at 01:31:34PM -0500, Joseph Fannin wrote:
> > On Sun, Jan 02, 2005 at 04:26:52PM +0000, Luke Kenneth Casson Leighton wrote:
> > [...]
> > > this is presumed to be infinitely better than forcing the swapspace to
> > > be always on disk, especially with the guests only being allocated
> > > 32mbyte of physical RAM.
> >
> >     I'd be interested in knowing how a tmpfs that's gone far into swap
> > performs compared to a more normal on-disk fs.  I don't know if anyone
> > has ever looked into it.  Is it comparable, or is tmpfs's ability to
> > swap more a last-resort escape hatch?
> >
> >     This is the part where I would add something valuable to this
> > conversation, if I were going to do that. (But no.)
>
>  :)
>
>  okay.
>
>  some kind person from ibm pointed out that if you use a file-backed
>  swap device (in xen terminology,
>  disk=['file:/xen/guest1-swapfile,/dev/sda2,rw'], which means "publish
>  guest1-swapfile on the DOM0 VM as the /dev/sda2 hard drive on the
>  guest1 VM"), then you end up going through the linux filesystem cache
>  on DOM0, which is of course RAM-based.
>
>  so this tends to suggest a strategy where you allocate as
>  much memory as you can afford to the DOM0 VM, and as little
>  as you can afford to the guests, and make the guest swap
>  files bigger to compensate.

But the guest kernels need real ram to run programs in.

The problem with dom0 doing the caching is that dom0 has no idea about the
usage pattern for the swap.  It's just a plain file to dom0.  Only each guest
kernel knows how to combine swap reads/writes correctly.


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-03 21:06     ` Alan Cox
@ 2005-01-04  3:04       ` Mark Williamson
  2005-01-04 14:05         ` Rik van Riel
  2005-01-05  0:11         ` Arnd Bergmann
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Williamson @ 2005-01-04  3:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Alan Cox, Luke Kenneth Casson Leighton, Linux Kernel Mailing List

> for doing opportunistic page recycling ("I don't need this page but when
> I ask for it back please tell me if you trashed the content")

We've talked about doing this but AFAIK nobody has gotten round to it yet 
because there hasn't been a pressing need (IIRC, it was on the todo list when 
Xen 1.0 came out).

IMHO, it doesn't look terribly difficult but would require (hopefully small) 
modifications to the architecture independent code, plus a little bit of 
support code in Xen.

I'd quite like to look at this one fine day but I suspect there are more 
useful things I should do first...

Cheers,
Mark


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-03 21:07     ` Adam Heath
@ 2005-01-04  9:30       ` Luke Kenneth Casson Leighton
  2005-01-04 14:06         ` Rik van Riel
  0 siblings, 1 reply; 14+ messages in thread
From: Luke Kenneth Casson Leighton @ 2005-01-04  9:30 UTC (permalink / raw)
  To: Adam Heath; +Cc: linux-kernel, xen-devel

On Mon, Jan 03, 2005 at 03:07:42PM -0600, Adam Heath wrote:

> >  so this tends to suggest a strategy where you allocate as
> >  much memory as you can afford to the DOM0 VM, and as little
> >  as you can afford to the guests, and make the guest swap
> >  files bigger to compensate.
> 
> But the guest kernels need real ram to run programs in.
> 
> The problem with dom0 doing the caching is that dom0 has no idea about the
> usage pattern for the swap.  It's just a plain file to dom0.  Only each guest
> kernel knows how to combine swap reads/writes correctly.

... hmm...

then that tends to suggest that this is an issue that should
really be dealt with by XEN.

that there needs to be coordination of swap management between the
virtual machines.

l.

-- 
--
http://lkcl.net
--


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-04  3:04       ` [Xen-devel] " Mark Williamson
@ 2005-01-04 14:05         ` Rik van Riel
  2005-01-06 11:38           ` Luke Kenneth Casson Leighton
  2005-01-05  0:11         ` Arnd Bergmann
  1 sibling, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2005-01-04 14:05 UTC (permalink / raw)
  To: Mark Williamson
  Cc: xen-devel, Alan Cox, Luke Kenneth Casson Leighton,
	Linux Kernel Mailing List

On Tue, 4 Jan 2005, Mark Williamson wrote:

>> for doing opportunistic page recycling ("I don't need this page but when
>> I ask for it back please tell me if you trashed the content")
>
> We've talked about doing this but AFAIK nobody has gotten round to it 
> yet because there hasn't been a pressing need (IIRC, it was on the todo 
> list when Xen 1.0 came out).
>
> IMHO, it doesn't look terribly difficult but would require (hopefully 
> small) modifications to the architecture independent code, plus a little 
> bit of support code in Xen.

The architecture independent changes are fine, since
they're also useful for S390(x), PPC64 and UML...

> I'd quite like to look at this one fine day but I suspect there are more 
> useful things I should do first...

I wonder if the same effect could be achieved by just
measuring the VM pressure inside the guests and
ballooning the guests as required, letting them grow
and shrink with their workloads.

That wouldn't need many kernel changes, maybe just a
few extra statistics, or maybe all the needed stats
already exist.  It would also allow more complex
policy to be done in userspace, eg. dealing with Xen
guests of different priority...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-04  9:30       ` Luke Kenneth Casson Leighton
@ 2005-01-04 14:06         ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2005-01-04 14:06 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Adam Heath, linux-kernel, xen-devel@lists.sf.net

On Tue, 4 Jan 2005, Luke Kenneth Casson Leighton wrote:

> then that tends to suggest that this is an issue that should
> really be dealt with by XEN.

Probably.

> that there needs to be coordination of swap management between the
> virtual machines.

I'd like to see the maximum security separation possible
between the unprivileged guests, though...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-04  3:04       ` [Xen-devel] " Mark Williamson
  2005-01-04 14:05         ` Rik van Riel
@ 2005-01-05  0:11         ` Arnd Bergmann
  2005-01-21 21:37           ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2005-01-05  0:11 UTC (permalink / raw)
  To: Mark Williamson
  Cc: xen-devel, Alan Cox, Luke Kenneth Casson Leighton,
	Linux Kernel Mailing List


On Tuesday 04 January 2005 04:04, Mark Williamson wrote:
> > for doing opportunistic page recycling ("I don't need this page but when
> > I ask for it back please tell me if you trashed the content")
> 
> We've talked about doing this but AFAIK nobody has gotten round to it yet 
> because there hasn't been a pressing need (IIRC, it was on the todo list when 
> Xen 1.0 came out).
> 
> IMHO, it doesn't look terribly difficult but would require (hopefully small) 
> modifications to the architecture independent code, plus a little bit of 
> support code in Xen.
> 
> I'd quite like to look at this one fine day but I suspect there are more 
> useful things I should do first...

There are two other alternatives that are already used on s390 for making
multi-level paging a little more pleasant:

- Pseudo faults: When Linux accesses a page that it believes to be present
  but is actually swapped out in z/VM, the VM hypervisor causes a special
  PFAULT exception. Linux can then choose either to ignore this exception
  and continue, which will force VM to swap the page back in, or to do a
  task switch and wait for the page to come back. At the point where VM
  has read the page back from its swap device, it causes another
  exception, after which Linux wakes up the sleeping process.
  see arch/s390/mm/fault.c

- Ballooning:
  z/VM has an interface (DIAG 10) for the OS to tell it about pages that
  are currently unused. The kernel uses get_free_page to reserve a number
  of pages, then calls DIAG 10 to give them to z/VM. The number of pages
  to give back to the hypervisor is determined by a system-wide workload
  manager.
  see arch/s390/mm/cmm.c

When you want to introduce some interface in Xen, you probably want
something more powerful than these, but it probably makes sense to
see them as a baseline of what can be done with practically no
common-code changes (if you don't do similar stuff already).

	Arnd <><



* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-04 14:05         ` Rik van Riel
@ 2005-01-06 11:38           ` Luke Kenneth Casson Leighton
  0 siblings, 0 replies; 14+ messages in thread
From: Luke Kenneth Casson Leighton @ 2005-01-06 11:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mark Williamson, xen-devel, Alan Cox, Linux Kernel Mailing List

On Tue, Jan 04, 2005 at 09:05:13AM -0500, Rik van Riel wrote:
> On Tue, 4 Jan 2005, Mark Williamson wrote:
> 
> >>for doing opportunistic page recycling ("I don't need this page but when
> >>I ask for it back please tell me if you trashed the content")
> >
> >We've talked about doing this but AFAIK nobody has gotten round to it 
> >yet because there hasn't been a pressing need (IIRC, it was on the todo 
> >list when Xen 1.0 came out).
> >
> >IMHO, it doesn't look terribly difficult but would require (hopefully 
> >small) modifications to the architecture independent code, plus a little 
> >bit of support code in Xen.
> 
> The architecture independent changes are fine, since
> they're also useful for S390(x), PPC64 and UML...
> 
> >I'd quite like to look at this one fine day but I suspect there are more 
> >useful things I should do first...
> 
> I wonder if the same effect could be achieved by just
> measuring the VM pressure inside the guests and
> ballooning the guests as required, letting them grow
> and shrink with their workloads.
 
 mem = 64M-128M
 target = 64M

 "if needed, grow me to 128mb but if not, whittle down to 64".

 mem=64M-128M
 target=128M

 "if you absolutely have to, steal some of my memory, but don't nick
 any more than 64M".

 i'm probably going to have to "manually" implement something like this.
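
 something along these lines, run from dom0, is roughly what i have in
 mind (a sketch only - it assumes an "xm balloon <domain> <MB>"-style
 command and ssh access into the guests, and the thresholds are picked
 out of thin air):

     #!/bin/sh
     # poll each guest's free memory and nudge its balloon target up or down
     while sleep 30; do
         for guest in guest1 guest2 guest3 guest4 guest5; do
             free_kb=$(ssh $guest "awk '/MemFree/ {print \$2}' /proc/meminfo")
             if [ "$free_kb" -lt 4096 ]; then
                 xm balloon $guest 128    # under pressure: let it grow
             elif [ "$free_kb" -gt 32768 ]; then
                 xm balloon $guest 64     # plenty spare: whittle it back down
             fi
         done
     done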

 l.



* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-05  0:11         ` Arnd Bergmann
@ 2005-01-21 21:37           ` Rik van Riel
  2005-01-26 20:56             ` Mark Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2005-01-21 21:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Williamson, xen-devel, Alan Cox,
	Luke Kenneth Casson Leighton, Linux Kernel Mailing List

On Wed, 5 Jan 2005, Arnd Bergmann wrote:

> - Pseudo faults:

These are a problem, because they turn what would be a single
pageout into a pageout, a pagein, and another pageout, in
effect tripling the amount of IO that needs to be done.

> - Ballooning:

Xen already has this.  I wonder if it makes sense to
consolidate the various balloon approaches into a single
driver, and take the amount of ballooned memory into
account when reporting statistics in /proc/meminfo.

> When you want to introduce some interface in Xen, you probably want
> something more powerful than these,

Xen has a nice balloon driver, that can also be
controlled from outside the guest domain.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-21 21:37           ` Rik van Riel
@ 2005-01-26 20:56             ` Mark Williamson
  2005-01-27 10:33               ` Nuutti Kotivuori
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Williamson @ 2005-01-26 20:56 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Arnd Bergmann, Mark Williamson, xen-devel, Alan Cox,
	Luke Kenneth Casson Leighton, Linux Kernel Mailing List

> > - Pseudo faults:
>
> These are a problem, because they turn what would be a single
> pageout into a pageout, a pagein, and another pageout, in
> effect tripling the amount of IO that needs to be done.

The Disco VMM tackled this by detecting attempts to double-page using a 
special virtual swap disk.  Perhaps it would be possible to find some cleaner 
way to avoid wasteful double-paging by adding some more hooks for virtualised 
architectures...

In any case, for now Xen guests are not swapped onto disk storage at runtime - 
they retain their physical memory reservation unless they alter it using the 
balloon driver.

> Xen already has this.  I wonder if it makes sense to
> consolidate the various balloon approaches into a single
> driver, and take the amount of ballooned memory into
> account when reporting statistics in /proc/meminfo.

If multiple platforms want to do this, we could refactor the code so that the 
core of the balloon driver can be used in multiple archs.  We could have an 
arch_release/request_memory() that the core balloon driver can call into to 
actually return memory to the VMM.

> > When you want to introduce some interface in Xen, you probably want
> > something more powerful than these,
>
> Xen has a nice balloon driver, that can also be
> controlled from outside the guest domain.

The Xen control interface made this fairly trivial to implement.  Again, the 
balloon driver core could be plumbed into whatever the preferred virtual 
machine control interface for the platform is (I don't know if / how other 
platforms tackle this).

Cheers,
Mark


* Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
  2005-01-26 20:56             ` Mark Williamson
@ 2005-01-27 10:33               ` Nuutti Kotivuori
  0 siblings, 0 replies; 14+ messages in thread
From: Nuutti Kotivuori @ 2005-01-27 10:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel

Mark Williamson wrote:
> If multiple platforms want to do this, we could refactor the code so
> that the core of the balloon driver can be used in multiple archs.
> We could have an arch_release/request_memory() that the core balloon
> driver can call into to actually return memory to the VMM.

This is also a thing that the UML project would probably be interested
in.

As a generalization though, what is needed is hot-pluggable memory in
the Linux kernel.  That would satisfy Xen, UML and, at some point, any
physical implementations as well.

-- Naked



