From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shriram Rajagopalan
Subject: Re: Xen Memory De-duplication
Date: Mon, 11 Oct 2010 00:58:47 -0700
Message-ID:
References: <96a60488-b3aa-4141-92a4-587257b48d86@default> <20101010123408.GE2804@reaktio.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1131108503=="
Return-path:
In-Reply-To: <20101010123408.GE2804@reaktio.net>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: =?ISO-8859-1?Q?Pasi_K=E4rkk=E4inen?=
Cc: Aditya Gadre , Dan Magenheimer , Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

--===============1131108503==
Content-Type: multipart/alternative; boundary=0022152d6d8dea7384049252ba67

--0022152d6d8dea7384049252ba67
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Not sure about the DMA part, but I suggest you also take a look at the
Satori project code (memshr modules) in Xen.
http://www.usenix.org/events/usenix09/tech/slides/milos.pdf

On Sun, Oct 10, 2010 at 5:34 AM, Pasi Kärkkäinen wrote:

> On Sun, Oct 10, 2010 at 10:54:58AM +0530, Aditya Gadre wrote:
> >    This kind of implementation will require the disk blocks from different
> >    DomUs to be mapped to the same physical disk block.
> >    For example,
> >    1) Shared read-only filesystem
> >    2) Union-based filesystem
> >    3) Virtual machine images deployed on a host filesystem which has
> >    deduplication enabled
> >
>
> I guess Xen blktap qcow* images should do? And maybe blktap2 VHD?
>
> -- Pasi
>
> >    What kind of filesystem arrangement is used in production environments
> >    for DomUs which host a large number of VMs, as in a cloud environment?
> >
> >    On Sun, Oct 10, 2010 at 5:10 AM, Dan Magenheimer
> >    <[1]dan.magenheimer@oracle.com> wrote:
> >
> >      I'm not an expert on it but I believe this sounds very similar to the
> >      page sharing implementation that already exists in Xen 4.0. The
> >      implementation in Xen only works on HVM guests and only on machines that
> >      have EPT though. The patches (which were accepted into Xen) were posted
> >      here:
> >
> >      [2]http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00797.html
> >
> >      From: Aditya Gadre [mailto:[3]adivb2003@gmail.com]
> >      Sent: Saturday, October 09, 2010 11:56 AM
> >      To: [4]Xen-devel@lists.xensource.com
> >      Subject: [Xen-devel] Xen Memory De-duplication
> >
> >      Aim is to implement Xen Memory Deduplication with minimum overhead.
> >
> >      Our approach to de-duplication is as follows
> >
> >      In most cases, Domain-U uses one of a small set of well-known operating
> >      systems such as Linux, FreeBSD and Microsoft Windows. In such an
> >      environment many domains share read-only filesystems that contain
> >      operating system and frequently used program files and libraries. Each
> >      domain has its own writable filesystems for storing data and temporary
> >      files. In this configuration, multiple pages scattered in different
> >      domains mostly happen to contain the same disk block. So, in our approach
> >      to perform deduplication we intend to add a data structure in Dom0 which
> >      stores the disk block number and the machine frame number (MFN) when a
> >      read request for the read-only code (and data) is made. Now when another
> >      DomU places a request for the same block and Dom0 receives the request
> >      for I/O (DMA), it will first check the data structure for an entry
> >      for the block. If it finds the block it will return the MFN of the
> >      already read page and map it to the requesting domain's PFN, resulting in
> >      zero I/O processing time for blocks which are already read.
> >      This in turn results in de-duplication of the read-only pages accessed by
> >      multiple domains without any overhead of hashing the page.
> >
> >      Test case scenario:
> >
> >      Consider a Dom0 Linux kernel using a filesystem with deduplication
> >      enabled. Then we install a DomU kernel with the virtual disk as an image
> >      file on the disk (.img). Then we make multiple copies of the image to
> >      deploy multiple DomUs running the same kernel. Now, as deduplication is
> >      enabled in the filesystem, initially all the blocks of the domains will
> >      be pointing to the same disk blocks. Now when the kernels are booted,
> >      they all will consume memory only once for the programs (code segment)
> >      loaded in memory. Now as these OSs start to write to their own
> >      virtual filesystems the blocks of the image will be COW'ed by the
> >      filesystem, resulting in different block numbers.
> >      Is such an approach implemented? We intend to implement this as a
> >      project. What are the suspected challenges?
> >
> >      Regards,
> >      Aditya Gadre
> >
> > References
> >
> >    Visible links
> >    1. mailto:dan.magenheimer@oracle.com
> >    2. http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00797.html
> >    3. mailto:adivb2003@gmail.com
> >    4. mailto:Xen-devel@lists.xensource.com
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

-- 
perception is but an offspring of its own self
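
For reference, the Dom0 lookup described in the quoted proposal (a table from
disk block number to MFN, consulted before any I/O is issued) might look
roughly like the sketch below. It is only an illustration in Python, not Xen
code: SharedBlockTable, handle_read, read_block and the dict standing in for
the guest PFN-to-MFN mapping are all hypothetical names, and it does not model
unsharing when a guest writes to a shared page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key: (backing image or device, disk block number).
BlockKey = Tuple[str, int]


@dataclass
class SharedBlockTable:
    """Toy model of the proposed Dom0 table: disk block -> MFN."""

    # Assumed callback that performs the real disk read and returns the
    # MFN of the page that now holds the block's data.
    read_block: Callable[[BlockKey], int]
    block_to_mfn: Dict[BlockKey, int] = field(default_factory=dict)
    # Stand-in for the per-domain mapping: (domid, pfn) -> mfn.
    p2m: Dict[Tuple[int, int], int] = field(default_factory=dict)
    io_count: int = 0  # number of real reads issued

    def handle_read(self, domid: int, pfn: int, key: BlockKey) -> int:
        """Serve a read request from a DomU and return the MFN mapped."""
        mfn = self.block_to_mfn.get(key)
        if mfn is None:
            # First request for this block: do the I/O and remember the MFN.
            mfn = self.read_block(key)
            self.io_count += 1
            self.block_to_mfn[key] = mfn
        # Map the page (freshly read or already resident) to the requester.
        self.p2m[(domid, pfn)] = mfn
        return mfn


# Two DomUs read the same block of a shared image: one I/O, one shared MFN.
_free_mfns = iter(range(0x1000, 0x2000))
table = SharedBlockTable(read_block=lambda key: next(_free_mfns))
first = table.handle_read(domid=1, pfn=0x10, key=("base.img", 42))
second = table.handle_read(domid=2, pfn=0x80, key=("base.img", 42))
other = table.handle_read(domid=2, pfn=0x81, key=("base.img", 43))
```

Here the second request for block 42 is answered from the table without a
read, which is the "no hashing, no extra I/O" property the proposal is after;
block 43 has no entry yet and triggers a real read.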
--0022152d6d8dea7384049252ba67--

--===============1131108503==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1131108503==--