From: Aditya Gadre
Subject: Xen Memory De-duplication
Date: Sat, 9 Oct 2010 00:32:56 +0530
To: Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

The aim is to implement memory de-duplication in Xen with minimal overhead. Our approach is as follows.

In most cases, DomUs run one of a small set of well-known operating systems such as Linux, FreeBSD and Microsoft Windows. In such an environment, many domains share read-only filesystems that contain the operating system and frequently used program files and libraries. Each domain has its own writable filesystems for storing data and temporary files. In this configuration, multiple pages scattered across different domains often contain the same disk block.

To de-duplicate these pages, we intend to add a data structure in Dom0 that records the disk block number and the machine frame number (MFN) whenever a read request for read-only code (or data) is made. When another DomU requests the same block and Dom0 receives the I/O (DMA) request, Dom0 first checks the data structure for an entry for that block. If it finds one, it returns the MFN of the page that was already read and maps it to the requesting domain's PFN, so blocks that have already been read need no further disk I/O. This de-duplicates the read-only pages accessed by multiple domains without the overhead of hashing page contents.
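The lookup described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, written in Python for brevity rather than as Xen code: the names `BlockCache`, `read_block`, `p2m` and the frame counter are hypothetical, the disk read is simulated by a counter, and nothing here models write protection or how Dom0 would actually remap a guest page.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BlockCache:
    """Toy model of the proposed Dom0 table: disk block number -> MFN."""

    # Disk block number -> machine frame that already holds that block.
    block_to_mfn: Dict[int, int] = field(default_factory=dict)
    # (domain id, PFN) -> MFN; a stand-in for each guest's PFN-to-MFN mapping.
    p2m: Dict[Tuple[int, int], int] = field(default_factory=dict)
    next_free_mfn: int = 0x1000  # hypothetical frame allocator state
    disk_reads: int = 0          # counts simulated physical reads

    def read_block(self, domid: int, pfn: int, block: int) -> int:
        """Serve a read of a read-only block and return the MFN used."""
        mfn = self.block_to_mfn.get(block)
        if mfn is None:
            # Miss: simulate the disk read into a fresh frame and record it.
            mfn = self.next_free_mfn
            self.next_free_mfn += 1
            self.disk_reads += 1
            self.block_to_mfn[block] = mfn
        # Hit or miss, the guest PFN now maps to the frame holding the block.
        self.p2m[(domid, pfn)] = mfn
        return mfn


cache = BlockCache()
first = cache.read_block(domid=1, pfn=0x20, block=4096)   # miss, reads disk
second = cache.read_block(domid=2, pfn=0x35, block=4096)  # hit, shares frame
other = cache.read_block(domid=2, pfn=0x36, block=8192)   # different block
```

In this model the second domain's request for block 4096 is served from the table, so both guest PFNs end up backed by the same MFN and only two simulated disk reads occur for the three requests. The lookup key is the block number alone, which is why no page hashing is involved.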
Test case scenario:

Consider a Dom0 Linux kernel using a filesystem with de-duplication enabled. We install a DomU kernel whose virtual disk is an image file (.img) on that filesystem. We then make multiple copies of the image to deploy multiple DomUs running the same kernel. Because de-duplication is enabled in the filesystem, initially all the blocks of the domains' images point to the same disk blocks. When the kernels are booted, the programs (code segments) they load consume memory only once. As these OSs start to write to their own virtual filesystems, the affected blocks of each image will be COW'ed by the filesystem, resulting in different block numbers.

Has such an approach been implemented? We intend to implement this project. What challenges should we expect?

Regards,
Aditya Gadre