All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
@ 2007-02-23 15:13 Graham, Simon
  2007-02-23 16:28 ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: Graham, Simon @ 2007-02-23 15:13 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

Take 3 -- I've moved the flushing/fadvise-ing code into a common routine
-- the only way I found to do this in line with other utilities was to
make it inline in xc_private.h (since this is the only file included in
both libxenctrl and libxenguest). The 'ifdef __linux__' stuff is now
confined to this routine which is better...

I've also left the fsync() in place -- I think it is necessary (and
certainly does no harm).

Simon

-----------------------------------
Reduce impact of saving/restoring/dumping large domains on Dom0 memory
usage by means of fadvise64() to tell the OS to discard the cache pages
used for the save/dump file.

Signed-off-by: Simon Graham <Simon.Graham@stratus.com>


> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Wednesday, February 21, 2007 10:10 AM
> To: Graham, Simon; Keir Fraser; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH][TOOLS] Reducing
> impactofdomainsave/restore/dump on Dom0
> 
> On 21/2/07 15:06, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> 
> > 2. sync-then-advise is only done at the end of writing a file to
> ensure
> > that all
> >    of the cached pages are discarded. Whilst writing the file, I
only
> > fadvise
> >    which triggers a write back and discards any clean pages up to
the
> > specified offset.
> >    This is indeed a performance thing -- fsyncing on every write
> makes
> > it very slow.
> 
> Do you need the fsync at all? It's possible that the kernel will
> launder-then-discard the affected pages automatically, just from the
> fadvise() alone.
> 
>  -- Keir

[-- Attachment #2: xen-dom0-cache.patch --]
[-- Type: application/octet-stream, Size: 7222 bytes --]

Index: trunk/tools/xcutils/Makefile
===================================================================
--- trunk/tools/xcutils/Makefile	(revision 10784)
+++ trunk/tools/xcutils/Makefile	(working copy)
@@ -17,6 +17,7 @@
 
 CFLAGS += -Werror -fno-strict-aliasing
 CFLAGS += $(INCLUDES)
+CFLAGS += -D_GNU_SOURCE
 
 # Make gcc generate dependencies.
 CFLAGS += -Wp,-MD,.$(@F).d
Index: trunk/tools/libxc/xc_linux_save.c
===================================================================
--- trunk/tools/libxc/xc_linux_save.c	(revision 10784)
+++ trunk/tools/libxc/xc_linux_save.c	(working copy)
@@ -171,7 +171,29 @@
         (new->tv_usec - old->tv_usec);
 }
 
+static int noncached_write(int fd, int live, void *buffer, int len) 
+{
+    static int write_count = 0;
 
+    int rc = write(fd,buffer,len);
+
+    if (!live) {
+        write_count += len;
+
+        if (write_count >= MAX_PAGECACHE_USAGE*PAGE_SIZE) {
+            int serrno = errno;
+
+            /* Time to discard cache - dont care if this fails */
+            xc_discard_file_cache(fd, 0 /* no flush */);
+
+            write_count = 0;
+
+            errno = serrno;
+        }
+    }
+    return rc;
+}
+
 #ifdef ADAPTIVE_SAVE
 
 
@@ -204,7 +226,7 @@
 }
 
 
-static int ratewrite(int io_fd, void *buf, int n)
+static int ratewrite(int io_fd, int live, void *buf, int n)
 {
     static int budget = 0;
     static int burst_time_us = -1;
@@ -214,7 +236,7 @@
     long long delta;
 
     if (START_MBIT_RATE == 0)
-        return write(io_fd, buf, n);
+        return noncached_write(io_fd, live, buf, n);
 
     budget -= n;
     if (budget < 0) {
@@ -250,13 +272,13 @@
             }
         }
     }
-    return write(io_fd, buf, n);
+    return noncached_write(io_fd, live, buf, n);
 }
 
 #else /* ! ADAPTIVE SAVE */
 
 #define RATE_IS_MAX() (0)
-#define ratewrite(_io_fd, _buf, _n) write((_io_fd), (_buf), (_n))
+#define ratewrite(_io_fd, _live, _buf, _n) noncached_write((_io_fd), (_live), (_buf), (_n))
 #define initialize_mbit_rate()
 
 #endif
@@ -1069,7 +1091,7 @@
                     if(race && !live) 
                         goto out; 
 
-                    if (ratewrite(io_fd, page, PAGE_SIZE) != PAGE_SIZE) {
+                    if (ratewrite(io_fd, live, page, PAGE_SIZE) != PAGE_SIZE) {
                         ERROR("Error when writing to state file (4)"
                               " (errno %d)", errno);
                         goto out;
@@ -1078,7 +1100,7 @@
                 }  else {
 
                     /* We have a normal page: just write it directly. */
-                    if (ratewrite(io_fd, spage, PAGE_SIZE) != PAGE_SIZE) {
+                    if (ratewrite(io_fd, live, spage, PAGE_SIZE) != PAGE_SIZE) {
                         ERROR("Error when writing to state file (5)"
                               " (errno %d)", errno);
                         goto out;
@@ -1248,6 +1270,10 @@
             DPRINTF("Warning - couldn't disable shadow mode");
         }
     }
+    else {
+        // flush last write and discard cache for file
+        xc_discard_file_cache(io_fd, 1 /* flush */);
+    }            
 
     if (live_shinfo)
         munmap(live_shinfo, PAGE_SIZE);
Index: trunk/tools/libxc/Makefile
===================================================================
--- trunk/tools/libxc/Makefile	(revision 10784)
+++ trunk/tools/libxc/Makefile	(working copy)
@@ -56,6 +56,7 @@
 CFLAGS   += -Werror -Wmissing-prototypes
 CFLAGS   += -fno-strict-aliasing
 CFLAGS   += $(INCLUDES) -I.
+CFLAGS   += -D_GNU_SOURCE
 
 # Define this to make it possible to run valgrind on code linked with these
 # libraries.
Index: trunk/tools/libxc/xc_private.h
===================================================================
--- trunk/tools/libxc/xc_private.h	(revision 10784)
+++ trunk/tools/libxc/xc_private.h	(working copy)
@@ -41,6 +41,13 @@
 #define INFO     1
 #define PROGRESS 0
 
+/*
+** Define max dirty page cache to permit during save/restore -- need to balance 
+** keeping cache usage down with CPU impact of invalidating too often.
+** (Currently 16MB)
+*/
+#define MAX_PAGECACHE_USAGE (4*1024)
+
 #if INFO
 #define IPRINTF(_f, _a...) printf(_f , ## _a)
 #else
@@ -158,4 +165,42 @@
 void bitmap_64_to_byte(uint8_t *bp, const uint64_t *lp, int nbits);
 void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
 
+/* Optionally flush file to disk and discard page cache */
+static inline int xc_discard_file_cache(int fd, int flush) 
+{
+// TODO: Add support for, e.g., Solaris
+#ifdef __linux__
+    off_t cur;
+
+    if (flush && fsync(fd) < 0) {
+        PERROR("Failed to flush file: %s", strerror(errno));
+        return -errno;
+    }
+
+    /* 
+     * Calculate last page boundary of amount written so far 
+     * unless we are flushing in which case entire cache
+     * is discarded
+     */
+    if (flush)
+	cur = 0;
+    else {
+	cur = lseek(fd, 0, SEEK_CUR);
+
+	if (cur != (off_t)-1)
+	    cur &= ~(PAGE_SIZE-1);
+	else
+	    cur = 0;
+    }
+
+    /* and throw out of cache */
+    if (posix_fadvise64(fd, 0, cur, POSIX_FADV_DONTNEED) < 0) {
+	PERROR("Failed to discard cache: %s", strerror(errno));
+	return -errno;
+    }
+#endif
+
+    return 0;
+}
+
 #endif /* __XC_PRIVATE_H__ */
Index: trunk/tools/libxc/xc_core.c
===================================================================
--- trunk/tools/libxc/xc_core.c	(revision 10784)
+++ trunk/tools/libxc/xc_core.c	(working copy)
@@ -145,6 +145,12 @@
         }
     }
 
+    if (length >= DUMP_INCREMENT*PAGE_SIZE) {
+        // Now dumping pages -- make sure we discard clean pages from
+        // the cache after each write
+        xc_discard_file_cache(da->fd, 0 /* no flush */);
+    }
+
     return 0;
 }
 
@@ -165,6 +171,9 @@
     sts = xc_domain_dumpcore_via_callback(
         xc_handle, domid, &da, &local_file_dump);
 
+    /* flush and discard any remaining portion of the file from cache */
+    xc_discard_file_cache(da.fd, 1/* flush first*/);
+
     close(da.fd);
 
     return sts;
Index: trunk/tools/libxc/xc_linux_restore.c
===================================================================
--- trunk/tools/libxc/xc_linux_restore.c	(revision 10784)
+++ trunk/tools/libxc/xc_linux_restore.c	(working copy)
@@ -143,7 +143,7 @@
                      unsigned int console_evtchn, unsigned long *console_mfn)
 {
     DECLARE_DOMCTL;
-    int rc = 1, i, n, pae_extended_cr3 = 0;
+    int rc = 1, i, n, m, pae_extended_cr3 = 0;
     unsigned long mfn, pfn;
     unsigned int prev_pc, this_pc;
     int verify = 0;
@@ -330,7 +330,7 @@
      */
     prev_pc = 0;
 
-    n = 0;
+    n = m = 0;
     while (1) {
 
         int j, nr_mfns = 0; 
@@ -529,6 +529,14 @@
 
         munmap(region_base, j*PAGE_SIZE);
         n+= j; /* crude stats */
+
+	/* discard cache for portion of file read so far up to last page boundary
+	   every 16MB (4k pages) or so */
+	m += j;
+	if (m > MAX_PAGECACHE_USAGE) {
+	    xc_discard_file_cache(io_fd, 0 /* no flush */);
+	    m = 0;
+	}
     }
 
     /*
@@ -863,6 +871,9 @@
     free(p2m);
     free(pfn_type);
 
+    /* discard cache for save file  */
+    xc_discard_file_cache(io_fd, 1 /*flush*/);
+
     DPRINTF("Restore exit with rc=%d\n", rc);
     
     return rc;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
  2007-02-23 15:13 [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0 Graham, Simon
@ 2007-02-23 16:28 ` Keir Fraser
  0 siblings, 0 replies; 3+ messages in thread
From: Keir Fraser @ 2007-02-23 16:28 UTC (permalink / raw)
  To: Graham, Simon, Keir Fraser, xen-devel

On 23/2/07 15:13, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> Take 3 -- I've moved the flushing/fadvise-ing code into a common routine
> -- the only way I found to do this in line with other utilities was to
> make it inline in xc_private.h (since this is the only file included in
> both libxenctrl and libxenguest). The 'ifdef __linux__' stuff is now
> confined to this routine which is better...
> 
> I've also left the fsync() in place -- I think it is necessary (and
> certainly does no harm).

Looks better. What is the -D_GNU_SOURCE for? It should probably at least be
limited to CFLAGS-$(CONFIG_Linux). And is it really needed in
xcutil/Makefile as well: if it's needed at all it should suffice to use it
when building the library that xcutil links against I would have thought?

 -- Keir

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
@ 2007-02-23 18:23 Graham, Simon
  0 siblings, 0 replies; 3+ messages in thread
From: Graham, Simon @ 2007-02-23 18:23 UTC (permalink / raw)
  To: Keir Fraser, xen-devel



> Looks better. What is the -D_GNU_SOURCE for? It should probably at
> least be
> limited to CFLAGS-$(CONFIG_Linux). And is it really needed in
> xcutil/Makefile as well: if it's needed at all it should suffice to
use
> it
> when building the library that xcutil links against I would have
> thought?

1. You need _GNU_SOURCE to get posix_fadvise64 defined. I'm happy to
make
   the CFLAGS change (now that I know this facility exists ;-)

2. You need it in xcutil because this directory has code that includes
   xc_private.h -- this seems wrong/bad to me but I wasn't prepared
   to change it!

Simon

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-02-23 18:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-23 15:13 [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0 Graham, Simon
2007-02-23 16:28 ` Keir Fraser
2007-02-23 18:23 Graham, Simon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.