* RE: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
@ 2007-02-23 15:13 Graham, Simon
2007-02-23 16:28 ` Keir Fraser
0 siblings, 1 reply; 3+ messages in thread
From: Graham, Simon @ 2007-02-23 15:13 UTC (permalink / raw)
To: Keir Fraser, xen-devel
[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]
Take 3 -- I've moved the flushing/fadvise-ing code into a common routine
-- the only way I found to do this in line with other utilities was to
make it inline in xc_private.h (since this is the only file included in
both libxenctrl and libxenguest). The 'ifdef __linux__' stuff is now
confined to this routine which is better...
I've also left the fsync() in place -- I think it is necessary (and
certainly does no harm).
Simon
-----------------------------------
Reduce impact of saving/restoring/dumping large domains on Dom0 memory
usage by means of fadvise64() to tell the OS to discard the cache pages
used for the save/dump file.
Signed-off-by: Simon Graham <Simon.Graham@stratus.com>
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Wednesday, February 21, 2007 10:10 AM
> To: Graham, Simon; Keir Fraser; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH][TOOLS] Reducing
> impactofdomainsave/restore/dump on Dom0
>
> On 21/2/07 15:06, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
>
> > 2. sync-then-advise is only done at the end of writing a file to
> ensure
> > that all
> > of the cached pages are discarded. Whilst writing the file, I
only
> > fadvise
> > which triggers a write back and discards any clean pages up to
the
> > specified offset.
> > This is indeed a performance thing -- fsyncing on every write
> makes
> > it very slow.
>
> Do you need the fsync at all? It's possible that the kernel will
> launder-then-discard the affected pages automatically, just from the
> fadvise() alone.
>
> -- Keir
[-- Attachment #2: xen-dom0-cache.patch --]
[-- Type: application/octet-stream, Size: 7222 bytes --]
Index: trunk/tools/xcutils/Makefile
===================================================================
--- trunk/tools/xcutils/Makefile (revision 10784)
+++ trunk/tools/xcutils/Makefile (working copy)
@@ -17,6 +17,7 @@
CFLAGS += -Werror -fno-strict-aliasing
CFLAGS += $(INCLUDES)
+CFLAGS += -D_GNU_SOURCE
# Make gcc generate dependencies.
CFLAGS += -Wp,-MD,.$(@F).d
Index: trunk/tools/libxc/xc_linux_save.c
===================================================================
--- trunk/tools/libxc/xc_linux_save.c (revision 10784)
+++ trunk/tools/libxc/xc_linux_save.c (working copy)
@@ -171,7 +171,29 @@
(new->tv_usec - old->tv_usec);
}
+static int noncached_write(int fd, int live, void *buffer, int len)
+{
+ static int write_count = 0;
+ int rc = write(fd,buffer,len);
+
+ if (!live) {
+ write_count += len;
+
+ if (write_count >= MAX_PAGECACHE_USAGE*PAGE_SIZE) {
+ int serrno = errno;
+
+ /* Time to discard cache - dont care if this fails */
+ xc_discard_file_cache(fd, 0 /* no flush */);
+
+ write_count = 0;
+
+ errno = serrno;
+ }
+ }
+ return rc;
+}
+
#ifdef ADAPTIVE_SAVE
@@ -204,7 +226,7 @@
}
-static int ratewrite(int io_fd, void *buf, int n)
+static int ratewrite(int io_fd, int live, void *buf, int n)
{
static int budget = 0;
static int burst_time_us = -1;
@@ -214,7 +236,7 @@
long long delta;
if (START_MBIT_RATE == 0)
- return write(io_fd, buf, n);
+ return noncached_write(io_fd, live, buf, n);
budget -= n;
if (budget < 0) {
@@ -250,13 +272,13 @@
}
}
}
- return write(io_fd, buf, n);
+ return noncached_write(io_fd, live, buf, n);
}
#else /* ! ADAPTIVE SAVE */
#define RATE_IS_MAX() (0)
-#define ratewrite(_io_fd, _buf, _n) write((_io_fd), (_buf), (_n))
+#define ratewrite(_io_fd, _live, _buf, _n) noncached_write((_io_fd), (_live), (_buf), (_n))
#define initialize_mbit_rate()
#endif
@@ -1069,7 +1091,7 @@
if(race && !live)
goto out;
- if (ratewrite(io_fd, page, PAGE_SIZE) != PAGE_SIZE) {
+ if (ratewrite(io_fd, live, page, PAGE_SIZE) != PAGE_SIZE) {
ERROR("Error when writing to state file (4)"
" (errno %d)", errno);
goto out;
@@ -1078,7 +1100,7 @@
} else {
/* We have a normal page: just write it directly. */
- if (ratewrite(io_fd, spage, PAGE_SIZE) != PAGE_SIZE) {
+ if (ratewrite(io_fd, live, spage, PAGE_SIZE) != PAGE_SIZE) {
ERROR("Error when writing to state file (5)"
" (errno %d)", errno);
goto out;
@@ -1248,6 +1270,10 @@
DPRINTF("Warning - couldn't disable shadow mode");
}
}
+ else {
+ // flush last write and discard cache for file
+ xc_discard_file_cache(io_fd, 1 /* flush */);
+ }
if (live_shinfo)
munmap(live_shinfo, PAGE_SIZE);
Index: trunk/tools/libxc/Makefile
===================================================================
--- trunk/tools/libxc/Makefile (revision 10784)
+++ trunk/tools/libxc/Makefile (working copy)
@@ -56,6 +56,7 @@
CFLAGS += -Werror -Wmissing-prototypes
CFLAGS += -fno-strict-aliasing
CFLAGS += $(INCLUDES) -I.
+CFLAGS += -D_GNU_SOURCE
# Define this to make it possible to run valgrind on code linked with these
# libraries.
Index: trunk/tools/libxc/xc_private.h
===================================================================
--- trunk/tools/libxc/xc_private.h (revision 10784)
+++ trunk/tools/libxc/xc_private.h (working copy)
@@ -41,6 +41,13 @@
#define INFO 1
#define PROGRESS 0
+/*
+** Define max dirty page cache to permit during save/restore -- need to balance
+** keeping cache usage down with CPU impact of invalidating too often.
+** (Currently 16MB)
+*/
+#define MAX_PAGECACHE_USAGE (4*1024)
+
#if INFO
#define IPRINTF(_f, _a...) printf(_f , ## _a)
#else
@@ -158,4 +165,42 @@
void bitmap_64_to_byte(uint8_t *bp, const uint64_t *lp, int nbits);
void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
+/* Optionally flush file to disk and discard page cache */
+static inline int xc_discard_file_cache(int fd, int flush)
+{
+// TODO: Add support for, e.g., Solaris
+#ifdef __linux__
+ off_t cur;
+
+ if (flush && fsync(fd) < 0) {
+ PERROR("Failed to flush file: %s", strerror(errno));
+ return -errno;
+ }
+
+ /*
+ * Calculate last page boundary of amount written so far
+ * unless we are flushing in which case entire cache
+ * is discarded
+ */
+ if (flush)
+ cur = 0;
+ else {
+ cur = lseek(fd, 0, SEEK_CUR);
+
+ if (cur != (off_t)-1)
+ cur &= ~(PAGE_SIZE-1);
+ else
+ cur = 0;
+ }
+
+ /* and throw out of cache */
+ if (posix_fadvise64(fd, 0, cur, POSIX_FADV_DONTNEED) < 0) {
+ PERROR("Failed to discard cache: %s", strerror(errno));
+ return -errno;
+ }
+#endif
+
+ return 0;
+}
+
#endif /* __XC_PRIVATE_H__ */
Index: trunk/tools/libxc/xc_core.c
===================================================================
--- trunk/tools/libxc/xc_core.c (revision 10784)
+++ trunk/tools/libxc/xc_core.c (working copy)
@@ -145,6 +145,12 @@
}
}
+ if (length >= DUMP_INCREMENT*PAGE_SIZE) {
+ // Now dumping pages -- make sure we discard clean pages from
+ // the cache after each write
+ xc_discard_file_cache(da->fd, 0 /* no flush */);
+ }
+
return 0;
}
@@ -165,6 +171,9 @@
sts = xc_domain_dumpcore_via_callback(
xc_handle, domid, &da, &local_file_dump);
+ /* flush and discard any remaining portion of the file from cache */
+ xc_discard_file_cache(da.fd, 1/* flush first*/);
+
close(da.fd);
return sts;
Index: trunk/tools/libxc/xc_linux_restore.c
===================================================================
--- trunk/tools/libxc/xc_linux_restore.c (revision 10784)
+++ trunk/tools/libxc/xc_linux_restore.c (working copy)
@@ -143,7 +143,7 @@
unsigned int console_evtchn, unsigned long *console_mfn)
{
DECLARE_DOMCTL;
- int rc = 1, i, n, pae_extended_cr3 = 0;
+ int rc = 1, i, n, m, pae_extended_cr3 = 0;
unsigned long mfn, pfn;
unsigned int prev_pc, this_pc;
int verify = 0;
@@ -330,7 +330,7 @@
*/
prev_pc = 0;
- n = 0;
+ n = m = 0;
while (1) {
int j, nr_mfns = 0;
@@ -529,6 +529,14 @@
munmap(region_base, j*PAGE_SIZE);
n+= j; /* crude stats */
+
+ /* discard cache for portion of file read so far up to last page boundary
+ every 16MB (4k pages) or so */
+ m += j;
+ if (m > MAX_PAGECACHE_USAGE) {
+ xc_discard_file_cache(io_fd, 0 /* no flush */);
+ m = 0;
+ }
}
/*
@@ -863,6 +871,9 @@
free(p2m);
free(pfn_type);
+ /* discard cache for save file */
+ xc_discard_file_cache(io_fd, 1 /*flush*/);
+
DPRINTF("Restore exit with rc=%d\n", rc);
return rc;
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
2007-02-23 15:13 [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0 Graham, Simon
@ 2007-02-23 16:28 ` Keir Fraser
0 siblings, 0 replies; 3+ messages in thread
From: Keir Fraser @ 2007-02-23 16:28 UTC (permalink / raw)
To: Graham, Simon, Keir Fraser, xen-devel
On 23/2/07 15:13, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> Take 3 -- I've moved the flushing/fadvise-ing code into a common routine
> -- the only way I found to do this in line with other utilities was to
> make it inline in xc_private.h (since this is the only file included in
> both libxenctrl and libxenguest). The 'ifdef __linux__' stuff is now
> confined to this routine which is better...
>
> I've also left the fsync() in place -- I think it is necessary (and
> certainly does no harm).
Looks better. What is the -D_GNU_SOURCE for? It should probably at least be
limited to CFLAGS-$(CONFIG_Linux). And is it really needed in
xcutil/Makefile as well: if it's needed at all it should suffice to use it
when building the library that xcutil links against I would have thought?
-- Keir
^ permalink raw reply [flat|nested] 3+ messages in thread
* RE: [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0
@ 2007-02-23 18:23 Graham, Simon
0 siblings, 0 replies; 3+ messages in thread
From: Graham, Simon @ 2007-02-23 18:23 UTC (permalink / raw)
To: Keir Fraser, xen-devel
> Looks better. What is the -D_GNU_SOURCE for? It should probably at
> least be
> limited to CFLAGS-$(CONFIG_Linux). And is it really needed in
> xcutil/Makefile as well: if it's needed at all it should suffice to
use
> it
> when building the library that xcutil links against I would have
> thought?
1. You need _GNU_SOURCE to get posix_fadvise64 defined. I'm happy to
make
the CFLAGS change (now that I know this facility exists ;-)
2. You need it in xcutil because this directory has code that includes
xc_private.h -- this seems wrong/bad to me but I wasn't prepared
to change it!
Simon
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-02-23 18:23 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-23 15:13 [PATCH][TOOLS] Reducing impactofdomainsave/restore/dump on Dom0 Graham, Simon
2007-02-23 16:28 ` Keir Fraser
2007-02-23 18:23 Graham, Simon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.