linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] virtio-trace: Support virtio-trace
@ 2012-07-24  2:36 Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt

Hi All,

The following patch set provides a low-overhead system for collecting kernel
tracing data of guests by a host in a virtualization environment.

A guest OS generally shares some devices with other guests or a host, so
reasons of any problems occurring in a guest may be from other guests or a host.
Then, to collect some tracing data of a number of guests and a host is needed
when some problems occur in a virtualization environment. One of methods to
realize that is to collect tracing data of guests in a host. To do this, network
is generally used. However, high load will be taken to applications on guests
using network I/O because there are many network stack layers. Therefore,
a communication method for collecting the data without using network is needed.

We submitted a patch set of "IVRing", a ring-buffer driver constructed on
Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
this June. IVRing and the IVRing reader use POSIX shared memory each other
without using network, so a low-overhead system for collecting guest tracing
data is realized. However, this patch set has some problems as follows:
 - use IVShmem instead of virtio
 - create a new ring-buffer without using existing ring-buffer in kernel
 - scalability
   -- not support SMP environment
   -- buffer size limitation
   -- not support live migration (maybe difficult for realize this)

Therefore, we propose a new system "virtio-trace", which uses enhanced
virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
tracing data. In this system, there are 5 main components:
 (1) Ring-buffer of ftrace in a guest
     - When trace agent reads ring-buffer, a page is removed from ring-buffer.
 (2) Trace agent in the guest
     - Splice the page of ring-buffer to read_pipe using splice() without
       memory copying. Then, the page is spliced from write_pipe to virtio
       without memory copying.
 (3) Virtio-console driver in the guest
     - Pass the page to virtio-ring
 (4) Virtio-serial bus in QEMU
     - Copy the page to kernel pipe
 (5) Reader in the host
     - Read guest tracing data via FIFO(named pipe) 

***Evaluation***
When a host collects tracing data of a guest, the performance of using
virtio-trace is compared with that of using native(just running ftrace),
IVRing, and virtio-serial(normal method of read/write).

<environment>
The overview of this evaluation is as follows:
 (a) A guest on a KVM is prepared.
     - The guest is dedicated one physical CPU as a virtual CPU(VCPU).

 (b) The guest starts to write tracing data to ring-buffer of ftrace.
     - The probe points are all trace points of sched, timer, and kmem.

 (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
     tool in the guest.
     - Dhrystone 2 intends system performance by repeating integer arithmetic
       as a score.
     - Since higher score equals to better system performance, if the score
       decrease based on bare environment, it indicates that any operation
       disturbs the integer arithmetic. Then, we define the overhead of
       transporting trace data is calculated as follows:
		OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.

The performance of each method is compared as follows:
 [1] Native
     - only recording trace data to ring-buffer on a guest
 [2] Virtio-trace
     - running a trace agent on a guest
     - a reader on a host opens FIFO using cat command
 [3] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
       -- probe points are same as ftrace.
 [4] Virtio-serial(normal)
     - A reader(using cat) on a guest output trace data to a host using
       standard output via virtio-serial.

Other information is as follows:
 - host
   kernel: 3.3.7-1 (Fedora16)
   CPU: Intel Xeon x5660@2.80GHz(12core)
   Memory: 48GB

 - guest(only booting one guest)
   kernel: 3.5.0-rc4+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 1GB

<result>
3 patterns based on the bare environment were indicated as follows:
	                   Scores      overhead against [0] Native
    [0] Native:          28807569.5               -
    [1] Virtio-trace:    28685049.5             0.43%
    [2] IVRing:          28418595.5             1.35%
    [3] Virtio-serial:   13262258.7            53.96%


***Just enhancement ideas***
 - Support for trace-cmd
 - Support for 9pfs protocol
 - Support for non-blocking mode in QEMU
 - Make "vhost-serial"

Thank you,

---

Masami Hiramatsu (5):
      virtio/console: Allocate scatterlist according to the current pipe size
      ftrace: Allow stealing pages from pipe buffer
      virtio/console: Wait until the port is ready on splice
      virtio/console: Add a failback for unstealable pipe buffer
      virtio/console: Add splice_write support

Yoshihiro YUNOMAE (1):
      tools: Add guest trace agent as a user tool


 drivers/char/virtio_console.c               |  198 ++++++++++++++++++--
 kernel/trace/trace.c                        |    8 -
 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 8 files changed, 985 insertions(+), 27 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 1/6] virtio/console: Add splice_write support
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-08-09  9:00   ` Amit Shah
  2012-07-24  2:37 ` [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer Yoshihiro YUNOMAE
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu,
	Amit Shah, Arnd Bergmann, Greg Kroah-Hartman

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Enable to use splice_write from pipe to virtio-console port.
This steals pages from pipe and directly send it to host.

Note that this may accelerate only the guest to host path.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 drivers/char/virtio_console.c |  136 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index cdf2f54..fe31b2f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -24,6 +24,8 @@
 #include <linux/err.h>
 #include <linux/freezer.h>
 #include <linux/fs.h>
+#include <linux/splice.h>
+#include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/poll.h>
@@ -227,6 +229,7 @@ struct port {
 	bool guest_connected;
 };
 
+#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -474,26 +477,52 @@ static ssize_t send_control_msg(struct port *port, unsigned int event,
 	return 0;
 }
 
+struct buffer_token {
+	union {
+		void *buf;
+		struct scatterlist *sg;
+	} u;
+	bool sgpages;
+};
+
+static void reclaim_sg_pages(struct scatterlist *sg)
+{
+	int i;
+	struct page *page;
+
+	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+		page = sg_page(&sg[i]);
+		if (!page)
+			break;
+		put_page(page);
+	}
+	kfree(sg);
+}
+
 /* Callers must take the port->outvq_lock */
 static void reclaim_consumed_buffers(struct port *port)
 {
-	void *buf;
+	struct buffer_token *tok;
 	unsigned int len;
 
 	if (!port->portdev) {
 		/* Device has been unplugged.  vqs are already gone. */
 		return;
 	}
-	while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
-		kfree(buf);
+	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
+		if (tok->sgpages)
+			reclaim_sg_pages(tok->u.sg);
+		else
+			kfree(tok->u.buf);
+		kfree(tok);
 		port->outvq_full = false;
 	}
 }
 
-static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
-			bool nonblock)
+static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
+			      int nents, size_t in_count,
+			      struct buffer_token *tok, bool nonblock)
 {
-	struct scatterlist sg[1];
 	struct virtqueue *out_vq;
 	ssize_t ret;
 	unsigned long flags;
@@ -505,8 +534,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 
 	reclaim_consumed_buffers(port);
 
-	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(out_vq, sg, nents, 0, tok, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
@@ -544,6 +572,37 @@ done:
 	return in_count;
 }
 
+static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
+			bool nonblock)
+{
+	struct scatterlist sg[1];
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = false;
+	tok->u.buf = in_buf;
+
+	sg_init_one(sg, in_buf, in_count);
+
+	return __send_to_port(port, sg, 1, in_count, tok, nonblock);
+}
+
+static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
+			  size_t in_count, bool nonblock)
+{
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = true;
+	tok->u.sg = sg;
+
+	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
+}
+
 /*
  * Give out the data that's requested from the buffer that we have
  * queued up.
@@ -725,6 +784,66 @@ out:
 	return ret;
 }
 
+struct sg_list {
+	unsigned int n;
+	size_t len;
+	struct scatterlist *sg;
+};
+
+static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			struct splice_desc *sd)
+{
+	struct sg_list *sgl = sd->u.data;
+	unsigned int len = 0;
+
+	if (sgl->n == MAX_SPLICE_PAGES)
+		return 0;
+
+	/* Try lock this page */
+	if (buf->ops->steal(pipe, buf) == 0) {
+		/* Get reference and unlock page for moving */
+		get_page(buf->page);
+		unlock_page(buf->page);
+
+		len = min(buf->len, sd->len);
+		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
+		sgl->n++;
+		sgl->len += len;
+	}
+
+	return len;
+}
+
+/* Faster zero-copy write by splicing */
+static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
+				      struct file *filp, loff_t *ppos,
+				      size_t len, unsigned int flags)
+{
+	struct port *port = filp->private_data;
+	struct sg_list sgl;
+	ssize_t ret;
+	struct splice_desc sd = {
+		.total_len = len,
+		.flags = flags,
+		.pos = *ppos,
+		.u.data = &sgl,
+	};
+
+	sgl.n = 0;
+	sgl.len = 0;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
+			 GFP_ATOMIC);
+	if (unlikely(!sgl.sg))
+		return -ENOMEM;
+
+	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
+	if (likely(ret > 0))
+		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
+
+	return ret;
+}
+
 static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
 {
 	struct port *port;
@@ -856,6 +975,7 @@ static const struct file_operations port_fops = {
 	.open  = port_fops_open,
 	.read  = port_fops_read,
 	.write = port_fops_write,
+	.splice_write = port_fops_splice_write,
 	.poll  = port_fops_poll,
 	.release = port_fops_release,
 	.fasync = port_fops_fasync,



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-08-09  9:03   ` Amit Shah
  2012-07-24  2:37 ` [RFC PATCH 3/6] virtio/console: Wait until the port is ready on splice Yoshihiro YUNOMAE
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu,
	Amit Shah, Arnd Bergmann, Greg Kroah-Hartman

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Add a failback memcpy path for unstealable pipe buffer.
If buf->ops->steal() fails, virtio-serial tries to
copy the page contents to an allocated page, instead
of just failing splice().

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index fe31b2f..911cb3e 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			struct splice_desc *sd)
 {
 	struct sg_list *sgl = sd->u.data;
-	unsigned int len = 0;
+	unsigned int offset, len;
 
 	if (sgl->n == MAX_SPLICE_PAGES)
 		return 0;
@@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 
 		len = min(buf->len, sd->len);
 		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
-		sgl->n++;
-		sgl->len += len;
+	} else {
+		/* Failback to copying a page */
+		struct page *page = alloc_page(GFP_KERNEL);
+		char *src = buf->ops->map(pipe, buf, 1);
+		char *dst;
+
+		if (!page)
+			return -ENOMEM;
+		dst = kmap(page);
+
+		offset = sd->pos & ~PAGE_MASK;
+
+		len = sd->len;
+		if (len + offset > PAGE_SIZE)
+			len = PAGE_SIZE - offset;
+
+		memcpy(dst + offset, src + buf->offset, len);
+
+		kunmap(page);
+		buf->ops->unmap(pipe, buf, src);
+
+		sg_set_page(&(sgl->sg[sgl->n]), page, len, offset);
 	}
+	sgl->n++;
+	sgl->len += len;
 
 	return len;
 }



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 3/6] virtio/console: Wait until the port is ready on splice
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer Yoshihiro YUNOMAE
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu,
	Amit Shah, Arnd Bergmann, Greg Kroah-Hartman

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Wait if the port is not connected or full on splice
like as write is doing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 drivers/char/virtio_console.c |   39 +++++++++++++++++++++++++++------------
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 911cb3e..e49d435 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -724,6 +724,26 @@ static ssize_t port_fops_read(struct file *filp, char __user *ubuf,
 	return fill_readbuf(port, ubuf, count, true);
 }
 
+static int wait_port_writable(struct port *port, bool nonblock)
+{
+	int ret;
+
+	if (will_write_block(port)) {
+		if (nonblock)
+			return -EAGAIN;
+
+		ret = wait_event_freezable(port->waitqueue,
+					   !will_write_block(port));
+		if (ret < 0)
+			return ret;
+	}
+	/* Port got hot-unplugged. */
+	if (!port->guest_connected)
+		return -ENODEV;
+
+	return 0;
+}
+
 static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 			       size_t count, loff_t *offp)
 {
@@ -740,18 +760,9 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 
 	nonblock = filp->f_flags & O_NONBLOCK;
 
-	if (will_write_block(port)) {
-		if (nonblock)
-			return -EAGAIN;
-
-		ret = wait_event_freezable(port->waitqueue,
-					   !will_write_block(port));
-		if (ret < 0)
-			return ret;
-	}
-	/* Port got hot-unplugged. */
-	if (!port->guest_connected)
-		return -ENODEV;
+	ret = wait_port_writable(port, nonblock);
+	if (ret < 0)
+		return ret;
 
 	count = min((size_t)(32 * 1024), count);
 
@@ -851,6 +862,10 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 		.u.data = &sgl,
 	};
 
+	ret = wait_port_writable(port, filp->f_flags & O_NONBLOCK);
+	if (ret < 0)
+		return ret;
+
 	sgl.n = 0;
 	sgl.len = 0;
 	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (2 preceding siblings ...)
  2012-07-24  2:37 ` [RFC PATCH 3/6] virtio/console: Wait until the port is ready on splice Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-07-30 22:12   ` Steven Rostedt
  2012-07-24  2:37 ` [RFC PATCH 5/6] virtio/console: Allocate scatterlist according to the current pipe size Yoshihiro YUNOMAE
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Use generic steal operation on pipe buffer to allow stealing
ring buffer's read page from pipe buffer.

Note that this could reduce the performance of splice on the
splice_write side operation without affinity setting.
Since the ring buffer's read pages are allocated on the
tracing-node, but the splice user does not always execute
splice write side operation on the same node. In this case,
the page will be accessed from the another node.
Thus, it is strongly recommended to assign the splicing
thread to corresponding node.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
---

 kernel/trace/trace.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a120f98..ae01930 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4194,12 +4194,6 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static int buffer_pipe_buf_steal(struct pipe_inode_info *pipe,
-				 struct pipe_buffer *buf)
-{
-	return 1;
-}
-
 static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
@@ -4215,7 +4209,7 @@ static const struct pipe_buf_operations buffer_pipe_buf_ops = {
 	.unmap			= generic_pipe_buf_unmap,
 	.confirm		= generic_pipe_buf_confirm,
 	.release		= buffer_pipe_buf_release,
-	.steal			= buffer_pipe_buf_steal,
+	.steal			= generic_pipe_buf_steal,
 	.get			= buffer_pipe_buf_get,
 };
 



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 5/6] virtio/console: Allocate scatterlist according to the current pipe size
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (3 preceding siblings ...)
  2012-07-24  2:37 ` [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-07-24  2:37 ` [RFC PATCH 6/6] tools: Add guest trace agent as a user tool Yoshihiro YUNOMAE
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu,
	Amit Shah, Arnd Bergmann, Greg Kroah-Hartman

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Allocate scatterlist according to the current pipe size.
This allows splicing bigger buffer if the pipe size has
been changed by fcntl.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 drivers/char/virtio_console.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e49d435..f5063d5 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -229,7 +229,6 @@ struct port {
 	bool guest_connected;
 };
 
-#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -482,15 +481,16 @@ struct buffer_token {
 		void *buf;
 		struct scatterlist *sg;
 	} u;
-	bool sgpages;
+	/* If sgpages == 0 then buf is used, else sg is used */
+	unsigned int sgpages;
 };
 
-static void reclaim_sg_pages(struct scatterlist *sg)
+static void reclaim_sg_pages(struct scatterlist *sg, unsigned int nrpages)
 {
 	int i;
 	struct page *page;
 
-	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+	for (i = 0; i < nrpages; i++) {
 		page = sg_page(&sg[i]);
 		if (!page)
 			break;
@@ -511,7 +511,7 @@ static void reclaim_consumed_buffers(struct port *port)
 	}
 	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
 		if (tok->sgpages)
-			reclaim_sg_pages(tok->u.sg);
+			reclaim_sg_pages(tok->u.sg, tok->sgpages);
 		else
 			kfree(tok->u.buf);
 		kfree(tok);
@@ -581,7 +581,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = false;
+	tok->sgpages = 0;
 	tok->u.buf = in_buf;
 
 	sg_init_one(sg, in_buf, in_count);
@@ -597,7 +597,7 @@ static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = true;
+	tok->sgpages = nents;
 	tok->u.sg = sg;
 
 	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
@@ -797,6 +797,7 @@ out:
 
 struct sg_list {
 	unsigned int n;
+	unsigned int size;
 	size_t len;
 	struct scatterlist *sg;
 };
@@ -807,7 +808,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	struct sg_list *sgl = sd->u.data;
 	unsigned int offset, len;
 
-	if (sgl->n == MAX_SPLICE_PAGES)
+	if (sgl->n == sgl->size)
 		return 0;
 
 	/* Try lock this page */
@@ -868,12 +869,12 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 
 	sgl.n = 0;
 	sgl.len = 0;
-	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
-			 GFP_ATOMIC);
+	sgl.size = pipe->nrbufs;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * sgl.size, GFP_ATOMIC);
 	if (unlikely(!sgl.sg))
 		return -ENOMEM;
 
-	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	sg_init_table(sgl.sg, sgl.size);
 	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
 	if (likely(ret > 0))
 		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 6/6] tools: Add guest trace agent as a user tool
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (4 preceding siblings ...)
  2012-07-24  2:37 ` [RFC PATCH 5/6] virtio/console: Allocate scatterlist according to the current pipe size Yoshihiro YUNOMAE
@ 2012-07-24  2:37 ` Yoshihiro YUNOMAE
  2012-07-24  3:27 ` [RFC PATCH 0/6] virtio-trace: Support virtio-trace Masami Hiramatsu
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24  2:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Yoshihiro YUNOMAE

This patch adds a user tool, "trace agent" for sending trace data of a guest to
a Host in low overhead. This agent has the following functions:
 - splice a page of ring-buffer to read_pipe without memory copying
 - splice the page from write_pipe to virtio-console without memory copying
 - write trace data to stdout by using -o option
 - controlled by start/stop orders from a Host

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
---

 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 6 files changed, 806 insertions(+), 0 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

diff --git a/tools/virtio/virtio-trace/Makefile b/tools/virtio/virtio-trace/Makefile
new file mode 100644
index 0000000..ef3adfc
--- /dev/null
+++ b/tools/virtio/virtio-trace/Makefile
@@ -0,0 +1,14 @@
+CC = gcc
+CFLAGS = -O2 -Wall
+LFLAG = -lpthread
+
+all: trace-agent
+
+.c.o:
+	$(CC) $(CFLAGS) $(LFLAG) -c $^ -o $@
+
+trace-agent: trace-agent.o trace-agent-ctl.o trace-agent-rw.o
+	$(CC) $(CFLAGS) $(LFLAG) -o $@ $^
+
+clean:
+	rm -f *.o trace-agent
diff --git a/tools/virtio/virtio-trace/README b/tools/virtio/virtio-trace/README
new file mode 100644
index 0000000..b64845b
--- /dev/null
+++ b/tools/virtio/virtio-trace/README
@@ -0,0 +1,118 @@
+Trace Agent for virtio-trace
+============================
+
+Trace agent is a user tool for sending trace data of a guest to a Host in low
+overhead. Trace agent has the following functions:
+ - splice a page of ring-buffer to read_pipe without memory copying
+ - splice the page from write_pipe to virtio-console without memory copying
+ - write trace data to stdout by using -o option
+ - controlled by start/stop orders from a Host
+
+The trace agent operates as follows:
+ 1) Initialize all structures.
+ 2) Create a read/write thread per CPU. Each thread is bound to a CPU.
+    The read/write threads hold it.
+ 3) A controller thread does poll() for a start order of a host.
+ 4) After the controller of the trace agent receives a start order from a host,
+    the controller wake read/write threads.
+ 5) The read/write threads start to read trace data from ring-buffers and
+    write the data to virtio-serial.
+ 6) If the controller receives a stop order from a host, the read/write threads
+    stop to read trace data.
+
+
+Files
+=====
+
+README: this file
+Makefile: Makefile of trace agent for virtio-trace
+trace-agent.c: includes main function, sets up for operating trace agent
+trace-agent.h: includes all structures and some macros
+trace-agent-ctl.c: includes controller function for read/write threads
+trace-agent-rw.c: includes read/write threads function
+
+
+Setup
+=====
+
+To use this trace agent for virtio-trace, we need to prepare some virtio-serial
+I/Fs.
+
+1) Make FIFO in a host
+ virtio-trace uses virtio-serial pipe as trace data paths as to the number
+of CPUs and a control path, so FIFO (named pipe) should be created as follows:
+	# mkdir /tmp/virtio-trace/
+	# mkfifo /tmp/virtio-trace/trace-path-cpu{0,1,2,...,X}.{in,out}
+	# mkfifo /tmp/virtio-trace/agent-ctl-path.{in,out}
+
+For example, if a guest use three CPUs, the names are
+	trace-path-cpu{0,1,2}.{in.out}
+and
+	agent-ctl-path.{in,out}.
+
+2) Set up of virtio-serial pipe in a host
+ Add qemu option to use virtio-serial pipe.
+
+ ##virtio-serial device##
+     -device virtio-serial-pci,id=virtio-serial0\
+ ##control path##
+     -chardev pipe,id=charchannel0,path=/tmp/virtio-trace/agent-ctl-path\
+     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,\
+      id=channel0,name=agent-ctl-path\
+ ##data path##
+     -chardev pipe,id=charchannel1,path=/tmp/virtio-trace/trace-path-cpu0\
+     -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,\
+      id=channel1,name=trace-path-cpu0\
+      ...
+
+If you manage guests with libvirt, add the following tags to domain XML files.
+Then, libvirt passes the same command option to qemu.
+
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/agent-ctl-path'/>
+	   <target type='virtio' name='agent-ctl-path'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='0'/>
+	</channel>
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/trace-path-cpu0'/>
+	   <target type='virtio' name='trace-path-cpu0'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='1'/>
+	</channel>
+	...
+Here, chardev names are restricted to trace-path-cpuX and agent-ctl-path. For
+example, if a guest use three CPUs, chardev names should be trace-path-cpu0,
+trace-path-cpu1, trace-path-cpu2, and agent-ctl-path.
+
+3) Boot the guest
+ You can find some chardev in /dev/virtio-ports/ in the guest.
+
+
+Run
+===
+
+0) Build trace agent in a guest
+	$ make
+
+1) Enable ftrace in the guest
+ <Example>
+	# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+2) Run trace agent in the guest
+ This agent must be operated as root.
+	# ./trace-agent
+read/write threads in the agent wait for start order from host. If you add -o
+option, trace data are output via stdout in the guest.
+
+3) Open FIFO in a host
+	# cat /tmp/virtio-trace/trace-path-cpu0.out
+If a host does not open these, trace data get stuck in buffers of virtio. Then,
+the guest will stop by specification of chardev in QEMU. This blocking mode may
+be solved in the future.
+
+4) Start to read trace data by ordering from a host
+ A host injects read start order to the guest via virtio-serial.
+	# echo 1 > /tmp/virtio-trace/agent-ctl-path.in
+
+5) Stop to read trace data by ordering from a host
+ A host injects read stop order to the guest via virtio-serial.
+	# echo 0 > /tmp/virtio-trace/agent-ctl-path.in
diff --git a/tools/virtio/virtio-trace/trace-agent-ctl.c b/tools/virtio/virtio-trace/trace-agent-ctl.c
new file mode 100644
index 0000000..0739815
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-ctl.c
@@ -0,0 +1,137 @@
+/*
+ * Controller of read/write threads for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define HOST_MSG_SIZE		256
+#define EVENT_WAIT_MSEC		100
+
+static volatile sig_atomic_t global_signal_val;
+bool global_sig_receive;	/* default false */
+bool global_run_operation;	/* default false*/
+
+/* Handle SIGTERM/SIGINT/SIGQUIT to exit */
+static void signal_handler(int sig)
+{
+	global_signal_val = sig;
+}
+
+int rw_ctl_init(const char *ctl_path)
+{
+	int ctl_fd;
+
+	ctl_fd = open(ctl_path, O_RDONLY);
+	if (ctl_fd == -1) {
+		fprintf(stderr, "Cannot open ctl_fd\n");
+		goto error;
+	}
+
+	return ctl_fd;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static int wait_order(int ctl_fd)
+{
+	struct pollfd poll_fd;
+	int ret = 0;
+
+	while (!global_sig_receive) {
+		poll_fd.fd = ctl_fd;
+		poll_fd.events = POLLIN;
+
+		ret = poll(&poll_fd, 1, EVENT_WAIT_MSEC);
+
+		if (global_signal_val) {
+			global_sig_receive = true;
+			pr_info("Receive interrupt %d\n", global_signal_val);
+
+			/* Wakes rw-threads when they are sleeping */
+			if (!global_run_operation)
+				pthread_cond_broadcast(&cond_wakeup);
+
+			ret = -1;
+			break;
+		}
+
+		if (ret < 0) {
+			pr_err("Polling error\n");
+			goto error;
+		}
+
+		if (ret)
+			break;
+	};
+
+	return ret;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/*
+ * contol read/write threads by handling global_run_operation
+ */
+void *rw_ctl_loop(int ctl_fd)
+{
+	ssize_t rlen;
+	char buf[HOST_MSG_SIZE];
+	int ret;
+
+	/* Setup signal handlers */
+	signal(SIGTERM, signal_handler);
+	signal(SIGINT, signal_handler);
+	signal(SIGQUIT, signal_handler);
+
+	while (!global_sig_receive) {
+
+		ret = wait_order(ctl_fd);
+		if (ret < 0)
+			break;
+
+		rlen = read(ctl_fd, buf, sizeof(buf));
+		if (rlen < 0) {
+			pr_err("read data error in ctl thread\n");
+			goto error;
+		}
+
+		if (rlen == 2 && buf[0] == '1') {
+			/*
+			 * If host writes '1' to a control path,
+			 * this controller wakes all read/write threads.
+			 */
+			global_run_operation = true;
+			pthread_cond_broadcast(&cond_wakeup);
+			pr_debug("Wake up all read/write threads\n");
+		} else if (rlen == 2 && buf[0] == '0') {
+			/*
+			 * If host writes '0' to a control path, read/write
+			 * threads will wait for notification from Host.
+			 */
+			global_run_operation = false;
+			pr_debug("Stop all read/write threads\n");
+		} else
+			pr_info("Invalid host notification: %s\n", buf);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
diff --git a/tools/virtio/virtio-trace/trace-agent-rw.c b/tools/virtio/virtio-trace/trace-agent-rw.c
new file mode 100644
index 0000000..3aace5e
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-rw.c
@@ -0,0 +1,192 @@
+/*
+ * Read/write thread of a guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include "trace-agent.h"
+
+#define READ_WAIT_USEC	100000
+
+void *rw_thread_info_new(void)
+{
+	struct rw_thread_info *rw_ti;
+
+	rw_ti = zalloc(sizeof(struct rw_thread_info));
+	if (rw_ti == NULL) {
+		pr_err("rw_thread_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rw_ti->cpu_num = -1;
+	rw_ti->in_fd = -1;
+	rw_ti->out_fd = -1;
+	rw_ti->read_pipe = -1;
+	rw_ti->write_pipe = -1;
+	rw_ti->pipe_size = PIPE_INIT;
+
+	return rw_ti;
+}
+
+void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+				bool stdout_flag, unsigned long pipe_size,
+				struct rw_thread_info *rw_ti)
+{
+	int data_pipe[2];
+
+	rw_ti->cpu_num = cpu;
+
+	/* set read(input) fd */
+	rw_ti->in_fd = open(in_path, O_RDONLY);
+	if (rw_ti->in_fd == -1) {
+		pr_err("Could not open in_fd (CPU:%d)\n", cpu);
+		goto error;
+	}
+
+	/* set write(output) fd */
+	if (!stdout_flag) {
+		/* virtio-serial output mode */
+		rw_ti->out_fd = open(out_path, O_WRONLY);
+		if (rw_ti->out_fd == -1) {
+			pr_err("Could not open out_fd (CPU:%d)\n", cpu);
+			goto error;
+		}
+	} else
+		/* stdout mode */
+		rw_ti->out_fd = STDOUT_FILENO;
+
+	if (pipe2(data_pipe, O_NONBLOCK) < 0) {
+		pr_err("Could not create pipe in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	/*
+	 * Size of pipe is 64kB in default based on fs/pipe.c.
+	 * To read/write trace data speedy, pipe size is changed.
+	 */
+	if (fcntl(*data_pipe, F_SETPIPE_SZ, pipe_size) < 0) {
+		pr_err("Could not change pipe size in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	rw_ti->read_pipe = data_pipe[1];
+	rw_ti->write_pipe = data_pipe[0];
+	rw_ti->pipe_size = pipe_size;
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/* Bind a thread to a cpu */
+static void bind_cpu(int cpu_num)
+{
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(cpu_num, &mask);
+
+	/* bind my thread to cpu_num by assigning zero to the first argument */
+	if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
+		pr_err("Could not set CPU#%d affinity\n", (int)cpu_num);
+}
+
+static void *rw_thread_main(void *thread_info)
+{
+	ssize_t rlen, wlen;
+	ssize_t ret;
+	struct rw_thread_info *ts = (struct rw_thread_info *)thread_info;
+
+	bind_cpu(ts->cpu_num);
+
+	while (1) {
+		/* Wait for a read order of trace data by Host OS */
+		if (!global_run_operation) {
+			pthread_mutex_lock(&mutex_notify);
+			pthread_cond_wait(&cond_wakeup, &mutex_notify);
+			pthread_mutex_unlock(&mutex_notify);
+		}
+
+		if (global_sig_receive)
+			break;
+
+		/*
+		 * Each thread read trace_pipe_raw of each cpu bounding the
+		 * thread, so contention of multi-threads does not occur.
+		 */
+		rlen = splice(ts->in_fd, NULL, ts->read_pipe, NULL,
+				ts->pipe_size, SPLICE_F_MOVE | SPLICE_F_MORE);
+
+		if (rlen < 0) {
+			pr_err("Splice_read in rw-thread(%d)\n", ts->cpu_num);
+			goto error;
+		} else if (rlen == 0) {
+			/*
+			 * If trace data do not exist or are unreadable not
+			 * for exceeding the page size, splice_read returns
+			 * NULL. Then, this waits for being filled the data in a
+			 * ring-buffer.
+			 */
+			usleep(READ_WAIT_USEC);
+			pr_debug("Read retry(cpu:%d)\n", ts->cpu_num);
+			continue;
+		}
+
+		wlen = 0;
+
+		do {
+			ret = splice(ts->write_pipe, NULL, ts->out_fd, NULL,
+					rlen - wlen,
+					SPLICE_F_MOVE | SPLICE_F_MORE);
+
+			if (ret < 0) {
+				pr_err("Splice_write in rw-thread(%d)\n",
+								ts->cpu_num);
+				goto error;
+			} else if (ret == 0)
+				/*
+				 * When host reader is not in time for reading
+				 * trace data, guest will be stopped. This is
+				 * because char dev in QEMU is not supported
+				 * non-blocking mode. Then, writer might be
+				 * sleep in that case.
+				 * This sleep will be removed by supporting
+				 * non-blocking mode.
+				 */
+				sleep(1);
+			wlen += ret;
+		} while (wlen < rlen);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+
+pthread_t rw_thread_run(struct rw_thread_info *rw_ti)
+{
+	int ret;
+	pthread_t rw_thread_per_cpu;
+
+	ret = pthread_create(&rw_thread_per_cpu, NULL, rw_thread_main, rw_ti);
+	if (ret != 0) {
+		pr_err("Could not create a rw thread(%d)\n", rw_ti->cpu_num);
+		exit(EXIT_FAILURE);
+	}
+
+	return rw_thread_per_cpu;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.c b/tools/virtio/virtio-trace/trace-agent.c
new file mode 100644
index 0000000..1816259
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.c
@@ -0,0 +1,270 @@
+/*
+ * Guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define PAGE_SIZE		(sysconf(_SC_PAGE_SIZE))
+#define PIPE_DEF_BUFS		16
+#define PIPE_MIN_SIZE		(PAGE_SIZE*PIPE_DEF_BUFS)
+#define PIPE_MAX_SIZE		(1024*1024)
+#define READ_PATH_FMT	\
+		"/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
+#define WRITE_PATH_FMT		"/dev/virtio-ports/trace-path-cpu%d"
+#define CTL_PATH		"/dev/virtio-ports/agent-ctl-path"
+
+pthread_mutex_t mutex_notify = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond_wakeup = PTHREAD_COND_INITIALIZER;
+
+static int get_total_cpus(void)
+{
+	int nr_cpus = (int)sysconf(_SC_NPROCESSORS_CONF);
+
+	if (nr_cpus <= 0) {
+		pr_err("Could not read cpus\n");
+		goto error;
+	} else if (nr_cpus > MAX_CPUS) {
+		pr_err("Exceed max cpus(%d)\n", (int)MAX_CPUS);
+		goto error;
+	}
+
+	return nr_cpus;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *agent_info_new(void)
+{
+	struct agent_info *s;
+	int i;
+
+	s = zalloc(sizeof(struct agent_info));
+	if (s == NULL) {
+		pr_err("agent_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	s->pipe_size = PIPE_INIT;
+	s->use_stdout = false;
+	s->cpus = get_total_cpus();
+	s->ctl_fd = -1;
+
+	/* read/write threads init */
+	for (i = 0; i < s->cpus; i++)
+		s->rw_ti[i] = rw_thread_info_new();
+
+	return s;
+}
+
+static unsigned long parse_size(const char *arg)
+{
+	unsigned long value, round;
+	char *ptr;
+
+	value = strtoul(arg, &ptr, 10);
+	switch (*ptr) {
+	case 'K': case 'k':
+		value <<= 10;
+		break;
+	case 'M': case 'm':
+		value <<= 20;
+		break;
+	default:
+		break;
+	}
+
+	if (value > PIPE_MAX_SIZE) {
+		pr_err("Pipe size must be less than 1MB\n");
+		goto error;
+	} else if (value < PIPE_MIN_SIZE) {
+		pr_err("Pipe size must be over 64KB\n");
+		goto error;
+	}
+
+	/* Align buffer size with page unit */
+	round = value & (PAGE_SIZE - 1);
+	value = value - round;
+
+	return value;
+error:
+	return 0;
+}
+
+static void usage(char const *prg)
+{
+	fprintf(stderr, "usage: %s [-h] [-o] [-s <size of pipe>]\n", prg);
+}
+
+static const char *make_path(int cpu_num, bool this_is_write_path)
+{
+	int ret;
+	char *buf;
+
+	buf = zalloc(PATH_MAX);
+	if (buf == NULL) {
+		pr_err("Could not allocate buffer\n");
+		goto error;
+	}
+
+	if (this_is_write_path)
+		/* write(output) path */
+		ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
+	else
+		/* read(input) path */
+		ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
+
+	if (ret <= 0) {
+		pr_err("Failed to generate %s path(CPU#%d):%d\n",
+			this_is_write_path ? "read" : "write", cpu_num, ret);
+		goto error;
+	}
+
+	return buf;
+
+error:
+	free(buf);
+	return NULL;
+}
+
+static const char *make_input_path(int cpu_num)
+{
+	return make_path(cpu_num, false);
+}
+
+static const char *make_output_path(int cpu_num)
+{
+	return make_path(cpu_num, true);
+}
+
+static void *agent_info_init(struct agent_info *s)
+{
+	int cpu;
+	const char *in_path = NULL;
+	const char *out_path = NULL;
+
+	/* init read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		/* set read(input) path per read/write thread */
+		in_path = make_input_path(cpu);
+		if (in_path == NULL)
+			goto error;
+
+		/* set write(output) path per read/write thread*/
+		if (!s->use_stdout) {
+			out_path = make_output_path(cpu);
+			if (out_path == NULL)
+				goto error;
+		} else
+			/* stdout mode */
+			pr_debug("stdout mode\n");
+
+		rw_thread_init(cpu, in_path, out_path, s->use_stdout,
+						s->pipe_size, s->rw_ti[cpu]);
+	}
+
+	/* init controller of read/write threads */
+	s->ctl_fd = rw_ctl_init((const char *)CTL_PATH);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *parse_args(int argc, char *argv[], struct agent_info *s)
+{
+	int cmd;
+	unsigned long size;
+
+	while ((cmd = getopt(argc, argv, "hos:")) != -1) {
+		switch (cmd) {
+		/* stdout mode */
+		case 'o':
+			s->use_stdout = true;
+			break;
+		/* size of pipe */
+		case 's':
+			size = parse_size(optarg);
+			if (size == 0)
+				goto error;
+			s->pipe_size = size;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+			goto error;
+		}
+	}
+
+	agent_info_init(s);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void agent_main_loop(struct agent_info *s)
+{
+	int cpu;
+	pthread_t rw_thread_per_cpu[MAX_CPUS];
+
+	/* Start all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++)
+		rw_thread_per_cpu[cpu] = rw_thread_run(s->rw_ti[cpu]);
+
+	rw_ctl_loop(s->ctl_fd);
+
+	/* Finish all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		int ret;
+
+		ret = pthread_join(rw_thread_per_cpu[cpu], NULL);
+		if (ret != 0) {
+			pr_err("pthread_join() error:%d (cpu %d)\n", ret, cpu);
+			exit(EXIT_FAILURE);
+		}
+	}
+}
+
+static void agent_info_free(struct agent_info *s)
+{
+	int i;
+
+	close(s->ctl_fd);
+	for (i = 0; i < s->cpus; i++) {
+		close(s->rw_ti[i]->in_fd);
+		close(s->rw_ti[i]->out_fd);
+		close(s->rw_ti[i]->read_pipe);
+		close(s->rw_ti[i]->write_pipe);
+		free(s->rw_ti[i]);
+	}
+	free(s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct agent_info *s = NULL;
+
+	s = agent_info_new();
+	parse_args(argc, argv, s);
+
+	agent_main_loop(s);
+
+	agent_info_free(s);
+
+	return 0;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.h b/tools/virtio/virtio-trace/trace-agent.h
new file mode 100644
index 0000000..4711a75
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.h
@@ -0,0 +1,75 @@
+#ifndef __GUEST_AGENT_H__
+#define __GUEST_AGENT_H__
+#include <pthread.h>
+#include <stdbool.h>
+
+#define MAX_CPUS	256
+#define PIPE_INIT       (1024*1024)
+
+/*
+ * agent_info - structure managing total information of guest agent
+ * @pipe_size:	size of pipe (default 1MB)
+ * @use_stdout:	set to true when o option is added (default false)
+ * @cpus:	total number of CPUs
+ * @ctl_fd:	fd of control path, /dev/virtio-ports/agent-ctl-path
+ * @rw_ti:	structure managing information of read/write threads
+ */
+struct agent_info {
+	unsigned long pipe_size;
+	bool use_stdout;
+	int cpus;
+	int ctl_fd;
+	struct rw_thread_info *rw_ti[MAX_CPUS];
+};
+
+/*
+ * rw_thread_info - structure managing a read/write thread a cpu
+ * @cpu_num:	cpu number operating this read/write thread
+ * @in_fd:	fd of reading trace data path in cpu_num
+ * @out_fd:	fd of writing trace data path in cpu_num
+ * @read_pipe:	fd of read pipe
+ * @write_pipe:	fd of write pipe
+ * @pipe_size:	size of pipe (default 1MB)
+ */
+struct rw_thread_info {
+	int cpu_num;
+	int in_fd;
+	int out_fd;
+	int read_pipe;
+	int write_pipe;
+	unsigned long pipe_size;
+};
+
+/* use for stopping rw threads */
+extern bool global_sig_receive;
+
+/* use for notification */
+extern bool global_run_operation;
+extern pthread_mutex_t mutex_notify;
+extern pthread_cond_t cond_wakeup;
+
+/* for controller of read/write threads */
+extern int rw_ctl_init(const char *ctl_path);
+extern void *rw_ctl_loop(int ctl_fd);
+
+/* for trace read/write thread */
+extern void *rw_thread_info_new(void);
+extern void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+			bool stdout_flag, unsigned long pipe_size,
+			struct rw_thread_info *rw_ti);
+extern pthread_t rw_thread_run(struct rw_thread_info *rw_ti);
+
+static inline void *zalloc(size_t size)
+{
+	return calloc(1, size);
+}
+
+#define pr_err(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#define pr_info(format, ...) fprintf(stdout, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+
+#endif /*__GUEST_AGENT_H__*/



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (5 preceding siblings ...)
  2012-07-24  2:37 ` [RFC PATCH 6/6] tools: Add guest trace agent as a user tool Yoshihiro YUNOMAE
@ 2012-07-24  3:27 ` Masami Hiramatsu
  2012-07-24 10:02 ` Stefan Hajnoczi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Masami Hiramatsu @ 2012-07-24  3:27 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Amit Shah, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt

(2012/07/24 11:36), Yoshihiro YUNOMAE wrote:
> Therefore, we propose a new system "virtio-trace", which uses enhanced
> virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
> tracing data. In this system, there are 5 main components:
>  (1) Ring-buffer of ftrace in a guest
>      - When trace agent reads ring-buffer, a page is removed from ring-buffer.
>  (2) Trace agent in the guest
>      - Splice the page of ring-buffer to read_pipe using splice() without
>        memory copying. Then, the page is spliced from write_pipe to virtio
>        without memory copying.
>  (3) Virtio-console driver in the guest
>      - Pass the page to virtio-ring
>  (4) Virtio-serial bus in QEMU
>      - Copy the page to kernel pipe
>  (5) Reader in the host
>      - Read guest tracing data via FIFO(named pipe)

So, this is our answer for the argued points in previous thread.
This virtio-serial and ftrace enhancements doesn't introduce new
"ringbuffer" in the kernel, and just use virtio's ringbuffer.
Also, using splice gives us a great advantage in the performance
because of copy-less trace-data transfer.

Actually, one copy should occur in the host (to write it into the pipe),
because removing physical pages of the guest is hard to track and may
involve a TLB flush per page, even if it is done in background.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (6 preceding siblings ...)
  2012-07-24  3:27 ` [RFC PATCH 0/6] virtio-trace: Support virtio-trace Masami Hiramatsu
@ 2012-07-24 10:02 ` Stefan Hajnoczi
  2012-07-24 11:03   ` Masami Hiramatsu
  2012-07-24 20:26 ` [Qemu-devel] " Blue Swirl
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2012-07-24 10:02 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	Srikar Dronamraju, Dhaval Giani

On Tue, Jul 24, 2012 at 3:36 AM, Yoshihiro YUNOMAE
<yoshihiro.yunomae.ez@hitachi.com> wrote:
> The performance of each method is compared as follows:
>  [1] Native
>      - only recording trace data to ring-buffer on a guest
>  [2] Virtio-trace
>      - running a trace agent on a guest
>      - a reader on a host opens FIFO using cat command
>  [3] IVRing
>      - A SystemTap script in a guest records trace data to IVRing.
>        -- probe points are same as ftrace.
>  [4] Virtio-serial(normal)
>      - A reader(using cat) on a guest output trace data to a host using
>        standard output via virtio-serial.

The first time I read this I thought you are adding a new virtio-trace
device.  But it looks like this series really add splice support to
virtio-console and that yields a big performance improvement when
sending trace_pipe_raw.

Guest ftrace is useful and I like this.  Have you thought about
controlling ftrace from the host?  Perhaps a command could be added to
the QEMU guest agent which basically invokes trace-cmd/perf.

Are you using text formatted ftrace?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 10:02 ` Stefan Hajnoczi
@ 2012-07-24 11:03   ` Masami Hiramatsu
  2012-07-24 11:19     ` Yoshihiro YUNOMAE
  2012-07-24 13:43     ` Stefan Hajnoczi
  0 siblings, 2 replies; 35+ messages in thread
From: Masami Hiramatsu @ 2012-07-24 11:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Yoshihiro YUNOMAE, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

(2012/07/24 19:02), Stefan Hajnoczi wrote:
> On Tue, Jul 24, 2012 at 3:36 AM, Yoshihiro YUNOMAE
> <yoshihiro.yunomae.ez@hitachi.com> wrote:
>> The performance of each method is compared as follows:
>>  [1] Native
>>      - only recording trace data to ring-buffer on a guest
>>  [2] Virtio-trace
>>      - running a trace agent on a guest
>>      - a reader on a host opens FIFO using cat command
>>  [3] IVRing
>>      - A SystemTap script in a guest records trace data to IVRing.
>>        -- probe points are same as ftrace.
>>  [4] Virtio-serial(normal)
>>      - A reader(using cat) on a guest output trace data to a host using
>>        standard output via virtio-serial.
> 
> The first time I read this I thought you are adding a new virtio-trace
> device.  But it looks like this series really add splice support to
> virtio-console and that yields a big performance improvement when
> sending trace_pipe_raw.

Yes, sorry for the confusion. Actually this is an enhancement of
virtio-serial. I'm working with Yoshihiro on this feature.

> Guest ftrace is useful and I like this.  Have you thought about
> controlling ftrace from the host?  Perhaps a command could be added to
> the QEMU guest agent which basically invokes trace-cmd/perf.

As you can see, guest trace-agent can be controlled via a
control channel. In our scenario, host tools can control that
instead of guest one.

We are considering that exporting the tracing part of guest's
debugfs to host via another virtio-serial channel by using
9pfs, so that the host tools can refer that.

(In this scenario, guest trace-agent will also provide 9pfs server.
Since it means that the agent can handle writing a special file,
trace-agent can be controlled via the special file on exported
debugfs.)

Of course, this also requires modifying trace-cmd/perf to accept
some options like guest-debugfs mount point, guest's serial
channel pipe (or unix socket?), etc. However, it will be a small
change.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 11:03   ` Masami Hiramatsu
@ 2012-07-24 11:19     ` Yoshihiro YUNOMAE
  2012-07-24 13:41       ` Stefan Hajnoczi
  2012-07-24 13:43     ` Stefan Hajnoczi
  1 sibling, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-24 11:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Masami Hiramatsu, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

Hi Stefan,

Thank you for commenting on our patch set.

(2012/07/24 20:03), Masami Hiramatsu wrote:
> (2012/07/24 19:02), Stefan Hajnoczi wrote:
>> On Tue, Jul 24, 2012 at 3:36 AM, Yoshihiro YUNOMAE
>> <yoshihiro.yunomae.ez@hitachi.com> wrote:
>>> The performance of each method is compared as follows:
>>>   [1] Native
>>>       - only recording trace data to ring-buffer on a guest
>>>   [2] Virtio-trace
>>>       - running a trace agent on a guest
>>>       - a reader on a host opens FIFO using cat command
>>>   [3] IVRing
>>>       - A SystemTap script in a guest records trace data to IVRing.
>>>         -- probe points are same as ftrace.
>>>   [4] Virtio-serial(normal)
>>>       - A reader(using cat) on a guest output trace data to a host using
>>>         standard output via virtio-serial.
>>
>> The first time I read this I thought you are adding a new virtio-trace
>> device.  But it looks like this series really add splice support to
>> virtio-console and that yields a big performance improvement when
>> sending trace_pipe_raw.
>
> Yes, sorry for the confusion. Actually this is an enhancement of
> virtio-serial. I'm working with Yoshihiro on this feature.
>
>> Guest ftrace is useful and I like this.  Have you thought about
>> controlling ftrace from the host?  Perhaps a command could be added to
>> the QEMU guest agent which basically invokes trace-cmd/perf.
>
> As you can see, guest trace-agent can be controlled via a
> control channel. In our scenario, host tools can control that
> instead of guest one.
>
> We are considering that exporting the tracing part of guest's
> debugfs to host via another virtio-serial channel by using
> 9pfs, so that the host tools can refer that.
>
> (In this scenario, guest trace-agent will also provide 9pfs server.
> Since it means that the agent can handle writing a special file,
> trace-agent can be controlled via the special file on exported
> debugfs.)
>
> Of course, this also requires modifying trace-cmd/perf to accept
> some options like guest-debugfs mount point, guest's serial
> channel pipe (or unix socket?), etc. However, it will be a small
> change.
>
> Thank you,
>

 >> Are you using text formatted ftrace?
No, currently using raw format, but we'd like to reformat it in text.

Thank you,

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 11:19     ` Yoshihiro YUNOMAE
@ 2012-07-24 13:41       ` Stefan Hajnoczi
  2012-07-25  9:13         ` Yoshihiro YUNOMAE
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2012-07-24 13:41 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: Masami Hiramatsu, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

On Tue, Jul 24, 2012 at 12:19 PM, Yoshihiro YUNOMAE
<yoshihiro.yunomae.ez@hitachi.com> wrote:
>>> Are you using text formatted ftrace?
> No, currently using raw format, but we'd like to reformat it in text.

Capturing the info necessary to translate numbers into symbols is one
of the problems of host<->guest tracing so I'm curious how you handle
this :).

Apologies for my lack of ftrace knowledge but how useful is the raw
tracing data on the host?  How do you pretty-print it in
human-readable form?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 11:03   ` Masami Hiramatsu
  2012-07-24 11:19     ` Yoshihiro YUNOMAE
@ 2012-07-24 13:43     ` Stefan Hajnoczi
  1 sibling, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2012-07-24 13:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Yoshihiro YUNOMAE, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

On Tue, Jul 24, 2012 at 12:03 PM, Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com> wrote:
> (2012/07/24 19:02), Stefan Hajnoczi wrote:
>> On Tue, Jul 24, 2012 at 3:36 AM, Yoshihiro YUNOMAE
>> <yoshihiro.yunomae.ez@hitachi.com> wrote:
>>> The performance of each method is compared as follows:
>>>  [1] Native
>>>      - only recording trace data to ring-buffer on a guest
>>>  [2] Virtio-trace
>>>      - running a trace agent on a guest
>>>      - a reader on a host opens FIFO using cat command
>>>  [3] IVRing
>>>      - A SystemTap script in a guest records trace data to IVRing.
>>>        -- probe points are same as ftrace.
>>>  [4] Virtio-serial(normal)
>>>      - A reader(using cat) on a guest output trace data to a host using
>>>        standard output via virtio-serial.
>>
>> The first time I read this I thought you are adding a new virtio-trace
>> device.  But it looks like this series really add splice support to
>> virtio-console and that yields a big performance improvement when
>> sending trace_pipe_raw.
>
> Yes, sorry for the confusion. Actually this is an enhancement of
> virtio-serial. I'm working with Yoshihiro on this feature.
>
>> Guest ftrace is useful and I like this.  Have you thought about
>> controlling ftrace from the host?  Perhaps a command could be added to
>> the QEMU guest agent which basically invokes trace-cmd/perf.
>
> As you can see, guest trace-agent can be controlled via a
> control channel. In our scenario, host tools can control that
> instead of guest one.
>
> We are considering that exporting the tracing part of guest's
> debugfs to host via another virtio-serial channel by using
> 9pfs, so that the host tools can refer that.
>
> (In this scenario, guest trace-agent will also provide 9pfs server.
> Since it means that the agent can handle writing a special file,
> trace-agent can be controlled via the special file on exported
> debugfs.)
>
> Of course, this also requires modifying trace-cmd/perf to accept
> some options like guest-debugfs mount point, guest's serial
> channel pipe (or unix socket?), etc. However, it will be a small
> change.

Okay, thanks for explaining some of the ideas you have.  I won't ask
more because it's out of scope for this patch series :).

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (7 preceding siblings ...)
  2012-07-24 10:02 ` Stefan Hajnoczi
@ 2012-07-24 20:26 ` Blue Swirl
  2012-07-25  8:15   ` Masami Hiramatsu
  2012-07-26 11:35 ` Amit Shah
  2012-08-09 10:16 ` Amit Shah
  10 siblings, 1 reply; 35+ messages in thread
From: Blue Swirl @ 2012-07-24 20:26 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah

On Tue, Jul 24, 2012 at 2:36 AM, Yoshihiro YUNOMAE
<yoshihiro.yunomae.ez@hitachi.com> wrote:
> Hi All,
>
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.
>
> A guest OS generally shares some devices with other guests or a host, so
> reasons of any problems occurring in a guest may be from other guests or a host.
> Then, to collect some tracing data of a number of guests and a host is needed
> when some problems occur in a virtualization environment. One of methods to
> realize that is to collect tracing data of guests in a host. To do this, network
> is generally used. However, high load will be taken to applications on guests
> using network I/O because there are many network stack layers. Therefore,
> a communication method for collecting the data without using network is needed.

I implemented something similar earlier by passing trace data from
OpenBIOS to QEMU using the firmware configuration device. The data
format was the same as QEMU used for simpletrace event structure
instead of ftrace. I didn't commit it because of a few problems.

I'm not familiar with ftrace, is it possible to trace two guest
applications (BIOS and kernel) at the same time? Or could this be
handled by opening two different virtio-serial pipes, one for BIOS and
the other for the kernel?

In my version, the tracepoint ID would have been used to demultiplex
QEMU tracepoints from BIOS tracepoints, but something like separate ID
spaces would have been better.

>
> We submitted a patch set of "IVRing", a ring-buffer driver constructed on
> Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
> this June. IVRing and the IVRing reader use POSIX shared memory each other
> without using network, so a low-overhead system for collecting guest tracing
> data is realized. However, this patch set has some problems as follows:
>  - use IVShmem instead of virtio
>  - create a new ring-buffer without using existing ring-buffer in kernel
>  - scalability
>    -- not support SMP environment
>    -- buffer size limitation
>    -- not support live migration (maybe difficult for realize this)
>
> Therefore, we propose a new system "virtio-trace", which uses enhanced
> virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
> tracing data. In this system, there are 5 main components:
>  (1) Ring-buffer of ftrace in a guest
>      - When trace agent reads ring-buffer, a page is removed from ring-buffer.
>  (2) Trace agent in the guest
>      - Splice the page of ring-buffer to read_pipe using splice() without
>        memory copying. Then, the page is spliced from write_pipe to virtio
>        without memory copying.
>  (3) Virtio-console driver in the guest
>      - Pass the page to virtio-ring
>  (4) Virtio-serial bus in QEMU
>      - Copy the page to kernel pipe
>  (5) Reader in the host
>      - Read guest tracing data via FIFO(named pipe)
>
> ***Evaluation***
> When a host collects tracing data of a guest, the performance of using
> virtio-trace is compared with that of using native(just running ftrace),
> IVRing, and virtio-serial(normal method of read/write).
>
> <environment>
> The overview of this evaluation is as follows:
>  (a) A guest on a KVM is prepared.
>      - The guest is dedicated one physical CPU as a virtual CPU(VCPU).
>
>  (b) The guest starts to write tracing data to ring-buffer of ftrace.
>      - The probe points are all trace points of sched, timer, and kmem.
>
>  (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
>      tool in the guest.
>      - Dhrystone 2 intends system performance by repeating integer arithmetic
>        as a score.
>      - Since higher score equals to better system performance, if the score
>        decrease based on bare environment, it indicates that any operation
>        disturbs the integer arithmetic. Then, we define the overhead of
>        transporting trace data is calculated as follows:
>                 OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.
>
> The performance of each method is compared as follows:
>  [1] Native
>      - only recording trace data to ring-buffer on a guest
>  [2] Virtio-trace
>      - running a trace agent on a guest
>      - a reader on a host opens FIFO using cat command
>  [3] IVRing
>      - A SystemTap script in a guest records trace data to IVRing.
>        -- probe points are same as ftrace.
>  [4] Virtio-serial(normal)
>      - A reader(using cat) on a guest output trace data to a host using
>        standard output via virtio-serial.
>
> Other information is as follows:
>  - host
>    kernel: 3.3.7-1 (Fedora16)
>    CPU: Intel Xeon x5660@2.80GHz(12core)
>    Memory: 48GB
>
>  - guest(only booting one guest)
>    kernel: 3.5.0-rc4+ (Fedora16)
>    CPU: 1VCPU(dedicated)
>    Memory: 1GB
>
> <result>
> 3 patterns based on the bare environment were indicated as follows:
>                            Scores      overhead against [0] Native
>     [0] Native:          28807569.5               -
>     [1] Virtio-trace:    28685049.5             0.43%
>     [2] IVRing:          28418595.5             1.35%
>     [3] Virtio-serial:   13262258.7            53.96%
>
>
> ***Just enhancement ideas***
>  - Support for trace-cmd
>  - Support for 9pfs protocol
>  - Support for non-blocking mode in QEMU
>  - Make "vhost-serial"
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (5):
>       virtio/console: Allocate scatterlist according to the current pipe size
>       ftrace: Allow stealing pages from pipe buffer
>       virtio/console: Wait until the port is ready on splice
>       virtio/console: Add a failback for unstealable pipe buffer
>       virtio/console: Add splice_write support
>
> Yoshihiro YUNOMAE (1):
>       tools: Add guest trace agent as a user tool
>
>
>  drivers/char/virtio_console.c               |  198 ++++++++++++++++++--
>  kernel/trace/trace.c                        |    8 -
>  tools/virtio/virtio-trace/Makefile          |   14 +
>  tools/virtio/virtio-trace/README            |  118 ++++++++++++
>  tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
>  tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
>  tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
>  tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
>  8 files changed, 985 insertions(+), 27 deletions(-)
>  create mode 100644 tools/virtio/virtio-trace/Makefile
>  create mode 100644 tools/virtio/virtio-trace/README
>  create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
>  create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
>  create mode 100644 tools/virtio/virtio-trace/trace-agent.c
>  create mode 100644 tools/virtio/virtio-trace/trace-agent.h
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [Qemu-devel] [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 20:26 ` [Qemu-devel] " Blue Swirl
@ 2012-07-25  8:15   ` Masami Hiramatsu
  2012-07-27 18:58     ` Blue Swirl
  0 siblings, 1 reply; 35+ messages in thread
From: Masami Hiramatsu @ 2012-07-25  8:15 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Yoshihiro YUNOMAE, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah

(2012/07/25 5:26), Blue Swirl wrote:>
>> The following patch set provides a low-overhead system for collecting kernel
>> tracing data of guests by a host in a virtualization environment.
>>
>> A guest OS generally shares some devices with other guests or a host, so
>> reasons of any problems occurring in a guest may be from other guests or a
>> host.
>> Then, to collect some tracing data of a number of guests and a host is needed
>> when some problems occur in a virtualization environment. One of methods to
>> realize that is to collect tracing data of guests in a host. To do this,
>> network
>> is generally used. However, high load will be taken to applications on guests
>> using network I/O because there are many network stack layers. Therefore,
>> a communication method for collecting the data without using network is
>> needed.
>
> I implemented something similar earlier by passing trace data from
> OpenBIOS to QEMU using the firmware configuration device. The data
> format was the same as QEMU used for simpletrace event structure
> instead of ftrace. I didn't commit it because of a few problems.

Sounds interesting :)
I guess you traced BIOS events, right?

> I'm not familiar with ftrace, is it possible to trace two guest
> applications (BIOS and kernel) at the same time?

Since ftrace itself is a tracing feature in the linux kernel, it
can trace two or more applications (processes) if those run on linux
kernel. However, I think OpenBIOS runs *under* the guest kernel.
If so, ftrace currently can't trace OpenBIOS from guest side.

I think it may need another enhancement on both OpenBIOS and linux
kernel to trace BIOS event from linux kernel.

> Or could this be
> handled by opening two different virtio-serial pipes, one for BIOS and
> the other for the kernel?

Of course, virtio-serial itself can open multiple channels, thus, if
OpenBIOS can handle virtio, it can pass trace data via another
channel.

> In my version, the tracepoint ID would have been used to demultiplex
> QEMU tracepoints from BIOS tracepoints, but something like separate ID
> spaces would have been better.

I guess your feature notifies events to QEMU and QEMU records that in
their own buffer. Therefore it must have different tracepoint IDs.
On the other hand, with this feature, QEMU just passes trace-data to
host-side pipe. Since outer tracing tool separately collects trace
data, we don't need to demultiplex the data.

Perhaps, in the analyzing phase (after tracing), we have to mix events
again. At that time, we'll add some guest-ID for each event-ID, but
it can be done offline.

Best Regards,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24 13:41       ` Stefan Hajnoczi
@ 2012-07-25  9:13         ` Yoshihiro YUNOMAE
  2012-07-26 10:52           ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-25  9:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Masami Hiramatsu, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

Hi Stefan,

(2012/07/24 22:41), Stefan Hajnoczi wrote:
> On Tue, Jul 24, 2012 at 12:19 PM, Yoshihiro YUNOMAE
> <yoshihiro.yunomae.ez@hitachi.com> wrote:
>>>> Are you using text formatted ftrace?
>> No, currently using raw format, but we'd like to reformat it in text.
>
> Capturing the info necessary to translate numbers into symbols is one
> of the problems of host<->guest tracing so I'm curious how you handle
> this :).

Right, your consideration is true.

> Apologies for my lack of ftrace knowledge but how useful is the raw
> tracing data on the host?  How do you pretty-print it in
> human-readable form?

perf and trace-cmd can actually translate raw-formatted trace data to
text-formatted trace data by using information of kernel or trace
format under tracing/events directory in debugfs. In the same way, if
the information of a guest is exported to a host, we can translate
raw trace data of a guest to text trace data on a host. We will use
9pfs to export that.

Thank you,

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-25  9:13         ` Yoshihiro YUNOMAE
@ 2012-07-26 10:52           ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2012-07-26 10:52 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: Masami Hiramatsu, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, Srikar Dronamraju, Dhaval Giani

On Wed, Jul 25, 2012 at 10:13 AM, Yoshihiro YUNOMAE
<yoshihiro.yunomae.ez@hitachi.com> wrote:
> Hi Stefan,
>
>
> (2012/07/24 22:41), Stefan Hajnoczi wrote:
>>
>> On Tue, Jul 24, 2012 at 12:19 PM, Yoshihiro YUNOMAE
>> <yoshihiro.yunomae.ez@hitachi.com> wrote:
>>>>>
>>>>> Are you using text formatted ftrace?
>>>
>>> No, currently using raw format, but we'd like to reformat it in text.
>>
>>
>> Capturing the info necessary to translate numbers into symbols is one
>> of the problems of host<->guest tracing so I'm curious how you handle
>> this :).
>
>
> Right, your consideration is true.
>
>
>> Apologies for my lack of ftrace knowledge but how useful is the raw
>> tracing data on the host?  How do you pretty-print it in
>> human-readable form?
>
>
> perf and trace-cmd can actually translate raw-formatted trace data to
> text-formatted trace data by using information of kernel or trace
> format under tracing/events directory in debugfs. In the same way, if
> the information of a guest is exported to a host, we can translate
> raw trace data of a guest to text trace data on a host. We will use
> 9pfs to export that.

Thanks, it's clear now :).

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (8 preceding siblings ...)
  2012-07-24 20:26 ` [Qemu-devel] " Blue Swirl
@ 2012-07-26 11:35 ` Amit Shah
  2012-07-27  8:55   ` Yoshihiro YUNOMAE
  2012-08-09 10:16 ` Amit Shah
  10 siblings, 1 reply; 35+ messages in thread
From: Amit Shah @ 2012-07-26 11:35 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman

On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.
> 
> A guest OS generally shares some devices with other guests or a host, so
> reasons of any problems occurring in a guest may be from other guests or a host.
> Then, to collect some tracing data of a number of guests and a host is needed
> when some problems occur in a virtualization environment. One of methods to
> realize that is to collect tracing data of guests in a host. To do this, network
> is generally used. However, high load will be taken to applications on guests
> using network I/O because there are many network stack layers. Therefore,
> a communication method for collecting the data without using network is needed.
> 
> We submitted a patch set of "IVRing", a ring-buffer driver constructed on
> Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
> this June. IVRing and the IVRing reader use POSIX shared memory each other
> without using network, so a low-overhead system for collecting guest tracing
> data is realized. However, this patch set has some problems as follows:
>  - use IVShmem instead of virtio
>  - create a new ring-buffer without using existing ring-buffer in kernel
>  - scalability
>    -- not support SMP environment
>    -- buffer size limitation
>    -- not support live migration (maybe difficult for realize this)
> 
> Therefore, we propose a new system "virtio-trace", which uses enhanced
> virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
> tracing data. In this system, there are 5 main components:
>  (1) Ring-buffer of ftrace in a guest
>      - When trace agent reads ring-buffer, a page is removed from ring-buffer.
>  (2) Trace agent in the guest
>      - Splice the page of ring-buffer to read_pipe using splice() without
>        memory copying. Then, the page is spliced from write_pipe to virtio
>        without memory copying.

I really like the splicing idea.

>  (3) Virtio-console driver in the guest
>      - Pass the page to virtio-ring
>  (4) Virtio-serial bus in QEMU
>      - Copy the page to kernel pipe
>  (5) Reader in the host
>      - Read guest tracing data via FIFO(named pipe) 

So will this be useful only if guest and host run the same kernel?

I'd like to see the host kernel not being used at all -- collect all
relevant info from the guest and send it out to qemu, where it can be
consumed directly by apps driving the tracing.

> ***Evaluation***
> When a host collects tracing data of a guest, the performance of using
> virtio-trace is compared with that of using native(just running ftrace),
> IVRing, and virtio-serial(normal method of read/write).

Why is tracing performance-sensitive?  i.e. why try to optimise this
at all?

> <environment>
> The overview of this evaluation is as follows:
>  (a) A guest on a KVM is prepared.
>      - The guest is dedicated one physical CPU as a virtual CPU(VCPU).
> 
>  (b) The guest starts to write tracing data to ring-buffer of ftrace.
>      - The probe points are all trace points of sched, timer, and kmem.
> 
>  (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
>      tool in the guest.
>      - Dhrystone 2 intends system performance by repeating integer arithmetic
>        as a score.
>      - Since higher score equals to better system performance, if the score
>        decrease based on bare environment, it indicates that any operation
>        disturbs the integer arithmetic. Then, we define the overhead of
>        transporting trace data is calculated as follows:
> 		OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.
> 
> The performance of each method is compared as follows:
>  [1] Native
>      - only recording trace data to ring-buffer on a guest
>  [2] Virtio-trace
>      - running a trace agent on a guest
>      - a reader on a host opens FIFO using cat command
>  [3] IVRing
>      - A SystemTap script in a guest records trace data to IVRing.
>        -- probe points are same as ftrace.
>  [4] Virtio-serial(normal)
>      - A reader(using cat) on a guest output trace data to a host using
>        standard output via virtio-serial.
> 
> Other information is as follows:
>  - host
>    kernel: 3.3.7-1 (Fedora16)
>    CPU: Intel Xeon x5660@2.80GHz(12core)
>    Memory: 48GB
> 
>  - guest(only booting one guest)
>    kernel: 3.5.0-rc4+ (Fedora16)
>    CPU: 1VCPU(dedicated)
>    Memory: 1GB
> 
> <result>
> 3 patterns based on the bare environment were indicated as follows:
> 	                   Scores      overhead against [0] Native
>     [0] Native:          28807569.5               -
>     [1] Virtio-trace:    28685049.5             0.43%
>     [2] IVRing:          28418595.5             1.35%
>     [3] Virtio-serial:   13262258.7            53.96%
> 
> 
> ***Just enhancement ideas***
>  - Support for trace-cmd
>  - Support for 9pfs protocol
>  - Support for non-blocking mode in QEMU

There were patches long back (by me) to make chardevs non-blocking but
they didn't make it upstream.  Fedora carries them, if you want to try
out.  Though we want to converge on a reasonable solution that's
acceptable upstream as well.  Just that no one's working on it
currently.  Any help here will be appreciated.

>  - Make "vhost-serial"

I need to understand a) why it's perf-critical, and b) why should the
host be involved at all, to comment on these.

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-26 11:35 ` Amit Shah
@ 2012-07-27  8:55   ` Yoshihiro YUNOMAE
  2012-07-27  9:43     ` Amit Shah
  0 siblings, 1 reply; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-27  8:55 UTC (permalink / raw)
  To: Amit Shah
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman,
	Masami Hiramatsu

Hi Amit,

Thank you for commenting on our work.

(2012/07/26 20:35), Amit Shah wrote:
> On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:

[...]

>>
>> Therefore, we propose a new system "virtio-trace", which uses enhanced
>> virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
>> tracing data. In this system, there are 5 main components:
>>   (1) Ring-buffer of ftrace in a guest
>>       - When trace agent reads ring-buffer, a page is removed from ring-buffer.
>>   (2) Trace agent in the guest
>>       - Splice the page of ring-buffer to read_pipe using splice() without
>>         memory copying. Then, the page is spliced from write_pipe to virtio
>>         without memory copying.
>
> I really like the splicing idea.

Thanks. We will improve this patch set.

>>   (3) Virtio-console driver in the guest
>>       - Pass the page to virtio-ring
>>   (4) Virtio-serial bus in QEMU
>>       - Copy the page to kernel pipe
>>   (5) Reader in the host
>>       - Read guest tracing data via FIFO(named pipe)
>
> So will this be useful only if guest and host run the same kernel?
>
> I'd like to see the host kernel not being used at all -- collect all
> relevant info from the guest and send it out to qemu, where it can be
> consumed directly by apps driving the tracing.

No, this patch set is used only for guest kernels, so guest and host
don't need to run the same kernel.

>> ***Evaluation***
>> When a host collects tracing data of a guest, the performance of using
>> virtio-trace is compared with that of using native(just running ftrace),
>> IVRing, and virtio-serial(normal method of read/write).
>
> Why is tracing performance-sensitive?  i.e. why try to optimise this
> at all?

To minimize effects for applications on guests when a host collects
tracing data of guests.
For example, we assume the situation where guests A and B are running
on a host sharing I/O device. An I/O delay problem occur in guest A,
but it doesn't for the requirement in guest B. In this case, we need to
collect tracing data of guests A and B, but a usual method using
network takes high load for applications of guest B even if guest B is
normally running. Therefore, we try to decrease the load on guests.
We also use this feature for performance analysis on production
virtualization systems.

[...]

>>
>> ***Just enhancement ideas***
>>   - Support for trace-cmd
>>   - Support for 9pfs protocol
>>   - Support for non-blocking mode in QEMU
>
> There were patches long back (by me) to make chardevs non-blocking but
> they didn't make it upstream.  Fedora carries them, if you want to try
> out.  Though we want to converge on a reasonable solution that's
> acceptable upstream as well.  Just that no one's working on it
> currently.  Any help here will be appreciated.

Thanks! In this case, since a guest will stop to run when host reads
trace data of the guest, char device is needed to add a non-blocking
mode. I'll read your patch series. Is the latest version 8?
http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html

>>   - Make "vhost-serial"
>
> I need to understand a) why it's perf-critical, and b) why should the
> host be involved at all, to comment on these.

a) To make collecting overhead decrease for application on a guest.
    (see above)
b) Trace data of host kernel is not involved even if we introduce this
    patch set.

Thank you,

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-27  8:55   ` Yoshihiro YUNOMAE
@ 2012-07-27  9:43     ` Amit Shah
  2012-07-31  0:52       ` Yoshihiro YUNOMAE
  0 siblings, 1 reply; 35+ messages in thread
From: Amit Shah @ 2012-07-27  9:43 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman,
	Masami Hiramatsu

On (Fri) 27 Jul 2012 [17:55:11], Yoshihiro YUNOMAE wrote:
> Hi Amit,
> 
> Thank you for commenting on our work.
> 
> (2012/07/26 20:35), Amit Shah wrote:
> >On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> 
> [...]
> 
> >>
> >>Therefore, we propose a new system "virtio-trace", which uses enhanced
> >>virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
> >>tracing data. In this system, there are 5 main components:
> >>  (1) Ring-buffer of ftrace in a guest
> >>      - When trace agent reads ring-buffer, a page is removed from ring-buffer.
> >>  (2) Trace agent in the guest
> >>      - Splice the page of ring-buffer to read_pipe using splice() without
> >>        memory copying. Then, the page is spliced from write_pipe to virtio
> >>        without memory copying.
> >
> >I really like the splicing idea.
> 
> Thanks. We will improve this patch set.
> 
> >>  (3) Virtio-console driver in the guest
> >>      - Pass the page to virtio-ring
> >>  (4) Virtio-serial bus in QEMU
> >>      - Copy the page to kernel pipe
> >>  (5) Reader in the host
> >>      - Read guest tracing data via FIFO(named pipe)
> >
> >So will this be useful only if guest and host run the same kernel?
> >
> >I'd like to see the host kernel not being used at all -- collect all
> >relevant info from the guest and send it out to qemu, where it can be
> >consumed directly by apps driving the tracing.
> 
> No, this patch set is used only for guest kernels, so guest and host
> don't need to run the same kernel.

OK - that's good to know.

> >>***Evaluation***
> >>When a host collects tracing data of a guest, the performance of using
> >>virtio-trace is compared with that of using native(just running ftrace),
> >>IVRing, and virtio-serial(normal method of read/write).
> >
> >Why is tracing performance-sensitive?  i.e. why try to optimise this
> >at all?
> 
> To minimize effects for applications on guests when a host collects
> tracing data of guests.
> For example, we assume the situation where guests A and B are running
> on a host sharing I/O device. An I/O delay problem occur in guest A,
> but it doesn't for the requirement in guest B. In this case, we need to
> collect tracing data of guests A and B, but a usual method using
> network takes high load for applications of guest B even if guest B is
> normally running. Therefore, we try to decrease the load on guests.
> We also use this feature for performance analysis on production
> virtualization systems.

OK, got it.

> 
> [...]
> 
> >>
> >>***Just enhancement ideas***
> >>  - Support for trace-cmd
> >>  - Support for 9pfs protocol
> >>  - Support for non-blocking mode in QEMU
> >
> >There were patches long back (by me) to make chardevs non-blocking but
> >they didn't make it upstream.  Fedora carries them, if you want to try
> >out.  Though we want to converge on a reasonable solution that's
> >acceptable upstream as well.  Just that no one's working on it
> >currently.  Any help here will be appreciated.
> 
> Thanks! In this case, since a guest will stop to run when host reads
> trace data of the guest, char device is needed to add a non-blocking
> mode. I'll read your patch series. Is the latest version 8?
> http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html

I suppose the latest version on-list is what you quote above.  The
objections to the patch series are mentioned in Anthony's mails.

Hans maintains a rebased version of the patches in his tree at

http://cgit.freedesktop.org/~jwrdegoede/qemu/

those patches are included in Fedora's qemu-kvm, so you can try that
out if it improves performance for you.

> >>  - Make "vhost-serial"
> >
> >I need to understand a) why it's perf-critical, and b) why should the
> >host be involved at all, to comment on these.
> 
> a) To make collecting overhead decrease for application on a guest.
>    (see above)
> b) Trace data of host kernel is not involved even if we introduce this
>    patch set.

I see, so you suggested vhost-serial only because you saw the guest
stopping problem due to the absence of non-blocking code?  If so, it
now makes sense.  I don't think we need vhost-serial in any way yet.

BTW where do you parse the trace data obtained from guests?  On a
remote host?

Thanks,
		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [Qemu-devel] [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-25  8:15   ` Masami Hiramatsu
@ 2012-07-27 18:58     ` Blue Swirl
  0 siblings, 0 replies; 35+ messages in thread
From: Blue Swirl @ 2012-07-27 18:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Yoshihiro YUNOMAE, linux-kernel, Herbert Xu, Arnd Bergmann,
	Frederic Weisbecker, yrl.pp-manager.tt, qemu-devel,
	Borislav Petkov, virtualization, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah

On Wed, Jul 25, 2012 at 8:15 AM, Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com> wrote:
> (2012/07/25 5:26), Blue Swirl wrote:>
>>> The following patch set provides a low-overhead system for collecting kernel
>>> tracing data of guests by a host in a virtualization environment.
>>>
>>> A guest OS generally shares some devices with other guests or a host, so
>>> reasons of any problems occurring in a guest may be from other guests or a
>>> host.
>>> Then, to collect some tracing data of a number of guests and a host is needed
>>> when some problems occur in a virtualization environment. One of methods to
>>> realize that is to collect tracing data of guests in a host. To do this,
>>> network
>>> is generally used. However, high load will be taken to applications on guests
>>> using network I/O because there are many network stack layers. Therefore,
>>> a communication method for collecting the data without using network is
>>> needed.
>>
>> I implemented something similar earlier by passing trace data from
>> OpenBIOS to QEMU using the firmware configuration device. The data
>> format was the same as QEMU used for simpletrace event structure
>> instead of ftrace. I didn't commit it because of a few problems.
>
> Sounds interesting :)
> I guess you traced BIOS events, right?

Yes, I converted a few DPRINTFs to tracepoints as a proof of concept.

>
>> I'm not familiar with ftrace, is it possible to trace two guest
>> applications (BIOS and kernel) at the same time?
>
> Since ftrace itself is a tracing feature in the linux kernel, it
> can trace two or more applications (processes) if those run on linux
> kernel. However, I think OpenBIOS runs *under* the guest kernel.
> If so, ftrace currently can't trace OpenBIOS from guest side.

No, OpenBIOS boots the machine and then passes control to boot loader
and that to kernel. The kernel will make a few calls to OpenBIOS at
start but not later. OpenBIOS is used by QEMU as Sparc and PowerPC
BIOS.

>
> I think it may need another enhancement on both OpenBIOS and linux
> kernel to trace BIOS event from linux kernel.
>

Ideally both OpenBIOS and Linux should be able to feed trace events
back to QEMU independently.

>> Or could this be
>> handled by opening two different virtio-serial pipes, one for BIOS and
>> the other for the kernel?
>
> Of course, virtio-serial itself can open multiple channels, thus, if
> OpenBIOS can handle virtio, it can pass trace data via another
> channel.

Currently OpenBIOS probes the PCI bus and identifies virtio devices
but ignores them, adding virtio-serial support shouldn't be too hard.
There's a time window between CPU boot and PCI probe when the the
device will not be available though.

>
>> In my version, the tracepoint ID would have been used to demultiplex
>> QEMU tracepoints from BIOS tracepoints, but something like separate ID
>> spaces would have been better.
>
> I guess your feature notifies events to QEMU and QEMU records that in
> their own buffer. Therefore it must have different tracepoint IDs.
> On the other hand, with this feature, QEMU just passes trace-data to
> host-side pipe. Since outer tracing tool separately collects trace
> data, we don't need to demultiplex the data.
>
> Perhaps, in the analyzing phase (after tracing), we have to mix events
> again. At that time, we'll add some guest-ID for each event-ID, but
> it can be done offline.

Yes, the multiplexing/demultiplexing is only needed in my version
because the feeds are not independent.

>
> Best Regards,
>
> --
> Masami HIRAMATSU
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer
  2012-07-24  2:37 ` [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer Yoshihiro YUNOMAE
@ 2012-07-30 22:12   ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2012-07-30 22:12 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Amit Shah, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu

On Tue, 2012-07-24 at 11:37 +0900, Yoshihiro YUNOMAE wrote:
> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> 
> Use generic steal operation on pipe buffer to allow stealing
> ring buffer's read page from pipe buffer.
> 
> Note that this could reduce the performance of splice on the
> splice_write side operation without affinity setting.
> Since the ring buffer's read pages are allocated on the
> tracing-node, but the splice user does not always execute
> splice write side operation on the same node. In this case,
> the page will be accessed from the another node.
> Thus, it is strongly recommended to assign the splicing
> thread to corresponding node.
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> ---
> 
>  kernel/trace/trace.c |    8 +-------
>  1 files changed, 1 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index a120f98..ae01930 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4194,12 +4194,6 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
>  	buf->private = 0;
>  }
>  
> -static int buffer_pipe_buf_steal(struct pipe_inode_info *pipe,
> -				 struct pipe_buffer *buf)
> -{
> -	return 1;
> -}
> -
>  static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
>  				struct pipe_buffer *buf)
>  {
> @@ -4215,7 +4209,7 @@ static const struct pipe_buf_operations buffer_pipe_buf_ops = {
>  	.unmap			= generic_pipe_buf_unmap,
>  	.confirm		= generic_pipe_buf_confirm,
>  	.release		= buffer_pipe_buf_release,
> -	.steal			= buffer_pipe_buf_steal,
> +	.steal			= generic_pipe_buf_steal,
>  	.get			= buffer_pipe_buf_get,
>  };
>  
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-27  9:43     ` Amit Shah
@ 2012-07-31  0:52       ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 35+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-07-31  0:52 UTC (permalink / raw)
  To: Amit Shah
  Cc: linux-kernel, Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman,
	Masami Hiramatsu

Hi Amit,

Sorry for the late reply.

(2012/07/27 18:43), Amit Shah wrote:
> On (Fri) 27 Jul 2012 [17:55:11], Yoshihiro YUNOMAE wrote:
>> Hi Amit,
>>
>> Thank you for commenting on our work.
>>
>> (2012/07/26 20:35), Amit Shah wrote:
>>> On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
>>

[...]

>>>>
>>>> ***Just enhancement ideas***
>>>>   - Support for trace-cmd
>>>>   - Support for 9pfs protocol
>>>>   - Support for non-blocking mode in QEMU
>>>
>>> There were patches long back (by me) to make chardevs non-blocking but
>>> they didn't make it upstream.  Fedora carries them, if you want to try
>>> out.  Though we want to converge on a reasonable solution that's
>>> acceptable upstream as well.  Just that no one's working on it
>>> currently.  Any help here will be appreciated.
>>
>> Thanks! In this case, since a guest will stop to run when host reads
>> trace data of the guest, char device is needed to add a non-blocking
>> mode. I'll read your patch series. Is the latest version 8?
>> http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html
>
> I suppose the latest version on-list is what you quote above.  The
> objections to the patch series are mentioned in Anthony's mails.

I'll check the mails.

> Hans maintains a rebased version of the patches in his tree at
>
> http://cgit.freedesktop.org/~jwrdegoede/qemu/
>
> those patches are included in Fedora's qemu-kvm, so you can try that
> out if it improves performance for you.

Thanks. I'll check those patches.

>>>>   - Make "vhost-serial"
>>>
>>> I need to understand a) why it's perf-critical, and b) why should the
>>> host be involved at all, to comment on these.
>>
>> a) To make collecting overhead decrease for application on a guest.
>>     (see above)
>> b) Trace data of host kernel is not involved even if we introduce this
>>     patch set.
>
> I see, so you suggested vhost-serial only because you saw the guest
> stopping problem due to the absence of non-blocking code?  If so, it
> now makes sense.  I don't think we need vhost-serial in any way yet.

I understood. We suggested vhost-serial as one of the ideas for
improving performances. Other features(trace-cmd, 9pfs, and
non-blocking chardev) should be supported first, I think.

> BTW where do you parse the trace data obtained from guests?  On a
> remote host?

It is the best that we can parse the data on a remote host in this
tracing system. Existing trace-cmd can already parse it on a remote
site. If we add the feature collecting event-format data(guest's
debugfs has that) from guests, we can parse tracing data on a remote
host as well as on a host running guests.

Thank you,

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] virtio/console: Add splice_write support
  2012-07-24  2:37 ` [RFC PATCH 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
@ 2012-08-09  9:00   ` Amit Shah
  2012-08-09  9:12     ` Masami Hiramatsu
  0 siblings, 1 reply; 35+ messages in thread
From: Amit Shah @ 2012-08-09  9:00 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu

On (Tue) 24 Jul 2012 [11:37:07], Yoshihiro YUNOMAE wrote:
> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> 
> Enable to use splice_write from pipe to virtio-console port.
> This steals pages from pipe and directly send it to host.
> 
> Note that this may accelerate only the guest to host path.
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---

> +/* Faster zero-copy write by splicing */
> +static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
> +				      struct file *filp, loff_t *ppos,
> +				      size_t len, unsigned int flags)
> +{
> +	struct port *port = filp->private_data;
> +	struct sg_list sgl;
> +	ssize_t ret;
> +	struct splice_desc sd = {
> +		.total_len = len,
> +		.flags = flags,
> +		.pos = *ppos,
> +		.u.data = &sgl,
> +	};
> +
> +	sgl.n = 0;
> +	sgl.len = 0;
> +	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
> +			 GFP_ATOMIC);

Do you expect this function to be called from interrupt context?

> +	if (unlikely(!sgl.sg))
> +		return -ENOMEM;
> +
> +	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
> +	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
> +	if (likely(ret > 0))
> +		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
> +
> +	return ret;
> +}

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-07-24  2:37 ` [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer Yoshihiro YUNOMAE
@ 2012-08-09  9:03   ` Amit Shah
  2012-08-09  9:24     ` Borislav Petkov
  2012-08-09  9:24     ` Masami Hiramatsu
  0 siblings, 2 replies; 35+ messages in thread
From: Amit Shah @ 2012-08-09  9:03 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Masami Hiramatsu

On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> 
> Add a failback memcpy path for unstealable pipe buffer.
> If buf->ops->steal() fails, virtio-serial tries to
> copy the page contents to an allocated page, instead
> of just failing splice().
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> 
>  drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
>  1 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index fe31b2f..911cb3e 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>  			struct splice_desc *sd)
>  {
>  	struct sg_list *sgl = sd->u.data;
> -	unsigned int len = 0;
> +	unsigned int offset, len;
>  
>  	if (sgl->n == MAX_SPLICE_PAGES)
>  		return 0;
> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>  
>  		len = min(buf->len, sd->len);
>  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> -		sgl->n++;
> -		sgl->len += len;
> +	} else {
> +		/* Failback to copying a page */
> +		struct page *page = alloc_page(GFP_KERNEL);

I prefer zeroing out the page.  If there's not enough data to be
filled in the page, the remaining data can be leaked to the host.

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Re: [RFC PATCH 1/6] virtio/console: Add splice_write support
  2012-08-09  9:00   ` Amit Shah
@ 2012-08-09  9:12     ` Masami Hiramatsu
  0 siblings, 0 replies; 35+ messages in thread
From: Masami Hiramatsu @ 2012-08-09  9:12 UTC (permalink / raw)
  To: Amit Shah
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt

(2012/08/09 18:00), Amit Shah wrote:
> On (Tue) 24 Jul 2012 [11:37:07], Yoshihiro YUNOMAE wrote:
>> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>>
>> Enable to use splice_write from pipe to virtio-console port.
>> This steals pages from pipe and directly send it to host.
>>
>> Note that this may accelerate only the guest to host path.
>>
>> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> Cc: Amit Shah <amit.shah@redhat.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
> 
>> +/* Faster zero-copy write by splicing */
>> +static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
>> +				      struct file *filp, loff_t *ppos,
>> +				      size_t len, unsigned int flags)
>> +{
>> +	struct port *port = filp->private_data;
>> +	struct sg_list sgl;
>> +	ssize_t ret;
>> +	struct splice_desc sd = {
>> +		.total_len = len,
>> +		.flags = flags,
>> +		.pos = *ppos,
>> +		.u.data = &sgl,
>> +	};
>> +
>> +	sgl.n = 0;
>> +	sgl.len = 0;
>> +	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
>> +			 GFP_ATOMIC);
> 
> Do you expect this function to be called from interrupt context?

No, not at all. Oops, that should be GFP_KERNEL...

Thank you for pointing it out.

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:03   ` Amit Shah
@ 2012-08-09  9:24     ` Borislav Petkov
  2012-08-09  9:24     ` Masami Hiramatsu
  1 sibling, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2012-08-09  9:24 UTC (permalink / raw)
  To: Amit Shah
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt,
	Masami Hiramatsu

On Thu, Aug 09, 2012 at 02:33:12PM +0530, Amit Shah wrote:
> > @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >  
> >  		len = min(buf->len, sd->len);
> >  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> > -		sgl->n++;
> > -		sgl->len += len;
> > +	} else {
> > +		/* Failback to copying a page */
> > +		struct page *page = alloc_page(GFP_KERNEL);
> 
> I prefer zeroing out the page.  If there's not enough data to be
> filled in the page, the remaining data can be leaked to the host.

get_zeroed_page()?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:03   ` Amit Shah
  2012-08-09  9:24     ` Borislav Petkov
@ 2012-08-09  9:24     ` Masami Hiramatsu
  2012-08-09  9:55       ` Amit Shah
  2012-08-09 12:35       ` Steven Rostedt
  1 sibling, 2 replies; 35+ messages in thread
From: Masami Hiramatsu @ 2012-08-09  9:24 UTC (permalink / raw)
  To: Amit Shah
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt

(2012/08/09 18:03), Amit Shah wrote:
> On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
>> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>>
>> Add a failback memcpy path for unstealable pipe buffer.
>> If buf->ops->steal() fails, virtio-serial tries to
>> copy the page contents to an allocated page, instead
>> of just failing splice().
>>
>> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> Cc: Amit Shah <amit.shah@redhat.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>>
>>  drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
>>  1 files changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>> index fe31b2f..911cb3e 100644
>> --- a/drivers/char/virtio_console.c
>> +++ b/drivers/char/virtio_console.c
>> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>>  			struct splice_desc *sd)
>>  {
>>  	struct sg_list *sgl = sd->u.data;
>> -	unsigned int len = 0;
>> +	unsigned int offset, len;
>>  
>>  	if (sgl->n == MAX_SPLICE_PAGES)
>>  		return 0;
>> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>>  
>>  		len = min(buf->len, sd->len);
>>  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
>> -		sgl->n++;
>> -		sgl->len += len;
>> +	} else {
>> +		/* Failback to copying a page */
>> +		struct page *page = alloc_page(GFP_KERNEL);
> 
> I prefer zeroing out the page.  If there's not enough data to be
> filled in the page, the remaining data can be leaked to the host.

Yeah, it is really easy to fix that.
But out of curiosity, would that be really a problem?
I guess that host can access any guest page if need. If that
is right, is that really insecure to leak randomly allocated
unused page to the host?

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:24     ` Masami Hiramatsu
@ 2012-08-09  9:55       ` Amit Shah
  2012-08-09  9:58         ` Avi Kivity
  2012-08-09 12:35       ` Steven Rostedt
  1 sibling, 1 reply; 35+ messages in thread
From: Amit Shah @ 2012-08-09  9:55 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt,
	Avi Kivity, Rusty Russell

On (Thu) 09 Aug 2012 [18:24:58], Masami Hiramatsu wrote:
> (2012/08/09 18:03), Amit Shah wrote:
> > On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
> >> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> >>
> >> Add a failback memcpy path for unstealable pipe buffer.
> >> If buf->ops->steal() fails, virtio-serial tries to
> >> copy the page contents to an allocated page, instead
> >> of just failing splice().
> >>
> >> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> >> Cc: Amit Shah <amit.shah@redhat.com>
> >> Cc: Arnd Bergmann <arnd@arndb.de>
> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >> ---
> >>
> >>  drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
> >>  1 files changed, 25 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> >> index fe31b2f..911cb3e 100644
> >> --- a/drivers/char/virtio_console.c
> >> +++ b/drivers/char/virtio_console.c
> >> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >>  			struct splice_desc *sd)
> >>  {
> >>  	struct sg_list *sgl = sd->u.data;
> >> -	unsigned int len = 0;
> >> +	unsigned int offset, len;
> >>  
> >>  	if (sgl->n == MAX_SPLICE_PAGES)
> >>  		return 0;
> >> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >>  
> >>  		len = min(buf->len, sd->len);
> >>  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> >> -		sgl->n++;
> >> -		sgl->len += len;
> >> +	} else {
> >> +		/* Failback to copying a page */
> >> +		struct page *page = alloc_page(GFP_KERNEL);
> > 
> > I prefer zeroing out the page.  If there's not enough data to be
> > filled in the page, the remaining data can be leaked to the host.
> 
> Yeah, it is really easy to fix that.
> But out of curiosity, would that be really a problem?
> I guess that host can access any guest page if need. If that
> is right, is that really insecure to leak randomly allocated
> unused page to the host?

I'm not sure if there is a way to really attack, but just something I
had thought about: the host kernel can access any guest page, that's
not something we can prevent.

However, if qemu is restricted from accessing guest pages, and the
guest shares this page with qemu for r/w purposes via the virtio
channel, a qemu exploit can expose guest data to host userspace.

I agree this is completely theoretical; can someone else with more
insight confirm or deny my apprehensions?

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:55       ` Amit Shah
@ 2012-08-09  9:58         ` Avi Kivity
  2012-08-09 10:14           ` Amit Shah
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-08-09  9:58 UTC (permalink / raw)
  To: Amit Shah
  Cc: Masami Hiramatsu, Yoshihiro YUNOMAE, linux-kernel,
	Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Rusty Russell

On 08/09/2012 12:55 PM, Amit Shah wrote:
> On (Thu) 09 Aug 2012 [18:24:58], Masami Hiramatsu wrote:
>> (2012/08/09 18:03), Amit Shah wrote:
>> > On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
>> >> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> >>
>> >> Add a failback memcpy path for unstealable pipe buffer.
>> >> If buf->ops->steal() fails, virtio-serial tries to
>> >> copy the page contents to an allocated page, instead
>> >> of just failing splice().
>> >>
>> >> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> >> Cc: Amit Shah <amit.shah@redhat.com>
>> >> Cc: Arnd Bergmann <arnd@arndb.de>
>> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> >> ---
>> >>
>> >>  drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
>> >>  1 files changed, 25 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>> >> index fe31b2f..911cb3e 100644
>> >> --- a/drivers/char/virtio_console.c
>> >> +++ b/drivers/char/virtio_console.c
>> >> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>> >>  			struct splice_desc *sd)
>> >>  {
>> >>  	struct sg_list *sgl = sd->u.data;
>> >> -	unsigned int len = 0;
>> >> +	unsigned int offset, len;
>> >>  
>> >>  	if (sgl->n == MAX_SPLICE_PAGES)
>> >>  		return 0;
>> >> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>> >>  
>> >>  		len = min(buf->len, sd->len);
>> >>  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
>> >> -		sgl->n++;
>> >> -		sgl->len += len;
>> >> +	} else {
>> >> +		/* Failback to copying a page */
>> >> +		struct page *page = alloc_page(GFP_KERNEL);
>> > 
>> > I prefer zeroing out the page.  If there's not enough data to be
>> > filled in the page, the remaining data can be leaked to the host.
>> 
>> Yeah, it is really easy to fix that.
>> But out of curiosity, would that be really a problem?
>> I guess that host can access any guest page if need. If that
>> is right, is that really insecure to leak randomly allocated
>> unused page to the host?
> 
> I'm not sure if there is a way to really attack, but just something I
> had thought about: the host kernel can access any guest page, that's
> not something we can prevent.
> 
> However, if qemu is restricted from accessing guest pages, and the
> guest shares this page with qemu for r/w purposes via the virtio
> channel, a qemu exploit can expose guest data to host userspace.
> 
> I agree this is completely theoretical; can someone else with more
> insight confirm or deny my apprehensions?

qemu can read and write any guest page (for the guest it controls).


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:58         ` Avi Kivity
@ 2012-08-09 10:14           ` Amit Shah
  0 siblings, 0 replies; 35+ messages in thread
From: Amit Shah @ 2012-08-09 10:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Masami Hiramatsu, Yoshihiro YUNOMAE, linux-kernel,
	Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Rusty Russell

On (Thu) 09 Aug 2012 [12:58:13], Avi Kivity wrote:
> On 08/09/2012 12:55 PM, Amit Shah wrote:
> > On (Thu) 09 Aug 2012 [18:24:58], Masami Hiramatsu wrote:
> >> (2012/08/09 18:03), Amit Shah wrote:
> >> > On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
> >> >> From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> >> >>
> >> >> Add a failback memcpy path for unstealable pipe buffer.
> >> >> If buf->ops->steal() fails, virtio-serial tries to
> >> >> copy the page contents to an allocated page, instead
> >> >> of just failing splice().
> >> >>
> >> >> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> >> >> Cc: Amit Shah <amit.shah@redhat.com>
> >> >> Cc: Arnd Bergmann <arnd@arndb.de>
> >> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >> >> ---
> >> >>
> >> >>  drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
> >> >>  1 files changed, 25 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> >> >> index fe31b2f..911cb3e 100644
> >> >> --- a/drivers/char/virtio_console.c
> >> >> +++ b/drivers/char/virtio_console.c
> >> >> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >> >>  			struct splice_desc *sd)
> >> >>  {
> >> >>  	struct sg_list *sgl = sd->u.data;
> >> >> -	unsigned int len = 0;
> >> >> +	unsigned int offset, len;
> >> >>  
> >> >>  	if (sgl->n == MAX_SPLICE_PAGES)
> >> >>  		return 0;
> >> >> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >> >>  
> >> >>  		len = min(buf->len, sd->len);
> >> >>  		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> >> >> -		sgl->n++;
> >> >> -		sgl->len += len;
> >> >> +	} else {
> >> >> +		/* Failback to copying a page */
> >> >> +		struct page *page = alloc_page(GFP_KERNEL);
> >> > 
> >> > I prefer zeroing out the page.  If there's not enough data to be
> >> > filled in the page, the remaining data can be leaked to the host.
> >> 
> >> Yeah, it is really easy to fix that.
> >> But out of curiosity, would that be really a problem?
> >> I guess that host can access any guest page if need. If that
> >> is right, is that really insecure to leak randomly allocated
> >> unused page to the host?
> > 
> > I'm not sure if there is a way to really attack, but just something I
> > had thought about: the host kernel can access any guest page, that's
> > not something we can prevent.
> > 
> > However, if qemu is restricted from accessing guest pages, and the
> > guest shares this page with qemu for r/w purposes via the virtio
> > channel, a qemu exploit can expose guest data to host userspace.
> > 
> > I agree this is completely theoretical; can someone else with more
> > insight confirm or deny my apprehensions?
> 
> qemu can read and write any guest page (for the guest it controls).

OK, thanks for confirming -- no need to change this patch, then.

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
                   ` (9 preceding siblings ...)
  2012-07-26 11:35 ` Amit Shah
@ 2012-08-09 10:16 ` Amit Shah
  2012-08-21  2:17   ` Rusty Russell
  10 siblings, 1 reply; 35+ messages in thread
From: Amit Shah @ 2012-08-09 10:16 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt, Rusty Russell

Hi,

On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

So I just have one minor comment, please post a non-RFC version of the
patch.

Since you have an ACK from Steven for the ftrace patch, I guess Rusty
can push this in via his virtio tree?

I'll ack the virtio-console bits in the next series you send.

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09  9:24     ` Masami Hiramatsu
  2012-08-09  9:55       ` Amit Shah
@ 2012-08-09 12:35       ` Steven Rostedt
  1 sibling, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2012-08-09 12:35 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Amit Shah, Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori,
	Arnd Bergmann, Borislav Petkov, Franch Ch. Eigler,
	Frederic Weisbecker, Greg Kroah-Hartman, Herbert Xu, Ingo Molnar,
	Mathieu Desnoyers, virtualization, qemu-devel, yrl.pp-manager.tt

On Thu, 2012-08-09 at 18:24 +0900, Masami Hiramatsu wrote:

> Yeah, it is really easy to fix that.
> But out of curiosity, would that be really a problem?
> I guess that host can access any guest page if need. If that
> is right, is that really insecure to leak randomly allocated
> unused page to the host?

Yeah, it's like protecting userspace pages from the kernel ;-)

-- Steve



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-08-09 10:16 ` Amit Shah
@ 2012-08-21  2:17   ` Rusty Russell
  2012-08-21  5:09     ` Amit Shah
  0 siblings, 1 reply; 35+ messages in thread
From: Rusty Russell @ 2012-08-21  2:17 UTC (permalink / raw)
  To: Amit Shah, Yoshihiro YUNOMAE
  Cc: linux-kernel, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Steven Rostedt,
	virtualization, qemu-devel, yrl.pp-manager.tt

On Thu, 9 Aug 2012 15:46:20 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> Hi,
> 
> On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> > Hi All,
> > 
> > The following patch set provides a low-overhead system for collecting kernel
> > tracing data of guests by a host in a virtualization environment.
> 
> So I just have one minor comment, please post a non-RFC version of the
> patch.
> 
> Since you have an ACK from Steven for the ftrace patch, I guess Rusty
> can push this in via his virtio tree?
> 
> I'll ack the virtio-console bits in the next series you send.

You didn't Ack, BTW.  At least, AFAICT.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
  2012-08-21  2:17   ` Rusty Russell
@ 2012-08-21  5:09     ` Amit Shah
  0 siblings, 0 replies; 35+ messages in thread
From: Amit Shah @ 2012-08-21  5:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt

On (Tue) 21 Aug 2012 [11:47:16], Rusty Russell wrote:
> On Thu, 9 Aug 2012 15:46:20 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> > Hi,
> > 
> > On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> > > Hi All,
> > > 
> > > The following patch set provides a low-overhead system for collecting kernel
> > > tracing data of guests by a host in a virtualization environment.
> > 
> > So I just have one minor comment, please post a non-RFC version of the
> > patch.
> > 
> > Since you have an ACK from Steven for the ftrace patch, I guess Rusty
> > can push this in via his virtio tree?
> > 
> > I'll ack the virtio-console bits in the next series you send.
> 
> You didn't Ack, BTW.  At least, AFAICT.

Ah, sorry.  Will do that now.

		Amit

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-08-21  5:09 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24  2:36 [RFC PATCH 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
2012-07-24  2:37 ` [RFC PATCH 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
2012-08-09  9:00   ` Amit Shah
2012-08-09  9:12     ` Masami Hiramatsu
2012-07-24  2:37 ` [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer Yoshihiro YUNOMAE
2012-08-09  9:03   ` Amit Shah
2012-08-09  9:24     ` Borislav Petkov
2012-08-09  9:24     ` Masami Hiramatsu
2012-08-09  9:55       ` Amit Shah
2012-08-09  9:58         ` Avi Kivity
2012-08-09 10:14           ` Amit Shah
2012-08-09 12:35       ` Steven Rostedt
2012-07-24  2:37 ` [RFC PATCH 3/6] virtio/console: Wait until the port is ready on splice Yoshihiro YUNOMAE
2012-07-24  2:37 ` [RFC PATCH 4/6] ftrace: Allow stealing pages from pipe buffer Yoshihiro YUNOMAE
2012-07-30 22:12   ` Steven Rostedt
2012-07-24  2:37 ` [RFC PATCH 5/6] virtio/console: Allocate scatterlist according to the current pipe size Yoshihiro YUNOMAE
2012-07-24  2:37 ` [RFC PATCH 6/6] tools: Add guest trace agent as a user tool Yoshihiro YUNOMAE
2012-07-24  3:27 ` [RFC PATCH 0/6] virtio-trace: Support virtio-trace Masami Hiramatsu
2012-07-24 10:02 ` Stefan Hajnoczi
2012-07-24 11:03   ` Masami Hiramatsu
2012-07-24 11:19     ` Yoshihiro YUNOMAE
2012-07-24 13:41       ` Stefan Hajnoczi
2012-07-25  9:13         ` Yoshihiro YUNOMAE
2012-07-26 10:52           ` Stefan Hajnoczi
2012-07-24 13:43     ` Stefan Hajnoczi
2012-07-24 20:26 ` [Qemu-devel] " Blue Swirl
2012-07-25  8:15   ` Masami Hiramatsu
2012-07-27 18:58     ` Blue Swirl
2012-07-26 11:35 ` Amit Shah
2012-07-27  8:55   ` Yoshihiro YUNOMAE
2012-07-27  9:43     ` Amit Shah
2012-07-31  0:52       ` Yoshihiro YUNOMAE
2012-08-09 10:16 ` Amit Shah
2012-08-21  2:17   ` Rusty Russell
2012-08-21  5:09     ` Amit Shah

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).