All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-09 12:30 ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt

Hi All,

The following patch set provides a low-overhead system for collecting kernel
tracing data of guests by a host in a virtualization environment.

A guest OS generally shares some devices with other guests or a host, so
reasons of any problems occurring in a guest may be from other guests or a host.
Then, to collect some tracing data of a number of guests and a host is needed
when some problems occur in a virtualization environment. One of methods to
realize that is to collect tracing data of guests in a host. To do this, network
is generally used. However, high load will be taken to applications on guests
using network I/O because there are many network stack layers. Therefore,
a communication method for collecting the data without using network is needed.

We submitted a patch set of "IVRing", a ring-buffer driver constructed on
Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
this June. IVRing and the IVRing reader use POSIX shared memory each other
without using network, so a low-overhead system for collecting guest tracing
data is realized. However, this patch set has some problems as follows:
 - use IVShmem instead of virtio
 - create a new ring-buffer without using existing ring-buffer in kernel
 - scalability
   -- not support SMP environment
   -- buffer size limitation
   -- not support live migration (maybe difficult for realize this)

Therefore, we propose a new system "virtio-trace", which uses enhanced
virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
tracing data. In this system, there are 5 main components:
 (1) Ring-buffer of ftrace in a guest
     - When trace agent reads ring-buffer, a page is removed from ring-buffer.
 (2) Trace agent in the guest
     - Splice the page of ring-buffer to read_pipe using splice() without
       memory copying. Then, the page is spliced from write_pipe to virtio
       without memory copying.
 (3) Virtio-console driver in the guest
     - Pass the page to virtio-ring
 (4) Virtio-serial bus in QEMU
     - Copy the page to kernel pipe
 (5) Reader in the host
     - Read guest tracing data via FIFO(named pipe) 
Here, this patch set is only for guest kernels, so guest and host don't need to
run the same kernel.

In this patch set, we cannot get text-formatted data but raw-formatted data on
a host. perf and trace-cmd can actually translate raw to text by using
information of kernel or trace format under tracing/events directory in
debugfs. In the same way, if the information of a guest is exported to a host,
we can translate raw data of a guest to text data on a host or a remote host.
We will use 9pfs to export that.

***Evaluation***
When a host collects tracing data of a guest, the performance of using
virtio-trace is compared with that of using native(just running ftrace),
IVRing, and virtio-serial(normal method of read/write).

<environment>
The overview of this evaluation is as follows:
 (a) A guest on a KVM is prepared.
     - The guest is dedicated one physical CPU as a virtual CPU(VCPU).

 (b) The guest starts to write tracing data to ring-buffer of ftrace.
     - The probe points are all trace points of sched, timer, and kmem.

 (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
     tool in the guest.
     - Dhrystone 2 intends system performance by repeating integer arithmetic
       as a score.
     - Since higher score equals to better system performance, if the score
       decrease based on bare environment, it indicates that any operation
       disturbs the integer arithmetic. Then, we define the overhead of
       transporting trace data is calculated as follows:
		OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.

The performance of each method is compared as follows:
 [1] Native
     - only recording trace data to ring-buffer on a guest
 [2] Virtio-trace
     - running a trace agent on a guest
     - a reader on a host opens FIFO using cat command
 [3] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
       -- probe points are same as ftrace.
 [4] Virtio-serial(normal)
     - A reader(using cat) on a guest output trace data to a host using
       standard output via virtio-serial.

Other information is as follows:
 - host
   kernel: 3.3.7-1 (Fedora16)
   CPU: Intel Xeon x5660@2.80GHz(12core)
   Memory: 48GB

 - guest(only booting one guest)
   kernel: 3.5.0-rc4+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 1GB

<result>
3 patterns based on the bare environment were indicated as follows:
	                   Scores      overhead against [0] Native
    [0] Native:          28807569.5               -
    [1] Virtio-trace:    28685049.5             0.43%
    [2] IVRing:          28418595.5             1.35%
    [3] Virtio-serial:   13262258.7            53.96%


***Just enhancement ideas***
 - Support for trace-cmd
 - Support for 9pfs protocol
 - Support for non-blocking mode in QEMU

v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function in 1/6
 - Just a minor fix for avoiding a confliction with previous patch in 5/6
 - Cleanup (change fprintf() to pr_err() and an include guard) in 6/6

Thank you,

---

Masami Hiramatsu (5):
      virtio/console: Allocate scatterlist according to the current pipe size
      ftrace: Allow stealing pages from pipe buffer
      virtio/console: Wait until the port is ready on splice
      virtio/console: Add a failback for unstealable pipe buffer
      virtio/console: Add splice_write support

Yoshihiro YUNOMAE (1):
      tools: Add guest trace agent as a user tool


 drivers/char/virtio_console.c               |  198 ++++++++++++++++++--
 kernel/trace/trace.c                        |    8 -
 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 8 files changed, 985 insertions(+), 27 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-09 12:30 ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

Hi All,

The following patch set provides a low-overhead system for collecting kernel
tracing data of guests by a host in a virtualization environment.

A guest OS generally shares some devices with other guests or a host, so
reasons of any problems occurring in a guest may be from other guests or a host.
Then, to collect some tracing data of a number of guests and a host is needed
when some problems occur in a virtualization environment. One of methods to
realize that is to collect tracing data of guests in a host. To do this, network
is generally used. However, high load will be taken to applications on guests
using network I/O because there are many network stack layers. Therefore,
a communication method for collecting the data without using network is needed.

We submitted a patch set of "IVRing", a ring-buffer driver constructed on
Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
this June. IVRing and the IVRing reader use POSIX shared memory each other
without using network, so a low-overhead system for collecting guest tracing
data is realized. However, this patch set has some problems as follows:
 - use IVShmem instead of virtio
 - create a new ring-buffer without using existing ring-buffer in kernel
 - scalability
   -- not support SMP environment
   -- buffer size limitation
   -- not support live migration (maybe difficult for realize this)

Therefore, we propose a new system "virtio-trace", which uses enhanced
virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
tracing data. In this system, there are 5 main components:
 (1) Ring-buffer of ftrace in a guest
     - When trace agent reads ring-buffer, a page is removed from ring-buffer.
 (2) Trace agent in the guest
     - Splice the page of ring-buffer to read_pipe using splice() without
       memory copying. Then, the page is spliced from write_pipe to virtio
       without memory copying.
 (3) Virtio-console driver in the guest
     - Pass the page to virtio-ring
 (4) Virtio-serial bus in QEMU
     - Copy the page to kernel pipe
 (5) Reader in the host
     - Read guest tracing data via FIFO(named pipe) 
Here, this patch set is only for guest kernels, so guest and host don't need to
run the same kernel.

In this patch set, we cannot get text-formatted data but raw-formatted data on
a host. perf and trace-cmd can actually translate raw to text by using
information of kernel or trace format under tracing/events directory in
debugfs. In the same way, if the information of a guest is exported to a host,
we can translate raw data of a guest to text data on a host or a remote host.
We will use 9pfs to export that.

***Evaluation***
When a host collects tracing data of a guest, the performance of using
virtio-trace is compared with that of using native(just running ftrace),
IVRing, and virtio-serial(normal method of read/write).

<environment>
The overview of this evaluation is as follows:
 (a) A guest on a KVM is prepared.
     - The guest is dedicated one physical CPU as a virtual CPU(VCPU).

 (b) The guest starts to write tracing data to ring-buffer of ftrace.
     - The probe points are all trace points of sched, timer, and kmem.

 (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
     tool in the guest.
     - Dhrystone 2 intends system performance by repeating integer arithmetic
       as a score.
     - Since higher score equals to better system performance, if the score
       decrease based on bare environment, it indicates that any operation
       disturbs the integer arithmetic. Then, we define the overhead of
       transporting trace data is calculated as follows:
		OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.

The performance of each method is compared as follows:
 [1] Native
     - only recording trace data to ring-buffer on a guest
 [2] Virtio-trace
     - running a trace agent on a guest
     - a reader on a host opens FIFO using cat command
 [3] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
       -- probe points are same as ftrace.
 [4] Virtio-serial(normal)
     - A reader(using cat) on a guest output trace data to a host using
       standard output via virtio-serial.

Other information is as follows:
 - host
   kernel: 3.3.7-1 (Fedora16)
   CPU: Intel Xeon x5660@2.80GHz(12core)
   Memory: 48GB

 - guest(only booting one guest)
   kernel: 3.5.0-rc4+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 1GB

<result>
3 patterns based on the bare environment were indicated as follows:
	                   Scores      overhead against [0] Native
    [0] Native:          28807569.5               -
    [1] Virtio-trace:    28685049.5             0.43%
    [2] IVRing:          28418595.5             1.35%
    [3] Virtio-serial:   13262258.7            53.96%


***Just enhancement ideas***
 - Support for trace-cmd
 - Support for 9pfs protocol
 - Support for non-blocking mode in QEMU

v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function in 1/6
 - Just a minor fix for avoiding a confliction with previous patch in 5/6
 - Cleanup (change fprintf() to pr_err() and an include guard) in 6/6

Thank you,

---

Masami Hiramatsu (5):
      virtio/console: Allocate scatterlist according to the current pipe size
      ftrace: Allow stealing pages from pipe buffer
      virtio/console: Wait until the port is ready on splice
      virtio/console: Add a failback for unstealable pipe buffer
      virtio/console: Add splice_write support

Yoshihiro YUNOMAE (1):
      tools: Add guest trace agent as a user tool


 drivers/char/virtio_console.c               |  198 ++++++++++++++++++--
 kernel/trace/trace.c                        |    8 -
 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 8 files changed, 985 insertions(+), 27 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-09 12:30 ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

Hi All,

The following patch set provides a low-overhead system for collecting kernel
tracing data of guests by a host in a virtualization environment.

A guest OS generally shares some devices with other guests or a host, so
reasons of any problems occurring in a guest may be from other guests or a host.
Then, to collect some tracing data of a number of guests and a host is needed
when some problems occur in a virtualization environment. One of methods to
realize that is to collect tracing data of guests in a host. To do this, network
is generally used. However, high load will be taken to applications on guests
using network I/O because there are many network stack layers. Therefore,
a communication method for collecting the data without using network is needed.

We submitted a patch set of "IVRing", a ring-buffer driver constructed on
Inter-VM shared memory (IVShmem), to LKML http://lwn.net/Articles/500304/ in
this June. IVRing and the IVRing reader use POSIX shared memory each other
without using network, so a low-overhead system for collecting guest tracing
data is realized. However, this patch set has some problems as follows:
 - use IVShmem instead of virtio
 - create a new ring-buffer without using existing ring-buffer in kernel
 - scalability
   -- not support SMP environment
   -- buffer size limitation
   -- not support live migration (maybe difficult for realize this)

Therefore, we propose a new system "virtio-trace", which uses enhanced
virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
tracing data. In this system, there are 5 main components:
 (1) Ring-buffer of ftrace in a guest
     - When trace agent reads ring-buffer, a page is removed from ring-buffer.
 (2) Trace agent in the guest
     - Splice the page of ring-buffer to read_pipe using splice() without
       memory copying. Then, the page is spliced from write_pipe to virtio
       without memory copying.
 (3) Virtio-console driver in the guest
     - Pass the page to virtio-ring
 (4) Virtio-serial bus in QEMU
     - Copy the page to kernel pipe
 (5) Reader in the host
     - Read guest tracing data via FIFO(named pipe) 
Here, this patch set is only for guest kernels, so guest and host don't need to
run the same kernel.

In this patch set, we cannot get text-formatted data but raw-formatted data on
a host. perf and trace-cmd can actually translate raw to text by using
information of kernel or trace format under tracing/events directory in
debugfs. In the same way, if the information of a guest is exported to a host,
we can translate raw data of a guest to text data on a host or a remote host.
We will use 9pfs to export that.

***Evaluation***
When a host collects tracing data of a guest, the performance of using
virtio-trace is compared with that of using native(just running ftrace),
IVRing, and virtio-serial(normal method of read/write).

<environment>
The overview of this evaluation is as follows:
 (a) A guest on a KVM is prepared.
     - The guest is dedicated one physical CPU as a virtual CPU(VCPU).

 (b) The guest starts to write tracing data to ring-buffer of ftrace.
     - The probe points are all trace points of sched, timer, and kmem.

 (c) Writing trace data, dhrystone 2 in UNIX bench is executed as a benchmark
     tool in the guest.
     - Dhrystone 2 intends system performance by repeating integer arithmetic
       as a score.
     - Since higher score equals to better system performance, if the score
       decrease based on bare environment, it indicates that any operation
       disturbs the integer arithmetic. Then, we define the overhead of
       transporting trace data is calculated as follows:
		OVERHEAD = (1 - SCORE_OF_A_METHOD/NATIVE_SCORE) * 100.

The performance of each method is compared as follows:
 [1] Native
     - only recording trace data to ring-buffer on a guest
 [2] Virtio-trace
     - running a trace agent on a guest
     - a reader on a host opens FIFO using cat command
 [3] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
       -- probe points are same as ftrace.
 [4] Virtio-serial(normal)
     - A reader(using cat) on a guest output trace data to a host using
       standard output via virtio-serial.

Other information is as follows:
 - host
   kernel: 3.3.7-1 (Fedora16)
   CPU: Intel Xeon x5660@2.80GHz(12core)
   Memory: 48GB

 - guest(only booting one guest)
   kernel: 3.5.0-rc4+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 1GB

<result>
3 patterns based on the bare environment were indicated as follows:
	                   Scores      overhead against [0] Native
    [0] Native:          28807569.5               -
    [1] Virtio-trace:    28685049.5             0.43%
    [2] IVRing:          28418595.5             1.35%
    [3] Virtio-serial:   13262258.7            53.96%


***Just enhancement ideas***
 - Support for trace-cmd
 - Support for 9pfs protocol
 - Support for non-blocking mode in QEMU

v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function in 1/6
 - Just a minor fix for avoiding a confliction with previous patch in 5/6
 - Cleanup (change fprintf() to pr_err() and an include guard) in 6/6

Thank you,

---

Masami Hiramatsu (5):
      virtio/console: Allocate scatterlist according to the current pipe size
      ftrace: Allow stealing pages from pipe buffer
      virtio/console: Wait until the port is ready on splice
      virtio/console: Add a failback for unstealable pipe buffer
      virtio/console: Add splice_write support

Yoshihiro YUNOMAE (1):
      tools: Add guest trace agent as a user tool


 drivers/char/virtio_console.c               |  198 ++++++++++++++++++--
 kernel/trace/trace.c                        |    8 -
 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 8 files changed, 985 insertions(+), 27 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 1/6] virtio/console: Add splice_write support
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Masami Hiramatsu

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Enable to use splice_write from pipe to virtio-console port.
This steals pages from pipe and directly send it to host.

Note that this may accelerate only the guest to host path.

Changes in v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |  136 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index cdf2f54..730816c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -24,6 +24,8 @@
 #include <linux/err.h>
 #include <linux/freezer.h>
 #include <linux/fs.h>
+#include <linux/splice.h>
+#include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/poll.h>
@@ -227,6 +229,7 @@ struct port {
 	bool guest_connected;
 };
 
+#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -474,26 +477,52 @@ static ssize_t send_control_msg(struct port *port, unsigned int event,
 	return 0;
 }
 
+struct buffer_token {
+	union {
+		void *buf;
+		struct scatterlist *sg;
+	} u;
+	bool sgpages;
+};
+
+static void reclaim_sg_pages(struct scatterlist *sg)
+{
+	int i;
+	struct page *page;
+
+	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+		page = sg_page(&sg[i]);
+		if (!page)
+			break;
+		put_page(page);
+	}
+	kfree(sg);
+}
+
 /* Callers must take the port->outvq_lock */
 static void reclaim_consumed_buffers(struct port *port)
 {
-	void *buf;
+	struct buffer_token *tok;
 	unsigned int len;
 
 	if (!port->portdev) {
 		/* Device has been unplugged.  vqs are already gone. */
 		return;
 	}
-	while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
-		kfree(buf);
+	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
+		if (tok->sgpages)
+			reclaim_sg_pages(tok->u.sg);
+		else
+			kfree(tok->u.buf);
+		kfree(tok);
 		port->outvq_full = false;
 	}
 }
 
-static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
-			bool nonblock)
+static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
+			      int nents, size_t in_count,
+			      struct buffer_token *tok, bool nonblock)
 {
-	struct scatterlist sg[1];
 	struct virtqueue *out_vq;
 	ssize_t ret;
 	unsigned long flags;
@@ -505,8 +534,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 
 	reclaim_consumed_buffers(port);
 
-	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(out_vq, sg, nents, 0, tok, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
@@ -544,6 +572,37 @@ done:
 	return in_count;
 }
 
+static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
+			bool nonblock)
+{
+	struct scatterlist sg[1];
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = false;
+	tok->u.buf = in_buf;
+
+	sg_init_one(sg, in_buf, in_count);
+
+	return __send_to_port(port, sg, 1, in_count, tok, nonblock);
+}
+
+static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
+			  size_t in_count, bool nonblock)
+{
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = true;
+	tok->u.sg = sg;
+
+	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
+}
+
 /*
  * Give out the data that's requested from the buffer that we have
  * queued up.
@@ -725,6 +784,66 @@ out:
 	return ret;
 }
 
+struct sg_list {
+	unsigned int n;
+	size_t len;
+	struct scatterlist *sg;
+};
+
+static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			struct splice_desc *sd)
+{
+	struct sg_list *sgl = sd->u.data;
+	unsigned int len = 0;
+
+	if (sgl->n == MAX_SPLICE_PAGES)
+		return 0;
+
+	/* Try lock this page */
+	if (buf->ops->steal(pipe, buf) == 0) {
+		/* Get reference and unlock page for moving */
+		get_page(buf->page);
+		unlock_page(buf->page);
+
+		len = min(buf->len, sd->len);
+		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
+		sgl->n++;
+		sgl->len += len;
+	}
+
+	return len;
+}
+
+/* Faster zero-copy write by splicing */
+static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
+				      struct file *filp, loff_t *ppos,
+				      size_t len, unsigned int flags)
+{
+	struct port *port = filp->private_data;
+	struct sg_list sgl;
+	ssize_t ret;
+	struct splice_desc sd = {
+		.total_len = len,
+		.flags = flags,
+		.pos = *ppos,
+		.u.data = &sgl,
+	};
+
+	sgl.n = 0;
+	sgl.len = 0;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
+			 GFP_KERNEL);
+	if (unlikely(!sgl.sg))
+		return -ENOMEM;
+
+	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
+	if (likely(ret > 0))
+		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
+
+	return ret;
+}
+
 static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
 {
 	struct port *port;
@@ -856,6 +975,7 @@ static const struct file_operations port_fops = {
 	.open  = port_fops_open,
 	.read  = port_fops_read,
 	.write = port_fops_write,
+	.splice_write = port_fops_splice_write,
 	.poll  = port_fops_poll,
 	.release = port_fops_release,
 	.fasync = port_fops_fasync,



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 1/6] virtio/console: Add splice_write support
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Enable to use splice_write from pipe to virtio-console port.
This steals pages from pipe and directly send it to host.

Note that this may accelerate only the guest to host path.

Changes in v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |  136 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index cdf2f54..730816c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -24,6 +24,8 @@
 #include <linux/err.h>
 #include <linux/freezer.h>
 #include <linux/fs.h>
+#include <linux/splice.h>
+#include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/poll.h>
@@ -227,6 +229,7 @@ struct port {
 	bool guest_connected;
 };
 
+#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -474,26 +477,52 @@ static ssize_t send_control_msg(struct port *port, unsigned int event,
 	return 0;
 }
 
+struct buffer_token {
+	union {
+		void *buf;
+		struct scatterlist *sg;
+	} u;
+	bool sgpages;
+};
+
+static void reclaim_sg_pages(struct scatterlist *sg)
+{
+	int i;
+	struct page *page;
+
+	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+		page = sg_page(&sg[i]);
+		if (!page)
+			break;
+		put_page(page);
+	}
+	kfree(sg);
+}
+
 /* Callers must take the port->outvq_lock */
 static void reclaim_consumed_buffers(struct port *port)
 {
-	void *buf;
+	struct buffer_token *tok;
 	unsigned int len;
 
 	if (!port->portdev) {
 		/* Device has been unplugged.  vqs are already gone. */
 		return;
 	}
-	while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
-		kfree(buf);
+	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
+		if (tok->sgpages)
+			reclaim_sg_pages(tok->u.sg);
+		else
+			kfree(tok->u.buf);
+		kfree(tok);
 		port->outvq_full = false;
 	}
 }
 
-static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
-			bool nonblock)
+static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
+			      int nents, size_t in_count,
+			      struct buffer_token *tok, bool nonblock)
 {
-	struct scatterlist sg[1];
 	struct virtqueue *out_vq;
 	ssize_t ret;
 	unsigned long flags;
@@ -505,8 +534,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 
 	reclaim_consumed_buffers(port);
 
-	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(out_vq, sg, nents, 0, tok, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
@@ -544,6 +572,37 @@ done:
 	return in_count;
 }
 
+static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
+			bool nonblock)
+{
+	struct scatterlist sg[1];
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = false;
+	tok->u.buf = in_buf;
+
+	sg_init_one(sg, in_buf, in_count);
+
+	return __send_to_port(port, sg, 1, in_count, tok, nonblock);
+}
+
+static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
+			  size_t in_count, bool nonblock)
+{
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = true;
+	tok->u.sg = sg;
+
+	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
+}
+
 /*
  * Give out the data that's requested from the buffer that we have
  * queued up.
@@ -725,6 +784,66 @@ out:
 	return ret;
 }
 
+struct sg_list {
+	unsigned int n;
+	size_t len;
+	struct scatterlist *sg;
+};
+
+static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			struct splice_desc *sd)
+{
+	struct sg_list *sgl = sd->u.data;
+	unsigned int len = 0;
+
+	if (sgl->n == MAX_SPLICE_PAGES)
+		return 0;
+
+	/* Try lock this page */
+	if (buf->ops->steal(pipe, buf) == 0) {
+		/* Get reference and unlock page for moving */
+		get_page(buf->page);
+		unlock_page(buf->page);
+
+		len = min(buf->len, sd->len);
+		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
+		sgl->n++;
+		sgl->len += len;
+	}
+
+	return len;
+}
+
+/* Faster zero-copy write by splicing */
+static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
+				      struct file *filp, loff_t *ppos,
+				      size_t len, unsigned int flags)
+{
+	struct port *port = filp->private_data;
+	struct sg_list sgl;
+	ssize_t ret;
+	struct splice_desc sd = {
+		.total_len = len,
+		.flags = flags,
+		.pos = *ppos,
+		.u.data = &sgl,
+	};
+
+	sgl.n = 0;
+	sgl.len = 0;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
+			 GFP_KERNEL);
+	if (unlikely(!sgl.sg))
+		return -ENOMEM;
+
+	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
+	if (likely(ret > 0))
+		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
+
+	return ret;
+}
+
 static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
 {
 	struct port *port;
@@ -856,6 +975,7 @@ static const struct file_operations port_fops = {
 	.open  = port_fops_open,
 	.read  = port_fops_read,
 	.write = port_fops_write,
+	.splice_write = port_fops_splice_write,
 	.poll  = port_fops_poll,
 	.release = port_fops_release,
 	.fasync = port_fops_fasync,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 1/6] virtio/console: Add splice_write support
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Enable to use splice_write from pipe to virtio-console port.
This steals pages from pipe and directly send it to host.

Note that this may accelerate only the guest to host path.

Changes in v2:
 - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |  136 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index cdf2f54..730816c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -24,6 +24,8 @@
 #include <linux/err.h>
 #include <linux/freezer.h>
 #include <linux/fs.h>
+#include <linux/splice.h>
+#include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/poll.h>
@@ -227,6 +229,7 @@ struct port {
 	bool guest_connected;
 };
 
+#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -474,26 +477,52 @@ static ssize_t send_control_msg(struct port *port, unsigned int event,
 	return 0;
 }
 
+struct buffer_token {
+	union {
+		void *buf;
+		struct scatterlist *sg;
+	} u;
+	bool sgpages;
+};
+
+static void reclaim_sg_pages(struct scatterlist *sg)
+{
+	int i;
+	struct page *page;
+
+	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+		page = sg_page(&sg[i]);
+		if (!page)
+			break;
+		put_page(page);
+	}
+	kfree(sg);
+}
+
 /* Callers must take the port->outvq_lock */
 static void reclaim_consumed_buffers(struct port *port)
 {
-	void *buf;
+	struct buffer_token *tok;
 	unsigned int len;
 
 	if (!port->portdev) {
 		/* Device has been unplugged.  vqs are already gone. */
 		return;
 	}
-	while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
-		kfree(buf);
+	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
+		if (tok->sgpages)
+			reclaim_sg_pages(tok->u.sg);
+		else
+			kfree(tok->u.buf);
+		kfree(tok);
 		port->outvq_full = false;
 	}
 }
 
-static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
-			bool nonblock)
+static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
+			      int nents, size_t in_count,
+			      struct buffer_token *tok, bool nonblock)
 {
-	struct scatterlist sg[1];
 	struct virtqueue *out_vq;
 	ssize_t ret;
 	unsigned long flags;
@@ -505,8 +534,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 
 	reclaim_consumed_buffers(port);
 
-	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(out_vq, sg, nents, 0, tok, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
@@ -544,6 +572,37 @@ done:
 	return in_count;
 }
 
+static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
+			bool nonblock)
+{
+	struct scatterlist sg[1];
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = false;
+	tok->u.buf = in_buf;
+
+	sg_init_one(sg, in_buf, in_count);
+
+	return __send_to_port(port, sg, 1, in_count, tok, nonblock);
+}
+
+static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
+			  size_t in_count, bool nonblock)
+{
+	struct buffer_token *tok;
+
+	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
+	if (!tok)
+		return -ENOMEM;
+	tok->sgpages = true;
+	tok->u.sg = sg;
+
+	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
+}
+
 /*
  * Give out the data that's requested from the buffer that we have
  * queued up.
@@ -725,6 +784,66 @@ out:
 	return ret;
 }
 
+struct sg_list {
+	unsigned int n;
+	size_t len;
+	struct scatterlist *sg;
+};
+
+static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			struct splice_desc *sd)
+{
+	struct sg_list *sgl = sd->u.data;
+	unsigned int len = 0;
+
+	if (sgl->n == MAX_SPLICE_PAGES)
+		return 0;
+
+	/* Try lock this page */
+	if (buf->ops->steal(pipe, buf) == 0) {
+		/* Get reference and unlock page for moving */
+		get_page(buf->page);
+		unlock_page(buf->page);
+
+		len = min(buf->len, sd->len);
+		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
+		sgl->n++;
+		sgl->len += len;
+	}
+
+	return len;
+}
+
+/* Faster zero-copy write by splicing */
+static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
+				      struct file *filp, loff_t *ppos,
+				      size_t len, unsigned int flags)
+{
+	struct port *port = filp->private_data;
+	struct sg_list sgl;
+	ssize_t ret;
+	struct splice_desc sd = {
+		.total_len = len,
+		.flags = flags,
+		.pos = *ppos,
+		.u.data = &sgl,
+	};
+
+	sgl.n = 0;
+	sgl.len = 0;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
+			 GFP_KERNEL);
+	if (unlikely(!sgl.sg))
+		return -ENOMEM;
+
+	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
+	if (likely(ret > 0))
+		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
+
+	return ret;
+}
+
 static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
 {
 	struct port *port;
@@ -856,6 +975,7 @@ static const struct file_operations port_fops = {
 	.open  = port_fops_open,
 	.read  = port_fops_read,
 	.write = port_fops_write,
+	.splice_write = port_fops_splice_write,
 	.poll  = port_fops_poll,
 	.release = port_fops_release,
 	.fasync = port_fops_fasync,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 2/6] virtio/console: Add a failback for unstealable pipe buffer
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Masami Hiramatsu

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Add a failback memcpy path for unstealable pipe buffer.
If buf->ops->steal() fails, virtio-serial tries to
copy the page contents to an allocated page, instead
of just failing splice().

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 730816c..22b7373 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			struct splice_desc *sd)
 {
 	struct sg_list *sgl = sd->u.data;
-	unsigned int len = 0;
+	unsigned int offset, len;
 
 	if (sgl->n == MAX_SPLICE_PAGES)
 		return 0;
@@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 
 		len = min(buf->len, sd->len);
 		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
-		sgl->n++;
-		sgl->len += len;
+	} else {
+		/* Failback to copying a page */
+		struct page *page = alloc_page(GFP_KERNEL);
+		char *src = buf->ops->map(pipe, buf, 1);
+		char *dst;
+
+		if (!page)
+			return -ENOMEM;
+		dst = kmap(page);
+
+		offset = sd->pos & ~PAGE_MASK;
+
+		len = sd->len;
+		if (len + offset > PAGE_SIZE)
+			len = PAGE_SIZE - offset;
+
+		memcpy(dst + offset, src + buf->offset, len);
+
+		kunmap(page);
+		buf->ops->unmap(pipe, buf, src);
+
+		sg_set_page(&(sgl->sg[sgl->n]), page, len, offset);
 	}
+	sgl->n++;
+	sgl->len += len;
 
 	return len;
 }



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 2/6] virtio/console: Add a failback for unstealable pipe buffer
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Add a failback memcpy path for unstealable pipe buffer.
If buf->ops->steal() fails, virtio-serial tries to
copy the page contents to an allocated page, instead
of just failing splice().

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 730816c..22b7373 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			struct splice_desc *sd)
 {
 	struct sg_list *sgl = sd->u.data;
-	unsigned int len = 0;
+	unsigned int offset, len;
 
 	if (sgl->n == MAX_SPLICE_PAGES)
 		return 0;
@@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 
 		len = min(buf->len, sd->len);
 		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
-		sgl->n++;
-		sgl->len += len;
+	} else {
+		/* Failback to copying a page */
+		struct page *page = alloc_page(GFP_KERNEL);
+		char *src = buf->ops->map(pipe, buf, 1);
+		char *dst;
+
+		if (!page)
+			return -ENOMEM;
+		dst = kmap(page);
+
+		offset = sd->pos & ~PAGE_MASK;
+
+		len = sd->len;
+		if (len + offset > PAGE_SIZE)
+			len = PAGE_SIZE - offset;
+
+		memcpy(dst + offset, src + buf->offset, len);
+
+		kunmap(page);
+		buf->ops->unmap(pipe, buf, src);
+
+		sg_set_page(&(sgl->sg[sgl->n]), page, len, offset);
 	}
+	sgl->n++;
+	sgl->len += len;
 
 	return len;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 2/6] virtio/console: Add a failback for unstealable pipe buffer
@ 2012-08-09 12:30   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Add a failback memcpy path for unstealable pipe buffer.
If buf->ops->steal() fails, virtio-serial tries to
copy the page contents to an allocated page, instead
of just failing splice().

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 730816c..22b7373 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			struct splice_desc *sd)
 {
 	struct sg_list *sgl = sd->u.data;
-	unsigned int len = 0;
+	unsigned int offset, len;
 
 	if (sgl->n == MAX_SPLICE_PAGES)
 		return 0;
@@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 
 		len = min(buf->len, sd->len);
 		sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
-		sgl->n++;
-		sgl->len += len;
+	} else {
+		/* Failback to copying a page */
+		struct page *page = alloc_page(GFP_KERNEL);
+		char *src = buf->ops->map(pipe, buf, 1);
+		char *dst;
+
+		if (!page)
+			return -ENOMEM;
+		dst = kmap(page);
+
+		offset = sd->pos & ~PAGE_MASK;
+
+		len = sd->len;
+		if (len + offset > PAGE_SIZE)
+			len = PAGE_SIZE - offset;
+
+		memcpy(dst + offset, src + buf->offset, len);
+
+		kunmap(page);
+		buf->ops->unmap(pipe, buf, src);
+
+		sg_set_page(&(sgl->sg[sgl->n]), page, len, offset);
 	}
+	sgl->n++;
+	sgl->len += len;
 
 	return len;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 3/6] virtio/console: Wait until the port is ready on splice
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Masami Hiramatsu

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Wait if the port is not connected or full on splice
like as write is doing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   39 +++++++++++++++++++++++++++------------
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 22b7373..b2fc2ab 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -724,6 +724,26 @@ static ssize_t port_fops_read(struct file *filp, char __user *ubuf,
 	return fill_readbuf(port, ubuf, count, true);
 }
 
+static int wait_port_writable(struct port *port, bool nonblock)
+{
+	int ret;
+
+	if (will_write_block(port)) {
+		if (nonblock)
+			return -EAGAIN;
+
+		ret = wait_event_freezable(port->waitqueue,
+					   !will_write_block(port));
+		if (ret < 0)
+			return ret;
+	}
+	/* Port got hot-unplugged. */
+	if (!port->guest_connected)
+		return -ENODEV;
+
+	return 0;
+}
+
 static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 			       size_t count, loff_t *offp)
 {
@@ -740,18 +760,9 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 
 	nonblock = filp->f_flags & O_NONBLOCK;
 
-	if (will_write_block(port)) {
-		if (nonblock)
-			return -EAGAIN;
-
-		ret = wait_event_freezable(port->waitqueue,
-					   !will_write_block(port));
-		if (ret < 0)
-			return ret;
-	}
-	/* Port got hot-unplugged. */
-	if (!port->guest_connected)
-		return -ENODEV;
+	ret = wait_port_writable(port, nonblock);
+	if (ret < 0)
+		return ret;
 
 	count = min((size_t)(32 * 1024), count);
 
@@ -851,6 +862,10 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 		.u.data = &sgl,
 	};
 
+	ret = wait_port_writable(port, filp->f_flags & O_NONBLOCK);
+	if (ret < 0)
+		return ret;
+
 	sgl.n = 0;
 	sgl.len = 0;
 	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 3/6] virtio/console: Wait until the port is ready on splice
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Wait if the port is not connected or full on splice
like as write is doing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   39 +++++++++++++++++++++++++++------------
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 22b7373..b2fc2ab 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -724,6 +724,26 @@ static ssize_t port_fops_read(struct file *filp, char __user *ubuf,
 	return fill_readbuf(port, ubuf, count, true);
 }
 
+static int wait_port_writable(struct port *port, bool nonblock)
+{
+	int ret;
+
+	if (will_write_block(port)) {
+		if (nonblock)
+			return -EAGAIN;
+
+		ret = wait_event_freezable(port->waitqueue,
+					   !will_write_block(port));
+		if (ret < 0)
+			return ret;
+	}
+	/* Port got hot-unplugged. */
+	if (!port->guest_connected)
+		return -ENODEV;
+
+	return 0;
+}
+
 static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 			       size_t count, loff_t *offp)
 {
@@ -740,18 +760,9 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 
 	nonblock = filp->f_flags & O_NONBLOCK;
 
-	if (will_write_block(port)) {
-		if (nonblock)
-			return -EAGAIN;
-
-		ret = wait_event_freezable(port->waitqueue,
-					   !will_write_block(port));
-		if (ret < 0)
-			return ret;
-	}
-	/* Port got hot-unplugged. */
-	if (!port->guest_connected)
-		return -ENODEV;
+	ret = wait_port_writable(port, nonblock);
+	if (ret < 0)
+		return ret;
 
 	count = min((size_t)(32 * 1024), count);
 
@@ -851,6 +862,10 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 		.u.data = &sgl,
 	};
 
+	ret = wait_port_writable(port, filp->f_flags & O_NONBLOCK);
+	if (ret < 0)
+		return ret;
+
 	sgl.n = 0;
 	sgl.len = 0;
 	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 3/6] virtio/console: Wait until the port is ready on splice
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Wait if the port is not connected or full on splice
like as write is doing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   39 +++++++++++++++++++++++++++------------
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 22b7373..b2fc2ab 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -724,6 +724,26 @@ static ssize_t port_fops_read(struct file *filp, char __user *ubuf,
 	return fill_readbuf(port, ubuf, count, true);
 }
 
+static int wait_port_writable(struct port *port, bool nonblock)
+{
+	int ret;
+
+	if (will_write_block(port)) {
+		if (nonblock)
+			return -EAGAIN;
+
+		ret = wait_event_freezable(port->waitqueue,
+					   !will_write_block(port));
+		if (ret < 0)
+			return ret;
+	}
+	/* Port got hot-unplugged. */
+	if (!port->guest_connected)
+		return -ENODEV;
+
+	return 0;
+}
+
 static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 			       size_t count, loff_t *offp)
 {
@@ -740,18 +760,9 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
 
 	nonblock = filp->f_flags & O_NONBLOCK;
 
-	if (will_write_block(port)) {
-		if (nonblock)
-			return -EAGAIN;
-
-		ret = wait_event_freezable(port->waitqueue,
-					   !will_write_block(port));
-		if (ret < 0)
-			return ret;
-	}
-	/* Port got hot-unplugged. */
-	if (!port->guest_connected)
-		return -ENODEV;
+	ret = wait_port_writable(port, nonblock);
+	if (ret < 0)
+		return ret;
 
 	count = min((size_t)(32 * 1024), count);
 
@@ -851,6 +862,10 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 		.u.data = &sgl,
 	};
 
+	ret = wait_port_writable(port, filp->f_flags & O_NONBLOCK);
+	if (ret < 0)
+		return ret;
+
 	sgl.n = 0;
 	sgl.len = 0;
 	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 4/6] ftrace: Allow stealing pages from pipe buffer
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Masami Hiramatsu, Steven Rostedt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Use generic steal operation on pipe buffer to allow stealing
ring buffer's read page from pipe buffer.

Note that this could reduce the performance of splice on the
splice_write side operation without affinity setting.
Since the ring buffer's read pages are allocated on the
tracing-node, but the splice user does not always execute
splice write side operation on the same node. In this case,
the page will be accessed from the another node.
Thus, it is strongly recommended to assign the splicing
thread to corresponding node.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---

 kernel/trace/trace.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a120f98..ae01930 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4194,12 +4194,6 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static int buffer_pipe_buf_steal(struct pipe_inode_info *pipe,
-				 struct pipe_buffer *buf)
-{
-	return 1;
-}
-
 static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
@@ -4215,7 +4209,7 @@ static const struct pipe_buf_operations buffer_pipe_buf_ops = {
 	.unmap			= generic_pipe_buf_unmap,
 	.confirm		= generic_pipe_buf_confirm,
 	.release		= buffer_pipe_buf_release,
-	.steal			= buffer_pipe_buf_steal,
+	.steal			= generic_pipe_buf_steal,
 	.get			= buffer_pipe_buf_get,
 };
 



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 4/6] ftrace: Allow stealing pages from pipe buffer
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Use generic steal operation on pipe buffer to allow stealing
ring buffer's read page from pipe buffer.

Note that this could reduce the performance of splice on the
splice_write side operation without affinity setting.
Since the ring buffer's read pages are allocated on the
tracing-node, but the splice user does not always execute
splice write side operation on the same node. In this case,
the page will be accessed from the another node.
Thus, it is strongly recommended to assign the splicing
thread to corresponding node.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---

 kernel/trace/trace.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a120f98..ae01930 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4194,12 +4194,6 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static int buffer_pipe_buf_steal(struct pipe_inode_info *pipe,
-				 struct pipe_buffer *buf)
-{
-	return 1;
-}
-
 static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
@@ -4215,7 +4209,7 @@ static const struct pipe_buf_operations buffer_pipe_buf_ops = {
 	.unmap			= generic_pipe_buf_unmap,
 	.confirm		= generic_pipe_buf_confirm,
 	.release		= buffer_pipe_buf_release,
-	.steal			= buffer_pipe_buf_steal,
+	.steal			= generic_pipe_buf_steal,
 	.get			= buffer_pipe_buf_get,
 };
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 4/6] ftrace: Allow stealing pages from pipe buffer
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Use generic steal operation on pipe buffer to allow stealing
ring buffer's read page from pipe buffer.

Note that this could reduce the performance of splice on the
splice_write side operation without affinity setting.
Since the ring buffer's read pages are allocated on the
tracing-node, but the splice user does not always execute
splice write side operation on the same node. In this case,
the page will be accessed from the another node.
Thus, it is strongly recommended to assign the splicing
thread to corresponding node.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---

 kernel/trace/trace.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a120f98..ae01930 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4194,12 +4194,6 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static int buffer_pipe_buf_steal(struct pipe_inode_info *pipe,
-				 struct pipe_buffer *buf)
-{
-	return 1;
-}
-
 static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
@@ -4215,7 +4209,7 @@ static const struct pipe_buf_operations buffer_pipe_buf_ops = {
 	.unmap			= generic_pipe_buf_unmap,
 	.confirm		= generic_pipe_buf_confirm,
 	.release		= buffer_pipe_buf_release,
-	.steal			= buffer_pipe_buf_steal,
+	.steal			= generic_pipe_buf_steal,
 	.get			= buffer_pipe_buf_get,
 };

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 5/6] virtio/console: Allocate scatterlist according to the current pipe size
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Masami Hiramatsu

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Allocate scatterlist according to the current pipe size.
This allows splicing bigger buffer if the pipe size has
been changed by fcntl.

Changes in v2:
 - Just a minor fix for avoiding a confliction with previous patch.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index b2fc2ab..e88f843 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -229,7 +229,6 @@ struct port {
 	bool guest_connected;
 };
 
-#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -482,15 +481,16 @@ struct buffer_token {
 		void *buf;
 		struct scatterlist *sg;
 	} u;
-	bool sgpages;
+	/* If sgpages == 0 then buf is used, else sg is used */
+	unsigned int sgpages;
 };
 
-static void reclaim_sg_pages(struct scatterlist *sg)
+static void reclaim_sg_pages(struct scatterlist *sg, unsigned int nrpages)
 {
 	int i;
 	struct page *page;
 
-	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+	for (i = 0; i < nrpages; i++) {
 		page = sg_page(&sg[i]);
 		if (!page)
 			break;
@@ -511,7 +511,7 @@ static void reclaim_consumed_buffers(struct port *port)
 	}
 	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
 		if (tok->sgpages)
-			reclaim_sg_pages(tok->u.sg);
+			reclaim_sg_pages(tok->u.sg, tok->sgpages);
 		else
 			kfree(tok->u.buf);
 		kfree(tok);
@@ -581,7 +581,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = false;
+	tok->sgpages = 0;
 	tok->u.buf = in_buf;
 
 	sg_init_one(sg, in_buf, in_count);
@@ -597,7 +597,7 @@ static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = true;
+	tok->sgpages = nents;
 	tok->u.sg = sg;
 
 	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
@@ -797,6 +797,7 @@ out:
 
 struct sg_list {
 	unsigned int n;
+	unsigned int size;
 	size_t len;
 	struct scatterlist *sg;
 };
@@ -807,7 +808,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	struct sg_list *sgl = sd->u.data;
 	unsigned int offset, len;
 
-	if (sgl->n == MAX_SPLICE_PAGES)
+	if (sgl->n == sgl->size)
 		return 0;
 
 	/* Try lock this page */
@@ -868,12 +869,12 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 
 	sgl.n = 0;
 	sgl.len = 0;
-	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
-			 GFP_KERNEL);
+	sgl.size = pipe->nrbufs;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * sgl.size, GFP_KERNEL);
 	if (unlikely(!sgl.sg))
 		return -ENOMEM;
 
-	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	sg_init_table(sgl.sg, sgl.size);
 	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
 	if (likely(ret > 0))
 		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 5/6] virtio/console: Allocate scatterlist according to the current pipe size
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, Rusty Russell,
	qemu-devel, Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Allocate scatterlist according to the current pipe size.
This allows splicing bigger buffer if the pipe size has
been changed by fcntl.

Changes in v2:
 - Just a minor fix for avoiding a confliction with previous patch.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index b2fc2ab..e88f843 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -229,7 +229,6 @@ struct port {
 	bool guest_connected;
 };
 
-#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -482,15 +481,16 @@ struct buffer_token {
 		void *buf;
 		struct scatterlist *sg;
 	} u;
-	bool sgpages;
+	/* If sgpages == 0 then buf is used, else sg is used */
+	unsigned int sgpages;
 };
 
-static void reclaim_sg_pages(struct scatterlist *sg)
+static void reclaim_sg_pages(struct scatterlist *sg, unsigned int nrpages)
 {
 	int i;
 	struct page *page;
 
-	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+	for (i = 0; i < nrpages; i++) {
 		page = sg_page(&sg[i]);
 		if (!page)
 			break;
@@ -511,7 +511,7 @@ static void reclaim_consumed_buffers(struct port *port)
 	}
 	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
 		if (tok->sgpages)
-			reclaim_sg_pages(tok->u.sg);
+			reclaim_sg_pages(tok->u.sg, tok->sgpages);
 		else
 			kfree(tok->u.buf);
 		kfree(tok);
@@ -581,7 +581,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = false;
+	tok->sgpages = 0;
 	tok->u.buf = in_buf;
 
 	sg_init_one(sg, in_buf, in_count);
@@ -597,7 +597,7 @@ static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = true;
+	tok->sgpages = nents;
 	tok->u.sg = sg;
 
 	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
@@ -797,6 +797,7 @@ out:
 
 struct sg_list {
 	unsigned int n;
+	unsigned int size;
 	size_t len;
 	struct scatterlist *sg;
 };
@@ -807,7 +808,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	struct sg_list *sgl = sd->u.data;
 	unsigned int offset, len;
 
-	if (sgl->n == MAX_SPLICE_PAGES)
+	if (sgl->n == sgl->size)
 		return 0;
 
 	/* Try lock this page */
@@ -868,12 +869,12 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 
 	sgl.n = 0;
 	sgl.len = 0;
-	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
-			 GFP_KERNEL);
+	sgl.size = pipe->nrbufs;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * sgl.size, GFP_KERNEL);
 	if (unlikely(!sgl.sg))
 		return -ENOMEM;
 
-	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	sg_init_table(sgl.sg, sgl.size);
 	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
 	if (likely(ret > 0))
 		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 5/6] virtio/console: Allocate scatterlist according to the current pipe size
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Allocate scatterlist according to the current pipe size.
This allows splicing bigger buffer if the pipe size has
been changed by fcntl.

Changes in v2:
 - Just a minor fix for avoiding a confliction with previous patch.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---

 drivers/char/virtio_console.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index b2fc2ab..e88f843 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -229,7 +229,6 @@ struct port {
 	bool guest_connected;
 };
 
-#define MAX_SPLICE_PAGES	32
 /* This is the very early arch-specified put chars function. */
 static int (*early_put_chars)(u32, const char *, int);
 
@@ -482,15 +481,16 @@ struct buffer_token {
 		void *buf;
 		struct scatterlist *sg;
 	} u;
-	bool sgpages;
+	/* If sgpages == 0 then buf is used, else sg is used */
+	unsigned int sgpages;
 };
 
-static void reclaim_sg_pages(struct scatterlist *sg)
+static void reclaim_sg_pages(struct scatterlist *sg, unsigned int nrpages)
 {
 	int i;
 	struct page *page;
 
-	for (i = 0; i < MAX_SPLICE_PAGES; i++) {
+	for (i = 0; i < nrpages; i++) {
 		page = sg_page(&sg[i]);
 		if (!page)
 			break;
@@ -511,7 +511,7 @@ static void reclaim_consumed_buffers(struct port *port)
 	}
 	while ((tok = virtqueue_get_buf(port->out_vq, &len))) {
 		if (tok->sgpages)
-			reclaim_sg_pages(tok->u.sg);
+			reclaim_sg_pages(tok->u.sg, tok->sgpages);
 		else
 			kfree(tok->u.buf);
 		kfree(tok);
@@ -581,7 +581,7 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = false;
+	tok->sgpages = 0;
 	tok->u.buf = in_buf;
 
 	sg_init_one(sg, in_buf, in_count);
@@ -597,7 +597,7 @@ static ssize_t send_pages(struct port *port, struct scatterlist *sg, int nents,
 	tok = kmalloc(sizeof(*tok), GFP_ATOMIC);
 	if (!tok)
 		return -ENOMEM;
-	tok->sgpages = true;
+	tok->sgpages = nents;
 	tok->u.sg = sg;
 
 	return __send_to_port(port, sg, nents, in_count, tok, nonblock);
@@ -797,6 +797,7 @@ out:
 
 struct sg_list {
 	unsigned int n;
+	unsigned int size;
 	size_t len;
 	struct scatterlist *sg;
 };
@@ -807,7 +808,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	struct sg_list *sgl = sd->u.data;
 	unsigned int offset, len;
 
-	if (sgl->n == MAX_SPLICE_PAGES)
+	if (sgl->n == sgl->size)
 		return 0;
 
 	/* Try lock this page */
@@ -868,12 +869,12 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
 
 	sgl.n = 0;
 	sgl.len = 0;
-	sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
-			 GFP_KERNEL);
+	sgl.size = pipe->nrbufs;
+	sgl.sg = kmalloc(sizeof(struct scatterlist) * sgl.size, GFP_KERNEL);
 	if (unlikely(!sgl.sg))
 		return -ENOMEM;
 
-	sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
+	sg_init_table(sgl.sg, sgl.size);
 	ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
 	if (likely(ret > 0))
 		ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 6/6] tools: Add guest trace agent as a user tool
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  -1 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt, Yoshihiro YUNOMAE

This patch adds a user tool, "trace agent" for sending trace data of a guest to
a Host in low overhead. This agent has the following functions:
 - splice a page of ring-buffer to read_pipe without memory copying
 - splice the page from write_pipe to virtio-console without memory copying
 - write trace data to stdout by using -o option
 - controlled by start/stop orders from a Host

Changes in v2:
 - Cleanup (change fprintf() to pr_err() and an include guard)

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
---

 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 6 files changed, 806 insertions(+), 0 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

diff --git a/tools/virtio/virtio-trace/Makefile b/tools/virtio/virtio-trace/Makefile
new file mode 100644
index 0000000..ef3adfc
--- /dev/null
+++ b/tools/virtio/virtio-trace/Makefile
@@ -0,0 +1,14 @@
+CC = gcc
+CFLAGS = -O2 -Wall
+LFLAG = -lpthread
+
+all: trace-agent
+
+.c.o:
+	$(CC) $(CFLAGS) $(LFLAG) -c $^ -o $@
+
+trace-agent: trace-agent.o trace-agent-ctl.o trace-agent-rw.o
+	$(CC) $(CFLAGS) $(LFLAG) -o $@ $^
+
+clean:
+	rm -f *.o trace-agent
diff --git a/tools/virtio/virtio-trace/README b/tools/virtio/virtio-trace/README
new file mode 100644
index 0000000..b64845b
--- /dev/null
+++ b/tools/virtio/virtio-trace/README
@@ -0,0 +1,118 @@
+Trace Agent for virtio-trace
+============================
+
+Trace agent is a user tool for sending trace data of a guest to a Host in low
+overhead. Trace agent has the following functions:
+ - splice a page of ring-buffer to read_pipe without memory copying
+ - splice the page from write_pipe to virtio-console without memory copying
+ - write trace data to stdout by using -o option
+ - controlled by start/stop orders from a Host
+
+The trace agent operates as follows:
+ 1) Initialize all structures.
+ 2) Create a read/write thread per CPU. Each thread is bound to a CPU.
+    The read/write threads hold it.
+ 3) A controller thread does poll() for a start order of a host.
+ 4) After the controller of the trace agent receives a start order from a host,
+    the controller wake read/write threads.
+ 5) The read/write threads start to read trace data from ring-buffers and
+    write the data to virtio-serial.
+ 6) If the controller receives a stop order from a host, the read/write threads
+    stop to read trace data.
+
+
+Files
+=====
+
+README: this file
+Makefile: Makefile of trace agent for virtio-trace
+trace-agent.c: includes main function, sets up for operating trace agent
+trace-agent.h: includes all structures and some macros
+trace-agent-ctl.c: includes controller function for read/write threads
+trace-agent-rw.c: includes read/write threads function
+
+
+Setup
+=====
+
+To use this trace agent for virtio-trace, we need to prepare some virtio-serial
+I/Fs.
+
+1) Make FIFO in a host
+ virtio-trace uses virtio-serial pipe as trace data paths as to the number
+of CPUs and a control path, so FIFO (named pipe) should be created as follows:
+	# mkdir /tmp/virtio-trace/
+	# mkfifo /tmp/virtio-trace/trace-path-cpu{0,1,2,...,X}.{in,out}
+	# mkfifo /tmp/virtio-trace/agent-ctl-path.{in,out}
+
+For example, if a guest use three CPUs, the names are
+	trace-path-cpu{0,1,2}.{in.out}
+and
+	agent-ctl-path.{in,out}.
+
+2) Set up of virtio-serial pipe in a host
+ Add qemu option to use virtio-serial pipe.
+
+ ##virtio-serial device##
+     -device virtio-serial-pci,id=virtio-serial0\
+ ##control path##
+     -chardev pipe,id=charchannel0,path=/tmp/virtio-trace/agent-ctl-path\
+     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,\
+      id=channel0,name=agent-ctl-path\
+ ##data path##
+     -chardev pipe,id=charchannel1,path=/tmp/virtio-trace/trace-path-cpu0\
+     -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,\
+      id=channel1,name=trace-path-cpu0\
+      ...
+
+If you manage guests with libvirt, add the following tags to domain XML files.
+Then, libvirt passes the same command option to qemu.
+
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/agent-ctl-path'/>
+	   <target type='virtio' name='agent-ctl-path'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='0'/>
+	</channel>
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/trace-path-cpu0'/>
+	   <target type='virtio' name='trace-path-cpu0'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='1'/>
+	</channel>
+	...
+Here, chardev names are restricted to trace-path-cpuX and agent-ctl-path. For
+example, if a guest use three CPUs, chardev names should be trace-path-cpu0,
+trace-path-cpu1, trace-path-cpu2, and agent-ctl-path.
+
+3) Boot the guest
+ You can find some chardev in /dev/virtio-ports/ in the guest.
+
+
+Run
+===
+
+0) Build trace agent in a guest
+	$ make
+
+1) Enable ftrace in the guest
+ <Example>
+	# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+2) Run trace agent in the guest
+ This agent must be operated as root.
+	# ./trace-agent
+read/write threads in the agent wait for start order from host. If you add -o
+option, trace data are output via stdout in the guest.
+
+3) Open FIFO in a host
+	# cat /tmp/virtio-trace/trace-path-cpu0.out
+If a host does not open these, trace data get stuck in buffers of virtio. Then,
+the guest will stop by specification of chardev in QEMU. This blocking mode may
+be solved in the future.
+
+4) Start to read trace data by ordering from a host
+ A host injects read start order to the guest via virtio-serial.
+	# echo 1 > /tmp/virtio-trace/agent-ctl-path.in
+
+5) Stop to read trace data by ordering from a host
+ A host injects read stop order to the guest via virtio-serial.
+	# echo 0 > /tmp/virtio-trace/agent-ctl-path.in
diff --git a/tools/virtio/virtio-trace/trace-agent-ctl.c b/tools/virtio/virtio-trace/trace-agent-ctl.c
new file mode 100644
index 0000000..a2d0403
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-ctl.c
@@ -0,0 +1,137 @@
+/*
+ * Controller of read/write threads for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define HOST_MSG_SIZE		256
+#define EVENT_WAIT_MSEC		100
+
+static volatile sig_atomic_t global_signal_val;
+bool global_sig_receive;	/* default false */
+bool global_run_operation;	/* default false*/
+
+/* Handle SIGTERM/SIGINT/SIGQUIT to exit */
+static void signal_handler(int sig)
+{
+	global_signal_val = sig;
+}
+
+int rw_ctl_init(const char *ctl_path)
+{
+	int ctl_fd;
+
+	ctl_fd = open(ctl_path, O_RDONLY);
+	if (ctl_fd == -1) {
+		pr_err("Cannot open ctl_fd\n");
+		goto error;
+	}
+
+	return ctl_fd;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static int wait_order(int ctl_fd)
+{
+	struct pollfd poll_fd;
+	int ret = 0;
+
+	while (!global_sig_receive) {
+		poll_fd.fd = ctl_fd;
+		poll_fd.events = POLLIN;
+
+		ret = poll(&poll_fd, 1, EVENT_WAIT_MSEC);
+
+		if (global_signal_val) {
+			global_sig_receive = true;
+			pr_info("Receive interrupt %d\n", global_signal_val);
+
+			/* Wakes rw-threads when they are sleeping */
+			if (!global_run_operation)
+				pthread_cond_broadcast(&cond_wakeup);
+
+			ret = -1;
+			break;
+		}
+
+		if (ret < 0) {
+			pr_err("Polling error\n");
+			goto error;
+		}
+
+		if (ret)
+			break;
+	};
+
+	return ret;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/*
+ * contol read/write threads by handling global_run_operation
+ */
+void *rw_ctl_loop(int ctl_fd)
+{
+	ssize_t rlen;
+	char buf[HOST_MSG_SIZE];
+	int ret;
+
+	/* Setup signal handlers */
+	signal(SIGTERM, signal_handler);
+	signal(SIGINT, signal_handler);
+	signal(SIGQUIT, signal_handler);
+
+	while (!global_sig_receive) {
+
+		ret = wait_order(ctl_fd);
+		if (ret < 0)
+			break;
+
+		rlen = read(ctl_fd, buf, sizeof(buf));
+		if (rlen < 0) {
+			pr_err("read data error in ctl thread\n");
+			goto error;
+		}
+
+		if (rlen == 2 && buf[0] == '1') {
+			/*
+			 * If host writes '1' to a control path,
+			 * this controller wakes all read/write threads.
+			 */
+			global_run_operation = true;
+			pthread_cond_broadcast(&cond_wakeup);
+			pr_debug("Wake up all read/write threads\n");
+		} else if (rlen == 2 && buf[0] == '0') {
+			/*
+			 * If host writes '0' to a control path, read/write
+			 * threads will wait for notification from Host.
+			 */
+			global_run_operation = false;
+			pr_debug("Stop all read/write threads\n");
+		} else
+			pr_info("Invalid host notification: %s\n", buf);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
diff --git a/tools/virtio/virtio-trace/trace-agent-rw.c b/tools/virtio/virtio-trace/trace-agent-rw.c
new file mode 100644
index 0000000..3aace5e
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-rw.c
@@ -0,0 +1,192 @@
+/*
+ * Read/write thread of a guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include "trace-agent.h"
+
+#define READ_WAIT_USEC	100000
+
+void *rw_thread_info_new(void)
+{
+	struct rw_thread_info *rw_ti;
+
+	rw_ti = zalloc(sizeof(struct rw_thread_info));
+	if (rw_ti == NULL) {
+		pr_err("rw_thread_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rw_ti->cpu_num = -1;
+	rw_ti->in_fd = -1;
+	rw_ti->out_fd = -1;
+	rw_ti->read_pipe = -1;
+	rw_ti->write_pipe = -1;
+	rw_ti->pipe_size = PIPE_INIT;
+
+	return rw_ti;
+}
+
+void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+				bool stdout_flag, unsigned long pipe_size,
+				struct rw_thread_info *rw_ti)
+{
+	int data_pipe[2];
+
+	rw_ti->cpu_num = cpu;
+
+	/* set read(input) fd */
+	rw_ti->in_fd = open(in_path, O_RDONLY);
+	if (rw_ti->in_fd == -1) {
+		pr_err("Could not open in_fd (CPU:%d)\n", cpu);
+		goto error;
+	}
+
+	/* set write(output) fd */
+	if (!stdout_flag) {
+		/* virtio-serial output mode */
+		rw_ti->out_fd = open(out_path, O_WRONLY);
+		if (rw_ti->out_fd == -1) {
+			pr_err("Could not open out_fd (CPU:%d)\n", cpu);
+			goto error;
+		}
+	} else
+		/* stdout mode */
+		rw_ti->out_fd = STDOUT_FILENO;
+
+	if (pipe2(data_pipe, O_NONBLOCK) < 0) {
+		pr_err("Could not create pipe in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	/*
+	 * Size of pipe is 64kB in default based on fs/pipe.c.
+	 * To read/write trace data speedy, pipe size is changed.
+	 */
+	if (fcntl(*data_pipe, F_SETPIPE_SZ, pipe_size) < 0) {
+		pr_err("Could not change pipe size in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	rw_ti->read_pipe = data_pipe[1];
+	rw_ti->write_pipe = data_pipe[0];
+	rw_ti->pipe_size = pipe_size;
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/* Bind a thread to a cpu */
+static void bind_cpu(int cpu_num)
+{
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(cpu_num, &mask);
+
+	/* bind my thread to cpu_num by assigning zero to the first argument */
+	if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
+		pr_err("Could not set CPU#%d affinity\n", (int)cpu_num);
+}
+
+static void *rw_thread_main(void *thread_info)
+{
+	ssize_t rlen, wlen;
+	ssize_t ret;
+	struct rw_thread_info *ts = (struct rw_thread_info *)thread_info;
+
+	bind_cpu(ts->cpu_num);
+
+	while (1) {
+		/* Wait for a read order of trace data by Host OS */
+		if (!global_run_operation) {
+			pthread_mutex_lock(&mutex_notify);
+			pthread_cond_wait(&cond_wakeup, &mutex_notify);
+			pthread_mutex_unlock(&mutex_notify);
+		}
+
+		if (global_sig_receive)
+			break;
+
+		/*
+		 * Each thread read trace_pipe_raw of each cpu bounding the
+		 * thread, so contention of multi-threads does not occur.
+		 */
+		rlen = splice(ts->in_fd, NULL, ts->read_pipe, NULL,
+				ts->pipe_size, SPLICE_F_MOVE | SPLICE_F_MORE);
+
+		if (rlen < 0) {
+			pr_err("Splice_read in rw-thread(%d)\n", ts->cpu_num);
+			goto error;
+		} else if (rlen == 0) {
+			/*
+			 * If trace data do not exist or are unreadable not
+			 * for exceeding the page size, splice_read returns
+			 * NULL. Then, this waits for being filled the data in a
+			 * ring-buffer.
+			 */
+			usleep(READ_WAIT_USEC);
+			pr_debug("Read retry(cpu:%d)\n", ts->cpu_num);
+			continue;
+		}
+
+		wlen = 0;
+
+		do {
+			ret = splice(ts->write_pipe, NULL, ts->out_fd, NULL,
+					rlen - wlen,
+					SPLICE_F_MOVE | SPLICE_F_MORE);
+
+			if (ret < 0) {
+				pr_err("Splice_write in rw-thread(%d)\n",
+								ts->cpu_num);
+				goto error;
+			} else if (ret == 0)
+				/*
+				 * When host reader is not in time for reading
+				 * trace data, guest will be stopped. This is
+				 * because char dev in QEMU is not supported
+				 * non-blocking mode. Then, writer might be
+				 * sleep in that case.
+				 * This sleep will be removed by supporting
+				 * non-blocking mode.
+				 */
+				sleep(1);
+			wlen += ret;
+		} while (wlen < rlen);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+
+pthread_t rw_thread_run(struct rw_thread_info *rw_ti)
+{
+	int ret;
+	pthread_t rw_thread_per_cpu;
+
+	ret = pthread_create(&rw_thread_per_cpu, NULL, rw_thread_main, rw_ti);
+	if (ret != 0) {
+		pr_err("Could not create a rw thread(%d)\n", rw_ti->cpu_num);
+		exit(EXIT_FAILURE);
+	}
+
+	return rw_thread_per_cpu;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.c b/tools/virtio/virtio-trace/trace-agent.c
new file mode 100644
index 0000000..0a0a7dd
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.c
@@ -0,0 +1,270 @@
+/*
+ * Guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define PAGE_SIZE		(sysconf(_SC_PAGE_SIZE))
+#define PIPE_DEF_BUFS		16
+#define PIPE_MIN_SIZE		(PAGE_SIZE*PIPE_DEF_BUFS)
+#define PIPE_MAX_SIZE		(1024*1024)
+#define READ_PATH_FMT	\
+		"/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
+#define WRITE_PATH_FMT		"/dev/virtio-ports/trace-path-cpu%d"
+#define CTL_PATH		"/dev/virtio-ports/agent-ctl-path"
+
+pthread_mutex_t mutex_notify = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond_wakeup = PTHREAD_COND_INITIALIZER;
+
+static int get_total_cpus(void)
+{
+	int nr_cpus = (int)sysconf(_SC_NPROCESSORS_CONF);
+
+	if (nr_cpus <= 0) {
+		pr_err("Could not read cpus\n");
+		goto error;
+	} else if (nr_cpus > MAX_CPUS) {
+		pr_err("Exceed max cpus(%d)\n", (int)MAX_CPUS);
+		goto error;
+	}
+
+	return nr_cpus;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *agent_info_new(void)
+{
+	struct agent_info *s;
+	int i;
+
+	s = zalloc(sizeof(struct agent_info));
+	if (s == NULL) {
+		pr_err("agent_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	s->pipe_size = PIPE_INIT;
+	s->use_stdout = false;
+	s->cpus = get_total_cpus();
+	s->ctl_fd = -1;
+
+	/* read/write threads init */
+	for (i = 0; i < s->cpus; i++)
+		s->rw_ti[i] = rw_thread_info_new();
+
+	return s;
+}
+
+static unsigned long parse_size(const char *arg)
+{
+	unsigned long value, round;
+	char *ptr;
+
+	value = strtoul(arg, &ptr, 10);
+	switch (*ptr) {
+	case 'K': case 'k':
+		value <<= 10;
+		break;
+	case 'M': case 'm':
+		value <<= 20;
+		break;
+	default:
+		break;
+	}
+
+	if (value > PIPE_MAX_SIZE) {
+		pr_err("Pipe size must be less than 1MB\n");
+		goto error;
+	} else if (value < PIPE_MIN_SIZE) {
+		pr_err("Pipe size must be over 64KB\n");
+		goto error;
+	}
+
+	/* Align buffer size with page unit */
+	round = value & (PAGE_SIZE - 1);
+	value = value - round;
+
+	return value;
+error:
+	return 0;
+}
+
+static void usage(char const *prg)
+{
+	pr_err("usage: %s [-h] [-o] [-s <size of pipe>]\n", prg);
+}
+
+static const char *make_path(int cpu_num, bool this_is_write_path)
+{
+	int ret;
+	char *buf;
+
+	buf = zalloc(PATH_MAX);
+	if (buf == NULL) {
+		pr_err("Could not allocate buffer\n");
+		goto error;
+	}
+
+	if (this_is_write_path)
+		/* write(output) path */
+		ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
+	else
+		/* read(input) path */
+		ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
+
+	if (ret <= 0) {
+		pr_err("Failed to generate %s path(CPU#%d):%d\n",
+			this_is_write_path ? "read" : "write", cpu_num, ret);
+		goto error;
+	}
+
+	return buf;
+
+error:
+	free(buf);
+	return NULL;
+}
+
+static const char *make_input_path(int cpu_num)
+{
+	return make_path(cpu_num, false);
+}
+
+static const char *make_output_path(int cpu_num)
+{
+	return make_path(cpu_num, true);
+}
+
+static void *agent_info_init(struct agent_info *s)
+{
+	int cpu;
+	const char *in_path = NULL;
+	const char *out_path = NULL;
+
+	/* init read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		/* set read(input) path per read/write thread */
+		in_path = make_input_path(cpu);
+		if (in_path == NULL)
+			goto error;
+
+		/* set write(output) path per read/write thread*/
+		if (!s->use_stdout) {
+			out_path = make_output_path(cpu);
+			if (out_path == NULL)
+				goto error;
+		} else
+			/* stdout mode */
+			pr_debug("stdout mode\n");
+
+		rw_thread_init(cpu, in_path, out_path, s->use_stdout,
+						s->pipe_size, s->rw_ti[cpu]);
+	}
+
+	/* init controller of read/write threads */
+	s->ctl_fd = rw_ctl_init((const char *)CTL_PATH);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *parse_args(int argc, char *argv[], struct agent_info *s)
+{
+	int cmd;
+	unsigned long size;
+
+	while ((cmd = getopt(argc, argv, "hos:")) != -1) {
+		switch (cmd) {
+		/* stdout mode */
+		case 'o':
+			s->use_stdout = true;
+			break;
+		/* size of pipe */
+		case 's':
+			size = parse_size(optarg);
+			if (size == 0)
+				goto error;
+			s->pipe_size = size;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+			goto error;
+		}
+	}
+
+	agent_info_init(s);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void agent_main_loop(struct agent_info *s)
+{
+	int cpu;
+	pthread_t rw_thread_per_cpu[MAX_CPUS];
+
+	/* Start all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++)
+		rw_thread_per_cpu[cpu] = rw_thread_run(s->rw_ti[cpu]);
+
+	rw_ctl_loop(s->ctl_fd);
+
+	/* Finish all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		int ret;
+
+		ret = pthread_join(rw_thread_per_cpu[cpu], NULL);
+		if (ret != 0) {
+			pr_err("pthread_join() error:%d (cpu %d)\n", ret, cpu);
+			exit(EXIT_FAILURE);
+		}
+	}
+}
+
+static void agent_info_free(struct agent_info *s)
+{
+	int i;
+
+	close(s->ctl_fd);
+	for (i = 0; i < s->cpus; i++) {
+		close(s->rw_ti[i]->in_fd);
+		close(s->rw_ti[i]->out_fd);
+		close(s->rw_ti[i]->read_pipe);
+		close(s->rw_ti[i]->write_pipe);
+		free(s->rw_ti[i]);
+	}
+	free(s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct agent_info *s = NULL;
+
+	s = agent_info_new();
+	parse_args(argc, argv, s);
+
+	agent_main_loop(s);
+
+	agent_info_free(s);
+
+	return 0;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.h b/tools/virtio/virtio-trace/trace-agent.h
new file mode 100644
index 0000000..8de79bf
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.h
@@ -0,0 +1,75 @@
+#ifndef __TRACE_AGENT_H__
+#define __TRACE_AGENT_H__
+#include <pthread.h>
+#include <stdbool.h>
+
+#define MAX_CPUS	256
+#define PIPE_INIT       (1024*1024)
+
+/*
+ * agent_info - structure managing total information of guest agent
+ * @pipe_size:	size of pipe (default 1MB)
+ * @use_stdout:	set to true when o option is added (default false)
+ * @cpus:	total number of CPUs
+ * @ctl_fd:	fd of control path, /dev/virtio-ports/agent-ctl-path
+ * @rw_ti:	structure managing information of read/write threads
+ */
+struct agent_info {
+	unsigned long pipe_size;
+	bool use_stdout;
+	int cpus;
+	int ctl_fd;
+	struct rw_thread_info *rw_ti[MAX_CPUS];
+};
+
+/*
+ * rw_thread_info - structure managing a read/write thread a cpu
+ * @cpu_num:	cpu number operating this read/write thread
+ * @in_fd:	fd of reading trace data path in cpu_num
+ * @out_fd:	fd of writing trace data path in cpu_num
+ * @read_pipe:	fd of read pipe
+ * @write_pipe:	fd of write pipe
+ * @pipe_size:	size of pipe (default 1MB)
+ */
+struct rw_thread_info {
+	int cpu_num;
+	int in_fd;
+	int out_fd;
+	int read_pipe;
+	int write_pipe;
+	unsigned long pipe_size;
+};
+
+/* use for stopping rw threads */
+extern bool global_sig_receive;
+
+/* use for notification */
+extern bool global_run_operation;
+extern pthread_mutex_t mutex_notify;
+extern pthread_cond_t cond_wakeup;
+
+/* for controller of read/write threads */
+extern int rw_ctl_init(const char *ctl_path);
+extern void *rw_ctl_loop(int ctl_fd);
+
+/* for trace read/write thread */
+extern void *rw_thread_info_new(void);
+extern void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+			bool stdout_flag, unsigned long pipe_size,
+			struct rw_thread_info *rw_ti);
+extern pthread_t rw_thread_run(struct rw_thread_info *rw_ti);
+
+static inline void *zalloc(size_t size)
+{
+	return calloc(1, size);
+}
+
+#define pr_err(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#define pr_info(format, ...) fprintf(stdout, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+
+#endif /*__TRACE_AGENT_H__*/



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH V2 6/6] tools: Add guest trace agent as a user tool
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	Yoshihiro YUNOMAE, Rusty Russell, qemu-devel, Borislav Petkov,
	virtualization, Masami Hiramatsu, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah, yrl.pp-manager.tt

This patch adds a user tool, "trace agent" for sending trace data of a guest to
a Host in low overhead. This agent has the following functions:
 - splice a page of ring-buffer to read_pipe without memory copying
 - splice the page from write_pipe to virtio-console without memory copying
 - write trace data to stdout by using -o option
 - controlled by start/stop orders from a Host

Changes in v2:
 - Cleanup (change fprintf() to pr_err() and an include guard)

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
---

 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 6 files changed, 806 insertions(+), 0 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

diff --git a/tools/virtio/virtio-trace/Makefile b/tools/virtio/virtio-trace/Makefile
new file mode 100644
index 0000000..ef3adfc
--- /dev/null
+++ b/tools/virtio/virtio-trace/Makefile
@@ -0,0 +1,14 @@
+CC = gcc
+CFLAGS = -O2 -Wall
+LFLAG = -lpthread
+
+all: trace-agent
+
+.c.o:
+	$(CC) $(CFLAGS) $(LFLAG) -c $^ -o $@
+
+trace-agent: trace-agent.o trace-agent-ctl.o trace-agent-rw.o
+	$(CC) $(CFLAGS) $(LFLAG) -o $@ $^
+
+clean:
+	rm -f *.o trace-agent
diff --git a/tools/virtio/virtio-trace/README b/tools/virtio/virtio-trace/README
new file mode 100644
index 0000000..b64845b
--- /dev/null
+++ b/tools/virtio/virtio-trace/README
@@ -0,0 +1,118 @@
+Trace Agent for virtio-trace
+============================
+
+Trace agent is a user tool for sending trace data of a guest to a Host in low
+overhead. Trace agent has the following functions:
+ - splice a page of ring-buffer to read_pipe without memory copying
+ - splice the page from write_pipe to virtio-console without memory copying
+ - write trace data to stdout by using -o option
+ - controlled by start/stop orders from a Host
+
+The trace agent operates as follows:
+ 1) Initialize all structures.
+ 2) Create a read/write thread per CPU. Each thread is bound to a CPU.
+    The read/write threads hold it.
+ 3) A controller thread does poll() for a start order of a host.
+ 4) After the controller of the trace agent receives a start order from a host,
+    the controller wake read/write threads.
+ 5) The read/write threads start to read trace data from ring-buffers and
+    write the data to virtio-serial.
+ 6) If the controller receives a stop order from a host, the read/write threads
+    stop to read trace data.
+
+
+Files
+=====
+
+README: this file
+Makefile: Makefile of trace agent for virtio-trace
+trace-agent.c: includes main function, sets up for operating trace agent
+trace-agent.h: includes all structures and some macros
+trace-agent-ctl.c: includes controller function for read/write threads
+trace-agent-rw.c: includes read/write threads function
+
+
+Setup
+=====
+
+To use this trace agent for virtio-trace, we need to prepare some virtio-serial
+I/Fs.
+
+1) Make FIFO in a host
+ virtio-trace uses virtio-serial pipe as trace data paths as to the number
+of CPUs and a control path, so FIFO (named pipe) should be created as follows:
+	# mkdir /tmp/virtio-trace/
+	# mkfifo /tmp/virtio-trace/trace-path-cpu{0,1,2,...,X}.{in,out}
+	# mkfifo /tmp/virtio-trace/agent-ctl-path.{in,out}
+
+For example, if a guest use three CPUs, the names are
+	trace-path-cpu{0,1,2}.{in.out}
+and
+	agent-ctl-path.{in,out}.
+
+2) Set up of virtio-serial pipe in a host
+ Add qemu option to use virtio-serial pipe.
+
+ ##virtio-serial device##
+     -device virtio-serial-pci,id=virtio-serial0\
+ ##control path##
+     -chardev pipe,id=charchannel0,path=/tmp/virtio-trace/agent-ctl-path\
+     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,\
+      id=channel0,name=agent-ctl-path\
+ ##data path##
+     -chardev pipe,id=charchannel1,path=/tmp/virtio-trace/trace-path-cpu0\
+     -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,\
+      id=channel1,name=trace-path-cpu0\
+      ...
+
+If you manage guests with libvirt, add the following tags to domain XML files.
+Then, libvirt passes the same command option to qemu.
+
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/agent-ctl-path'/>
+	   <target type='virtio' name='agent-ctl-path'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='0'/>
+	</channel>
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/trace-path-cpu0'/>
+	   <target type='virtio' name='trace-path-cpu0'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='1'/>
+	</channel>
+	...
+Here, chardev names are restricted to trace-path-cpuX and agent-ctl-path. For
+example, if a guest use three CPUs, chardev names should be trace-path-cpu0,
+trace-path-cpu1, trace-path-cpu2, and agent-ctl-path.
+
+3) Boot the guest
+ You can find some chardev in /dev/virtio-ports/ in the guest.
+
+
+Run
+===
+
+0) Build trace agent in a guest
+	$ make
+
+1) Enable ftrace in the guest
+ <Example>
+	# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+2) Run trace agent in the guest
+ This agent must be operated as root.
+	# ./trace-agent
+read/write threads in the agent wait for start order from host. If you add -o
+option, trace data are output via stdout in the guest.
+
+3) Open FIFO in a host
+	# cat /tmp/virtio-trace/trace-path-cpu0.out
+If a host does not open these, trace data get stuck in buffers of virtio. Then,
+the guest will stop by specification of chardev in QEMU. This blocking mode may
+be solved in the future.
+
+4) Start to read trace data by ordering from a host
+ A host injects read start order to the guest via virtio-serial.
+	# echo 1 > /tmp/virtio-trace/agent-ctl-path.in
+
+5) Stop to read trace data by ordering from a host
+ A host injects read stop order to the guest via virtio-serial.
+	# echo 0 > /tmp/virtio-trace/agent-ctl-path.in
diff --git a/tools/virtio/virtio-trace/trace-agent-ctl.c b/tools/virtio/virtio-trace/trace-agent-ctl.c
new file mode 100644
index 0000000..a2d0403
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-ctl.c
@@ -0,0 +1,137 @@
+/*
+ * Controller of read/write threads for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define HOST_MSG_SIZE		256
+#define EVENT_WAIT_MSEC		100
+
+static volatile sig_atomic_t global_signal_val;
+bool global_sig_receive;	/* default false */
+bool global_run_operation;	/* default false*/
+
+/* Handle SIGTERM/SIGINT/SIGQUIT to exit */
+static void signal_handler(int sig)
+{
+	global_signal_val = sig;
+}
+
+int rw_ctl_init(const char *ctl_path)
+{
+	int ctl_fd;
+
+	ctl_fd = open(ctl_path, O_RDONLY);
+	if (ctl_fd == -1) {
+		pr_err("Cannot open ctl_fd\n");
+		goto error;
+	}
+
+	return ctl_fd;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static int wait_order(int ctl_fd)
+{
+	struct pollfd poll_fd;
+	int ret = 0;
+
+	while (!global_sig_receive) {
+		poll_fd.fd = ctl_fd;
+		poll_fd.events = POLLIN;
+
+		ret = poll(&poll_fd, 1, EVENT_WAIT_MSEC);
+
+		if (global_signal_val) {
+			global_sig_receive = true;
+			pr_info("Receive interrupt %d\n", global_signal_val);
+
+			/* Wakes rw-threads when they are sleeping */
+			if (!global_run_operation)
+				pthread_cond_broadcast(&cond_wakeup);
+
+			ret = -1;
+			break;
+		}
+
+		if (ret < 0) {
+			pr_err("Polling error\n");
+			goto error;
+		}
+
+		if (ret)
+			break;
+	};
+
+	return ret;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/*
+ * contol read/write threads by handling global_run_operation
+ */
+void *rw_ctl_loop(int ctl_fd)
+{
+	ssize_t rlen;
+	char buf[HOST_MSG_SIZE];
+	int ret;
+
+	/* Setup signal handlers */
+	signal(SIGTERM, signal_handler);
+	signal(SIGINT, signal_handler);
+	signal(SIGQUIT, signal_handler);
+
+	while (!global_sig_receive) {
+
+		ret = wait_order(ctl_fd);
+		if (ret < 0)
+			break;
+
+		rlen = read(ctl_fd, buf, sizeof(buf));
+		if (rlen < 0) {
+			pr_err("read data error in ctl thread\n");
+			goto error;
+		}
+
+		if (rlen == 2 && buf[0] == '1') {
+			/*
+			 * If host writes '1' to a control path,
+			 * this controller wakes all read/write threads.
+			 */
+			global_run_operation = true;
+			pthread_cond_broadcast(&cond_wakeup);
+			pr_debug("Wake up all read/write threads\n");
+		} else if (rlen == 2 && buf[0] == '0') {
+			/*
+			 * If host writes '0' to a control path, read/write
+			 * threads will wait for notification from Host.
+			 */
+			global_run_operation = false;
+			pr_debug("Stop all read/write threads\n");
+		} else
+			pr_info("Invalid host notification: %s\n", buf);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
diff --git a/tools/virtio/virtio-trace/trace-agent-rw.c b/tools/virtio/virtio-trace/trace-agent-rw.c
new file mode 100644
index 0000000..3aace5e
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-rw.c
@@ -0,0 +1,192 @@
+/*
+ * Read/write thread of a guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include "trace-agent.h"
+
+#define READ_WAIT_USEC	100000
+
+void *rw_thread_info_new(void)
+{
+	struct rw_thread_info *rw_ti;
+
+	rw_ti = zalloc(sizeof(struct rw_thread_info));
+	if (rw_ti == NULL) {
+		pr_err("rw_thread_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rw_ti->cpu_num = -1;
+	rw_ti->in_fd = -1;
+	rw_ti->out_fd = -1;
+	rw_ti->read_pipe = -1;
+	rw_ti->write_pipe = -1;
+	rw_ti->pipe_size = PIPE_INIT;
+
+	return rw_ti;
+}
+
+void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+				bool stdout_flag, unsigned long pipe_size,
+				struct rw_thread_info *rw_ti)
+{
+	int data_pipe[2];
+
+	rw_ti->cpu_num = cpu;
+
+	/* set read(input) fd */
+	rw_ti->in_fd = open(in_path, O_RDONLY);
+	if (rw_ti->in_fd == -1) {
+		pr_err("Could not open in_fd (CPU:%d)\n", cpu);
+		goto error;
+	}
+
+	/* set write(output) fd */
+	if (!stdout_flag) {
+		/* virtio-serial output mode */
+		rw_ti->out_fd = open(out_path, O_WRONLY);
+		if (rw_ti->out_fd == -1) {
+			pr_err("Could not open out_fd (CPU:%d)\n", cpu);
+			goto error;
+		}
+	} else
+		/* stdout mode */
+		rw_ti->out_fd = STDOUT_FILENO;
+
+	if (pipe2(data_pipe, O_NONBLOCK) < 0) {
+		pr_err("Could not create pipe in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	/*
+	 * Size of pipe is 64kB in default based on fs/pipe.c.
+	 * To read/write trace data speedy, pipe size is changed.
+	 */
+	if (fcntl(*data_pipe, F_SETPIPE_SZ, pipe_size) < 0) {
+		pr_err("Could not change pipe size in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	rw_ti->read_pipe = data_pipe[1];
+	rw_ti->write_pipe = data_pipe[0];
+	rw_ti->pipe_size = pipe_size;
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/* Bind a thread to a cpu */
+static void bind_cpu(int cpu_num)
+{
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(cpu_num, &mask);
+
+	/* bind my thread to cpu_num by assigning zero to the first argument */
+	if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
+		pr_err("Could not set CPU#%d affinity\n", (int)cpu_num);
+}
+
+static void *rw_thread_main(void *thread_info)
+{
+	ssize_t rlen, wlen;
+	ssize_t ret;
+	struct rw_thread_info *ts = (struct rw_thread_info *)thread_info;
+
+	bind_cpu(ts->cpu_num);
+
+	while (1) {
+		/* Wait for a read order of trace data by Host OS */
+		if (!global_run_operation) {
+			pthread_mutex_lock(&mutex_notify);
+			pthread_cond_wait(&cond_wakeup, &mutex_notify);
+			pthread_mutex_unlock(&mutex_notify);
+		}
+
+		if (global_sig_receive)
+			break;
+
+		/*
+		 * Each thread read trace_pipe_raw of each cpu bounding the
+		 * thread, so contention of multi-threads does not occur.
+		 */
+		rlen = splice(ts->in_fd, NULL, ts->read_pipe, NULL,
+				ts->pipe_size, SPLICE_F_MOVE | SPLICE_F_MORE);
+
+		if (rlen < 0) {
+			pr_err("Splice_read in rw-thread(%d)\n", ts->cpu_num);
+			goto error;
+		} else if (rlen == 0) {
+			/*
+			 * If trace data do not exist or are unreadable not
+			 * for exceeding the page size, splice_read returns
+			 * NULL. Then, this waits for being filled the data in a
+			 * ring-buffer.
+			 */
+			usleep(READ_WAIT_USEC);
+			pr_debug("Read retry(cpu:%d)\n", ts->cpu_num);
+			continue;
+		}
+
+		wlen = 0;
+
+		do {
+			ret = splice(ts->write_pipe, NULL, ts->out_fd, NULL,
+					rlen - wlen,
+					SPLICE_F_MOVE | SPLICE_F_MORE);
+
+			if (ret < 0) {
+				pr_err("Splice_write in rw-thread(%d)\n",
+								ts->cpu_num);
+				goto error;
+			} else if (ret == 0)
+				/*
+				 * When host reader is not in time for reading
+				 * trace data, guest will be stopped. This is
+				 * because char dev in QEMU is not supported
+				 * non-blocking mode. Then, writer might be
+				 * sleep in that case.
+				 * This sleep will be removed by supporting
+				 * non-blocking mode.
+				 */
+				sleep(1);
+			wlen += ret;
+		} while (wlen < rlen);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+
+pthread_t rw_thread_run(struct rw_thread_info *rw_ti)
+{
+	int ret;
+	pthread_t rw_thread_per_cpu;
+
+	ret = pthread_create(&rw_thread_per_cpu, NULL, rw_thread_main, rw_ti);
+	if (ret != 0) {
+		pr_err("Could not create a rw thread(%d)\n", rw_ti->cpu_num);
+		exit(EXIT_FAILURE);
+	}
+
+	return rw_thread_per_cpu;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.c b/tools/virtio/virtio-trace/trace-agent.c
new file mode 100644
index 0000000..0a0a7dd
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.c
@@ -0,0 +1,270 @@
+/*
+ * Guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define PAGE_SIZE		(sysconf(_SC_PAGE_SIZE))
+#define PIPE_DEF_BUFS		16
+#define PIPE_MIN_SIZE		(PAGE_SIZE*PIPE_DEF_BUFS)
+#define PIPE_MAX_SIZE		(1024*1024)
+#define READ_PATH_FMT	\
+		"/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
+#define WRITE_PATH_FMT		"/dev/virtio-ports/trace-path-cpu%d"
+#define CTL_PATH		"/dev/virtio-ports/agent-ctl-path"
+
+pthread_mutex_t mutex_notify = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond_wakeup = PTHREAD_COND_INITIALIZER;
+
+static int get_total_cpus(void)
+{
+	int nr_cpus = (int)sysconf(_SC_NPROCESSORS_CONF);
+
+	if (nr_cpus <= 0) {
+		pr_err("Could not read cpus\n");
+		goto error;
+	} else if (nr_cpus > MAX_CPUS) {
+		pr_err("Exceed max cpus(%d)\n", (int)MAX_CPUS);
+		goto error;
+	}
+
+	return nr_cpus;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *agent_info_new(void)
+{
+	struct agent_info *s;
+	int i;
+
+	s = zalloc(sizeof(struct agent_info));
+	if (s == NULL) {
+		pr_err("agent_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	s->pipe_size = PIPE_INIT;
+	s->use_stdout = false;
+	s->cpus = get_total_cpus();
+	s->ctl_fd = -1;
+
+	/* read/write threads init */
+	for (i = 0; i < s->cpus; i++)
+		s->rw_ti[i] = rw_thread_info_new();
+
+	return s;
+}
+
+static unsigned long parse_size(const char *arg)
+{
+	unsigned long value, round;
+	char *ptr;
+
+	value = strtoul(arg, &ptr, 10);
+	switch (*ptr) {
+	case 'K': case 'k':
+		value <<= 10;
+		break;
+	case 'M': case 'm':
+		value <<= 20;
+		break;
+	default:
+		break;
+	}
+
+	if (value > PIPE_MAX_SIZE) {
+		pr_err("Pipe size must be less than 1MB\n");
+		goto error;
+	} else if (value < PIPE_MIN_SIZE) {
+		pr_err("Pipe size must be over 64KB\n");
+		goto error;
+	}
+
+	/* Align buffer size with page unit */
+	round = value & (PAGE_SIZE - 1);
+	value = value - round;
+
+	return value;
+error:
+	return 0;
+}
+
+static void usage(char const *prg)
+{
+	pr_err("usage: %s [-h] [-o] [-s <size of pipe>]\n", prg);
+}
+
+static const char *make_path(int cpu_num, bool this_is_write_path)
+{
+	int ret;
+	char *buf;
+
+	buf = zalloc(PATH_MAX);
+	if (buf == NULL) {
+		pr_err("Could not allocate buffer\n");
+		goto error;
+	}
+
+	if (this_is_write_path)
+		/* write(output) path */
+		ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
+	else
+		/* read(input) path */
+		ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
+
+	if (ret <= 0) {
+		pr_err("Failed to generate %s path(CPU#%d):%d\n",
+			this_is_write_path ? "read" : "write", cpu_num, ret);
+		goto error;
+	}
+
+	return buf;
+
+error:
+	free(buf);
+	return NULL;
+}
+
+static const char *make_input_path(int cpu_num)
+{
+	return make_path(cpu_num, false);
+}
+
+static const char *make_output_path(int cpu_num)
+{
+	return make_path(cpu_num, true);
+}
+
+static void *agent_info_init(struct agent_info *s)
+{
+	int cpu;
+	const char *in_path = NULL;
+	const char *out_path = NULL;
+
+	/* init read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		/* set read(input) path per read/write thread */
+		in_path = make_input_path(cpu);
+		if (in_path == NULL)
+			goto error;
+
+		/* set write(output) path per read/write thread*/
+		if (!s->use_stdout) {
+			out_path = make_output_path(cpu);
+			if (out_path == NULL)
+				goto error;
+		} else
+			/* stdout mode */
+			pr_debug("stdout mode\n");
+
+		rw_thread_init(cpu, in_path, out_path, s->use_stdout,
+						s->pipe_size, s->rw_ti[cpu]);
+	}
+
+	/* init controller of read/write threads */
+	s->ctl_fd = rw_ctl_init((const char *)CTL_PATH);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *parse_args(int argc, char *argv[], struct agent_info *s)
+{
+	int cmd;
+	unsigned long size;
+
+	while ((cmd = getopt(argc, argv, "hos:")) != -1) {
+		switch (cmd) {
+		/* stdout mode */
+		case 'o':
+			s->use_stdout = true;
+			break;
+		/* size of pipe */
+		case 's':
+			size = parse_size(optarg);
+			if (size == 0)
+				goto error;
+			s->pipe_size = size;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+			goto error;
+		}
+	}
+
+	agent_info_init(s);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void agent_main_loop(struct agent_info *s)
+{
+	int cpu;
+	pthread_t rw_thread_per_cpu[MAX_CPUS];
+
+	/* Start all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++)
+		rw_thread_per_cpu[cpu] = rw_thread_run(s->rw_ti[cpu]);
+
+	rw_ctl_loop(s->ctl_fd);
+
+	/* Finish all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		int ret;
+
+		ret = pthread_join(rw_thread_per_cpu[cpu], NULL);
+		if (ret != 0) {
+			pr_err("pthread_join() error:%d (cpu %d)\n", ret, cpu);
+			exit(EXIT_FAILURE);
+		}
+	}
+}
+
+static void agent_info_free(struct agent_info *s)
+{
+	int i;
+
+	close(s->ctl_fd);
+	for (i = 0; i < s->cpus; i++) {
+		close(s->rw_ti[i]->in_fd);
+		close(s->rw_ti[i]->out_fd);
+		close(s->rw_ti[i]->read_pipe);
+		close(s->rw_ti[i]->write_pipe);
+		free(s->rw_ti[i]);
+	}
+	free(s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct agent_info *s = NULL;
+
+	s = agent_info_new();
+	parse_args(argc, argv, s);
+
+	agent_main_loop(s);
+
+	agent_info_free(s);
+
+	return 0;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.h b/tools/virtio/virtio-trace/trace-agent.h
new file mode 100644
index 0000000..8de79bf
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.h
@@ -0,0 +1,75 @@
+#ifndef __TRACE_AGENT_H__
+#define __TRACE_AGENT_H__
+#include <pthread.h>
+#include <stdbool.h>
+
+#define MAX_CPUS	256
+#define PIPE_INIT       (1024*1024)
+
+/*
+ * agent_info - structure managing total information of guest agent
+ * @pipe_size:	size of pipe (default 1MB)
+ * @use_stdout:	set to true when o option is added (default false)
+ * @cpus:	total number of CPUs
+ * @ctl_fd:	fd of control path, /dev/virtio-ports/agent-ctl-path
+ * @rw_ti:	structure managing information of read/write threads
+ */
+struct agent_info {
+	unsigned long pipe_size;
+	bool use_stdout;
+	int cpus;
+	int ctl_fd;
+	struct rw_thread_info *rw_ti[MAX_CPUS];
+};
+
+/*
+ * rw_thread_info - structure managing a read/write thread a cpu
+ * @cpu_num:	cpu number operating this read/write thread
+ * @in_fd:	fd of reading trace data path in cpu_num
+ * @out_fd:	fd of writing trace data path in cpu_num
+ * @read_pipe:	fd of read pipe
+ * @write_pipe:	fd of write pipe
+ * @pipe_size:	size of pipe (default 1MB)
+ */
+struct rw_thread_info {
+	int cpu_num;
+	int in_fd;
+	int out_fd;
+	int read_pipe;
+	int write_pipe;
+	unsigned long pipe_size;
+};
+
+/* use for stopping rw threads */
+extern bool global_sig_receive;
+
+/* use for notification */
+extern bool global_run_operation;
+extern pthread_mutex_t mutex_notify;
+extern pthread_cond_t cond_wakeup;
+
+/* for controller of read/write threads */
+extern int rw_ctl_init(const char *ctl_path);
+extern void *rw_ctl_loop(int ctl_fd);
+
+/* for trace read/write thread */
+extern void *rw_thread_info_new(void);
+extern void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+			bool stdout_flag, unsigned long pipe_size,
+			struct rw_thread_info *rw_ti);
+extern pthread_t rw_thread_run(struct rw_thread_info *rw_ti);
+
+static inline void *zalloc(size_t size)
+{
+	return calloc(1, size);
+}
+
+#define pr_err(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#define pr_info(format, ...) fprintf(stdout, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+
+#endif /*__TRACE_AGENT_H__*/

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 6/6] tools: Add guest trace agent as a user tool
@ 2012-08-09 12:31   ` Yoshihiro YUNOMAE
  0 siblings, 0 replies; 30+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-08-09 12:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker, qemu-devel,
	Borislav Petkov, virtualization, Masami Hiramatsu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman, Amit Shah,
	yrl.pp-manager.tt

This patch adds a user tool, "trace agent" for sending trace data of a guest to
a Host in low overhead. This agent has the following functions:
 - splice a page of ring-buffer to read_pipe without memory copying
 - splice the page from write_pipe to virtio-console without memory copying
 - write trace data to stdout by using -o option
 - controlled by start/stop orders from a Host

Changes in v2:
 - Cleanup (change fprintf() to pr_err() and an include guard)

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
---

 tools/virtio/virtio-trace/Makefile          |   14 +
 tools/virtio/virtio-trace/README            |  118 ++++++++++++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 ++++++++++++++
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.c     |  270 +++++++++++++++++++++++++++
 tools/virtio/virtio-trace/trace-agent.h     |   75 ++++++++
 6 files changed, 806 insertions(+), 0 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h

diff --git a/tools/virtio/virtio-trace/Makefile b/tools/virtio/virtio-trace/Makefile
new file mode 100644
index 0000000..ef3adfc
--- /dev/null
+++ b/tools/virtio/virtio-trace/Makefile
@@ -0,0 +1,14 @@
+CC = gcc
+CFLAGS = -O2 -Wall
+LFLAG = -lpthread
+
+all: trace-agent
+
+.c.o:
+	$(CC) $(CFLAGS) $(LFLAG) -c $^ -o $@
+
+trace-agent: trace-agent.o trace-agent-ctl.o trace-agent-rw.o
+	$(CC) $(CFLAGS) $(LFLAG) -o $@ $^
+
+clean:
+	rm -f *.o trace-agent
diff --git a/tools/virtio/virtio-trace/README b/tools/virtio/virtio-trace/README
new file mode 100644
index 0000000..b64845b
--- /dev/null
+++ b/tools/virtio/virtio-trace/README
@@ -0,0 +1,118 @@
+Trace Agent for virtio-trace
+============================
+
+Trace agent is a user tool for sending trace data of a guest to a Host in low
+overhead. Trace agent has the following functions:
+ - splice a page of ring-buffer to read_pipe without memory copying
+ - splice the page from write_pipe to virtio-console without memory copying
+ - write trace data to stdout by using -o option
+ - controlled by start/stop orders from a Host
+
+The trace agent operates as follows:
+ 1) Initialize all structures.
+ 2) Create a read/write thread per CPU. Each thread is bound to a CPU.
+    The read/write threads hold it.
+ 3) A controller thread does poll() for a start order of a host.
+ 4) After the controller of the trace agent receives a start order from a host,
+    the controller wake read/write threads.
+ 5) The read/write threads start to read trace data from ring-buffers and
+    write the data to virtio-serial.
+ 6) If the controller receives a stop order from a host, the read/write threads
+    stop to read trace data.
+
+
+Files
+=====
+
+README: this file
+Makefile: Makefile of trace agent for virtio-trace
+trace-agent.c: includes main function, sets up for operating trace agent
+trace-agent.h: includes all structures and some macros
+trace-agent-ctl.c: includes controller function for read/write threads
+trace-agent-rw.c: includes read/write threads function
+
+
+Setup
+=====
+
+To use this trace agent for virtio-trace, we need to prepare some virtio-serial
+I/Fs.
+
+1) Make FIFO in a host
+ virtio-trace uses virtio-serial pipe as trace data paths as to the number
+of CPUs and a control path, so FIFO (named pipe) should be created as follows:
+	# mkdir /tmp/virtio-trace/
+	# mkfifo /tmp/virtio-trace/trace-path-cpu{0,1,2,...,X}.{in,out}
+	# mkfifo /tmp/virtio-trace/agent-ctl-path.{in,out}
+
+For example, if a guest use three CPUs, the names are
+	trace-path-cpu{0,1,2}.{in.out}
+and
+	agent-ctl-path.{in,out}.
+
+2) Set up of virtio-serial pipe in a host
+ Add qemu option to use virtio-serial pipe.
+
+ ##virtio-serial device##
+     -device virtio-serial-pci,id=virtio-serial0\
+ ##control path##
+     -chardev pipe,id=charchannel0,path=/tmp/virtio-trace/agent-ctl-path\
+     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,\
+      id=channel0,name=agent-ctl-path\
+ ##data path##
+     -chardev pipe,id=charchannel1,path=/tmp/virtio-trace/trace-path-cpu0\
+     -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,\
+      id=channel1,name=trace-path-cpu0\
+      ...
+
+If you manage guests with libvirt, add the following tags to domain XML files.
+Then, libvirt passes the same command option to qemu.
+
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/agent-ctl-path'/>
+	   <target type='virtio' name='agent-ctl-path'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='0'/>
+	</channel>
+	<channel type='pipe'>
+	   <source path='/tmp/virtio-trace/trace-path-cpu0'/>
+	   <target type='virtio' name='trace-path-cpu0'/>
+	   <address type='virtio-serial' controller='0' bus='0' port='1'/>
+	</channel>
+	...
+Here, chardev names are restricted to trace-path-cpuX and agent-ctl-path. For
+example, if a guest use three CPUs, chardev names should be trace-path-cpu0,
+trace-path-cpu1, trace-path-cpu2, and agent-ctl-path.
+
+3) Boot the guest
+ You can find some chardev in /dev/virtio-ports/ in the guest.
+
+
+Run
+===
+
+0) Build trace agent in a guest
+	$ make
+
+1) Enable ftrace in the guest
+ <Example>
+	# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+2) Run trace agent in the guest
+ This agent must be operated as root.
+	# ./trace-agent
+read/write threads in the agent wait for start order from host. If you add -o
+option, trace data are output via stdout in the guest.
+
+3) Open FIFO in a host
+	# cat /tmp/virtio-trace/trace-path-cpu0.out
+If a host does not open these, trace data get stuck in buffers of virtio. Then,
+the guest will stop by specification of chardev in QEMU. This blocking mode may
+be solved in the future.
+
+4) Start to read trace data by ordering from a host
+ A host injects read start order to the guest via virtio-serial.
+	# echo 1 > /tmp/virtio-trace/agent-ctl-path.in
+
+5) Stop to read trace data by ordering from a host
+ A host injects read stop order to the guest via virtio-serial.
+	# echo 0 > /tmp/virtio-trace/agent-ctl-path.in
diff --git a/tools/virtio/virtio-trace/trace-agent-ctl.c b/tools/virtio/virtio-trace/trace-agent-ctl.c
new file mode 100644
index 0000000..a2d0403
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-ctl.c
@@ -0,0 +1,137 @@
+/*
+ * Controller of read/write threads for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define HOST_MSG_SIZE		256
+#define EVENT_WAIT_MSEC		100
+
+static volatile sig_atomic_t global_signal_val;
+bool global_sig_receive;	/* default false */
+bool global_run_operation;	/* default false*/
+
+/* Handle SIGTERM/SIGINT/SIGQUIT to exit */
+static void signal_handler(int sig)
+{
+	global_signal_val = sig;
+}
+
+int rw_ctl_init(const char *ctl_path)
+{
+	int ctl_fd;
+
+	ctl_fd = open(ctl_path, O_RDONLY);
+	if (ctl_fd == -1) {
+		pr_err("Cannot open ctl_fd\n");
+		goto error;
+	}
+
+	return ctl_fd;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static int wait_order(int ctl_fd)
+{
+	struct pollfd poll_fd;
+	int ret = 0;
+
+	while (!global_sig_receive) {
+		poll_fd.fd = ctl_fd;
+		poll_fd.events = POLLIN;
+
+		ret = poll(&poll_fd, 1, EVENT_WAIT_MSEC);
+
+		if (global_signal_val) {
+			global_sig_receive = true;
+			pr_info("Receive interrupt %d\n", global_signal_val);
+
+			/* Wakes rw-threads when they are sleeping */
+			if (!global_run_operation)
+				pthread_cond_broadcast(&cond_wakeup);
+
+			ret = -1;
+			break;
+		}
+
+		if (ret < 0) {
+			pr_err("Polling error\n");
+			goto error;
+		}
+
+		if (ret)
+			break;
+	};
+
+	return ret;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/*
+ * contol read/write threads by handling global_run_operation
+ */
+void *rw_ctl_loop(int ctl_fd)
+{
+	ssize_t rlen;
+	char buf[HOST_MSG_SIZE];
+	int ret;
+
+	/* Setup signal handlers */
+	signal(SIGTERM, signal_handler);
+	signal(SIGINT, signal_handler);
+	signal(SIGQUIT, signal_handler);
+
+	while (!global_sig_receive) {
+
+		ret = wait_order(ctl_fd);
+		if (ret < 0)
+			break;
+
+		rlen = read(ctl_fd, buf, sizeof(buf));
+		if (rlen < 0) {
+			pr_err("read data error in ctl thread\n");
+			goto error;
+		}
+
+		if (rlen == 2 && buf[0] == '1') {
+			/*
+			 * If host writes '1' to a control path,
+			 * this controller wakes all read/write threads.
+			 */
+			global_run_operation = true;
+			pthread_cond_broadcast(&cond_wakeup);
+			pr_debug("Wake up all read/write threads\n");
+		} else if (rlen == 2 && buf[0] == '0') {
+			/*
+			 * If host writes '0' to a control path, read/write
+			 * threads will wait for notification from Host.
+			 */
+			global_run_operation = false;
+			pr_debug("Stop all read/write threads\n");
+		} else
+			pr_info("Invalid host notification: %s\n", buf);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
diff --git a/tools/virtio/virtio-trace/trace-agent-rw.c b/tools/virtio/virtio-trace/trace-agent-rw.c
new file mode 100644
index 0000000..3aace5e
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent-rw.c
@@ -0,0 +1,192 @@
+/*
+ * Read/write thread of a guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include "trace-agent.h"
+
+#define READ_WAIT_USEC	100000
+
+void *rw_thread_info_new(void)
+{
+	struct rw_thread_info *rw_ti;
+
+	rw_ti = zalloc(sizeof(struct rw_thread_info));
+	if (rw_ti == NULL) {
+		pr_err("rw_thread_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rw_ti->cpu_num = -1;
+	rw_ti->in_fd = -1;
+	rw_ti->out_fd = -1;
+	rw_ti->read_pipe = -1;
+	rw_ti->write_pipe = -1;
+	rw_ti->pipe_size = PIPE_INIT;
+
+	return rw_ti;
+}
+
+void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+				bool stdout_flag, unsigned long pipe_size,
+				struct rw_thread_info *rw_ti)
+{
+	int data_pipe[2];
+
+	rw_ti->cpu_num = cpu;
+
+	/* set read(input) fd */
+	rw_ti->in_fd = open(in_path, O_RDONLY);
+	if (rw_ti->in_fd == -1) {
+		pr_err("Could not open in_fd (CPU:%d)\n", cpu);
+		goto error;
+	}
+
+	/* set write(output) fd */
+	if (!stdout_flag) {
+		/* virtio-serial output mode */
+		rw_ti->out_fd = open(out_path, O_WRONLY);
+		if (rw_ti->out_fd == -1) {
+			pr_err("Could not open out_fd (CPU:%d)\n", cpu);
+			goto error;
+		}
+	} else
+		/* stdout mode */
+		rw_ti->out_fd = STDOUT_FILENO;
+
+	if (pipe2(data_pipe, O_NONBLOCK) < 0) {
+		pr_err("Could not create pipe in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	/*
+	 * Size of pipe is 64kB in default based on fs/pipe.c.
+	 * To read/write trace data speedy, pipe size is changed.
+	 */
+	if (fcntl(*data_pipe, F_SETPIPE_SZ, pipe_size) < 0) {
+		pr_err("Could not change pipe size in rw-thread(%d)\n", cpu);
+		goto error;
+	}
+
+	rw_ti->read_pipe = data_pipe[1];
+	rw_ti->write_pipe = data_pipe[0];
+	rw_ti->pipe_size = pipe_size;
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+/* Bind a thread to a cpu */
+static void bind_cpu(int cpu_num)
+{
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(cpu_num, &mask);
+
+	/* bind my thread to cpu_num by assigning zero to the first argument */
+	if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
+		pr_err("Could not set CPU#%d affinity\n", (int)cpu_num);
+}
+
+static void *rw_thread_main(void *thread_info)
+{
+	ssize_t rlen, wlen;
+	ssize_t ret;
+	struct rw_thread_info *ts = (struct rw_thread_info *)thread_info;
+
+	bind_cpu(ts->cpu_num);
+
+	while (1) {
+		/* Wait for a read order of trace data by Host OS */
+		if (!global_run_operation) {
+			pthread_mutex_lock(&mutex_notify);
+			pthread_cond_wait(&cond_wakeup, &mutex_notify);
+			pthread_mutex_unlock(&mutex_notify);
+		}
+
+		if (global_sig_receive)
+			break;
+
+		/*
+		 * Each thread read trace_pipe_raw of each cpu bounding the
+		 * thread, so contention of multi-threads does not occur.
+		 */
+		rlen = splice(ts->in_fd, NULL, ts->read_pipe, NULL,
+				ts->pipe_size, SPLICE_F_MOVE | SPLICE_F_MORE);
+
+		if (rlen < 0) {
+			pr_err("Splice_read in rw-thread(%d)\n", ts->cpu_num);
+			goto error;
+		} else if (rlen == 0) {
+			/*
+			 * If trace data do not exist or are unreadable not
+			 * for exceeding the page size, splice_read returns
+			 * NULL. Then, this waits for being filled the data in a
+			 * ring-buffer.
+			 */
+			usleep(READ_WAIT_USEC);
+			pr_debug("Read retry(cpu:%d)\n", ts->cpu_num);
+			continue;
+		}
+
+		wlen = 0;
+
+		do {
+			ret = splice(ts->write_pipe, NULL, ts->out_fd, NULL,
+					rlen - wlen,
+					SPLICE_F_MOVE | SPLICE_F_MORE);
+
+			if (ret < 0) {
+				pr_err("Splice_write in rw-thread(%d)\n",
+								ts->cpu_num);
+				goto error;
+			} else if (ret == 0)
+				/*
+				 * When host reader is not in time for reading
+				 * trace data, guest will be stopped. This is
+				 * because char dev in QEMU is not supported
+				 * non-blocking mode. Then, writer might be
+				 * sleep in that case.
+				 * This sleep will be removed by supporting
+				 * non-blocking mode.
+				 */
+				sleep(1);
+			wlen += ret;
+		} while (wlen < rlen);
+	}
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+
+pthread_t rw_thread_run(struct rw_thread_info *rw_ti)
+{
+	int ret;
+	pthread_t rw_thread_per_cpu;
+
+	ret = pthread_create(&rw_thread_per_cpu, NULL, rw_thread_main, rw_ti);
+	if (ret != 0) {
+		pr_err("Could not create a rw thread(%d)\n", rw_ti->cpu_num);
+		exit(EXIT_FAILURE);
+	}
+
+	return rw_thread_per_cpu;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.c b/tools/virtio/virtio-trace/trace-agent.c
new file mode 100644
index 0000000..0a0a7dd
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.c
@@ -0,0 +1,270 @@
+/*
+ * Guest agent for virtio-trace
+ *
+ * Copyright (C) 2012 Hitachi, Ltd.
+ * Created by Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ *            Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "trace-agent.h"
+
+#define PAGE_SIZE		(sysconf(_SC_PAGE_SIZE))
+#define PIPE_DEF_BUFS		16
+#define PIPE_MIN_SIZE		(PAGE_SIZE*PIPE_DEF_BUFS)
+#define PIPE_MAX_SIZE		(1024*1024)
+#define READ_PATH_FMT	\
+		"/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
+#define WRITE_PATH_FMT		"/dev/virtio-ports/trace-path-cpu%d"
+#define CTL_PATH		"/dev/virtio-ports/agent-ctl-path"
+
+pthread_mutex_t mutex_notify = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond_wakeup = PTHREAD_COND_INITIALIZER;
+
+static int get_total_cpus(void)
+{
+	int nr_cpus = (int)sysconf(_SC_NPROCESSORS_CONF);
+
+	if (nr_cpus <= 0) {
+		pr_err("Could not read cpus\n");
+		goto error;
+	} else if (nr_cpus > MAX_CPUS) {
+		pr_err("Exceed max cpus(%d)\n", (int)MAX_CPUS);
+		goto error;
+	}
+
+	return nr_cpus;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *agent_info_new(void)
+{
+	struct agent_info *s;
+	int i;
+
+	s = zalloc(sizeof(struct agent_info));
+	if (s == NULL) {
+		pr_err("agent_info zalloc error\n");
+		exit(EXIT_FAILURE);
+	}
+
+	s->pipe_size = PIPE_INIT;
+	s->use_stdout = false;
+	s->cpus = get_total_cpus();
+	s->ctl_fd = -1;
+
+	/* read/write threads init */
+	for (i = 0; i < s->cpus; i++)
+		s->rw_ti[i] = rw_thread_info_new();
+
+	return s;
+}
+
+static unsigned long parse_size(const char *arg)
+{
+	unsigned long value, round;
+	char *ptr;
+
+	value = strtoul(arg, &ptr, 10);
+	switch (*ptr) {
+	case 'K': case 'k':
+		value <<= 10;
+		break;
+	case 'M': case 'm':
+		value <<= 20;
+		break;
+	default:
+		break;
+	}
+
+	if (value > PIPE_MAX_SIZE) {
+		pr_err("Pipe size must be less than 1MB\n");
+		goto error;
+	} else if (value < PIPE_MIN_SIZE) {
+		pr_err("Pipe size must be over 64KB\n");
+		goto error;
+	}
+
+	/* Align buffer size with page unit */
+	round = value & (PAGE_SIZE - 1);
+	value = value - round;
+
+	return value;
+error:
+	return 0;
+}
+
+static void usage(char const *prg)
+{
+	pr_err("usage: %s [-h] [-o] [-s <size of pipe>]\n", prg);
+}
+
+static const char *make_path(int cpu_num, bool this_is_write_path)
+{
+	int ret;
+	char *buf;
+
+	buf = zalloc(PATH_MAX);
+	if (buf == NULL) {
+		pr_err("Could not allocate buffer\n");
+		goto error;
+	}
+
+	if (this_is_write_path)
+		/* write(output) path */
+		ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
+	else
+		/* read(input) path */
+		ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
+
+	if (ret <= 0) {
+		pr_err("Failed to generate %s path(CPU#%d):%d\n",
+			this_is_write_path ? "read" : "write", cpu_num, ret);
+		goto error;
+	}
+
+	return buf;
+
+error:
+	free(buf);
+	return NULL;
+}
+
+static const char *make_input_path(int cpu_num)
+{
+	return make_path(cpu_num, false);
+}
+
+static const char *make_output_path(int cpu_num)
+{
+	return make_path(cpu_num, true);
+}
+
+static void *agent_info_init(struct agent_info *s)
+{
+	int cpu;
+	const char *in_path = NULL;
+	const char *out_path = NULL;
+
+	/* init read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		/* set read(input) path per read/write thread */
+		in_path = make_input_path(cpu);
+		if (in_path == NULL)
+			goto error;
+
+		/* set write(output) path per read/write thread*/
+		if (!s->use_stdout) {
+			out_path = make_output_path(cpu);
+			if (out_path == NULL)
+				goto error;
+		} else
+			/* stdout mode */
+			pr_debug("stdout mode\n");
+
+		rw_thread_init(cpu, in_path, out_path, s->use_stdout,
+						s->pipe_size, s->rw_ti[cpu]);
+	}
+
+	/* init controller of read/write threads */
+	s->ctl_fd = rw_ctl_init((const char *)CTL_PATH);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void *parse_args(int argc, char *argv[], struct agent_info *s)
+{
+	int cmd;
+	unsigned long size;
+
+	while ((cmd = getopt(argc, argv, "hos:")) != -1) {
+		switch (cmd) {
+		/* stdout mode */
+		case 'o':
+			s->use_stdout = true;
+			break;
+		/* size of pipe */
+		case 's':
+			size = parse_size(optarg);
+			if (size == 0)
+				goto error;
+			s->pipe_size = size;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+			goto error;
+		}
+	}
+
+	agent_info_init(s);
+
+	return NULL;
+
+error:
+	exit(EXIT_FAILURE);
+}
+
+static void agent_main_loop(struct agent_info *s)
+{
+	int cpu;
+	pthread_t rw_thread_per_cpu[MAX_CPUS];
+
+	/* Start all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++)
+		rw_thread_per_cpu[cpu] = rw_thread_run(s->rw_ti[cpu]);
+
+	rw_ctl_loop(s->ctl_fd);
+
+	/* Finish all read/write threads */
+	for (cpu = 0; cpu < s->cpus; cpu++) {
+		int ret;
+
+		ret = pthread_join(rw_thread_per_cpu[cpu], NULL);
+		if (ret != 0) {
+			pr_err("pthread_join() error:%d (cpu %d)\n", ret, cpu);
+			exit(EXIT_FAILURE);
+		}
+	}
+}
+
+static void agent_info_free(struct agent_info *s)
+{
+	int i;
+
+	close(s->ctl_fd);
+	for (i = 0; i < s->cpus; i++) {
+		close(s->rw_ti[i]->in_fd);
+		close(s->rw_ti[i]->out_fd);
+		close(s->rw_ti[i]->read_pipe);
+		close(s->rw_ti[i]->write_pipe);
+		free(s->rw_ti[i]);
+	}
+	free(s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct agent_info *s = NULL;
+
+	s = agent_info_new();
+	parse_args(argc, argv, s);
+
+	agent_main_loop(s);
+
+	agent_info_free(s);
+
+	return 0;
+}
diff --git a/tools/virtio/virtio-trace/trace-agent.h b/tools/virtio/virtio-trace/trace-agent.h
new file mode 100644
index 0000000..8de79bf
--- /dev/null
+++ b/tools/virtio/virtio-trace/trace-agent.h
@@ -0,0 +1,75 @@
+#ifndef __TRACE_AGENT_H__
+#define __TRACE_AGENT_H__
+#include <pthread.h>
+#include <stdbool.h>
+
+#define MAX_CPUS	256
+#define PIPE_INIT       (1024*1024)
+
+/*
+ * agent_info - structure managing total information of guest agent
+ * @pipe_size:	size of pipe (default 1MB)
+ * @use_stdout:	set to true when o option is added (default false)
+ * @cpus:	total number of CPUs
+ * @ctl_fd:	fd of control path, /dev/virtio-ports/agent-ctl-path
+ * @rw_ti:	structure managing information of read/write threads
+ */
+struct agent_info {
+	unsigned long pipe_size;
+	bool use_stdout;
+	int cpus;
+	int ctl_fd;
+	struct rw_thread_info *rw_ti[MAX_CPUS];
+};
+
+/*
+ * rw_thread_info - structure managing a read/write thread a cpu
+ * @cpu_num:	cpu number operating this read/write thread
+ * @in_fd:	fd of reading trace data path in cpu_num
+ * @out_fd:	fd of writing trace data path in cpu_num
+ * @read_pipe:	fd of read pipe
+ * @write_pipe:	fd of write pipe
+ * @pipe_size:	size of pipe (default 1MB)
+ */
+struct rw_thread_info {
+	int cpu_num;
+	int in_fd;
+	int out_fd;
+	int read_pipe;
+	int write_pipe;
+	unsigned long pipe_size;
+};
+
+/* use for stopping rw threads */
+extern bool global_sig_receive;
+
+/* use for notification */
+extern bool global_run_operation;
+extern pthread_mutex_t mutex_notify;
+extern pthread_cond_t cond_wakeup;
+
+/* for controller of read/write threads */
+extern int rw_ctl_init(const char *ctl_path);
+extern void *rw_ctl_loop(int ctl_fd);
+
+/* for trace read/write thread */
+extern void *rw_thread_info_new(void);
+extern void *rw_thread_init(int cpu, const char *in_path, const char *out_path,
+			bool stdout_flag, unsigned long pipe_size,
+			struct rw_thread_info *rw_ti);
+extern pthread_t rw_thread_run(struct rw_thread_info *rw_ti);
+
+static inline void *zalloc(size_t size)
+{
+	return calloc(1, size);
+}
+
+#define pr_err(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#define pr_info(format, ...) fprintf(stdout, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf(stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+
+#endif /*__TRACE_AGENT_H__*/

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-12 23:28   ` Rusty Russell
  -1 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2012-08-12 23:28 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE, linux-kernel
  Cc: Amit Shah, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Steven Rostedt, virtualization, qemu-devel, yrl.pp-manager.tt

On Thu, 09 Aug 2012 21:30:29 +0900, Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

Thankyou!

I've applied this, and it will head into linux-next in the next few
days.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-12 23:28   ` Rusty Russell
  0 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2012-08-12 23:28 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE, linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Masami Hiramatsu, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah

On Thu, 09 Aug 2012 21:30:29 +0900, Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

Thankyou!

I've applied this, and it will head into linux-next in the next few
days.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-12 23:28   ` Rusty Russell
  0 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2012-08-12 23:28 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE, linux-kernel
  Cc: Herbert Xu, Arnd Bergmann, Frederic Weisbecker,
	yrl.pp-manager.tt, qemu-devel, Borislav Petkov, virtualization,
	Masami Hiramatsu, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Amit Shah

On Thu, 09 Aug 2012 21:30:29 +0900, Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

Thankyou!

I've applied this, and it will head into linux-next in the next few
days.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
  2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
  (?)
@ 2012-08-21  5:16   ` Amit Shah
  -1 siblings, 0 replies; 30+ messages in thread
From: Amit Shah @ 2012-08-21  5:16 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: linux-kernel, Anthony Liguori, Arnd Bergmann, Borislav Petkov,
	Franch Ch. Eigler, Frederic Weisbecker, Greg Kroah-Hartman,
	Herbert Xu, Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt

On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

ACK this series.

I ran it through the virtio-serial test suite, and there's no
regression in existing functionality.

I encourage you to check out the virtio-serial test suite[1] as well
as the virtio-serial unit tests in the autotest[2] code, and
contribute to them to add tests for the new functionality you've
added.

[1]
http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/

[2]
https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py


Thanks,

		Amit

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-21  5:16   ` Amit Shah
  0 siblings, 0 replies; 30+ messages in thread
From: Amit Shah @ 2012-08-21  5:16 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: Arnd Bergmann, qemu-devel, Frederic Weisbecker, Rusty Russell,
	linux-kernel, Borislav Petkov, virtualization, Herbert Xu,
	Franch Ch. Eigler, Ingo Molnar, Mathieu Desnoyers,
	Steven Rostedt, Anthony Liguori, Greg Kroah-Hartman,
	Masami Hiramatsu, yrl.pp-manager.tt

On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

ACK this series.

I ran it through the virtio-serial test suite, and there's no
regression in existing functionality.

I encourage you to check out the virtio-serial test suite[1] as well
as the virtio-serial unit tests in the autotest[2] code, and
contribute to them to add tests for the new functionality you've
added.

[1]
http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/

[2]
https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py


Thanks,

		Amit

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-21  5:16   ` Amit Shah
  0 siblings, 0 replies; 30+ messages in thread
From: Amit Shah @ 2012-08-21  5:16 UTC (permalink / raw)
  To: Yoshihiro YUNOMAE
  Cc: Arnd Bergmann, qemu-devel, Frederic Weisbecker, linux-kernel,
	Borislav Petkov, virtualization, Herbert Xu, Franch Ch. Eigler,
	Ingo Molnar, Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, Masami Hiramatsu, yrl.pp-manager.tt

On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
> Hi All,
> 
> The following patch set provides a low-overhead system for collecting kernel
> tracing data of guests by a host in a virtualization environment.

ACK this series.

I ran it through the virtio-serial test suite, and there's no
regression in existing functionality.

I encourage you to check out the virtio-serial test suite[1] as well
as the virtio-serial unit tests in the autotest[2] code, and
contribute to them to add tests for the new functionality you've
added.

[1]
http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/

[2]
https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py


Thanks,

		Amit

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
  2012-08-21  5:16   ` [Qemu-devel] " Amit Shah
  (?)
@ 2012-08-21 12:33     ` Masami Hiramatsu
  -1 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2012-08-21 12:33 UTC (permalink / raw)
  To: Amit Shah
  Cc: Yoshihiro YUNOMAE, linux-kernel, Anthony Liguori, Arnd Bergmann,
	Borislav Petkov, Franch Ch. Eigler, Frederic Weisbecker,
	Greg Kroah-Hartman, Herbert Xu, Ingo Molnar, Mathieu Desnoyers,
	Rusty Russell, Steven Rostedt, virtualization, qemu-devel,
	yrl.pp-manager.tt

(2012/08/21 14:16), Amit Shah wrote:
> On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
>> Hi All,
>>
>> The following patch set provides a low-overhead system for collecting kernel
>> tracing data of guests by a host in a virtualization environment.
> 
> ACK this series.

Thank you!

> I ran it through the virtio-serial test suite, and there's no
> regression in existing functionality.
> 
> I encourage you to check out the virtio-serial test suite[1] as well
> as the virtio-serial unit tests in the autotest[2] code, and
> contribute to them to add tests for the new functionality you've
> added.

Ah, OK we'll look into them for adding some tests for
splice support :)

> 
> [1]
> http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/
> 
> [2]
> https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py
> 
> 

Thank you again,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-21 12:33     ` Masami Hiramatsu
  0 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2012-08-21 12:33 UTC (permalink / raw)
  To: Amit Shah
  Cc: Arnd Bergmann, qemu-devel, Frederic Weisbecker,
	Yoshihiro YUNOMAE, Rusty Russell, linux-kernel, Borislav Petkov,
	virtualization, Herbert Xu, Franch Ch. Eigler, Ingo Molnar,
	Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, yrl.pp-manager.tt

(2012/08/21 14:16), Amit Shah wrote:
> On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
>> Hi All,
>>
>> The following patch set provides a low-overhead system for collecting kernel
>> tracing data of guests by a host in a virtualization environment.
> 
> ACK this series.

Thank you!

> I ran it through the virtio-serial test suite, and there's no
> regression in existing functionality.
> 
> I encourage you to check out the virtio-serial test suite[1] as well
> as the virtio-serial unit tests in the autotest[2] code, and
> contribute to them to add tests for the new functionality you've
> added.

Ah, OK we'll look into them for adding some tests for
splice support :)

> 
> [1]
> http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/
> 
> [2]
> https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py
> 
> 

Thank you again,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH V2 0/6] virtio-trace: Support virtio-trace
@ 2012-08-21 12:33     ` Masami Hiramatsu
  0 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2012-08-21 12:33 UTC (permalink / raw)
  To: Amit Shah
  Cc: Arnd Bergmann, qemu-devel, Frederic Weisbecker, linux-kernel,
	Borislav Petkov, virtualization, Herbert Xu, Franch Ch. Eigler,
	Ingo Molnar, Mathieu Desnoyers, Steven Rostedt, Anthony Liguori,
	Greg Kroah-Hartman, yrl.pp-manager.tt

(2012/08/21 14:16), Amit Shah wrote:
> On (Thu) 09 Aug 2012 [21:30:29], Yoshihiro YUNOMAE wrote:
>> Hi All,
>>
>> The following patch set provides a low-overhead system for collecting kernel
>> tracing data of guests by a host in a virtualization environment.
> 
> ACK this series.

Thank you!

> I ran it through the virtio-serial test suite, and there's no
> regression in existing functionality.
> 
> I encourage you to check out the virtio-serial test suite[1] as well
> as the virtio-serial unit tests in the autotest[2] code, and
> contribute to them to add tests for the new functionality you've
> added.

Ah, OK we'll look into them for adding some tests for
splice support :)

> 
> [1]
> http://fedorapeople.org/cgit/amitshah/public_git/test-virtserial.git/
> 
> [2]
> https://github.com/autotest/autotest/blob/e91cd67b845b291622c8d079a8289c4b0cb1e6ae/client/tests/kvm/tests/virtio_console.py
> 
> 

Thank you again,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-08-21 12:33 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-09 12:30 [PATCH V2 0/6] virtio-trace: Support virtio-trace Yoshihiro YUNOMAE
2012-08-09 12:30 ` Yoshihiro YUNOMAE
2012-08-09 12:30 ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:30 ` [PATCH V2 1/6] virtio/console: Add splice_write support Yoshihiro YUNOMAE
2012-08-09 12:30   ` Yoshihiro YUNOMAE
2012-08-09 12:30   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:30 ` [PATCH V2 2/6] virtio/console: Add a failback for unstealable pipe buffer Yoshihiro YUNOMAE
2012-08-09 12:30   ` Yoshihiro YUNOMAE
2012-08-09 12:30   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:31 ` [PATCH V2 3/6] virtio/console: Wait until the port is ready on splice Yoshihiro YUNOMAE
2012-08-09 12:31   ` Yoshihiro YUNOMAE
2012-08-09 12:31   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:31 ` [PATCH V2 4/6] ftrace: Allow stealing pages from pipe buffer Yoshihiro YUNOMAE
2012-08-09 12:31   ` Yoshihiro YUNOMAE
2012-08-09 12:31   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:31 ` [PATCH V2 5/6] virtio/console: Allocate scatterlist according to the current pipe size Yoshihiro YUNOMAE
2012-08-09 12:31   ` Yoshihiro YUNOMAE
2012-08-09 12:31   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-09 12:31 ` [PATCH V2 6/6] tools: Add guest trace agent as a user tool Yoshihiro YUNOMAE
2012-08-09 12:31   ` Yoshihiro YUNOMAE
2012-08-09 12:31   ` [Qemu-devel] " Yoshihiro YUNOMAE
2012-08-12 23:28 ` [PATCH V2 0/6] virtio-trace: Support virtio-trace Rusty Russell
2012-08-12 23:28   ` Rusty Russell
2012-08-12 23:28   ` [Qemu-devel] " Rusty Russell
2012-08-21  5:16 ` Amit Shah
2012-08-21  5:16   ` Amit Shah
2012-08-21  5:16   ` [Qemu-devel] " Amit Shah
2012-08-21 12:33   ` Masami Hiramatsu
2012-08-21 12:33     ` Masami Hiramatsu
2012-08-21 12:33     ` [Qemu-devel] " Masami Hiramatsu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.