All of lore.kernel.org
 help / color / mirror / Atom feed
* lguest, virtio_blk, id is not a head!
@ 2008-12-12 19:47 Andre Grueneberg
  2008-12-15  0:31 ` [Lguest] " Rusty Russell
  0 siblings, 1 reply; 3+ messages in thread
From: Andre Grueneberg @ 2008-12-12 19:47 UTC (permalink / raw)
  To: lguest; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]

Hi,

[I'm not regularly reading lkml, but I do read lguest, so please include
me in CC]
I'm using lguest and thus virtio_blk.
Lately I upgraded my host and guest from kernel 2.6.25 (Debian) to
vanilla 2.6.27.8. I have two virtual machines and in both I recognised
the same issue after a while.

After running for some hours the first guest showed the following
message on the console:
virtio_blk virtio1: id 1 is not a head!

Afterwards I was not able to log in to the guest anymore, so I had to
kill it. The kernel was still running -- ping was answered. Only the 
block device (with / on it) seems to have went unresponsive.

A few days later the other guest "crashed" with a similar message
(different id). Another two days later, the same happened.

I do not (yet) know how to reproduce this issue, but it seems to happen
more often. I've never seen this happen with 2.6.25, which ran for ~3
months without stop.

I'm thankful for any idea how to hunt this issue. Especially I wonder
under what (practical) circumstances one can trigger the respective code
in drivers/virtio/virtio_ring.c vring_get_buf().

Thanks,
Andre

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Lguest] lguest, virtio_blk, id is not a head!
  2008-12-12 19:47 lguest, virtio_blk, id is not a head! Andre Grueneberg
@ 2008-12-15  0:31 ` Rusty Russell
  2008-12-16  3:24   ` Andre Grueneberg
  0 siblings, 1 reply; 3+ messages in thread
From: Rusty Russell @ 2008-12-15  0:31 UTC (permalink / raw)
  To: lguest; +Cc: Andre Grueneberg, linux-kernel

On Saturday 13 December 2008 06:17:28 Andre Grueneberg wrote:
> Lately I upgraded my host and guest from kernel 2.6.25 (Debian) to
> vanilla 2.6.27.8. I have two virtual machines and in both I recognised
> the same issue after a while.
> 
> After running for some hours the first guest showed the following
> message on the console:
> virtio_blk virtio1: id 1 is not a head!

Hi Andre,

   This is interesting.  There hasn't been a great deal of change between
those versions on the lguest side.  This sounds like a race, and the only
thing I can see which would have introduced this is the change below.

You can apply this with patch -R (a bit of fuzz, but should be fine), then
see if the problem goes away?

Thanks,
Rusty.

commit 8c79873da0d2bedf4ad6b868c54e426bb0a2fe38
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Tue Jul 29 09:58:38 2008 -0500

    lguest: turn Waker into a thread, not a process
    
    lguest uses a Waker process to break it out of the kernel (ie.
    actually running the guest) when file descriptor needs attention.
    
    Changing this from a process to a thread somewhat simplifies things:
    it can directly access the fd_set of things to watch.  More
    importantly, it means that the Waker can see Guest memory correctly,
    so /dev/vring file descriptors will work as anticipated (the
    alternative is to actually mmap MAP_SHARED, but you can't do that with
    /dev/zero).
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index f9bba2d..b88b0ea 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -76,8 +76,12 @@ static bool verbose;
 	do { if (verbose) printf(args); } while(0)
 /*:*/
 
-/* The pipe to send commands to the waker process */
-static int waker_fd;
+/* File descriptors for the Waker. */
+struct {
+	int pipe[2];
+	int lguest_fd;
+} waker_fds;
+
 /* The pointer to the start of guest memory. */
 static void *guest_base;
 /* The maximum guest physical address allowed, and maximum possible. */
@@ -579,69 +583,64 @@ static void add_device_fd(int fd)
  * watch, but handing a file descriptor mask through to the kernel is fairly
  * icky.
  *
- * Instead, we fork off a process which watches the file descriptors and writes
+ * Instead, we clone off a thread which watches the file descriptors and writes
  * the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host
  * stop running the Guest.  This causes the Launcher to return from the
  * /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset
  * the LHREQ_BREAK and wake us up again.
  *
  * This, of course, is merely a different *kind* of icky.
+ *
+ * Given my well-known antipathy to threads, I'd prefer to use processes.  But
+ * it's easier to share Guest memory with threads, and trivial to share the
+ * devices.infds as the Launcher changes it.
  */
-static void wake_parent(int pipefd, int lguest_fd)
+static int waker(void *unused)
 {
-	/* Add the pipe from the Launcher to the fdset in the device_list, so
-	 * we watch it, too. */
-	add_device_fd(pipefd);
+	/* Close the write end of the pipe: only the Launcher has it open. */
+	close(waker_fds.pipe[1]);
 
 	for (;;) {
 		fd_set rfds = devices.infds;
 		unsigned long args[] = { LHREQ_BREAK, 1 };
+		unsigned int maxfd = devices.max_infd;
+
+		/* We also listen to the pipe from the Launcher. */
+		FD_SET(waker_fds.pipe[0], &rfds);
+		if (waker_fds.pipe[0] > maxfd)
+			maxfd = waker_fds.pipe[0];
 
 		/* Wait until input is ready from one of the devices. */
-		select(devices.max_infd+1, &rfds, NULL, NULL, NULL);
-		/* Is it a message from the Launcher? */
-		if (FD_ISSET(pipefd, &rfds)) {
-			int fd;
-			/* If read() returns 0, it means the Launcher has
-			 * exited.  We silently follow. */
-			if (read(pipefd, &fd, sizeof(fd)) == 0)
-				exit(0);
-			/* Otherwise it's telling us to change what file
-			 * descriptors we're to listen to.  Positive means
-			 * listen to a new one, negative means stop
-			 * listening. */
-			if (fd >= 0)
-				FD_SET(fd, &devices.infds);
-			else
-				FD_CLR(-fd - 1, &devices.infds);
-		} else /* Send LHREQ_BREAK command. */
-			pwrite(lguest_fd, args, sizeof(args), cpu_id);
+		select(maxfd+1, &rfds, NULL, NULL, NULL);
+
+		/* Message from Launcher? */
+		if (FD_ISSET(waker_fds.pipe[0], &rfds)) {
+			char c;
+			/* If this fails, then assume Launcher has exited.
+			 * Don't do anything on exit: we're just a thread! */
+			if (read(waker_fds.pipe[0], &c, 1) != 1)
+				_exit(0);
+			continue;
+		}
+
+		/* Send LHREQ_BREAK command to snap the Launcher out of it. */
+		pwrite(waker_fds.lguest_fd, args, sizeof(args), cpu_id);
 	}
+	return 0;
 }
 
 /* This routine just sets up a pipe to the Waker process. */
-static int setup_waker(int lguest_fd)
-{
-	int pipefd[2], child;
-
-	/* We create a pipe to talk to the Waker, and also so it knows when the
-	 * Launcher dies (and closes pipe). */
-	pipe(pipefd);
-	child = fork();
-	if (child == -1)
-		err(1, "forking");
-
-	if (child == 0) {
-		/* We are the Waker: close the "writing" end of our copy of the
-		 * pipe and start waiting for input. */
-		close(pipefd[1]);
-		wake_parent(pipefd[0], lguest_fd);
-	}
-	/* Close the reading end of our copy of the pipe. */
-	close(pipefd[0]);
+static void setup_waker(int lguest_fd)
+{
+	/* This pipe is closed when Launcher dies, telling Waker. */
+	if (pipe(waker_fds.pipe) != 0)
+		err(1, "Creating pipe for Waker");
 
-	/* Here is the fd used to talk to the waker. */
-	return pipefd[1];
+	/* Waker also needs to know the lguest fd */
+	waker_fds.lguest_fd = lguest_fd;
+
+	if (clone(waker, malloc(4096) + 4096, CLONE_VM | SIGCHLD, NULL) == -1)
+		err(1, "Creating Waker");
 }
 
 /*
@@ -863,8 +862,8 @@ static bool handle_console_input(int fd, struct device *dev)
 				unsigned long args[] = { LHREQ_BREAK, 0 };
 				/* Close the fd so Waker will know it has to
 				 * exit. */
-				close(waker_fd);
-				/* Just in case waker is blocked in BREAK, send
+				close(waker_fds.pipe[1]);
+				/* Just in case Waker is blocked in BREAK, send
 				 * unbreak now. */
 				write(fd, args, sizeof(args));
 				exit(2);
@@ -996,8 +995,8 @@ static bool handle_tun_input(int fd, struct device *dev)
 static void enable_fd(int fd, struct virtqueue *vq, bool timeout)
 {
 	add_device_fd(vq->dev->fd);
-	/* Tell waker to listen to it again */
-	write(waker_fd, &vq->dev->fd, sizeof(vq->dev->fd));
+	/* Snap the Waker out of its select loop. */
+	write(waker_fds.pipe[1], "", 1);
 }
 
 static void net_enable_fd(int fd, struct virtqueue *vq, bool timeout)
@@ -1134,7 +1133,6 @@ static void handle_input(int fd)
 		 * descriptors and a method of handling them.  */
 		for (i = devices.dev; i; i = i->next) {
 			if (i->handle_input && FD_ISSET(i->fd, &fds)) {
-				int dev_fd;
 				if (i->handle_input(fd, i))
 					continue;
 
@@ -1144,11 +1142,6 @@ static void handle_input(int fd)
 				 * buffers to deliver into.  Console also uses
 				 * it when it discovers that stdin is closed. */
 				FD_CLR(i->fd, &devices.infds);
-				/* Tell waker to ignore it too, by sending a
-				 * negative fd number (-1, since 0 is a valid
-				 * FD number). */
-				dev_fd = -i->fd - 1;
-				write(waker_fd, &dev_fd, sizeof(dev_fd));
 			}
 		}
 
@@ -1880,11 +1873,12 @@ static void __attribute__((noreturn)) restart_guest(void)
 {
 	unsigned int i;
 
-	/* Closing pipes causes the Waker thread and io_threads to die, and
-	 * closing /dev/lguest cleans up the Guest.  Since we don't track all
-	 * open fds, we simply close everything beyond stderr. */
+	/* Since we don't track all open fds, we simply close everything beyond
+	 * stderr. */
 	for (i = 3; i < FD_SETSIZE; i++)
 		close(i);
+
+	/* The exec automatically gets rid of the I/O and Waker threads. */
 	execv(main_args[0], main_args);
 	err(1, "Could not exec %s", main_args[0]);
 }
@@ -2085,10 +2079,10 @@ int main(int argc, char *argv[])
 	 * /dev/lguest file descriptor. */
 	lguest_fd = tell_kernel(pgdir, start);
 
-	/* We fork off a child process, which wakes the Launcher whenever one
-	 * of the input file descriptors needs attention.  We call this the
-	 * Waker, and we'll cover it in a moment. */
-	waker_fd = setup_waker(lguest_fd);
+	/* We clone off a thread, which wakes the Launcher whenever one of the
+	 * input file descriptors needs attention.  We call this the Waker, and
+	 * we'll cover it in a moment. */
+	setup_waker(lguest_fd);
 
 	/* Finally, run the Guest.  This doesn't return. */
 	run_guest(lguest_fd);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Lguest] lguest, virtio_blk, id is not a head!
  2008-12-15  0:31 ` [Lguest] " Rusty Russell
@ 2008-12-16  3:24   ` Andre Grueneberg
  0 siblings, 0 replies; 3+ messages in thread
From: Andre Grueneberg @ 2008-12-16  3:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: lguest, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2354 bytes --]

Hi Rusty,

Rusty Russell wrote:
> > Lately I upgraded my host and guest from kernel 2.6.25 (Debian) to
> > vanilla 2.6.27.8. I have two virtual machines and in both I recognised
> > the same issue after a while.
> > 
> > After running for some hours the first guest showed the following
> > message on the console:
> > virtio_blk virtio1: id 1 is not a head!
>    This is interesting.  There hasn't been a great deal of change between
> those versions on the lguest side.  This sounds like a race, and the only
> thing I can see which would have introduced this is the change below.

I removed the "lguest: turn Waker into a thread, not a process" patch as
proposed. Short result: The problem still exists.

Longer story:
As this issue is extremly hard to reproduce, I made an attempt to cause
many file system reads. If I understood things correctly this is when
the guest side of virtio_blk is likely to call vring_get_buf?! I did so
by creating a large file (300MiB vs. 256MiB memory) in the guest and
reading it multiple times in 1k pieces (dd).

To verify that it works, I ran this loop on 2.6.27.8 guest with the
original launcher: failure after ~200 iterations.

So I went on with the patched version of lguest (launcher with waker
process) and the same 2.6.27.8 guest: failure after ~600 iterations.

Next step was a 2.6.25 guest with the original launcher: I stopped it
after ~1600 iterations did not show a fault.

As it occured to me that using a file in a guest's file system might not
be the optimal target, I have now modified my loop to use the first
300MiB of the block device itself. With the original combination, I now
got the failure after ~750 iterations.

I am now running this test with 2.6.25 guest again. As the host is also
serving some serious purpose, I can only run this kind of long lasting
tests during night time. So I'll be forced to stop this test soon (but
I'll move into bed first).

My next step will be to use some non-root block device, to have a chance
to access the guest after the crash. Do you have any idea what to look
at then?

One thing, I just recognised:
While my 2.6.25 guest kernel is configured with "NOHz mode", the 
2.6.27.8 guest is using "high resolution mode".

Thanks so far,
André
-- 
Reality is a crutch for people who can't handle drugs.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-12-16  3:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-12 19:47 lguest, virtio_blk, id is not a head! Andre Grueneberg
2008-12-15  0:31 ` [Lguest] " Rusty Russell
2008-12-16  3:24   ` Andre Grueneberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.