All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
@ 2011-08-03 14:51 Michael Tokarev
  2011-08-04 19:24 ` Luiz Capitulino
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-08-03 14:51 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Jan Kiszka, Michael Tokarev, qemu-devel

If we do, it results in double monitor_resume() (second being called
from migrate_fd_cleanup() anyway) and monitor suspend count becoming
negative.

Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-By: Jan Kiszka <jan.kiszka@siemens.com>
---
 migration.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 2a15b98..7ca883f 100644
--- a/migration.c
+++ b/migration.c
@@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
-        }
         s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers, NULL);
     }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-03 14:51 [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path Michael Tokarev
@ 2011-08-04 19:24 ` Luiz Capitulino
  2011-08-04 19:52   ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: Luiz Capitulino @ 2011-08-04 19:24 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Jan Kiszka, mtosatti, qemu-devel

On Wed,  3 Aug 2011 18:51:44 +0400
Michael Tokarev <mjt@tls.msk.ru> wrote:

> If we do, it results in double monitor_resume() (second being called
> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
> negative.

Are you hitting an specific issue or did you find this by code inspection?

IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
e447b1a603 because the monitor wasn't being resumed in some cases. Don't
remember which though, do you Marcelo?

I see two possibilities here:

 1. After e447b1a603 there was some change that made the monitor_resume() call
    in migrate_fd_put_buffer() unnecessary

 2. We're calling it in the wrong place

Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
seem to be called when qemu_savevm_state_iterate() fails, for example.

> 
> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
> Reviewed-By: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  migration.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index 2a15b98..7ca883f 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>      if (ret == -EAGAIN) {
>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>      } else if (ret < 0) {
> -        if (s->mon) {
> -            monitor_resume(s->mon);
> -        }
>          s->state = MIG_STATE_ERROR;
>          notifier_list_notify(&migration_state_notifiers, NULL);
>      }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-04 19:24 ` Luiz Capitulino
@ 2011-08-04 19:52   ` Michael Tokarev
  2011-08-04 20:00     ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-08-04 19:52 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Jan Kiszka, mtosatti, qemu-devel

04.08.2011 23:24, Luiz Capitulino пишет:
> On Wed,  3 Aug 2011 18:51:44 +0400
> Michael Tokarev <mjt@tls.msk.ru> wrote:
> 
>> If we do, it results in double monitor_resume() (second being called
>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>> negative.
> 
> Are you hitting an specific issue or did you find this by code inspection?

The specific case I'm talking about can trivially be triggered.
Run migrate "exec:nosuchfile" and your monitor will be stuck
forever.

> IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
> e447b1a603 because the monitor wasn't being resumed in some cases. Don't
> remember which though, do you Marcelo?

And this (e447b1a603) is actually exactly the same thing: when the
child terminates before qemu completes sending stuff to it.

> I see two possibilities here:
> 
>  1. After e447b1a603 there was some change that made the monitor_resume() call
>     in migrate_fd_put_buffer() unnecessary
> 
>  2. We're calling it in the wrong place
> 
> Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
> seem to be called when qemu_savevm_state_iterate() fails, for example.

Yes indeed, but that will lead to a havoc in other places, like
leaving callback notifier registered even after migration gets
cancelled, or keeping the file open.

Note that qemu_savevm_state_iterate() gets called from a callback
routine.

What I observe here is that when the exec: process exits prematurely,
qemu will continue writing stuff to pipe up till migrate_fd_put_buffer()
hits error (EPIPE iirc), at which point we'll call mon->resume() twice.
The same happens in case of tcp migration when the other end closes
the connection.

I don't know when qemu_savevm_state_iterate() may fail, or why
qemu_file_has_error() in there does not return true.

Maybe the problem here is much more severe actually, -- that's why
I initially asked what it is all about, -- before offering the patch.

Thanks,

/mjt

>> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
>> Reviewed-By: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  migration.c |    3 ---
>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration.c b/migration.c
>> index 2a15b98..7ca883f 100644
>> --- a/migration.c
>> +++ b/migration.c
>> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>>      if (ret == -EAGAIN) {
>>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>>      } else if (ret < 0) {
>> -        if (s->mon) {
>> -            monitor_resume(s->mon);
>> -        }
>>          s->state = MIG_STATE_ERROR;
>>          notifier_list_notify(&migration_state_notifiers, NULL);
>>      }
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-04 19:52   ` Michael Tokarev
@ 2011-08-04 20:00     ` Michael Tokarev
  2011-08-04 22:19       ` Jan Kiszka
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-08-04 20:00 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Jan Kiszka, mtosatti, qemu-devel

04.08.2011 23:52, Michael Tokarev пишет:
> 04.08.2011 23:24, Luiz Capitulino пишет:
>> On Wed,  3 Aug 2011 18:51:44 +0400
>> Michael Tokarev <mjt@tls.msk.ru> wrote:
>>
>>> If we do, it results in double monitor_resume() (second being called
>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>> negative.
>>
>> Are you hitting an specific issue or did you find this by code inspection?
> 
> The specific case I'm talking about can trivially be triggered.
> Run migrate "exec:nosuchfile" and your monitor will be stuck
> forever.
> 
>> IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
>> e447b1a603 because the monitor wasn't being resumed in some cases. Don't
>> remember which though, do you Marcelo?
> 
> And this (e447b1a603) is actually exactly the same thing: when the
> child terminates before qemu completes sending stuff to it.
> 
>> I see two possibilities here:
>>
>>  1. After e447b1a603 there was some change that made the monitor_resume() call
>>     in migrate_fd_put_buffer() unnecessary
>>
>>  2. We're calling it in the wrong place
>>
>> Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
>> seem to be called when qemu_savevm_state_iterate() fails, for example.
> 
> Yes indeed, but that will lead to a havoc in other places, like
> leaving callback notifier registered even after migration gets
> cancelled, or keeping the file open.
> 
> Note that qemu_savevm_state_iterate() gets called from a callback
> routine.
> 
> What I observe here is that when the exec: process exits prematurely,
> qemu will continue writing stuff to pipe up till migrate_fd_put_buffer()
> hits error (EPIPE iirc), at which point we'll call mon->resume() twice.
> The same happens in case of tcp migration when the other end closes
> the connection.
> 
> I don't know when qemu_savevm_state_iterate() may fail, or why
> qemu_file_has_error() in there does not return true.
> 
> Maybe the problem here is much more severe actually, -- that's why
> I initially asked what it is all about, -- before offering the patch.

This looks like a race condition: which of the two places
will detect the error first.

Sometimes I see this (-monitor stdio):

(qemu) migrate "exec:nothing"
sh: nothing: not found

(qemu)

and after this, monitor is stuck as I described.  But sometimes
I see this:

(qemu) migrate "exec:nothing"
sh: nothing: not found
(qemu)

And it continues working.  Note the extra newline.
If the error gets detected sooner (like in case of
exec:nothing, or even better, after removing
/bin/sh so that system(3) will fail), the chance
to have stuck monitor is much higher.  If the
error gets detected later, it will survive most
likely.  Or something like that anyway.

 /mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-04 20:00     ` Michael Tokarev
@ 2011-08-04 22:19       ` Jan Kiszka
  2011-08-05  6:51         ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2011-08-04 22:19 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: mtosatti, qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 4583 bytes --]

On 2011-08-04 22:00, Michael Tokarev wrote:
> 04.08.2011 23:52, Michael Tokarev пишет:
>> 04.08.2011 23:24, Luiz Capitulino пишет:
>>> On Wed,  3 Aug 2011 18:51:44 +0400
>>> Michael Tokarev <mjt@tls.msk.ru> wrote:
>>>
>>>> If we do, it results in double monitor_resume() (second being called
>>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>>> negative.
>>>
>>> Are you hitting an specific issue or did you find this by code inspection?
>>
>> The specific case I'm talking about can trivially be triggered.
>> Run migrate "exec:nosuchfile" and your monitor will be stuck
>> forever.
>>
>>> IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
>>> e447b1a603 because the monitor wasn't being resumed in some cases. Don't
>>> remember which though, do you Marcelo?
>>
>> And this (e447b1a603) is actually exactly the same thing: when the
>> child terminates before qemu completes sending stuff to it.
>>
>>> I see two possibilities here:
>>>
>>>  1. After e447b1a603 there was some change that made the monitor_resume() call
>>>     in migrate_fd_put_buffer() unnecessary
>>>
>>>  2. We're calling it in the wrong place
>>>
>>> Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
>>> seem to be called when qemu_savevm_state_iterate() fails, for example.
>>
>> Yes indeed, but that will lead to a havoc in other places, like
>> leaving callback notifier registered even after migration gets
>> cancelled, or keeping the file open.
>>
>> Note that qemu_savevm_state_iterate() gets called from a callback
>> routine.
>>
>> What I observe here is that when the exec: process exits prematurely,
>> qemu will continue writing stuff to pipe up till migrate_fd_put_buffer()
>> hits error (EPIPE iirc), at which point we'll call mon->resume() twice.
>> The same happens in case of tcp migration when the other end closes
>> the connection.
>>
>> I don't know when qemu_savevm_state_iterate() may fail, or why
>> qemu_file_has_error() in there does not return true.
>>
>> Maybe the problem here is much more severe actually, -- that's why
>> I initially asked what it is all about, -- before offering the patch.
> 
> This looks like a race condition: which of the two places
> will detect the error first.
> 
> Sometimes I see this (-monitor stdio):
> 
> (qemu) migrate "exec:nothing"
> sh: nothing: not found
> 
> (qemu)
> 
> and after this, monitor is stuck as I described.  But sometimes
> I see this:
> 
> (qemu) migrate "exec:nothing"
> sh: nothing: not found
> (qemu)
> 
> And it continues working.  Note the extra newline.
> If the error gets detected sooner (like in case of
> exec:nothing, or even better, after removing
> /bin/sh so that system(3) will fail), the chance
> to have stuck monitor is much higher.  If the
> error gets detected later, it will survive most
> likely.  Or something like that anyway.

Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup
triggered via migrate_fd_put_ready called from migrate_fd_connect.

This migration code is a horrible maze. Patch below tries to move
monitor_resume a bit out of this. Please check if it works for you, then
I'll send it out properly.

Thanks,
Jan

---

diff --git a/migration.c b/migration.c
index 2a15b98..756fa62 100644
--- a/migration.c
+++ b/migration.c
@@ -292,18 +292,17 @@ int migrate_fd_cleanup(FdMigrationState *s)
             ret = -1;
         }
         s->file = NULL;
+    } else {
+        if (s->mon) {
+            monitor_resume(s->mon);
+        }
     }
 
-    if (s->fd != -1)
+    if (s->fd != -1) {
         close(s->fd);
-
-    /* Don't resume monitor until we've flushed all of the buffers */
-    if (s->mon) {
-        monitor_resume(s->mon);
+        s->fd = -1;
     }
 
-    s->fd = -1;
-
     return ret;
 }
 
@@ -330,9 +329,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
-        }
         s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers, NULL);
     }
@@ -458,6 +454,9 @@ int migrate_fd_close(void *opaque)
 {
     FdMigrationState *s = opaque;
 
+    if (s->mon) {
+        monitor_resume(s->mon);
+    }
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-04 22:19       ` Jan Kiszka
@ 2011-08-05  6:51         ` Michael Tokarev
  2011-08-05  7:11           ` Jan Kiszka
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-08-05  6:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: mtosatti, qemu-devel, Luiz Capitulino

05.08.2011 02:19, Jan Kiszka wrote:
[]
> Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup
> triggered via migrate_fd_put_ready called from migrate_fd_connect.
> 
> This migration code is a horrible maze. Patch below tries to move
> monitor_resume a bit out of this. Please check if it works for you, then
> I'll send it out properly.

Yes it's a maze.

The patch was mime-damaged, I had to apply it manually, but it
wasn't difficult (since it's understandable what it does etc).

And now I can't trigger the original problem anymore, with any
of my variants.

Here we go:
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:cat > /tmp/foo"
(qemu) migrate "exec:cat > /tmp/foo"
(qemu) migrate "exec:foo"
sh: foo: not found
(qemu) migrate "exec:foo"
sh: foo: not found

(qemu) q

As you can see, I can hit either of the two cases - with
and without extra newline, and in both cases (and in
successful case too) it works correctly.

There's a difference still between the two - namely that
extra newline - but that's something else and merely
cosmetic.

I also verified that -incoming migration works as expected,
in both failure and success cases - and indeed it works.

Speaking of the patch: shouldn't migrate_fd_close() be
called from migrate_fd_cleanup(), or vise versa, or both
be combined into one?  In migrate_fd_cleanup(), the new
logic is not clear still.  New version of it:

int migrate_fd_cleanup(FdMigrationState *s)
{
    int ret = 0;

    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);

    if (s->file) {
        DPRINTF("closing file\n");
        if (qemu_fclose(s->file) != 0) {
            ret = -1;
        }
        s->file = NULL;
    } else {
        if (s->mon) {
            monitor_resume(s->mon);
        }
    }

    if (s->fd != -1) {
        close(s->fd);
        s->fd = -1;
    }

    return ret;
}

Why it's EITHER s->file OR s->mon, but not both?  Shouldn't
the s->mon thing be unconditional?

And the whole thing is waiting coroutine conversion badly,
since the logic is indeed a maze ;)

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-05  6:51         ` Michael Tokarev
@ 2011-08-05  7:11           ` Jan Kiszka
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2011-08-05  7:11 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: mtosatti, qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 5529 bytes --]

On 2011-08-05 08:51, Michael Tokarev wrote:
> 05.08.2011 02:19, Jan Kiszka wrote:
> []
>> Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup
>> triggered via migrate_fd_put_ready called from migrate_fd_connect.
>>
>> This migration code is a horrible maze. Patch below tries to move
>> monitor_resume a bit out of this. Please check if it works for you, then
>> I'll send it out properly.
> 
> Yes it's a maze.
> 
> The patch was mime-damaged, I had to apply it manually, but it
> wasn't difficult (since it's understandable what it does etc).

Sorry, likely forgot to disable line-wrapping before hitting send.

> 
> And now I can't trigger the original problem anymore, with any
> of my variants.
> 
> Here we go:
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:cat > /tmp/foo"
> (qemu) migrate "exec:cat > /tmp/foo"
> (qemu) migrate "exec:foo"
> sh: foo: not found
> (qemu) migrate "exec:foo"
> sh: foo: not found
> 
> (qemu) q
> 
> As you can see, I can hit either of the two cases - with
> and without extra newline, and in both cases (and in
> successful case too) it works correctly.
> 
> There's a difference still between the two - namely that
> extra newline - but that's something else and merely
> cosmetic.
> 
> I also verified that -incoming migration works as expected,
> in both failure and success cases - and indeed it works.
> 
> Speaking of the patch: shouldn't migrate_fd_close() be
> called from migrate_fd_cleanup(),

Maybe. But it requires careful review what the migration close callbacks
are doing and if that is always equivalent (or harmless).

> or vise versa, or both
> be combined into one?  In migrate_fd_cleanup(), the new
> logic is not clear still.  New version of it:
> 
> int migrate_fd_cleanup(FdMigrationState *s)
> {
>     int ret = 0;
> 
>     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> 
>     if (s->file) {
>         DPRINTF("closing file\n");
>         if (qemu_fclose(s->file) != 0) {
>             ret = -1;
>         }
>         s->file = NULL;
>     } else {
>         if (s->mon) {
>             monitor_resume(s->mon);
>         }
>     }
> 
>     if (s->fd != -1) {
>         close(s->fd);
>         s->fd = -1;
>     }
> 
>     return ret;
> }
> 
> Why it's EITHER s->file OR s->mon, but not both?  Shouldn't
> the s->mon thing be unconditional?

See description of the patch I just sent out.

> 
> And the whole thing is waiting coroutine conversion badly,
> since the logic is indeed a maze ;)

Yeah, though I think this will not magically remove all windings here.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-08-03  7:38       ` Michael Tokarev
@ 2011-08-03 13:22         ` Jan Kiszka
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2011-08-03 13:22 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Marcelo Tosatti, qemu-devel

On 2011-08-03 09:38, Michael Tokarev wrote:
> So, can we decide on this somehow?  I don't see a code
> path where we don't call monitor_resume at the end,
> so the "intermediate" monitor_resume can be dropped.
> This way we fix real bug.  If there will be other
> problem from that, it can be fixed later - this will
> mean that code path is found...

I do not see any reason for the spurious resume as well, so you may add my

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>

> 
> Should I resend the initial patch again?

May help to finally trigger the merge. Direct it to Luiz Capitulino as
the subsystem maintainer.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-07-20 22:06     ` Jan Kiszka
@ 2011-08-03  7:38       ` Michael Tokarev
  2011-08-03 13:22         ` Jan Kiszka
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-08-03  7:38 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, qemu-devel

So, can we decide on this somehow?  I don't see a code
path where we don't call monitor_resume at the end,
so the "intermediate" monitor_resume can be dropped.
This way we fix real bug.  If there will be other
problem from that, it can be fixed later - this will
mean that code path is found...

Should I resend the initial patch again?

Thanks,

/mjt

21.07.2011 02:06, Jan Kiszka wriote:
> On 2011-07-20 18:34, Marcelo Tosatti wrote:
>> On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
>>> On 2011-07-19 13:46, Michael Tokarev wrote:
>>>> If we do, it results in double monitor_resume() (second being called
>>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>>> negative.
>>>>
>>>> Cc'ing people from `git blame' list for the lines in question: the
>>>> change fixes the problem but I'm not sure what the original intention
>>>> of this code was in this place.  Unfortunately noone replied to two
>>>> my attempts to raise this issue.
>>>>
>>>> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
>>>> ---
>>>>  migration.c |    3 ---
>>>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/migration.c b/migration.c
>>>> index af3a1f2..115588c 100644
>>>> --- a/migration.c
>>>> +++ b/migration.c
>>>> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>>>>      if (ret == -EAGAIN) {
>>>>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>>>>      } else if (ret < 0) {
>>>> -        if (s->mon) {
>>>> -            monitor_resume(s->mon);
>>>> -        }
>>>>          s->state = MIG_STATE_ERROR;
>>>>          notifier_list_notify(&migration_state_notifiers);
>>>>      }
>>>
>>> Looks reasonable to me, but Marcelo should comment on this, specifically
>>> which scenario once required the resume.
>>>
>>> Jan
>>
>> If the monitor was suspended (migrate without -d), then this path must
>> resume. Should record that somewhere and check here.
> 
> It's clear that we need to resume. The question is in which case
> migrate_fd_cleanup may not be called after an error in
> migrate_fd_put_buffer.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-07-20 16:34   ` Marcelo Tosatti
@ 2011-07-20 22:06     ` Jan Kiszka
  2011-08-03  7:38       ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2011-07-20 22:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Michael Tokarev, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1741 bytes --]

On 2011-07-20 18:34, Marcelo Tosatti wrote:
> On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
>> On 2011-07-19 13:46, Michael Tokarev wrote:
>>> If we do, it results in double monitor_resume() (second being called
>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>> negative.
>>>
>>> Cc'ing people from `git blame' list for the lines in question: the
>>> change fixes the problem but I'm not sure what the original intention
>>> of this code was in this place.  Unfortunately noone replied to two
>>> my attempts to raise this issue.
>>>
>>> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
>>> ---
>>>  migration.c |    3 ---
>>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration.c b/migration.c
>>> index af3a1f2..115588c 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>>>      if (ret == -EAGAIN) {
>>>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>>>      } else if (ret < 0) {
>>> -        if (s->mon) {
>>> -            monitor_resume(s->mon);
>>> -        }
>>>          s->state = MIG_STATE_ERROR;
>>>          notifier_list_notify(&migration_state_notifiers);
>>>      }
>>
>> Looks reasonable to me, but Marcelo should comment on this, specifically
>> which scenario once required the resume.
>>
>> Jan
> 
> If the monitor was suspended (migrate without -d), then this path must
> resume. Should record that somewhere and check here.

It's clear that we need to resume. The question is in which case
migrate_fd_cleanup may not be called after an error in
migrate_fd_put_buffer.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-07-19 21:48 ` Jan Kiszka
@ 2011-07-20 16:34   ` Marcelo Tosatti
  2011-07-20 22:06     ` Jan Kiszka
  0 siblings, 1 reply; 13+ messages in thread
From: Marcelo Tosatti @ 2011-07-20 16:34 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Michael Tokarev, qemu-devel

On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
> On 2011-07-19 13:46, Michael Tokarev wrote:
> > If we do, it results in double monitor_resume() (second being called
> > from migrate_fd_cleanup() anyway) and monitor suspend count becoming
> > negative.
> > 
> > Cc'ing people from `git blame' list for the lines in question: the
> > change fixes the problem but I'm not sure what the original intention
> > of this code was in this place.  Unfortunately noone replied to two
> > my attempts to raise this issue.
> > 
> > Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
> > ---
> >  migration.c |    3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration.c b/migration.c
> > index af3a1f2..115588c 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
> >      if (ret == -EAGAIN) {
> >          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
> >      } else if (ret < 0) {
> > -        if (s->mon) {
> > -            monitor_resume(s->mon);
> > -        }
> >          s->state = MIG_STATE_ERROR;
> >          notifier_list_notify(&migration_state_notifiers);
> >      }
> 
> Looks reasonable to me, but Marcelo should comment on this, specifically
> which scenario once required the resume.
> 
> Jan

If the monitor was suspended (migrate without -d), then this path must
resume. Should record that somewhere and check here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
  2011-07-19 11:46 Michael Tokarev
@ 2011-07-19 21:48 ` Jan Kiszka
  2011-07-20 16:34   ` Marcelo Tosatti
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2011-07-19 21:48 UTC (permalink / raw)
  To: Michael Tokarev, Marcelo Tosatti; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On 2011-07-19 13:46, Michael Tokarev wrote:
> If we do, it results in double monitor_resume() (second being called
> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
> negative.
> 
> Cc'ing people from `git blame' list for the lines in question: the
> change fixes the problem but I'm not sure what the original intention
> of this code was in this place.  Unfortunately noone replied to two
> my attempts to raise this issue.
> 
> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
> ---
>  migration.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index af3a1f2..115588c 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>      if (ret == -EAGAIN) {
>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>      } else if (ret < 0) {
> -        if (s->mon) {
> -            monitor_resume(s->mon);
> -        }
>          s->state = MIG_STATE_ERROR;
>          notifier_list_notify(&migration_state_notifiers);
>      }

Looks reasonable to me, but Marcelo should comment on this, specifically
which scenario once required the resume.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
@ 2011-07-19 11:46 Michael Tokarev
  2011-07-19 21:48 ` Jan Kiszka
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2011-07-19 11:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Tokarev, Marcelo Tosatti, Jan Kiszka

If we do, it results in double monitor_resume() (second being called
from migrate_fd_cleanup() anyway) and monitor suspend count becoming
negative.

Cc'ing people from `git blame' list for the lines in question: the
change fixes the problem but I'm not sure what the original intention
of this code was in this place.  Unfortunately noone replied to two
my attempts to raise this issue.

Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
---
 migration.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index af3a1f2..115588c 100644
--- a/migration.c
+++ b/migration.c
@@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
-        }
         s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers);
     }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-08-05  7:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-03 14:51 [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path Michael Tokarev
2011-08-04 19:24 ` Luiz Capitulino
2011-08-04 19:52   ` Michael Tokarev
2011-08-04 20:00     ` Michael Tokarev
2011-08-04 22:19       ` Jan Kiszka
2011-08-05  6:51         ` Michael Tokarev
2011-08-05  7:11           ` Jan Kiszka
  -- strict thread matches above, loose matches on Subject: below --
2011-07-19 11:46 Michael Tokarev
2011-07-19 21:48 ` Jan Kiszka
2011-07-20 16:34   ` Marcelo Tosatti
2011-07-20 22:06     ` Jan Kiszka
2011-08-03  7:38       ` Michael Tokarev
2011-08-03 13:22         ` Jan Kiszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.