From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Yongjie Ren <yongjie.ren@intel.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir@xen.org>, Hanweidong <hanweidong@huawei.com>,
	Xudong Hao <xudong.hao@intel.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Tim Deegan <tim@xen.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Yanqiangjun <yanqiangjun@huawei.com>,
	Wangzhenguo <wangzhenguo@huawei.com>,
	YangXiaowei <xiaowei.yang@huawei.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	Jan Beulich <JBeulich@suse.com>,
	YongweiX Xu <yongweix.xu@intel.com>,
	Luonengjun <luonengjun@huawei.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	SongtaoX Liu <songtaox.liu@intel.com>
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M
Date: Fri, 14 Jun 2013 11:53:29 +0100
Message-ID: <CAFLBxZbJn6xYUrNY+Yj026Vm8JH+guF69gXr+unc-DoOKiNzqQ@mail.gmail.com>
In-Reply-To: <1371144138.6955.59.camel@zakaz.uk.xensource.com>

On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:
>
>> > > We could have a xenstore flag somewhere that enables the old behaviour
>> > > so that people can revert back to qemu-xen-traditional and make the pci
>> > > hole below 4G even bigger than 448MB, but I think that keeping the old
>> > > behaviour around is going to make the code more difficult to maintain.
>> >
>> > The downside of that is that things which worked with the old scheme may
>> > not work with the new one though. Early in a release cycle when we have
>> > time to discover what has broken then that might be OK, but is post rc4
>> > really the time to be risking it?
>>
>> Yes, you are right: there are some scenarios that would have worked
>> before that wouldn't work anymore with the new scheme.
>> Are they important enough to have a workaround, pretty difficult to
>> identify for a user?
>
> That question would be reasonable early in the development cycle. At rc4
> the question should be: do we think this problem is so critical that we
> want to risk breaking something else which currently works for people.
>
> Remember that we are invalidating whatever passthrough testing people
> have already done up to this point of the release.
>
> It is also worth noting that the things which this change ends up
> breaking may for all we know be equally difficult for a user to identify
> (they are after all approximately the same class of issue).
>
> The problem here is that the risk is difficult to evaluate, we just
> don't know what will break with this change, and we don't know therefore
> if the cure is worse than the disease. The conservative approach at this
> point in the release would be to not change anything, or to change the
> minimal possible number of things (which would preclude changes which
> impact qemu-trad IMHO).
>


> WRT pretty difficult to identify -- the root of this thread suggests the
> guest entered a reboot loop with "No bootable device", that sounds
> eminently release notable to me. I also note that it was changing the
> size of the PCI hole which caused the issue -- which does somewhat
> underscore the risks involved in this sort of change.

But that bug was introduced by the first attempt to fix the root problem.
The root problem shows up as qemu crashing at some point because it
tried to access invalid guest gpfn space; see
http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.

Stefano tried to fix it with the above patch, just changing the hole
to start at 0xe; but that was incomplete, as it didn't match with
hvmloader and seabios's view of the world.  That's what this bug
report is about.  This thread is an attempt to find a better fix.

So the root problem is that if we revert this patch, and someone
passes through a pci device using qemu-xen (the default) and the MMIO
hole is resized, at some point in the future qemu will randomly die.

If it's a choice between users experiencing "My VM randomly crashes"
and users experiencing "I tried to pass through this device but the
guest OS doesn't see it", I'd rather choose the latter.

 -George
