Re: [PATCH 0/2] i386: Fix interrupt based Async PF enablement

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 0/2] i386: Fix interrupt based Async PF enablement
Date: Wed, 21 Apr 2021 11:48:41 +0200
Message-ID: <87tuo0ge86.fsf@vitty.brq.redhat.com>
In-Reply-To: <YH/yW+mVgvGHXXmW@redhat.com>

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Apr 21, 2021 at 11:29:45AM +0200, Vitaly Kuznetsov wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Apr 21, 2021 at 10:38:06AM +0200, Vitaly Kuznetsov wrote:
>> >> Eduardo Habkost <ehabkost@redhat.com> writes:
>> >> 
>> >> > On Thu, Apr 15, 2021 at 08:14:30PM +0100, Dr. David Alan Gilbert wrote:
>> >> >> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> >> >> > On 06/04/21 13:42, Vitaly Kuznetsov wrote:
>> >> >> > > older machine types are still available (I disable it for <= 5.1 but we
>> >> >> > > can consider disabling it for 5.2 too). The feature is upstream since
>> >> >> > > Linux 5.8, I know that QEMU supports much older kernels but this doesn't
>> >> >> > > probably mean that we can't enable new KVM PV features unless all
>> >> >> > > supported kernels have it, we'd have to wait many years otherwise.
>> >> >> > 
>> >> >> > Yes, this is a known problem in fact. :(  In 6.0 we even support RHEL 7,
>> >> >> > though that will go away in 6.1.
>> >> >> > 
>> >> >> > We should take the occasion of dropping RHEL7 to be clearer about which
>> >> >> > kernels are supported.
>> >> >> 
>> >> >> It would be nice to be able to define sets of KVM functionality that we
>> >> >> can either start given machine types with, or provide a separate switch
>> >> >> to limit kvm functionality back to some defined point.  We do trip over
>> >> >> the same things pretty regularly when accidentally turning on new
>> >> >> features.
>> >> >
>> >> > The same idea can apply to the hyperv=on stuff Vitaly is working
>> >> > on.  Maybe we should consider making a generic version of the
>> >> > s390x FeatGroup code, use it to define convenient sets of KVM and
>> >> > hyperv features.
>> >> 
>> >> True, the more I look at PV features enablement, the more I think that
>> >> we're missing something important in the logic. All machine types we
>> >> have are generally supposed to work with the oldest supported kernel so
>> >> we should wait many years before enabling some of the new PV features
>> >> (KVM or Hyper-V) by default.
>> >> 
>> >> This also links to our parallel discussion regarding migration
>> >> policies. Currently, we can't enable PV features by default based on
>> >> their availability on the host because of migration, the set may differ
>> >> on the destination host. What if we introduce (and maybe even switch to
>> >> it by default) something like
>> >> 
>> >>  -migratable opportunistic (stupid name, I know)
>> >> 
>> >> which would allow to enable all features supported by the source host
>> >> and then somehow checking that the destination host has them all. This
>> >> would effectively mean that it is possible to migrate a VM to a
>> >> same-or-newer software (both kernel and QEMU) but not the other way
>> >> around. This may be a reasonable choice.
>> >
>> > I don't think this is usable in practice. Any large cloud or data center
>> > mgmt app using QEMU relies on migration, so can't opportunistically
>> > use arbitrary new features. They can only use features in the oldest
>> > kernel their deployment cares about. This can be newer than the oldest
>> > that QEMU supports, but still older than the newest that exists.
>> >
>> > ie we have situation where:
>> >
>> >  - QEMU upstream minimum host is version 7
>> >  - Latest possible host is version 45
>> >  - A particular deployment has a mixture of hosts at version 24 and 37
>> >
>> > "-migratable opportunistic"  would let QEMU use features from version 37
>> > despite the deployment needing compatibility with host version 24 still.
>> >
>> 
>> True; I was not really thinking about 'big' clouds/data centers, these
>> should have enough resources to carefully set all the required features
>> and not rely on the 'default'. My thoughts were around using migration
>> for host upgrade on smaller (several hosts) deployments and in this case
>> it's probably fairly reasonable to require to start with the oldest host
>> and upgrade them all if getting new features is one of the upgrade goals.
>
>
>> > It is almost as if we need to have a way to explicitly express a minimum
>> > required host version that VM requires compatibility with, so deployments
>> > can set their own baseline that is newer than QEMU minimum.
>> 
>> Yes, maybe, but setting the baseline is also a non-trivial task:
>> e.g. how would users know which PV features they can enable without
>> going through Linux kernel logs or just trying them on the oldest kernel
>> they need? This should probably be solved by some upper layer management
>> app which would collect feature sets from all hosts and come up with a
>> common subset. I'm not sure if this is done by some tools already.
>
> I specifically didn't talk in terms of features, because the problem you
> describe is unreasonable to push onto applications.
>
> Rather QEMU could express host baseline
>
>    - "host-v1"  - features A and B
>    - "host-v2"  - features A, B and C
>    - "host-v3"  - features A, B, C, D, E and F
>
> The mgmt app / admin only has to know which QEMU host baselines their
> hosts support.
>
> Essentially this could be viewed as separating the host kernel dependent
> bits out of the machine type, into a separate configuration axis.

If we only consider upstream kernels and assume PV features never go
away, that could work. Distro kernels exist too, however, and feature
backports are common, so which version should I declare when my kernel
has e.g. features A, B and E? (There used to be a KVM_GET_API_VERSION
ioctl, but then we switched to CAPs, and this happened for a reason.)
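
To make the backport problem concrete, here is a minimal sketch of what
picking a named baseline would look like for such a kernel. The baseline
names and feature letters are modelled on the "host-v1/v2/v3" example
quoted above; the function name and the selection rule (largest baseline
that the host fully supports) are my assumptions, not anything QEMU
implements.

```python
# Hypothetical baselines, modelled on the quoted "host-v1/v2/v3" example.
BASELINES = {
    "host-v1": {"A", "B"},
    "host-v2": {"A", "B", "C"},
    "host-v3": {"A", "B", "C", "D", "E", "F"},
}


def best_baseline(host_features):
    """Return (name, features) of the largest baseline the host fully
    supports, or None if no baseline fits (assumed selection rule)."""
    usable = [
        (name, feats)
        for name, feats in BASELINES.items()
        if feats <= host_features  # baseline must be a subset of the host
    ]
    if not usable:
        return None
    return max(usable, key=lambda item: len(item[1]))


# A distro kernel with feature E backported but without C and D.
distro_host = {"A", "B", "E"}
name, feats = best_baseline(distro_host)
unused = distro_host - feats
print(name, sorted(unused))  # host-v1 ['E']
```

The host has to declare "host-v1", and the backported feature E cannot
be expressed by any baseline name.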

Personally, I'd vote for having individual PV features in the config if
it ever gets introduced.
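
With per-feature granularity, the "common subset" a management layer
would need is just a set intersection across hosts. A rough sketch,
where the host names and feature letters are made up for illustration
and nothing here is an existing tool:

```python
def common_features(hosts):
    """Intersect the per-host feature sets.

    `hosts` maps a (hypothetical) host name to the set of PV features
    it reports. An empty mapping yields an empty set.
    """
    feature_sets = list(hosts.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


# Illustrative deployment: one upstream-like host, one with a backport.
deployment = {
    "host-old": {"A", "B", "E"},            # distro kernel, E backported
    "host-new": {"A", "B", "C", "D", "E"},
}
print(sorted(common_features(deployment)))  # ['A', 'B', 'E']
```

Here feature E stays usable across the deployment, which a versioned
baseline could not express.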

-- 
Vitaly