Re: [Qemu-devel] [PATCH 2/2] qdev: safely fail device_add if unable to allocate device

From: "Andreas Färber" <afaerber@suse.de>
To: Markus Armbruster <armbru@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 2/2] qdev: safely fail device_add if unable to allocate device
Date: Mon, 11 Jan 2016 17:04:01 +0100	[thread overview]
Message-ID: <5693D271.3090608@suse.de> (raw)
In-Reply-To: <87wpsbwj9k.fsf@blackfin.pond.sub.org>

Am 18.12.2015 um 22:15 schrieb Markus Armbruster:
> Eric Blake <eblake@redhat.com> writes:
>> On 12/18/2015 09:48 AM, Daniel P. Berrange wrote:
>>> On Fri, Dec 18, 2015 at 04:30:47PM +0100, Igor Mammedov wrote:
>>>> qdev_device_add() currently uses object_new() which
>>>> will abort if there memory allocation for device instance
>>>> fails. While it's fine it startup, it is not desirable
>>>> diring hotplug.
>>>>
>>>> Try to allocate memory for object first and fail safely
>>>> if allocation fails.
>>
>>> This just avoids one small malloc failure.
>>>
>>>> +    object_initialize(dev, obj_size, driver);
>>>
>>> This is going to call g_new many more times, so you'll
>>> still hit OOM almost immediately. eg the call to
>>> g_hash_table_new_full() in object_initialize_with_type
>>> will abort on OOM, not to mention anything run in a
>>> instance constructor function registered against the
>>> class. There's no way to avoid this given that we have
>>> chosen to use GLib in QEMU, so I don't really see any
>>> point in replacing the 'object_new' call with g_try_malloc
> 
> Seconded.
> 
>> We just had a recent thread on it, and I tend to agree that this series
>> isn't helpful.
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2015-12/threads.html#01238
> 
> Plenty of arguments there why recovering from scattered random
> allocation failures isn't useful, and recovering from all of them isn't
> practical.  Limit recovery attempts to big allocations, and spend (some
> of) the saved cycles on actually testing the recovery code.

Jumping in late... I had done work into this direction around 2012/2013
and I believe that changing the hotplug allocation is the right thing to
do here.

http://patchwork.ozlabs.org/patch/269595/

Igor's error handling could still use some improvement (at the time we
did not have the out argument) and my suggested solution was a similar
one to error_abort (pre-allocation and special handling), to avoid
allocation of Error objects and strings on OOM.

Daniel, any allocations other than the core QOM API inside an
instance_init function are plain wrong, has been pointed out to patch
authors and is not an argument for not handling hot-plug errors/design
properly. My huge QOM conversions always had in mind that instance_init
cannot do random allocations and moved them to realize, which may fail,
including for OOM, and should be handled gracefully for hot-plug. For
startup I don't see it as critical - it may be inconvenient not to get a
nice error message but you don't risk losing the guest's data.

Now to the claim that this is just a small allocation. If some 20-char
string allocation fails, there's little QEMU can realistically do
recovery-wise. But a PowerPCCPU device for instance comes with huge
multi-level instruction-dispatch tables even in KVM mode. Therefore my
opinion that this user-initiated scenario and thus code location is
exactly the one we need to handle, even if many others remain unhandled.
It's also why CPU hot-plug needs to take QOM design into account, and
proposals parametrizing core count etc. make size precalculation tricky.

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)