Re: [PATCH v2 1/5] qemu/qarray.h: introduce QArray

From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org, "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Greg Kurz" <groug@kaod.org>
Subject: Re: [PATCH v2 1/5] qemu/qarray.h: introduce QArray
Date: Wed, 29 Sep 2021 19:32:39 +0200	[thread overview]
Message-ID: <3065428.eF6XsjkFDY@silver> (raw)
In-Reply-To: <YVNFlwiw9sJS4cea@redhat.com>

On Dienstag, 28. September 2021 18:41:17 CEST Daniel P. Berrangé wrote:
> On Tue, Sep 28, 2021 at 06:23:23PM +0200, Christian Schoenebeck wrote:
> > On Dienstag, 28. September 2021 15:04:36 CEST Daniel P. Berrangé wrote:
> > > On Sun, Aug 22, 2021 at 03:16:46PM +0200, Christian Schoenebeck wrote:
[...]
> > > If using QArray, it still has to keep passing around the
> > > 'size_t naddrs' value because QArray hides the length
> > > field from the code.
> > 
> > Well no, you don't need to pass around anything, as the array length is
> > always accessible; it is always just (compile time) constant -wordsize
> > (-padding) offset away from the C-array pointer. Maybe the phrasing
> > "private" was a bit misleading in the QArray.h comments.
> > 
> > It is correct that my 9p use case so far did not need the array length
> > info by means of accessing an API, for that reason I really just ommitted
> > (yet) to add a separate patch for that. All it would take was extending
> > QArray.h in a way like (roughly):
> > 
> > typedef struct _QArrayGeneric {
> > 
> >     size_t len;
> >     char first[];
> > 
> > } _QArrayGeneric;
> > 
> > /**
> > 
> >  * Returns the amount of scalar elements in the passed array.
> >  *
> >  * @param first - start of array
> >  */
> > 
> > size_t qarray_len(void* first)
> > {
> > 
> >     if (!first) {
> >     
> >         return 0;
> >     
> >     }
> >     _QArrayGeneric *arr = (_QArrayGeneric *) (
> >     
> >         ((char *)first) - offsetof(_QArrayGeneric, first)
> >     
> >     );
> >     return arr->len;
> > 
> > }
> > 
> > #define QARRAY_LEN(arr) qarray_len(arr)
> > 
> > And as this is generic code for all array scalar types, it would probably
> > be partly placed in a separate qarray.c file.
> > 
> > After that change your user example would become:
> >   for (i = 0; i < QARRAY_LEN(addrs); i++) {
> >   
> >       ...try to connect to addrs[i]...
> >   
> >   }
> > 
> > If you want I can post a v3 with a formal patch (or two) to handle that
> > array length API.
> 
> I still find this all overkill compared to just exposing the
> array struct explicitly.

Yes, you made that clear. :)

> > > If it instead just exposed the array struct explicitly, it can
> > > use the normal g_autoptr() declarator, and can also now just
> > > return the array directly since it is a single pointer
> > > 
> > >   int open_conn(const char *hostname) {
> > >   
> > >     g_autoptr(SocketAddressArray) addrs = NULL;
> > >     int ret = -1;
> > >     size_t i;
> > >     
> > >     if (!(addrs = resolve_hostname(hostname)))
> > >     
> > >         return -1;
> > >     
> > >     for (i = 0; i < addrs.len; i++) {
> > >     
> > >         ...try to connect to addrs.data[i]...
> > >     
> > >     }
> > >     
> > >     ret = 0
> > >    
> > >    cleanup:
> > >     return ret;
> > >   
> > >   }
> > > 
> > > In terms of the first example, it adds an indirection to access
> > > the array data, but on the plus side IMHO the code is clearer
> > > because it uses 'g_autoptr' which is what is more in line with
> > > what is expected for variables that are automatically freed.
> > > QArrayRef() as a name doesn't make it clear that the value will
> > > be freed.
> > > 
> > >    void doSomething(int n) {
> > >    
> > >        g_autoptr(FooArray) foos = NULL;
> > >        QARRAY_CREATE(Foo, foos, n);
> > >        for (size_t i = 0; i < foos.len; ++i) {
> > >        
> > >            foos.data[i].i = i;
> > >            foos.data[i].s = calloc(4096, 1);
> > >            snprintf(foos.data[i].s, 4096, "foo %d", i);
> > >        
> > >        }
> > >    
> > >    }
> > 
> > Well, that would destroy the intended major feature "as little refactoring
> > as possible". The amount of locations where you define a reference
> > variable is usually much smaller than the amount of code locations where
> > you actually access arrays.
> 
> If there's a large amount of existing code refactoring to be avoided
> an intermediate variable can be declared to point to the struct field
> to avoid the field references.

That would be one additional (unguarded) raw pointer variable per array & 
function, that multiplied by the amount of arrays and functions ...

... the suggested shared utility code is 34 lines LOC net.

> > Personally I would not mix in this case macros of foreign libraries (glib)
> > with macros of a local framework (QArray), because if for some reason one
> > of the two deviate in future in a certain way, you would need to refactor
> > a whole bunch of user code. By just separating those definitions from day
> > one, you can avoid such future refactoring work right from the start.
> 
> The GLib automatic memory support is explicitly designed to be extendd
> with support for application specific types. We already do exactly that
> all over QEMU with many calls to G_DEFINE_AUTOPTR_CLEANUP_FUNC(..) to
> register functions for free'ing specific types, such that you can
> use 'g_autoptr' with them.

Ok, just to make sure that I am not missing something here, because really if 
there is already something that does the job that I simply haven't seen, then 
I happily drop this QArray code.

But AFAICS this G_DEFINE_AUTOPTR_CLEANUP_FUNC() & g_autoptr concept does not 
have any notion of "size" or "amount", right?

So let's say you already have the following type and cleanup function in your 
existing code:

typedef struct MyScalar {
    int a;
    char *b;
} MyScalar;

void myscalar_free(MayScalar *s) {
    g_free(s->b);
}

Then if you want to use G_DEFINE_AUTOPTR_CLEANUP_FUNC() for an array on that 
scalar type, then you still would need to *manually* write additionally a 
separate type and cleanup function like:

typedef struct MyArray {
    MyScalar *s;
    int n;
};

void myarray_free(MyArray *a) {
    for (int i = 0; i < a->n; ++i) {
        myscalar_free(a->s[i]);
    }
    g_free(a);
}

Plus you have to manually populate that field 'n' after allocation.

Am I wrong?

> > As far as the terminology is concerned: probably a matter of taste. For me
> > a "reference" implies (either unique or shared) ownership, a "pointer"
> > IMO doesn't. And the usage of QArray alone makes it clear that an array
> > without any references gets automatically freed.
> 
> It is more important than a matter of taste - it is about having a
> consistent approach throughout QEMU. That means automatic free'ing of
> variables should involve g_autoptr, not something custom to a specific type
> with different terminology.

The barriers to add few lines of utility code are really high. :)

> > > I would also suggest that QARRAY_CREATE doesn't need to
> > > exist as a macro - callers could just use the allocator
> > > function directly for clearer code, if it was changed to
> > > 
> > > return the ptr rather than use an out parameter:
> > >    void doSomething(int n) {
> > >    
> > >        g_autoptr(FooArray) foos = foo_array_new(n);
> > >        for (size_t i = 0; i < foos.len; ++i) {
> > >        
> > >            foos.data[i].i = i;
> > >            foos.data[i].s = calloc(4096, 1);
> > >            snprintf(foos.data[i].s, 4096, "foo %d", i);
> > >        
> > >        }
> > >    
> > >    }
> > > 
> > > For this it needs to pass 2 args into the DECLARE_QARRAY_TYPE
> > > macro - the struct name and desired method name - basically
> > > the method name is the struct name in lowercase with underscores.
> > 
> > As you can see with patch 2, one of the movations of making this a macro
> > was> 
> > the intention to increase strictness of type safety, e.g to make things 
like:
> > 	void *p;
> > 	...
> > 	QARRAY_CREATE(FooType, p, n);
> > 
> > to raise a compiler error immediately, but that's not all ...
> > 
> > > Overall I think the goal of having an convenient sized array for
> > > types is good, but I think we should make it look a bit less
> > > magic. I think we only need the DECLARE_QARRAY_TYPE and
> > > DEFINE_QARRAY_TYPE macros.
> > 
> > ... actually making it appear anything like magic was not my intention.
> > The
> > actual main reason for wrapping these things into macros is because that's
> > actually the only way to write generic code in C. Especially in larger
> > projects like this one I favour clear separation of API ("how to use it")
> > from its actual implementation ("how does it do it exactly").
> > 
> > So if you use macros for all those things from the beginning, it is far
> > less likely that you will need to refactor a huge amount of user code
> > with future changes of this array framework.
> 
> I can't see the array framework being complex enough that it will be
> changed in a way that invalidates existing usage.

Well, there are some things that would come to my mind (e.g. strong vs. weak 
refs) , but I think for now my upper question is more important ATM, i.e. 
whether there is already something that does the job (right).

> > > Incidentally, I'd suggest naming to be QARRAY_DECLARE_TYPE
> > > and QARRAY_DEFINE_TYPE.
> > 
> > Also a matter of taste I guess. The suggested naming DECLARE_QARRAY_TYPE()
> > and DEFINE_QARRAY_TYPE() reflect more natural language IMO.
> 
> I consider the QEMU normal practice for namespacing types/macros/functions
> is to have the typename as the first component.
> 
> Regards,
> Daniel