Re: [Xen-devel] [RFC] Generating Go bindings for libxl

From: George Dunlap <george.dunlap@citrix.com>
To: Nicholas Rosbrook <rosbrookn@ainfosec.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "anthony.perard@citrix.com" <anthony.perard@citrix.com>,
	"ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>,
	Brendan Kerrigan <kerriganb@ainfosec.com>,
	Nicolas Belouin <nicolas.belouin@gandi.net>,
	"wl@xen.org" <wl@xen.org>
Subject: Re: [Xen-devel] [RFC] Generating Go bindings for libxl
Date: Tue, 30 Jul 2019 16:22:30 +0100
Message-ID: <c1c1663b-81ea-4704-e21e-c27a6d5999ba@citrix.com>
In-Reply-To: <5c6f3ed7b2f444918feea4f4b7cec107@ainfosec.com>

On 7/30/19 2:11 PM, Nicholas Rosbrook wrote:
> Hello,
> 
> As a follow up to the presentation that Brendan Kerrigan and I gave at Xen
> summit earlier this month, "Client Virtualization Toolstack in Go", I would like to open
> a discussion around the development of Go bindings for libxl. George Dunlap,
> Nicolas Belouin and I have had some discussion off-line already.
> 
> So far, these are the topics of discussion:
> 
> - Code generation: Should the Go bindings be generated from the IDL? Or should
>   an existing cgo generator like c-for-go [1] be leveraged?

Well, a few general considerations:

* The IDL describes things at a more semantic level; it can be
arbitrarily extended with as much information as needed to allow the
generators to do their work.  And we have more control over the output:
in particular, we know we can enforce calling conventions such as
calling libxl_<type>_init() and libxl_<type>_dispose().

* AFAICT at the moment, the IDL is only used to generate C code, not
code for any other language; and it only contains information about
types, not about function signatures.  So using the IDL for "foreign"
language bindings would actually be a new use case.

* Work enriching the IDL should have cross-over benefits into other
languages (for instance, OCaml, should XenServer ever decide to port
xapi to use libxl).  Such languages will either have no
C-to-<language> translator at all, or will have a very different one.

* At the risk of falling into "NIH", adding any external dependency is
somewhat of a risk. While the c-for-go project seems reasonably stable,
it's not part of the core Go toolset, and doesn't seem to be backed by a
major corporation with a vested interest in keeping it going.  What
happens if the maintainer decides to move on in 4 years?  Making a
custom generator is a little bit of extra work, but saves us having to
potentially deal with abandoned upstream tooling down the line.

* FWIW we don't need to parse any C code to use the IDL; we can use
Python's native parser.
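
To make the init / dispose point in the first bullet concrete, here is
a rough sketch of the shape a generator could emit.  It is pure Go with
no cgo: cDomainInfo, cDomainInfoInit, cDomainInfoDispose and cFetch are
made-up stand-ins for the C struct and the libxl_<type>_init() /
libxl_<type>_dispose() calls, not real libxl API.

```go
package main

import (
	"errors"
	"fmt"
)

// cDomainInfo is a made-up stand-in for a C struct.
type cDomainInfo struct {
	domid uint32
}

// Counters so the example can show that init and dispose stay paired.
var initCalls, disposeCalls int

// Stand-ins for libxl_<type>_init() and libxl_<type>_dispose().
func cDomainInfoInit(c *cDomainInfo)    { *c = cDomainInfo{}; initCalls++ }
func cDomainInfoDispose(c *cDomainInfo) { *c = cDomainInfo{}; disposeCalls++ }

// cFetch stands in for the C library call that fills in the struct.
// The failure case is arbitrary and only exists for the demo.
func cFetch(c *cDomainInfo, domid uint32) error {
	if domid == ^uint32(0) {
		return errors.New("invalid domid")
	}
	c.domid = domid
	return nil
}

// DomainInfo is the Go-native type handed back to callers.
type DomainInfo struct {
	Domid uint32
}

// GetDomainInfo follows the calling convention: init, deferred dispose,
// call, copy out.  Dispose runs on the error path as well.
func GetDomainInfo(domid uint32) (DomainInfo, error) {
	var c cDomainInfo
	cDomainInfoInit(&c)
	defer cDomainInfoDispose(&c)

	if err := cFetch(&c, domid); err != nil {
		return DomainInfo{}, err
	}
	return DomainInfo{Domid: c.domid}, nil
}

func main() {
	info, err := GetDomainInfo(7)
	fmt.Println(info.Domid, err)

	_, err = GetDomainInfo(^uint32(0))
	fmt.Println(err)

	fmt.Println("paired:", initCalls == disposeCalls)
}
```

Because the generator owns the template, every wrapper gets the same
init / defer dispose pair without anyone having to remember it.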

All that said, I think the first question is what the generated code
needs to look like.  If c-for-go can be configured to produce that, we
can consider it; otherwise, making our own generator from the IDL will
be the only option.

Out of curiosity, have you looked at the existing in-tree bindings?  Any
particular opinions?

There are two major differences I note.

The first is that in your version there seem to be two layers: libxl.go
is generated by c-for-go, and contains simple function calls; e.g.,
domainInfo(), which takes a *Ctx as an argument and calls
C.libxl_domain_info.  Then you have libxl_wrappers.go, which is written
manually, defining DomainInfo as a method on Ctx, which calls domainInfo().

So you're writing the "idiomatic Go" part by hand anyway; I don't really
see why having a hand-written Go function call an automatically
generated Go function to call a C function is better than having a
hand-written Go function call a C function directly.

The other difference is in the handling of nested structures.  c-for-go
seems to generate a struct which has the core C struct inside it, as
well as a Go-like translation of that struct, and methods on that struct
which will copy things into and out of the C struct.

But rather than doing a "deep copy" for pointers within a struct, it
simply copies and casts the pointer from inside the struct to a pointer
outside the struct.

In fact, there's a Go-like clone of libxl_domain_config, but none for
the elements of it; DeviceDisk, for instance, is simply defined as
C.libxl_device_disk, and config->disks simply copied to the Disks
element of the struct.  That's just all wrong -- it's actually a C
array; Go can only access the first element of it.  How are you supposed
to create a domain with more than one disk?
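
For comparison, here is a minimal sketch of turning a C-style "count
plus pointer to first element" pair into a proper Go slice.  It is pure
Go rather than cgo; cDisk, cDomainConfig and Disk are made-up stand-ins,
and the field names are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cDisk and cDomainConfig are made-up stand-ins that mirror the usual C
// layout of a count plus a pointer to the first element.
type cDisk struct {
	id int32
}

type cDomainConfig struct {
	numDisks int32
	disks    *cDisk
}

// Disk is the Go-native element type.
type Disk struct {
	ID int
}

// disksFromC copies every element of the C-style array into a Go slice.
func disksFromC(cfg *cDomainConfig) []Disk {
	if cfg.disks == nil || cfg.numDisks <= 0 {
		return nil
	}
	// unsafe.Slice (Go 1.17+) views the pointer as a slice of numDisks
	// elements without copying.
	view := unsafe.Slice(cfg.disks, int(cfg.numDisks))
	out := make([]Disk, len(view))
	for i := range view {
		out[i] = Disk{ID: int(view[i].id)}
	}
	return out
}

func main() {
	backing := [3]cDisk{{id: 1}, {id: 2}, {id: 3}}
	cfg := cDomainConfig{numDisks: 3, disks: &backing[0]}
	fmt.Println(disksFromC(&cfg)) // prints [{1} {2} {3}]
}
```

Without the element count, Go code holding only the pointer has no safe
way to get at anything past the first disk.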

Furthermore, these pointers are not reset to `nil` after <type>.Free()
is called.  This just seems very dangerous: it would be way too easy to
introduce a use-after-free bug.

And keeping these C pointers around makes things very tricky, as far as
making sure they get freed.
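
As a sketch of the kind of defensive handling I'd expect if C pointers
are kept around at all: clear the pointer in Free() and check it on
every access.  This is pure Go; cDeviceDisk, NewDeviceDisk, Vdev and
ErrFreed are hypothetical names for illustration, not what c-for-go
generates.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFreed is returned when a DeviceDisk is used after Free().
var ErrFreed = errors.New("use of freed DeviceDisk")

// cDeviceDisk is a made-up stand-in for C-allocated memory.
type cDeviceDisk struct {
	vdev string
}

// DeviceDisk wraps the "C" pointer.
type DeviceDisk struct {
	c *cDeviceDisk
}

func NewDeviceDisk(vdev string) *DeviceDisk {
	return &DeviceDisk{c: &cDeviceDisk{vdev: vdev}}
}

// Free releases the C side and clears the pointer so later calls fail
// loudly instead of touching freed memory.  Calling it twice is harmless.
func (d *DeviceDisk) Free() {
	if d.c == nil {
		return
	}
	// A real binding would call the C dispose / free function here.
	d.c = nil
}

// Vdev refuses to dereference the pointer once it has been freed.
func (d *DeviceDisk) Vdev() (string, error) {
	if d.c == nil {
		return "", ErrFreed
	}
	return d.c.vdev, nil
}

func main() {
	d := NewDeviceDisk("xvda")
	fmt.Println(d.Vdev())

	d.Free()
	d.Free() // second call is a no-op
	fmt.Println(d.Vdev())
}
```

Even with this, every caller still has to remember to call Free(),
which is the part I'd rather not push onto users of the bindings.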

The in-tree bindings generally only create C structures temporarily, and
do a full marshal and unmarshal into and out of Go structures.  This
means a lot of copying on every function call.  But it also means that
the callers can generally treat the Go structures like normal Go
structures -- they don't have to worry about keeping track of them and
freeing them or cleaning them up; they can let the GC deal with it, just
like they deal with everything else.
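
Roughly, the marshal / unmarshal approach looks like the sketch below.
This is not the in-tree code: it is pure Go, cDomainConfig is a made-up
stand-in whose slices play the role of C-owned memory, and toC / fromC
are hypothetical helper names.

```go
package main

import "fmt"

// cDomainConfig is a made-up stand-in for C-owned memory; the slices
// here play the role of a C string and a C array.
type cDomainConfig struct {
	name  []byte
	disks []int32
}

// DomainConfig is the Go-native structure callers work with.
type DomainConfig struct {
	Name  string
	Disks []int
}

// toC builds a fresh C-side structure from the Go value (marshal).
func (g DomainConfig) toC() *cDomainConfig {
	c := &cDomainConfig{
		name:  []byte(g.Name),
		disks: make([]int32, len(g.Disks)),
	}
	for i, d := range g.Disks {
		c.disks[i] = int32(d)
	}
	return c
}

// fromC copies everything out of the C-side structure (unmarshal), so
// the result shares no memory with it.
func fromC(c *cDomainConfig) DomainConfig {
	g := DomainConfig{
		Name:  string(c.name),
		Disks: make([]int, len(c.disks)),
	}
	for i, d := range c.disks {
		g.Disks[i] = int(d)
	}
	return g
}

func main() {
	g := DomainConfig{Name: "guest", Disks: []int{1, 2}}
	c := g.toC()
	back := fromC(c)

	// Scribbling on the "C" side afterwards does not affect the Go copy.
	c.name[0] = 'X'
	c.disks[0] = 42
	fmt.Println(back.Name, back.Disks) // prints guest [1 2]
}
```

The temporary C structure can be disposed of as soon as the call
returns, because nothing on the Go side points into it.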

Which more or less brings me to the core design decision we have to
make: dealing with pointers to / in transient structures (as opposed to
long-lived structures like libxl_ctx or xentoollog_logger).  It seems to me
we have a couple of options:

1. Keep separate structures, and do a full "deep copy", as the in-tree
bindings do.  Advantage: Callers can use GC like normal Go functions.
Structure elements are translated to Go-native types. Disadvantage:
Copying overhead on every function call.

2. Use C types; do explicit allocate / free.  Advantage: No copying on
every function call.  Disadvantage: Needing to remember to clean up / no
GC; can't use Go-native types.

3. Nest one structure inside the other, and do a marshal only when one
of them changes.  Advantage: Copying only when one of the two sides
changes, rather than every function call; c-for-go already generates a
lot of the marshalling / unmarshalling code.  Disadvantage: Need to do a
full copy whenever one side changes (which in libxl's case will be
almost every function call); need to remember to treat pointers
carefully; complicated management of pointers; c-for-go implementation
probably not easily integrated with the libxl_<type>_dispose() calling
discipline.

4. Attempt to use SetFinalizer() to automatically do frees / structure
clean-up [1].  Advantage: No / less copying on every function call, but
can still treat structures like they'll be GC'd.  Disadvantage: Requires
careful thinking; GC may not be as effective if C-allocated memory
greatly exceeds Go-allocated memory; can't use Go-native types for elements.

[1]
http://rabarar.github.io/blog/2015/09/29/cgo-and-destructors-for-managing-allocated-memory-in-go/
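
For reference, a minimal sketch of what option 4 might look like, using
runtime.SetFinalizer from the standard library.  Everything else here
(cBlob, cAlloc, cFree, Blob, live) is a made-up stand-in written in pure
Go, with a counter in place of real C allocations.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// liveAllocs counts outstanding simulated C allocations.
var liveAllocs int64

func live() int64 { return atomic.LoadInt64(&liveAllocs) }

// cBlob, cAlloc and cFree stand in for C memory and malloc / free.
type cBlob struct{ data []byte }

func cAlloc(n int) *cBlob {
	atomic.AddInt64(&liveAllocs, 1)
	return &cBlob{data: make([]byte, n)}
}

func cFree(b *cBlob) { atomic.AddInt64(&liveAllocs, -1) }

// Blob is the Go-side handle; the finalizer is a safety net for callers
// that never call Close.
type Blob struct {
	c *cBlob
}

func NewBlob(n int) *Blob {
	b := &Blob{c: cAlloc(n)}
	runtime.SetFinalizer(b, (*Blob).Close)
	return b
}

// Close frees the C side exactly once and drops the finalizer.
func (b *Blob) Close() {
	if b.c == nil {
		return
	}
	cFree(b.c)
	b.c = nil
	runtime.SetFinalizer(b, nil)
}

func main() {
	b := NewBlob(64)
	fmt.Println("live after alloc:", live())
	b.Close()
	fmt.Println("live after Close:", live())

	// Dropped without Close: the finalizer will free it at some point
	// after a GC cycle, but Go gives no guarantee about when.
	_ = NewBlob(64)
	runtime.GC()
}
```

Note that the GC only sees the small Go-side handle, not the size of
the C allocation behind it, which is the effectiveness concern
mentioned above.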

c-for-go seems to take the worst bits of #1 and #2: It requires explicit
allocate / free, but also actually does a full copy of each structure
whenever one "half" of it changes.

I think I'm coming more and more to the conclusion that I don't like
what c-for-go produces in libxl's case. :-)

On the whole, I still think #1 is the best option.  Thoughts?

 -George
