All of lore.kernel.org
 help / color / mirror / Atom feed
* Current LibXL Status
@ 2015-11-18 18:32 Martin Osterloh
  2015-11-19  9:20 ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Osterloh @ 2015-11-18 18:32 UTC (permalink / raw)
  To: xen-devel

Hello All,

I am currently working on the OpenXT project [1]. My work is currently 
focused on creating a shim layer between XenMgr [2] (written in Haskell) 
and LibXL. Goal is to replace the current XenOps/XenVM [3] (written in 
OCaml).

I wanted to inquire about the current state of LibXL and in particular 
if there are any issues with using it in long-running processes.

Any feedback would be much appreciated.

Regards,

Martin Osterloh

[1]https://github.com/OpenXT
[2]https://github.com/OpenXT/manager
[3]https://github.com/OpenXT/toolstack

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-18 18:32 Current LibXL Status Martin Osterloh
@ 2015-11-19  9:20 ` Ian Campbell
  2015-11-19 10:23   ` George Dunlap
  2015-11-19 10:55   ` Andrew Cooper
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2015-11-19  9:20 UTC (permalink / raw)
  To: Martin Osterloh, xen-devel

On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote:

> I wanted to inquire about the current state of LibXL and in particular 
> if there are any issues with using it in long-running processes.

It is currently being used by libvirtd which I think has shaken out most of
the issues with that environment.

There are certain to be other bugs, but nothing show-stopping.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19  9:20 ` Ian Campbell
@ 2015-11-19 10:23   ` George Dunlap
  2015-11-19 10:55   ` Andrew Cooper
  1 sibling, 0 replies; 26+ messages in thread
From: George Dunlap @ 2015-11-19 10:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Euan Harris, Martin Osterloh, Rob Hoes, xen-devel

On Thu, Nov 19, 2015 at 9:20 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote:
>
>> I wanted to inquire about the current state of LibXL and in particular
>> if there are any issues with using it in long-running processes.
>
> It is currently being used by libvirtd which I think has shaken out most of
> the issues with that environment.
>
> There are certain to be other bugs, but nothing show-stopping.

And in particular, we have an instance running [libvirt+libxl] with
the the OpenStack CI loop, and it passes.

Re moving from xenopsd -- I think one of the big things XenServer has
run into is the error reporting from libxl.  Rob Hoes posted some
preliminary patches back in June[1], but there hasn't been a repost
yet, AFAICS.

CC'ing Rob Hoes and Euan Harris to see if they have any wisdom to offer you. :-)

 -George

[1] http://marc.info/?i=<1435153656-19485-2-git-send-email-rob.hoes@citrix.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19  9:20 ` Ian Campbell
  2015-11-19 10:23   ` George Dunlap
@ 2015-11-19 10:55   ` Andrew Cooper
  2015-11-19 11:23     ` Ian Campbell
  2016-02-18 18:15     ` libxl and malloc failure (Re: Current LibXL Status) Ian Jackson
  1 sibling, 2 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-11-19 10:55 UTC (permalink / raw)
  To: Ian Campbell, Martin Osterloh, xen-devel

On 19/11/15 09:20, Ian Campbell wrote:
> On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote:
>
>> I wanted to inquire about the current state of LibXL and in particular 
>> if there are any issues with using it in long-running processes.
> It is currently being used by libvirtd which I think has shaken out most of
> the issues with that environment.
>
> There are certain to be other bugs, but nothing show-stopping.

There really is a show-stopper, which I have stated before.

Languages such as OCaml use -ENOMEM as a hint to run the garbage
collector some more.  I expect Haskell is the same.

It is not appropriate for libxl (or any library for that matter) to use
exit() as its method of resolving out-of-memory conditions.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 10:55   ` Andrew Cooper
@ 2015-11-19 11:23     ` Ian Campbell
  2015-11-19 11:30       ` Processed: " xen
                         ` (3 more replies)
  2016-02-18 18:15     ` libxl and malloc failure (Re: Current LibXL Status) Ian Jackson
  1 sibling, 4 replies; 26+ messages in thread
From: Ian Campbell @ 2015-11-19 11:23 UTC (permalink / raw)
  To: Andrew Cooper, Martin Osterloh, xen-devel

create !
title it libxl exit() on ENOMEM incompatible with gc'd languages
thanks

On Thu, 2015-11-19 at 10:55 +0000, Andrew Cooper wrote:
> On 19/11/15 09:20, Ian Campbell wrote:
> > On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote:
> > 
> > > I wanted to inquire about the current state of LibXL and in
> > > particular 
> > > if there are any issues with using it in long-running processes.
> > It is currently being used by libvirtd which I think has shaken out
> > most of
> > the issues with that environment.
> > 
> > There are certain to be other bugs, but nothing show-stopping.
> 
> There really is a show-stopper, which I have stated before.

Ah yes, nobody ever made a proposal to fix this so it slipped off my radar.

I've recorded it in the BTS this time.

> Languages such as OCaml use -ENOMEM as a hint to run the garbage
> collector some more.  I expect Haskell is the same.
> 
> It is not appropriate for libxl (or any library for that matter) to use
> exit() as its method of resolving out-of-memory conditions.

Note that the decision to take this approach was widely consulted at the
time, including with Ocaml folks, it wasn't just done on a whim. That's not
to say we cannot reconsider and find a different, better, approach which
works for such languages.

I think retrofitting all the necessary error paths to libxl to report
memory allocation failures back up to the caller is going to be untenable
and probably a project which would never actually be complete.

But I notice that there are only ~10 calls to libxl__alloc_failed in libxl.

I think it would be possible to turn each of those into a retry loop which
calls an application provided hook function on each iteration.

If no hook is provided by the application then the current behaviour would
remain.

Language bindings for Ocaml/haskel/etc would set the hook and use it to
call into their gc. You'd probably want to limit the number of retry
attempts and provide a way for the app to say "no, we really are out of
memory" if it wants.

Would such an approach work for Ocaml and haskell?

In particular I'd be concerned about things like the ocaml interpreter lock
(caml_enter_blocking_section etc) and calls to libxl not consistently
holding/dropping it potentially leading to deadlocks.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Processed: Re:  Current LibXL Status
  2015-11-19 11:23     ` Ian Campbell
@ 2015-11-19 11:30       ` xen
  2015-11-19 11:33       ` Andrew Cooper
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: xen @ 2015-11-19 11:30 UTC (permalink / raw)
  To: Ian Campbell, xen-devel

Processing commands for xen@bugs.xenproject.org:

> create !
Created new bug #51 rooted at `<1447932195.5647.46.camel@citrix.com>'
Title: `Re: [Xen-devel] Current LibXL Status'
> title it libxl exit() on ENOMEM incompatible with gc'd languages
Set title for #51 to `libxl exit() on ENOMEM incompatible with gc'd languages'
> thanks
Finished processing.

Modified/created Bugs:
 - 51: http://bugs.xenproject.org/xen/bug/51 (new)

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:23     ` Ian Campbell
  2015-11-19 11:30       ` Processed: " xen
@ 2015-11-19 11:33       ` Andrew Cooper
  2015-11-19 11:48         ` Ian Campbell
  2015-11-19 15:34       ` George Dunlap
  2016-02-18 17:26       ` George Dunlap
  3 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2015-11-19 11:33 UTC (permalink / raw)
  To: Ian Campbell, Martin Osterloh, xen-devel

On 19/11/15 11:23, Ian Campbell wrote:
> create !
> title it libxl exit() on ENOMEM incompatible with gc'd languages
> thanks

Can this be extended to "should not use exit() in general" ?

andrewcoop@andrewcoop:/local/xen.git/xen$ git grep exit\( --
:/tools/libxl/libxl*
../tools/libxl/libxl.c:1707:        _exit(0);
../tools/libxl/libxl.c:1711:            _exit(errno);
../tools/libxl/libxl.c:1716:            _exit(-1);
../tools/libxl/libxl_aoutils.c:478:            if (r) {
LOGE(ERROR,"openpty failed"); _exit(-1); }
../tools/libxl/libxl_aoutils.c:482:        if (rc) { LOGE(ERROR,"sendmsg
to parent failed"); _exit(-1); }
../tools/libxl/libxl_aoutils.c:483:        _exit(0);
../tools/libxl/libxl_bootloader.c:557:        if (r) { LOGE(ERROR,
"login_tty failed"); exit(-1); }
../tools/libxl/libxl_bootloader.c:559:        exit(-1);
../tools/libxl/libxl_event.c:1387:    exit(-1);
../tools/libxl/libxl_event.h:104:   * and call exit(-1).
../tools/libxl/libxl_exec.c:106:    _exit(-1);
../tools/libxl/libxl_exec.c:316:        exit(255);
../tools/libxl/libxl_exec.c:324:        _exit(127);
../tools/libxl/libxl_exec.c:340:    _exit(r);
../tools/libxl/libxl_internal.c:28:    _exit(-1);
../tools/libxl/libxl_remus_disk_drbd.c:229:    _exit(ackwait);
../tools/libxl/libxl_save_callout.c:184:                exit(-1);
../tools/libxl/libxl_save_helper.c:64:    if (r < 0) { perror("memory
allocation failed during logging"); exit(-1); }
../tools/libxl/libxl_save_helper.c:102:    exit(-1);
../tools/libxl/libxl_save_helper.c:122:    if (!r) { perror("memory
allocation failed"); exit(-1); }
../tools/libxl/libxl_save_helper.c:189:        if (r<0) {
perror("write"); exit(-1); }
../tools/libxl/libxl_save_helper.c:210:    if (r<=0) exit(-2);
../tools/libxl/libxl_save_helper.c:230:    exit(0);
../tools/libxl/libxlu_cfg_l.c:1763:     exit( YY_EXIT_FAILURE );
../tools/libxl/libxlu_disk_l.c:2248:    exit( YY_EXIT_FAILURE );

The majority of those are cases are not appropriate uses of exit(). 
AFAIIR, the *only* valid use of exit() in a library is to clean up in a
child process from a library-initiated fork().

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:33       ` Andrew Cooper
@ 2015-11-19 11:48         ` Ian Campbell
  2015-11-19 11:55           ` Ian Campbell
  2016-02-18 18:30           ` Ian Jackson
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2015-11-19 11:48 UTC (permalink / raw)
  To: Andrew Cooper, Martin Osterloh, xen-devel

On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
> 
> The majority of those are cases are not appropriate uses of exit(). 
> AFAIIR, the *only* valid use of exit() in a library is to clean up in a
> child process from a library-initiated fork().

... or (in this case) in the libxl-save-helper (separate process).

The only one I can find which isn't one of this is
in libxl__event_disaster, and that is only if the applications (or language
bindings) haven't provided a suitable disaster callback.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:48         ` Ian Campbell
@ 2015-11-19 11:55           ` Ian Campbell
  2015-11-19 12:16             ` Ian Campbell
  2016-02-18 18:30           ` Ian Jackson
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2015-11-19 11:55 UTC (permalink / raw)
  To: Andrew Cooper, Martin Osterloh, xen-devel, Ian Jackson
  Cc: Shriram Rajagopalan, Yang Hongyang

On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote:
> On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
> >  
> > The majority of those are cases are not appropriate uses of exit(). 
> > AFAIIR, the *only* valid use of exit() in a library is to clean up in a
> > child process from a library-initiated fork().
> 
> ... or (in this case) in the libxl-save-helper (separate process).
> 
> The only one I can find which isn't one of this is
> in libxl__event_disaster, and that is only if the applications (or language
> bindings) haven't provided a suitable disaster callback.

Was looking at 4.4, in staging I also see a very odd one in
drbd_preresume_async, which isn't obviously in a child process AFAICT.

Hongyang, what prevents that exit from killing the whole toolstack process?

Ian.

> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:55           ` Ian Campbell
@ 2015-11-19 12:16             ` Ian Campbell
  2015-11-20  0:30               ` Yang Hongyang
  2016-02-18 17:09               ` George Dunlap
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2015-11-19 12:16 UTC (permalink / raw)
  To: Andrew Cooper, Martin Osterloh, xen-devel, Ian Jackson
  Cc: Shriram Rajagopalan, Yang Hongyang

On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote:
> On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote:
> > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
> > >  
> > > The majority of those are cases are not appropriate uses of exit(). 
> > > AFAIIR, the *only* valid use of exit() in a library is to clean up in
> > > a
> > > child process from a library-initiated fork().
> > 
> > ... or (in this case) in the libxl-save-helper (separate process).
> > 
> > The only one I can find which isn't one of this is
> > in libxl__event_disaster, and that is only if the applications (or
> > language
> > bindings) haven't provided a suitable disaster callback.
> 
> Was looking at 4.4, in staging I also see a very odd one in
> drbd_preresume_async, which isn't obviously in a child process AFAICT.
> 
> Hongyang, what prevents that exit from killing the whole toolstack
> process?

I had missed an _async suffix on that function versus the one which was the
actual callback, it is invoked via drbd_async_call which involves a fork().

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:23     ` Ian Campbell
  2015-11-19 11:30       ` Processed: " xen
  2015-11-19 11:33       ` Andrew Cooper
@ 2015-11-19 15:34       ` George Dunlap
  2016-02-18 17:26       ` George Dunlap
  3 siblings, 0 replies; 26+ messages in thread
From: George Dunlap @ 2015-11-19 15:34 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Cooper, Martin Osterloh, xen-devel

On Thu, Nov 19, 2015 at 11:23 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> create !
> title it libxl exit() on ENOMEM incompatible with gc'd languages
> thanks

Actually, can I suggest that someone send a new thread with an
appropriate title, so that people who aren't directly following the
thread know what kinds of things are being discussed?

Also so Martin can decide whether he wants to follow it or not. :-)

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 12:16             ` Ian Campbell
@ 2015-11-20  0:30               ` Yang Hongyang
  2016-02-18 17:09               ` George Dunlap
  1 sibling, 0 replies; 26+ messages in thread
From: Yang Hongyang @ 2015-11-20  0:30 UTC (permalink / raw)
  To: Ian Campbell, Andrew Cooper, Martin Osterloh, xen-devel, Ian Jackson
  Cc: Shriram Rajagopalan, Yang Hongyang

On 2015年11月19日 20:16, Ian Campbell wrote:
> On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote:
>> On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote:
>>> On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
>>>>
>>>> The majority of those are cases are not appropriate uses of exit().
>>>> AFAIIR, the *only* valid use of exit() in a library is to clean up in
>>>> a
>>>> child process from a library-initiated fork().
>>>
>>> ... or (in this case) in the libxl-save-helper (separate process).
>>>
>>> The only one I can find which isn't one of this is
>>> in libxl__event_disaster, and that is only if the applications (or
>>> language
>>> bindings) haven't provided a suitable disaster callback.
>>
>> Was looking at 4.4, in staging I also see a very odd one in
>> drbd_preresume_async, which isn't obviously in a child process AFAICT.
>>
>> Hongyang, what prevents that exit from killing the whole toolstack
>> process?
>
> I had missed an _async suffix on that function versus the one which was the
> actual callback, it is invoked via drbd_async_call which involves a fork().

Yeah, it is in a child process.

>
> Ian.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

-- 
Thanks,
Yang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 12:16             ` Ian Campbell
  2015-11-20  0:30               ` Yang Hongyang
@ 2016-02-18 17:09               ` George Dunlap
  2016-02-18 17:19                 ` Ian Jackson
  2016-02-18 17:24                 ` Ian Campbell
  1 sibling, 2 replies; 26+ messages in thread
From: George Dunlap @ 2016-02-18 17:09 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, Ian Jackson, xen-devel, Martin Osterloh,
	Shriram Rajagopalan, Yang Hongyang

On Thu, Nov 19, 2015 at 12:16 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote:
>> On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote:
>> > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
>> > >
>> > > The majority of those are cases are not appropriate uses of exit().
>> > > AFAIIR, the *only* valid use of exit() in a library is to clean up in
>> > > a
>> > > child process from a library-initiated fork().
>> >
>> > ... or (in this case) in the libxl-save-helper (separate process).
>> >
>> > The only one I can find which isn't one of this is
>> > in libxl__event_disaster, and that is only if the applications (or
>> > language
>> > bindings) haven't provided a suitable disaster callback.
>>
>> Was looking at 4.4, in staging I also see a very odd one in
>> drbd_preresume_async, which isn't obviously in a child process AFAICT.
>>
>> Hongyang, what prevents that exit from killing the whole toolstack
>> process?
>
> I had missed an _async suffix on that function versus the one which was the
> actual callback, it is invoked via drbd_async_call which involves a fork().

So what was the conclusion here?  It looks like we've confirmed that
exit() is only called:

1. In the case of a malloc() failure
2. in libxl-save-helper (a separate process forked by the library)
3. In libxl__event_disaster(), if no callback is provided

Which just leaves #1 as something to be discussed?

(Also, I had suggested making a new thread, but I can't find any
threads continuing this conversation.)

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:09               ` George Dunlap
@ 2016-02-18 17:19                 ` Ian Jackson
  2016-02-18 17:26                   ` Ian Campbell
  2016-02-18 17:24                 ` Ian Campbell
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Jackson @ 2016-02-18 17:19 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Andrew Cooper, xen-devel, Martin Osterloh,
	Shriram Rajagopalan, Yang Hongyang

George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"):
> So what was the conclusion here?  It looks like we've confirmed that
> exit() is only called:
> 
> 1. In the case of a malloc() failure
> 2. in libxl-save-helper (a separate process forked by the library)
> 3. In libxl__event_disaster(), if no callback is provided
  4. In other processes forked by the library

But, yes,l basically.

> Which just leaves #1 as something to be discussed?

Is this crashing on malloc failure a problem ?

>From the point of view of libxl's innards, making malloc failures
fatal means that nothing that allocates memory needs to care about
malloc failures.  That massively reduces the number of error paths to
be considered and eliminates an entire class of (largely theoretical)
bugs.

And often there is no good recovery possible (and logging the error is
hard too).

I'm not sure whether I'd want to change this policy even if someone
wanted to commit to auditing libxl and submitting the necessary
patches to cope with every malloc failure.  Having to cope with malloc
failure would be a continual burden on every proposed change or new
feature.

If in practice this is a problem it would probaby be better to run
libxl in a process which can be restarted if it dies due to malloc
failure.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:09               ` George Dunlap
  2016-02-18 17:19                 ` Ian Jackson
@ 2016-02-18 17:24                 ` Ian Campbell
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2016-02-18 17:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Ian Jackson, xen-devel, Martin Osterloh,
	Shriram Rajagopalan, Yang Hongyang

On Thu, 2016-02-18 at 17:09 +0000, George Dunlap wrote:
> On Thu, Nov 19, 2015 at 12:16 PM, Ian Campbell <ian.campbell@citrix.com>
> wrote:
> > On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote:
> > > On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote:
> > > > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote:
> > > > > 
> > > > > The majority of those are cases are not appropriate uses of
> > > > > exit().
> > > > > AFAIIR, the *only* valid use of exit() in a library is to clean
> > > > > up in
> > > > > a
> > > > > child process from a library-initiated fork().
> > > > 
> > > > ... or (in this case) in the libxl-save-helper (separate process).
> > > > 
> > > > The only one I can find which isn't one of this is
> > > > in libxl__event_disaster, and that is only if the applications (or
> > > > language
> > > > bindings) haven't provided a suitable disaster callback.
> > > 
> > > Was looking at 4.4, in staging I also see a very odd one in
> > > drbd_preresume_async, which isn't obviously in a child process
> > > AFAICT.
> > > 
> > > Hongyang, what prevents that exit from killing the whole toolstack
> > > process?
> > 
> > I had missed an _async suffix on that function versus the one which was
> > the
> > actual callback, it is invoked via drbd_async_call which involves a
> > fork().
> 
> So what was the conclusion here?  It looks like we've confirmed that
> exit() is only called:
> 
> 1. In the case of a malloc() failure
> 2. in libxl-save-helper (a separate process forked by the library)
> 3. In libxl__event_disaster(), if no callback is provided
> 
> Which just leaves #1 as something to be discussed?

Somewhere/when I proposed handling #1 by having the small number of places
which call libxl__alloc_failed() call an application (or language binding)
provided "please free some memory" hook (i.e. "please run your garbage
collector") and then retry some appropriate number of times. Perhaps with
an error code available for the hook to say "this is never going to help".

There's less than a dozen such call sites so this is quite doable, vastly
so compared with adding OOM error handling and reporting back up the
callchain to all libxl functions.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:19                 ` Ian Jackson
@ 2016-02-18 17:26                   ` Ian Campbell
  2016-02-18 17:40                     ` George Dunlap
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2016-02-18 17:26 UTC (permalink / raw)
  To: Ian Jackson, George Dunlap
  Cc: Shriram Rajagopalan, Andrew Cooper, Yang Hongyang,
	Martin Osterloh, xen-devel

On Thu, 2016-02-18 at 17:19 +0000, Ian Jackson wrote:
> George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"):
> > So what was the conclusion here?  It looks like we've confirmed that
> > exit() is only called:
> > 
> > 1. In the case of a malloc() failure
> > 2. in libxl-save-helper (a separate process forked by the library)
> > 3. In libxl__event_disaster(), if no callback is provided
>   4. In other processes forked by the library
> 
> But, yes,l basically.
> 
> > Which just leaves #1 as something to be discussed?
> 
> Is this crashing on malloc failure a problem ?

It is for non-C language bindings which might be using garbage collection,
since they might be OOM from a malloc perspective but actually have loads
of spare memory waiting to be collected (which they might plausibly be
doing quite lazily).

I just reminded people of my proposal to provide a callback to allow the
app/bindings to run their gc here in my reply to George before I saw your
reply.

> From the point of view of libxl's innards, making malloc failures
> fatal means that nothing that allocates memory needs to care about
> malloc failures.  That massively reduces the number of error paths to
> be considered and eliminates an entire class of (largely theoretical)
> bugs.
> 
> And often there is no good recovery possible (and logging the error is
> hard too).
> 
> I'm not sure whether I'd want to change this policy even if someone
> wanted to commit to auditing libxl and submitting the necessary
> patches to cope with every malloc failure.  Having to cope with malloc
> failure would be a continual burden on every proposed change or new
> feature.

I agree that we don't want to change this policy, but I think an OOM hook
is sufficient to solve the actual problem.


> If in practice this is a problem it would probaby be better to run
> libxl in a process which can be restarted if it dies due to malloc
> failure.
> 
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:23     ` Ian Campbell
                         ` (2 preceding siblings ...)
  2015-11-19 15:34       ` George Dunlap
@ 2016-02-18 17:26       ` George Dunlap
  2016-02-18 17:39         ` Ian Campbell
  3 siblings, 1 reply; 26+ messages in thread
From: George Dunlap @ 2016-02-18 17:26 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, Jonathan Ludlam, Martin Osterloh, Rob Hoes, xen-devel

On Thu, Nov 19, 2015 at 11:23 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> create !
> title it libxl exit() on ENOMEM incompatible with gc'd languages
> thanks
>
> On Thu, 2015-11-19 at 10:55 +0000, Andrew Cooper wrote:
>> On 19/11/15 09:20, Ian Campbell wrote:
>> > On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote:
>> >
>> > > I wanted to inquire about the current state of LibXL and in
>> > > particular
>> > > if there are any issues with using it in long-running processes.
>> > It is currently being used by libvirtd which I think has shaken out
>> > most of
>> > the issues with that environment.
>> >
>> > There are certain to be other bugs, but nothing show-stopping.
[snip]
>> Languages such as OCaml use -ENOMEM as a hint to run the garbage
>> collector some more.  I expect Haskell is the same.

So the implication you've made here, is that if libxl were to return
-ENOMEM to ocaml, that it would run the garbage collector and free up
some heap space, so that malloc() could succeed and the libxl
operation could proceed.

In preparation for writing up a descripion for an "outreach project"
(Outreachy/OPW/GSoC), I went and had a chat with some of the guys on
the xapi team (cc'd) about what such a thing might look like.

According to them, the ocaml garbage collector never frees memory.  It
grows its own internal heap as necessary, but it never reduces the
size of its heap.  So no -ENOMEM call or OOM callback can make a
failed malloc succeed.

One might then ask if libxl could simply allocate memory *from the
ocaml heap* itself.  It turns out that is also not tenable: Data in
the ocaml heap is stored in a heavily coded format.  (For example
integers are stored as (n*2+1), so that pointers can all be even and
non-pointers can all be odd.)

The only thing they said might be improved is:

1.  To know that libxl would never call exit() for any other reason
(which it seems is true)

2. To  have a callback in OOM conditions.  It's unlikely the process
as a whole could do anything but exit, but the ocaml runtime itself
might still have internal heap available, which would allow it to exit
more gracefully.

I think if Haskell or any library *is* capable of integrating with the
garbage collector, we'll have to wait until someone who understands
the language actually writes bindings and can ask for something they
know to be useful for that language.

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:26       ` George Dunlap
@ 2016-02-18 17:39         ` Ian Campbell
  2016-02-18 17:47           ` George Dunlap
  2016-02-18 17:50           ` Ian Campbell
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2016-02-18 17:39 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Jonathan Ludlam, Martin Osterloh, Rob Hoes, xen-devel

On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote:

> According to them, the ocaml garbage collector never frees memory.  It
> grows its own internal heap as necessary, but it never reduces the
> size of its heap.

Assume that's true: Wow!

Although, I presume this is a factor of the current/particular
implementation, rather than a fundamental property of an ocaml runtime, so
in theory it could change (and we have here a good real world argument why
it perhaps should do!)

> So no -ENOMEM call or OOM callback can make a failed malloc succeed.
> 
> One might then ask if libxl could simply allocate memory *from the
> ocaml heap* itself.  It turns out that is also not tenable: Data in
> the ocaml heap is stored in a heavily coded format.  (For example
> integers are stored as (n*2+1), so that pointers can all be even and
> non-pointers can all be odd.)
> 
> The only thing they said might be improved is:
> 
> 1.  To know that libxl would never call exit() for any other reason
> (which it seems is true)
> 
> 2. To  have a callback in OOM conditions.  It's unlikely the process
> as a whole could do anything but exit, but the ocaml runtime itself
> might still have internal heap available, which would allow it to exit
> more gracefully.
> 
> I think if Haskell or any library *is* capable of integrating with the
> garbage collector, we'll have to wait until someone who understands
> the language actually writes bindings and can ask for something they
> know to be useful for that language.

Agreed.

Hopefully it is possible to make the callback without needing to malloc
anything!

If I were adding an API for #2 to libxl (which must therefore be stable) I
think I might be inclined to allow for the possibility of the callback
returning "please retry" since it would be inconvenient in the usual API
stability ways to retrofit it, I would guess, but really until a language
even exposes that possibility we don't really know if it is worthwhile
doing so.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:26                   ` Ian Campbell
@ 2016-02-18 17:40                     ` George Dunlap
  0 siblings, 0 replies; 26+ messages in thread
From: George Dunlap @ 2016-02-18 17:40 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, Ian Jackson, xen-devel, Martin Osterloh,
	Shriram Rajagopalan, Yang Hongyang

On Thu, Feb 18, 2016 at 5:26 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Thu, 2016-02-18 at 17:19 +0000, Ian Jackson wrote:
>> George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"):
>> > So what was the conclusion here?  It looks like we've confirmed that
>> > exit() is only called:
>> >
>> > 1. In the case of a malloc() failure
>> > 2. in libxl-save-helper (a separate process forked by the library)
>> > 3. In libxl__event_disaster(), if no callback is provided
>>   4. In other processes forked by the library
>>
>> But, yes,l basically.
>>
>> > Which just leaves #1 as something to be discussed?
>>
>> Is this crashing on malloc failure a problem ?
>
> It is for non-C language bindings which might be using garbage collection,
> since they might be OOM from a malloc perspective but actually have loads
> of spare memory waiting to be collected (which they might plausibly be
> doing quite lazily).
>
> I just reminded people of my proposal to provide a callback to allow the
> app/bindings to run their gc here in my reply to George before I saw your
> reply.
>
>> From the point of view of libxl's innards, making malloc failures
>> fatal means that nothing that allocates memory needs to care about
>> malloc failures.  That massively reduces the number of error paths to
>> be considered and eliminates an entire class of (largely theoretical)
>> bugs.
>>
>> And often there is no good recovery possible (and logging the error is
>> hard too).
>>
>> I'm not sure whether I'd want to change this policy even if someone
>> wanted to commit to auditing libxl and submitting the necessary
>> patches to cope with every malloc failure.  Having to cope with malloc
>> failure would be a continual burden on every proposed change or new
>> feature.
>
> I agree that we don't want to change this policy, but I think an OOM hook
> is sufficient to solve the actual problem.

And in the situation like ocaml seems to be at the moment, it also
gives the process an opportunity to attempt to shut down as gracefully
as it can even if it knows it can't free up any more memory to libxl.
(And who knows, some future version of ocaml may even allow you to ask
to shrink the heap.)

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:39         ` Ian Campbell
@ 2016-02-18 17:47           ` George Dunlap
  2016-02-18 17:50           ` Ian Campbell
  1 sibling, 0 replies; 26+ messages in thread
From: George Dunlap @ 2016-02-18 17:47 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, xen-devel, Martin Osterloh, Jonathan Ludlam, Rob Hoes

On Thu, Feb 18, 2016 at 5:39 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote:
>
>> According to them, the ocaml garbage collector never frees memory.  It
>> grows its own internal heap as necessary, but it never reduces the
>> size of its heap.
>
> Assume that's true: Wow!
>
> Although, I presume this is a factor of the current/particular
> implementation, rather than a fundamental property of an ocaml runtime, so
> in theory it could change (and we have here a good real world argument why
> it perhaps should do!)
>
>> So no -ENOMEM call or OOM callback can make a failed malloc succeed.
>>
>> One might then ask if libxl could simply allocate memory *from the
>> ocaml heap* itself.  It turns out that is also not tenable: Data in
>> the ocaml heap is stored in a heavily coded format.  (For example
>> integers are stored as (n*2+1), so that pointers can all be even and
>> non-pointers can all be odd.)
>>
>> The only thing they said might be improved is:
>>
>> 1.  To know that libxl would never call exit() for any other reason
>> (which it seems is true)
>>
>> 2. To  have a callback in OOM conditions.  It's unlikely the process
>> as a whole could do anything but exit, but the ocaml runtime itself
>> might still have internal heap available, which would allow it to exit
>> more gracefully.
>>
>> I think if Haskell or any library *is* capable of integrating with the
>> garbage collector, we'll have to wait until someone who understands
>> the language actually writes bindings and can ask for something they
>> know to be useful for that language.
>
> Agreed.
>
> Hopefully it is possible to make the callback without needing to malloc
> anything!
>
> If I were adding an API for #2 to libxl (which must therefore be stable) I
> think I might be inclined to allow for the possibility of the callback
> returning "please retry" since it would be inconvenient in the usual API
> stability ways to retrofit it, I would guess, but really until a language
> even exposes that possibility we don't really know if it is worthwhile
> doing so.

Well we could just make the semantics such that if the callback
returns at all, that libxl should retry.  That shouldn't be
particularly harder than just making a callback we don't expect to
ever return (which is what it sounds like the xapi folks would be
happy with).  Then when we come across a language runtime /
application that *can* do something useful, it might possibly Just
Work; but if it doesn't we can implement what they actually need.

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2016-02-18 17:39         ` Ian Campbell
  2016-02-18 17:47           ` George Dunlap
@ 2016-02-18 17:50           ` Ian Campbell
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2016-02-18 17:50 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Jonathan Ludlam, Martin Osterloh, Rob Hoes, xen-devel

On Thu, 2016-02-18 at 17:39 +0000, Ian Campbell wrote:
> On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote:
> 
> > According to them, the ocaml garbage collector never frees memory.  It
> > grows its own internal heap as necessary, but it never reduces the
> > size of its heap.
> 
> Assume that's true: Wow!

https://github.com/ocaml/ocaml/blob/trunk/byterun/compact.c#L374
is a call by do_compaction to caml_shrink_heap at
https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L408
which calls caml_free_for_heap at
https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L290
which certainly looks like it is returning memory to the malloc pool.

Maybe this never happens in practice for some reason, or perhaps the
"byterun" in those paths is relevant? I couldn't find another heap
implementation which would be associated with native binaries.

http://caml.inria.fr/mantis/view.php?id=5389 also seems to suggest that it
is expected for the heap to shrink (i.e. it was considered a bug that it
did not in this case).

Ian.

> 
> Although, I presume this is a factor of the current/particular
> implementation, rather than a fundamental property of an ocaml runtime,
> so
> in theory it could change (and we have here a good real world argument
> why
> it perhaps should do!)
> 
> > So no -ENOMEM call or OOM callback can make a failed malloc succeed.
> > 
> > One might then ask if libxl could simply allocate memory *from the
> > ocaml heap* itself.  It turns out that is also not tenable: Data in
> > the ocaml heap is stored in a heavily coded format.  (For example
> > integers are stored as (n*2+1), so that pointers can all be even and
> > non-pointers can all be odd.)
> > 
> > The only thing they said might be improved is:
> > 
> > 1.  To know that libxl would never call exit() for any other reason
> > (which it seems is true)
> > 
> > 2. To  have a callback in OOM conditions.  It's unlikely the process
> > as a whole could do anything but exit, but the ocaml runtime itself
> > might still have internal heap available, which would allow it to exit
> > more gracefully.
> > 
> > I think if Haskell or any library *is* capable of integrating with the
> > garbage collector, we'll have to wait until someone who understands
> > the language actually writes bindings and can ask for something they
> > know to be useful for that language.
> 
> Agreed.
> 
> Hopefully it is possible to make the callback without needing to malloc
> anything!
> 
> If I were adding an API for #2 to libxl (which must therefore be stable)
> I
> think I might be inclined to allow for the possibility of the callback
> returning "please retry" since it would be inconvenient in the usual API
> stability ways to retrofit it, I would guess, but really until a language
> even exposes that possibility we don't really know if it is worthwhile
> doing so.
> 
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* libxl and malloc failure (Re:  Current LibXL Status)
  2015-11-19 10:55   ` Andrew Cooper
  2015-11-19 11:23     ` Ian Campbell
@ 2016-02-18 18:15     ` Ian Jackson
  2016-02-19 10:52       ` Ian Campbell
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Jackson @ 2016-02-18 18:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Martin Osterloh, Ian Campbell, xen-devel

Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"):
> There really is a show-stopper, which I have stated before.
> 
> Languages such as OCaml use -ENOMEM as a hint to run the garbage
> collector some more.  I expect Haskell is the same.

This cannot possibly be true (if what you mean is that they use
ENOMEM[1] from malloc as such an indication).  It would make it
impossible to write a correct binding to a normal C library.

Typically C library which calls malloc will do so in the middle of its
execution.  Even if the library returns the resulting error up as
a distinguishable error code, you can't just make the same library
call again - it may have done half its work but not the other half.

So I think you must be wrong.

Ian.

[1] NB that when malloc fails there is no -ENOMEM, only ENOMEM.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Current LibXL Status
  2015-11-19 11:48         ` Ian Campbell
  2015-11-19 11:55           ` Ian Campbell
@ 2016-02-18 18:30           ` Ian Jackson
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2016-02-18 18:30 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Cooper, Martin Osterloh, xen-devel

Ian Campbell writes ("Re: [Xen-devel] Current LibXL Status"):
> The only one I can find which isn't one of this is
> in libxl__event_disaster, and that is only if the applications (or language
> bindings) haven't provided a suitable disaster callback.

libxl__event_disaster can currently happen in the following cases.

Disasters which are specific to a specific requested event (type!=0):

  xc_domain_getinfolist failure (breaks reporting domain death, only
  xw_write failure for acknowledging disk eject (breaks reporting
      ejects, only)

General disasters (type==0) due to xenstore problems:

  POLLERR or POLLHUP on xenstore fd
  xs_check_watch fails (represents trouble on xenstore fd or with xenstore)
  xs_read for domain death check

General disasters (type==0) due to hypercall trouble:

  xenevtchn_pending fails other than with EAGAIN
  (unrequested) event other than POLLIN on evtchn fd

General disasters (type==0) due to syscall failure:

  poll syscall fails
  (unrequested) event other than POLLIN on self-pipe
  read() or write() fails on a self-pipe (other than EAGAIN/EWOULDBLOCK)
  waitpid() syscall fails

General disasters (type==0) induced by the application:

  waitpid reports ECHILD when we know we have a child (probably,
    something else in the process reaped it; arguably this should abort)
  application's childproc_hooks->reaped_callback failure

Disasters are not reported unless it is impossible to proceed.  So in
general it is not possible for a process to recover from a disaster
reported by libxl.

But disasters are never expected.  They should occur only if
everything is totally doomed anyway.  Having a daemon crash is a
perfectly reasonable response.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: libxl and malloc failure (Re: Current LibXL Status)
  2016-02-18 18:15     ` libxl and malloc failure (Re: Current LibXL Status) Ian Jackson
@ 2016-02-19 10:52       ` Ian Campbell
  2016-02-19 11:00         ` Processed: " xen
  2016-02-22 16:48         ` Ian Jackson
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2016-02-19 10:52 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper; +Cc: Martin Osterloh, xen-devel

graft 51 ^
thanks

On Thu, 2016-02-18 at 18:15 +0000, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"):
> > There really is a show-stopper, which I have stated before.
> > 
> > Languages such as OCaml use -ENOMEM as a hint to run the garbage
> > collector some more.  I expect Haskell is the same.
> 
> This cannot possibly be true (if what you mean is that they use
> ENOMEM[1] from malloc as such an indication).  It would make it
> impossible to write a correct binding to a normal C library.
> 
> Typically C library which calls malloc will do so in the middle of its
> execution.  Even if the library returns the resulting error up as
> a distinguishable error code, you can't just make the same library
> call again - it may have done half its work but not the other half.

FWIW there is some code in the Ocaml runtime which raises an Out_of_memory
exception in response to ENOMEM in C land (and from ocaml code in some
circumstances too).

The Unix module (which wraps standard Unix libc type APIs) looks to (when
appropriate) reasonably consistently return errors as Unix_error(errno),
that is handled generically so will include raising Unix_error(ENOMEM) as
necessary.

In principal it ought to be possible to write an ocaml program which
correctly dealt with those (and similar ones from other lower level
bindings), for example the interactive command line interpreter catches
Out_of_memory (but not Unix_error) and calls the gc before iterating. (I
did see one reference in a book which suggested Out_of_memory wasn't
catcheable, but that appears to be out of date from what I can tell).

So I suppose it ought to be possible to write an ocaml daemon which caught
errors like these at the top level and aborted only the currently attempted
operation without taking down the whole daemon process. How feasible that
is in practice I can't say.

This of course relies on all the C bindings correctly turning errors into
exceptions and the semantics of your particular daemon allowing for
abandoning things half way through (which I suppose is a relative of some
kind to crash-only s/w).

Ian.

> 
> So I think you must be wrong.
> 
> Ian.
> 
> [1] NB that when malloc fails there is no -ENOMEM, only ENOMEM.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Processed: Re: libxl and malloc failure (Re: Current LibXL Status)
  2016-02-19 10:52       ` Ian Campbell
@ 2016-02-19 11:00         ` xen
  2016-02-22 16:48         ` Ian Jackson
  1 sibling, 0 replies; 26+ messages in thread
From: xen @ 2016-02-19 11:00 UTC (permalink / raw)
  To: Ian Campbell, xen-devel

Processing commands for xen@bugs.xenproject.org:

> graft 51 ^
Graft `<22214.2619.86697.724156@mariner.uk.xensource.com>' onto #51
> thanks
Finished processing.

Modified/created Bugs:
 - 51: http://bugs.xenproject.org/xen/bug/51

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: libxl and malloc failure (Re: Current LibXL Status)
  2016-02-19 10:52       ` Ian Campbell
  2016-02-19 11:00         ` Processed: " xen
@ 2016-02-22 16:48         ` Ian Jackson
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2016-02-22 16:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Cooper, Martin Osterloh, xen-devel

Ian Campbell writes ("Re: libxl and malloc failure (Re: [Xen-devel] Current LibXL Status)"):
> On Thu, 2016-02-18 at 18:15 +0000, Ian Jackson wrote:
> > Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"):
> > > Languages such as OCaml use -ENOMEM as a hint to run the garbage
> > > collector some more.  I expect Haskell is the same.
> > 
> > This cannot possibly be true (if what you mean is that they use
> > ENOMEM[1] from malloc as such an indication).  It would make it
> > impossible to write a correct binding to a normal C library.
> > 
> > Typically C library which calls malloc will do so in the middle of its
> > execution.  Even if the library returns the resulting error up as
> > a distinguishable error code, you can't just make the same library
> > call again - it may have done half its work but not the other half.
> 
> FWIW there is some code in the Ocaml runtime which raises an Out_of_memory
> exception in response to ENOMEM in C land (and from ocaml code in some
> circumstances too).

ISTM that either there is something else which is turning ocaml out of
heap errors into ENOMEM, or that this code you describe is
fundamentally misconceived.

> In principal it ought to be possible to write an ocaml program which
> correctly dealt with those (and similar ones from other lower level
> bindings), for example the interactive command line interpreter catches
> Out_of_memory (but not Unix_error) and calls the gc before iterating. (I
> did see one reference in a book which suggested Out_of_memory wasn't
> catcheable, but that appears to be out of date from what I can tell).

But iterating in the cli would not recover the operation which had
failed.

In the general case, there is no recovery possible when a C language
library function returns ENOMEM.  Sometimes the internal state of the
library will now be unsuitable.  Certainly, a policy of always
retrying the operation after a gc would be very wrong indeed.

Ie if ocaml has the behaviour claimed by Andrew (ie, that the need for
a gc can be detected during malloc and reported as ENOMEM), all ocaml
programs would already suffer from occasional random failures of
operations being performed by code written in C.

Note that "need for a gc" is not an abnormal part of operation in a
consing garbage collected language like ocaml.  It is entirely
routine.  Ie these random failures would occur in normal, otherwise
entirely successful, operation.  They would occur even if the
program's actual memory use was modest.

> This of course relies on all the C bindings correctly turning errors into
> exceptions and the semantics of your particular daemon allowing for
> abandoning things half way through (which I suppose is a relative of some
> kind to crash-only s/w).

Even if the operations are properly abandonded, they have still been
erroneously made to fail.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-02-22 16:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18 18:32 Current LibXL Status Martin Osterloh
2015-11-19  9:20 ` Ian Campbell
2015-11-19 10:23   ` George Dunlap
2015-11-19 10:55   ` Andrew Cooper
2015-11-19 11:23     ` Ian Campbell
2015-11-19 11:30       ` Processed: " xen
2015-11-19 11:33       ` Andrew Cooper
2015-11-19 11:48         ` Ian Campbell
2015-11-19 11:55           ` Ian Campbell
2015-11-19 12:16             ` Ian Campbell
2015-11-20  0:30               ` Yang Hongyang
2016-02-18 17:09               ` George Dunlap
2016-02-18 17:19                 ` Ian Jackson
2016-02-18 17:26                   ` Ian Campbell
2016-02-18 17:40                     ` George Dunlap
2016-02-18 17:24                 ` Ian Campbell
2016-02-18 18:30           ` Ian Jackson
2015-11-19 15:34       ` George Dunlap
2016-02-18 17:26       ` George Dunlap
2016-02-18 17:39         ` Ian Campbell
2016-02-18 17:47           ` George Dunlap
2016-02-18 17:50           ` Ian Campbell
2016-02-18 18:15     ` libxl and malloc failure (Re: Current LibXL Status) Ian Jackson
2016-02-19 10:52       ` Ian Campbell
2016-02-19 11:00         ` Processed: " xen
2016-02-22 16:48         ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.