On 01/20/2016 02:02 AM, Markus Armbruster wrote:

>> @@ -519,6 +519,8 @@ static QObject *parse_literal(JSONParserContext *ctxt)
>>      }
>>      case JSON_FLOAT:
>>          /* FIXME dependent on locale */
>> +        /* FIXME our lexer matches RFC7159 in forbidding Inf or NaN,
> 
> For what it's worth, the RFC spells this "RFC 7159".

Looks like we use space more often than not, but that we're
inconsistent.  For example:

slirp/tcp.h: * Per RFC 793, September, 1981.
slirp/tcp.h: * Per RFC793, September, 1981.

Will fix if I need to respin, otherwise I assume you can do it.


>> +        /* FIXME: snprintf() is locale dependent; but JSON requires
>> +         * numbers to be formatted as if in the C locale. */
> 
> The new FIXME matches the existing one in parse_literal(), visible
> above.
> 
> However, dependence on C locale is a pervasive issue in QEMU.  These two
> comments could give readers the idea that it's an issue just here.
> Suggest to add something like "Dependence on C locale is a pervasive
> issue in QEMU."

Good idea.

> 
>> +        /* FIXME: This risks printing Inf or NaN, which are not valid
>> +         * JSON values. */
>> +        /* FIXME: the default precision of %f may be insufficient to
>> +         * tell this number apart from others. */
> 
> Yup.
> 
> The easy way to avoid loss of precision is %a, but of course that's way
> too much sophistication for JSON.
> 
> Avoiding loss of precision with a decimal format is non-trivial; see
> Steele, Jr., Guy L., and White, Jon L. How to print floating-point
> numbers accurately, SIGPLAN ’90, and later improvements collected here:
> http://stackoverflow.com/questions/7153979/algorithm-to-convert-an-ieee-754-double-to-a-string

Ah, memories.  I read and implemented that paper when working on the
jikes compiler for the Java language back in the late nineties, as it is
the Java language specification which had the very neat property of
requiring the shortest decimal string that will unambiguously round back
to the same floating point pattern.

One alternative is to always output a guaranteed unambiguous decimal
string (although not necessarily the shortest), by using %.17f, using
<float.h> DBL_DECIMAL_DIG.  (Note that DBL_DIG of 15 is NOT sufficient -
it is the lower limit that says that a decimal->float->decimal will not
change the decimal; but we want the converse where a
float->decimal->float will not change the float.  There are stretches of
numbers where the pigeonhole principle applies; you can think of it this
way: there is no way to map all possible 2^10 (1024) binary values
inside 2^3 (1000) decimal digits without at least 24 of them needing one
more decimal digit.  But by the same arguments, DBL_DECIMAL_DIG is an
upper limit and usually more than you need.)

So, the question is whether we want to always output 17 digits, or
whether we want to do the poor-man's truncation scheme (easy to
understand, but not optimal use of the processor), or go all the way to
the algorithm of that paper (faster but lots harder to understand).  For
reference, here's the poor-man's algorithm in pseudocode:

if 0, inf, nan:
    special case
else:
    Obtain the DBL_DECIMAL_DIG string via sprintf %.17f
    i = 17;
    do {
        truncate the original string to i-1 decimal characters
        parse that with strtod()
        if the bit pattern differs:
            break;
    } while (--i);
    assert(i)
    use i digits of the string

As a separate patch, of course, but I have a pending patch that provides
a single place where we could drop in such an improvement:
https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03932.html

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org