[std-discussion] floating-point to/from_chars: halfway values, underspecification, buffer size

Discussion:

Edward Catmur

2017-04-10 09:52:41 UTC

Firstly, I want to say that I think to_chars and from_chars are a great
addition to the Standard and I look forward to using them in C++17. I have
a few questions regarding their behavior on floating point types.

(As background for the first few questions: for each floating-point type
there are a (relatively) small number of large integers that are exactly
halfway between two adjacent values of that type, and which have a
relatively short scientific decimal representation. For example, 1e23 has
hexadecimal floating-point representation 0x1.52d02c7e14af68p76, which is
exactly halfway between the adjacent IEEE 754 (64-bit) double values
0x1.52d02c7e14af6p76 and 0x1.52d02c7e14af7p76. Parsing the string "1e23"
into double using from_chars [utility.from.chars] is required to produce
one of those two values.)
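The tie described above can be checked with integer arithmetic alone. This is a minimal sketch, assuming double is IEEE 754 binary64 and the compiler accepts C++17 hexadecimal floating literals; the function names are made up for illustration. It relies on 10^23 = 5^23 * 2^23, so only 5^23 needs to be examined.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// 5^23 fits in 64 bits; multiplying by 2^23 afterwards is exact in binary.
std::uint64_t pow5_23() {
    std::uint64_t p = 1;
    for (int i = 0; i < 23; ++i) p *= 5;
    return p;
}

// True when 1e23 lies exactly halfway between the two doubles named above.
bool is_exact_tie() {
    const std::uint64_t p = pow5_23();
    // p needs 54 bits and is odd, so it is not representable with a 53-bit
    // significand, while p - 1 and p + 1 (both even) are.
    const bool odd_54_bit = (p >> 53) == 1 && (p & 1) == 1;
    const double below = std::ldexp(static_cast<double>(p - 1), 23);
    const double above = std::ldexp(static_cast<double>(p + 1), 23);
    return odd_54_bit
        && below == 0x1.52d02c7e14af6p76
        && above == 0x1.52d02c7e14af7p76;
}
```

Since p - 1 and p + 1 are equally far from p, any parser has to break the tie by rule rather than by distance.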

Firstly, is from_chars expected to have idempotent behavior, or is it
allowed to be dependent on e.g. floating-point environment or the use of
80-bit floating point (32-bit Linux on x86)?

Secondly, is from_chars expected or encouraged to have the same behavior as
the compiler? i.e. for double d; auto s = "1e23" should we expect 1e23 ==
(from_chars(s, s + 4, d), d) ever to fail?
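The comparison being asked about can be written out as follows. This is only a sketch: parse_double and literal_matches_from_chars are hypothetical helper names, and it assumes a standard library that already ships the floating-point overloads of from_chars. Whether the final comparison holds is exactly the open question, so the code reports it rather than assuming it.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses all of `text` as a double; yields nullopt on error or trailing input.
std::optional<double> parse_double(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}

// True when the runtime parse agrees with the compiler's literal conversion.
bool literal_matches_from_chars() {
    const auto parsed = parse_double("1e23");
    return parsed.has_value() && *parsed == 1e23;
}
```

On an implementation where the compiler and the library round ties differently, literal_matches_from_chars() would return false while both values would still be valid neighbours of 1e23.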

Most importantly, is to_chars permitted to produce an overlong output where
the shorter output round-trips on the same implementation but is not
guaranteed to do so globally? For example, if an implementation always
reads "1e23" as 0x1.52d02c7e14af6p76, is it permitted to output
0x1.52d02c7e14af6p76 as "9.999999999999999e22" on the basis that this is
guaranteed to be read correctly by a different implementation that might
read "1e23" as 0x1.52d02c7e14af7p76?

In addition, I would be interested in knowing whether the following
underspecification is intentional:

Is the result of to_chars() required to represent the closest to the input
value among strings of that length that round-trip? For example
0x1.0000000000001p0 is approx. 1.000000000000000222045, so is
1.0000000000000003 an acceptable output from to_chars, or only
1.0000000000000002? Or consider the smallest positive subnormal IEEE
double, 0x1p-1074, approx. 4.94e-324 - is 4e-324 an acceptable output, or
only 5e-324? (In Florian Loitsch [1], this is the "closeness" property of
Grisu3.)
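To make the "closeness" examples concrete, the sketch below checks that both candidate strings in each pair parse to the same double. It uses strtod purely as a widely available stand-in for from_chars (an assumption, not something the Standard equates), and assumes IEEE 754 binary64 with the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>

// strtod stands in for from_chars here; the "C" locale is assumed.
double parse(const char* text) { return std::strtod(text, nullptr); }

// True when each pair of equally short candidates recovers the same value.
bool both_candidates_round_trip() {
    const double next_after_one = 1.0 + std::numeric_limits<double>::epsilon();
    const double tiny = std::numeric_limits<double>::denorm_min();
    return parse("1.0000000000000002") == next_after_one
        && parse("1.0000000000000003") == next_after_one
        && parse("4e-324") == tiny
        && parse("5e-324") == tiny;
}
```

Both strings in each pair satisfy a pure round-trip requirement, so only an additional closeness rule would single out 1.0000000000000002 and 5e-324.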

I hope the above questions don't come across as overly pedantic; I would be
perfectly satisfied to be told that all of the above are QOI matters, but
I'd hope to know what to expect before retiring our current code using
Google double-conversion[2].

Finally, it would be useful to know the minimum buffer size necessary to
guarantee successful conversion in all cases. I would guess this is
something like 4 + numeric_limits<T>::max_digits10 +
max(log10(numeric_limits<T>::max_exponent10), 1 +
log10(-numeric_limits<T>::min_exponent10)) but it would be useful to have
confirmation of this calculation or indeed to have it available in the
Standard as a constant.
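The estimate can be evaluated at compile time. The sketch below is one reading of the formula, taking log10(k) to mean floor(log10(k)); the names floor_log10 and scientific_buffer_size are hypothetical, and the result covers scientific notation only.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// floor(log10(k)) for k >= 1, computed without floating point.
constexpr int floor_log10(int k) {
    int n = 0;
    while (k >= 10) { k /= 10; ++n; }
    return n;
}

// 4 + max_digits10 + max(log10(max_exponent10), 1 + log10(-min_exponent10)),
// with log10 taken as floor(log10).
template <class T>
constexpr int scientific_buffer_size() {
    using L = std::numeric_limits<T>;
    return 4 + L::max_digits10
         + std::max(floor_log10(L::max_exponent10),
                    1 + floor_log10(-L::min_exponent10));
}
```

For an IEEE 754 double this gives 24, which matches the length of "-1.7976931348623157e+308"; for float it gives 15.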

Thanks!

1. http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
2. https://github.com/google/double-conversion

--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.

'Edward Catmur' via ISO C++ Standard - Discussion

2017-04-10 13:05:47 UTC

The results are not required to be portable, only to roundtrip on the same
implementation:
"The functions that take a floating-point value but not a precision
parameter ensure that the string representation consists of the smallest
number of characters such that there is at least one digit before the radix
point (if present) and parsing the representation using the corresponding
from_chars function recovers value exactly.
[Note: This guarantee applies only if to_chars and from_chars are executed
on the same implementation. -- end note]"
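The quoted guarantee can be exercised directly. This is a minimal sketch, assuming a standard library that provides the floating-point overloads of to_chars and from_chars; shortest and round_trips are hypothetical helper names, and the 64-byte buffer is simply a generous choice for IEEE 754 double.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Text produced by the plain (no precision) overload, or empty on failure.
std::string shortest(double value) {
    char buf[64];
    const auto [end, ec] = std::to_chars(buf, buf + sizeof buf, value);
    return ec == std::errc{} ? std::string(buf, end) : std::string{};
}

// True when parsing the shortest text on this implementation recovers `value`.
bool round_trips(double value) {
    const std::string text = shortest(value);
    const char* first = text.data();
    const char* last = first + text.size();
    double back = 0.0;
    const auto [ptr, ec] = std::from_chars(first, last, back);
    return !text.empty() && ec == std::errc{} && ptr == last && back == value;
}
```

A call such as round_trips(0x1.52d02c7e14af6p76) tests the 1e23 neighbour from the original question, but only for the implementation it runs on.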

Thanks. My question (with regard to the round-trip guarantee) was whether a
conforming implementation is *permitted* to ensure portability.

In other words, is a conforming implementation permitted to prioritise
portability over the shortest-string guarantee? This does not seem clear to
me from the quoted text.

Nicol Bolas

2017-04-10 13:29:10 UTC

Post by 'Edward Catmur' via ISO C++ Standard - Discussion

Thanks. My question (with regard to the round-trip guarantee) was whether
a conforming implementation is *permitted* to ensure portability.
In other words, is a conforming implementation permitted to prioritise
portability over the shortest-string guarantee? This does not seem clear to
me from the quoted text.

It seems clear enough to me: the implementation shall generate the smallest
number of characters to permit round-tripping. If the "smallest number of
characters" is insufficient to guarantee inter-implementation portability,
then inter-implementation portability isn't going to happen.

"Smallest" means *smallest*.

'Edward Catmur' via ISO C++ Standard - Discussion

2017-04-10 14:51:34 UTC

Post by Nicol Bolas

It seems clear enough to me: the implementation shall generate the
smallest number of characters to permit round-tripping. If the "smallest
number of characters" is insufficient to guarantee inter-implementation
portability, then inter-implementation portability isn't going to happen.
"Smallest" means *smallest*.

OK, I see your point. That's a little unfortunate.

I wonder, would that change if from_chars has nondeterministic behavior or
if its behavior is dependent on environment or configuration settings?

Nicol Bolas

2017-04-10 15:32:10 UTC

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I wonder, would that change if from_chars has nondeterministic behavior or
if its behavior is dependent on environment or configuration settings?

`from_chars` is required to have deterministic behavior (round-tripping),
within its implementation. Matters of "environment" or "configuration
settings" define what the implementation is. So those are
extra-specification matters.

'Edward Catmur' via ISO C++ Standard - Discussion

2017-04-10 21:41:10 UTC

Post by Nicol Bolas
`from_chars` is required to have deterministic behavior (round-tripping),
within its implementation.

I'm not sure I understand. Certainly to_chars followed by from_chars is
required to round-trip. But why does it follow that from_chars is required
to be deterministic? Surely it is only required to be deterministic on
values in the codomain of to_chars?

Post by Nicol Bolas
Matters of "environment" or "configuration settings" define what the
implementation is. So those are extra-specification matters.

OK, I see that for compiler flags etc. What about floating point
environment; isn't that within the purview of the standard?

Richard Smith

2017-04-10 23:00:17 UTC

On 10 April 2017 at 14:41, 'Edward Catmur' via ISO C++ Standard -
Discussion wrote:

Post by Nicol Bolas

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I wonder, would that change if from_chars has nondeterministic behavior
or if its behavior is dependent on environment or configuration settings?

Well, <cfenv> is part of the C++ standard, but it occupies a somewhat
dubious position since a conforming C++ implementation is not required to
support "#pragma STDC FENV_ACCESS ON", and without that, the functions to
modify the floating-point environment in <cfenv> (might) result in UB.

My reading of the current wording is: to_chars must produce a value that
round-trips, regardless of changes to global state between the to_chars
call and the from_chars call (even across invocations of the program, as
far as I can see). Thus if a particular implementation supports modifying
the floating-point environment at runtime, and from_chars depends on the
floating-point environment, then to_chars must produce a[1] shortest
representation that will produce the correct value regardless of the
floating-point environment at the point of the call to from_chars.

I don't know if that matches the intent or not (or even whether this was
considered).

[1]: I see no requirement as to how to choose between multiple shortest
representations, nor that this choice even be deterministic.

Nicol Bolas

2017-04-10 23:46:20 UTC

Post by Nicol Bolas

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I wonder, would that change if from_chars has nondeterministic behavior
or if its behavior is dependent on environment or configuration settings?

It depends on what you mean by "deterministic". The specification says that
a `to_chars`/`from_chars` loop "recovers `value` exactly". I can't think of
something that would be more "deterministic" than that.

After all, the standard doesn't specify the exact representation of
floating-point values. So there's no way that it can guarantee the exact
behavior when given a particular string, except to say that it will
represent that value and that, if it were generated by `to_chars`, you'll
get the exact same value back.

Richard Smith

2017-04-11 01:24:52 UTC

Post by Nicol Bolas

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I wonder, would that change if from_chars has nondeterministic behavior
or if its behavior is dependent on environment or configuration settings?

Guaranteeing that from_chars always produces the same floating-point
value from the same sequence of characters would be more deterministic
than that.

Jens Maurer

2017-04-11 12:15:38 UTC

Thanks for your questions.

As an initial comment, I'd like to point out that C++ is woefully
underspecified in its floating-point semantics, so there's lots
of room for QoI.

The round-trip guarantees for to_chars / from_chars are for the
same implementation only, because a different implementation might
not even have enough bits in their "double" to represent the
original number.

Post by Edward Catmur
Firstly, is from_chars expected to have idempotent behavior, or is it
allowed to be dependent on e.g. floating-point environment or the use
of 80-bit floating point (32-bit Linux on x86)?

The question seems to be whether the floating-point environment
is part of the (allowed and unavoidable) implementation divergence
under the hood, or whether it's exposed. Richard makes a good
point here: We do expose the floating-point environment using
<cfenv>, and I don't think there's anything in a string <-> double
conversion that would intrinsically depend on the floating-point
environment (as opposed to, say, floating-point multiplication,
where the rounding mode is taken into consideration).

So, my understanding is that from_chars applied to a given string
should always yield the same "double" value, regardless of floating-
point environment. (That might mean for the implementation to
temporarily switch back to "round to nearest" while parsing the
string.)
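The "temporarily switch back" idea could look roughly like the following. This is a sketch only: RoundToNearestScope and parse_in_round_to_nearest are hypothetical names, strtod stands in for the parsing step, and it assumes an implementation on which changing the rounding mode through <cfenv> is supported at all.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>

// Saves the current rounding mode, forces round-to-nearest, restores on exit.
class RoundToNearestScope {
public:
    RoundToNearestScope() : saved_(std::fegetround()) {
        std::fesetround(FE_TONEAREST);
    }
    ~RoundToNearestScope() { std::fesetround(saved_); }
    RoundToNearestScope(const RoundToNearestScope&) = delete;
    RoundToNearestScope& operator=(const RoundToNearestScope&) = delete;

private:
    int saved_;
};

// Parses `text` independently of the caller's rounding mode.
double parse_in_round_to_nearest(const char* text) {
    RoundToNearestScope guard;
    return std::strtod(text, nullptr);
}
```

The caller's rounding mode is the same before and after the call, so the parse result no longer depends on it.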

(No, that particular question was not considered before.)

Regarding the 80-bit FP on x86 issue, this seems dubious to me to
start with, because storing a "double" computation result in
(e.g. volatile) memory might mean that the double value I read
from that memory doesn't compare equal to the (in-register)
computation result. And of course, it's totally uncontrollable
when the compiler decides to spill some values to memory. So,
arithmetic in 80-bit precision seems ok, but once comparisons
with other doubles happen, it seems least surprising to
truncate / round to 64-bit at that point.

That said, I'm not sure which accidents you envision when
80-bit floating-point numbers are used in from_chars returning
a double.

Post by Edward Catmur
Secondly, is from_chars expected or encouraged to have the same
behavior as the compiler? i.e. for double d; auto s = "1e23" should
we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?

It's desirable, yes, but not prescribed. We also allow
"constexpr" floating-point evaluations (at compile-time)
to yield different results from runtime evaluations of
the same expression.

If you feel that a non-normative note would be helpful,
this could be accommodated, I believe.

Post by Edward Catmur
Most importantly, is to_chars permitted to produce an overlong output
where the shorter output round-trips on the same implementation but
is not guaranteed to do so globally?

No, it's required to produce the shortest output so that it can
round-trip on the same implementation.

Post by Edward Catmur
For example, if an
implementation always reads "1e23" as 0x1.52d02c7e14af6p76, is it
permitted to output 0x1.52d02c7e14af6p76 as "9.999999999999999e22" on
the basis that this is guaranteed to be read correctly by a different
implementation that might read "1e23" as 0x1.52d02c7e14af7p76?

No, 1e23 is shorter than the 9.99..999e22 number you gave, so 1e23
takes precedence.

Note that your argument is flawed in that another implementation might
use 128-bit decimal floating-point numbers, where "9.999999999999999e22"
is actually a different number than 1e23, so the round-trip guarantee
is violated right then and there.

Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?

Post by Edward Catmur
In addition, I would be interested in knowing whether the following
underspecification is intentional:
Is the result of to_chars() required to represent the closest to the
input value among strings of that length that round-trip? For example
0x1.0000000000001p0 is approx. 1.000000000000000222045, so is
1.0000000000000003 an acceptable output from to_chars, or only
1.0000000000000002? Or consider the smallest positive subnormal IEEE
double, 0x1p-1074, approx. 4.94e-324 - is 4e-324 an acceptable
output, or only 5e-324? (In Florian Loitsch [1], this is the
"closeness" property of Grisu3.)

If the output has a choice between two strings that are both
equally short and would both round-trip to the same (original)
number, there is no specification which one to use. I think
that would be an area where we could improve the specification
normatively by prescribing minimal numeric distance to the "true"
number.

Post by Edward Catmur
Finally, it would be useful to know the minimum buffer size necessary
to guarantee successful conversion in all cases. I would guess this
is something like 4 + numeric_limits<T>::max_digits10 +
max(log10(numeric_limits<T>::max_exponent10), 1 +
log10(-numeric_limits<T>::min_exponent10)) but it would be useful to
have confirmation of this calculation or indeed to have it available
in the Standard as a constant.

The calculation seems right to me, but I wouldn't burden the standard
with it. (I do wonder why we have "4 +" at the start, given
that we need to account for the sign, the decimal point, and the
"e" only, which is just 3 characters.)

Jens

Matthew Woehlke

2017-04-11 14:12:28 UTC

Post by Jens Maurer
I do wonder why we have "4 +" at the start, given
that we need to account for the sign, the decimal point, and the
"e" only, which is just 3 characters.

Trailing NUL?
--
Matthew

'Edward Catmur' via ISO C++ Standard - Discussion

2017-04-11 14:34:37 UTC

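Post by Jens Maurer
So, my understanding is that from_chars applied to a given string
should always yield the same "double" value, regardless of floating-point
environment. (That might mean for the implementation to
temporarily switch back to "round to nearest" while parsing the
string.)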

Hm. According to https://sourceware.org/bugzilla/show_bug.cgi?id=14518
glibc strtod() respects the rounding mode (and failure to do so was
considered a bug). I'm not saying that from_chars would have to behave
identically to strtod, but it might be considered odd if it didn't.
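The behaviour described in that bug report can be probed with a small helper. This is a sketch under assumptions: strtod_with_rounding is a hypothetical name, FE_DOWNWARD and FE_UPWARD are assumed to be defined on the platform, and whether the two results differ depends on the C library in use.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>

// Calls strtod while rounding mode `mode` is active, then restores the
// previously active mode.
double strtod_with_rounding(const char* text, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    const double result = std::strtod(text, nullptr);
    std::fesetround(saved);
    return result;
}
```

Comparing strtod_with_rounding("1e23", FE_DOWNWARD) with strtod_with_rounding("1e23", FE_UPWARD) shows whether a given library honours the rounding mode for this tie; a library that does would return the lower and upper neighbour respectively.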

Post by Jens Maurer
That said, I'm not sure which accidents you envision when
80-bit floating-point numbers are used in from_chars returning
a double.

Thinking about it more closely, the values at issue are exactly
representable in the 80-bit extended format, so any sensible implementation
of from_chars wouldn't be affected by double-rounding.

Post by Jens Maurer

Post by Edward Catmur
Secondly, is from_chars expected or encouraged to have the same
behavior as the compiler? i.e. for double d; auto s = "1e23" should
we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?

It's desirable, yes, but not prescribed. We also allow
"constexpr" floating-point evaluations (at compile-time)
to yield different results from runtime evaluations of
the same expression.
If you feel that a non-normative note would be helpful,
this could be accommodated, I believe.

I guess we already have [expr.const]/6 and footnote 89 thereto, so the
intent is pretty clear.

Post by Jens Maurer
Note that your argument is flawed in that another implementation might
use 128-bit decimal floating-point numbers, where "9.999999999999999e22"
is actually a different number than 1e23, so the round-trip guarantee
is violated right then and there.

Ah, of course. Thanks!

Post by Jens Maurer
Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?

Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).

Post by Jens Maurer
I do wonder why we have "4 +" at the start, given
that we need to account for the sign, the decimal point, and the
"e" only, which is just 3 characters.

You need floor(log10(k)) + 1 characters to represent an exponent k (e.g.
for k = 100 you need 3 characters), and that trailing + 1 is the fourth
character.

Jens Maurer

2017-04-11 18:56:19 UTC

Post by Jens Maurer
So, my understanding is that from_chars applied to a given string
should always yield the same "double" value, regardless of floating-point
environment. (That might mean for the implementation to
temporarily switch back to "round to nearest" while parsing the
string.)

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Hm. According to
https://sourceware.org/bugzilla/show_bug.cgi?id=14518 glibc strtod()
respects the rounding mode (and failure to do so was considered a
bug). I'm not saying that from_chars would have to behave identically
to strtod, but it might be considered odd if it didn't.

Hm. I'd really like to keep the round-trip property irrespective
of the active rounding mode. Doesn't that remove the freedom to
respect the rounding mode in from_chars?

Post by Jens Maurer
It's desirable, yes, but not prescribed. We also allow
"constexpr" floating-point evaluations (at compile-time)
to yield different results from runtime evaluations of
the same expression.
If you feel that a non-normative note would be helpful,
this could be accommodated, I believe.

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I guess we already have [expr.const]/6 and footnote 89 thereto, so the
intent is pretty clear.

But that's far away from to_chars / from_chars, I'd say.

Post by Jens Maurer
Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).

... and should we consider the active rounding mode for these cases?

Post by Jens Maurer
I do wonder why we have "4 +" at the start, given
that we need to account for the sign, the decimal point, and the
"e" only, which is just 3 characters.

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
You need floor(log10(k)) + 1 characters to represent an exponent k (e.g.
for k = 100 you need 3 characters).

Right, thanks.

Jens

'Edward Catmur' via ISO C++ Standard - Discussion

2017-04-19 15:20:10 UTC

Post by Jens Maurer

Hm. I'd really like to keep the round-trip property irrespective
of the active rounding mode. Doesn't that remove the freedom to
respect the rounding mode in from_chars?

Yes, unless either:
* to_chars also respects the active rounding mode (that is, it outputs an
in-between representation only if that representation rounds to the desired
value in the current rounding mode), or
* to_chars never outputs in-between representations (violating the
shortness requirement, by some lights).

Post by Jens Maurer
Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?

Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).

Post by Jens Maurer
... and should we consider the active rounding mode for these cases?

Maybe. I'd certainly expect glibc to do so.