Discussion:
[PATCH] Optimize x*x*x*x*x*x using 3 multiplications.
Roger Sayle
2003-07-29 17:13:44 UTC
Permalink
The following patch allows gcc to optimize floating point expressions
such as x*x*x*x*x*x when compiled with -ffast-math to use only three
multiplications rather than five (as currently generated by mainline).

The secret is to reuse the new __builtin_pow{,f,l} optimizations by
adding three "term rewriting" rules to GCC's constant folding.
These are:

x*x => pow(x,2)
x*pow(x,c) => pow(x,c+1)
pow(x,c)*x => pow(x,c+1)

and just for luck, I've also added:

pow(x,c)/x => pow(x,c-1)

We already have rules for y/pow(x,z) => y*pow(x,-z) and also for
pow(x,y)*pow(x,z) => pow(x,y+z).



For the computer scientists and/or reviewers here are some invariants.
These transformations guarantee that we only use pow/powf/powl as a place
holder when we know that expand_builtin_pow will generate multiplications.
Firstly, we only convert x*x into pow(x,2) with unsafe_math_optimizations
and !optimize_size. Secondly, all of the GCC's constant folding
transformations perserve the second argument as an integer, unless the
source code contains an explicit call to pow/powf/powl. Thirdly, pow
is guaranteed not to affect errno if the second argument is an integer.

One piece of ugliness is that we now have to disable the constant folding
transformation of pow(x,2) into x*x. Clearly, if GCC's constant folding
transforms both ways calling fold on the result, we'll end up with
unbounded recursion. For this reason, pow(x,-1), pow(x,0) and pow(x,1)
have "canonical" representations, 1.0/x, 1.0 and x respectively, and
all other expressions prefer the "functional" form. The patch below
therefore disables the inverse transformation *during folding*.

To compensate for disabling pow(x,2) into x*x in constant folding, which
is done even without -ffast-math or with -Os, I also do the counter-tweak
to expand_builtin_pow, so that we always expand exponents -1, 0, 1 and 2
as RTL. The only functional change should be that we no longer convert
pow(x,-2) as 1.0/(x*x) with "-ffast-math -Os". This should not be
unreasonable as 1.0/(x*x) may be larger on machines without FP insns.


To demonstrate, with "-O2 -ffast-math -fomit-frame-pointer" on x86:

double foo(double x)
{
return x*x*x*x*x*x;
}

Before...
foo: fldl 4(%esp)
fld %st(0)
fmul %st(1), %st
fmul %st(1), %st
fmul %st(1), %st
fmul %st(1), %st
fmulp %st, %st(1)
ret

After...
foo:
fldl 4(%esp)
fld %st(0)
fmul %st(1), %st
fmulp %st, %st(1)
fmul %st(0), %st
ret



The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all languages except treelang, and regression tested
with a top-level "make -k check" with no new failures.

Ok for mainline?



2003-07-29 Roger Sayle <***@eyesopen.com>

* fold-const.c (fold <MULT_EXPR>): Optimize both x*pow(x,c) and
pow(x,c)*x as pow(x,c+1) for constant values c. Optimize x*x
as pow(x,2.0) when the latter will be expanded back into x*x.
(fold <RDIV_EXPR>): Optimize pow(x,c)/x as pow(x,c-1).
* builtins.c (expand_builtin_pow): Ignore flag_errno_math as
pow can never set errno when used with an integer exponent.
Always use expand_powi when exponent is -1, 0, 1 or 2.
(fold_builtin): Don't rewrite pow(x,2.0) as x*x nor pow(x,-2.0)
as 1.0/(x*x). This avoids unbounded recursion as we now prefer
the pow forms of these expressions.

* gcc.dg/builtins-27.c: New test case.


Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.284
diff -c -3 -p -r1.284 fold-const.c
*** fold-const.c 22 Jul 2003 23:30:06 -0000 1.284
--- fold-const.c 28 Jul 2003 22:15:45 -0000
*************** fold (tree expr)
*** 6049,6054 ****
--- 6049,6128 ----
return build_function_call_expr (sinfn,
TREE_OPERAND (arg0, 1));
}
+
+ /* Optimize x*pow(x,c) as pow(x,c+1). */
+ if (fcode1 == BUILT_IN_POW
+ || fcode1 == BUILT_IN_POWF
+ || fcode1 == BUILT_IN_POWL)
+ {
+ tree arg10 = TREE_VALUE (TREE_OPERAND (arg1, 1));
+ tree arg11 = TREE_VALUE (TREE_CHAIN (TREE_OPERAND (arg1,
+ 1)));
+ if (TREE_CODE (arg11) == REAL_CST
+ && ! TREE_CONSTANT_OVERFLOW (arg11)
+ && operand_equal_p (arg0, arg10, 0))
+ {
+ tree powfn = TREE_OPERAND (TREE_OPERAND (arg1, 0), 0);
+ REAL_VALUE_TYPE c;
+ tree arg, arglist;
+
+ c = TREE_REAL_CST (arg11);
+ real_arithmetic (&c, PLUS_EXPR, &c, &dconst1);
+ arg = build_real (type, c);
+ arglist = build_tree_list (NULL_TREE, arg);
+ arglist = tree_cons (NULL_TREE, arg0, arglist);
+ return build_function_call_expr (powfn, arglist);
+ }
+ }
+
+ /* Optimize pow(x,c)*x as pow(x,c+1). */
+ if (fcode0 == BUILT_IN_POW
+ || fcode0 == BUILT_IN_POWF
+ || fcode0 == BUILT_IN_POWL)
+ {
+ tree arg00 = TREE_VALUE (TREE_OPERAND (arg0, 1));
+ tree arg01 = TREE_VALUE (TREE_CHAIN (TREE_OPERAND (arg0,
+ 1)));
+ if (TREE_CODE (arg01) == REAL_CST
+ && ! TREE_CONSTANT_OVERFLOW (arg01)
+ && operand_equal_p (arg1, arg00, 0))
+ {
+ tree powfn = TREE_OPERAND (TREE_OPERAND (arg0, 0), 0);
+ REAL_VALUE_TYPE c;
+ tree arg, arglist;
+
+ c = TREE_REAL_CST (arg01);
+ real_arithmetic (&c, PLUS_EXPR, &c, &dconst1);
+ arg = build_real (type, c);
+ arglist = build_tree_list (NULL_TREE, arg);
+ arglist = tree_cons (NULL_TREE, arg1, arglist);
+ return build_function_call_expr (powfn, arglist);
+ }
+ }
+
+ /* Optimize x*x as pow(x,2.0), which is expanded as x*x. */
+ if (! optimize_size
+ && operand_equal_p (arg0, arg1, 0))
+ {
+ tree powfn;
+
+ if (type == double_type_node)
+ powfn = implicit_built_in_decls[BUILT_IN_POW];
+ else if (type == float_type_node)
+ powfn = implicit_built_in_decls[BUILT_IN_POWF];
+ else if (type == long_double_type_node)
+ powfn = implicit_built_in_decls[BUILT_IN_POWL];
+ else
+ powfn = NULL_TREE;
+
+ if (powfn)
+ {
+ tree arg = build_real (type, dconst2);
+ tree arglist = build_tree_list (NULL_TREE, arg);
+ arglist = tree_cons (NULL_TREE, arg0, arglist);
+ return build_function_call_expr (powfn, arglist);
+ }
+ }
}
}
goto associate;
*************** fold (tree expr)
*** 6304,6309 ****
--- 6378,6407 ----
return fold (build (RDIV_EXPR, type,
build_real (type, dconst1),
tmp));
+ }
+ }
+
+ /* Optimize pow(x,c)/x as pow(x,c-1). */
+ if (fcode0 == BUILT_IN_POW
+ || fcode0 == BUILT_IN_POWF
+ || fcode0 == BUILT_IN_POWL)
+ {
+ tree arg00 = TREE_VALUE (TREE_OPERAND (arg0, 1));
+ tree arg01 = TREE_VALUE (TREE_CHAIN (TREE_OPERAND (arg0, 1)));
+ if (TREE_CODE (arg01) == REAL_CST
+ && ! TREE_CONSTANT_OVERFLOW (arg01)
+ && operand_equal_p (arg1, arg00, 0))
+ {
+ tree powfn = TREE_OPERAND (TREE_OPERAND (arg0, 0), 0);
+ REAL_VALUE_TYPE c;
+ tree arg, arglist;
+
+ c = TREE_REAL_CST (arg01);
+ real_arithmetic (&c, MINUS_EXPR, &c, &dconst1);
+ arg = build_real (type, c);
+ arglist = build_tree_list (NULL_TREE, arg);
+ arglist = tree_cons (NULL_TREE, arg1, arglist);
+ return build_function_call_expr (powfn, arglist);
}
}
}
Index: builtins.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/builtins.c,v
retrieving revision 1.234
diff -c -3 -p -r1.234 builtins.c
*** builtins.c 24 Jul 2003 21:04:12 -0000 1.234
--- builtins.c 28 Jul 2003 22:15:53 -0000
*************** expand_builtin_pow (tree exp, rtx target
*** 2170,2179 ****
arg0 = TREE_VALUE (arglist);
arg1 = TREE_VALUE (TREE_CHAIN (arglist));

! if (flag_unsafe_math_optimizations
! && ! flag_errno_math
! && ! optimize_size
! && TREE_CODE (arg1) == REAL_CST
&& ! TREE_CONSTANT_OVERFLOW (arg1))
{
REAL_VALUE_TYPE cint;
--- 2170,2176 ----
arg0 = TREE_VALUE (arglist);
arg1 = TREE_VALUE (TREE_CHAIN (arglist));

! if (TREE_CODE (arg1) == REAL_CST
&& ! TREE_CONSTANT_OVERFLOW (arg1))
{
REAL_VALUE_TYPE cint;
*************** expand_builtin_pow (tree exp, rtx target
*** 2183,2195 ****
c = TREE_REAL_CST (arg1);
n = real_to_integer (&c);
real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
! if (real_identical (&c, &cint)
! && powi_cost (n) <= POWI_MAX_MULTS)
{
! enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
! rtx op = expand_expr (arg0, subtarget, VOIDmode, 0);
! op = force_reg (mode, op);
! return expand_powi (op, mode, n);
}
}
return expand_builtin_mathfn_2 (exp, target, NULL_RTX);
--- 2180,2200 ----
c = TREE_REAL_CST (arg1);
n = real_to_integer (&c);
real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
! if (real_identical (&c, &cint))
{
! /* If the exponent is -1, 0, 1 or 2, then expand_powi is exact.
! Otherwise, check the number of multiplications required.
! Note that pow never sets errno for an integer exponent. */
! if ((n >= -1 && n <= 2)
! || (flag_unsafe_math_optimizations
! && ! optimize_size
! && powi_cost (n) <= POWI_MAX_MULTS))
! {
! enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
! rtx op = expand_expr (arg0, subtarget, VOIDmode, 0);
! op = force_reg (mode, op);
! return expand_powi (op, mode, n);
! }
}
}
return expand_builtin_mathfn_2 (exp, target, NULL_RTX);
*************** fold_builtin (tree exp)
*** 6244,6271 ****
return fold (build (RDIV_EXPR, type,
build_real (type, dconst1),
arg0));
-
- /* Optimize pow(x,2.0) = x*x. */
- if (REAL_VALUES_EQUAL (c, dconst2)
- && (*lang_hooks.decls.global_bindings_p) () == 0
- && ! CONTAINS_PLACEHOLDER_P (arg0))
- {
- arg0 = save_expr (arg0);
- return fold (build (MULT_EXPR, type, arg0, arg0));
- }
-
- /* Optimize pow(x,-2.0) = 1.0/(x*x). */
- if (flag_unsafe_math_optimizations
- && REAL_VALUES_EQUAL (c, dconstm2)
- && (*lang_hooks.decls.global_bindings_p) () == 0
- && ! CONTAINS_PLACEHOLDER_P (arg0))
- {
- arg0 = save_expr (arg0);
- return fold (build (RDIV_EXPR, type,
- build_real (type, dconst1),
- fold (build (MULT_EXPR, type,
- arg0, arg0))));
- }

/* Optimize pow(x,0.5) = sqrt(x). */
if (flag_unsafe_math_optimizations
--- 6249,6254 ----


/* Copyright (C) 2003 Free Software Foundation.

Check that constant folding of built-in math functions doesn't
break anything and produces the expected results.

Written by Roger Sayle, 29th July 2003. */

/* { dg-do link } */
/* { dg-options "-O2 -ffast-math" } */

extern void link_error(void);

extern double pow(double,double);

void test(double x)
{
if (pow(x,2.0) != x*x)
link_error ();

if (x*pow(x,2.0) != pow(x,3.0))
link_error ();

if (pow(x,2.0)*x != pow(x,3.0))
link_error ();

if (pow(x,3.0) != x*x*x)
link_error ();

if (pow(x,2.0)*x != x*x*x)
link_error ();

if (x*pow(x,2.0) != x*x*x)
link_error ();

if (pow(x,3.0)/x != pow(x,2.0))
link_error ();

if (pow(x,3.0)/x != x*x)
link_error ();
}

int main()
{
test (2.0);
return 0;
}



Roger
--
Roger Sayle, E-mail: ***@eyesopen.com
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833
Gabriel Dos Reis
2003-07-29 17:31:27 UTC
Permalink
Roger Sayle <***@eyesopen.com> writes:

Just to make sure I understand what you're acheiving.

| One piece of ugliness is that we now have to disable the constant folding
| transformation of pow(x,2) into x*x.

People have been recently suggesting to forward calls to pow() in the
hope that it would, for example, expand pow(x, 2) into x * x. What
are the reasons for nto doing that?

| Clearly, if GCC's constant folding
| transforms both ways calling fold on the result, we'll end up with
| unbounded recursion.

Clearly.

-- Gaby
Roger Sayle
2003-07-29 18:35:08 UTC
Permalink
Hi Gaby,
Post by Gabriel Dos Reis
Just to make sure I understand what you're acheiving.
| One piece of ugliness is that we now have to disable the constant
| folding transformation of pow(x,2) into x*x.
People have been recently suggesting to forward calls to pow() in the
hope that it would, for example, expand pow(x, 2) into x * x. What
are the reasons for not doing that?
The distinction needs to be made between tree-level constant folding
and RTL expansion and later passes. Basically, the game plan is
for x*x to be converted into pow(x,2) at the tree-level, and then
converted back to x*x at the RTL-level. Hence pow(x,2) will always
generate x*x, but the pass where that particular optimization will
be performed will be moved later in the compiler.

We only have a problem with infinite recursion if we convert from X to Y
and from Y to X in a single pass. Hence, at the tree-level, we use a
high level of abstraction, pow(x,2), but when we lower to RTL, perform
the reverse transformation. My changes to fold-const performed the
first step, and my changes to builtins.c performed the second.


[This is very similar to things like address calculations where we
strive to use higher levels of abstraction such as multiplications
at the early in the compiler, but then convert them back to shifts
and additions prior to code generation.]


Does this make things any clearer?



The std::pow thread on "gcc" is an insteresting one. I too would
recommend forwarding std::pow(double,int) to std::pow(double,double) for
the case where the second argument is __builtin_constant_p. An even
better solution would be to provide a __builtin_powi function directly
that is always guaranteed to expanded inline, independently of -Os and
-ffast-math, and that expands to a loop (that can be unrolled!) even
when the second argument isn't constant. This functionality doesn't
correspond to a function in libc, but is useful for the both the
fortran and C++ front-ends.

Roger
--
Phil Edwards
2003-07-29 18:40:54 UTC
Permalink
Post by Roger Sayle
An even
better solution would be to provide a __builtin_powi function directly
that is always guaranteed to expanded inline, independently of -Os and
-ffast-math, and that expands to a loop (that can be unrolled!) even
when the second argument isn't constant.
Ooooh. I like that idea. I might even try implementing that one.
--
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace. We seek
not your counsel, nor your arms. Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen. - Samuel Adams
Gabriel Dos Reis
2003-07-29 19:12:40 UTC
Permalink
Roger Sayle <***@eyesopen.com> writes:

| Hi Gaby,
| > Just to make sure I understand what you're acheiving.
| >
| > | One piece of ugliness is that we now have to disable the constant
| > | folding transformation of pow(x,2) into x*x.
| >
| > People have been recently suggesting to forward calls to pow() in the
| > hope that it would, for example, expand pow(x, 2) into x * x. What
| > are the reasons for not doing that?
|
| The distinction needs to be made between tree-level constant folding
| and RTL expansion and later passes.

I guess the key sentence is the above.

[...]

| Does this make things any clearer?

Oh, yes. Thanks.

| The std::pow thread on "gcc" is an insteresting one. I too would
| recommend forwarding std::pow(double,int) to std::pow(double,double) for
| the case where the second argument is __builtin_constant_p. An even
| better solution would be to provide a __builtin_powi function directly

I was going to contact you to see how we could manage to provide
support for things like

__builtin_pow{, f, l}i(FLOAT, int)
with FLOAT in {double, float, long double}

I'm not sure I'm happy with the notion of forwarding std::pow(double, int)
to std::pow(double, double).

| that is always guaranteed to expanded inline, independently of -Os and
| -ffast-math, and that expands to a loop (that can be unrolled!) even
| when the second argument isn't constant. This functionality doesn't
| correspond to a function in libc, but is useful for the both the
| fortran and C++ front-ends.

Oh yes.

Thanks,

-- Gaby
Roger Sayle
2003-07-29 20:09:47 UTC
Permalink
Post by Gabriel Dos Reis
| The std::pow thread on "gcc" is an insteresting one. I too would
| recommend forwarding std::pow(double,int) to std::pow(double,double) for
| the case where the second argument is __builtin_constant_p. An even
| better solution would be to provide a __builtin_powi function directly
I was going to contact you to see how we could manage to provide
support for things like
__builtin_pow{, f, l}i(FLOAT, int)
with FLOAT in {double, float, long double}
That's a neat idea. I'll admit that I hadn't thought about it but adding
builtins for powif and powil in addition to powi is obviously a excellent
suggestion. The current expand_builtin_powi code already handles all
three forms (but only for second argument constant). I've Cc'd Phil
Edwards who's expressed an interest in implementing this. Phil, let me
know if there's anything I can do to help.

It might also be worth contacting the glibc/newlib folks to check that
they don't already have a name for these functions, even if they don't
have any immediate plans to provide implementations. It would be
unfortunate to have this functionality ratified by a future ISO C/C++
standard, but under a different name (if its avoidable).

Roger
--
Gabriel Dos Reis
2003-07-29 20:25:33 UTC
Permalink
Roger Sayle <***@eyesopen.com> writes:

| On 29 Jul 2003, Gabriel Dos Reis wrote:
| > | The std::pow thread on "gcc" is an insteresting one. I too would
| > | recommend forwarding std::pow(double,int) to std::pow(double,double) for
| > | the case where the second argument is __builtin_constant_p. An even
| > | better solution would be to provide a __builtin_powi function directly
| >
| > I was going to contact you to see how we could manage to provide
| > support for things like
| >
| > __builtin_pow{, f, l}i(FLOAT, int)
| > with FLOAT in {double, float, long double}
|
| That's a neat idea. I'll admit that I hadn't thought about it but adding
| builtins for powif and powil in addition to powi is obviously a excellent
| suggestion. The current expand_builtin_powi code already handles all
| three forms (but only for second argument constant). I've Cc'd Phil
| Edwards who's expressed an interest in implementing this. Phil, let me
| know if there's anything I can do to help.
|
| It might also be worth contacting the glibc/newlib folks to check that
| they don't already have a name for these functions, even if they don't
| have any immediate plans to provide implementations. It would be
| unfortunate to have this functionality ratified by a future ISO C/C++
| standard, but under a different name (if its avoidable).

Hmm, I'm not sure I understand the second paragraph.
Normally, glibc people are not supposed to provide an implementation to
anything named __builtin_xxx, am I wrong?
Secondly, the C++ standard does not use any name starting with __builtin_.
However, the C++ standards does mandate

double std::pow(double, int);
float std::pow(float, int);
long double std::pow(long double, int);

(Note: they are not supposed to be defined at the global scope).

My plan was to implement them like this

namespace std
{
inline double
pow(double __x, int __n)
{ return __builtin_powi(__x, __n)

inline float
pow(float __x, int __n)
{ return __builtin_powfi(__x, __n)

inline long double
pow(long double __x, int __n)
{ return __builtin_powli(__x, __n)
}

(that is basically the scheme we use for most functions with built-in
implementations).

-- Gaby
Roger Sayle
2003-07-29 20:44:10 UTC
Permalink
Post by Gabriel Dos Reis
| It might also be worth contacting the glibc/newlib folks to check that
| they don't already have a name for these functions, even if they don't
| have any immediate plans to provide implementations. It would be
| unfortunate to have this functionality ratified by a future ISO C/C++
| standard, but under a different name (if its avoidable).
Hmm, I'm not sure I understand the second paragraph.
Normally, glibc people are not supposed to provide an implementation to
anything named __builtin_xxx, am I wrong?
By convention, the name "xxx" used in "__builtin_xxx" attempts follow
the naming and parameter passing conventions mandated by the C and C++
standards. Where possible if the same functionality exists then the
same name should be used.

For example in the C library, the suffix letters "l" and "f", by
convention denote the float and long double forms of a function when used
at the end of an identifier. This would prefer "powif" over "powfi",
for example.

Normally, this isn't an issue as builtins are either purely GCC_BUILTIN
or LIB_BUILTIN, but this is an example that falls somewhere between the
two. Another example I'm investigating is "__builtin_memequal" or
"__builtin_memsame" or "__builtin_memeq" which is intended to be a
internal specialization of memcmp/strcmp/strncmp when the result is
only tested for equality/inequality with zero.

I hate naming things and if a higher authority can provide guidance,
I'm more than happy to follow it. C++'s STL isn't much of a help
here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
now understand why you prefer the "f" and "l" before the "i" :>

Roger
--
Gabriel Dos Reis
2003-07-29 21:02:15 UTC
Permalink
Roger Sayle <***@eyesopen.com> writes:

| On 29 Jul 2003, Gabriel Dos Reis wrote:
| > | It might also be worth contacting the glibc/newlib folks to check that
| > | they don't already have a name for these functions, even if they don't
| > | have any immediate plans to provide implementations. It would be
| > | unfortunate to have this functionality ratified by a future ISO C/C++
| > | standard, but under a different name (if its avoidable).
| >
| > Hmm, I'm not sure I understand the second paragraph.
| > Normally, glibc people are not supposed to provide an implementation to
| > anything named __builtin_xxx, am I wrong?
|
| By convention, the name "xxx" used in "__builtin_xxx" attempts follow
| the naming and parameter passing conventions mandated by the C and C++
| standards. Where possible if the same functionality exists then the
| same name should be used.

Aha, I know get what you meant by "name for these functions". Yes, I
knew of that convention but for some reasons I could not understand
you meant that.

I was thinking of those as purely internal to GCC, of the "class"
GCC_BUILTIN, not LIB_BUILTIN.

| For example in the C library, the suffix letters "l" and "f", by
| convention denote the float and long double forms of a function when used
| at the end of an identifier. This would prefer "powif" over "powfi",
| for example.

Sure, I don't insist on powfi -- although powif looks odd to me
because the arguments are in the reverse order.

The C standard does not define any function like the one we're
discussing; but it provides the following Hungarian notation framework

[#1] The header <math.h> declares two types and many
mathematical functions and defines several macros. Most
synopses specify a family of functions consisting of a
principal function with one or more double parameters, a
double return value, or both; and other functions with the
same name but with f and l suffixes, which are corresponding
functions with float and long double parameters, return
values, or both.189) Integer arithmetic functions and
conversion functions are discussed later.

so go for powif for consistency.

[...]

| I hate naming things and if a higher authority can provide guidance,
| I'm more than happy to follow it. C++'s STL isn't much of a help
| here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
| now understand why you prefer the "f" and "l" before the "i" :>

I guess you understand why I can't handle Hungarian-like notation :-D

-- Gaby
Jakub Jelinek
2003-07-29 21:13:36 UTC
Permalink
Post by Roger Sayle
I hate naming things and if a higher authority can provide guidance,
I'm more than happy to follow it. C++'s STL isn't much of a help
here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
now understand why you prefer the "f" and "l" before the "i" :>
Shouldn't that be:
__builtin__ZSt3powdi
__builtin__ZSt3powfi
__builtin__ZSt3powei
?

Jakub
Phil Edwards
2003-07-29 21:19:38 UTC
Permalink
Post by Jakub Jelinek
Post by Roger Sayle
I hate naming things and if a higher authority can provide guidance,
I'm more than happy to follow it. C++'s STL isn't much of a help
here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
now understand why you prefer the "f" and "l" before the "i" :>
__builtin__ZSt3powdi
__builtin__ZSt3powfi
__builtin__ZSt3powei
_Z11__builtinSt3pow?i, probably. And yes, I'd prefer the "f" and "l"
before the "i" as well, but it's not my call.
--
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace. We seek
not your counsel, nor your arms. Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen. - Samuel Adams
Gabriel Dos Reis
2003-07-29 21:30:22 UTC
Permalink
Phil Edwards <***@jaj.com> writes:

| On Tue, Jul 29, 2003 at 05:13:36PM -0400, Jakub Jelinek wrote:
| > On Tue, Jul 29, 2003 at 02:44:10PM -0600, Roger Sayle wrote:
| > > I hate naming things and if a higher authority can provide guidance,
| > > I'm more than happy to follow it. C++'s STL isn't much of a help
| > > here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
| > > now understand why you prefer the "f" and "l" before the "i" :>
| >
| > Shouldn't that be:
| > __builtin__ZSt3powdi
| > __builtin__ZSt3powfi
| > __builtin__ZSt3powei
|
| _Z11__builtinSt3pow?i, probably.

Well, since everybody wants the answer, it is _ZSt13__builtin_pow?i
corresponding to

namespace std
{
double __builtin_pow(<FLOAT>, int);
}

But, I would expect them to have a "C" linkage specification.
But that does not matter since they are internal to GCC anyway.

Oh wait a minute, if they are internal to GCC why can't we just
overload them (therefore dropping the "C" linkage requirement) and
call them __builtin_pow?

-- Gaby
Phil Edwards
2003-07-29 21:37:04 UTC
Permalink
Post by Gabriel Dos Reis
Well, since everybody wants the answer, it is _ZSt13__builtin_pow?i
corresponding to
namespace std
{
double __builtin_pow(<FLOAT>, int);
}
I was joking, actually.
Post by Gabriel Dos Reis
Oh wait a minute, if they are internal to GCC why can't we just
overload them (therefore dropping the "C" linkage requirement) and
call them __builtin_pow?
I'm fairly certain that DEF_BUILTIN will not understand overloaded names.


Phil
--
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace. We seek
not your counsel, nor your arms. Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen. - Samuel Adams
Gabriel Dos Reis
2003-07-30 04:52:37 UTC
Permalink
Phil Edwards <***@jaj.com> writes:
| > Oh wait a minute, if they are internal to GCC why can't we just
| > overload them (therefore dropping the "C" linkage requirement) and
| > call them __builtin_pow?
|
| I'm fairly certain that DEF_BUILTIN will not understand overloaded names.

Not that I wanted it to happen, but I'm certain that DEF_BUILTIN doesn't
care. It cares however about ENUM and TYPE. But again, it is not
something important for me.

-- Gaby
Gabriel Dos Reis
2003-07-29 21:20:02 UTC
Permalink
Jakub Jelinek <***@redhat.com> writes:

| On Tue, Jul 29, 2003 at 02:44:10PM -0600, Roger Sayle wrote:
| > I hate naming things and if a higher authority can provide guidance,
| > I'm more than happy to follow it. C++'s STL isn't much of a help
| > here because of the polymorphism. Perhaps __builtin_Z3powdi? But I
| > now understand why you prefer the "f" and "l" before the "i" :>
|
| Shouldn't that be:
| __builtin__ZSt3powdi
| __builtin__ZSt3powfi
| __builtin__ZSt3powei
| ?

No -- for one thing, you'll have to put the prefix "_Z" at the
begining :-)

Anyway, they have "C" linkage specification, so they won't be
mangled. Ouf.

-- Gaby
Richard Henderson
2003-07-31 03:09:18 UTC
Permalink
Post by Roger Sayle
The following patch allows gcc to optimize floating point expressions
such as x*x*x*x*x*x when compiled with -ffast-math to use only three
multiplications rather than five (as currently generated by mainline).
I'd really wish we handled this properly via reassociation.
Post by Roger Sayle
+ /* Optimize x*pow(x,c) as pow(x,c+1). */
+ if (fcode1 == BUILT_IN_POW
+ || fcode1 == BUILT_IN_POWF
+ || fcode1 == BUILT_IN_POWL)
+ {
+ tree arg10 = TREE_VALUE (TREE_OPERAND (arg1, 1));
+ tree arg11 = TREE_VALUE (TREE_CHAIN (TREE_OPERAND (arg1,
+ 1)));
+ if (TREE_CODE (arg11) == REAL_CST
+ && ! TREE_CONSTANT_OVERFLOW (arg11)
+ && operand_equal_p (arg0, arg10, 0))
I don't see how you'll prevent "x*x*x" -> "pow(x,3)" ...
Post by Roger Sayle
! /* If the exponent is -1, 0, 1 or 2, then expand_powi is exact.
! Otherwise, check the number of multiplications required.
! Note that pow never sets errno for an integer exponent. */
! if ((n >= -1 && n <= 2)
! || (flag_unsafe_math_optimizations
! && ! optimize_size
! && powi_cost (n) <= POWI_MAX_MULTS))
from *not* calling the libm function if !flag_unsafe_math_optimizations.


r~
Roger Sayle
2003-07-31 03:53:36 UTC
Permalink
Post by Richard Henderson
Post by Roger Sayle
The following patch allows gcc to optimize floating point expressions
such as x*x*x*x*x*x when compiled with -ffast-math to use only three
multiplications rather than five (as currently generated by mainline).
I'd really wish we handled this properly via reassociation.
I'm not sure whether reassociation alone could work out that x^23
is better calculated as x^13*x^10 instead of x^22*x. However, if
you know of a paper or have suggestions of how reassociation should
be implemented, I can certainly look into it. I'm sure it could be
profitable for other operators.

I haven't checked but I'd guess GCC also handles "x+x+x+x+x+x" the same
way as my patch, i.e. by representing it internally by a multiplication.
Post by Richard Henderson
I don't see how you'll prevent "x*x*x" -> "pow(x,3)" ...
from *not* calling the libm function if !flag_unsafe_math_optimizations.
It's probably not clear from the immediate context of the patch, but
the transformation to convert x*x into pow(x,2) is already nested inside
an "if (flag_unsafe_math_optimizations)" condition [fold-const.c:5944].
I could repeat the flag_unsafe_math_optimizations check redundantly
if you think that would make things clearer?


I agree there's an uncomfortable relationship between the conditions
we use to convert x*x into pow(x,2), and the conditions that are used
to check whether we expand things inline. They are consistent at the
moment, but at worst we may call libm's pow but only under -ffast-math
and when its guaranteed to be there. For example, it seems awkward
that we can't optimize x*x*x*x with -ffree-standing! Ultimately, my
suggestion of adding a __builtin_powi [or an equivalent EXP_EXPR tree
node] that is always expanded inline (into a loop if the second argument
isn't constant) would provide a better intermediate representation
internally.

I think the current patch provides a nice optimization with the
infrastructure we've already got. But looking forward, I'm very
interested in your opinions on a __builtin_powi and/or an EXP_EXPR.
Similarly, do you have a good name for __builtin_memeq? I hate
naming things.

Roger
--
Richard Henderson
2003-07-31 18:44:44 UTC
Permalink
Post by Roger Sayle
I'm not sure whether reassociation alone could work out that x^23
is better calculated as x^13*x^10 instead of x^22*x.
Well, lets see:

a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
= (a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*(a*a)*a

b = a*a = a^2
= b*b*b*b*b*b*b*b*b*b*b*a
= (b*b)*(b*b)*(b*b)*(b*b)*(b*b)*(b*a)

c = b*a = a^3
d = b*b = a^4
= d*d*d*d*d*c
= (d*d)*(d*d)*(d*c)

e = d*c = a^7
f = d*d = a^8
= f*f*e

Yields 7 multiplications via bottom-up reassociation.

r = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
= (a*a*a*a*a*a*a*a*a*a*a*a)*(a*a*a*a*a*a*a*a*a*a*a)
= b * c
b = a*a*a*a*a*a*a*a*a*a*a*a = a^12
= (a*a*a*a*a*a)*(a*a*a*a*a*a)
= d * d
c = a*a*a*a*a*a*a*a*a*a*a = a^11
= (a*a*a*a*a*a)*(a*a*a*a*a)
= d * e
d = a*a*a*a*a*a = a^6
= (a*a*a)*(a*a*a)
= f * f
e = a*a*a*a*a = a^5
= (a*a*a)*(a*a)
= f * g
f = a*a*a = a^3
= (a*a)*a
= g * a
g = a*a = a^2

Yields a different set of 7 multiplications via top-down reassociation.

r = x^23 = x^13 * x^10 = a * b
a = x^13 = x^10 * x^3 = b * d
b = x^10 = x^5 * x^5 = c * c
c = x^5 = x^3 * x^2 = d * e
d = x^3 = x^2 * x = e * x
e = x^2 = x * x = x * x

Your power table yields 6 multiplications total. Hmm...

So while reassociation can produce similar results for some inputs,
(e.g. the x^6 in the subject line) it can't for all inputs. I can't
argue with results like that.
Post by Roger Sayle
I haven't checked but I'd guess GCC also handles "x+x+x+x+x+x" the same
way as my patch, i.e. by representing it internally by a multiplication.
Yes, it should.
Post by Roger Sayle
It's probably not clear from the immediate context of the patch, but
the transformation to convert x*x into pow(x,2) is already nested inside
an "if (flag_unsafe_math_optimizations)" condition [fold-const.c:5944].
Ah.
Post by Roger Sayle
I could repeat the flag_unsafe_math_optimizations check redundantly
if you think that would make things clearer?
No, that's ok.
Post by Roger Sayle
Ultimately, my suggestion of adding a __builtin_powi [or an equivalent
EXP_EXPR tree node]
Or both. Ask the Fortran folk what they need from such a node.
I'm not sure what they're doing with "**" at the moment...
Post by Roger Sayle
Similarly, do you have a good name for __builtin_memeq?
Huh?


r~
Roger Sayle
2003-07-31 20:50:55 UTC
Permalink
Post by Richard Henderson
Post by Roger Sayle
I'm not sure whether reassociation alone could work out that x^23
is better calculated as x^13*x^10 instead of x^22*x.
Yields 7 multiplications via bottom-up reassociation.
Yields a different set of 7 multiplications via top-down reassociation.
Your power table yields 6 multiplications total. Hmm...
You might have guessed 23 wasn't chosen at random. The minimal addition
chain problem is NP-complete, so a lookup table of optimal results really
helps. I apologise for loading the dice.
Post by Richard Henderson
So while reassociation can produce similar results for some inputs,
(e.g. the x^6 in the subject line) it can't for all inputs. I can't
argue with results like that.
Reassociaton! I now have a name for my pain. I've just been reading
up on reassociation-based optimization in compilers and several of GCC's
deficiencies on my wish list turn out to be reassociation related.
For example, how "(x & c1) & c2" isn't reduced in simplify_rtx, or
how the following code:

extern int z1, z2;
void foo(int x, int y, int z)
{
z1 = (x+y)+z;
z2 = x+(y+z);
}

gets optimized to z2 = z1 with MSVC but not by GCC. Or for the current
example, x*y*y*x*x*y which could be simplified via pow(x,3)*pow(y,3) into
pow(x*y,3), and so on, etc... I now have a new life goal to implement
better reassociation in GCC's constant folding and RTL optimizers.
Post by Richard Henderson
Post by Roger Sayle
I haven't checked but I'd guess GCC also handles "x+x+x+x+x+x" the same
way as my patch, i.e. by representing it internally by a multiplication.
Yes, it should.
Arghh!! I just tried the experiment:

foo:
fldl 4(%esp)
fld %st(0)
fadd %st(1), %st
fadd %st(1), %st
fadd %st(1), %st
fadd %st(1), %st
faddp %st, %st(1)
ret


Even with -ffast-math. Yet another new entry on my TODO list!
Post by Richard Henderson
Post by Roger Sayle
It's probably not clear from the immediate context of the patch, but
the transformation to convert x*x into pow(x,2) is already nested inside
an "if (flag_unsafe_math_optimizations)" condition [fold-const.c:5944].
Ah.
Post by Roger Sayle
I could repeat the flag_unsafe_math_optimizations check redundantly
if you think that would make things clearer?
No, that's ok.
Can I take that as an approval for the current patch?


Roger
--
Richard Henderson
2003-07-31 21:34:19 UTC
Permalink
Post by Roger Sayle
I apologise for loading the dice.
:-)
Post by Roger Sayle
I now have a new life goal to implement better reassociation
in GCC's constant folding and RTL optimizers.
Well, I'll not stop you, but given my druthers I'd really prefer that
this work be done for tree-ssa.

In the first case because the constant folder only sees full expressions
if they are written that way by the user. E.g. suppose we have a smart
programmer who has done his own reassociation for x^23, but doesn't know
about the power table approach. In that case we'll not be able to put
it all back together to generate optimial code.

In the second case because we'd like to relegate rtl optimization to
what it is good at -- target-specific representation and optimization.

If I remember right, you can do reassociation at the same time that you
do global value numbering. Something like: first pass of GVN collects
the as-written expression tree, then you apply height reduction
transformations to the expression tree (reassociation), then the final
pass of GVN writes it all back out again in the optimal places in the CFG.
Post by Roger Sayle
Can I take that as an approval for the current patch?
Yes.


r~
Michael Matz
2003-08-01 00:54:03 UTC
Permalink
Hi,
Post by Roger Sayle
extern int z1, z2;
void foo(int x, int y, int z)
{
z1 = (x+y)+z;
z2 = x+(y+z);
}
gets optimized to z2 = z1 with MSVC but not by GCC.
I have a somewhat ugly patch for this problem which basically reorders all
operands of an associative, commutative operator into some canonical
order. This makes CSEing them possible.


Ciao,
Michael.
Zack Weinberg
2003-08-01 01:50:53 UTC
Permalink
Post by Michael Matz
Hi,
Post by Roger Sayle
extern int z1, z2;
void foo(int x, int y, int z)
{
z1 = (x+y)+z;
z2 = x+(y+z);
}
gets optimized to z2 = z1 with MSVC but not by GCC.
I have a somewhat ugly patch for this problem which basically reorders all
operands of an associative, commutative operator into some canonical
order. This makes CSEing them possible.
We have to be careful ... is this optimization not forbidden in
Fortran? (with the parentheses present, anyway)

zw
Michael Matz
2003-08-01 01:55:58 UTC
Permalink
Hi,
Post by Michael Matz
I have a somewhat ugly patch for this problem which basically reorders all
operands of an associative, commutative operator into some canonical
order. This makes CSEing them possible.
We have to be careful ... is this optimization not forbidden in Fortran?
(with the parentheses present, anyway)
I placed it at the same point were all the other reassociations are
already taking place in fold-const, so it shouldn't be more unsafe than it
was. For the curious I attach some version in the works. It's relative
to some 3.3 version, though and contains some dead code.


Ciao,
Michael.

Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.226.2.9
diff -u -p -r1.226.2.9 fold-const.c
--- fold-const.c 21 Jul 2003 19:07:12 -0000 1.226.2.9
+++ fold-const.c 30 Jul 2003 21:29:05 -0000
@@ -1011,6 +1011,233 @@ associate_trees (t1, t2, code, type)

return fold (build (code, type, convert (type, t1), convert (type, t2)));
}
+
+static tree tree_get_index PARAMS ((tree));
+static int indirect_ref_zero_ofs_p PARAMS ((tree));
+static int indirect_ref_nonzero_ofs_p PARAMS ((tree));
+static tree associate_trees_refs PARAMS ((tree, tree, enum tree_code, tree));
+static void split_tree_indirect_refs PARAMS ((tree, enum tree_code, tree));
+static void split_tree_for_reorder PARAMS ((tree, enum tree_code, tree));
+static int compare_trees PARAMS ((const PTR, const PTR));
+static tree reorder_memrefs PARAMS ((tree, tree, enum tree_code, tree));
+static tree reorder_summands PARAMS ((tree, tree, enum tree_code, tree));
+
+static tree
+tree_get_index (t)
+ tree t;
+{
+ if (TREE_CODE (t) == ARRAY_REF)
+ return TREE_OPERAND (t, 1);
+ else
+ {
+ t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) != PLUS_EXPR)
+ return NULL_TREE;
+ return TREE_OPERAND (t, 1);
+ }
+}
+
+static int
+indirect_ref_nonzero_ofs_p (t)
+ tree t;
+{
+ tree i = tree_get_index (t);
+ if (i && TREE_CODE (i) == INTEGER_CST && !integer_zerop (i))
+ return 1;
+ return 0;
+}
+
+static int
+indirect_ref_zero_ofs_p (t)
+ tree t;
+{
+ tree i = tree_get_index (t);
+ if (!i)
+ return 1;
+ if (TREE_CODE (i) == INTEGER_CST && integer_zerop (i))
+ return 1;
+ return 0;
+}
+
+static tree
+associate_trees_refs (t1, t2, code, type)
+ tree t1, t2;
+ enum tree_code code;
+ tree type;
+{
+ if (t1 == 0)
+ return t2;
+ else if (t2 == 0)
+ return t1;
+ return build (code, type, convert (type, t1), convert (type, t2));
+}
+
+#define MAX_SUMMANDS 100
+static tree merged_z, merged_nz, merged_rest;
+static tree summands[MAX_SUMMANDS];
+static int num_summands;
+static void
+split_tree_indirect_refs (in, code, type)
+ tree in;
+ enum tree_code code;
+ tree type;
+{
+ /* Strip any conversions that don't change the machine mode or signedness. */
+ STRIP_SIGN_NOPS (in);
+
+ if (TREE_CODE (in) == INDIRECT_REF || TREE_CODE (in) == ARRAY_REF)
+ {
+ if (indirect_ref_zero_ofs_p (in))
+ merged_z = associate_trees_refs (merged_z, in, code, type);
+ else if (indirect_ref_nonzero_ofs_p (in))
+ merged_nz = associate_trees_refs (merged_nz, in, code, type);
+ else
+ merged_rest = associate_trees_refs (merged_rest, in, code, type);
+ }
+ else if (TREE_CODE (in) == code)
+ {
+ tree op0 = TREE_OPERAND (in, 0);
+ tree op1 = TREE_OPERAND (in, 1);
+ split_tree_indirect_refs (op0, code, type);
+ split_tree_indirect_refs (op1, code, type);
+ }
+ else
+ merged_rest = associate_trees_refs (merged_rest, in, code, type);
+}
+
+static void
+split_tree_for_reorder (in, code, type)
+ tree in;
+ enum tree_code code;
+ tree type;
+{
+ /* Strip any conversions that don't change the machine mode or signedness. */
+ STRIP_SIGN_NOPS (in);
+
+ if ((TREE_CODE (in) == INDIRECT_REF
+ || TREE_CODE (in) == ARRAY_REF
+ || TREE_CODE (in) == VAR_DECL)
+ && num_summands < MAX_SUMMANDS)
+ summands[num_summands++] = in;
+ else if (TREE_CODE (in) == code)
+ {
+ tree op0 = TREE_OPERAND (in, 0);
+ tree op1 = TREE_OPERAND (in, 1);
+ split_tree_for_reorder (op0, code, type);
+ split_tree_for_reorder (op1, code, type);
+ }
+ else
+ merged_rest = associate_trees_refs (merged_rest, in, code, type);
+}
+
+static int
+compare_trees (v1, v2)
+ const PTR v1;
+ const PTR v2;
+{
+ tree t1 = *((const tree *) v1);
+ tree t2 = *((const tree *) v2);
+
+ if (t1 == t2)
+ return 0;
+ /* Everything besides var_decls and indirect_refs last. */
+ if (TREE_CODE (t1) != INDIRECT_REF
+ && TREE_CODE (t1) != ARRAY_REF
+ && TREE_CODE (t1) != VAR_DECL)
+ return 1;
+ if (TREE_CODE (t2) != INDIRECT_REF
+ && TREE_CODE (t2) != ARRAY_REF
+ && TREE_CODE (t2) != VAR_DECL)
+ return -1;
+ /* All indirect_refs with nonzero index before var_decls.
+ All indirect_refs with zero index after var_decls. */
+ if (TREE_CODE (t1) != TREE_CODE (t2))
+ {
+ if (TREE_CODE (t1) == INDIRECT_REF || TREE_CODE (t1) == ARRAY_REF)
+ {
+ if (indirect_ref_nonzero_ofs_p (t1))
+ return -1;
+ else
+ return 1;
+ }
+ /* t2 is a INDIRECT_REF or ARRAY_REF, so we can call that without
+ checking. */
+ else if (indirect_ref_nonzero_ofs_p (t2))
+ return 1;
+ else
+ return -1;
+ }
+ if (TREE_CODE (t1) == VAR_DECL)
+ {
+ if (IDENTIFIER_HASH_VALUE (DECL_NAME (t1))
+ < IDENTIFIER_HASH_VALUE (DECL_NAME (t2)))
+ return -1;
+ else
+ return 1;
+ }
+ /* We have two indirect_refs here. */
+ if (indirect_ref_nonzero_ofs_p (t1) && indirect_ref_nonzero_ofs_p (t2))
+ /* As both had non-zero index, we know that tree_get_index() really
+ returns INTEGER_CST. */
+ return tree_int_cst_compare (tree_get_index (t1), tree_get_index (t2));
+ else if (indirect_ref_zero_ofs_p (t1))
+ return 1;
+ else
+ return -1;
+}
+
+static tree
+reorder_memrefs (in0, in1, code, type)
+ tree in0, in1;
+ enum tree_code code;
+ tree type;
+{
+ tree ret = 0;
+ merged_z = 0;
+ merged_nz = 0;
+ merged_rest = 0;
+ /* XXX We can't yet handle MINUS_EXPR. */
+ if (code == MINUS_EXPR)
+ return 0;
+ if (in0)
+ split_tree_indirect_refs (in0, code, type);
+ if (in1)
+ split_tree_indirect_refs (in1, code, type);
+ if (merged_nz || merged_z)
+ {
+ ret = associate_trees_refs (merged_nz, merged_z, code, type);
+ ret = associate_trees_refs (ret, merged_rest, code, type);
+ }
+ return ret;
+}
+
+static tree
+reorder_summands (in0, in1, code, type)
+ tree in0, in1;
+ enum tree_code code;
+ tree type;
+{
+ tree ret = 0;
+ merged_rest = 0;
+ num_summands = 0;
+ /* XXX We can't yet handle MINUS_EXPR. */
+ if (code == MINUS_EXPR)
+ return 0;
+ if (in0)
+ split_tree_for_reorder (in0, code, type);
+ if (in1)
+ split_tree_for_reorder (in1, code, type);
+ if (num_summands > 2)
+ {
+ int i;
+ qsort (summands, num_summands, sizeof (tree), compare_trees);
+ ret = summands[0];
+ for (i = 1; i < num_summands; i++)
+ ret = associate_trees_refs (ret, summands[i], code, type);
+ ret = associate_trees_refs (ret, merged_rest, code, type);
+ }
+ return ret;
+}

/* Combine two integer constants ARG1 and ARG2 under operation CODE
to produce a new constant.
@@ -5378,11 +5605,16 @@ fold (expr)
+ (lit0 != 0) + (lit1 != 0)
+ (minus_lit0 != 0) + (minus_lit1 != 0)))
{
+ tree tmp;
+ tmp = reorder_summands (var0, var1, code, type);
/* Recombine MINUS_EXPR operands by using PLUS_EXPR. */
if (code == MINUS_EXPR)
code = PLUS_EXPR;

- var0 = associate_trees (var0, var1, code, type);
+ if (tmp)
+ var0 = tmp;
+ else
+ var0 = associate_trees (var0, var1, code, type);
con0 = associate_trees (con0, con1, code, type);
lit0 = associate_trees (lit0, lit1, code, type);
minus_lit0 = associate_trees (minus_lit0, minus_lit1, code, type);
@@ -5424,6 +5656,9 @@ fold (expr)
con0 = associate_trees (con0, lit0, code, type);
return convert (type, associate_trees (var0, con0, code, type));
}
+ var0 = reorder_summands (arg0, arg1, code, type);
+ if (var0)
+ return convert (type, var0);
}

binary:
Roger Sayle
2003-08-01 03:35:44 UTC
Permalink
Hi Michael,

Cool! Your proto-patch seems to have a lot of potential. Sorting the
operands is exactly the sort of "first cut" I had in mind. The cleverness
is in getting a good comparison/ordering operator.

My one suggestion...
Post by Michael Matz
+static int
+compare_trees (v1, v2)
+ const PTR v1;
+ const PTR v2;
+{
+ tree t1 = *((const tree *) v1);
+ tree t2 = *((const tree *) v2);
+
+ if (t1 == t2)
+ return 0;
+ /* Everything besides var_decls and indirect_refs last. */
+ if (TREE_CODE (t1) != INDIRECT_REF
+ && TREE_CODE (t1) != ARRAY_REF
+ && TREE_CODE (t1) != VAR_DECL)
+ return 1;
+ if (TREE_CODE (t2) != INDIRECT_REF
+ && TREE_CODE (t2) != ARRAY_REF
+ && TREE_CODE (t2) != VAR_DECL)
+ return -1;
A scheme like the one used in rtlanal.c's swap_commutative_operands
would appear to be more scalable here. For example, we want to move
all of the constants together. Eventually, we'd want to order
x*c next to x in an addition chain and pow(x,c) next to x in a
multiplication chain. But this level of tweaking would require a
very flexible "comparison" infrastructure.

It should also be possible to reuse your compare_trees to provide
a tree-level swap_commutative_operands, which could both eliminate
and simplify a lot of code in fold-const.c. To give a trivial
example, the patch in the subject line had to test for both
x*pow(x,c) and pow(x,c)*x, one of which wouldn't be required if
we had a canonical ordering.

Just my thoughts. I think the concept itself looks sound. Hopefully,
the improvements in CSE and reduction in register pressure will outweigh
the loss of parallelism available to the scheduler.

Roger
--
Michael Matz
2003-08-01 03:53:40 UTC
Permalink
Hi,
Post by Roger Sayle
A scheme like the one used in rtlanal.c's swap_commutative_operands
would appear to be more scalable here. For example, we want to move
all of the constants together.
Yes. Although note, that constants and literals are split away before the
reorder_summands() function sees it by the already existing machinery in
fold-const (split_trees and friends). Nevertheless this can possibly be
merged with a general resort-everything.
Post by Roger Sayle
Eventually, we'd want to order x*c next to x in an addition chain and
pow(x,c) next to x in a multiplication chain. But this level of
tweaking would require a very flexible "comparison" infrastructure.
Indeed. It's currently specifically only doing what I needed, namely put
all indirect_refs or array_refs with non-zero index first, then all direct
var_decls, last all indirect_refs and array_refs with zero index (and then
all unhandled summands). (The underlying reason having to do with the
fact that non-zero based refs can't alias with scalars) And of course all
operands in the same group in some fixed order.
Post by Roger Sayle
Just my thoughts. I think the concept itself looks sound. Hopefully,
the improvements in CSE and reduction in register pressure will outweigh
the loss of parallelism available to the scheduler.
The current version shouldn't change the the parallelism systematically.
It just reorders operands, but doesn't change for instance the number of
operators. With an extended version of course this would be the case.
But it does fix some (artificial I admit) CSE cases.


Ciao,
Michael.
Roger Sayle
2003-08-01 13:24:08 UTC
Permalink
Hi Michael,
Post by Michael Matz
The current version shouldn't change the the parallelism systematically.
It just reorders operands, but doesn't change for instance the number of
operators. With an extended version of course this would be the case.
But it does fix some (artificial I admit) CSE cases.
But doesn't it linearlize the expression tree? For example, turning
(a+b)+(c+d) into ((a+b)+c)+d. This has the same number of operators
but in the original (a+b) and (c+d) could be evaluated in parallel or
scheduled to overlap in a pipelined processor.

I haven't read your patch carefully, so I might be completely wrong!

Roger
--
Richard Henderson
2003-08-01 20:36:37 UTC
Permalink
Post by Michael Matz
I placed it at the same point were all the other reassociations are
already taking place in fold-const, so it shouldn't be more unsafe than it
was.
We *should* still add a bit or code that says don't do that. This
point is one reason Java doesn't call into the generic fold routines.


r~
Roger Sayle
2003-08-01 23:40:22 UTC
Permalink
Post by Richard Henderson
Post by Michael Matz
I placed it at the same point were all the other reassociations are
already taking place in fold-const, so it shouldn't be more unsafe
than it was.
We *should* still add a bit or code that says don't do that. This
point is one reason Java doesn't call into the generic fold routines.
Is there an tree code that front-ends can use to act as an association
delimiter, so that fold won't tranform expressions that have been suitably
structured by the user? Perhaps COMPOUND_EXPR or STMT_EXPR? Its unfair
to criticize the middle-end, if no hints are provided by the front-ends.
This would also help with FORTRAN expressions such as "(A+B+C)+D" where
some operators may be transformed/reassociated whilst others can't.

Roger
--
Richard Henderson
2003-08-02 00:44:27 UTC
Permalink
Post by Roger Sayle
Is there an tree code that front-ends can use to act as an association
delimiter, so that fold won't tranform expressions that have been suitably
structured by the user?
Not yet.


r~
Joseph S. Myers
2003-08-02 01:13:38 UTC
Permalink
Post by Roger Sayle
Is there an tree code that front-ends can use to act as an association
delimiter, so that fold won't tranform expressions that have been suitably
structured by the user? Perhaps COMPOUND_EXPR or STMT_EXPR? Its unfair
to criticize the middle-end, if no hints are provided by the front-ends.
This would also help with FORTRAN expressions such as "(A+B+C)+D" where
some operators may be transformed/reassociated whilst others can't.
That suggests that GENERIC (or FORTRAN-specific trees) should have
variants of PLUS_EXPR etc. that take more than two arguments - to
represent the reassociatable additions (the existing forms being those for
C which can't be reassociated unless it is provable this is safe or
-funsafe-math-optimizations is used).

You then have the problem of what level these optimizations go at (given
tree-ssa). A lot of what fold-const does would make sense to move to the
GIMPLE level, after constant propagation etc. (and so that breaking up
complex expressions manually using temporary variables doesn't affect
optimization). But language-permitted reassociations would need to be
done before or during the conversion to GIMPLE.

fold-const.c does many different things, and the lack of clear rules about
what is legitimate causes the problems for Java etc.. It would be nice to
have a clear separation between:

* Constant folding needed by the language definition as part of semantic
analysis. This should be the *only* folding done during parsing and
semantic analysis and diagnosis of constraint violations. This is
inherently language-specific but when it actually comes down to an
operation on constant operands, central helper routines are needed
somewhere.

* Folding for optimization, whether done on full expression trees or on
GIMPLE. Benchmarks would be needed to determine whether some or all of
this should be disabled at -O0. As this comes after semantic analysis,
and hopefully on language-independent trees (GENERIC or GIMPLE) with
well-defined semantics, language-specific problems shouldn't arise at this
stage.

In particular, some folding in fold-const.c (and indeed some C front end
tree optimizations) is affected by -pedantic - necessary for some language
constraints, but hardly appropriate in a file shared by many front ends.
A cleaner separation would avoid any possibility of code generation
depending on -pedantic in this way.

(Building potentially bigger trees for full expressions before doing most
of the folding could affect performance in either direction - making it
worse because of the bigger trees involved, or making it better by
improving cache usage through less intermixing of parsing and optimization
code.)
--
Joseph S. Myers
***@polyomino.org.uk
Tom Tromey
2003-08-02 16:11:04 UTC
Permalink
Joseph> * Constant folding needed by the language definition as part
Joseph> of semantic analysis. This should be the *only* folding done
Joseph> during parsing and semantic analysis and diagnosis of
Joseph> constraint violations.

Java also needs different kinds of floating point operations depending
on whether the context is strictfp. In a strictfp context, floating
point math must give more precise answers. I'm not an fp expert, but
my understanding is that basically in a non-strict-fp, we can generate
faster code on x86, and in a strictfp context we have to generate
slower but more accurate code. (I've read that on non-x86
architectures the code can often end up being the same; supposedly
this has to do with oddities of x86 fp code.)

Last time this topic came up, I thought we had talked about the
possibility of having a flag on the tree nodes that would tell the
back end what to do. Maybe I'm misremembering though.


I think step 1 here should be to collect a list of requirements that
is as complete as possible. I'd be happy to send a list of Java needs
(if someone actually wants that :-).

Tom
Joseph S. Myers
2003-08-02 21:30:35 UTC
Permalink
Post by Tom Tromey
Last time this topic came up, I thought we had talked about the
possibility of having a flag on the tree nodes that would tell the
back end what to do. Maybe I'm misremembering though.
C99 needs multiple such flags for the standard pragmas (FP_CONTRACT,
FENV_ACCESS, CX_LIMITED_RANGE). Some flags may need to be carried down to
the RTL level.
--
Joseph S. Myers
***@polyomino.org.uk
Segher Boessenkool
2003-08-01 18:02:14 UTC
Permalink
Post by Roger Sayle
Post by Richard Henderson
Post by Roger Sayle
I haven't checked but I'd guess GCC also handles "x+x+x+x+x+x" the same
way as my patch, i.e. by representing it internally by a multiplication.
Yes, it should.
fldl 4(%esp)
fld %st(0)
fadd %st(1), %st
fadd %st(1), %st
fadd %st(1), %st
fadd %st(1), %st
faddp %st, %st(1)
ret
Even with -ffast-math. Yet another new entry on my TODO list!
simplify-rtx.c:simplify_plus_minus() will only work on a limited
number of addends (luckily, as it is at least quadratic, and probably
much worse), but it will fail completely, not doing any substitutions,
if there turn out to be more addends. Hurray.

Try also

a+b+a+b+a+b+a+b
a-a+a-a+a-a+a-a


Segher
Roger Sayle
2003-08-03 19:17:26 UTC
Permalink
The following patch fixes a deficiency in GCC's algebric simplification
optimizations, spotted recently during a discussion with Richard
Henderson about optimizing x*x*x*x*x*x into x**6. This patch uses
almost identical logic to achieve the same for addition->multiplication.

We convert x+x into 2*x, the new preferred tree-level representation
and disable the conversion of 2*x back into x+x. We then provide
the new term rewriting rules x*c+x and c*x+x into x*(c+1) for floating
point expressions with constant values c. Additionally, we also
provide the simplication x*c1 + x*c2 into x*(c1+c2). Finally, even
though 2.0*x is converted back into x+x during RTL simplification, I
also tweaked expand_mult to do the same during RTL expansion.

These optimizations are already performed by MSVC, for example.

The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all languages except treelang, and regression tested
with a top-level "make -k check" with no new failures.

Ok for mainline?


2003-08-03 Roger Sayle <***@eyesopen.com>

* fold-const.c (fold <PLUS_EXPR>): Transform x+x into x*2.0.
Optimize x*c+x and x+x*c into x*(c+1) and x*c1+x*c2 into x*(c1+c2)
for floating point expressions with -ffast-math.
(fold <MULT_EXPR>): Don't transform x*2.0 into x+x.
* expmed.c (expand_mult): Wrap long line. Expand x*2.0 as x+x.

* gcc.dg/20030803-1.c: New test case.


Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.287
diff -c -3 -p -r1.287 fold-const.c
*** fold-const.c 1 Aug 2003 00:36:52 -0000 1.287
--- fold-const.c 3 Aug 2003 15:33:19 -0000
*************** fold (tree expr)
*** 5659,5672 ****
same));
}
}

! /* See if ARG1 is zero and X + ARG1 reduces to X. */
! else if (fold_real_zero_addition_p (TREE_TYPE (arg0), arg1, 0))
! return non_lvalue (convert (type, arg0));

! /* Likewise if the operands are reversed. */
! else if (fold_real_zero_addition_p (TREE_TYPE (arg1), arg0, 0))
! return non_lvalue (convert (type, arg1));

bit_rotate:
/* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A
--- 5659,5730 ----
same));
}
}
+ else
+ {
+ /* See if ARG1 is zero and X + ARG1 reduces to X. */
+ if (fold_real_zero_addition_p (TREE_TYPE (arg0), arg1, 0))
+ return non_lvalue (convert (type, arg0));

! /* Likewise if the operands are reversed. */
! if (fold_real_zero_addition_p (TREE_TYPE (arg1), arg0, 0))
! return non_lvalue (convert (type, arg1));
!
! /* Convert x+x into x*2.0. */
! if (operand_equal_p (arg0, arg1, 0))
! return fold (build (MULT_EXPR, type, arg0,
! build_real (type, dconst2)));
!
! /* Convert x*c+x into x*(c+1). */
! if (flag_unsafe_math_optimizations
! && TREE_CODE (arg0) == MULT_EXPR
! && TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST
! && ! TREE_CONSTANT_OVERFLOW (TREE_OPERAND (arg0, 1))
! && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
! {
! REAL_VALUE_TYPE c;
!
! c = TREE_REAL_CST (TREE_OPERAND (arg0, 1));
! real_arithmetic (&c, PLUS_EXPR, &c, &dconst1);
! return fold (build (MULT_EXPR, type, arg1,
! build_real (type, c)));
! }
!
! /* Convert x+x*c into x*(c+1). */
! if (flag_unsafe_math_optimizations
! && TREE_CODE (arg1) == MULT_EXPR
! && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST
! && ! TREE_CONSTANT_OVERFLOW (TREE_OPERAND (arg1, 1))
! && operand_equal_p (TREE_OPERAND (arg1, 0), arg0, 0))
! {
! REAL_VALUE_TYPE c;

! c = TREE_REAL_CST (TREE_OPERAND (arg1, 1));
! real_arithmetic (&c, PLUS_EXPR, &c, &dconst1);
! return fold (build (MULT_EXPR, type, arg0,
! build_real (type, c)));
! }
!
! /* Convert x*c1+x*c2 into x*(c1+c2). */
! if (flag_unsafe_math_optimizations
! && TREE_CODE (arg0) == MULT_EXPR
! && TREE_CODE (arg1) == MULT_EXPR
! && TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST
! && ! TREE_CONSTANT_OVERFLOW (TREE_OPERAND (arg0, 1))
! && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST
! && ! TREE_CONSTANT_OVERFLOW (TREE_OPERAND (arg1, 1))
! && operand_equal_p (TREE_OPERAND (arg0, 0),
! TREE_OPERAND (arg1, 0), 0))
! {
! REAL_VALUE_TYPE c1, c2;
!
! c1 = TREE_REAL_CST (TREE_OPERAND (arg0, 1));
! c2 = TREE_REAL_CST (TREE_OPERAND (arg1, 1));
! real_arithmetic (&c1, PLUS_EXPR, &c1, &c2);
! return fold (build (MULT_EXPR, type,
! TREE_OPERAND (arg0, 0),
! build_real (type, c1)));
! }
! }

bit_rotate:
/* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A
*************** fold (tree expr)
*** 5953,5967 ****
if (!HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg0)))
&& real_minus_onep (arg1))
return fold (build1 (NEGATE_EXPR, type, arg0));
-
- /* x*2 is x+x */
- if (! wins && real_twop (arg1)
- && (*lang_hooks.decls.global_bindings_p) () == 0
- && ! CONTAINS_PLACEHOLDER_P (arg0))
- {
- tree arg = save_expr (arg0);
- return fold (build (PLUS_EXPR, type, arg, arg));
- }

if (flag_unsafe_math_optimizations)
{
--- 6011,6016 ----
Index: expmed.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.138
diff -c -3 -p -r1.138 expmed.c
*** expmed.c 19 Jul 2003 14:47:02 -0000 1.138
--- expmed.c 3 Aug 2003 15:33:21 -0000
*************** synth_mult (struct algorithm *alg_out, u
*** 2300,2306 ****
you should swap the two operands if OP0 would be constant. */

rtx
! expand_mult (enum machine_mode mode, rtx op0, rtx op1, rtx target, int unsignedp)
{
rtx const_op1 = op1;

--- 2300,2307 ----
you should swap the two operands if OP0 would be constant. */

rtx
! expand_mult (enum machine_mode mode, rtx op0, rtx op1, rtx target,
! int unsignedp)
{
rtx const_op1 = op1;

*************** expand_mult (enum machine_mode mode, rtx
*** 2511,2516 ****
--- 2512,2539 ----
abort ();

return accum;
+ }
+ }
+
+ if (GET_CODE (op0) == CONST_DOUBLE)
+ {
+ rtx temp = op0;
+ op0 = op1;
+ op1 = temp;
+ }
+
+ /* Expand x*2.0 as x+x. */
+ if (GET_CODE (op1) == CONST_DOUBLE
+ && GET_MODE_CLASS (mode) == MODE_FLOAT)
+ {
+ REAL_VALUE_TYPE d;
+ REAL_VALUE_FROM_CONST_DOUBLE (d, op1);
+
+ if (REAL_VALUES_EQUAL (d, dconst2))
+ {
+ op0 = force_reg (GET_MODE (op0), op0);
+ return expand_binop (mode, add_optab, op0, op0,
+ target, unsignedp, OPTAB_LIB_WIDEN);
}
}


/* Copyright (C) 2003 Free Software Foundation.

Check that constant folding of mathematical expressions doesn't
break anything.

Written by Roger Sayle, 3rd August 2003. */

/* { dg-do link } */
/* { dg-options "-O2 -ffast-math" } */

extern void link_error(void);

void test(double x)
{
if (x+x != 2.0*x)
link_error ();
if (x+x != x*2.0)
link_error ();

if (x+x+x != 3.0*x)
link_error ();
if (x+x+x != x*3.0)
link_error ();

if ((x+x)+x != 3.0*x)
link_error ();
if ((x+x)+x != x*3.0)
link_error ();

if (x+(x+x) != 3.0*x)
link_error ();
if (x+(x+x) != x*3.0)
link_error ();

if (x+4.0*x != 5.0*x)
link_error ();
if (x+4.0*x != x*5.0)
link_error ();
if (x+x*4.0 != 5.0*x)
link_error ();
if (x+x*4.0 != x*5.0)
link_error ();
if (4.0*x+x != 5.0*x)
link_error ();
if (4.0*x+x != x*5.0)
link_error ();
if (x*4.0+x != 5.0*x)
link_error ();
if (x*4.0+x != x*5.0)
link_error ();

if (3.0*x + 5.0*x != 8.0*x)
link_error ();
if (3.0*x + 5.0*x != x*8.0)
link_error ();
if (x*3.0 + 5.0*x != 8.0*x)
link_error ();
if (x*3.0 + 5.0*x != x*8.0)
link_error ();
if (3.0*x + x*5.0 != 8.0*x)
link_error ();
if (3.0*x + x*5.0 != x*8.0)
link_error ();
if (x*3.0 + x*5.0 != 8.0*x)
link_error ();
if (x*3.0 + x*5.0 != x*8.0)
link_error ();
}

int main()
{
test(2.0);
return 0;
}


Roger
--
Roger Sayle, E-mail: ***@eyesopen.com
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833
Richard Henderson
2003-08-04 22:02:45 UTC
Permalink
Post by Roger Sayle
* fold-const.c (fold <PLUS_EXPR>): Transform x+x into x*2.0.
Optimize x*c+x and x+x*c into x*(c+1) and x*c1+x*c2 into x*(c1+c2)
for floating point expressions with -ffast-math.
(fold <MULT_EXPR>): Don't transform x*2.0 into x+x.
* expmed.c (expand_mult): Wrap long line. Expand x*2.0 as x+x.
* gcc.dg/20030803-1.c: New test case.
Ok.


r~

Richard Kenner
2003-08-01 02:32:31 UTC
Permalink
Obviously, this is missing many comments ...
Michael Matz
2003-08-01 02:30:14 UTC
Permalink
Hi,
Post by Richard Kenner
Obviously, this is missing many comments ...
Like I said it's an experiment still, not thought for inclusion.


Ciao,
Michael.
Loading...